Posted on

Cancer treatment monitoring using cell-free DNA fragmentomes

Cancer treatment monitoring using cell-free DNA fragmentomes

Study design and population

The phase III randomized CAIRO5 trial (NCT02162563)24 investigates the optimal first-line systemic therapy for patients with histologically proven CRC with isolated, previously untreated, initially unresectable liver metastases. Patients treated with doublet chemotherapy (FOLFOX or FOLFIRI) and bevacizumab with at least one blood draw prior to and after treatment initiation between March 2015 and November 2020 were included in the present study. All patients were considered unresectable at inclusion, i.e. R0-resection could not be achieved in one procedure with one surgical intervention. Upon treatment with doublet chemotherapy and bevacizumab, patients were evaluated every two months by an expert panel of liver surgeons and abdominal radiologists for the possibility of local treatment of colorectal liver metastases following current clinical practice. Clinical follow-up was performed according to the standard of care, including a clinical review every 3 months and CT imaging and serum CEA every six months. When the liver metastases stayed unresectable, chemotherapy was continued without the targeted agent for the total duration of pre- and post-operative treatment of six months, and patients were continuously evaluated until the progression of the disease by serum CEA and CT imaging every 2 months. Follow-up was recorded until September 1, 2021. The trial and follow-on studies were approved by a medical ethical committee of the Amsterdam University Medical Centre, performed according to the Declaration of Helsinki, and patients signed written informed consent for study participation and blood collection for translational research. Longitudinal plasma samples from lung cancer patients treated with chemotherapy and immune checkpoint blockade were obtained from Indivumed (Germany) with patient informed consent according to the Declaration of Helsinki.

Blood collection and cfDNA extraction

Collection of liquid biopsy samples was performed at the medical center of inclusion prior to study treatment (baseline), pre-operatively, post-operatively, and every 3 months during follow-up until disease progression or treatment completion. Blood samples were obtained using 10 mL cell-free DNA BCT® tubes (Streck, La Vista, USA) and collected centrally at the Netherlands Cancer Institute (Amsterdam, the Netherlands) for CAIRO5 samples or at Indivumed (Hamburg, Germany) for samples from the lung cancer cohort. A two-step centrifugation process, 10 min at 1700×g and 10 min at 20,000×g, were used to isolate the cell-free plasma. The cell-free plasma was stored at −80 °C until further use. We aimed to use 4 mL of plasma for cfDNA isolation, which was feasible for the majority of the patients/time points. In case this was not feasible, we used a minimum of 1.5 mL. From the plasma, 60 uL of cfDNA was isolated and 9 uL in replicate (18 uL total) was used for the ddPCR assay, independent of the cfDNA concentration. cfDNA concentration was assessed using the double-strand DNA High-Sensitivity Qubit assay for all samples. The process for performing isolation of cfDNA from plasma was performed with the QIAsymphony robot, with an elution volume of 60 µL. Regardless of the cfDNA concentration, and per manufacturer’s instructions, 9 µL of cfDNA sample was used as input into the ddPCR assay per duplicate.

Library preparation and cfDNA sequencing

Aliquots of 15 ng cfDNA were used for the DELFI-TF analyses (requiring a minimum of 800 uL or 1 ng of cfDNA input) and ichorCNA analysis. NGS libraries were constructed using the NEBNext DNA Library Prep kit (New England Biolabs; Ipswich, MA, USA) with up to 15 ng of cfDNA input, as previously described17, with modifications to the manufacturer’s guidelines. AMPure XP beads (Beckman Coulter; Brea, CA, USA) were used exclusively for all library purification steps, in lieu of spin columns, and utilized an on-bead approach to minimize sample loss during elution and transfer steps. In this approach, AMPure XP beads were initially added during the end repair purification step, and the subsequent dA-tailing and adapter ligation reactions were conducted with beads present in the reaction mixture. Post-PCR purification was also performed using AMPure XP beads. cfDNA libraries were amplified using Phusion HotStart Polymerase (Thermo Fisher; Waltham, MA, USA). WGS library quality was determined using the 2100 Bioanalyzer (Agilent Technologies; Santa Clara, CA, USA) or the TapeStation 4200 (Agilent Technologies; Santa Clara, CA, USA). cfDNA libraries were pooled together and sequenced using a NovaSeq 6000 (Illumina; San Diego, CA, USA).

To limit batch effects, all time points collected from a single individual were processed together in a single library preparation batch to create genomic libraries, including a duplicate library as an inter-batch control and a technical replicate of nucleosomal DNA obtained from nuclease-digested human peripheral blood mononuclear cells as an intra-batch control (Supplementary Data 3). Samples in the lung cancer validation set were processed separately from samples in the CAIRO5 study.

RAS/BRAF mutation analyses

RAS and BRAF V600 mutation analyses were performed on tumor tissue DNA following routine clinical practice. For the subset of patients with a RAS/BRAF tumor tissue mutation, longitudinal liquid biopsy hotspot mutation analyses by ddPCR (Bio-Rad, Hercules, CA, USA) and DELFI-TF fragmentation analyses were performed. The ddPCR™ KRAS G12/G13 (#1863506), ddPCR™ KRAS Q61 (#12001626), ddPCR™ KRAS A146T (#10049550), ddPCRTM BRAF V600 (#12001037), NRAS G12/G13 (#12001627), and NRAS Q61 (#12001006) Screening Kits were purchased from Bio-Rad and used according to the manufacturer’s instruction, using 9 µL of sample, 11 µL of ddPCR supermix for probes (no dUTP), 1 µL of the multiplex assay and 1 µL of nuclease-free water. All measurements were performed in duplicate, including a blank (nuclease-free water) and a positive control. Patients with a RAS/BRAF mutation that could not be tracked by ddPCR because the variant identified on tumor tissue was not present on one of the available ddPCR Screening Kits were excluded (Supplementary Fig. 1). Data were analyzed using the QuantaSoftTM software version 1.6.6 (Bio-Rad, Hercules, CA, USA) and an automated correction algorithm as previously described in ref. 25. For the ddPCR assay, the limit of detection was determined based on the limit of blank, adjusting the outcome according to a predefined ratio of false-positive mutants found in WT samples, as described in a previous publication28. In all analyses, RAS/BRAF MT+ was defined as a ctDNA analysis with a positive result, that is, detectable mutant droplets above the limit of detection.

Analyses of cfDNA sequencing data

On a per-sample basis, the paired-end sequenced reads were aligned to a reference genome (hg19) using paired-end alignment with Bowtie 229(version 2.4.2). Aligned reads were sorted, PCR duplicates were removed, and read pairs representing unique fragments were converted to BED format using Samtools30 (version 1.13) and Bedtools31(version 2.26.0), respectively. Fragment lengths were calculated based on start and end coordinates, and the fragments were divided into 504 5 Mb bins, covering ~2.6 Gb of the genome. We tiled the hg19 autosomes into 26,236 adjacent, non-overlapping 100-kb bins, excluding regions of low mappability and excluding reads that fell into publicly available blacklisted regions17. Using this approach, we excluded 361 Mb (13%) of the hg19 reference genome, including centromeric and telomeric regions. Short fragments were defined as having lengths between 100 and 150 bp, and long fragments as having lengths between 151 and 220 bp17. Next, the number of short and long fragments per bin was calculated using R/Bioconductor (version 3.6.2), and these counts were corrected by GC content17. The corrected count of short fragments was divided by the corrected count of long fragments by bin (short-to-long ratios) to obtain fragmentation profiles for each sample. We performed a principal component analysis (PCA) on the fragmentation profiles, retaining the top two principal components of variance between samples. Additional cfDNA-derived features included arm-level aneuploidy scores (a z-score calculated for each of 39 acrocentric arms17, plasma-aneuploidy score (PA-score; 1 feature)20, and the overall fragment-length distribution summarized by a 12-component mixture of normals (35 parameters). As previously described in ref. 20, the plasma-aneuploidy score (PA-score) was constructed from five chromosomes whose arms had the highest absolute z-scores and converted to p values (using a Student’s t distribution with three degrees of freedom), and the negative of the sum of the logarithms of the p values was calculated for each sample.

Using the above features calculated for each sample, a random forest model was trained against the allele frequencies of the tumor-specific driver RAS/BRAF variant measured by ddPCR in the longitudinal cfDNA samples. This model takes the mixture model weights (11 features), PA-score, the maximum absolute z-score scores (2 features), and the principal components of short-to-long ratios (2 features) as inputs and outputs a predicted MAF. In order to generate unbiased predictions, avoid overfitting, and assess generalizability, training was done via leave-one-patient-out cross-validation. In this cross-validation scheme, each patient’s data was held out in turn, the model was trained on the remaining samples, and that trained model was then used to generate predictions for the held-out patient’s samples. DELFI-TF was defined as the predicted MAF from this cross-validation scheme. We evaluated the quality of the generated predictions by assessing the correlation of these predictions with the observed ddPCR MAF values and by evaluating the relationship between those predictions and time to progression or death.

DELFI-TF dynamics analysis

To capture the molecular dynamics of tumor burden over time, we computed the slope of the linear regression line fitted to the DELFI-TF values at time T1 and all subsequent time points until progression for the PFS analysis and up to 60 days after the progression date for the OS analysis. We limited the analysis to patients that had at least three samples before progression, and at least one sample collected in the progression window, which was the period 120 days before the progression date for PFS analysis (79 patients) and 120 days before and up to 60 days after the progression date for the OS analysis (80 patients). The regression lines were computed using Python/scikit-learn (version 3.9.13/1.1.1).

IchorCNA methodology

ichorCNA analysis code was obtained from GitHub ( ichorCNA) and run using default parameters with the exception that the expectation-maximization algorithm was seeded to account for low tumor purity samples21. A panel of 30 plasma samples from healthy individuals was constructed for use as a panel of normals17.

Chromatin structure analysis

A/B compartments were evaluated for colon adenocarcinoma (COAD) from TCGA and lymphoblastoid cells, as previously described in ref. 32. The median fragmentation profiles for ten CRC samples with high DELFI-TF values and ten randomly selected cancer-free individuals were constructed and compared across 100-kb bins. The median fragmentation profiles of the cancer-free individuals and the DELFI-TF of the individual CRC samples were used to extract an estimated median CRC component in the plasma. Copy-neutral regions were annotated if five or more of the ten CRC samples were copy-neutral in that region.

Formalin-fixed paraffin-embedded (FFPE) tissue genomics

Formalin-fixed paraffin-embedded (FFPE) tissue blocks from surgical resection or biopsies of the primary tumors were collected from all patients, and DNA was isolated from the available material using the Qiagen AllPrep DNA/RNA/miRNA Universal Kit (Qiagen, Hilden, Germany). Next, 250 ng of double-stranded genomic DNA was fragmented by Covaris shearing to obtain fragment sizes of 160–180 bp. Samples were purified using 2X Agencourt AMPure XP PCR Purification beads according to the manufacturer’s instructions (Beckman Coulter; Brea, CA, USA; #A63881). The sheared DNA samples were quantified and qualified on a BioAnalyzer system using the DNA7500 assay kit (Agilent Technologies; Santa Clara, CA, USA; #5067-1506). With an input of a maximum of 1 μg sheared DNA, libraries for sequencing were constructed using the KAPA Hyper Prep Kit (KAPA Biosystems; Wilmington, MA, USA; #KK8504) and amplified using PCR. After library preparation, the libraries were purified using 1X AMPure XP beads, and the molarity was determined using a BioAnalyzer DNA7500 chip. All samples were sequenced on a NovaSeq S1 (Illumina; San Diego, CA, USA), single-read, 100 bp run, according to the manufacturer’s instructions. The resulting paired-end reads were aligned to hg19 with Bowtie 231 (version 2.4.5) and converted to BED format using Samtools32 (version 1.6) and Bedtools33 (version 2.30.0), respectively. Next, the fragment counts in non-overlapping 100-kb bins across the genome were GC-corrected and normalized using a panel of ten non-cancer samples and used for generating copy number profiles. Tumor copy number alterations were identified with Python (version 3.9.13).

Lung cancer validation cohort targeted panel processing

Longitudinal plasma samples collected from stage III and IV lung cancer patients were acquired from Indivumed (Indivumed Services GmbH; Hamburg, Germany). Plasma was collected in Streck BCT tubes. cfDNA was extracted from 3 to 4 mL of plasma using the MagMAX™ Cell-Free DNA Isolation kit (Life Technologies; Austin, TX, USA). Extracted cfDNA was examined using the TapeStation 4200 (Agilent Technologies; Santa Clara, CA, USA) and processed using the PGDx elio™ plasma complete kit (Personal Genome Diagnostics; Baltimore, MD, USA) following the manufacturer’s protocol and recommendations for library preparation, hybridization, and sequencing. All longitudinal time points from a single patient were processed together on the same PGDx elio batch to avoid confounding batch with patient response. Raw sequencing data were analyzed through the PGDx elio server and automated pipeline.

MaxMAF determination for lung cancer validation cohort

We used the maximum mutant allele fraction (maxMAF) of somatic variants identified in the targeted panel analyses as a surrogate for ctDNA fraction in the lung cancer validation cohort. We first analyzed the PGDx elio plasma complete single nucleotide variant and insertion/deletion report to identify candidate mutations for calculating maxMAF. Candidate mutations were classified as somatic hotspots if the nucleotide changes were identical to an alteration observed in ≥20 cancer cases reported in the COSMIC database24,33. Putative germline mutations were identified as non-hotspots with MAF greater than 40%. As a stringent cutoff, we filtered all non-hotspot mutations and putative germline mutations from the PGDx elio plasma complete reports. Using the remaining somatic hotspot variants, we calculated the maxMAF. We then performed a correlation analysis between maxMAF and DELFI-TF for all samples in the lung cancer validation cohort.

Statistical analyses

The Random Forest model was trained with scikit-learn (version 1.1.1) and Python (version 3.9.13). The percentages of variable contributions were determined by the Random Forest feature importance tool. Correlations between variables were calculated using Pearson’s correlation coefficient. All hypothesis testing was performed using non-parametric tests (Wilcoxon rank-sum test, Kruskal–Wallis). Survival analyses were performed using Mantel-Cox log-rank tests. Analyses were performed with R Statistical Software (version 4.3.3 Foundation for Statistical Computing, Vienna, Austria). Unless otherwise noted, hypothesis tests were two-sided with a type 1 error of 5% for determining statistical significance. P values in figures correspond to the following aliases: (p = ****, p p p p > 0.05 = ns).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.