RESEARCH Article

for clinical development of WRN antagonists would be as an adjunct therapy to approved immune checkpoint inhibitors in MSI tumours26.

In summary, we developed an unbiased and systematic framework that effectively ranks priority targets, such as WRN. Efforts such as ours, and from others5,8,12,22,27,28, to build a compendium of fitness genes, and the identification of context-specific dependencies as part of a cancer dependency map, could be transformative to improve success rates in the development of cancer drugs.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, statements of data availability and associated accession codes are available at https://doi.org/10.1038/s41586-019-1103-9.

Received: 3 August 2018; Accepted: 8 March 2019; Published online xx xx xxxx.

1. Garraway, L. A. Genomics-driven oncology: framework for an emerging paradigm. J. Clin. Oncol. 31, 1806–1814 (2013).
2. Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
3. Hay, M., Thomas, D. W., Craighead, J. L., Economides, C. & Rosenthal, J. Clinical development success rates for investigational drugs. Nat. Biotechnol. 32, 40–51 (2014).
4. Koike-Yusa, H., Li, Y., Tan, E.-P., Del Castillo Velasco-Herrera, M. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR–guide RNA library. Nat. Biotechnol. 32, 267–273 (2014).
5. Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
6. van der Meer, D. et al. Cell Model Passports—a hub for clinical, genetic and functional datasets of preclinical cancer models. Nucleic Acids Res. 47, D923–D929 (2019).
7. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
8. Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526 (2015).
9. Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. G3 (Bethesda) 7, 2719–2727 (2017).
10. Tzelepis, K. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 17, 1193–1205 (2016).
11. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).
12. McDonald, E. R. III et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592 (2017).
13. Massacesi, C. et al. PI3K inhibitors as new cancer therapeutics: implications for clinical trial design. OncoTargets Ther. 9, 203–210 (2016).
14. Brown, K. K. et al. Approaches to target tractability assessment — a practical perspective. MedChemComm 9, 606–613 (2018).
15. Viswanathan, V. S. et al. Dependency of a therapy-resistant state of cancer cells on a lipid peroxidase pathway. Nature 547, 453–457 (2017).
16. Chu, W. K. & Hickson, I. D. RecQ helicases: multifunctional genome caretakers. Nat. Rev. Cancer 9, 644–654 (2009).
17. Cortes-Ciriano, I., Lee, S., Park, W.-Y., Kim, T.-M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. Nat. Commun. 8, 15180 (2017).
18. Haugen, A. C. et al. Genetic instability caused by loss of MutS homologue 3 in human colorectal cancer. Cancer Res. 68, 8465–8472 (2008).
19. Perry, J. J. P. et al. WRN exonuclease structure and molecular mechanism imply an editing role in DNA end processing. Nat. Struct. Mol. Biol. 13, 414–422 (2006).
20. Kamath-Loeb, A. S., Welcsh, P., Waite, M., Adman, E. T. & Loeb, L. A. The enzymatic activities of the Werner syndrome protein are disabled by the amino acid polymorphism R834C. J. Biol. Chem. 279, 55499–55505 (2004).
21. Ketkar, A., Voehler, M., Mukiza, T. & Eoff, R. L. Residues in the RecQ C-terminal domain of the human Werner Syndrome helicase are involved in unwinding G-quadruplex DNA. J. Biol. Chem. 292, 3154–3163 (2017).
22. Chan, E. M. et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature https://doi.org/10.1038/s41586-019-1102-x (2019).
23. Saydam, N. et al. Physical and functional interactions between Werner syndrome helicase and mismatch-repair initiation factors. Nucleic Acids Res. 35, 5706–5716 (2007).
24. Opresko, P. L., Sowd, G. & Wang, H. The Werner syndrome helicase/exonuclease processes mobile D-loops through branch migration and degradation. PLoS ONE 4, e4825 (2009).
25. Myung, K., Datta, A., Chen, C. & Kolodner, R. D. SGS1, the Saccharomyces cerevisiae homologue of BLM and WRN, suppresses genome instability and homeologous recombination. Nat. Genet. 27, 113–116 (2001).
26. Le, D. T. et al. PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med. 372, 2509–2520 (2015).
27. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
28. Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168, 890–903 (2017).

Acknowledgements We thank D. Adams, G. Vassiliou and L. Parts for comments on the manuscript, members of the M.J.G. laboratory and Sanger Institute facilities (Wellcome Trust grant 206194). Work was funded by Open Targets (OTAR015) to M.J.G., K.Y. and J.S.-R. The K.Y. laboratory is supported by Wellcome Trust (206194). The M.J.G. laboratory is supported by SU2C (SU2CAACR-DT1213) and Wellcome Trust (102696 and 206194). Support was also received from AIRC 20697 (A.B.) and 18532 (L.T.); 5x1000 grant 21091 (A.B. and L.T.); ERC Consolidator Grant 724748 – BEAT (A.B.); FPRC-ONLUS, 5x1000 Ministero della Salute 2011 and 2014 (L.T.); and Transcan, TACTIC (L.T.).

Author contributions M.J.G., K.Y. and C.B.-D. conceived the project. F.M.B. led CRISPR–Cas9 screening, co-developed the project Score web portal, contributed to analysis strategy, performed validation analyses and verified WRN dependency. F.I. led computational analyses and figure preparation, and contributed to the project Score web portal. G.P. performed experiments to verify WRN dependency, carried out analyses and contributed to in vivo studies. E.G. contributed to computational analysis and figure preparation. D.v.d.M. contributed to the project Score web portal. G.M., F.S., M.P., A.B. and L.T. performed in vivo studies. C.M.B., R.A., D.A.J., R.M., R.P. and P.W. performed CRISPR–Cas9 screens. R.S. performed tractability analysis. Y.R. performed WRN rescue experiments. C.M.B., S.H., A.B., L.T., E.A.S., D.D. and J.S.-R. assisted with project supervision. F.M.B., F.I., E.G., G.P., K.Y. and M.J.G. wrote the manuscript. K.Y. and M.J.G. directed the project. J.S.-R., A.B., L.T., M.J.G. and K.Y. acquired funding. All authors approved the manuscript.

Competing interests E.A.S., D.D., C.B.-D., R.S. and Y.R. are GlaxoSmithKline employees. Open Targets is a public–private initiative involving academia and industry. K.Y. and M.J.G. receive funding from AstraZeneca. M.J.G. performed consultancy for Sanofi. All other authors declare no competing interests.

Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41586-019-1103-9.
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-019-1103-9.
Reprints and permissions information is available at http://www.nature.com/reprints.
Correspondence and requests for materials should be addressed to K.Y. or M.J.G.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2019
Methods

CRISPR–Cas9 screening.

Plasmids. All plasmids have previously been described10 and are available through Addgene (Cas9 vector, 68343; gRNA vector, 67974). Plasmids were packaged using the ViraPower Lentiviral Expression System (Invitrogen, K4975-00) as per the manufacturer’s instructions.

Cell culture. Cell lines used in this study (Supplementary Table 1) were selected from the 1,000 cell line panel7 of the Genomics of Drug Sensitivity in Cancer study, had been annotated in the Cell Model Passports database (https://cellmodelpassports.sanger.ac.uk/) and were maintained as previously described7. To control for cross-contamination and sample swaps, a panel of 92 single-nucleotide polymorphisms was profiled for each cell line before and after completion of the CRISPR–Cas9 screening pipeline. This study includes commonly misidentified cell lines: Ca9-22, short tandem repeat (STR) analysis confirmed that the identity matched the Japanese Collection of Research Bioresources Cell Bank (JCRB) reference (JCRB0625) and RIKEN (RCB1976); MKN28, noted as a derivative of MKN74 in Cell Model Passports and clinical information matches MKN74; KP-1N, known misidentification issue, Cell Model Passports data for both KP-1N and Panc-1 are identical; OVMIU, known misidentification issue, Cell Model Passports data for both OVMIU and OVSAYO are identical; SK-MG-1, STR profile matches JCRB profile, which internally matches Marcus, Cell Model Passports data for both SK-MG-1 and Marcus are identical. Commonly misidentified lines have been noted in Supplementary Table 1 and on the Cell Model Passports. All commonly misidentified cell lines were retained, because the misidentification does not impact tissue or cancer type of origin, and all datasets used were generated in-house from the same matched cell line.
A separate set of HCT116 cell lines was used for WRN validation experiments: HCT116 parental cells and HCT116 cells carrying Chr.3 or Chr.5, or both, were a gift from M. Koi. HCT116 cells carrying Chr.2 were a gift from A. Goel. HCT116 cells carrying Chr.2 or Chr.3 were maintained in 400 μg ml−1 G418 (Thermo Fisher Scientific, 10131027); HCT116 cells carrying Chr.5 were maintained in 6 μg ml−1 blasticidin (Thermo Fisher Scientific, A1113903); and HCT116 cells carrying Chr.3 + Chr.5 were maintained in the presence of 400 μg ml−1 G418 and 6 μg ml−1 blasticidin. All cells were cultured in McCoy’s 5A medium (Sigma-Aldrich, M4892) with 10% FBS.

Generation of Cas9-expressing cancer cell lines. Cells were transduced with a lentivirus containing Cas9 in T25 or T75 flasks at approximately 80% confluence in the presence of polybrene (8 μg ml−1). Cells were incubated overnight, followed by replacement of the lentivirus-containing medium with fresh complete medium. Blasticidin selection commenced 72 h after transduction at an appropriate concentration determined for each cell line using a blasticidin dose–response assay (blasticidin range, 10–75 μg ml−1), and cell viability was assessed using the CellTiter-Glo 2.0 Assay (Promega, G9241). Cas9 activity was assessed as described previously10. Cell lines with Cas9 activity over 75% were used for sgRNA library transduction.

Genome-wide sgRNA library and screen. Two genome-wide sgRNA libraries were used in this study: the Human CRISPR Library v.1.0 and v.1.1. The Human CRISPR Library v.1.0 was described previously and targets 18,009 genes with 90,709 sgRNAs (Addgene, 67989)10. The Human CRISPR Library v.1.1 contains all sgRNAs from v.1.0 plus 1,004 non-targeting sgRNAs and 5 additional sgRNAs against 1,876 selected genes that encode kinases, epigenetic-related proteins and pre-defined fitness genes.
An oligo pool of Library v.1.1 was synthesized using high-throughput silicon platform technology (Twist Bioscience) and cloned as described previously10. For consistency, all computational analyses were performed considering only the overlapping sgRNAs between the two libraries (90,709 sgRNAs). Data for the additional sgRNAs in Library v.1.1 can be found in the raw read count files for cell lines screened with this library version (available at https://cog.sanger.ac.uk/cmp/download/raw_sgrnas_counts.zip), but were removed before quality control analysis. The HT-29 cell line was screened with both libraries and the resulting datasets were kept separate for comparative analyses (results are summarized in Extended Data Fig. 2j). A total of 3.3 × 10^7 cells were transduced with an appropriate volume of the lentiviral-packaged whole-genome sgRNA library to achieve 30% transduction efficiency (100× library coverage). The volume was determined for each cell line using a titration of the packaged library and assessing the percentage of blue fluorescent protein (BFP)-positive cells by flow cytometry. Transductions were performed in technical triplicate (or duplicate for cell lines with a large cell size such as glioblastoma). Owing to the large number of screens performed, multiple batches of packaged library virus were prepared. Each batch was tested in HT-29 cells to ensure consistency between batch preparations. In addition, the HT-29 cell line was screened every 3 months to ensure that the quality of data generated by the pipeline was consistent. Transduction efficiency was assessed 72 h after transduction. Samples with a transduction efficiency between 15 and 60% were used for puromycin selection. The appropriate concentration of puromycin for each individual cell line was determined from a dose–response curve (puromycin range, 1–5 μg ml−1) and cell viability was assessed using a CellTiter-Glo 2.0 Assay (Promega, G9241).
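The library coverage quoted for the transduction step can be checked with simple arithmetic: the number of transduced cells divided by the number of sgRNAs in the library. The sketch below is illustrative only; the function name is hypothetical, and it assumes the 90,709-sgRNA library size stated in the text, which gives a value of roughly 100×.

```python
def library_coverage(cells: float, transduction_efficiency: float, library_size: int) -> float:
    """Average number of transduced cells per sgRNA in the library."""
    return cells * transduction_efficiency / library_size


# Values taken from the text; 90,709 is the sgRNA count of Library v.1.0.
LIBRARY_SIZE = 90_709
coverage_at_transduction = library_coverage(3.3e7, 0.30, LIBRARY_SIZE)

print(f"Coverage at transduction: about {coverage_at_transduction:.0f}x")
```

Using the larger Library v.1.1 as the denominator would lower this figure slightly, so the quoted coverage is best read as an approximate target.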
The percentage of BFP-positive cells was reassessed after a minimum of 96 h of puromycin selection. For samples with less than 80% BFP-positive cells, puromycin selection was extended for an additional 3 days and the percentage of BFP-positive cells was assessed again. Cells were maintained until day 14 after transduction with a minimum of 5.0 × 10^7 cells reseeded at each passage (500× library coverage). Approximately 2.5 × 10^7 cells were collected, pelleted and stored at −80 °C for DNA extraction.

DNA extraction, sgRNA PCR amplification, Illumina sequencing and sgRNA counting. Genomic DNA was extracted from cell pellets using either the QIAsymphony automated extraction platform (Qiagen, QIAsymphony DSP DNA Midi Kit, 937255) or by manual extraction (Qiagen, Blood & Cell Culture DNA Maxi Kit, 13362) as per the manufacturer’s instructions. PCR amplification, Illumina sequencing (19-bp single-end sequencing with custom primers on the HiSeq2000 v.4 platform) and sgRNA counting were performed as described previously10.

CRISPR screen data analyses.

Low-level quality control assessment and filtering. To perform initial low-level quality control, the Pearson’s correlation of treatment counts between replicates was assessed for each cell line (Extended Data Fig. 1c). The resulting correlation scores were generally high (median = 0.8), but not sufficiently distinguishable from expectation (median correlation between replicates of any pair of randomly selected cell lines). Thus, to define a reproducibility threshold, we developed an approach based on a previously published study29. Specifically, we selected a set of the 838 most-informative sgRNAs, defined as those with an average pairwise Pearson’s correlation greater than 0.6 between corresponding patterns of the count fold changes 14 days after transfection versus plasmid library across all screened cell lines.
We next computed average gene-level profiles for 308 genes targeted by these informative sgRNAs for each individual technical replicate, and then computed all possible pairwise Pearson’s correlation scores between the resulting profiles. This enabled the estimation of a null distribution of replicate correlations (plotted in grey in Extended Data Fig. 1d). We then defined a reproducibility threshold R value of 0.68, for which the estimated probability mass function of the correlation scores that was computed between replicates of the same cell line (considering the identified 308 genes only) was at least twice that of the null mass probability function (Extended Data Fig. 1d). Of the 332 screened cell lines with at least two technical replicates, 305 had an average replicate correlation higher than this threshold, and therefore passed the reproducibility assessment; for 7 cell lines there were no replicates. Excluding the least reproducible replicate for the 14 cell lines that did not pass the first reproducibility assessment allowed their average replicate correlation to exceed the threshold defined above, thus resulting in a set of 326 cell lines that passed the low-level quality control assessment (Supplementary Table 1). Screening performance assessment. We considered the genome-wide profiles of gene-level sgRNA fold change values (averaged across targeting sgRNAs and replicates) of each cell line to be a classifier of predefined sets of essential and non-essential genes30 by means of receiver operating characteristic (ROC) indicators (Extended Data Fig. 1g and Supplementary Table 1). In addition, we measured the magnitude of the depletion signal observed in each screened cell line by evaluating the median log(change in sgRNA count), and the discriminative distance between their distributions (as measured by the Glass’s Δ) for predefined essential and non-essential genes30 and ribosomal protein genes31. 
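The screening performance assessment described above treats each cell line's gene-level fold changes as a classifier of essential versus non-essential genes. A minimal sketch of the area under the ROC curve for that setting follows. It assumes that lower (more depleted) fold changes indicate essentiality and uses the pairwise-comparison form of the AUC; the function name and the toy values are hypothetical and this is not the authors' implementation.

```python
def roc_auc_depletion(essential_fcs, nonessential_fcs):
    """AUC as the fraction of (essential, non-essential) gene pairs in which
    the essential gene has the lower fold change; ties count as one half."""
    wins = 0.0
    for e in essential_fcs:
        for n in nonessential_fcs:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential_fcs) * len(nonessential_fcs))


# Toy gene-level log fold changes (hypothetical numbers).
essential = [-2.1, -1.5, -0.2]
nonessential = [0.1, -0.3, 0.4, 0.0]
auc = roc_auc_depletion(essential, nonessential)
print(f"AUC = {auc:.3f}")
```

An AUC close to 1 means that essential genes are consistently more depleted than non-essential ones, whereas a value near 0.5 indicates a screen with little discriminative signal.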
In total, 2 out of the 326 cell lines were manually removed, because they had area under the ROC curve, area under the precision/recall curve and both Glass’s Δ values that were 3 s.d. lower than the average. On the basis of our low-level quality control and screening performance, the final analysis set was composed of 324 cell lines (Supplementary Table 1). Further details on these analyses are included in the Supplementary Information.

sgRNA count preprocessing and CRISPR-bias correction. The analysis set of 324 cell lines was further processed using CRISPRcleanR32 (https://github.com/francescojm/CRISPRcleanR). sgRNAs with fewer than 30 reads in the plasmid counts and sgRNAs belonging only to Library v.1.1 were first removed. The remaining sgRNAs were assembled into one file per cell line, including the read counts from the matching library plasmid and all replicates, and then normalized using a median–ratio method to adjust for the effect of library sizes and read count distributions33. Depletion/enrichment fold changes for individual sgRNAs were quantified between post library-transduction read counts and library plasmid read counts at the individual replicate level. This was performed using the ccr.NormfoldChanges function of CRISPRcleanR. Next, we performed a correction of gene-independent responses to CRISPR–Cas9 targeting34 using the ccr.GWclean function of CRISPRcleanR with default parameters.

Calling CRISPR–Cas9 gene knockout fitness effects. The CRISPRcleanR-corrected sgRNA-level values (corrected fold change values) were used as input into an in-house-generated R implementation of the BAGEL method30 to call significantly depleted genes (code publicly available at https://github.com/francescojm/BAGELR). Our BAGEL implementation computes gene-level Bayesian factors from the sgRNA-level Bayesian factors on a targeted-gene basis, by averaging instead of summing them.
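The preprocessing steps described above (filtering on plasmid read counts, median-ratio normalization, and fold-change calculation against the plasmid) can be sketched as follows. This is not the CRISPRcleanR code: the function names, the pseudocount of 0.5 and the toy counts are assumptions made purely for illustration, and the median-of-ratios size factor is a generic formulation of the method.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor per sample.
    counts: dict mapping sample name -> list of read counts (same sgRNA order)."""
    samples = list(counts)
    n_guides = len(counts[samples[0]])
    # Log geometric mean of each sgRNA across samples; sgRNAs with a zero count are skipped.
    log_geo_means = []
    for i in range(n_guides):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    return {
        s: math.exp(median(math.log(counts[s][i]) - g
                           for i, g in enumerate(log_geo_means) if g is not None))
        for s in samples
    }


def log2_fold_changes(counts, plasmid="plasmid", min_plasmid_reads=30, pseudocount=0.5):
    """Drop sgRNAs with fewer than min_plasmid_reads plasmid reads, normalize,
    then compute log2 fold changes of each replicate against the plasmid."""
    keep = [i for i, c in enumerate(counts[plasmid]) if c >= min_plasmid_reads]
    filtered = {s: [v[i] for i in keep] for s, v in counts.items()}
    factors = size_factors(filtered)
    normalized = {s: [c / factors[s] for c in v] for s, v in filtered.items()}
    fold_changes = {
        s: [math.log2((t + pseudocount) / (p + pseudocount))
            for t, p in zip(v, normalized[plasmid])]
        for s, v in normalized.items() if s != plasmid
    }
    return keep, fold_changes


# Toy counts for four sgRNAs (hypothetical numbers).
toy_counts = {"plasmid": [100, 200, 20, 400], "rep1": [200, 400, 50, 100]}
kept, fcs = log2_fold_changes(toy_counts)
print(kept, [round(x, 2) for x in fcs["rep1"]])
```

In the toy data the third sgRNA is discarded for low plasmid coverage, the first two sgRNAs show no change after normalization, and the last one is strongly depleted.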
Additionally, it uses reference sets of predefined essential and non-essential genes30. However, in order to avoid their status (essential or non-essential) being defined a priori, we removed any high-confidence cancer driver genes as defined previously7 from these sets. The resulting curated reference gene sets are available as built-in data objects in the R implementation of BAGEL (curated_BAGEL_essential.rdata and curated_BAGEL_nonEssential.rdata, both available at https://github.com/francescojm/BAGELR/tree/master/data). A statistical significance threshold for gene-level Bayesian factors was determined for each cell line as described previously8. Each gene was assigned a scaled Bayesian factor computed by subtracting the Bayesian factor at the 5% FDR threshold defined for each cell line from the original Bayesian factor, and a binary fitness score equal to 1 if the resulting scaled Bayesian factor was greater than 0. Further details on these analyses are included in the Supplementary Information.

In addition, CRISPRcleanR-corrected sgRNA treatment counts were derived from the corrected sgRNA-level count fold changes (using the ccr.correctCounts function of CRISPRcleanR) and used as input into MAGeCK35 to compute the depletion significance using mean–variance modelling. This was performed using the MAGeCK Python package (version 0.5.3), specifying in the command line call that no normalization was required (as this was already performed by CRISPRcleanR). At the end of this stage, the following gene-level depletion score matrices were produced for each cell line: raw count fold changes, copy number bias-corrected count fold changes, Bayesian factors, scaled Bayesian factors, binary fitness scores and MAGeCK depletion FDRs.
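The scaled Bayesian factor and the binary fitness score defined above reduce to a subtraction and a comparison with zero. The sketch below also shows the gene-level averaging of sgRNA-level Bayesian factors used by this BAGEL implementation. Gene names, sgRNA-level values and the per-cell-line threshold are hypothetical, and the procedure that determines the threshold at 5% FDR is not reproduced here.

```python
from statistics import mean


def gene_level_bf(sgrna_bfs):
    """Average (rather than sum) the sgRNA-level Bayesian factors of each gene."""
    return {gene: mean(values) for gene, values in sgrna_bfs.items()}


def scale_and_binarise(gene_bfs, bf_at_5pct_fdr):
    """Scaled BF = BF minus the cell line's BF at the 5% FDR threshold;
    binary fitness score = 1 if the scaled BF is greater than 0, else 0."""
    scaled = {gene: bf - bf_at_5pct_fdr for gene, bf in gene_bfs.items()}
    binary = {gene: int(value > 0) for gene, value in scaled.items()}
    return scaled, binary


# Hypothetical sgRNA-level Bayesian factors for one cell line.
sgrna_bfs = {"GENE_A": [12.0, 8.0, 10.0], "GENE_B": [-3.0, 1.0], "GENE_C": [4.0, 6.0]}
gene_bfs = gene_level_bf(sgrna_bfs)
scaled, binary = scale_and_binarise(gene_bfs, bf_at_5pct_fdr=5.0)
print(scaled, binary)
```

Because the threshold is subtracted per cell line, scaled Bayesian factors are comparable across cell lines: a positive value always means that the gene passed that cell line's own 5% FDR cut-off.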
All scores are summarized for each cell line and available at https://cog.sanger.ac.uk/cmp/download/essentiality_matrices.zip, together with all the sgRNA raw count files (available at https://cog.sanger.ac.uk/cmp/download/raw_sgrnas_counts.zip).

High-level CRISPR screen data analyses.

Adaptive daisy model (ADaM) to identify core fitness genes. We designed the adaptive daisy model (ADaM), a heuristic algorithm for the identification of core fitness genes, implemented it in an R package and made it publicly available at https://github.com/francescojm/ADaM. ADaM is based on the daisy model8, but it adaptively determines the minimal number of cell lines m from a given cancer type in which a gene should exert a significant fitness effect for that gene to be considered a core fitness gene for that cancer type. ADaM is described further in the Supplementary Information. In order to identify pan-cancer core fitness genes, we applied the same method to determine the minimal number k of cancer types for which a gene should be predicted as a pan-cancer core fitness gene.

Characterization of ADaM pan-cancer core fitness genes. Reference sets of essential and non-essential genes were extracted from a previously published study30. Other reference gene sets (used while characterizing the ADaM pan-cancer core fitness genes, described below) were derived from the Molecular Signature Database (MSigDB36) and post-processed as described previously32. A more recent set of a priori known essential genes was derived from a previously published study9. The pan-cancer core fitness genes that did not belong to any of the aforementioned gene sets were tested for gene family enrichments (using a hypergeometric test) by deriving gene annotations using the BioMart R package37 and biological pathway enrichments using a comprehensive collection of pathway gene sets from Pathway Commons38 (post-processed to reduce redundancies across different sets as described previously39).
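The hypergeometric enrichment test mentioned above asks how likely it is to see at least the observed overlap between a gene set of interest and an annotated family when genes are drawn at random from the background. A minimal standard-library sketch follows; the function name and the toy numbers are hypothetical, and the choice of background universe is an assumption that the text does not specify.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, annotated: int, selected: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap) when `selected` genes are drawn
    without replacement from `universe` genes, `annotated` of which belong
    to the family or pathway being tested."""
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, selected)


# Toy example: 20 background genes, 5 in the family, 4 selected, 3 of them in the family.
p_value = hypergeom_enrichment_p(universe=20, annotated=5, selected=4, overlap=3)
print(f"enrichment P = {p_value:.4f}")
```

Exact integer binomial coefficients keep the calculation stable even for genome-scale backgrounds, at the cost of some speed compared with a dedicated statistics library.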
All enrichment P values were corrected using the Benjamini–Hochberg method. Results are shown in Supplementary Table 4. Comparison between the ADaM pan-cancer core fitness genes and other reference sets of essential genes. We compared the pan-cancer core fitness genes identified by ADaM with the BAGEL reference set of essential genes30, and a more recently proposed larger set of essential genes9 in terms of size, estimated precision (number of included true positive genes/number of included genes) and recall (number of included true positive genes/total number of true positive genes). In these comparisons, we used gold-standard essential genes involved in cell essential processes (downloaded from the MSigDB36 and post-processed as described previously32). In addition, we estimated FDRs for the three gene sets (number of included false positive genes/total number of false positive genes) considering genes predicted to be strongly context-specific essential (thus not core-fitness essential) to be false-positive genes according to a previous publication12, and using three different confidence levels, as further described in the Supplementary Information. Basal expression of cancer-type specific core fitness genes in normal tissues. Basal gene median reads per kilobase of transcript per million mapped reads in normal human tissues were downloaded from the GTEx Portal40, log-transformed and quantile-normalized on a tissue-type basis. Statistical and computational analyses. ANOVA to identify genomic correlates with gene fitness. We performed a systematic ANOVA to test associations between gene-level fitness effects and the presence of 484 cancer driver events (CDEs; 151 single-nucleotide variants and 333 copy number variants)7 or MSI status at the pan-cancer as well as individual cancer-type levels. 
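The precision and recall used earlier in this section to compare the ADaM pan-cancer core fitness genes with other reference sets follow directly from the definitions given in the text. The sketch below restates them with set operations; the gene identifiers are hypothetical placeholders.

```python
def precision_recall(predicted, gold_standard):
    """Precision = true positives / included genes;
    recall = true positives / all gold-standard genes."""
    predicted, gold_standard = set(predicted), set(gold_standard)
    true_positives = len(predicted & gold_standard)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_standard) if gold_standard else 0.0
    return precision, recall


# Hypothetical predicted core fitness set and gold-standard essential genes.
predicted_set = {"G1", "G2", "G3", "G4"}
gold_set = {"G1", "G2", "G5", "G6", "G7"}
precision, recall = precision_recall(predicted_set, gold_set)
print(precision, recall)
```

A larger predicted set can only keep or raise recall, but it may lower precision, which is why the comparison in the text reports set size alongside both measures.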
In total, 10 cancer types with at least 10 screened cell lines were analysed (breast carcinoma, colorectal carcinoma, gastric carcinoma, head and neck carcinoma, lung adenocarcinoma, neuroblastoma, oral cavity carcinoma, ovarian carcinoma, pancreatic carcinoma and squamous cell lung carcinoma). The remaining cancer types were collapsed on a tissue basis (annotation in Supplementary Table 1) and the resulting tissues with at least 10 cell lines were included in the analysis (bone, central nervous system, oesophagus, haematopoietic and lymphoid). A total of 14 analyses (referred to, for simplicity, as cancer-type-specific ANOVAs in the main text and below) plus a pan-cancer analysis including all screened cell lines were performed. Each ANOVA was performed using the analytical framework described previously7 and implemented in a Python package41 (https://github.com/CancerRxGene/gdsctools). Only genes that did not belong to any set of prior known essential genes (defined in the previous sections) and were not predicted by ADaM to be core fitness genes were included in the analyses. For all tested gene fitness–CDE associations, effect size estimations versus pooled s.d. (quantified using Cohen’s d), effect sizes versus individual s.d. (quantified using two different Glass’s Δ metrics, for the CDE-positive and the CDE-negative populations separately), CDE P values and all other statistical scores were obtained from the fitted models. An association was tested only if at least three cell lines were contained in each of the two sets resulting from the dichotomy induced by CDE status (that is, at least three CDE-positive and three CDE-negative cell lines). The P values from all ANOVAs were corrected together using the Tibshirani–Storey method42. Subsequently, MSI status was also tested for statistical associations with differential gene fitness effects for pan-cancer and cancer types with at least three MSI cell lines.
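The effect sizes named above differ only in the standard deviation used as the denominator: Cohen's d divides the difference in means by the pooled s.d., whereas each Glass's Δ divides it by the s.d. of one population. The sketch below uses textbook formulas and reports magnitudes, which is an assumption; the values actually produced by the fitted ANOVA models may differ in detail, and the fitness values shown are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def effect_sizes(cde_positive, cde_negative, min_group_size=3):
    """Return (Cohen's d, Glass's delta for positives, Glass's delta for negatives).
    An association is only evaluated if both groups have at least 3 cell lines."""
    n_pos, n_neg = len(cde_positive), len(cde_negative)
    if n_pos < min_group_size or n_neg < min_group_size:
        raise ValueError("need at least three cell lines in each group")
    difference = abs(mean(cde_positive) - mean(cde_negative))
    sd_pos, sd_neg = stdev(cde_positive), stdev(cde_negative)
    pooled_sd = sqrt(((n_pos - 1) * sd_pos ** 2 + (n_neg - 1) * sd_neg ** 2)
                     / (n_pos + n_neg - 2))
    return difference / pooled_sd, difference / sd_pos, difference / sd_neg


# Hypothetical gene fitness values (corrected log fold changes).
positives = [-2.0, -1.0, -3.0]
negatives = [0.0, 0.5, -0.5, 0.0]
cohen_d, glass_pos, glass_neg = effect_sizes(positives, negatives)
print(round(cohen_d, 3), round(glass_pos, 3), round(glass_neg, 3))
```

Reporting both Glass's Δ values is useful when the two populations have very different spreads, because a single pooled estimate can then hide a weak effect relative to the noisier group.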
We used the following statistical significance and effect size thresholds to categorize associations between gene fitness effects and genomic markers.
Class A marker: a P-value threshold of 10^−3 with an FDR threshold equal to 25% (or 5% for MSI) and with Glass’s Δ > 1. Different FDR thresholds were used for associations with CDEs or MSI because the number of tests performed in the former was six orders of magnitude larger than in the latter.
Class B marker: an FDR threshold of 30% with at least one Glass’s Δ > 1 for pan-cancer associations.
Class C marker or weaker: an ANOVA P-value threshold of 10^−3 and, for pan-cancer associations, at least one Glass’s Δ > 1; for weaker, a simple Student’s t-test (for difference assessment of the mean depletion fold change between CDE-positive and CDE-negative cell lines) P-value threshold of 0.05 and, for pan-cancer associations, at least one Glass’s Δ > 1.
The additional constraint on Glass’s Δ values (quantifying the effect size with respect to the s.d. of the two involved sub-populations of samples) was considered for the pan-cancer markers in order to account for the significantly larger number of samples analysed in the pan-cancer setting, which might result in highly significant P values even for small effect size associations. Further details on this analysis are reported in the Supplementary Information.

Target priority scores and target tractability. Computation of the target priority scores and their significance is described in the Supplementary Information. To estimate the likelihood of a target to bind a small molecule or the likelihood of a target to be accessible to an antibody, we made use of a genome-wide target tractability assessment pipeline14. The in silico pipeline integrates data from public sources, and assigns human protein-coding genes into hierarchical qualitative buckets.
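One way to read the marker classes defined above is as a cascade of threshold checks. The sketch below encodes that reading under several assumptions that the text does not settle: thresholds are treated as strict upper bounds, Class A is taken to require both Glass's Δ values above 1, and the function name and argument names are hypothetical. It should be read as an interpretation, not as the authors' classification code.

```python
def marker_class(p_anova, fdr_percent, glass_pos, glass_neg, *,
                 pan_cancer, msi=False, p_ttest=None):
    """Assign a marker class ('A', 'B', 'C', 'weaker') or None.
    Assumptions: strict inequalities; Class A needs both Glass's deltas > 1."""
    both_large = glass_pos > 1 and glass_neg > 1
    any_large = glass_pos > 1 or glass_neg > 1
    # The 'at least one Glass's delta > 1' constraint applies to pan-cancer tests only.
    effect_ok = any_large or not pan_cancer
    class_a_fdr = 5.0 if msi else 25.0

    if p_anova < 1e-3 and fdr_percent < class_a_fdr and both_large:
        return "A"
    if fdr_percent < 30.0 and effect_ok:
        return "B"
    if p_anova < 1e-3 and effect_ok:
        return "C"
    if p_ttest is not None and p_ttest < 0.05 and effect_ok:
        return "weaker"
    return None


print(marker_class(1e-5, 10.0, 1.5, 1.2, pan_cancer=True))
print(marker_class(1e-4, 40.0, 0.5, 0.6, pan_cancer=False))
```

Checking the classes from strongest to weakest means that each association receives the highest class whose criteria it satisfies.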
Predicted tractability and confidence in the data increased from bucket 10 to bucket 1; targets in bucket 1 were considered to be the most tractable. Of note, targets in lower buckets (that is, buckets 10 to 8) were considered to have an uncertain tractability, and should not be ruled out as ‘intractable’ without a deep tractability assessment. Further details are provided in the Supplementary Information.

Characterization of target protein families and enrichment analysis. To characterize protein families and compute statistical enrichment, we made use of the Panther online tool43.

GPX4 differential expression analysis. RNA-sequencing gene expression measurements transformed using voom44 were obtained from a previously published study45. For GPX4 analysis, cell lines were divided into two groups according to their loss-of-fitness response to GPX4 knockout (using BAGEL FDR < 5% as significance threshold for gene depletion) and gene expression fold changes were calculated between the GPX4 non-dependent and dependent cell lines (log2 values of the mean difference). Differential gene expression was statistically assessed using the R package Limma46. Gene set enrichment analysis was performed with ssGSEA36 and cancer hallmark gene sets were used to identify significant enrichment among the top differentially expressed genes. Then, 10,000 random permutations were performed for each signature to calculate empirical P values and a Benjamini–Hochberg FDR correction was applied.

WRN dependency in MSI cell lines.

Co-competition assay. The sequences of sgRNAs that target WRN and cell lines used in validation experiments are described in Supplementary Table 10. This included two sgRNAs from the original screen and two independent sgRNAs. The sgRNAs were cloned into pKLV2-U6gRNA5(BbsI)-PGKpuro2ABFP-W (Addgene, 67974). Cell lines were transduced at around 50% efficiency as described above in six-well plates. A co-competition
score was determined as the ratio of the percentage of BFP-positive cells (that is, sgRNA-positive cells) on day 14 compared to day 4, as measured by flow cytometry. A co-competition score less than 1 indicates a relative reduction in BFP-positive cells, resulting from targeting of a loss-of-fitness gene.

Clonogenic assay. Cell lines were transduced with lentivirus that encodes WRN sgRNA at around 100% efficiency as described above in six-well plates (2,000 cells per well), typically for 15–21 days. Cells were fixed using 100% ice-cold ethanol for 30 min followed by Giemsa staining overnight at room temperature.

Western blot analysis. Cells were transduced at around 100% efficiency as described above in 10-cm dishes. On day 5 after transduction, cells were lysed with 200 μl RIPA buffer supplemented with protease and phosphatase inhibitors and lysates were used for SDS–PAGE and immunoblot analysis. Antibodies used were: WRN (Cell Signaling Technologies, 4666; dilution 1:2,000); WRN for domain rescue experiment (Thermo Fisher Scientific, PA5-27319); MLH1 (Cell Signaling Technologies, 3515; dilution 1:1,000); MSH3 (Santa Cruz Biotechnology, sc-271080; dilution 1:1,000); anti-Flag M2 (Sigma-Aldrich, F3165); β-actin (Cell Signaling Technologies, 4970); and anti-β-tubulin (Sigma-Aldrich, T4026; dilution 1:5,000). Secondary antibodies included: IRDye 800CW donkey anti-mouse antibody (LI-COR, 926-32212); IRDye 680LT donkey anti-rabbit IgG (H+L) (LI-COR, 925-68023); and anti-mouse IgG HRP-linked secondary antibody (GE Healthcare, NA931). Molecular weight markers included: SeeBlue Plus2 Pre-stained Protein Standard (Thermo Fisher Scientific, 5925) and Precision Plus Protein Standards (BioRad, 161-0373).

WRN rescue experiment.
SW620 and SW48 cells (2 × 10⁵ cells) were transfected by nucleofection (Lonza 4D Nucleofector Unit X) with Cas9–sgRNA ribonucleoproteins (RNP) targeting human MAVS (used as a non-essential knockout control) or WRN, together with overexpression of 200 ng pmGFP control or 200 ng mouse Wrn cDNA (Origene, MR226496). From each sample after nucleofection, 5,000 cells were seeded in a 96-well plate and allowed to grow for 5 days, after which cells were collected for either CellTiter-Glo assay (Promega, G9241) or western blot analysis. CellTiter-Glo data were read on an Envision Multiplate Reader and data analysis was performed using GraphPad Prism 7 software. Student’s t-test was performed using the multiple t-test module in Prism 7. The sgRNA sequences that were used are listed in Supplementary Table 10.

RNA interference. A pool of four siRNAs that target WRN was used (Dharmacon, L-010378-00-0005). HCT116 cells were grown and transfected with siRNA using the RNAiMAX (Invitrogen) transfection reagent following the manufacturer’s instructions. Each experiment included: a mock control (transfection lipid only), ON-TARGETplus Non-targeting Control Pool (Dharmacon, D-001810-10-05) as a negative control, and polo-like kinase 1 (PLK1) (Dharmacon, L-003290-00-0010), which served as a positive control. siRNA sequences are listed in Supplementary Table 10.

Rescue of WRN dependency in HCT116 isogenic lines. HCT116 parental cells and derivatives carrying Chr.2, Chr.3, Chr.5 or Chr.3 + Chr.5 were transduced to express Cas9. After transduction, all lines displayed Cas9 activity >80%. To assess WRN dependency, cells were seeded at 1.5 × 10³ cells per well in 100 μl complete growth medium in 96-well plastic cell culture plates. At day 0, cells were transduced with viral particles containing sgRNAs targeting essential or non-essential genes, or WRN sgRNA 1 and WRN sgRNA 4 in order to achieve a >90% transduction efficiency.
The following day, the medium was replaced and 48 h after transduction puromycin was added at a final concentration of 2 μg ml⁻¹. Plates were incubated at 37 °C in 5% CO2 for 7 days, after which the cell viability was assessed using CellTiter-Glo (Promega) by measuring luminescence on an Envision multiplate reader. Clonogenic assays were performed as described in the ‘WRN dependency in MSI cell lines’ section; and 48 h after transduction puromycin was added at a final concentration of 2 μg ml⁻¹.

In vivo validation. WRN knockout using an inducible CRISPR–Cas9 system. To generate inducible WRN sgRNA-expressing HCT116 cells, we cloned WRN sgRNA 4 into the pRSGT16H-U6Tet-(sg)-CMV-TetRep-TagRFP-2A-Hygro vector (Cellecta). Cas9-expressing HCT116 cells were transduced and selected with 500 μg ml⁻¹ of hygromycin (Thermo Fisher Scientific). To obtain cell populations that both uniformly express Cas9 and contain the inducible WRN-targeting sgRNA, we generated single-cell clones by serial dilution. To measure the growth rate of WRN sgRNA-expressing HCT116 cells after conditional induction of WRN knockout, cells were grown in flasks in the presence or absence of 2 μg ml⁻¹ doxycycline for 24 h and then seeded in 96-well plates, with or without the same concentration of doxycycline. Cell growth was monitored every 6 h using an automated IncuCyte-FLR 4X phase-contrast microscope (Essen Instruments). The average object-summed intensity was calculated using the IncuCyte software (Essen Instruments).

Mouse xenograft studies. Female non-obese diabetic/severe combined immunodeficiency (NOD/SCID) mice (Charles River Laboratories) were used in all in vivo studies. All animal procedures were approved by the Ethical Committee of the Institute and by the Italian Ministry of Health (authorization 806/2016-PR). The methods were carried out in accordance with the approved guidelines.
Mice were purchased from Charles River Laboratories, maintained in hyperventilated cages and manipulated under pathogen-free conditions. In particular, mice were housed in individually sterilized cages; each cage contained a maximum of seven mice and optimal amounts of sterilized food, water and bedding. HCT116 xenografts were established by subcutaneous inoculation of 2 × 10⁶ cells into the right posterior flank of 5- to 6-week-old mice. Tumour size was evaluated by calliper measurements, and the approximate volume of the mass was calculated using the formula (4/3)π × (d/2)² × (D/2), where d is the minor tumour axis and D is the major tumour axis. When tumours reached an average size of approximately 250–300 mm³, animals with the most homogeneous size were selected and randomized by tumour size. Doxycycline (Sigma-Aldrich, D9891) was dissolved in water and administered daily at a dose of 50 mg kg⁻¹ by oral gavage. For each experimental group, 8–10 mice were used to enable reliable estimation of within-group variability. Operators allocated mice to the different treatment groups during randomization but were blinded during measurements. The maximal tumour volume permitted in our in vivo experiments was 3,500 mm³ and this limit was never exceeded. In vivo procedures and related biobank data were managed using the Laboratory Assistant Suite, a web-based proprietary data management system for automated data tracking47.

Immunohistochemistry. Formalin-fixed, paraffin-embedded tissues explanted from cell xenografts were partially sectioned (10-μm thick) using a microtome. Then, 4-μm paraffin tissue sections were dried in a 37 °C oven overnight. Slides were deparaffinized in xylene and rehydrated through graded alcohol to water. Endogenous peroxidase was blocked in 3% hydrogen peroxide for 30 min. Microwave antigen retrieval was carried out using a microwave oven (750 W for 10 min) in 10 mmol l⁻¹ citrate buffer, pH 6.0.
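For reference, the calliper-based tumour volume estimate given in the xenograft methods above, (4/3)π × (d/2)² × (D/2), can be written as a short function. This is a minimal sketch rather than the authors' code: the function name, the input checks and the example measurements are assumptions introduced for illustration, and the axes are assumed to be measured in millimetres so that the result is in mm³.

```python
import math


def tumour_volume_mm3(minor_axis_mm, major_axis_mm):
    """Approximate tumour volume from two calliper measurements.

    Implements (4/3) * pi * (d/2)^2 * (D/2), where d is the minor axis
    and D is the major axis, both in millimetres.
    """
    if minor_axis_mm <= 0 or major_axis_mm <= 0:
        raise ValueError("axes must be positive")
    if minor_axis_mm > major_axis_mm:
        # Assumption: callers pass the shorter axis first.
        raise ValueError("minor axis cannot exceed major axis")
    return (4.0 / 3.0) * math.pi * (minor_axis_mm / 2.0) ** 2 * (major_axis_mm / 2.0)


# Hypothetical measurements, for illustration only.
volume = tumour_volume_mm3(7.0, 10.0)
print(f"{volume:.1f} mm^3")
```

The formula treats the tumour as an ellipsoid whose two unmeasured semi-axes both equal d/2, which is why the minor axis enters squared.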
Slides were incubated with monoclonal mouse anti-human KI-67 (1:100; DAKO) overnight at 4 °C inside a moist chamber. After washings in TBS, anti-mouse secondary antibody (DAKO Envision+System horseradish peroxidase-labelled polymer, DAKO) was added. Incubations were carried out for 1 h at room temperature. Immunoreactivities were revealed by incubation in DAB chromogen (DakoCytomation Liquid DAB Substrate Chromogen System, DAKO) for 10 min. Slides were counterstained in Mayer’s haematoxylin, dehydrated in graded alcohol, cleared in xylene and a coverslip was applied using DPX (Sigma-Aldrich). A negative control slide was processed with only the secondary antibody, omitting the primary antibody incubation. Immunohistochemically stained slides for KI-67 were scanned with a 40× objective. Ten representative images selected from three cases were then analysed using ImageJ (NIH), which segmented cells with positive and negative nuclei. The percentage of the area containing positive cells was calculated as the brown area (positively stained cells) divided by the sum of brown and blue areas (negatively stained cells). The software interpretation was manually verified by visual inspection of the digital images to ensure accuracy.

Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Data and analyses are included in the published article, and supplementary data 1, 2 and 3 are available from FigShare (https://figshare.com/projects/CRISPRtargetID/60146). The gene fitness scores of the cell lines, raw counts of the sgRNA data, and processed data and results are available from the project Score web portal: https://score.depmap.sanger.ac.uk.

Code availability
Software code is available through GitHub at https://github.com/francescojm/CRISPRcleanR, https://github.com/francescojm/ADAM and https://github.com/francescojm/BAGELR.

29. Ballouz, S. & Gillis, J.
AuPairWise: a method to estimate RNA-seq replicability through co-expression. PLOS Comput. Biol. 12, e1004868 (2016).
30. Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).
31. Yoshihama, M. et al. The human ribosomal protein genes: sequencing and comparative analysis of 73 genes. Genome Res. 12, 379–390 (2002).
32. Iorio, F. et al. Unsupervised correction of gene-independent cell responses to CRISPR–Cas9 targeting. BMC Genomics 19, 604 (2018).
33. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
34. Aguirre, A. J. et al. Genomic copy number dictates a gene-independent cell response to CRISPR/Cas9 targeting. Cancer Discov. 6, 914–929 (2016).
35. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
36. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).