Cn.MOPS: Mixture Of Poissons For Discovering Copy Number Variations In Next-generation Sequencing Data With A Low False Discovery Rate

G. Klambauer, Karin Schwarzbauer, Andreas Mayr, Djork-Arné Clevert, Andreas Mitterecker, Ulrich Bodenhofer, Sepp Hochreiter
Published 2012 in Nucleic Acids Research (article e69) · Biology, Medicine

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS models the depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows the FDR to be reduced by filtering out high-noise detections, which are likely to be false. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, and (iv) high-coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark datasets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
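
The idea of modeling read counts across samples at a single genomic position can be illustrated with a small sketch. The Python code below is not the cn.MOPS implementation (which is distributed as an R package); it is a minimal expectation-maximization fit of a Poisson mixture under assumptions that are not stated in the abstract: copy number c is given an expected read count of (c / 2) · λ, copy number 0 gets a small positive rate through a hypothetical `eps` parameter, and a hypothetical pseudo-count `prior_cn2` stands in for the Bayesian prior by pulling weight towards copy number 2. The function names are illustrative only.

```python
import math


def poisson_logpmf(k, rate):
    """Log-probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def fit_poisson_mixture(counts, copy_numbers=(0, 1, 2, 3, 4),
                        prior_cn2=1.0, eps=0.05, n_iter=100):
    """Fit a Poisson mixture to the read counts of all samples at one position.

    Assumptions of this sketch (not taken from the abstract):
      * copy number c has expected read count (c / 2) * lam,
      * copy number 0 uses (eps / 2) * lam so that its rate stays positive,
      * prior_cn2 is a pseudo-count that pulls weight towards copy number 2.
    Returns (lam, weights, posteriors); posteriors come from the last E-step,
    one row per sample and one column per copy number.
    """
    scale = [max(c, eps) / 2.0 for c in copy_numbers]
    prior = [prior_cn2 if c == 2 else 0.0 for c in copy_numbers]
    n, k = len(counts), len(copy_numbers)
    lam = float(sorted(counts)[n // 2]) or 1.0  # start from the median count
    weights = [1.0 / k] * k
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability of each copy number for each sample.
        posteriors = []
        for x in counts:
            logp = [math.log(w) + poisson_logpmf(x, s * lam)
                    for w, s in zip(weights, scale)]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            posteriors.append([v / total for v in p])
        # M-step: update mixture weights (with pseudo-counts) and the rate lam.
        raw = [max(sum(row[i] for row in posteriors) + prior[i], 1e-12)
               for i in range(k)]
        weights = [v / sum(raw) for v in raw]
        expected_scale = sum(row[i] * scale[i]
                             for row in posteriors for i in range(k))
        lam = max(sum(counts) / expected_scale, 1e-9)
    return lam, weights, posteriors


def call_copy_numbers(posteriors, copy_numbers=(0, 1, 2, 3, 4)):
    """Assign each sample the copy number with the highest posterior."""
    return [copy_numbers[max(range(len(row)), key=row.__getitem__)]
            for row in posteriors]


# Made-up read counts for ten samples at one position.
counts = [100, 98, 103, 101, 99, 97, 102, 150, 50, 100]
lam, weights, posteriors = fit_poisson_mixture(counts)
print(round(lam, 1), call_copy_numbers(posteriors))
```

Because the rate λ is estimated per position from all samples, a position that is uniformly over- or under-covered shifts λ rather than producing calls, which is the intuition behind the abstract's claim that variation along the chromosome does not affect detection. The sketch omits the noise-based filtering step entirely.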