cn.MOPS: Mixture Of Poissons For Discovering Copy Number Variations In Next-generation Sequencing Data With A Low False Discovery Rate

G. Klambauer, Karin Schwarzbauer, Andreas Mayr, Djork-Arné Clevert, Andreas Mitterecker, Ulrich Bodenhofer, Sepp Hochreiter
Published 2012 in Nucleic Acids Research (e69) · Biology, Medicine

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage therefore lead to a high false discovery rate (FDR), even after correction for GC content. In association studies between CNVs and disease, a high FDR means many false CNVs, which reduces the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS models the depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows the FDR to be reduced by filtering out detections with high noise, which are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data on four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, and (iv) high-coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1 − FDR) and recall for both gains and losses in all benchmark datasets. The software cn.MOPS is publicly available as an R package at and at Bioconductor.
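The core idea in the abstract, fitting a mixture of Poissons to the read counts of all samples at one genomic position so that each component corresponds to an integer copy number, can be illustrated with a short Python sketch. It is a simplified illustration, not the cn.MOPS package's code. Assumptions: NumPy is available; copy number k is a Poisson component with mean (k/2)·λ, with a small ε·λ standing in for k = 0; the Bayesian prior is approximated by MAP-EM with pseudo-counts on the diploid component; and the function names, the ε value, the pseudo-count scheme, and the `informativeness` score (a mean absolute log2 copy-number ratio used here as a rough noise filter) are all hypothetical. Segmentation along the chromosome is omitted.

```python
import numpy as np


def _rate_ratios(n_components, eps):
    # Rate of copy number k relative to diploid (k / 2); k = 0 gets a small eps
    # instead of 0 so its Poisson mean stays positive (hypothetical choice).
    cns = np.arange(n_components)
    return np.where(cns == 0, eps, cns / 2.0)


def fit_mixture_of_poissons(counts, max_cn=8, eps=0.05, cn2_pseudocount=None,
                            n_iter=200, tol=1e-8):
    """MAP-EM for one genomic segment, given the read counts of all samples there.

    Returns mixture weights alpha, the diploid rate lam, and per-sample
    responsibilities resp[i, k] = P(sample i has copy number k).
    """
    x = np.asarray(counts, dtype=float)
    n = x.size
    ratio = _rate_ratios(max_cn + 1, eps)
    # Hypothetical Dirichlet-style pseudo-counts that favor the diploid component.
    pseudo = np.zeros(max_cn + 1)
    pseudo[2] = float(n) if cn2_pseudocount is None else cn2_pseudocount
    alpha = np.full(max_cn + 1, 1.0 / (max_cn + 1))
    lam = max(float(np.median(x)), 1e-3)  # initial diploid read-count rate
    for _ in range(n_iter):
        mu = ratio * lam
        # E-step: Poisson log-likelihood without log(x!), which is constant per sample
        # and cancels when responsibilities are normalized.
        log_p = x[:, None] * np.log(mu) - mu + np.log(np.maximum(alpha, 1e-300))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixture weights (with pseudo-counts), then the shared rate lambda.
        new_alpha = (resp.sum(axis=0) + pseudo) / (n + pseudo.sum())
        new_lam = max(x.sum() / (resp * ratio).sum(), 1e-9)
        done = abs(new_lam - lam) < tol and np.abs(new_alpha - alpha).max() < tol
        alpha, lam = new_alpha, new_lam
        if done:
            break
    return alpha, lam, resp


def informativeness(resp, eps=0.05):
    """Mean expected |log2 copy-number ratio| per sample; near 0 means 'all diploid'."""
    ratio = _rate_ratios(resp.shape[1], eps)
    return float((resp * np.abs(np.log2(ratio))).sum(axis=1).mean())


if __name__ == "__main__":
    counts = [100, 98, 103, 97, 50, 101, 150, 99, 102, 100]
    alpha, lam, resp = fit_mixture_of_poissons(counts)
    # Expected: the 5th sample is called copy number 1, the 7th copy number 3,
    # and all others copy number 2.
    print(resp.argmax(axis=1), round(lam), informativeness(resp))
```

Because the model is fitted across samples at a single position rather than along a chromosome, a region that is poorly covered in every sample simply lowers λ instead of being called as a loss, which is the property the abstract emphasizes. A segment whose score stays near 0 would, under these assumptions, be treated as uninformative and filtered out.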
This paper references
A faster circular binary segmentation algorithm for the analysis of array CGH data
E. S. Venkatraman (2007)
CNAseg - a novel framework for identification of copy number changes in cancer from second-generation sequencing data
S. Ivakhno (2010)
A new summarization method for Affymetrix probe level data
S. Hochreiter (2006)
Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm
Alberto Magi (2011)
Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV
J. Sathirapongsasuti (2011)
A map of human genome variation from population-scale sequencing
G. Abecasis (2010)
Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization
V. Boeva (2011)
Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing
C. Alkan (2009)
Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry
D. Bentley (2008)
Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling
Pawel P. Labaj (2011)
The diploid genome sequence of an Asian individual
J. Wang (2008)
Detecting copy number variation with mated short reads
P. Medvedev (2010)
I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data
W. Talloen (2007)
Filtering data from high-throughput experiments based on measurement reliability
W. Talloen (2010)
The cancer genome
M. Stratton (2009)
Initial impact of the sequencing of the human genome
E. Lander (2011)
Catching Change-points with Lasso
Z. Harchaoui (2007)
High-resolution mapping of copy-number alterations with massively parallel sequencing
D. Chiang (2009)
A framework for variation discovery and genotyping using next-generation DNA sequencing data
M. A. DePristo (2011)
cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate
Djork-Arné Clevert (2011)
Substantial biases in ultra-short read data sets from high-throughput DNA sequencing
J. Dohm (2008)
SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples
S. Le (2011)
Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing
P. Campbell (2008)
Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
J. H. Bullard (2009)
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
B. Langmead (2008)
The complete genome of an individual by massively parallel DNA sequencing
D. Wheeler (2008)
Sensitive and accurate detection of copy number variants using read depth of coverage
Seungtai Yoon (2009)
Recent Segmental Duplications in the Human Genome
J. Bailey (2002)
Integrating common and rare genetic variation in diverse human populations
D. Altshuler (2010)
A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome
M. Sultan (2008)
A new test for the Poisson distribution
L. Brown (2001)
rSW-seq: Algorithm for detection of copy number alterations in deep sequencing data
Tae-Min Kim (2009)
CNV-seq, a new method to detect copy number variation using high-throughput sequencing
C. Xie (2008)

This paper is referenced by
Variation in Crossover Frequencies Perturb Crossover Assurance Without Affecting Meiotic Chromosome Segregation in Saccharomyces cerevisiae
G. N. Krishnaprasad (2014)
Variant calling and quality control of large-scale human genome sequencing data
Brandon Jew (2019)
Statistical models for DNA copy number variation detection using read-depth data from next generation sequencing experiments
Tieming Ji (2016)
Addressing NGS Data Challenges: Efficient High Throughput Processing and Sequencing Error Detection
A. Kawalia (2015)
Detection of genome-wide structural variations in the Shanghai Holstein cattle population using next-generation sequencing
D. Liu (2019)
Host and pathogen genetics associated with pneumococcal meningitis
John A. Lees (2017)
Absence of Goniodysgenesis in Patients with Chromosome 13Q Microdeletion-Related Microcoria
C. Gerth-Kahlert (2018)
Uncovering and resolving challenges of quantitative modeling in a simplified community of interacting cells
Samuel F. M. Hart (2019)
Novel insights into the molecular heterogeneity of hepatocellular carcinoma
J. Jovel (2017)
Transient structural variations alter gene expression and quantitative traits in Schizosaccharomyces pombe
D. Jeffares (2016)
Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow
A. Kawalia (2015)
CNOGpro: detection and quantification of CNVs in prokaryotic whole-genome sequencing data
O. Brynildsrud (2015)
Structural Variation Detection from Next Generation Sequencing
Kai Ye (2015)
CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing
Eric Talevich (2016)
Detecting de novo mutations in intellectual disability
J. Ligt (2014)
Certified DNA Reference Materials to Compare HER2 Gene Amplification Measurements Using Next-Generation Sequencing Methods
Chih-Jian Lih (2016)
biomvRhsmm: Genomic Segmentation with Hidden Semi-Markov Model
Y. Du (2014)
Phylogeny-based tumor subclone identification using a Bayesian feature allocation model
Li Zeng (2018)
QTL analysis of natural Saccharomyces cerevisiae isolates reveals unique alleles involved in lignocellulosic inhibitor tolerance
R. D. de Witt (2019)
Chromosome-scale genome assembly provides insights into rye biology, evolution, and agronomic potential
M. T. Rabanus-Wallace (2019)
Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives
M. Zhao (2013)
Evaluation of somatic copy number estimation tools for whole-exome sequencing data
Jae-Yong Nam (2016)
Copy Number Variation in the Porcine Genome Detected from Whole-Genome Sequence
Rebecca Anderson (2018)
WisecondorX: improved copy number detection for routine shallow whole-genome sequencing
Lennart Raman (2019)
The adaptive landscape of wildtype and glycosylation-deficient populations of the industrial yeast Pichia pastoris
Josef W. Moser (2017)
Integrative molecular profiling identifies a novel cluster of estrogen receptor‐positive breast cancer in very young women
C. Park (2019)
Comprehensively benchmarking applications for detecting copy number variation
L. Zhang (2019)
A comprehensive benchmarking of WGS-based structural variant callers
Varuni Sarwal (2020)
Computational Methods for Analysis of Dynamic Transcriptome and Its Regulation Through Chromatin Remodeling and Intracellular Signaling
Tarmo Äijö (2014)
Basic Methods of Data Analysis
Johannes Kepler (2015)
Implications of evolutionary engineering for growth and recombinant protein production in methanol-based growth media in the yeast Pichia pastoris
Josef W. Moser (2017)
A Survey of Copy Number Variation in the Porcine Genome Detected From Whole-Genome Sequence
B. N. Keel (2019)
Some data provided by Semantic Scholar.