De Novo Diploid Genome Assembly For Genome-wide Structural Variant Detection

L. Zhang, X. Zhou, Z. Weng, A. Sidow
Published 2019 · Biology

Structural variants (SVs) in a personal genome are important but, for all practical purposes, impossible to detect comprehensively by standard short-fragment sequencing. De novo assembly, traditionally used to generate reference genomes, offers an alternative means of variant detection and phasing, but it has not been applied broadly to human genomes because of the fundamental limitations of short-fragment approaches and the high cost of long-read technologies. Here we show that 10x linked-read sequencing, which has been applied to assemble human diploid genomes into high-quality contigs, supports accurate SV detection. We examined variants in six de novo 10x assemblies with diverse experimental parameters from two commonly used human cell lines, NA12878 and NA24385. The assemblies are effective in detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that SV breakpoints are accurate at the base-pair level, with a majority of SVs (80% of deletions and 70% of insertions) having precisely correct sizes and breakpoints (<2 bp difference). Finally, setting the ancestral state of SV loci by comparison with ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation, which in about half of cases is opposite to that of the reference-based call. Interestingly, we uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimpanzee. Overall, we show that de novo assembly of 10x linked-read data can achieve cost-effective SV detection for personal genomes.
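The two ideas in the abstract that lend themselves to a small illustration are calling SVs from a contig-to-reference alignment and re-labelling a call once the ancestral state is known. The sketch below is not the authors' pipeline. It assumes the alignment is available as a standard SAM-style CIGAR string, uses a hypothetical 50 bp minimum size to stand in for "mid-size", and takes the ancestral comparison as a ready-made boolean. The names `SVCall`, `svs_from_cigar`, and `polarize` are invented for this example.

```python
import re
from typing import List, NamedTuple

# One CIGAR operation: a run length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class SVCall(NamedTuple):
    kind: str     # "DEL" or "INS", relative to the reference
    ref_pos: int  # 0-based reference coordinate where the event starts
    size: int     # event length in bp


def svs_from_cigar(ref_start: int, cigar: str, min_size: int = 50) -> List[SVCall]:
    """Report long insertions/deletions found in one contig alignment.

    min_size is an illustrative threshold (assumption), not a value
    taken from the paper.
    """
    calls = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "D" and length >= min_size:
            calls.append(SVCall("DEL", ref_pos, length))
        elif op == "I" and length >= min_size:
            calls.append(SVCall("INS", ref_pos, length))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref_pos += length
    return calls


def polarize(call_kind: str, ancestral_matches_reference: bool) -> str:
    """Infer the mutational event from a reference-based call.

    If the ancestral (ape) state matches the reference allele, the sample
    carries the derived allele and the call keeps its label. Otherwise the
    reference carries the derived allele and the label flips.
    """
    if ancestral_matches_reference:
        return "deletion" if call_kind == "DEL" else "insertion"
    return "insertion" if call_kind == "DEL" else "deletion"
```

For example, `svs_from_cigar(1000, "500M120D300M60I200M10D50M")` reports one deletion and one insertion and skips the 10 bp gap as too small. Passing the deletion to `polarize("DEL", ancestral_matches_reference=False)` relabels it as an insertion, which is the kind of reversal the abstract describes for about half of the calls.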