Combining Partially Overlapping Multi-Omics Data In Databases Using Relationship Matrices

D. Akdemir, R. Knox, Julio Isidro y Sánchez
Published 2020 · Medicine, Computer Science

Private and public breeding programs, as well as companies and universities, have developed genomics technologies that generate unprecedented amounts of sequence data, creating new challenges in data management, querying, and analysis. The magnitude and complexity of these datasets are a challenge but also an opportunity to use the available data as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have enormous potential to accelerate the identification of key traits to improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high-dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype-to-phenotype spectrum. This method can be used to combine partially overlapping relationship/covariance matrices. Here, we show with applications that our approach may be advantageous over feature-imputation-based approaches; we demonstrate how this method can be used in genomic prediction with heterogeneous marker data, and how to combine data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve the information available across gene banks, data repositories, and other data resources.
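To make the phrase "partially overlapping relationship/covariance matrices" concrete, here is a minimal Python sketch. It is not the method proposed in the paper. It is a naive baseline, assumed here purely for illustration, that places two relationship matrices on the union of their genotype labels and averages entries observed in both. The function name `combine_partial`, the labels, and the numeric values are all hypothetical.

```python
import numpy as np

def combine_partial(mats):
    """Naively merge partially overlapping relationship matrices.

    mats: list of (labels, matrix) pairs; labels are unique within each pair.
    Returns (all_labels, combined), where combined[i, j] is the mean of all
    observed values for that pair of labels, or NaN if the pair was never
    observed together in any input matrix.
    """
    all_labels = sorted({lab for labels, _ in mats for lab in labels})
    index = {lab: i for i, lab in enumerate(all_labels)}
    n = len(all_labels)
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    for labels, mat in mats:
        idx = np.array([index[lab] for lab in labels])
        block = np.ix_(idx, idx)  # rows/columns of this matrix in the union
        total[block] += np.asarray(mat, dtype=float)
        count[block] += 1
    combined = np.full((n, n), np.nan)
    observed = count > 0
    combined[observed] = total[observed] / count[observed]
    return all_labels, combined

# Hypothetical example: genotypes A, B, C in one study and B, C, D in another.
K1 = [[1.0, 0.5, 0.2],
      [0.5, 1.0, 0.3],
      [0.2, 0.3, 1.0]]
K2 = [[1.0, 0.1, 0.4],
      [0.1, 1.0, 0.6],
      [0.4, 0.6, 1.0]]
labels, combined = combine_partial([(["A", "B", "C"], K1),
                                    (["B", "C", "D"], K2)])
print(labels)
print(combined)
```

In this toy example the relationship between A and D stays missing (NaN), because no input matrix contains both genotypes. Inferring such previously unobserved relationships is the gap the abstract says the covariance-based method is meant to fill, and simple averaging does not address it. Averaging also does not guarantee a valid (positive semi-definite) covariance matrix.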