
Optimal Algorithm For Metabolomics Classification And Feature Selection Varies By Dataset

Charles E. Determan
Published 2014 · Computer Science
Metabolomics, the systematic identification and quantification of all metabolites in a biological system, is increasingly applied to the identification of biomarkers for disease diagnosis, prognosis, and risk prediction. Applications of metabolomics extend across the health spectrum, including Alzheimer's, cancer, diabetes, and trauma. Despite the continued interest in metabolomics, many competing techniques exist for analyzing metabolomics datasets with the intent to classify group membership (e.g., Control or Treated). These include Partial Least Squares Discriminant Analysis, Support Vector Machines, Random Forest, Regularized Generalized Linear Models, and Prediction Analysis for Microarrays. Each classification algorithm depends on different assumptions and can lead to different conclusions. This project conducts an in-depth comparison of algorithm performance on both simulated and real datasets to determine which algorithms perform best for different dataset structures. Three simulated datasets were generated to validate algorithm performance and mimic 'real' metabolomics data: (Han et al., 2011) independent null dataset (no correlation, no discriminatory variables), (Davis, Schiller, Eurich, & Sawyer, 2012) correlated null (no discriminating variables), (Guan et al., 2009) correlated discriminatory. The comparison is also applied to three open-access datasets: two Nuclear Magnetic Resonance (NMR) datasets and one Mass Spectrometry (MS) dataset. Performance was evaluated with the Robustness-Performance Trade-off (RPT), which balances model classification accuracy against feature selection stability. We also provide a free, open-source R Bioconductor package (OmicsMarkeR) that conducts the analyses herein.
The proposed work provides an important advancement in metabolomics analysis and helps alleviate the confusion caused by potentially paradoxical analyses, thereby leading to improved exploration of disease states and identification of clinically important biomarkers.
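The abstract names the RPT but does not state its formula. As a rough illustration only, the sketch below assumes the RPT is a weighted harmonic mean of feature selection stability (robustness) and classification accuracy (performance), and assumes stability is measured as the mean pairwise Jaccard index between the feature sets selected on different resamples. The function names, the `beta` weight, and the example metabolite labels are hypothetical and are not taken from the OmicsMarkeR package.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard index between two feature sets (defined as 1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(feature_sets):
    """Mean pairwise Jaccard index across feature sets from different resamples."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return mean(jaccard(a, b) for a, b in pairs)


def rpt(robustness, performance, beta=1.0):
    """Assumed RPT: weighted harmonic mean of robustness and performance.

    beta = 1 weights both terms equally; this form is an assumption,
    not a formula quoted from the abstract.
    """
    denom = beta ** 2 * robustness + performance
    if denom == 0:
        return 0.0
    return (beta ** 2 + 1) * robustness * performance / denom


# Hypothetical feature sets selected on three bootstrap resamples.
selected = [
    {"m1", "m2", "m3"},
    {"m1", "m2", "m4"},
    {"m1", "m2", "m3"},
]
score = rpt(stability(selected), performance=0.9)
```

Under this assumed form, a model scores well only when both terms are high: perfect accuracy with unstable feature lists, or stable lists with poor accuracy, both pull the score down.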
This paper references
10.1164/rccm.200606-769OC
Metabolomics applied to exhaled breath condensate in childhood asthma.
Silvia Carraro (2007)
10.1073/pnas.102102699
Selection bias in gene extraction on the basis of microarray gene-expression data
C. Ambroise (2002)
Tools of the Trade for Discriminant Analysis
Gastón Sánchez (2013)
10.1111/j.1467-9868.2005.00503.x
Regularization and variable selection via the elastic net
H. Zou (2005)
10.1074/mcp.M110.004945
Serum and Urine Metabolite Profiling Reveals Potential Biomarkers of Human Hepatocellular Carcinoma*
Tianlu Chen (2011)
Misc Functions of the Department of Statistics (e1071), TU Wien
David Meyer (2014)
10.1007/978-1-4615-5703-6_3
The Support Vector Method of Function Estimation
V. Vapnik (1998)
A stability index for feature selection
L. Kuncheva (2007)
10.1093/sysbio/45.3.380
The Probabilistic Basis of Jaccard's Index of Similarity
R. Real (1996)
10.1093/brain/awm304
Metabolomic profiling to develop blood biomarkers for Parkinson's disease.
M. Bogdanov (2008)
10.1038/nm.3175
A colorectal cancer classification system that associates cellular phenotype and responses to therapy
Anguraj Sadanandam (2013)
3D QSAR in drug design: theory, methods and applications
R. D. Cramer (1993)
10.1016/j.patcog.2010.08.011
Mining data with random forests: A survey and results of new tests
Antanas Verikas (2011)
10.1016/S0140-6736(05)66539-7
Prediction of cancer outcome with microarrays
S. Michiels (2005)
10.1007/978-3-540-87481-2_21
Robust Feature Selection Using Ensemble Feature Selection Techniques
Yvan Saeys (2008)
10.1097/TA.0b013e3182609821
Metabolomics classifies phase of care and identifies risk for mortality in a porcine model of multiple injuries and hemorrhagic shock
Daniel R. Lexcen (2012)
A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons
T. Sørensen (1948)
10.1093/bioinformatics/btl400
Reliable gene signatures for microarray classification: assessment of stability and performance
C. Davis (2006)
10.1007/s11306-010-0224-9
Serum metabolomics as a novel diagnostic approach for pancreatic cancer
Shin Nishiumi (2010)
10.1007/978-1-60327-194-3_14
Computational approaches to metabolomics.
David Scott Wishart (2010)
10.1186/1477-7819-10-271
Urinary metabolomic signature of esophageal cancer and Barrett’s esophagus
Vanessa Wylie Davis (2012)
10.2527/jas.2012-5338
Phenotypic prediction based on metabolomic data for growing pigs from three main European breeds.
Florian Rohart (2012)
10.1038/nm1653
Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins
S. Ray (2007)
10.1093/toxsci/kfi102
Metabonomics in toxicology: a review.
D. Robertson (2005)
Path Models with Latent Variables: The NIPALS Approach
Herman Wold (1975)
10.1007/s11306-010-0232-9
Learning to predict cancer-associated skeletal muscle wasting from 1H-NMR profiles of urinary metabolites
Roman Eisner (2010)
10.1073/pnas.082099299
Diagnosis of multiple cancer types by shrunken centroids of gene expression
R. Tibshirani (2002)
3D QSAR in drug design: theory, methods and applications
H. Kubinyi (2000)
10.1109/CEC.2008.4631321
Software quality modeling: The impact of class noise on the random forest classifier
A. Folleco (2008)
10.1109/4235.585893
No free lunch theorems for optimization
D. Wolpert (1997)
10.1093/bioinformatics/btp630
Robust biomarker identification for cancer diagnosis with ensemble feature selection methods
Thomas Abeel (2010)
10.1186/2043-9113-2-3
A distinct metabolic signature predicts development of fasting plasma glucose
Manuela Hische (2011)
10.1093/bioinformatics/btm550
Algebraic stability indicators for ranked lists in molecular profiling
Giuseppe Jurman (2008)
10.1016/J.CHEMOLAB.2011.03.010
Algorithms and tools for the preprocessing of LC–MS metabolomics data
Sandra Castillo (2011)
10.1021/ci034160g
Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling
V. Svetnik (2003)
10.1021/pr901081y
Urinary metabonomic study on colorectal cancer.
Yunping Qiu (2010)
10.1007/s11306-009-0164-4
Monte-Carlo methods for determining optimal number of significant variables. Application to mouse urinary profiles
Kanet Wongravee (2009)
10.1080/02664769922322
Robustness of partial least-squares method for estimating latent variable quality structures
C. Cassel (1999)
gbm: Generalized Boosted Regression Models
G. Ridgeway (2013)
10.1016/B978-0-12-103950-9.50017-4
Path Models with Latent Variables: The NIPALS Approach
H. Wold (1975)
10.1023/A:1012487302797
Gene Selection for Cancer Classification using Support Vector Machines
I. Guyon (2002)
Classification and Regression by randomForest
A. Liaw (2002)
10.1186/1471-2105-10-259
Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines
Wei Guan (2009)
10.1038/srep00134
The metabolic footprint of aging in mice
R. Houtkooper (2011)
10.1093/bioinformatics/btm344
A review of feature selection techniques in bioinformatics
Y. Saeys (2007)
10.1023/A:1010933404324
Random Forests
L. Breiman (2001)
10.1007/s00428-010-0993-6
Can nuclear magnetic resonance (NMR) spectroscopy reveal different metabolic signatures for lung tumours?
Iola F Duarte (2010)
10.1371/journal.pone.0021643
Metabolomics in Early Alzheimer's Disease: Identification of Altered Plasma Sphingolipidome Using Shotgun Lipidomics
Xianlin Han (2011)
Metabolomics Applied to Diabetes Research Moving From Information to Knowledge
J. R. Bain (2009)
10.1186/1471-2105-6-S2-S12
Cross-platform comparability of microarray technology: Intra-platform consistency and appropriate data analysis procedures are essential
L. Shi (2005)
10.1007/s11306-010-0227-6
Metabolomics reveals unhealthy alterations in rumen metabolism with increased proportion of cereal grain in the diet of dairy cows
B. Ametaj (2010)
10.1021/ac800954c
Analysis of metabolomic data using support vector machines.
S. Mahadevan (2008)
10.2337/db09-0580
Metabolomics Applied to Diabetes Research
James R. Bain (2009)
10.3109/14767058.2010.482618
Metabolomics in premature labor: a novel approach to identify patients at risk for preterm delivery
Roberto Romero (2010)
10.1158/0008-5472.CAN-11-0885
Aberrant lipid metabolism in hepatocellular carcinoma revealed by plasma metabolomics and lipid profiling.
Andrew D. Patterson (2011)
10.1007/978-0-387-84858-7_10
Boosting and Additive Trees
Trevor J. Hastie (2009)
10.1007/s11306-014-0621-6
Carbohydrate fed state alters the metabolomic response to hemorrhagic shock and resuscitation in liver
Charles E. Determan (2014)
10.1016/j.jacc.2011.09.083
Metabolomic profile of human myocardial ischemia by nuclear magnetic resonance spectroscopy of peripheral blood serum: a translational study based on transient coronary occlusion models.
Vicente Bodí (2012)
10.2202/1544-6115.1307
Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods
M. Zucknick (2008)
10.2307/1932409
Measures of the Amount of Ecologic Association Between Species
L. R. Dice (1945)
10.1093/nar/gks1004
MetaboLights—an open-access general-purpose repository for metabolomics studies and associated meta-data
Kenneth Haug (2013)
10.1021/pr300673x
LC-MS based serum metabolomics for identification of hepatocellular carcinoma biomarkers in Egyptian cohort.
Jun Feng Xiao (2012)
10.2331/suisan.22.522
Zoogeographical Studies on the Soleoid Fishes Found in Japan and its Neighbouring Regions-III
Akira Ochiai (1957)
Path models with latent variables: The NIPALS approach (pp. 307-357)
S. Wold (1975)
10.2307/2064797
Quantitative sociology: International perspectives on mathematical and statistical modeling
N Wagner Henry (1975)
10.1007/s11306-013-0550-9
Metabolomics reveals determinants of weight loss during lifestyle intervention in obese children
Simone Wahl (2013)
10.1007/BF00058655
Bagging predictors
L. Breiman (1996)
10.1073/pnas.0601231103
Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer.
Liat Ein-Dor (2006)
10.1016/j.csda.2004.03.017
An extensive comparison of recent classification tools applied to microarray data
J. Lee (2005)
10.18637/JSS.V033.I01
Regularization Paths for Generalized Linear Models via Coordinate Descent.
J. Friedman (2010)



This paper is referenced by
10.3390/metabo9040066
Computational Methods for the Discovery of Metabolic Markers of Complex Traits
Michael Y Lee (2019)
biosigner: A new method for signature discovery from omics data
Philippe Rinaudo (2016)
10.1101/393454
Prediction and analysis of skin cancer progression using genomics profiles of patients
Sherry Bhalla (2018)
10.1109/TCBB.2018.2831212
Robust Microbial Markers for Non-Invasive Inflammatory Bowel Disease Identification
Benjamin Wingfield (2019)
10.1038/s41598-019-52134-4
Prediction and Analysis of Skin Cancer Progression using Genomics Profiles of Patients
Sherry Bhalla (2019)
10.1007/978-981-10-1503-8_5
Informatics for Metabolomics.
Kanthida Kusonmano (2016)
10.3389/fmolb.2016.00026
biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data
Philippe Rinaudo (2016)
10.1007/978-981-10-1503-8
Translational Biomedical Informatics
Bairong Shen (2016)
10.1186/s13040-017-0134-8
Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery
Nathaniel Crabtree (2017)
10.5691/JJB.39.55
A New Association Analysis Method for Gut Microbial Compositional Data Using Ensemble Learning
Tasuku Okui (2019)
10.1038/s41598-018-21851-7
Elastic net regularized regression for time-series analysis of plasma metabolome stability under sub-optimal freezing condition
Gerard Bryan Gonzales (2018)
10.3389/fonc.2020.00423
Computational Oncology in the Multi-Omics Era: State of the Art
Guillermo de Anda-Jáuregui (2020)
10.3390/metabo9100200
The metaRbolomics Toolbox in Bioconductor and beyond
J. Stanstrup (2019)