# Statistical Strategies For Avoiding False Discoveries In Metabolomics And Related Experiments

David I. Broadhurst, Douglas B. Kell

Published 2006 · Computer Science

Many metabolomics experiments, like other high-content or high-throughput experiments, are designed with the primary aim of discovering biomarker metabolites that can discriminate, with a given level of certainty, between nominally matched ‘case’ and ‘control’ samples. Unfortunately, it is very easy to find markers that appear persuasive but are in fact entirely spurious, and there are well-known examples in the proteomics literature. The main dangers are not entirely independent of one another, but they include bias; inadequate sample size (especially relative to the number of metabolite variables and to the statistical power required to show that a biomarker is discriminant); an excessive false discovery rate due to multiple hypothesis testing; an inappropriate choice of numerical methods; and overfitting (generally caused by a failure to perform adequate validation and cross-validation). Many studies fail to take these into account and therefore fail to discover anything of true significance, despite their claims. We summarise these problems and provide pointers to a substantial existing literature that should assist in the improved design and evaluation of metabolomics experiments, thereby allowing robust scientific conclusions to be drawn from the available data. We provide a list of some of the simpler checks that might improve one’s confidence that a candidate biomarker is not simply a statistical artefact, and suggest a series of preferred tests and visualisation tools that can help readers and authors assess papers. These tools can be applied to individual metabolites by performing multiple univariate tests in parallel across all metabolite peaks, and they can also be applied to the validation of multivariate models. We stress in particular that classical significance thresholds such as “p < 0.05”, which are often used in biomedicine, are far too optimistic when many tests are done simultaneously (as in metabolomics).
Ultimately it is desirable that all data and metadata are available electronically, as this allows the entire community to assess conclusions drawn from them. These analyses apply to all high-dimensional ‘omics’ datasets.
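The warning about “p < 0.05” can be illustrated with a minimal Python sketch. It is not code from the paper: it assumes independent tests whose p-values are uniformly distributed when there is no real case/control difference, and the function names `bonferroni` and `benjamini_hochberg` and the simulated 1000-peak dataset are hypothetical. With 1000 peaks and no true differences, a naive 0.05 cut-off is expected to flag about 50 spurious “biomarkers”. The two corrections shown are the Bonferroni family-wise adjustment and the Benjamini-Hochberg (1995) step-up procedure for controlling the false discovery rate.

```python
import random


def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Find the largest rank k such that p_(k) <= (k / m) * alpha, then reject
    the k hypotheses with the smallest p-values, returned in input order.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank  # step-up: keep the LARGEST rank that passes
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical simulation: 1000 "metabolite peaks" with no real
    # case/control difference, so every p-value is Uniform(0, 1).
    rng = random.Random(0)
    null_p = [rng.random() for _ in range(1000)]
    print("naive p < 0.05:", sum(p < 0.05 for p in null_p))  # about 50 expected
    print("Bonferroni:", sum(bonferroni(null_p)))
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(null_p)))
```

Bonferroni bounds the chance of even one false positive, which makes it very conservative as the number of peaks grows. Benjamini-Hochberg instead bounds the expected proportion of false discoveries among the reported markers, which is often the more useful quantity when screening many metabolites.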

