
Statistical Strategies For Avoiding False Discoveries In Metabolomics And Related Experiments

David I. Broadhurst, Douglas B. Kell
Published 2006 · Computer Science
Many metabolomics, and other high-content or high-throughput, experiments are set up such that the primary aim is the discovery of biomarker metabolites that can discriminate, with a certain level of certainty, between nominally matched ‘case’ and ‘control’ samples. However, it is unfortunately very easy to find markers that are apparently persuasive but that are in fact entirely spurious, and there are well-known examples in the proteomics literature. The main types of danger are not entirely independent of each other, but include bias, inadequate sample size (especially relative to the number of metabolite variables and to the required statistical power to prove that a biomarker is discriminant), excessive false discovery rate due to multiple hypothesis testing, inappropriate choice of particular numerical methods, and overfitting (generally caused by the failure to perform adequate validation and cross-validation). Many studies fail to take these into account, and thereby fail to discover anything of true significance (despite their claims). We summarise these problems, and provide pointers to a substantial existing literature that should assist in the improved design and evaluation of metabolomics experiments, thereby allowing robust scientific conclusions to be drawn from the available data. We provide a list of some of the simpler checks that might improve one’s confidence that a candidate biomarker is not simply a statistical artefact, and suggest a series of preferred tests and visualisation tools that can assist readers and authors in assessing papers. These tools can be applied to individual metabolites by using multiple univariate tests performed in parallel across all metabolite peaks. They may also be applied to the validation of multivariate models. We stress in particular that classical p-value thresholds such as “p < 0.05”, which are often used in biomedicine, are far too optimistic when multiple tests are done simultaneously (as in metabolomics).
Ultimately it is desirable that all data and metadata are available electronically, as this allows the entire community to assess conclusions drawn from them. These analyses apply to all high-dimensional ‘omics’ datasets.
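As a rough illustration of why an uncorrected “p < 0.05” cut-off is too optimistic when many metabolite peaks are tested in parallel, the sketch below compares three decision rules on a small set of p-values: the naive threshold, a Bonferroni correction, and the standard Benjamini-Hochberg step-up procedure for false discovery rate control. It is a minimal sketch and not the authors' own code: the function names and the ten example p-values are hypothetical, and independent tests are assumed for the family-wise error calculation.

```python
def expected_false_positives(m, alpha=0.05):
    """Expected number of false positives if all m null hypotheses are true."""
    return m * alpha


def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive among m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Reject only those hypotheses with p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at target FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values for ten metabolite peaks (illustration only).
EXAMPLE_P = [0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.35, 0.5, 0.7, 0.9]

if __name__ == "__main__":
    naive = [p < 0.05 for p in EXAMPLE_P]
    print("naive p < 0.05 :", sum(naive))
    print("Bonferroni     :", sum(bonferroni_reject(EXAMPLE_P)))
    print("BH (q = 0.05)  :", sum(benjamini_hochberg_reject(EXAMPLE_P)))
    print("FWER, 20 tests :", round(familywise_error_rate(20), 3))
```

With these made-up values the naive rule flags five peaks, Bonferroni two, and Benjamini-Hochberg three. With 1,000 peaks and no true effects at all, the naive rule would still be expected to flag about 50.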