
Regression Approaches For Microarray Data Analysis

M. Segal, K. Dahlquist, B. Conklin
Published 2003 · Biology, Computer Science, Medicine

A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of two defining characteristics of microarray-based studies: (i) the number of genes contributing expression measures far exceeds the number of samples (observations) available, and (ii) by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have recently been proposed to address these issues. We briefly critique some of these methods before evaluating gene harvesting in detail. This evaluation reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.
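
The setting the abstract describes (many more genes than samples, with correlated expression measures) and the kind of regularized regression it points to can be illustrated with a minimal sketch. This is not the authors' code or data: the simulated "pathway" structure, the dimensions, the true coefficients, the penalty level, and all function names below are assumptions chosen for illustration, and no intercept is fitted. The sketch fits the lasso by cyclic coordinate descent using only the Python standard library.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each; y is a list of n outcomes.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            c = cols[j]
            # Inner product of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = sum(c[i] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * c[i]
                beta[j] = new
    return beta


def lambda_max(X, y):
    """Smallest penalty at which the all-zero vector solves the lasso."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))


def simulate(n=20, p=200, seed=0):
    """Hypothetical data: p >> n, with the first 50 'genes' sharing a latent factor."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        pathway = rng.gauss(0, 1)  # shared factor induces correlation
        row = [
            0.7 * pathway + rng.gauss(0, 1) if j < 50 else rng.gauss(0, 1)
            for j in range(p)
        ]
        X.append(row)
        # The outcome depends on two genes only (an assumption of this sketch).
        y.append(2.0 * row[0] - 1.5 * row[60] + rng.gauss(0, 0.5))
    return X, y


X, y = simulate()
lam_max = lambda_max(X, y)
beta = lasso_cd(X, y, lam=0.5 * lam_max)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(len(selected), "of", len(beta), "coefficients are nonzero")
```

Because the L1 penalty sets many coefficients exactly to zero, the fit stays well defined even though ordinary least squares has no unique solution when the number of genes exceeds the number of samples. With correlated genes, which member of a correlated group is selected can change with the seed or the penalty, which loosely mirrors the solution multiplicity issue the abstract mentions.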
Address correspondence to: Mark R. Segal, Department of Epidemiology and Biostatistics, University of California, 500 Parnassus Avenue, MU 420
E-mail: mark@biostat.ucsf