Finding Predictive Gene Groups From Microarray Data

M. Dettling, P. Bühlmann
Published 2004 · Mathematics

Microarray experiments generate large datasets with expression values for thousands of genes but no more than a few dozen samples. A challenging task with these data is to reveal groups of genes that act together and whose collective expression is strongly associated with an outcome variable of interest. To find these groups, we suggest the use of supervised algorithms, that is, procedures that use external information about the response variable to group the genes. We present Pelora, an algorithm based on penalized logistic regression analysis that combines gene selection, gene grouping, and sample classification in a supervised, simultaneous way. In an empirical study on six different microarray datasets, we show that Pelora identifies gene groups whose expression centroids have very good predictive potential and yield results that can keep up with state-of-the-art classification methods based on single genes. Thus, our gene groups can be beneficial in medical diagnostics and prognostics, and they may also provide further biological insight into gene function and regulation.
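The idea of scoring a gene group by how well its expression centroid predicts the response can be illustrated with a small sketch. This is not the authors' Pelora implementation: it is a simplified, forward-only greedy search written under assumptions of our own. The function names `penalized_fit` and `greedy_group`, the ridge penalty weight `lam`, the group size limit `max_genes`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def penalized_fit(z, y, lam, n_iter=50):
    """Fit y ~ sigmoid(b0 + b1 * z) with a ridge penalty on b1 only.

    Returns the minimized penalized negative log-likelihood.
    Uses Newton steps with step halving so the objective never increases.
    """
    X = np.column_stack([np.ones_like(z), z])
    penalty = np.array([0.0, lam])  # intercept is left unpenalized

    def objective(beta):
        eta = X @ beta
        # log(1 + exp(eta)) computed stably via logaddexp
        nll = np.sum(np.logaddexp(0.0, eta) - y * eta)
        return nll + 0.5 * np.sum(penalty * beta**2)

    beta = np.zeros(2)
    current = objective(beta)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (p - y) + penalty * beta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + np.diag(penalty)
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while objective(beta - t * step) > current and t > 1e-8:
            t *= 0.5  # halve the step until the objective does not increase
        beta = beta - t * step
        updated = objective(beta)
        if current - updated < 1e-10:
            current = min(current, updated)
            break
        current = updated
    return current


def greedy_group(X, y, lam=1.0, max_genes=5):
    """Grow one gene group by forward selection (illustrative assumption).

    At each round, add the gene whose inclusion gives the group centroid
    (mean expression over the group's genes) the lowest penalized criterion.
    Stops when no gene improves the criterion or max_genes is reached.
    """
    selected, best = [], np.inf
    for _ in range(max_genes):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = [
            penalized_fit(X[:, selected + [j]].mean(axis=1), y, lam)
            for j in candidates
        ]
        k = int(np.argmin(scores))
        if scores[k] >= best - 1e-8:
            break
        selected.append(candidates[k])
        best = scores[k]
    return selected, best


# Synthetic data (an assumption, not from the paper): 40 samples, 30 genes,
# of which genes 0, 1, and 2 are shifted upward in the second class.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 30))
X[:, :3] += 2.0 * y[:, None]
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each gene

selected, best = greedy_group(X, y, lam=1.0, max_genes=5)
```

Here `selected` holds the column indices of the genes in the group and `best` is the criterion value reached by their centroid. The sketch grows a single group and never removes a gene once added, so it should be read as a toy illustration of centroid-based supervised grouping rather than a description of the published algorithm.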