
Wrappers For Feature Subset Selection

R. Kohavi, G. John
Published 1997 · Computer Science

Abstract

In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
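
To make the wrapper idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a greedy forward search (simple hill climbing), a 1-nearest-neighbour learner, and leave-one-out accuracy as the evaluation; the paper itself reports results for decision trees and Naive-Bayes. The function names `loo_accuracy_1nn` and `wrapper_forward_selection` and the toy data are hypothetical, chosen only for illustration. The point it shows is that each candidate subset is scored by running the learner itself, so the chosen subset is tailored to that learner and that training set.

```python
from typing import List, Sequence, Tuple


def loo_accuracy_1nn(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: Sequence[int]) -> float:
    """Leave-one-out accuracy of 1-nearest-neighbour using only `features`."""
    correct = 0
    for i in range(len(X)):
        # Nearest other row by squared Euclidean distance on the chosen
        # features; ties (including the empty subset, where every distance
        # is 0) go to the lowest row index.
        _, nearest = min(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(len(X)) if j != i
        )
        correct += int(y[nearest] == y[i])
    return correct / len(X)


def wrapper_forward_selection(X: Sequence[Sequence[float]],
                              y: Sequence[int]) -> Tuple[List[int], float]:
    """Greedy forward search: add the feature that most improves the
    learner's estimated accuracy; stop when no addition improves it."""
    selected: List[int] = []
    best = loo_accuracy_1nn(X, y, selected)
    remaining = set(range(len(X[0])))
    while remaining:
        # Score every one-feature extension by running the learner on it.
        # max() prefers the higher score, then the lower feature index.
        score, neg_f = max(
            (loo_accuracy_1nn(X, y, selected + [f]), -f) for f in remaining
        )
        if score <= best:
            break  # local optimum for this learner and this data
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best = score
    return selected, best


# Toy data (an assumption for illustration): feature 0 separates the
# classes, feature 1 is large-scale noise that misleads 1-NN.
X = [[0.0, 5.0], [0.1, -4.0], [0.2, 3.0],
     [0.9, 4.8], [1.0, -4.2], [1.1, 3.1]]
y = [0, 0, 0, 1, 1, 1]

selected, score = wrapper_forward_selection(X, y)
print(selected, score)  # [0] 1.0
```

On this toy data the search keeps only feature 0: adding the noisy feature 1 lowers the estimated accuracy of this particular learner, which is exactly the interaction between algorithm and training set that the abstract says a selection method should account for. A filter method such as Relief would instead score features without running the learner.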