# Wrappers For Feature Subset Selection

R. Kohavi, G. John

Published 1997 · Computer Science

Abstract: In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
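The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a greedy forward search (a simplification of the search engines the paper studies), a 1-nearest-neighbor learner as a stand-in induction algorithm (the paper uses decision trees and Naive-Bayes), and interleaved k-fold cross-validation as the accuracy estimate. All function names (`one_nn`, `cv_accuracy`, `wrapper_forward_selection`) and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# A learner here is any function (train_X, train_y, query_row) -> predicted label.
Learner = Callable[[List[Row], List[int], Row], int]


def one_nn(train_X: List[Row], train_y: List[int], query: Row) -> int:
    """Stand-in induction algorithm (an assumption): 1-nearest neighbor."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    return train_y[nearest]


def cv_accuracy(learner: Learner, X: List[Row], y: List[int],
                subset: List[int], folds: int = 5) -> float:
    """k-fold cross-validated accuracy of `learner` using only `subset` columns."""
    Xs = [[row[j] for j in subset] for row in X]
    n = len(Xs)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # interleaved folds, deterministic
        train_X = [Xs[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct += sum(learner(train_X, train_y, Xs[i]) == y[i] for i in test_idx)
    return correct / n


def wrapper_forward_selection(learner: Learner, X: List[Row], y: List[int],
                              folds: int = 5) -> Tuple[List[int], float]:
    """Greedy forward search: the learner itself scores each candidate subset."""
    selected: List[int] = []
    best_score = 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(cv_accuracy(learner, X, y, selected + [j], folds), j)
                  for j in remaining]
        score, j = max(scored, key=lambda t: t[0])
        if score <= best_score:  # stop when no single addition helps
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


# Toy data (an assumption): column 1 determines the label, columns 0 and 2 are noise.
rng = random.Random(0)
X = [[rng.random(), i / 40, rng.random()] for i in range(40)]
y = [int(row[1] >= 0.5) for row in X]

selected, score = wrapper_forward_selection(one_nn, X, y)
print("selected columns:", selected, "estimated accuracy:", round(score, 3))
```

Because the learner is passed in as a function, the same search can wrap any induction algorithm, which is the sense in which the selected subset is tailored to a particular algorithm and dataset. A filter method such as Relief would instead score features without calling the learner.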

This paper references

Artificial Intelligence

R Kohavi (1997)

On Biases in Estimating Multi-Valued Attributes

I. Kononenko (1995)

10.5860/choice.33-1577

Artificial Intelligence: A Modern Approach

S. Russell (1995)

10.1016/b978-1-55860-335-6.50043-x

Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms

D. B. Skalak (1994)

10.1142/S0218001488000145

On Automatic Feature Selection

W. Siedlecki (1988)

10.1016/0004-3702(94)90084-1

Learning Boolean Concepts in the Presence of Many Irrelevant Features

Hussein Almuallim (1994)

10.1007/3-540-57868-4_52

Characterizing the Applicability of Classification Algorithms Using Meta-Level Learning

P. Brazdil (1994)

10.2307/2530946

Classification and Regression Trees

L. Breiman (1983)

The Attribute Selection Problem in Decision Tree Generation

U. Fayyad (1992)

10.1137/0330034

Stochastic discrete optimization

D. Yan (1992)

Instance-based learning algorithms, Machine Learning

D. W. Aha (1991)

Wrappers for performance enhancement and oblivious decision graphs, Ph.D. Thesis, Stanford University, Computer Science Department, STAN-CS-TR-95-1560

R. Kohavi (1995)

On the difficulty of finding small consistent decision trees

I T R Hancock (1989)

Enhancements to the data mining process

G. John (1997)

Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation

O. Maron (1993)

An Analysis of Bayesian Classifiers

P. Langley (1992)

10.1109/ICPR.1988.28334

Best first strategy for feature selection

L. Xu (1988)

Regression by leaps and bounds, Technometrics

G. M. Furnival (1974)

10.1109/TAI.1992.246402

Genetic algorithms as a tool for feature selection in machine learning

H. Vafaie (1992)

10.1142/S021821309700027X

Data Mining Using MLC++: A Machine Learning Library in C++

R. Kohavi (1997)

An evaluation of feature selection methods and their application to computer security

J. Doak (1992)

10.1007/3-540-56602-3_158

Exploiting Context When Learning to Classify

Peter D. Turney (1993)

10.1111/j.1467-8640.1990.tb00298.x

Learning hard concepts through constructive induction: framework and rationale

L. Rendell (1990)

10.1080/00401706.1999.10485680

Applied Regression Analysis

R. Gunst (1999)

10.1016/b978-1-55860-377-6.50071-2

An Inductive Learning Approach to Prognostic Prediction

W. Street (1995)

10.2307/2984653

Applied Linear Statistical Models

V. Barnett (1975)

10.1016/b978-1-55860-377-6.50068-2

A Comparison of Induction Algorithms for Selective and non-Selective Bayesian Classifiers

M. Singh (1995)

Une généralisation de quelques algorithmes sous-optimaux de recherche d'ensembles d'attributs

J. Kittler (1978)

10.1080/00401706.1995.10484383

Machine Learning, Neural and Statistical Classification

D. Michie (1994)

Some comments on C_p

C. L. Mallows (1973)

10.1016/b978-1-55860-247-2.50037-1

A Practical Approach to Feature Selection

K. Kira (1992)

Readings in Artificial Intelligence

B. Webber (1981)

Rough sets. Present state and the future

Z. Pawlak (1993)

10.1007/3-540-59286-5_57

The Power of Decision Tables

R. Kohavi (1995)

Knowledge Discovery in Databases

Gregory Piatetsky-Shapiro (1991)

The Discovery, Analysis, and Representation of Data Dependencies in Databases

W. Ziarko (1991)

10.1016/b978-1-55860-335-6.50012-x

Greedy Attribute Selection

R. Caruana (1994)

Probabilistic Hill-climbing: Theory and Applications

R. Greiner (1992)

Solving Time-Dependent Planning Problems

Mark S. Boddy (1989)

C4.5: Programs for Machine Learning

J. Quinlan (1992)

C4.5: Programs for Machine Learning

J R Quinlan (1993)

Oblivious Decision Trees and Abstract Cases

P. Langley (1994)

10.1007/3-540-57868-4_57

Estimating Attributes: Analysis and Extensions of RELIEF

I. Kononenko (1994)

Neural Network Ensembles, Cross Validation, and Active Learning

A. Krogh (1994)

10.5860/choice.27-0936

Genetic Algorithms in Search Optimization and Machine Learning

D. Goldberg (1988)

10.1007/978-1-4612-2404-4_28

Learning Bayesian Networks Using Feature Selection

G. Provan (1995)

Estimating Probabilities: A Crucial Task in Machine Learning

B. Cestnik (1990)

Pattern recognition : a statistical approach

P. Devijver (1982)

10.1016/0004-3702(79)90003-1

The B* Tree Search Algorithm: A Best-First Proof Procedure

H. Berliner (1979)

Applied Linear Statistical Models

W Neter

Nearest neighbor (NN) norms: NN pattern classification techniques

B. V. Dasarathy (1991)

10.1145/356893.356898

Decision Trees and Diagrams

B. Moret (1982)

Learning with Many Irrelevant Features

Hussein Almuallim (1991)

Artificial Intelligence

R G H Kohavi (1997)

10.1016/b978-1-55860-335-6.50031-3

Efficient Algorithms for Minimizing Cross Validation Error

A. Moore (1994)

10.1109/TSMC.1977.4309803

On the Possible Orderings in the Measurement Selection Problem

T. Cover (1977)

10.1007/978-1-4612-0865-5_26

Probability Inequalities for sums of Bounded Random Variables

W. Hoeffding (1963)

Data mining using MLC++: A machine learning library in C++, in: Tools with Artificial Intelligence

D Kohavi (1996)

10.1109/TC.1977.1674939

A Branch and Bound Algorithm for Feature Subset Selection

P. M. Narendra (1977)

10.1006/inco.1995.1136

Boosting a weak learning algorithm by majority

Y. Freund (1990)

10.1016/0031-3203(93)90054-Z

A more efficient branch and bound algorithm for feature selection

Bing Yu (1993)

UCI Repository of Machine Learning Databases

C. Merz (1996)

10.1016/b978-1-55860-307-3.50010-1

Using Decision Trees to Improve Case-Based Learning

Claire Cardie (1993)

10.2307/2982683

Subset Selection in Regression

A. Atkinson (1992)

Feature Selection for Case-Based Classification of Cloud Types: An Empirical Comparison

D. Aha (1994)

Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier

Pedro M. Domingos (1996)

Feature Subset Selection as Search with Probabilistic Estimates

R. Kohavi (1994)

On the induction of decision trees for multiple concept learning

U. Fayyad (1992)

Genetic Programming: On the Programming of Computers by Means of Natural Selection

Koza (1992)

10.1016/c2009-0-27845-7

Essentials of Artificial Intelligence

M. Ginsberg (1993)

The MONK's Problems - A Performance Comparison of Different Learning Algorithms, CMU-CS-91-197, Sch

S. Thrun (1991)

10.2307/2981576

Selection of subsets of regression variables

A. J. Miller (1984)

10.1007/978-1-4612-2404-4_19

A Comparative Evaluation of Sequential Feature Selection Algorithms

D. Aha (1995)

10.21236/ada292575

Selection of Relevant Features in Machine Learning

P. Langley (1994)

Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology

R. Kohavi (1995)

10.1162/neco.1992.4.1.1

Neural Networks and the Bias/Variance Dilemma

S. Geman (1992)

The Feature Selection Problem: Traditional Methods and a New Algorithm

K. Kira (1992)

10.7551/mitpress/11723.003.0006

Artificial Intelligence

N. Nilsson (1974)

UCI Repository of machine learning databases

Catherine Blake (1998)

10.1016/S0169-7161(82)02038-0

35 Use of distance measures, information measures and error bounds in feature evaluation

M. Ben-Bassat (1982)

Hybrid Learning Using Genetic Algorithms and Decision Trees for Pattern Classification

J. Bala (1995)

On the Connection between In-sample Testing and Generalization Error

D. Wolpert (1992)

10.1007/BF00058655

Bagging predictors

L. Breiman (1996)

Lookahead and Pathology in Decision Tree Induction

S. Murthy (1995)

10.1016/0004-3702(89)90046-5

Models of Incremental Concept Formation

J. Gennari (1989)

Classification and Regression Trees

L Breiman (1984)

10.1007/978-1-4612-2404-4_23

Searching for Dependencies in Bayesian Classifiers

M. Pazzani (1995)

10.1007/BF01889584

Learning classification trees

Wray L. Buntine (1992)

10.1145/219717.219791

Rough Sets

Z. Pawlak (1995)

10.2307/2283422

The Estimation of Probabilities: An Essay on Modern Bayesian Methods

Bruce M. Hill (1965)

Learning classification trees, Statist. and Comput

W Buntine (1992)

10.1016/b978-1-55860-377-6.50045-1

Automatic Parameter Selection by Minimizing Estimated Error

R. Kohavi (1995)

10.1016/b978-1-55860-377-6.50032-3

Supervised and Unsupervised Discretization of Continuous Features

J. Dougherty (1995)

10.1016/b978-1-55860-377-6.50036-0

A Quantitative Study of Hypothesis Selection

Philip W. L. Fong (1995)

Policies for the selection of bias in inductive machine learning

J. Provost (1992)

Artificial Intelligence

R Kohavi (1997)

Readings in Machine Learning

J. Shavlik (1991)

10.1016/0020-0190(76)90095-8

Constructing Optimal Binary Decision Trees is NP-Complete

L. Hyafil (1976)

Wrappers for Performance Enhancements and Oblivious Decision Graphs.

R. Kohavi (1995)

10.1109/ROBOT.1991.131713

Using locally weighted regression for robot learning

C. Atkeson (1991)

Perceptrons - an introduction to computational geometry

M. Minsky (1969)

10.1109/TSMC.1987.4309045

Uncertainty in Artificial Intelligence

L. Kanal (1987)

10.1109/TAI.1993.633981

Robust feature selection algorithms

H. Vafaie (1993)

Robust feature selection algorithms, in Proceedings 5th International Conference on Tools with Artificial Intelligence (IEEE Computer

H. Vafaie (1993)

10.2307/2286028

Pattern classification and scene analysis

R. Duda (1973)

Automated model selection, in: ECML Workshop on Knowledge Level Modeling and Machine Learning

D. Mladenic (1995)

10.7551/mitpress/4168.001.0001

Learning in embedded systems

L. Kaelbling (1993)

10.1007/3-540-59119-2_166

A decision-theoretic generalization of on-line learning and an application to boosting

Y. Freund (1995)

A distance-based attribute selection measure for decision tree induction, Machine Learning

R. L. De Mantaras (1991)

10.1016/0885-064X(88)90019-2

On the complexity of loading shallow neural networks

J. S. Judd (1988)

10.1016/S0893-6080(05)80010-3

Training a 3-node neural network is NP-complete

A. Blum (1992)

10.1016/b978-1-55860-335-6.50046-5

An Improved Algorithm for Incremental Induction of Decision Trees

P. Utgoff (1994)

10.7551/MITPRESS/1090.001.0001

Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence

J. Holland (1992)

10.1007/3-540-56602-3_138

Feature Selection Using Rough Sets Theory

M. Modrzejewski (1993)

Improving regression estimation: Averaging methods for variance reduction with extensions to general convex measure optimization

M. Perrone (1993)

Essentials of Artificial Intelligence

M L Ginsberg (1993)

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection

R. Kohavi (1995)

Useful Feature Subsets and Rough Set Reducts

R. Kohavi (1994)

10.1016/0020-7373(92)90018-G

Tolerating Noisy, Irrelevant and Novel Attributes in Instance-Based Learning Algorithms

D. Aha (1992)

10.1037/H0042519

The perceptron: a probabilistic model for information storage and organization in the brain.

F. Rosenblatt (1958)

Oversearching and Layered Search in Empirical Learning

J. Quinlan (1995)

10.1007/978-94-015-7744-1

Simulated Annealing: Theory and Applications

P. V. Laarhoven (1987)

10.1016/B978-0-444-88650-7.50030-5

Multiple decision trees

Suk Wah Kwok (1988)

On the difficulty of finding small consistent decision trees, Unpublished manuscript

I T R Hancock (1989)

10.1109/TIT.1963.1057810

On the effectiveness of receptors in recognition systems

T. Marill (1963)

10.1016/S0893-6080(05)80023-1

Stacked generalization

D. Wolpert (1992)

10.2307/1267380

Some comments on C_p

C. L. Mallows (1973)

Perceptrons: an Introduction to Computational Geometry

M L Minsky

Proceedings 12th International Conference on Machine Learning

(1995)

10.1016/B978-1-55860-332-5.50055-9

Induction of Selective Bayesian Classifiers

P. Langley (1994)

Bias Plus Variance Decomposition for Zero-One Loss Functions

R. Kohavi (1996)

10.1016/b978-1-55860-335-6.50023-4

Irrelevant Features and the Subset Selection Problem

George H. John (1994)

10.1006/inco.1994.1009

The Weighted Majority Algorithm

N. Littlestone (1994)

On automatic feature selection, Internat

W. Siedlecki (1988)

This paper is referenced by

10.1007/3-540-45164-1_40

Dynamic Feature Selection in Incremental Hierarchical Clustering

L. Talavera (2000)

MUTUAL INFORMATION BASED FEATURE SELECTION FOR ACOUSTIC AUTISM DIAGNOSIS

F. Gürgen (2015)

10.1109/COMPSAC.2014.66

FECAR: A Feature Selection Framework for Software Defect Prediction

S. Liu (2014)

10.1093/bioinformatics/btl352

Mutagenic probability estimation of chemical compounds by a novel molecular electrophilicity vector and support vector machine

M. Zheng (2006)

10.1007/978-3-540-69939-2_26

Feature Ranking Ensembles for Facial Action Unit Classification

T. Windeatt (2008)

Genetic Algorithms for Selection and Partitioning of Attributes in Large-Scale Data Mining Problems

W. Hsu (1999)

10.1007/978-3-642-01307-2_101

Similarity-Based Feature Selection for Learning from Examples with Continuous Values

Yun Li (2009)

10.1109/ICMLC.2009.5212426

A novel Multi-surface Proximal Support Vector Machine Classification model incorporating feature selection

Ming Yang (2009)

Patient-Centered Development and Evaluation of a Mobile Wound Tracking Tool

Patrick C. Sanger (2015)

10.1016/j.neucom.2008.10.003

Simultaneous input variable and basis function selection for RBF networks

J. Tikka (2009)

Business Intelligence: Data Mining and Optimization for Decision Making

C. Vercellis (2009)

10.1016/B978-044452701-1.00075-2

3.05 – Variable Selection

R. Galvao (2009)

Dataset Selection for Aggregate Model Implementation in Predictive Data Mining

Patricia E. N. Lutu (2010)

10.1109/TFUZZ.2008.917291

Takagi–Sugeno–Kang Fuzzy Classifiers for a Special Class of Time-Varying Systems

R. Mikut (2008)

10.1109/SSCI.2016.7850126

Dimension reduction in classification using particle swarm optimisation and statistical variable grouping information

Bing Xue (2016)

Double Relief with progressive weighting function

Gabriel Prat-Masramon (2015)

10.1016/J.SAB.2008.08.016

Artificial neural network for Cu quantitative determination in soil using a portable Laser Induced Breakdown Spectroscopy system

E. C. Ferreira (2008)

10.1109/ITST.2008.4740302

Detection and classification of moving Thai vehicles based on traffic engineering knowledge

A. Leelasantitham (2008)

10.1145/1414004.1414040

A constrained regression technique for COCOMO calibration

Vu Nguyen (2008)

10.1109/SYNASC.2009.18

Enhanced Rule-Based Phonetic Transcription for the Romanian Language

Mihai Alexandru Ordean (2009)

Feature Subset Selection approach based on Maximizing Margin of Support Vector Classifier

Khin May Win (2008)

10.21936/SI2014_V35.N4.705

USING TABU SEARCH FOR FEATURE SELECTION IN DISCRIMINANT ANALYSIS

K. Stapor (2014)

Modeling Movement Disorders in Parkinson's Disease using Computational Intelligence

S. Lacy (2015)

10.1371/journal.pone.0066592

Multivariate Analysis of Dopaminergic Gene Variants as Risk Factors of Heroin Dependence

A. Vereczkei (2013)

10.1016/J.IJEPES.2014.05.036

A new short-term load forecast method based on neuro-evolutionary algorithm and chaotic feature selection

S. Kouhi (2014)

10.1161/CIRCIMAGING.115.004330

Cognitive Machine-Learning Algorithm for Cardiac Imaging: A Pilot Study for Differentiating Constrictive Pericarditis From Restrictive Cardiomyopathy

P. Sengupta (2016)

10.1504/IJDMB.2013.056617

Predictability of intracranial pressure level in traumatic brain injury: features extraction, statistical analysis and machine learning-based evaluation

Wenan Chen (2013)

10.1016/j.asoc.2013.03.021

Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients

S. Vieira (2013)

10.1016/j.isatra.2012.12.005

Feature subset selection using constrained binary/integer biogeography-based optimization.

S. Yazdani (2013)

10.1186/1471-2105-12-375

Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

P. Shi (2010)

10.1186/1471-2164-16-S5-S3

Analyse multiple disease subtypes and build associated gene networks using genome-wide expression profiles

S. Aibar (2015)

10.1088/1361-6560/ab326a

Predicting Lung Nodule Malignancies by Combining Deep Convolutional Neural Network and Handcrafted Features

S. Li (2019)
