
Overfitting and Undercomputing in Machine Learning

Thomas G. Dietterich
Published 1995 · Computer Science

A central problem in machine learning is supervised learning—that is, learning from labeled training data. For example, a learning system for medical diagnosis might be trained with examples of patients whose case records (medical tests, clinical observations) and diagnoses were known. The task of the learning system is to infer a function that predicts the diagnosis of a patient from his or her case records. The function to be learned might be represented as a set of rules, a decision tree, a Bayes network, or a neural network. Learning algorithms essentially operate by searching some space of functions (usually called the hypothesis class) for a function that fits the given data. Because there are usually exponentially many functions, this search cannot actually examine individual hypothesis functions but instead must use some more direct method of constructing the hypothesis functions from the data. This search can usually be formalized by defining an objective function (e.g., the number of data points predicted incorrectly) and applying various algorithms to find a function that minimizes this objective function. Unfortunately, this minimization problem is usually NP-hard. For example, fitting the weights of a neural network and finding the smallest decision tree are both NP-complete problems [Blum and Rivest 1989; Quinlan and Rivest 1989]. Hence, heuristic algorithms such as gradient descent (for neural networks) and greedy search (for decision trees) have been applied with great success. Of course, the suboptimality of such heuristic algorithms immediately suggests a reasonable line of research: find algorithms that can search the hypothesis class better. Hence, there has been extensive research in applying second-order methods to fit neural networks and in conducting much more thorough searches in learning decision trees and rule sets.
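The search-as-optimization view described above can be illustrated with a minimal sketch (illustrative only, not from the paper): here the hypothesis class is linear functions, the objective is mean squared training error, and plain gradient descent plays the role of the heuristic search.

```python
import numpy as np

# Hypothesis class: linear functions h(x) = w @ x.
# Objective: mean squared error on the labeled training data.
# Heuristic search: gradient descent over the weight vector w.

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # 50 labeled training examples
true_w = np.array([1.0, -2.0, 0.5])    # the function generating the labels
y = X @ true_w                          # noiseless labels, for illustration

w = np.zeros(3)                         # start from an arbitrary hypothesis
lr = 0.05
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                           # step toward a lower objective value

print(np.round(w, 3))   # recovers a weight vector close to true_w
```

With clean data and a convex objective, gradient descent drives the training error essentially to zero; the tension the paper describes only appears once the labels are noisy, as the next paragraph explains.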
Ironically, when these algorithms were tested on real datasets, it was found that their performance was often worse than simple gradient descent or greedy search [Quinlan and Cameron-Jones 1995; Weigend 1994]. In short: it appears to be better not to optimize!

One of the other important trends in machine-learning research has been the establishment and nurturing of connections between various previously disparate fields, including computational learning theory, connectionist learning, symbolic learning, and statistics. The connection to statistics was crucial in resolving this paradox. The key problem arises from the structure of the machine-learning task. A learning algorithm is trained on a set of training data, but then it is applied to make predictions on new data points. The goal is to maximize its predictive accuracy on the new data points—not necessarily its accuracy on the training data. Indeed, if we work too hard to find the very best fit to the training data, there is a risk that we will fit the noise in the data by memorizing various peculiarities of the training data.
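The overfitting risk described above can be sketched numerically (an illustrative example, not from the paper): as a polynomial fit is given more capacity, its training error keeps falling, while its error on fresh points from the same underlying function typically bottoms out at a moderate capacity and then worsens.

```python
import numpy as np

# Fit polynomials of increasing degree to noisy samples of a known function,
# then compare error on the training points against error on fresh points.

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 15)
x_test = np.linspace(-0.98, 0.98, 50)

def f(x):
    return np.sin(np.pi * x)                     # true underlying function

y_train = f(x_train) + rng.normal(0, 0.2, 15)    # labels corrupted by noise
y_test = f(x_test)                               # clean targets for evaluation

def mse(p, x, y):
    return np.mean((np.polyval(p, x) - y) ** 2)

for degree in (1, 3, 9):
    p = np.polyfit(x_train, y_train, degree)     # least-squares fit at this capacity
    print(degree,
          round(mse(p, x_train, y_train), 4),    # training error: falls with degree
          round(mse(p, x_test, y_test), 4))      # test error: typically U-shaped
```

Because the models are nested, the training error can only decrease as the degree grows; the degree-9 fit chases the noise in the 15 labels, which is exactly the "memorizing peculiarities" failure the text warns about.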