
Overfitting And Undercomputing In Machine Learning

Thomas G. Dietterich
Published 1995 · Computer Science

A central problem in machine learning is supervised learning, that is, learning from labeled training data. For example, a learning system for medical diagnosis might be trained with examples of patients whose case records (medical tests, clinical observations) and diagnoses were known. The task of the learning system is to infer a function that predicts the diagnosis of a patient from his or her case records. The function to be learned might be represented as a set of rules, a decision tree, a Bayes network, or a neural network. Learning algorithms essentially operate by searching some space of functions (usually called the hypothesis class) for a function that fits the given data. Because there are usually exponentially many functions, this search cannot actually examine individual hypothesis functions but instead must use some more direct method of constructing the hypothesis functions from the data. This search can usually be formalized by defining an objective function (e.g., the number of data points predicted incorrectly) and applying various algorithms to find a function that minimizes it. Unfortunately, this optimization problem is generally NP-hard. For example, fitting the weights of a neural network and finding the smallest decision tree are both NP-complete problems [Blum and Rivest, 1989; Quinlan and Rivest, 1989]. Hence, heuristic algorithms such as gradient descent (for neural networks) and greedy search (for decision trees) have been applied with great success. Of course, the suboptimality of such heuristic algorithms immediately suggests a reasonable line of research: find algorithms that can search the hypothesis class better. Hence, there has been extensive research in applying second-order methods to fit neural networks and in conducting much more thorough searches in learning decision trees and rule sets.
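The search-as-optimization view described above can be made concrete with a minimal sketch (not from the paper): gradient descent as a heuristic search over a hypothesis class of linear functions h(x) = w*x + b, minimizing a training objective (here mean squared error rather than misclassification count, since squared error is differentiable).

```python
# Illustrative sketch: gradient descent as heuristic search over the
# hypothesis class of linear functions h(x) = w*x + b, minimizing the
# training objective (mean squared error on the training set).

def gradient_descent(xs, ys, lr=0.01, steps=2000):
    """Fit h(x) = w*x + b to (xs, ys) by gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Training data generated by the target function y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(round(w, 2), round(b, 2))  # approaches 2.0 1.0
```

The same structure carries over to neural networks: only the hypothesis class and the gradient computation change, while the objective-minimization loop stays the same.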
Ironically, when these algorithms were tested on real datasets, it was found that their performance was often worse than simple gradient descent or greedy search [Quinlan and Cameron-Jones, 1995; Weigend, 1994]. In short: it appears to be better not to optimize! One of the other important trends in machine-learning research has been the establishment and nurturing of connections between various previously disparate fields, including computational learning theory, connectionist learning, symbolic learning, and statistics. The connection to statistics was crucial in resolving this paradox. The key problem arises from the structure of the machine-learning task. A learning algorithm is trained on a set of training data, but then it is applied to make predictions on new data points. The goal is to maximize its predictive accuracy on the new data points, not necessarily its accuracy on the training data. Indeed, if we work too hard to find the very best fit to the training data, there is a risk that we will fit the noise in the data by memorizing various peculiarities of the training data rather than finding a general predictive rule.
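The overfitting phenomenon described above can be demonstrated with a small sketch (not from the paper, and using a 1-nearest-neighbour memorizer rather than the paper's decision-tree or neural-network settings): a hypothesis that fits noisy training labels perfectly predicts worse on fresh data than a simpler rule that tolerates some training error.

```python
# Illustrative sketch: perfectly fitting noisy training data hurts accuracy
# on new data. A 1-nearest-neighbour "memorizer" (zero training error) is
# compared against a simple threshold rule on a 1-D binary task where 20%
# of the training labels have been flipped by noise.
import random

random.seed(0)  # deterministic noise for reproducibility

def truth(x):
    # The true (noise-free) concept: label 1 iff x > 0.5.
    return int(x > 0.5)

# Noisy training set: each label is flipped with probability 0.2.
train = [(x, truth(x) ^ (random.random() < 0.2))
         for x in (random.random() for _ in range(40))]
# Clean test set drawn from the same input distribution.
test = [(x, truth(x)) for x in (random.random() for _ in range(1000))]

def memorizer(x):
    # 1-nearest-neighbour: predicts the label of the closest training point,
    # so it achieves 100% accuracy on the training set, noise included.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def simple_rule(x):
    # A simple hypothesis of the right form: threshold at 0.5.
    return int(x > 0.5)

def accuracy(h, data):
    return sum(h(x) == y for x, y in data) / len(data)

print("train accuracy, memorizer:  ", accuracy(memorizer, train))    # 1.0
print("test accuracy,  memorizer:  ", accuracy(memorizer, test))     # well below 1.0
print("test accuracy,  simple rule:", accuracy(simple_rule, test))   # 1.0
```

The memorizer wins on the training set by construction, yet each flipped training label carves out a region of the input space where it predicts wrongly on new points, which is exactly the "fitting the noise" failure the paragraph describes.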