Talk:Overfitting


Edit 18 Feb 2009

Regarding this recent edit, can something more specific be said about what is meant by cognitive strategy ... it seems a very specific term, so is there a sensible wikilink for it? Or is it not meant to be that precise an idea? Melcombe (talk) 14:19, 18 February 2009 (UTC)


Slidecast about overfitting

I have posted a video tutorial about overfitting. It is meant as a gentle introduction to the topic. However, the video is hosted on my company website, so I am letting the community decide whether it is appropriate. --Joannes Vermorel (talk) 13:04, 22 April 2009 (UTC)

Definition

This article still needs a good definition. Most books seem to refer to the concept without defining it. Has anyone come across any good definitions? —3mta3 (talk) 12:49, 23 May 2009 (UTC)

Updating the overfitting image

I'd like this page to be accessible to users who are new to statistics and to overfitting. The image with the red and blue lines generally does a good job of emphasising that the model's predictive performance actually gets worse with overfitting, but the fact that the red line trends downwards near the right edge of the image may lead newer users to think they can extrapolate the red line until it ends close to the blue line. Ghopcraft (talk) 01:16, 17 November 2009 (UTC)

Too many vs. too few degrees of freedom

The lede correctly says that "Overfitting generally occurs when a model is excessively complex". This occurs when there are too many explanatory variables. The number of degrees of freedom is the number of observations minus the number of explanatory variables. Therefore overfitting occurs when there are too few degrees of freedom. Also, the lede in paragraph 2 correctly mentions "an extreme example, if the number of parameters is the same as or greater than the number of observations". This is the case of zero or negative -- too few -- degrees of freedom. Therefore I'm correcting the lede again to say too few degrees of freedom. Duoduoduo (talk) 19:26, 18 November 2010 (UTC)

That's one interpretation. Another one is that it's the number of parameters of the model (see e.g. [1]) that are free to vary in order to fit the data. I propose that we should avoid this ambiguous term. What's wrong with simply saying "number of parameters"? -- X7q (talk) 20:38, 18 November 2010 (UTC)
Done. Duoduoduo (talk) 02:56, 19 November 2010 (UTC)
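The extreme case mentioned above (as many parameters as observations, i.e. zero residual degrees of freedom) is easy to sketch in Python. The sine curve, noise level, and polynomial degree below are my own illustrative assumptions, not anything from the article:

```python
import numpy as np

# Hypothetical data: 6 noisy observations of a sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 6)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)

# A degree-5 polynomial has 6 coefficients -- exactly as many
# parameters as observations, so the residual degrees of freedom
# are zero and the fit passes through every point: the model
# "memorizes" the data rather than smoothing over the noise.
coeffs = np.polyfit(x, y, deg=5)
train_error = np.max(np.abs(np.polyval(coeffs, x) - y))
print(train_error)  # effectively zero (machine precision)
```

With even one more observation than parameters, the fit could no longer interpolate every point and the residuals would start to reflect the noise.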

Bias towards machine learning

The introduction tends to describe overfitting from a machine-learning-centered perspective, as exemplified by the sentence "Overfitting occurs when a model begins to memorize training data rather than learning to generalize from trend." Overfitting is a very traditional topic in statistics, and a "model" that begins to "memorize training data" does not really fit into a general description of this very basic statistical term, in my opinion. I come from a more "traditional" statistics background and was at first confused by it. I think the first section should try to avoid this kind of machine-learning-centered view. --Fabian Flöck (talk) 20:56, 27 December 2012 (UTC)

Underfitting

Underfitting redirects to the Overfitting article. However, as far as I understand, underfitting refers to effects occurring because too few data sets are provided, so the data sets get learned by heart. Overfitting, on the other hand, occurs when the model becomes too complex and random error is introduced. So the one is not exactly the opposite of the other. I hope we can find someone who can explain it separately on the page. — Preceding unsigned comment added by 193.171.240.14 (talk) 11:37, 13 December 2013 (UTC)
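For what it's worth, the usual textbook contrast is that underfitting comes from a model that is too simple for the data and overfitting from one that is too complex. A minimal sketch of that contrast (the quadratic data, noise level, and polynomial degrees are my own assumptions for illustration):

```python
import numpy as np

# Hypothetical data: 20 noisy observations of a quadratic.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 20)
y = x**2 + rng.normal(0.0, 0.05, x.size)

def train_mse(degree):
    """Mean squared error on the training data for a polynomial fit."""
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

mse_underfit = train_mse(1)   # a line cannot capture the curvature
mse_true = train_mse(2)       # matches the true model
mse_overfit = train_mse(15)   # starts chasing the noise as well

# Because these are nested least-squares fits, training error can
# only decrease as complexity grows -- which is exactly why training
# error alone cannot detect overfitting.
assert mse_overfit <= mse_true <= mse_underfit
```

The overfit model looks best on the training data; the difference only shows up on held-out data, which is the point both phenomena are usually contrasted on.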

I hope to see more about overfitting in regression applications. Deng9578 (talk) 05:35, 31 January 2017 (UTC)

Problem with PDF generation

I don't know why, but trying to generate the PDF of this page leads to: "File not found The file you are trying to download does not exist: Maybe it has been deleted and needs to be regenerated." — Preceding unsigned comment added by Liar666 (talk · contribs) 09:33, 14 April 2017 (UTC)

Translations

Greek: σφαλματογόνος υπεραρμογή, σφαλματώδης υπεραρμογή (error-generating overfitting) — Preceding unsigned comment added by 2A02:587:4113:B100:CD22:F41:9DAE:E551 (talk) 00:22, 2 July 2017 (UTC)

Citations improvement severely needed: Uniformity in the literature

Apart from three references to published journals (Hawkins, Leinweber, Tetko), there are not enough references to the literature. Apparently the definition of overfitting is not uniform in the literature; see the recent discussion by Andrew Gelman. mcyp (talk) 23:36, 7 August 2017 (UTC)

Overtraining

This article mostly explains 'overtraining', not overfitting. We need to rewrite this. mcyp (talk) 23:38, 15 August 2017 (UTC)

Non-function model in introduction

[Image: diagram from the introduction]

I wonder why the introductory overfitting diagram shows two non-functions, assuming that these are indeed relations and that a y-axis input maps to a non-unique x-axis output. I base my remarks on Christian and Griffiths (2017: ch. 7),[1] who cite only statistical models captured as functions in their treatment of overfitting. If the introductory diagram is not intended to be a function-based model with unique mappings, then some further explanation is required. Moreover, the axes or dimensions should be indicated and/or described. (I recommend Christian and Griffiths as well.) HTH. RobbieIanMorrison (talk) 07:31, 2 May 2018 (UTC)

References

  1. ^ Christian, Brian; Griffiths, Tom (6 April 2017). "Chapter 7: Overfitting". Algorithms to live by: the computer science of human decisions. London, United Kingdom: William Collins. pp. 149–168. ISBN 978-0-00-754799-9.

Burnham and Anderson

I cannot find this quote in either the first or second edition of Burnham and Anderson: "Overfitted models … are often free of bias in the parameter estimators, but have estimated (and actual) sampling variances that are needlessly large (the precision of the estimators is poor, relative to what could have been accomplished with a more parsimonious model). False treatment effects tend to be identified, and false variables are included with overfitted models. … A best approximating model is achieved by properly balancing the errors of underfitting and overfitting." Could someone identify the correct source? BinaryPhoton (talk) 22:05, 4 January 2021 (UTC)