Talk:Cross-validation (statistics)/Archive 1

Archive 1

Can the title figure not be in Spanish

This is the English Wikipedia. Can someone find an English version of the title figure? I also don't think the figure is very communicative. Also, the figure is for k=4, so I don't understand why there are dots between k=3 and k=4. — Preceding unsigned comment added by 128.3.19.169 (talk) 21:21, 18 July 2016 (UTC)

Inefficiency of Leave-One-Out Cross-Validation (LOOCV)

This lecture, which discusses (among other things) LOOCV in the context of kernel methods, seems to imply that LOOCV is actually very fast to evaluate compared to other cross-validation methods, since only a term that depends on a single sample needs to be recalculated. The link:

http://videolectures.net/mlss08au_smola_ksvm/ Click "Part 2", then select slide "23:40 Selecting the Kernel Width".

I'm posting here since there may be some subtleties I'm not aware of and this article has a broader context than just kernel methods. Jotaf (talk) 00:02, 13 April 2011 (UTC)
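For readers curious about the shortcut being alluded to: for linear smoothers such as ordinary least squares, the exact leave-one-out residuals can be computed from a single fit using the hat-matrix diagonal, and a closely related identity gives fast LOOCV for (kernel) ridge regression. A minimal sketch for the least-squares case, on simulated placeholder data:

```python
# Minimal sketch (simulated placeholder data): exact leave-one-out residuals for
# least squares from a single fit, via e_i / (1 - h_ii), where h_ii is the i-th
# diagonal entry of the hat matrix. No model is refit n times.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
h_diag = np.einsum("ij,ji->i", X, np.linalg.pinv(X))          # diag of X (X'X)^{-1} X'
loo_residuals = residuals / (1 - h_diag)                      # exact LOO residuals
print("LOOCV mean squared error:", np.mean(loo_residuals ** 2))
```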

Section K-Cross-Validation

"10-fold cross-validation is commonly used" A citation and an explanation should be made here. —Preceding unsigned comment added by 128.186.232.93 (talk) 20:32, 11 April 2009 (UTC)

Out of sample testing

Finally!! I've been searching for this subject on Wikipedia for weeks, but I thought the concept was called "Out of sample testing". I'll figure out how to make that term link to this article as well. I took a really roundabout route to find this article, and others may have the same problem. capitalist 08:23, 19 October 2005 (UTC)

Section "Using a validation set" - confuses the matter

Cross-validation is an alternative to a validation set, but the section "Using a validation set" gives the mistaken impression that using a validation set is some special way to use cross-validation. Of course, there can be many levels of validation, but in exactly the same way there can be many levels of cross-validation. For example, you can have a training set to learn a model with a set of parameters, a validation set to optimize the parameters, and a test set to estimate the performance of the model trained with the optimized parameters. Just as well, you can have level-1 cross-validation to learn a model and optimize parameters, and level-2 cross-validation to estimate the performance of the model with the chosen parameters. But the section confuses the matter, and I don't think it adds anything to the topic of the article. If so, it should be removed. -Pgan002 (talk) 08:46, 30 June 2008 (UTC)
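A brief sketch of the two levels described above, assuming scikit-learn is available (the model, the parameter grid, and the data set are placeholders of mine): an inner cross-validation loop chooses the tuning parameter, and an outer loop estimates the performance of the whole tuned procedure.

```python
# Nested ("two-level") cross-validation sketch: inner CV selects a parameter,
# outer CV estimates the performance of the tuned model. All choices below
# (SVC, the grid for C, the Iris data) are illustrative placeholders.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

inner = KFold(n_splits=5, shuffle=True, random_state=0)   # level-1: parameter selection
outer = KFold(n_splits=5, shuffle=True, random_state=1)   # level-2: performance estimation

tuned_model = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner)
outer_scores = cross_val_score(tuned_model, X, y, cv=outer)
print("estimated accuracy of the tuned model:", outer_scores.mean())
```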

Agreed. I'm not familiar with validation sets, but if they're an important topic, they should have their own page. The material doesn't seem to belong here. --Zaqrfv (talk) 23:15, 12 August 2008 (UTC)

I think the article merely needs to explain the terminology in use better. One common classification is into "internal" and "external" validation. In the external case, there is indeed a separate "validation set" kept apart from the "training set". The article currently claims that "external" is out of the scope of "cross validation". It is doubtful whether "validation set" is worthy of an independent article. Shyamal (talk) 12:02, 13 August 2008 (UTC)

Some really bad math notation style

Please: Wikipedia:Manual of Style really does exist.

I found the following in this article:

(n-p-1)/n<1.

Of course I changed it to this:

(n − p − 1)/n < 1

Note:

  • It is incorrect to include digits, such as 1, within these sorts of italics.
  • It is incorrect to include punctuation, such as these round parentheses, within these sorts of italics.
  • A minus sign is not a stubby little hyphen.
  • Spacing precedes and follows the minus sign.
  • Spacing precedes and follows the "less-than" symbol.
(I've made the spaces non-breakable; i.e. they won't allow line-breaks when the browser window size and shape are altered.)

I also found a bunch of other similar things and fixed those; I'm not sure I got everything.

All this is covered in the style manual cited above. Michael Hardy (talk) 18:25, 7 March 2009 (UTC)

Comments on latest edits

I've made some major edits to this page over the last few days. If anyone has comments please leave them here.

I would like to remove the second paragraph in the introduction about testing hypotheses suggested by data, etc. This seems barely relevant. But I will hold off for a while to see if anyone objects. Skbkekas (talk) 20:01, 7 March 2009 (UTC)

The stuff in the introduction about testing hypotheses suggested by data seems OK to me. It explains what cross-validation is used for, and why it is important, a great deal better than the first paragraph of the intro. Remember, the intro is meant to say why the topic is important. Melcombe (talk) 18:06, 10 March 2009 (UTC)

Expected value of MSE

The statement "...the expected value of the MSE for the training set is (n - p - 1) / n < 1 times the expected value of the MSE for the validation set" is incorrect. Presumably the "mild assumptions" are the usual ones: conditionally on the X's, the Y's are uncorrelated with common variance σ2. In this case, the expected value of the MSE for the training set is (n - p - 1) σ2 / n, and the expected value of the MSE for the validation set is (n + p + 1) σ2 / n.[citation needed] The ratio is therefore (n - p - 1) / (n + p + 1), which is also < 1, but more so.

The later discussion is, of course, still correct: MSE for the training set gives an optimistically biased assessment of fit, and of performance in predicting new data sets. Primrose61 (talk) 19:08, 7 April 2009 (UTC)
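For anyone who wants to check the corrected expectations numerically, here is a small simulation sketch (the sample size, number of regressors, and the interpretation of the "validation set" as fresh responses at the same design points are assumptions of mine):

```python
# Monte Carlo check of E[MSE_train] = (n - p - 1) * sigma^2 / n and
# E[MSE_valid] = (n + p + 1) * sigma^2 / n for least squares with an intercept
# and p regressors, where the validation responses are new observations taken
# at the same design points as the training data (an assumption of this sketch).
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2, reps = 100, 5, 1.0, 5000

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # fixed design
beta = rng.normal(size=p + 1)

mse_train = mse_valid = 0.0
for _ in range(reps):
    y_train = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    y_valid = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    mse_train += np.mean((y_train - X @ beta_hat) ** 2)
    mse_valid += np.mean((y_valid - X @ beta_hat) ** 2)

print("MSE_train:", mse_train / reps, "  theory:", (n - p - 1) / n * sigma2)
print("MSE_valid:", mse_valid / reps, "  theory:", (n + p + 1) / n * sigma2)
```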

Thanks for pointing this out, I have fixed it. Skbkekas (talk) 00:54, 9 April 2009 (UTC)

Could somebody please support this with a reference? (Preferably something that is available online) Veastix (talk) 10:38, 23 June 2010 (UTC)

I'd much appreciate a reference too... I wonder also whether the assumptions (which I think should be stated explicitly) can really be described as "mild", and consequently whether cross-validation really "is not practically useful" for linear regression. Anecdotal, I concede, but on the few occasions I've used cross-validation in anger on linear regressions, the "mild assumptions" have been violated. Lionelbuk 13:59, 26 August 2010 (UTC)

Uses outside of statistics

I believe this article was intended to discuss cross-validation in predictive modeling (statistics and machine learning). Apparently there are some other completely distinct uses of the term. Some text was recently added to the introductory section about cross-validation in analytical chemistry, and the following link defines its meaning in psychology: [[1]]. Since these are completely different topics, I think it confuses things to mention them here. Any suggestions? Should we rename this page "Cross-validation (statistics)" and add a disambiguation page? Skbkekas (talk) 22:27, 10 April 2009 (UTC)

Advantage of 2-fold cross-validation

In the homework for my Machine Learning course, my professor asked for the advantage of 2-fold cross-validation compared to k-fold cross-validation with other values of k. I copied and pasted the statement from Wikipedia, but my professor tells me the statement is incorrect. So I changed the statement here: https://en.wikipedia.org/w/index.php?title=Cross-validation_%28statistics%29&diff=576230503&oldid=566306656

Feel free to correct me if I'm wrong. Yegle (talk) 02:23, 8 October 2013 (UTC)

Cross-validation is vulnerable to a small amount of type-II error in its measurement. When the training folds are artificially biased in one direction (due to the random shuffling of the data), the test folds are generally biased in the opposite direction (because every pattern falls into either the training or the test folds). This causes cross-validation to slightly under-estimate accuracy. The effect is particularly noticeable in small datasets. 2-fold cross-validation has the lowest type-II error of any type of cross-validation. It is also significant to note that performing multiple repetitions does not correct for this kind of error. Consequently, it is generally advisable to perform multiple repetitions of 2-fold cross-validation rather than to perform n-fold cross-validation with another value of n.
The well-known Iris dataset provides a simple demonstration of this behavior. It has exactly 3 classes that are perfectly balanced. If you use a learning algorithm that always predicts the most common class in the training data, it should obtain a predictive accuracy of exactly 33.333%. Cross-validation, however, consistently measures a predictive accuracy lower than 33.333% on this dataset (due mostly to the effect I described). 2-fold cross-validation gives the closest measurement to the ideal result on average. Unfortunately, 2-fold cross-validation is more volatile than n-fold cross-validation with other values of n, so it is typically necessary to perform multiple repetitions of 2-fold CV.--Laughsinthestocks (talk) 15:36, 8 October 2013 (UTC)
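A minimal sketch of the experiment described above, assuming scikit-learn is available (the repetition count and the use of DummyClassifier are my own choices); it measures the cross-validated accuracy of a most-frequent-class predictor on Iris for a few values of k:

```python
# Majority-class ("most frequent") predictor on Iris: its true accuracy is 1/3,
# but k-fold cross-validation with shuffled, unstratified folds tends to measure
# slightly less, as discussed above. Repetitions are averaged for stability.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = DummyClassifier(strategy="most_frequent")

for k in (2, 5, 10):
    reps = []
    for rep in range(100):
        cv = KFold(n_splits=k, shuffle=True, random_state=rep)
        reps.append(cross_val_score(clf, X, y, cv=cv).mean())
    print(f"{k}-fold CV accuracy, averaged over 100 repetitions: {np.mean(reps):.4f}")
```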

(Ctrl)F does not work for k-fold

I was looking for the phrase "k-fold", but could not find it via the normal find on the page. Could it be due to the double quotes around k?

In other words, you can find:

  • k
  • -fold

but not

  • k-fold

Lindner wf (talk) 21:36, 21 April 2014 (UTC)Lindner_wf

That sounds like a feature (or bug) of the browser. I tried k-fold without quotes and it works fine on Chrome. Shyamal (talk) 06:37, 22 April 2014 (UTC)

Is this logical?

I'm afraid I don't get the logic of this statement in the intro: "Furthermore, one of the main reasons for using cross-validation instead of using the conventional validation (e.g. partitioning the data set into two sets of 70% for training and 30% for test) is that the error (e.g. Root Mean Square Error) on the training set in the conventional validation is not a useful estimator of model performance and thus the error on the test data set does not properly represent the assessment of model performance."

I understand the point about the error on the training set not being useful, but how does that translate into the part after "thus"?

Scottedwards2000 (talk) 01:31, 10 January 2016 (UTC)

Last sentence and description of "true validation" are misleading

"It should be noted that some statisticians have questioned the usefulness of validation samples.[17]"

Ref 17, Hirsch 1991 Biometrics, refers specifically to the case where a single data set is divided into "training" and "validation" sets. In fact, Hirsch is careful to make a distinction between this practice and external validation, in which the validation set is an independently procured data set. External validation is acceptable, while internal validation is criticized. As currently presented, this sentence in Wikipedia does not distinguish between the two.

Actually, the Hirsch letter is a direct criticism of the "Relationship to other forms of validation" section in this article. What's being described here as "true validation" or "holdout validation" is actually the internal validation being criticized by Hirsch. As Hirsch points out, internal validation sets are subject to the same biases as the training set and thus do not allow one to estimate how well a model generalizes. We should not be advocating this practice by calling it "true" validation, because it is not. 143.117.149.187 (talk) 12:58, 3 May 2016 (UTC)

I've gone ahead and made some edits that I feel address my comments above. Use the article history to see the changes. 143.117.149.187 (talk) 13:12, 3 May 2016 (UTC)


Holdout method ≠ 2-fold cross-validation

I understand that there are some conflicting definitions of the holdout method across sources. However, I believe that stating that 2-fold cross-validation = the holdout method is imprecise. Generally, the holdout method splits the dataset into two disjoint sets. There are no specific rules that say how large the sets are, i.e., where the splitting point is. In 2-fold cross-validation, the data are split into *equal-sized* disjoint sets. Therefore, 2-fold cross-validation is a *special instance* of the holdout method. [1] [2] [3] Thrau (talk) 20:14, 2 September 2016 (UTC)
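To make the distinction concrete, here is a small illustrative sketch (the data set, model, and 70/30 proportion are placeholders of mine): a holdout split trains and tests once on one partition of arbitrary proportions, whereas 2-fold cross-validation uses two equal halves, each serving once as the test set, and averages the two scores.

```python
# Holdout split vs. 2-fold cross-validation, using placeholder data and model.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Holdout: a single train/test split; the 70/30 proportion is a free choice.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_score = model.fit(X_tr, y_tr).score(X_te, y_te)

# 2-fold CV: two equal halves, each used once for testing; scores are averaged.
fold_scores = []
for train_idx, test_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print("holdout (70/30) accuracy:", holdout_score)
print("2-fold CV accuracy (average of two folds):", np.mean(fold_scores))
```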

References

Paragraphs 2 and 3 seem out of place/incorrect

"One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds." -This seems to describe K-fold cross-validation only, not cross-validation as a whole.

No, it applies to more than K-fold; for example, it also applies to leave-one-out. K-fold just means equal-size partitions. Rolf H Nelson (talk) 21:07, 7 January 2017 (UTC)
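A tiny sketch (the six-point sample is my own placeholder) showing that both k-fold and leave-one-out fit the quoted description: each round partitions the sample into complementary training and validation subsets, and leave-one-out is simply the case where every validation subset has size one.

```python
# Enumerate the partitions produced by 3-fold CV and by leave-one-out on a
# placeholder sample of six observations.
from sklearn.model_selection import KFold, LeaveOneOut

sample = list(range(6))  # six hypothetical observations, indexed 0..5

print("3-fold partitions:")
for train, test in KFold(n_splits=3).split(sample):
    print("  train:", list(train), " validation:", list(test))

print("leave-one-out partitions:")
for train, test in LeaveOneOut().split(sample):
    print("  train:", list(train), " validation:", list(test))
```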

"One of the main reasons for using cross-validation instead of using the conventional validation (e.g. partitioning the data set into two sets of 70% for training and 30% for test) is that there is not enough data available to partition it into separate training and test sets without losing significant modeling or testing capability. In these cases, a fair way to properly estimate model prediction performance is to use cross-validation as a powerful general technique.[5]" -This description applies more specifically to K-fold cross-validation. I think "conventional validation" is actually just generic cross-validation, judging by the way the example is given. Here are a couple links: https://www.cs.cmu.edu/~schneide/tut5/node42.html http://robjhyndman.com/hyndsight/crossvalidation/ http://scikit-learn.org/stable/modules/cross_validation.html (machine learning resource, not just statistics, take with a grain of salt)

Again, nothing there is specific to K-fold, which just means equal-size partitions. As for the second part, you're correct, the current text is confusing, since "conventional validation" doesn't really mean anything and "partitioning the data set into two sets of 70% for training and 30% for test" can indeed be cross-validation terminology; feel free to be WP:BOLD and rewrite or delete that part. If you fix it, source it to something like a journal paper or a textbook, rather than an online tutorial. Rolf H Nelson (talk) 21:07, 7 January 2017 (UTC)

I didn't want to make wholesale changes to the intro section without others getting a chance to comment so I haven't changed anything.

Sunday funday (talk) 10:03, 6 January 2017 (UTC)Sunday funday

Added suggestion on Stationary Bootstrap

Added a suggestion to use the Politis and Romano stationary bootstrap for cross-validating time series. bayesianlogic.1@gmail.com -- Empirical bayesian, 22:45, 28 January 2019
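For context, a rough self-contained sketch of the Politis–Romano stationary bootstrap resampling step (all parameter choices and the example series are mine, not from the edit being described); each resample could serve as one replicate when assessing a time-series model's prediction error:

```python
# Stationary bootstrap (Politis & Romano, 1994): resample a series in blocks
# whose lengths are geometrically distributed with mean 1/p, wrapping around
# the end of the series so every index can start or continue a block.
import numpy as np

def stationary_bootstrap(series, p=0.1, rng=None):
    """Return one stationary-bootstrap resample of `series` (expected block length 1/p)."""
    rng = np.random.default_rng() if rng is None else rng
    series = np.asarray(series)
    n = len(series)
    out = np.empty(n, dtype=series.dtype)
    pos = rng.integers(n)                # random start of the first block
    for i in range(n):
        out[i] = series[pos % n]         # circular wrap-around
        if rng.random() < p:             # with probability p, start a new block...
            pos = rng.integers(n)
        else:                            # ...otherwise continue the current block
            pos += 1
    return out

# Example: ten values from one resample of a short simulated series.
x = np.cumsum(np.random.default_rng(1).normal(size=50))
print(stationary_bootstrap(x, p=0.2, rng=np.random.default_rng(2))[:10])
```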