Wikipedia:Words per article

One of the metrics on the Wikipedia:Size comparisons page is the number of words per article. Some Wikipedians anticipate that the rate of new article creation will eventually slow, with effort shifting instead to improving the quality of existing articles. This page examines two trends loosely associated with quality: the number of words per article and the number of revisions per article.

In the graph below, which covers the English Wikipedia, the number of articles (blue, multiplied by 1,000) and the number of revisions (yellow, right-hand scale) are increasing far faster than the number of words per article (purple).

This graph is based on November 7th 2005 figures from Wikistats. For explanation and technical analysis, see foot of page.

Note that the hyperlinked nature of wikis tends to produce sets of small articles concerned with different aspects of a subject, while a paper encyclopedia would tend to describe these aspects in a single, larger, article. When comparing according to words per article, these tendencies may lead to Wikipedia's coverage of subjects being underestimated.

The jump in the number of articles in October 2002 (and the consequent aberration in the revision statistics) was due to the addition of 36,000 "data dumped" gazetteer entries about towns and cities in the United States; clearly these were longer than the prevailing mean article length. The volatility of the words per article count in the early stages of Wikipedia's life arises from the relatively low base of articles. Some of the rise in the number of revisions per article from about May 2004 onwards was due to the introduction of a categorisation system, which required revisiting articles in order to categorise them.

Extrapolating from this graph, it appears unlikely that the average number of words per article will increase at much more than a snail's pace; it is even possible that the gradient might flatten or fall slightly if the rate at which new stubs are added outpaces the rate at which existing articles are expanded.

Technical notes

The above graph is based on November 7, 2005, figures from Wikipedia Statistics[dead link], specifically Words per article[dead link] and the Article count (alternate)[dead link]. Under the alternative definition, an article must contain at least one internal link and 200 characters of readable text, disregarding wiki and HTML codes, hidden links, and headers.
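The alternative definition can be illustrated with a rough sketch. This is not the Wikistats implementation: the function names, the regular expressions, and the reading of "hidden links" as the target part of a piped link are all assumptions made for illustration, and real wiki markup has many more cases than are handled here.

```python
import re

MIN_READABLE_CHARS = 200  # threshold taken from the definition above


def readable_text(wikitext: str) -> str:
    """Approximate the readable text of a page (hypothetical helper)."""
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)  # HTML comments
    text = re.sub(r"<[^>]+>", "", text)  # HTML tags
    text = re.sub(r"^=+.*?=+\s*$", "", text, flags=re.MULTILINE)  # section headers
    # Keep only the visible label of [[target|label]] links (assumed meaning
    # of "hidden links"); plain [[target]] links keep their text.
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"'{2,}", "", text)  # bold and italic quote markup
    return text.strip()


def counts_as_article(wikitext: str) -> bool:
    """True if the page has an internal link and enough readable text."""
    has_internal_link = re.search(r"\[\[[^\]]+\]\]", wikitext) is not None
    return has_internal_link and len(readable_text(wikitext)) >= MIN_READABLE_CHARS
```

Under these assumptions, a one-line stub such as `"A [[town]] in Ohio."` is not counted, while a page with one link and a few dozen words of text is.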

Excluding redirect pages, there are roughly (using figures from May 1, 2004):

  • 261,000 articles that have at least a single link.
  • 239,000 articles that have at least a single link and 200 readable characters (roughly equivalent to at least 33 words).

Taking the difference of these two figures, there are about:

  • 22,000 articles that have at least a single link but fewer than 200 characters.

There is also an uncounted number of articles that have no links; the current statistics give no indication of the size of this last category. The upshot is that the total count of 79 million words in fact spans the 239,000 bona fide articles, the remaining 22,000 linked articles, and the unknown number of articles without links. As of October 2004, the total word count in the latter two categories was estimated at two million words. Dividing the remaining 77 million words by 239,000 gives a mean article length of about 320 words.
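The arithmetic in this section can be checked with a short script. The numbers are those quoted in the text; the variable names are illustrative only.

```python
# Figures quoted in the text for the English Wikipedia (2004).
TOTAL_WORDS = 79_000_000       # total word count across all articles
WORDS_OUTSIDE = 2_000_000      # estimate for short linked and unlinked articles
LINKED_ARTICLES = 261_000      # articles with at least one link
QUALIFYING_ARTICLES = 239_000  # at least one link and 200 readable characters
MIN_CHARS = 200
APPROX_MIN_WORDS = 33

# Linked articles that fall below the 200-character threshold.
short_linked = LINKED_ARTICLES - QUALIFYING_ARTICLES

# Mean length of a qualifying article, in words.
mean_words = (TOTAL_WORDS - WORDS_OUTSIDE) / QUALIFYING_ARTICLES

# Characters per word implied by "200 characters is roughly 33 words".
chars_per_word = MIN_CHARS / APPROX_MIN_WORDS

print(short_linked)              # 22000
print(round(mean_words))         # 322
print(round(chars_per_word, 1))  # 6.1
```

The unrounded mean is a little over 322 words, which the text rounds to about 320. The result depends directly on the two-million-word estimate, which is itself approximate.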

Further, of the articles on the English Wikipedia, perhaps 36,000 are "data dumped" gazetteer entries about towns and cities in the United States. It is controversial whether gazetteer entries should count towards the number of "real" encyclopedia articles; however, their statistical significance is very much less now than in October 2002 when they were added. Very many have been colonised by Wikipedians who have transformed them to varying extents, including to an unimpeachably encyclopedic status.

See also