Wikipedia:WikiProject Military history/News/November 2023/Op-ed





WP:Size considered harmful

By Hawkeye7
The Encyclopædia Britannica. The very model of the modern encyclopaedia?

Should Wikipedia articles be short or comprehensive?

Wikipedia:Summary style says:

Articles over a certain size may not cover their topic in a way that is easy to find or read. Opinions vary as to what counts as an ideal length; judging the appropriate size depends on the topic and whether it easily lends itself to being split up. Size guidelines apply somewhat less to disambiguation pages and to list articles, especially if splitting them would require breaking up a sortable table. This style of organizing articles is somewhat related to news style except that it focuses on topics instead of articles.

This is more helpful to the reader than a very long article that just keeps growing, eventually reaching book length. Summary style keeps the reader from being overwhelmed by too much information up front, by summarizing main points and going into more details on particular points (subtopics) in separate articles. What constitutes "too long" varies by situation, but generally 50 kilobytes of readable prose (8,000 words) is the starting point at which articles may be considered too long. Articles that go above this have a burden of proof that extra text is needed to efficiently cover their topics and that the extra reading time is justified.

The reference to 50 kilobytes of readable prose requires some unpacking. First of all, kilobytes are not characters. A byte is a unit of digital information that nowadays usually consists of eight bits. For technical reasons, a power of two is very convenient, but in the past 6-, 7- and 9-bit architectures were common. An eight-bit byte can hold numbers from 0 to 255. Characters are encoded as numbers, so that allowed for 256 distinct characters. Since the Latin alphabet has 26 upper case and 26 lower case letters, that seemed enough for letters, punctuation and the odd funny character with a diacritical mark. Unfortunately, 256 characters, and even 65,536 characters (16 bits), proved too few when people tried to add Chinese characters, cuneiform and Klingon scripts to the mix. Not to mention funny glyphs like the en dash and em dash that some Wikipedians are so fond of. So we went to Unicode, whereby characters are encoded with variable numbers of bytes. Therefore the number of bytes is never less than the number of characters, and is greater whenever the text strays beyond the basic Latin set.
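The gap between characters and bytes is easy to see for yourself. The following is a minimal Python sketch, assuming the text is stored as UTF-8; the function name and the sample strings are my own illustrations and are not taken from any Wikipedia tool.

```python
def char_and_byte_counts(text: str) -> tuple[int, int]:
    """Return (characters, UTF-8 bytes) for a piece of text.

    len() counts Unicode code points; encoding to UTF-8 (an assumption
    about the storage format) gives the size in bytes.
    """
    return len(text), len(text.encode("utf-8"))


# Illustrative samples: plain Latin, a ligature, an en dash, Chinese.
samples = ["encyclopedia", "Encyclopædia", "1939–1945", "百科全书"]

for sample in samples:
    chars, size = char_and_byte_counts(sample)
    print(f"{sample!r}: {chars} characters, {size} bytes")
```

Plain Latin text comes out with equal counts, while the ligature, the en dash and the Chinese characters each push the byte count above the character count.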

A bigger gotcha lies in the term "readable prose". This means the characters in the main body of the text, excluding material such as footnotes and reference sections ("see also", "external links", bibliography, etc.), diagrams and images, tables and lists, wikilinks and external URLs, and formatting and markup. This is pretty reasonable in my opinion (although universities count footnotes in the thesis size, Goddammit), but for articles with a lot of tables it doesn't work nearly so well. You can use the prosesize gadget to calculate readable prose. It has its limitations though; in particular, it grossly overestimates the size of articles with mathematical markup in them. Assessment of the size of these articles must be done another way. Since tables and lists are not counted, the size of list articles is grossly underestimated. WP:SIZE specifically exempts list articles from its ambit, at least partly because we have no agreed-upon means of measuring their size, which would be a necessary precursor to any size guideline.

Readable prose is much smaller than the markup, which in turn is smaller than the resultant HTML that is actually used to render a page. The actual download size to your computer or smartphone is dependent on the images, which take up far more bandwidth even in 64K thumbnail size. There is indeed a technical limit to markup size, but it is very large: 2 megabytes. That limit has been reached, with the help of templates. The limitations of the template language made many simple templates absurdly large, but the advent of the MediaWiki Scribunto extension means that these can now be written in Lua, with consequent improvements in efficiency. The crucial point is that readable prose size has no bearing on how quickly or slowly a page loads.

The arbitrary nature of the numbers in WP:SIZE and its restrictions chafed writers of featured articles as early as 2006, because the limitation made it difficult to cover many subjects comprehensively. Originally, the limit was half the size; it was arbitrarily doubled in response to complaints from people attempting to write the first featured articles.

The root of the problem lies in our conception of what an encyclopaedia should look like. Most encyclopaedias consist of short articles; the Micropædia of the Encyclopædia Britannica has 65,000 articles with fewer than 750 words each. Often overlooked is the accompanying Macropædia with its 699 in-depth articles that range up to 310 pages in length. This arose from the nature of the publication: although the Britannica was both large and expensive, there was a restriction on the size of articles owing to the cost of publication.

Wikipedia is not paper and has no such limitations. Wikipedia doesn't have separate Micropædia and Macropædia articles; instead the lead performs the former function and the body the latter. Although they can be separated for some purposes, they are together in the same article. They are easy to find with an online search, which often will just present the lead, and it is easier to check that the lead truly reflects what is in the article.

We know a lot about how readers approach the articles that was not known when the guideline was written back in 2004:

  • Most just read the lead and nothing else
  • Many do not read the article sequentially, but jump around looking for very specific information
  • Only a small percentage read the whole article from top to bottom

Therefore, to service the readers' needs, articles need to be comprehensive and detailed, with a well-written lead.

If an article exceeds WP:SIZE limits, we have three techniques for reducing article size:

  1. Material can be deleted outright;
  2. Text can be trimmed to use fewer words to say the same thing;
  3. Sections can be split off into subarticles.

The first technique cannot be used simply to reduce the size of an article. Material should be preserved unless it is unsourced, libelous, patent nonsense, vandalism or violates copyright. This is a policy, so it overrides any guideline. A more controversial action is to summarise a section if it has undue weight. This requires consensus that the section is indeed undue, and the imperative is to preserve, since Wikipedia is a compendium of information.

The second technique looks more promising. Some good essays have been written on how to do this, including our Wikipedia:WikiProject Military history/Academy/Copy-editing essentials, User:Tony1/Redundancy exercises: removing fluff from your writing and the Wikipedia:Principle of Some Astonishment. Following this advice will improve your article writing style, but it is not a panacea. When it comes to reducing the overall size of an article, one should not expect too much from trimming; experience has shown that perhaps a five percent reduction can be expected at best.

Which brings us to the third technique: splitting off subarticles. This is called summary style. The idea is that sections of long articles should be spun off into their own articles, leaving summaries in their place. A fuller treatment of a major subtopic can have a separate article of its own. This holds out the possibility of substantial savings in the size of the parent article. However, this technique has limitations that need to be carefully considered before embarking on such a course of action.

The first is that a child article must be a complete encyclopaedic article in its own right. That means that it must meet our notability guideline. A child article created from an arbitrary split can face challenge at Wikipedia:Articles for deletion if it does not appear to pass muster on its own merits. Indeed a well-known technique for getting rid of unwanted material like "In popular culture" content is to split it off and then nominate it for deletion. Undue material can also be dealt with in this manner as a point of view fork. (Conversely, splitting off a large section may leave the parent article with undue emphasis issues.) Many editors at Articles for Deletion are unaware of or do not understand the summary style or size guidelines, and in any case in a conflict between them and notability will prefer the latter. They may question the need for the child article, potentially leading to a decision to merge it back into the parent article.

Another limitation is that when a child article is created from a section, that section must be replaced with a summary in the parent article, which must be similar to the lead in the child article. Simply replacing the section with a hatnote is unacceptable. Apart from the illogic of violating one guideline in the pursuit of another, the readers really do not like this. Put simply, for reasons not fully understood, they do not like following links, and will complain on the talk page when forced to do so. This problem is compounded by the fact that child articles often do not appear in searches with common search engines, which may direct the reader to the main article even if a child article is available.

What this means for the editor trying to reduce the size of an article is that spawning a child article will not reduce the article in size by that of the section being split off. To achieve a reduction, we need to locate a section with more than just a few paragraphs. Not all articles have sections that can easily be split off, so in some cases the parent article may need considerable restructuring in order to create one. The creation of child articles also comes with a maintenance overhead. If a child article changes, the summary in the parent article will need to be changed as well.
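The arithmetic can be made concrete with a small Python sketch. The article, section and summary sizes below are hypothetical figures chosen purely for illustration, and the function names are my own; only the five percent trimming estimate and the 8,000-word starting point come from the text above.

```python
GUIDELINE_WORDS = 8_000  # starting point quoted from the guideline


def size_after_trim(words: int, trim_fraction: float = 0.05) -> int:
    """Size after copy-editing, assuming the best-case five percent saving."""
    return round(words * (1 - trim_fraction))


def size_after_split(words: int, section_words: int, summary_words: int) -> int:
    """Size after a section is split off and a summary is left in its place."""
    return words - section_words + summary_words


# Hypothetical article: 12,000 words with one 2,500-word section
# that can be replaced by a 500-word summary.
article, section, summary = 12_000, 2_500, 500

trimmed = size_after_trim(article)                  # 11400
split = size_after_split(article, section, summary)  # 10000

print(f"after trimming: {trimmed} words")
print(f"after splitting: {split} words (saved {article - split}, not {section})")
print(f"still over the guideline: {split > GUIDELINE_WORDS}")
```

In this made-up case the split saves 2,000 words rather than the 2,500 in the section, because the summary has to stay behind, and the article remains above the starting point either way.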

WP:SIZE is a guideline, and policies and guidelines should be applied using reason and common sense. It contains a great many weasel words even for a guideline. It cautions that:

There is no need for haste in splitting an article when it starts getting large. Sometimes an article simply needs to be big to give the subject adequate coverage.

In summary, WP:SIZE posits arbitrary size limits on articles, and meeting them may involve considerable work for the article writers and generate conflict with other guidelines while detracting from the quality of the work delivered to the readers.


About The Bugle
First published in 2006, the Bugle is the monthly newsletter of the English Wikipedia's Military history WikiProject.

Discuss this story

Jumping around looking for information

Quoting (from rev. 1184270070):

We know a lot about how readers approach the articles that was not known when the guideline was written back in 2004:

  • Most just read the lead and nothing else
  • Many do not read the article sequentially, but jump around looking for very specific information
  • Only a small percentage read the whole article from top to bottom

Therefore, to service the readers' needs, articles need to be comprehensive and detailed, with a well-written lead.

I agree, and with respect to bullet two, I've always found it very curious that Wikipedia is on a hypertext platform and takes full advantage of it, using wikilinks to link to sections in other articles, and even interlanguage links to link to sections of articles on other Wikipedias. Yet it seems we don't really encourage linking to other sections of the same article we are reading, other than in the Table of Contents, although in my view we definitely should, via on-page section links.

Wikipedia:Manual of Style/Linking#What generally should be linked talks only about linking to other articles, even articles that don't exist, but says nothing about linking to a section later on the page. It's not exactly forbidden, and section § What generally should be linked doesn't discourage internal links, it just doesn't mention them at all. (Interestingly, the last paragraph of § What generally should be linked contains two section links to lower down on the page.) The page does have a paragraph on MOS:SECTIONLINKS, but it's clear they are talking about sections on other pages.

The one place I've found that does talk about it is a sentence at Help:Link#Section linking (anchors) and the similar wording at Help:Section#Section linking. But somehow, this doesn't seem to be part of the culture; I rarely see it used, and it should be encouraged. Using Template:Section link could help the reader to know that it's an on-page link, due to the section symbol prefix, as off-page wikilinks are generally not implemented that way. I don't know whether you'd want to mention that in this essay or not, but I just find it curious, and unfortunate, that we don't encourage the use of on-page section links a lot more than we do, because I think you're right about bullet 2, and we don't do enough to help the user do that. Mathglot (talk) 00:55, 25 November 2023 (UTC)