User talk:Dr pda/generatestats.js

This script generates statistics about the articles which transclude a given template: a list of the ten longest and ten shortest articles, the mean and median length, and a histogram of the article lengths. The original motivation was to find out which were the longest and shortest {{featured articles}}, but it can also be used for your favourite stub, infobox or other template.


Installation

Add {{subst:js|User:Dr_pda/generatestats.js}} to your monobook.js, and save it.

After saving, you have to bypass your browser's cache to see the changes. Mozilla/Safari: hold down Shift while clicking Reload (or press Ctrl-Shift-R); Internet Explorer: press Ctrl-F5; Opera/Konqueror: press F5.

Usage

Once you have installed the script, go to http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit. A dialog box will pop up, asking you to enter the name of the template without the "Template:" prefix, e.g. featured article instead of Template:featured article. The script will then retrieve the necessary information, 500 pages at a time, showing its progress in the edit window on that page. You can stop it at any time by navigating away from the page (e.g. by clicking the back button in your browser). Once it is done, the script copies the output into the edit window and previews the page. If you wish, you can then copy the wiki-text and save it somewhere else.
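The batched retrieval described above can be sketched as follows. This is only an illustration, not the code in generatestats.js: it assumes the MediaWiki Action API is queried with `generator=embeddedin` and `prop=info`, and the names `fetchArticleSizes`, `fetchJson` and `onProgress` are hypothetical.

```javascript
// Assumed endpoint; the real script's requests are not shown on this page.
const API = "https://en.wikipedia.org/w/api.php";

// Hypothetical helper: collect { title, bytes } for every article that
// transcludes the template, asking for at most 500 pages per request.
// fetchJson(url) must return a Promise of the parsed JSON response.
async function fetchArticleSizes(templateName, fetchJson, onProgress = () => {}) {
  const sizes = [];
  let cont = {};
  do {
    const params = new URLSearchParams({
      action: "query",
      format: "json",
      generator: "embeddedin",            // pages that transclude the template
      geititle: "Template:" + templateName,
      geinamespace: "0",                  // article namespace only
      geilimit: "500",                    // batch size mentioned above
      prop: "info",                       // includes the wikitext length in bytes
      ...cont,                            // continuation values from the last batch
    });
    const data = await fetchJson(API + "?" + params.toString());
    const pages = (data.query && data.query.pages) || {};
    for (const page of Object.values(pages)) {
      sizes.push({ title: page.title, bytes: page.length });
    }
    onProgress(sizes.length);             // e.g. update the edit window
    cont = data.continue || null;         // absent once the last batch arrives
  } while (cont);
  return sizes;
}
```

In a browser, `fetchJson` could be something like `url => fetch(url).then(r => r.json())`; passing it in as a parameter keeps the sketch independent of how the requests are actually made.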

If you want to have the full list of articles sorted by size, as well as just the top and bottom ten, go to http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit&list

To get statistics for articles whose talk pages belong to a certain category (e.g. WikiProjects) use the URL http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit&usetalkcategory, or http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit&usetalkcategory&list for the version with a list.

To get statistics for articles which transclude any template within a given category (e.g. Category:Television episode infobox templates), use the URL http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit&usetemplatecategory, or http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit&usetemplatecategory&list for the version with a list.
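The URL variants above differ only in the flags appended to the base edit URL. As a convenience, they could be assembled with a small helper like the one below. `buildStatsUrl` and its option names are hypothetical and not part of the script; the flag names are the ones given on this page (including `prosesize`, described under Notes). Whether the script accepts every combination of flags is not stated here, so treat combinations other than those shown on this page as untested.

```javascript
// Base edit URL, as given in the usage instructions above.
const BASE_URL =
  "http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit";

// Hypothetical helper: append the valueless flags documented on this page.
//   options.source    - "talkcategory" or "templatecategory" (default: a single template)
//   options.proseSize - true to request readable prose size (slow)
//   options.list      - true to include the full sorted list
function buildStatsUrl(options = {}) {
  const flags = [];
  if (options.source === "talkcategory") flags.push("usetalkcategory");
  if (options.source === "templatecategory") flags.push("usetemplatecategory");
  if (options.proseSize) flags.push("prosesize");
  if (options.list) flags.push("list");
  return flags.reduce((url, flag) => url + "&" + flag, BASE_URL);
}

// Example: the talk-category variant with the full list.
console.log(buildStatsUrl({ source: "talkcategory", list: true }));
```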

Example output

Ten longest articles

  1. Intelligent design (179 kB)
  2. Ronald Reagan (163 kB)
  3. Muhammad al-Durrah incident (162 kB)
  4. Elvis Presley (161 kB)
  5. General relativity (161 kB)
  6. Barack Obama (161 kB)
  7. Battle of the Coral Sea (156 kB)
  8. Major depressive disorder (156 kB)
  9. Michael Jackson (153 kB)
  10. 2007 USC Trojans football team (153 kB)

Ten shortest articles

  1. Tropical Depression Ten (2005) (9 kB)
  2. Nico Ditch (9 kB)
  3. 2005 Azores subtropical storm (10 kB)
  4. MissingNo. (10 kB)
  5. Hurricane Irene (2005) (10 kB)
  6. Bam Thwok (10 kB)
  7. Tropical Storm Erick (2007) (10 kB)
  8. North Road (stadium) (11 kB)
  9. She Shoulda Said No! (12 kB)
  10. Interactions (The Spectacular Spider-Man) (12 kB)

Statistics

  • Number of articles: 2815
  • Mean: 50.261 kB
  • Median: 44.795 kB
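
For reference, a mean and median like those reported above could be computed along the following lines. This is a minimal sketch rather than the script's own code; the function names and the sample sizes are made up for illustration.

```javascript
// Arithmetic mean of a list of article sizes (NaN for an empty list).
function mean(sizes) {
  if (sizes.length === 0) return NaN;
  return sizes.reduce((sum, s) => sum + s, 0) / sizes.length;
}

// Median: the middle value of the sorted list, or the average of the
// two middle values when the list has an even number of entries.
function median(sizes) {
  if (sizes.length === 0) return NaN;
  const sorted = [...sizes].sort((a, b) => a - b); // numeric sort, on a copy
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up sizes in kB, not real article data.
const sampleSizes = [9, 10, 44, 50, 179];
console.log(mean(sampleSizes), median(sampleSizes));
```

A mean noticeably larger than the median, as in the example output, indicates that a relatively small number of very long articles pull the average up.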

Chart

Notes

  • The size of the article is that of the wiki text, i.e. what appears in the edit window. It is NOT the readable prose size. (This can be calculated on an article-by-article basis by this prose size script.) If it is REALLY necessary to have the readable prose size, this script now supports it at http://en.wikipedia.org/w/index.php?title=User:Dr_pda/generatestats&action=edit&prosesize&list. However, this requires loading each article, which is resource-intensive and will take a long time if there are many articles (approx. 1 hour for 1500 articles).
  • This script only counts pages which are in the article namespace, so it won't work for talk page templates (e.g. wikiproject banners).
  • The script chooses bin sizes on the horizontal axis such that there are approximately 15 bins, while keeping to a sensible scale (1, 2, 5, 10, 20, 50 etc.). Due to the limitations of the code used to generate the chart, the labels are in the middle of each bin, rather than at the left-hand edge. Thus in the example above, the first bin contains articles between 0 and 20 kB, the second bin between 20 and 40 kB, and so on. Note that the upper edge of the last bin is not marked; here it contains articles between 160 and 180 kB.
  • You can see the numbers for the histogram by looking in the edit window.
  • Sometimes the chart doesn't show up in the preview. I'm not sure why; sometimes adding or removing a blank line, changing the height, or inserting an error and then correcting it made it show up.
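
The bin-size choice described in the notes above can be illustrated with a short sketch. The rule used here (take the smallest 1-2-5 value that gives no more than about 15 bins, counting from zero) is an assumption: it reproduces the 20 kB bins of the example, where the longest article is 179 kB, but it is not necessarily how the script itself decides. The function names are hypothetical.

```javascript
// Smallest width of the form 1, 2 or 5 times a power of ten that covers
// the range 0..maxValue in at most targetBins bins (assumed rule).
function niceBinWidth(maxValue, targetBins = 15) {
  if (!(maxValue > 0)) return 1;
  const raw = maxValue / targetBins; // ideal, but "ugly", width
  const magnitude = Math.pow(10, Math.floor(Math.log10(raw)));
  for (const multiplier of [1, 2, 5, 10]) {
    if (multiplier * magnitude >= raw) return multiplier * magnitude;
  }
  return 10 * magnitude; // safety net against floating-point rounding
}

// Count how many (non-negative) values fall into each bin
// [0, width), [width, 2 * width), and so on.
function histogram(values, width) {
  const counts = [];
  for (const v of values) {
    const index = Math.floor(v / width);
    while (counts.length <= index) counts.push(0);
    counts[index] += 1;
  }
  return counts;
}

// With a longest article of 179 kB the assumed rule gives 20 kB bins.
const width = niceBinWidth(179);
// Made-up sizes in kB; the last bin here is 160-180 kB.
const counts = histogram([9, 10, 44, 179], width);
// Labels sit at the middle of each bin, as described in the notes.
const labels = counts.map((_, i) => (i + 0.5) * width);
console.log(width, counts, labels);
```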