User:Dr pda/Article referencing statistics

The statistics on this page arose from a request by User:Peregrine Fisher at WP:BOTREQ for "statistics on the proportion of articles with references, average references per article, etc", but may be of wider interest.

Disclaimer: In all the statistics below, the "number of refs" is the number of <ref></ref> tags in an article. Articles that use a different referencing system, such as inline author–date citations (e.g. Smith 2000), are perfectly acceptable but will nevertheless be counted as having zero refs by this method. This is unavoidable at present because such references are technically difficult to identify.
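As a rough illustration of the counting rule, the Python sketch below counts opening <ref> tags in an article's wikitext with a regular expression. The pattern REF_OPEN and the function count_refs are hypothetical and are not the code used to produce these statistics. In particular, this page does not say whether self-closing reuse tags such as <ref name="smith" /> were counted, and the sketch assumes they were not.

```python
import re

# Hypothetical pattern: an opening <ref> or <ref name="..."> tag that is NOT
# self-closing. Reuse tags like <ref name="smith" /> are deliberately skipped
# (an assumption), and <references /> never matches because "ref" must be
# followed by whitespace or ">".
REF_OPEN = re.compile(r"<ref(?:\s[^>]*)?(?<!/)>", re.IGNORECASE)


def count_refs(wikitext: str) -> int:
    """Count <ref>...</ref> footnotes in one article's wikitext."""
    return len(REF_OPEN.findall(wikitext))
```

As the disclaimer warns, an author–date citation written as plain text, such as "(Smith 2000)", contributes nothing to this count.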

Overall statistics

                     May 2007       January 2008   June 2008      March 2009
% without refs       85%            78%            72%            64%
Total no. articles   1,779,390      2,123,873      2,251,862      2,715,035

Notes:

  • Number of articles excludes redirects, articles assessed as List- or FL-class, and articles with 'List of' or '(disambig' in the title.
  • The June 2008 statistics also include a more comprehensive rejection of disambiguation pages. I will try to implement this for the other dumps soon.
  • The May 2007 and January 2008 statistics come from old database dumps which I happened to have on my computer. These dumps contained the article pages only, so I cannot do a breakdown by class, as this requires the talk pages as well.
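The exclusion rules above could be applied while streaming through a MediaWiki XML dump roughly as follows. This is a sketch under stated assumptions: iter_pages and is_counted are hypothetical helpers, redirects are assumed to be recognisable by a leading #REDIRECT in the wikitext, and only the title-based disambiguation check is shown, not the more comprehensive rejection used for June 2008. The List/FL check needs class data from talk pages, which the older dumps lack, so the classes argument is optional.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional, Set, Tuple, Union


def _local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix; the export schema version differs between dumps.
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, BinaryIO]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for every <page> in a MediaWiki XML dump."""
    title, text = "", ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = _local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            elem.clear()  # release the parsed page; full dumps are very large


def is_counted(title: str, wikitext: str, classes: Optional[Set[str]] = None) -> bool:
    """Apply the (hypothetical) exclusion rules to one page."""
    if wikitext.lstrip().upper().startswith("#REDIRECT"):  # assumed redirect test
        return False
    if "List of" in title or "(disambig" in title:
        return False
    if classes and classes & {"List", "FL"}:  # requires talk-page assessments
        return False
    return True


sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Foo</title><revision><text>Foo is a thing.</text></revision></page>
  <page><title>Bar</title><revision><text>#REDIRECT [[Foo]]</text></revision></page>
  <page><title>List of foos</title><revision><text>* Foo</text></revision></page>
</mediawiki>"""
counted = [t for t, x in iter_pages(io.BytesIO(sample)) if is_counted(t, x)]
print(counted)  # ['Foo']
```

Using iterparse and clearing each page after it is processed keeps memory roughly constant, which matters for dumps with millions of pages.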

Average number of refs per paragraph

             June 2008                March 2009
Class        # articles   refs/para   # articles   refs/para
FA           2,066        2.07        2,471        2.72
A            967          2.06        428          2.40
GA           4,262        0.87        6,304        2.52
B            59,284       0.51        59,961       0.89
C            N/A          N/A         24,457       0.85
Start        377,028      0.26        460,113      0.43
Stub         925,531      0.14        1,215,063    0.27
Assessed     1,333,458    0.20        1,718,682    0.34
Unassessed   918,404      0.15        996,353      0.28

Notes:

  • The June 2008 statistics were obtained by parsing the static HTML dump, and the March 2009 statistics by parsing the XML database dump. There may be discrepancies due to the different methods used.
  • Class assessments are obtained from class= parameters in WikiProject templates on talk pages. Consequently the number of FA-class or GA-class articles in the table may not agree with the number of articles listed at WP:FA or WP:GA on that date.
  • Articles which are assessed as belonging to more than one class count towards the statistics for each class, so the number of assessed articles is not equal to the sum of the articles in all the classes.
  • The number of articles per class generally differs from the WP:1.0 statistics (June 2008, March 2009). This may be because not all WikiProjects are set up to be counted by the WP:1.0 bot.
  • C-class had not yet been introduced in June 2008.
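A minimal sketch of how class= parameters might be read from a talk page, and how articles with several classes could be tallied so that each counts once per class but only once in "Assessed". CLASS_PARAM, CANONICAL, talk_page_classes and refs_per_paragraph are hypothetical names. This page does not describe how paragraphs were counted; the sketch assumes refs/para is the class's total refs divided by its total paragraphs. Averaging each article's own ratio would be an equally plausible reading and could give different figures.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed pattern for "class=" parameters in WikiProject banners, e.g.
# {{WikiProject Chemistry|class=GA|importance=high}}.
CLASS_PARAM = re.compile(r"\|\s*class\s*=\s*([A-Za-z]+)", re.IGNORECASE)

# Map lower-case values to the class names used in the table (assumed normalisation).
CANONICAL = {c.lower(): c for c in ("FA", "A", "GA", "B", "C", "Start", "Stub", "FL", "List")}


def talk_page_classes(talk_wikitext: str) -> Set[str]:
    """Return every recognised class assigned by any WikiProject banner."""
    values = {v.lower() for v in CLASS_PARAM.findall(talk_wikitext)}
    return {CANONICAL[v] for v in values if v in CANONICAL}


def refs_per_paragraph(
    articles: Iterable[Tuple[Set[str], int, int]],
) -> Dict[str, Tuple[int, float]]:
    """articles yields (classes, ref_count, paragraph_count) per article.

    Returns group -> (number of articles, total refs / total paragraphs).
    A multi-class article counts once in each of its classes but only once
    in "Assessed", so "Assessed" is not the sum of the class rows.
    """
    n: Dict[str, int] = defaultdict(int)
    refs: Dict[str, int] = defaultdict(int)
    paras: Dict[str, int] = defaultdict(int)
    for classes, ref_count, para_count in articles:
        groups = set(classes) | {"Assessed" if classes else "Unassessed"}
        for g in groups:
            n[g] += 1
            refs[g] += ref_count
            paras[g] += para_count
    return {g: (n[g], refs[g] / paras[g] if paras[g] else 0.0) for g in n}
```

For example, an article tagged GA by one WikiProject and B by another adds one to the GA row, one to the B row, and one to "Assessed".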

Old articles vs new articles

In the following table, "old articles" are those present in both the June 2008 and March 2009 dumps, while "new articles" are those created since the June 2008 dump. This should show whether the change in the average number of refs per paragraph is due to existing articles being improved or to new articles being created with more references.

             June 2008                March 2009
                                      Old articles             New articles
Class        # articles   refs/para   # articles   refs/para   # articles   refs/para
FA           2,066        2.07        2,361        2.42        110          2.48
A            967          2.06        398          2.27        30           4.13
GA           4,262        0.87        5,739        2.45        565          3.20
B            59,284       0.51        57,508       0.84        2,543        2.02
C            N/A          N/A         21,538       0.74        2,919        1.64
Start        377,028      0.26        432,038      0.40        28,075       0.92
Stub         925,531      0.14        1,081,709    0.24        133,344      0.55
Assessed     1,333,458    0.20        1,553,382    0.31        165,300      0.65
Unassessed   918,404      0.15        770,741      0.23        225,612      0.45

Note:

  • Articles can change class over time, so the "old articles" of a given class are not necessarily a subset of the June 2008 articles of that class.
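The old/new split could be sketched as below, assuming articles are matched between the two dumps by title. This is a hypothetical simplification: a page moved to a new title would show up as "new". split_old_new and count_by_class are illustrative names. Old articles are grouped by their March 2009 class, which is why they need not be a subset of the June 2008 articles of that class.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Classes = Dict[str, Set[str]]  # title -> classes assessed in that dump


def split_old_new(june_2008: Classes, march_2009: Classes) -> Tuple[Classes, Classes]:
    """Split the March 2009 articles into "old" (also present in June 2008) and "new"."""
    old = {t: c for t, c in march_2009.items() if t in june_2008}
    new = {t: c for t, c in march_2009.items() if t not in june_2008}
    return old, new


def count_by_class(articles: Classes) -> Counter:
    """Count articles per class, using the classes they hold in March 2009."""
    counts: Counter = Counter()
    for classes in articles.values():
        counts.update(classes)
    return counts


june = {"Foo": {"Start"}, "Bar": {"GA"}}
march = {"Foo": {"B"}, "Bar": {"GA"}, "Baz": {"Stub"}}
old, new = split_old_new(june, march)
# "Foo" counts as an old B-class article although it was Start-class in June 2008.
print(count_by_class(old), count_by_class(new))
```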