Wikipedia:Wikipedia Signpost/2008-07-14/Dispatches

Dispatches: Interview with botmaster Rick Block

Wikipedia's featured-content processes are multifaceted and complex, and generate major opportunities for gathering data that will provide insights into the process and the overall project. Managing the mountain of potential data was crying out for automation, and editor Rick Block did something about it: he was and still is one of the pioneers of this type of automation on the project. As such, he's been at the forefront of making Wikipedia a smoother and more sophisticated operation over the past three years. He operates Rick Bot which, among other things, generates a List of Wikipedians by featured article nominations (WBFAN) and yearly lists such as Featured articles promoted in 2007. In June 2008, David Fuchs interviewed Rick for The Signpost, slightly paraphrased here for readability.

The Signpost: Rick, looking at the page history of WBFAN, you were the list’s creator and primary contributor for a long period. What prompted you to create the page?

In the olden days (2005), before semi-automated editing tools like WP:AWB were everyday features of Wikipedia, a user's number of edits was a much bigger deal than it seems to be now (because of the ease of amassing edits via automated tools. -Ed.). Very few users had more than 5000 edits, because they had to have clicked "edit this page", made some change and saved it 5000 times. Unless you're doing something completely brainless, each edit corresponds to a minute or more of time invested in Wikipedia, so someone with 5000 edits had spent on the order of 100 hours or more editing. Since a higher edit count basically meant more effort dedicated to Wikipedia, it came up regularly at WP:RFA, and lots of people displayed their edit count on their user page and seemed to pay attention to their ranking at Wikipedia:List of Wikipedians by number of edits. Every edit is precious, but there did seem to be a fair number of people taking on massive projects because they saw prestige in increasing their edit count.

The Signpost: So you wrote Editcountitis to deal with this issue?

Well, I wrote that page to gently poke a little bit of fun at this phenomenon. The list of Wikipedians by featured article nominations started as a slightly more serious response. The basic idea was to provide an alternative arena for folks to compete in (FA production) that would do more for the encyclopedia than List of Wikipedians by number of edits. The premise behind the list is that it takes a lot of effort and risk for nominators to expose their work to the WP:FAC process, and awarding a star against each person's name for each of their articles that is promoted might encourage users to go through with it—no article becomes an FA unless someone has a thick enough skin to take it through FAC!

The Signpost: Currently, your bot account Rick Bot auto-updates WBFAN; when did the bot take over the arduous task of manually updating the page?

Actually, the process has always been tool-assisted. I wrote a tool to parse the nominator out of the FAC nomination logs, and originally stored the output in a set of files on my Mac. I then wrote another tool (a slightly out-of-date version here) that took these files as inputs and created the WBFAN table. I changed the log-parsing tool so its output would be a table as well, and the result has been lists like Featured articles promoted in 2008 in tabular form. About once a month from August 2005 until April 2007, I ran these tools and manually copied and pasted the updated content into both the by-year lists and WBFAN. Partly in response to concerns raised in April 2007 by SandyGeorgia about the reliability of these lists, I updated the tools so they audit WBFAN and the by-year nomination lists against WP:FA/WP:FFA, created the bot account, and started updating these lists daily using the bot to do the final step of making the actual page edits.
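The audit step described above can be pictured as a comparison between sets of article titles. The following is only a minimal sketch in Python, not Rick's actual tooling (which he wrote in shell and awk); the function names `audit` and `status` and the sample titles are hypothetical, and it assumes the titles have already been extracted from the by-year lists and from WP:FA and WP:FFA.

```python
def audit(listed_titles, featured, former_featured):
    """Compare titles in the by-year lists against WP:FA and WP:FFA.

    All three arguments are iterables of article titles (hypothetical inputs;
    extracting them from the wiki pages is outside this sketch).
    """
    listed = set(listed_titles)
    known = set(featured) | set(former_featured)
    return {
        # Promoted at some point, but absent from every by-year list.
        "missing_from_lists": sorted(known - listed),
        # Present in a by-year list, but neither a current nor a former FA.
        "unknown_status": sorted(listed - known),
    }


def status(title, featured, former_featured):
    """Return the FA/FFA marker that a list entry would carry."""
    if title in set(featured):
        return "FA"
    if title in set(former_featured):
        return "FFA"
    return "unknown"


if __name__ == "__main__":
    fa = ["Monty Hall problem", "Example hurricane"]
    ffa = ["Example former FA"]
    by_year = ["Monty Hall problem", "Example former FA", "Misspelled titel"]
    print(audit(by_year, fa, ffa))
    print(status("Example former FA", fa, ffa))
```

Anything reported under either key would need a human look before the bot edits the page, which matches the reliability concern the audit was meant to address.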

The Signpost: WBFAN is totally automated, I guess.

Ah, the process for WBFAN is still partly manual: I have to start the script, and I have to monitor it as it runs; the reason for this is that, historically, the FAC nomination logs have not been regular enough to reliably parse. The parsing tool makes an educated guess about who the nominator might be based on the first link to a user page. When I run the tool, I check each guess and either approve it or correct it. As well as adding new entries to the by-year nomination lists and updating WBFAN, the tool updates the main page appearance date in the by-year lists and indicates FA/FFA status for each entry in the by-year lists and the WBFAN page.
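The "educated guess" described above, taking the first link to a user page, might look roughly like the sketch below. This is an illustration in Python under stated assumptions, not the awk code Rick actually runs: the regular expression, the function names `guess_nominator` and `confirm`, and the sample nomination text are all hypothetical, and it assumes user links appear in the usual `[[User:Name|label]]` wikitext form.

```python
import re

# Matches the target of a [[User:Name]] or [[User:Name|label]] link.
# Subpages and section anchors are cut off at "/" or "#".
USER_LINK = re.compile(r"\[\[\s*[Uu]ser\s*:\s*([^\]|/#]+)")


def guess_nominator(wikitext):
    """Return the user named in the first user-page link, or None."""
    match = USER_LINK.search(wikitext)
    if match is None:
        return None
    return match.group(1).strip().replace("_", " ")


def confirm(guess, correction=None):
    """Model the manual check: keep the guess unless a correction is given."""
    return correction if correction else guess


if __name__ == "__main__":
    nomination = (
        "Thanks to [[User:Helper]] for the copyedit. I think this is ready. "
        "--[[User:Author|Author]] 12:00, 1 June 2008 (UTC)"
    )
    guess = guess_nominator(nomination)
    print("guess:", guess)                        # a helper, not the nominator
    print("final:", confirm(guess, "Author"))     # corrected by hand
```

The sample text shows why each guess still has to be checked: when another user is mentioned before the signature, the first link points at the wrong person.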

The Signpost: Did you write the bot yourself?

The bulk of the work is done by the analysis tools that I wrote in Unix shell and awk (two long-established Unix scripting languages well suited to text processing). I use a slightly modified version of what is known as the pywikipedia replace.py script to upload changed versions of the list contents. Replace.py is the basis for many of the "search and replace" bots other botmasters run.

The Signpost: What was the most challenging part of the process?

It's difficult to figure out from the free-format text in the nomination file just who the nominator is. The difficulties arise when there are two or more nominators or when other users are mentioned in the nomination statement. I've recently added fully automated tasks for the bot to update WP:WBFLN and WP:WBFTN, and corresponding by-year lists for each of these. For list and topic nominations, the bot uses the creator of the nomination file as the nominator, so any joint nominations have to be corrected after the fact.

The Signpost: What can FAC nominators do or know to make your job easier?

The FAC nomination procedure now uses a preloaded nomination page with an explicit line for identifying the nominator(s). If everyone used the preloaded file, the process could become fully automated. The code to parse the nom files is written and works OK, so I've never really bothered to push this – in the grand scheme of things, making the process for updating WBFAN more reliable or more automated than it already is has never seemed that important. I mean, I'm not even sure anyone cares about either the by-year summary lists or WBFAN, or whether it's helped in any way to increase the number of FAs.

The Signpost: So you don't think the list has helped increase the numbers of FAs or made getting the star more desirable in some way? After all, there is that ironically titled essay Why would you want to get an article to FA?

Um ... My "not sure" is not nearly as strong as "don't think". I'm pretty sure it doesn't hurt (although there is an editor who, oddly enough, asked to opt out). I suspect that because of WBFAN, we may have more FAs on hurricanes, Anglo-Saxon history, US/Japanese military history and Mary Wollstonecraft than we would without it, but even in these seemingly obvious cases I don't know that WBFAN is a motivating factor. Thinking back to when the task was much more manual, there were some folks who seemed fairly anxious about getting their stars listed, and similar pages were created (and, until recently, manually maintained) for featured lists and featured topics. So I guess there's evidence that it has some importance to at least some people. Time and effort spent editing here generally goes completely unappreciated. Bringing an article to FA quality takes a significant amount of work. If WBFAN helps encourage this in any way, I'm happy!

The Signpost: Finally, Rick, your Wiki activities go beyond being a bot-guy, don't they?

Of course. I'm an admin, currently spending most of my Wikipedia-time at the John McCain and Barack Obama articles. I shepherded Monty Hall problem through the Wikipedia:Featured articles process and its two featured article reviews, and at its talk page still try to help folks understand it. I started and continue working on an effort to update all articles on Japanese prefectures, towns and villages due to something called gappei: thousands of towns and villages have merged into new, larger municipalities, and in many instances Wikipedia's articles still do not reflect this merger activity, which has quite literally made most sources about Japanese municipalities obsolete. I used to be a regular responder at the Help desk and Wikipedia:Village pump (technical) but don't seem to find the time to hang out at either of these pages much anymore.



