Wikipedia talk:Village pump (proposals)/Proposal to require autoconfirmed status in order to create articles/Trial duration

Stale discussion

I introduced this proposal because I feel the last large scale trial for changes to the editor experienced suffered from a lack of planning and a non testable framework and I am worried this trial will suffer the same fate. Below are some general concerns and some elaboration on my proposed method, feel free to interject comments wherever you like.

Statistical troubles edit

Imagine that we implement the proposed system to limit article creation to confirmed editors (herein just "the proposal"). What metrics are we interested in? We may be interested in number of articles created per month (or week). We may be interested in rate of editor conversion to confirmed status. We may be interested in number of articles nominated or tagged for deletion. The trouble with each of these metrics is that they are a proxy for our real variable of interest: new editor retention. Obviously the proposal offers a number of benefits: new users will not see their first article deleted immediately, the new pages queue will see a dramatic drop and NPP/CSD taggers will have to deal with less volume and therefore have more attention to pay to individual articles, and presumably average quality of submitted articles will improve. We can argue about each of these benefits but most of those arguments are moot as a sufficient majority of the community wants to implement this feature. The principal downside of the proposal is the potential to increase editing friction for new users. If the proponents can show that editing friction does not materially increase under the new system they will be able to convince many of their opponents. However, none of our proxy variables are terribly well linked to new editor friction.

Take articles created for example. We see roughly 1000 articles created per day (and slowly declining FWIW). There is a fair amount of variation in that number even if we remove the secular decline. If we have a trial of 3 months and show for the three month period before the trial the decline adjusted figure was ~31,000 per month and the adjusted figure during the trial was ~30,000 per month, what have we shown? A critic of the proposal may jump and say "Aha! The change correlated with lower articles created! We shouldn't continue." A smarter proponent may say "Ah, but that variation isn't significant, so the trial did not coincide with a decline in article creation!" Who is right? Neither, as we don't care about # of articles created. I fully expect the number of articles created during the trial to be significantly less than before (or after). But if the trial is working properly most of those stymied articles were those which would have been deleted anyway and we may have attracted a thousand new editors whose first interaction with wikipedia was something other than a deletion notice. Of course we may also have turned away those thousand editors because they wanted to make an article on Miomantinae and we stopped them with an arbitrary limit. Point is, articles created will not resolve this dispute and at the end of a time consuming trial we will be back to arguing from priors.

Likewise # of articles tagged for deletion. # of accounts confirmed or autoconfirmed might be the best proxy for the trial but even then we cannot distinguish between accounts which reach autoconfirmed status and stop contributing because of the friction due to the trial (i.e. an account attempts to create an article, then makes a few edits but doesn't return after 4 days) and an account which would have left anyway.

To get to the point, we have no direct measure of interaction with the article creation process--exactly the variable we are trying to measure.

Technical details edit

  • I'll preface this section by saying that I am not even remotely an expert on the back end of wikipedia. I have no idea whether or not my proposal is technically possible, feasible or wise. But it is better than just flipping the switch without a game plan

The ideal data for this trial would be (sort of) longitudinal data on articles and editors under the remit of the proposal. Basic flow would look something like this:

  • A non-confirmed editor clicks on a redlink or opens a non-existent page on their browser while logged in.
    • Prior to the switch on they would see the same dynamic link (e.g. http://en.wikipedia.org/w/index.php?title=Talk:Zaur_Ardzinba&action=edit&redlink=1) they do now.
    • During the switch on they would (under the base proposal) see the same page an IP editor sees.
  • In both cases, the server should record the page in question as well as the account name. This is bolded because it is not current practice (AFAIK) to gather data on which accounts view which dynamic pages (or any information except that which commits a revision).
  • Nothing else need be recorded. At the end of the trial the developers can tally recorded articles and accounts and provide summary statistics for:
    • Articles: Whether or not the article had been created or deleted at any point in the trial and if it was created if it still existed at the end of the trial
    • Accounts: Descriptive statistics (mean, med., sd., quantiles would be nice) for total account edits, number of days dormant before the end of the trial, and what percentage of accounts (during the switch on phase) gained confirmed status and created the article they visited (or any article)

Again, I don't know how feasible any of these are. But this information (or any reasonably close proxy for it) would be extremely valuable in measuring the actual impact of the proposal

Why 30 day chunks edit

I proposed:

  • 30 day information gathering with no policy change
  • 30 days with no article creation for new editors
  • 30 days with continued information gathering but policy reverted to status quo ante

My reasons for this are simple. We want to capture the effect of the policy and not a time trend or any other concurrent policy. In order to do so we want some lead time where we are gathering the relevant data as well as a period after the policy is shut off in order to fully net out a time trend. Adding a tail period to the trial also allows us to capture more information about editors who attempted to create articles but could not. We can follow editors captured by the policy change for a maximum of 60 days (those that attempt to create an article on the first day of the switch on) and follow editors who are unaffected by the policy change for 90 days. Adding a switch off period also allows us to capture any rebound effect for editor conversion (or lack thereof).

Trial length edit

The main proposal suggests a length of 6 months, which is an absolute eternity on the web. Reasons for the six month trial are to judge the long term impact of the change and to gather sufficient data for analysis. As I pointed out above, we get ~30,000 new articles a month. I don't know but I gather a decent sized chunk of those articles are from new editors. Even with the reasonable variation in those averages, 6 months is way too long for a reasonably sized trial (and we may even end up capturing too much of the secular decline in new articles if we make the trial too long). The first reason (editor analysis) is more credible. However a relatively recent survival analysis (pdf!) of Wikipedia editors shows the survival rate of new editors drops to less than 50% within about 100 days (give or take). Given that we are interested in changes to new editors and not necessarily the declining 2-100 edits editors. Our focus is on the very left of the survival curve and extending a trial out to 180 days (including some review of the previous 180 days for comparison) will be well in excess of what is necessary to measure new editor retention.

Trial alternatives edit

I mentioned the possibility of A/B trials for even more information, but I'll hold back on elaborating until I get some word on the basic feasibility of the data collection itself.

General comments edit

Publicity edit

Should this RfC be announced with a watchlist notice? There have been some delays and controversy in the past stemming from "I didn't know about it" and "So-and-so interested constituency wasn't informed" concerns. Rivertorch (talk) 21:55, 12 July 2011 (UTC)Reply

I don't think that's necessary. The main RfC has already come and gone. This is simply a minor RfC on the technical details of trial implementation, not an opportunity for all of Wikipedia to express their opinion on whether or not they think non-autoconfirmed users should be able to create articles. —SW— yak 22:31, 12 July 2011 (UTC)Reply
Yeah a new RFC would just be restarting the discussion. There's a consensus to move forward with a trial. We're talking about minor details here, which should still conform to the broad consensus found at the larger RFC. Shooterwalker (talk) 00:39, 13 July 2011 (UTC)Reply
It states clearly on the RfC that it is for discussing the duration of the trial only, - any attempts to derail it by bringing peripheral arguments into the discussion will be met with an appropriate comment. The original RfC was confused and confounded by people wanting to discuss technicalties and other methods of new creation control. Although our philosophy is based on reaching consensus, this is the major downside of our RfC system and why it takes so long for anythign to get implemented. I speak from my experience of managing BLPPROD in its second and final phases. The implementation was left up to a small group of about 10 editors. The general attitude of the people who decline a proposal is 'OK, they've got want they wanted, let them get on with it." On BLPPROD however, there were people who joined in at the last moments of technical discussion who had not read the 100s of thousands of words of discussion, and tried to overthrow the overwhelming consensus on technicalities. --Kudpung กุดผึ้ง (talk) 05:46, 13 July 2011 (UTC)Reply
I know what you mean about peripheral comments, and I was probably being overly cautious. Rivertorch (talk) 09:15, 13 July 2011 (UTC)Reply