Checkuser Auditing. (If anyone comes across this, please place questions on my talkpage rather than here - I will be moving anything that isn't a direct response from specific editors to my talkpage.)

1. In the ColonelHenry case, MikeV stated it was a match based on Checkuser - heavily implying a technical match. Can you confirm what *type* of information was used to give a positive result? Was it a match based on private/non-public data (such as useragent strings), semi-public data like the IP addresses edited from, or was it behavioural? Only in death does duty end (talk)

2. Same question for the Grant Shapps/Contribx issue (when CU was used by a member of WMUK). Also, I never got a straight answer on this: was there actually any match in that case? Arbcom's findings of fact in the case danced around it. Only in death does duty end (talk)

3. The WMF's privacy and data retention policies indicate that the information which the Checkuser tool uses to determine technical matches should have been deleted/anonymised within a (relatively for data retention) short period of time. Given the above two questions, both of which concern checks performed significantly after any such technical information should have been purged, can you confirm why that information was still available to be checked against? (ColonelHenry's last edits were 8 months earlier, and Contribs were 3 years, there should have been nothing to run against) Only in death does duty end (talk)

Some of these questions are going to require research, some don't, so I'll start here. The data is lost server side after 90 days, but for particularly aggressive sockpuppeters, data is sometimes retained either on the checkuser wiki or the arbwiki. Other times, data is available in the individual CU's email inbox -- the checkuser-l list is not archived, but individual CU's may still have it available in their inboxes. And, of course, the longer you are a CU, the more institutional memory you possess, and lacking that, there are plenty of CU's of long standing whose memories can be tapped. Courcelles (talk) 17:46, 18 November 2015 (UTC)
When you say particularly aggressive, this seems to be a bar that is not set that high from what I can see, there is a huge difference between Grawp and ColonelHenry, and Contribx could barely qualify as a serial sockpuppeter - is there a big difference in what different CUs consider 'aggressive' or 'prolific' or is there a general standard that all CU's use (that appears to me) that is just pretty low? Only in death does duty end (talk)

4. In relation to above - it has been said that Private information is kept (contrary to the WMF's stated policy) in cases of 'long term abuse' however this does not appear to be written in policy anywhere, either here on EN-WP or on Meta (where the WMF hosts their data retention policy) - can you confirm if this is the case, and if so, where this is documented? Only in death does duty end (talk)

See [1], specifically "Exceptions to these guidelines" and the third bullet point of that section (And the third sub-bullet point). Courcelles (talk) 17:55, 18 November 2015 (UTC)
As this might cause you to edit some of your questions, I'm going to stop for now. Courcelles (talk) 18:01, 18 November 2015 (UTC)
No actually this one was fairly clear-cut. I had already identified what I thought was the relevant bit, I just wanted it confirmed. Thanks. Only in death does duty end (talk)

5. In situation above, who decides what information is kept past the retention date? Is it documented? Is it up to the individual Checkuser? Only in death does duty end (talk)

All data that is "publicly" retained is either on the checkuserwiki or the arbwiki. (I assume other projects, where the local arbcom has a private wiki, might contain such data; I, of course, have no way of knowing.) I have no CU data personally retained, though I have access to some data that was sent through checkuser-l in my email, which is behind a strong password and two-factor authentication. Checkuser-l is not archived. Courcelles (talk) 17:57, 18 November 2015 (UTC)
Just to confirm - as Checkuser-l is not archived, any info sent through it that qualifies as 'Personal' while not formally kept in a mailing list archive, may be still available to anyone who had access to Checkuser-l at that point? Possibly still available in the email account they registered with? (depending on their personal archive/deletion process) Is there any requirement to use an email account under the control of the WMF? Only in death does duty end (talk)
Right, checkuser-l sometimes contains raw CU data for the purposes of cross-wiki identification and blocking of socks. And, yes, it can still be in people's emails; there is no way for you to verify what I've kept other than handing over my gmail password. Checkusers are not offered WMF email addresses; so none of us have any sort of WMF oversight of what is retained or not. Courcelles (talk) 17:10, 20 November 2015 (UTC)
Is there any provision that the email account be from a US provider? And do you know what % of current checkusers are based outside the US? (I don't need to know who or specifically where, just if there are non-US CU's and preferably what region, EU, Asia etc.) Only in death does duty end (talk) 10:07, 24 November 2015 (UTC)
Checkuser-l is global, so the subscribers are the stewards, and the CU's from each language Wikipedia that has CU's. Beyond guessing that it is somewhat likely that, say, some Japanese WP CU's live in Japan, or some Korean Wikipedia CU's live in the Republic of Korea, we can break them down by language spoken, but not by geography. No one cares about which email service people use; some use webmail (Yahoo, Gmail, etc.), some use their own domain name, some use their ISP's email, etc. Also, on the English Wikipedia specifically, several CU's are not from the United States; some openly identify as Canadian or British, for example. Courcelles (talk) 23:50, 3 December 2015 (UTC)

6. Who decides the length? Is it variable or is there a fixed period? Only in death does duty end (talk)

All checkusers are admins on the checkuser wiki, and all arbitrators are admins and bureaucrats on the arbwiki, which means that data there is effectively retained indefinitely, even when individual pages are deleted or blanked. Permanent removal would have to be done by a developer. Courcelles (talk) 17:59, 18 November 2015 (UTC)
So anyone who has the checkuser bit effectively decides on a personal basis if the information is kept, and this decision continues even once they are no longer a CU? (this is leading on to Qs 7 & 8 below, so I understand you may not know/have this info available to hand - but from your answer to the above, it appears the answer to Q8 is 'never'?) Only in death does duty end (talk)
If you lose the CU flag for any reason, including voluntary resignation, you lose your access to the checkuserwiki. Deleting pages on checkuserwiki and arbwiki makes them much harder to find, especially for newly elected members, but I cannot permanently remove anything once posted to either wiki with any method stronger than regular deletion. What people may retain privately is of course something I can't know. Courcelles (talk) 17:19, 20 November 2015 (UTC)
(The other thing is that the log of actual checks is retained indefinitely. This allows some limited comparison with suspected socks that are stale, but were once checked. If someone had checked you in the past, and then immediately afterwards checked a couple of IP addresses with the same summary, it is reasonably certain that those were your IP addresses. This only does me any good in being able to get a former geolocation and ISP, but sometimes that is useful information. Courcelles (talk) 17:19, 20 November 2015 (UTC))
Just to clarify: The log would for example have 3 checks in a row in a short period by the same CU, User-IP-IP, and the log itself would keep the IP addresses for the latter two checks indefinitely? Only in death does duty end (talk) 10:07, 24 November 2015 (UTC)
Yes. The log of what checks were made, whether they were made on an IP or a user, who made them, and when, is retained indefinitely or effectively so. I just looked, and my first checks after I was appointed in April 2011 are still in the logs. Courcelles (talk) 23:57, 3 December 2015 (UTC)

7. What processes are in place to audit the above decisions on keeping Private data past the WMF's retention date? Where is this documented? Who performs it? Only in death does duty end (talk)

There is the peer-review of the arbwiki, where there have been several attempts to "clean house" of things no longer necessary; and all members of the global ombudsman commission, all stewards, as well as certain WMF staff members (which I will not enumerate further) have access to checkuserwiki. Talking to the ombudsmen might be more useful in determining exactly how much work they do in this area. Courcelles (talk) 17:37, 20 November 2015 (UTC)
RE Peer review - Is this pro-active or reactive? (for random quality control or on request due to a complaint etc)
RE Ombudsman, what's the best venue/location to query them? Only in death does duty end (talk) 10:07, 24 November 2015 (UTC)
This page contains as much about contacting the Ombudsmen as I know. (They are not a very visible group!) Courcelles (talk) 23:54, 3 December 2015 (UTC)

8. When is this decision re-reviewed and the data allowed to lapse? (if at all) Only in death does duty end (talk)

Unless direct action is taken, data stored on the cuwiki will never lapse. There is activity in the deletion log there, but no one -- as far as I'm aware -- is systematically looking through the data stored there. (And of course, no one is looking through my, or anyone else's, email!) Courcelles (talk) 17:37, 20 November 2015 (UTC)

9. What auditing measures are in place to check that use of the checkuser tool has been used appropriately in the first place? (From browsing the AUSC back catalog, it does not appear they do this from public info I can see) Only in death does duty end (talk)

We do over 4,000 checkusers a month on this project. In my various terms on AUSC, I would from time to time look at the CU log to see a sample of what was going on, but full oversight is something I do not believe is happening. All checkusers require something to be entered in the "reasons" field; this can be an SPI, a suspected master, or sometimes the reason is rather perfunctory, because a look at the account's contributions makes it so obvious that no one would doubt why the account got checked. Courcelles (talk) 17:41, 20 November 2015 (UTC)

10. Who performs this? Only in death does duty end (talk)

11. What is the approx number of Checkusers performed each month? Only in death does duty end (talk)

No need to approximate, hard data is available at Wikipedia:Arbitration Committee/Audit Subcommittee/Statistics. If you want a longer than six months sample, the page history will supply it. Courcelles (talk) 17:49, 18 November 2015 (UTC)

12. What % of these are randomly audited to assure compliance with the WMF's data protection policies? Only in death does duty end (talk)

There are other follow on questions depending on answers to the above which I am holding back on for now.

Regards.