
Wikipedia:Bots/Requests for approval


BAG member instructions

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you, rather than running your own.

 Instructions for bot operators


Current requests for approval

DannyS712 bot 30

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:47, Saturday, April 20, 2019 (UTC)

Automatic, Supervised, or Manual: supervised

Programming language(s): Python

Source code available: Pywikipedia

Function overview: Update Wikipedia:Database reports/Polluted categories (3)

Links to relevant discussions (where appropriate):

Edit period(s): 1 edit per run, run as needed

Estimated number of pages affected: 1

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Using PAWS and pywikibot, update Wikipedia:Database reports/Polluted categories (3). The update code doesn't work perfectly yet, so some of the category names are broken, but overall it would be a net benefit. I've tested it on the test wiki, and the current output is visible at Special:Permalink/893288113.
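The report logic could look roughly like the following plain-Python sketch. It assumes (this is not stated in the request) that a "polluted" category is one containing both articles (namespace 0) and user pages (namespace 2), and that titles arrive as raw bytes with underscores, as the replica databases typically return them. Decoding them explicitly is one plausible fix for the broken category names, not a confirmed diagnosis. Saving the page through pywikibot is left out.

```python
from collections import defaultdict


def normalize_title(raw):
    """Decode a replica title (bytes, underscores) into a display title."""
    title = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    return title.replace("_", " ")


def polluted_categories(rows):
    """Return sorted names of categories holding both articles (ns 0) and user pages (ns 2).

    `rows` is an iterable of (category_title, namespace) pairs, e.g. from a
    categorylinks/page join on the replicas (assumed input shape).
    """
    seen = defaultdict(set)
    for raw_title, namespace in rows:
        seen[normalize_title(raw_title)].add(namespace)
    return sorted(cat for cat, spaces in seen.items() if {0, 2} <= spaces)


def render_report(categories):
    """Render the report body as a wikitext bullet list of category links."""
    return "\n".join(f"* [[:Category:{cat}]]" for cat in categories)


if __name__ == "__main__":
    rows = [(b"Living_people", 0), (b"Living_people", 2), (b"Caf\xc3\xa9s", 0)]
    print(render_report(polluted_categories(rows)))
```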

Code available at:


DannyS712 bot 29

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:44, Wednesday, April 17, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Replace links from file pages to Non-free content policy with links to Wikipedia:Non-free content criteria

Links to relevant discussions (where appropriate):

Edit period(s): One time run to clear the backlog, then as needed

Estimated number of pages affected: ~290 to begin with

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: According to Wikipedia:Most-wanted articles#1–100, at the end of March there were 289 links to Non-free content policy. A similar number is currently present. The vast majority of incoming links are from the file namespace, and should instead link to Wikipedia:Non-free content criteria. Compare File:Marc anthony-marc anthony-album.jpg (redlink to mainspace) with File:CleanGenius logo.png (proper link to policy). This task would only edit in the file namespace, and would not change the displayed text of piped links, only the links' target.
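A regular-expression sketch of the retargeting step described above, written in Python for illustration (the request itself uses AWB, whose find-and-replace also accepts regular expressions). It changes only the link target: piped labels are kept, and unpiped links gain a pipe so the displayed text stays the same. It does not handle underscores or other spelling variants of the title.

```python
import re

NEW_TARGET = "Wikipedia:Non-free content criteria"

# [[Non-free content policy]] or [[Non-free content policy|label]];
# MediaWiki ignores the case of a title's first letter, hence [Nn].
LINK_RE = re.compile(r"\[\[\s*([Nn]on-free content policy)\s*(\|[^\]]*)?\]\]")


def retarget(text):
    """Point links at the policy page without changing what readers see."""
    def repl(match):
        shown_title, pipe = match.group(1), match.group(2)
        if pipe:
            # Piped link: keep the existing label.
            return f"[[{NEW_TARGET}{pipe}]]"
        # Unpiped link: pipe the old title so the rendered text is unchanged.
        return f"[[{NEW_TARGET}|{shown_title}]]"
    return LINK_RE.sub(repl, text)
```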


The policy is at Wikipedia:Non-free content criteria. — JJMC89(T·C) 02:24, 17 April 2019 (UTC)

@JJMC89: Using AWB's list creator (What links here (all NS) (and to redirects)), I found that Wikipedia:Non-free content criteria has 1896 incoming links from pages in the file namespace, while Wikipedia:Non-free content has 9315 incoming links from pages in the file namespace. If indeed the criteria page is the correct target, then I can file another brfa to fix those >9000 other pages, but since both are labeled as policy pages is there any rule about which should be linked to? DannyS712 (talk) 02:38, 17 April 2019 (UTC)
WP:NFC is a guideline, which has one section that transcludes the policy, WP:NFCC. Which one you link to depends on context. In this case, the policy is the intended target. — JJMC89(T·C) 02:44, 17 April 2019 (UTC)
@JJMC89: in that case, shouldn't all files link to the criteria page? If you take a look at File:CleanGenius logo.png, it links to WP:NFC - should it be changed? When does context call for linking (from a file page) to WP:NFC? Thanks, --DannyS712 (talk) 02:48, 17 April 2019 (UTC)
Only if the link text (or other context) indicates that the target is the policy. I just checked and updated the rationale templates. I don't know when someone would want to link to NFC, but since it could happen and be reasonable, a bot shouldn't indiscriminately change the links. — JJMC89(T·C) 03:14, 17 April 2019 (UTC)
@JJMC89: I see - then the other brfa idea is a no-go. But, for this task (which I just updated) do you see any reason not to change the redlinks to point to WP:NFCC? --DannyS712 (talk) 03:26, 17 April 2019 (UTC)
No, those should be fixed. — JJMC89(T·C) 03:36, 17 April 2019 (UTC)

For files using standardized FUR templates (e.g. {{Non-free use rationale album cover}}, {{Non-free use rationale logo}}), you should consider blanking the override fields. I spot checked a few, and the "erroneous" text is effectively identical to the default text. This will decrease our maintenance overhead if we ever decide to change said text again. -FASTILY 09:07, 17 April 2019 (UTC)

@Fastily: what do you mean by override fields? Also, I'd prefer to do that as a separate task if it's done by bot, since it seems like a very different scope compared to the fewer than 300 pages this BRFA applies to initially --DannyS712 (talk) 09:11, 17 April 2019 (UTC)
FUR templates usually have override fields. See Template:Non-free use rationale album cover#Syntax. In the example you give above, File:Marc anthony-marc anthony-album.jpg, section "Other information", the text reads: "Use of the cover art in the article complies with Wikipedia non-free content policy and fair use under United States copyright law as described above. ". Compare this with the text on {{Non-free use rationale album cover}}, section "Other information", which reads "Use of the cover art in the article complies with Wikipedia non-free content policy and fair use under United States copyright law as described above." -FASTILY 09:16, 17 April 2019 (UTC)
@Fastily: I see. I'll look into it and work something up for a different BRFA --DannyS712 (talk) 17:56, 17 April 2019 (UTC)
No no, that's not what I meant. If you do that in another BRFA (after this task), you'll risk running afoul of WP:COSMETICBOT. -FASTILY 23:40, 17 April 2019 (UTC)
@Fastily: I don't think it's a cosmetic edit. The first example of non-cosmetic edits: Changes that are typically considered substantive affect something visible to readers and consumers of Wikipedia, such as the output text or HTML in ways that make a difference to the audio or visual rendering of a page in web browsers, screen readers, when printed, in PDFs, or when accessed through other forms of assistive technology (e.g. removing a deleted category, updating a template parameter, changing whitespace in bulleted vertical lists) - adding such links changes the output text. It provides useful links to both the relevant Wikipedia policy and a general explanation of fair use. --DannyS712 (talk) 23:46, 17 April 2019 (UTC)
I think you misunderstood me. *This* BRFA is not cosmetic. However, you say "I'll look into it and work something up for a different BRFA". *That* will be cosmetic. -FASTILY 00:18, 18 April 2019 (UTC)
@Fastily: I think you misunderstood me. *That* will *not* be cosmetic, since it adds the link(s) and standardizes the use of the template. But if that is a concern, it's possible to just remove the override parameter entirely (ensuring that this will not be a cosmetic edit, while also ensuring that only a one-time run is needed and that, in the future, deviations from the standard format are less prevalent). Thoughts? --DannyS712 (talk) 00:27, 18 April 2019 (UTC)
  Facepalm. Here is an example edit of what I'm talking about. If you don't want to do this, then don't. I've already outlined my reasoning above, so I won't be repeating it here. -FASTILY 00:53, 18 April 2019 (UTC)
@Fastily: Oh... in that case, I think that would be too much of a context bot. Sorry, --DannyS712 (talk) 01:10, 18 April 2019 (UTC)
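Fastily's check (is an override field effectively the same as the template's default text?) could be sketched as a comparison of the visible text, ignoring link targets and whitespace. The wikitext strings in the test are hypothetical reconstructions, not copied from the actual file page or template, and the sketch ignores other markup such as bold, italics, or nested templates.

```python
import re

# [[target|label]] -> label, [[target]] -> target
_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def visible_text(wikitext):
    """Approximate the rendered text: strip link targets, collapse whitespace."""
    return " ".join(_LINK.sub(r"\1", wikitext).split())


def is_redundant_override(override, default):
    """True when an override renders the same as the template's default text."""
    return visible_text(override) == visible_text(default)
```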

SportsStatsBot 2

Operator: DatGuy (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:49, Thursday, April 4, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available:

Function overview: Automatically update football (soccer) players' career statistics

Links to relevant discussions (where appropriate):

Edit period(s): Every 15 minutes

Estimated number of pages affected: Unsure

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: I've been running a few tests on testwiki. The bot runs a check every 15 minutes and uses data from (provided by Opta Sports)


  • Looking over the contributions at testwiki, that page doesn't appear to have any sources. How exactly would you add sources for these edits here on enwiki? --DannyS712 (talk) 04:52, 7 April 2019 (UTC)
  • Unless the plan is for a subset of footballers, there are more than 100,000 football biographies, so I think you'd want to think about how often the statistics need to be updated (even once a month would be in the range of tens of thousands of edits a month, depending on whether it is offseason of course) and discuss at WT:FOOTBALL. Galobtter (pingó mió) 07:44, 7 April 2019 (UTC)
  • The plan is for a case-by-case basis starting out. Afterwards if all goes well, I'll seek to gain consensus on categories for a specific league/country before mass-implementing any changes. Dat GuyTalkContribs 13:30, 7 April 2019 (UTC)
    @DatGuy: You say "case-by-case basis starting out" - what pages do you intend to start out with? --DannyS712 (talk) 00:52, 10 April 2019 (UTC)
Probably some Championship players. Dat GuyTalkContribs 11:03, 11 April 2019 (UTC)

  A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}.


Operator: Tymon.r (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:16, Friday, March 15, 2019 (UTC)

Function overview: Maintenance – automatic (procedural) closure of WP:AfD discussions when nominated pages do not exist

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: standard pywikipedia

Links to relevant discussions (where appropriate): n/a

Edit period(s): continuous, being run every 3 minutes

Estimated number of pages affected: up to a few a day

Namespace(s): Wikipedia

Exclusion compliant (Yes/No): No

Function details: Per WP:PCLOSE, the bot closes AfD discussions whose nominated pages do not exist, e.g. because they have already been speedily deleted or their title was mistyped, and informs the nominator about the closure on their user talk page. On every run the bot will go through the AfD log pages for the last 7 days and check whether each nominated page exists. If a nominated page doesn't exist, the bot will close (edit) that page's AfD discussion in accordance with WP:AFD/AI and then inform the nominator about the closure, stating its possible reasons (title mistyped, article speedily deleted, etc.). The bot will not perform any actions or closures where a judgment call is needed. In the future the bot's functionality could be extended to other XfDs, but if so, that would be requested in a separate BRFA. Best, Tymon.r Do you have any questions? 00:16, 15 March 2019 (UTC)
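The scanning step described above can be sketched as follows. Extracting the nominated titles from each log page is left out, the log-page title format is an assumption based on the AfD daily logs, and `page_exists` is any callable supplied by the caller (with pywikibot it could wrap `pywikibot.Page(site, title).exists()`).

```python
from datetime import date, timedelta

# Hard-coded English month names so the output does not depend on the locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def afd_log_titles(today, days=7):
    """Titles of the AfD daily log pages for the last `days` days, newest first."""
    titles = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        titles.append(
            f"Wikipedia:Articles for deletion/Log/{day.year} {MONTHS[day.month - 1]} {day.day}"
        )
    return titles


def nominations_to_close(nominated_titles, page_exists):
    """Nominated titles whose page no longer exists (candidates for a procedural close)."""
    return [title for title in nominated_titles if not page_exists(title)]
```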


I note User:AnomieBOT does something similar for WP:TFD and WP:FFD (and some related tasks at WP:CFD, but currently detecting the nominated categories there seems too prone to errors), although that doesn't preclude your bot doing this task for WP:AFD.

I see your BRFA says the source code is "standard pywikipedia", but I don't see any script included with Pywikibot for doing this task. Useful additional features compared to your manual diff include relaying the deleting admin and deletion reason from the log (after verifying it's not a deletion log entry previous to the AFD itself), detecting "moved without redirect" as being distinct from "nominated title does not exist", and allowing the deleting admin a chance to manually close before the bot does it for them. Anomie 13:21, 15 March 2019 (UTC)

@Anomie, thanks for your input and your work with User:AnomieBOT! Agreed – my description of how the bot is programmed is imprecise. The bot would be based on pywikipedia, but, as you mentioned, it would need some additional self-written scripts to handle non-standard operations, e.g. checking the log of a deleted page. As for the period in which an admin could close the AfD themselves, I'd propose a 10-minute delay before the bot performs an automated closure. Best, Tymon.r Do you have any questions? 13:24, 16 March 2019 (UTC)
  1. I note that AnomieBOT's tasks specify who deleted the relevant page (FFD, TFD) - do you intend to do the same?
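A sketch of the check Anomie describes, combined with the proposed 10-minute delay: only a deletion logged after the nomination is used, and the bot waits until the grace period has passed before closing. The `DeletionEntry` type is hypothetical; in practice the entries would come from the page's deletion log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

GRACE = timedelta(minutes=10)  # delay proposed above


@dataclass
class DeletionEntry:
    timestamp: datetime
    admin: str
    reason: str


def closure_basis(nominated_at: datetime, deletions: List[DeletionEntry],
                  now: datetime) -> Optional[DeletionEntry]:
    """Return the deletion to cite in a procedural close, or None to wait or skip."""
    # Ignore deletions older than the nomination (e.g. an earlier incarnation).
    relevant = [e for e in deletions if e.timestamp > nominated_at]
    if not relevant:
        return None  # mistyped title or page moved without redirect: handle separately
    latest = max(relevant, key=lambda e: e.timestamp)
    if now - latest.timestamp < GRACE:
        return None  # give the deleting admin a chance to close it manually
    return latest
```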
  2. Would there be a way for admins to opt-out of having their deletions closed for them?
Thanks, --DannyS712 (talk) 01:53, 22 March 2019 (UTC)
@DannyS712, thanks for your input.
  • Ad 1 – I believe it'd be worth having a unified message format for bots (procedurally) closing XfDs, so I'd probably retrieve this information from the page's log for each closure. I don't see it as strictly necessary, though; finding it out is as easy as clicking the red link.
  • Ad 2 – I don't see it as necessary, first and foremost because of the delay before the bot closes discussions automatically, which leaves the closer an adequate timeframe to do it on their own. It's in the interest of a smooth deletion process, and nothing will ever prevent a closing administrator from editing an AfD discussion, e.g. by adding comments regarding the deletion, even after it has been closed by the bot.
Best, Tymon.r Do you have any questions? 14:05, 24 March 2019 (UTC)
You are planning on "a few" edits per day, but need to run this 480 times a day? Are you going to be hosting this somewhere for continuous operations? — xaosflux Talk 02:14, 22 March 2019 (UTC)
@Xaosflux, yes, to catch all changes potentially relevant to the bot and to ensure that no discussion about an already removed article remains open for too long. This script wouldn't need much server capacity; running it even more often would still be cheap. I've been considering hosting it on WP:TOOLFORGE or my own VPS. Best, Tymon.r Do you have any questions? 14:05, 24 March 2019 (UTC)

I share the concerns of DannyS712 above. Additionally, if this is going to post to user talk pages, it should really be exclusion compliant (especially as posting to user talk pages isn't, AFAIK, standard practice when closing an AfD).
I would also question the benefit of running this over all 7 days' worth of logs rather than just recent nominations. How common is it for pages to be deleted for unrelated reasons after being at AfD for more than 24 hours or so?
Finally, like Xaosflux above, I'd question why this needs to run every 3 minutes; if you are going to scan all AfDs, running every 15 minutes at most is likely more appropriate. Mdann52 (talk) 07:42, 22 March 2019 (UTC)
@Mdann52, thanks for your input. I hope you'll consider my replies to Xaosflux and DannyS712 above. Regarding exclusion compliance – definitely agreed. Posting to users' talk pages should definitely be optional. Best, Tymon.r Do you have any questions? 14:05, 24 March 2019 (UTC)
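Exclusion compliance means honouring {{bots}} and {{nobots}} on the target page before posting. With pywikibot this is normally done via `Page.botMayEdit()`; the standalone sketch below handles only the common forms ({{nobots}}, {{bots}}, and {{bots|allow=...}} or {{bots|deny=...}} with `all`, `none`, or a comma-separated list of bot names) and ignores others such as `optout=`. The bot name used in the test is a placeholder.

```python
import re

BOTS_RE = re.compile(
    r"\{\{\s*(nobots|bots)\s*(?:\|\s*(allow|deny)\s*=\s*([^}]*))?\}\}",
    re.IGNORECASE,
)


def bot_may_edit(text, botname):
    """Simplified {{bots}}/{{nobots}} check; True means the bot may post."""
    for match in BOTS_RE.finditer(text):
        kind, mode, names = match.group(1).lower(), match.group(2), match.group(3)
        if kind == "nobots":
            return False  # {{nobots}} excludes every bot
        if mode is None:
            continue  # bare {{bots}} permits every bot
        listed = {name.strip().lower() for name in names.split(",")}
        named = "all" in listed or botname.lower() in listed
        # allow= lists the only permitted bots; deny= lists excluded ones.
        return named if mode.lower() == "allow" else not named
    return True
```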

{{BAGAssistanceNeeded}} It's been over a week already and no decision regarding the trial's been made. Tymon.r Do you have any questions? 10:13, 1 April 2019 (UTC)

I know I'm coming a little late to the party, but do you have any specific stats for how frequently this problem arises? I ask mainly because of the previous questions about how often this will run; if (for an extreme example) there is one "event" per day, it doesn't make much sense to have a bot do this task as there will undoubtedly be plenty of editors who will catch the mistake themselves. Primefac (talk) 20:55, 1 April 2019 (UTC)

Xinbenlv bot

Operator: Xinbenlv (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 06:29, Wednesday, February 20, 2019 (UTC)

Function overview: User:Xinbenlv_bot#Task 1: Notify editors (on the talk page) of cross-language inconsistencies in birthdays.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Javascript

Source code available: [1]

Links to relevant discussions (where appropriate): Wikipedia:Village_pump_(technical)/Archive_166#Cross_Lang_Conflicts

Edit period(s): daily or twice a week

Estimated number of pages affected: 30 per day to begin with, increasing to 100 per day if the community finds it helpful. The speed is completely controllable. Overall, there are a few thousand inconsistencies between major wikis, e.g. EN - JA (~3000) and EN - DE (~5000).

Namespace(s): Talk

Exclusion compliant (Yes/No): Yes

Adminbot (Yes/No): No

Function details:

The bot will notify editors by adding a new section to the talk page of a subject if that subject's birthday is inconsistent between this Wikipedia and another language edition.

The inconsistency data comes from a publicly available dataset on GitHub, called Project WikiLoop. Example edits look like this:

- Notifying French Editors fr:Utilisateur:Xinbenlv/sandbox/Project_Wikiloop/unique_value/Discussion:Samuel_Gathimba
- Notifying English Editors en:User:Xinbenlv/sandbox/Project_Wikiloop/unique_value/Talk:Samuel_Gathimba
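The core comparison behind these notices could look like this: given the birth date each language edition gives for the same subject, list the pairs that disagree. The input shape is an assumption (the WikiLoop dataset's actual format is not described here); the dates in the test are taken from the Ernst Joll example later in this request.

```python
from datetime import date
from itertools import combinations


def inconsistent_pairs(birth_dates):
    """Return (lang_a, date_a, lang_b, date_b) for every pair of editions that disagree.

    `birth_dates` maps a language code to a date, or None when that edition
    gives no birth date (such editions are skipped rather than flagged).
    """
    known = sorted((lang, d) for lang, d in birth_dates.items() if d is not None)
    return [(lang_a, date_a, lang_b, date_b)
            for (lang_a, date_a), (lang_b, date_b) in combinations(known, 2)
            if date_a != date_b]
```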


  • {{TakeNote}} This request specifies the bot account as the operator. A bot may not operate itself; please update the "Operator" field to indicate the account of the human running this bot. AnomieBOT 06:49, 20 February 2019 (UTC)
Fixed, changed to User:Xinbenlv. Xinbenlv (talk) 06:54, 20 February 2019 (UTC)
  • {{TakeNote}} This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 06:49, 20 February 2019 (UTC)
@Anomie, @AnomieBOT, Sorry, I mistakenly used my bot account to create its BRFA; those edits were made by me manually. The only automated edits by the bot are on its user page. Xinbenlv (talk) 06:52, 20 February 2019 (UTC)
Don't worry about it Xinbenlv. I've struck it now as the notice isn't relevant. --TheSandDoctor Talk 04:17, 21 February 2019 (UTC)
Thank you, that makes sense. I also updated the Not for operator. Let me know if I've not done it right. @TheSandDoctor. Xinbenlv (talk) 07:18, 21 February 2019 (UTC)
This bot is helping on cross-language inconsistency therefore it shall be editing other languages, how should I apply for global bot permission? Xinbenlv (talk) 06:53, 20 February 2019 (UTC)
@Xinbenlv:, m::BP should be what you're looking for. RhinosF1(chat)(status)(contribs) 16:16, 20 February 2019 (UTC)
Thank you, RhinosF1! It seems m::BP requires the bot to obtain local community approval and run locally for a while first. Therefore, I think I should apply for approval from each local community individually for now. Do I understand it correctly? Xinbenlv (talk) 18:48, 20 February 2019 (UTC)
Xinbenlv, that's how it read to me as well. It's probably best to make them aware anyway before launching anything that will affect them in a big way (e.g. mass notifications being issued). You don't want to cause confusion. RhinosF1(chat)(status)(contribs) 19:01, 20 February 2019 (UTC)
RhinosF1 Thanks, agreed! That's why I am asking for advice and approval on English Wikipedia, so this most active community can take a look at my (wild?) idea. Xinbenlv (talk) 19:07, 20 February 2019 (UTC)
Xinbenlv, I think it's a great idea. RhinosF1(chat)(status)(contribs) 19:25, 20 February 2019 (UTC)
Thanks to everyone who is interested. Just so you know, the bot has made two trial edits on German Wikipedia, as encouraged by the BRFA discussion. Feel free to take a look; advice is welcome! Xinbenlv (talk) 21:59, 21 February 2019 (UTC)
Added Xinbenlv (talk) 17:15, 25 February 2019 (UTC)
    1. How often is the dbase updated? Could this potentially result in one page receiving multiple notices simply because no one has either seen or cared enough to fix the missing information?
Database will be updated on a daily/weekly basis; it is currently still in development. I also plan to rely on "Xinbenlv_bot" to suppress notices on articles that have already been touched by the same bot. Xinbenlv (talk) 17:15, 25 February 2019 (UTC)
This seems like a reasonable task to deal with cross-wiki data problems, just want to get a better feel for the size and scope of the task. Primefac (talk) 20:26, 24 February 2019 (UTC)
Thanks @Primefac: If I apply to change the bot scope to be "=<200 edits in total" for first phase, what do you think? Xinbenlv (talk) 21:37, 24 February 2019 (UTC)
The number of edits per day/week/month can be discussed, I'm just looking for more information at the moment. Primefac (talk) 21:47, 24 February 2019 (UTC)
What can I do to provide the information you need? Xinbenlv (talk) 02:04, 25 February 2019 (UTC)
Just looking for some numbers. I assume you know where to find them better than I would. Primefac (talk) 02:07, 25 February 2019 (UTC)
@Primefac: The EN-JA file contains around ~3000 inconsistencies of birthdays, the EN-DE contains around ~5000 inconsistencies. To begin with, I think we can limit to 100 - 200 edits on English Wikipedia. Xinbenlv (talk) 16:47, 25 February 2019 (UTC)

@Xover's suggestion regarding using maintenance template

Would adding a maintenance template (that adds a tracking category) be a viable alternative to talk page notices? It might be more effort due to the inherently cross-project nature of the task, but talk page notices are rarely acted on, add extra noise to busy talk pages, and may cause serious annoyance since the enwp date may be correct (it may, for example, be the dewp article that's incorrect) and the local editors have no reasonable way to fix it. A tracking category can be attacked like any gnome task, and the use of a maint template provides the option of, for example, flagging a particular language Wikipedia as having a verified date or specifying that the inconsistency comes from Wikidata. In any case, cross-project inconsistencies are an increasingly visible problem due to Wikidata, so kudos for taking on this issue! --Xover (talk) 18:41, 25 February 2019 (UTC)

@Xover: thank you. So far, I am applying to 5 different wikis for a bot flag at the same time. I received 3 suggestions:
1. use template and transclusion
2. add category
3. put it in an over-article "cleanup" message box or a talk page message.
For #1 and #2, there is consensus among all responding communities (EN, DE, ZH, FR), so the trial edits on these communities now use a template and a category; see the ZH examples:
For #3 (an over-article "cleanup" message box), in the DE community some editors prefer a talk page message, while others prefer an over-article message box. My personal opinion is that we can start slowly, posting some talk page messages (around 200) as trial edits, and then, once they look good, consider approving the bot to post over-article messages. The reason is that I hope the bot demonstrates more stability before writing in the (article) namespace, especially on a high-impact wiki like English Wikipedia.
By the way, the format I prepare for English wikipedia is actually a maintenance template at User:Xinbenlv_bot/msg/inconsistent_birthday, could you take a look, @Xover:?
Xinbenlv bot (talk) 22:09, 25 February 2019 (UTC)
Well, assuming the technical operation of the bot is good (no bugs) maint. templates in article space are generally less "noisy" than talk page messages (well, except the big noisy banners that you say dewp want, but that's up to them). I suspect the enwp community will prefer the less noisy way, but I of course speak only for myself. In any case, I did a small bit of copyediting on the talk page message template. It changed the tone slightly, so you may not like it, and in any case you should feel free to revert it for whatever reason. Finally, you should probably use {{BAGAssistanceNeeded}} in the "Trial edits" section below. --Xover (talk) 05:22, 26 February 2019 (UTC)
There was a consensus to stop InternetArchiveBot from adding talk page notices. I suspect that if this bot were to start running that there would be a similar consensus to stop adding the same. My suggestion is not to do #3. --Izno (talk) 23:29, 29 March 2019 (UTC)

Trial Edits now available (in sandbox)

Dear all admins and editors,

I have generated 30 trial edits in the sandbox; you can find them in en:Category:Wikipedia:WikiProject_WikiLoop/Inconsistent_Birthday. I also generated 3 trial edits in the real Talk namespace.

Please take a look. Thank you!

Xinbenlv (talk) 00:13, 26 February 2019 (UTC)

Update: [2] shows editor @LouisAlain:, who happens to be the creator of en:Gaston_Blanquart (one of our 3 trial edits), updating the birth and death dates on English Wikipedia. Xinbenlv (talk) 08:22, 1 March 2019 (UTC)
Update : generated 10 more trial edits in Talk namespace, I will actively monitor them. Xinbenlv (talk) 08:33, 1 March 2019 (UTC)
Dear Admins and friends interested in this topic @RhinosF1:, @Primefac:, @Xover:, @TheSandDoctor:, how do I proceed to apply for the bot status? Xinbenlv (talk) 00:38, 7 March 2019 (UTC)

Confession: trial edits made before trial approval

{{BAG assistance needed}}

Dear Admin, I just realized that English Wikipedia requires trial approval before running trial edits, and I have already made 9 such edits in the (Article) namespace. Shall I revert the trial edits? I am sorry. Xinbenlv (talk) 21:13, 15 March 2019 (UTC)
@Xinbenlv: don't revert if they were good edits. — xaosflux Talk 13:49, 20 March 2019 (UTC)
@Xaosflux:, OK, thank you! By the way, is there anything else I need to do other than just wait for people to comment? It seems the discussion has halted.
How should I get trial approval?
Xinbenlv (talk) 18:08, 20 March 2019 (UTC)
Xinbenlv You just have to wait for a member of the bot approvals group to come and approve a trial. Galobtter (pingó mió) 15:07, 22 March 2019 (UTC)

Discussion Redux

Could I just verify something? I notice that all of the sandbox trials are placing what appear to be talk page sections, while it sounds like the majority of participants (on multiple languages) feel either a maintenance template or category are more appropriate to fix this issue.

In other words, the template you've made looks like it's a wall of text that (as mentioned previously) users aren't generally thrilled about dealing with. Is there another way to make this template look more like a "maintenance" template? Maybe just the intro line ("An automated process has determined...") and the table, with instructions to remove when checked? Something that can be placed at the top of a talk page? Primefac (talk) 20:16, 4 April 2019 (UTC)

@Primefac: thank you for your question.
Message box: My understanding of the consensus is the other way around; for example, on EN Wiki: "My suggestion is not to do #3. --Izno". On German Wikipedia, after a long discussion, they reached a consensus that a talk page section (not one that looks like a message box) is preferred.
Category: The category is in place, see en:Category:Wikipedia:WikiProject_WikiLoop/Inconsistent_Birthday, this is added by including the template.
Actually, I had an iteration that did message-box-like notification, but it was suggested I change it to a talk page section.
Something that makes this process very challenging is that this is a cross-language project, so we are trying to accommodate suggestions from different language Wikis while keeping them as aligned as possible, so that we can maintain them effectively across languages. See the FAQ at m:User:Xinbenlv_bot

On hold. I feel there's a sweet spot to be had. A short message done through a template would be ideal.

== Possible Wikidata issue==
{{Inconsistent Interwiki/Wikidata Issue<!-- Come up with a better name than this please -->
 |lang1=fr |subject1=Ernst Joll |date1=1902-06-19
 |lang2=en |subject2=Ernst Joll |date2=1902-09-10
}}
Automated notice by ~~~~

@RexxS and Pigsonthewing:, you're the resident Wikidata experts here. Could you come up with a template that generalizes to other interwiki/Wikidata conflicts? @Xinbenlv: feel free to participate in those efforts too. Until that template is designed, I'm going to put this on hold. Headbomb {t · c · p · b} 05:21, 9 April 2019 (UTC)
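A sketch of how the bot might generate the short template call proposed above, one numbered parameter set per language. The template name is the placeholder from the proposal and the parameter names follow it; both are hypothetical until the template is actually designed.

```python
def render_notice(entries, template="Inconsistent Interwiki/Wikidata Issue"):
    """Build the template call; `entries` is a list of (lang, title, iso_date) tuples."""
    lines = ["{{" + template]
    for i, (lang, title, iso_date) in enumerate(entries, start=1):
        lines.append(f" |lang{i}={lang} |subject{i}={title} |date{i}={iso_date}")
    lines.append("}}")
    return "\n".join(lines)
```

Keeping the notice to a single template call means wording changes happen in one place (the template) rather than in every posted notice.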

Thank you, agreed Xinbenlv (talk) 19:01, 15 April 2019 (UTC)

Bots in a trial period

DannyS712 bot 28

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:15, Monday, April 15, 2019 (UTC)

Automatic, Supervised, or Manual: supervised

Programming language(s): Python

Source code available: Pywikipedia

Function overview: Update Wikipedia:Database reports/Polluted categories (2)

Links to relevant discussions (where appropriate):

Edit period(s): 1 edit per run, run as needed

Estimated number of pages affected: 1

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Using PAWS and pywikibot, update Wikipedia:Database reports/Polluted categories (2). The update code doesn't work perfectly yet, so some of the category names are broken, but that doesn't pose a problem for Task 27 (which is the reason I created the database report) since it'll just skip the category when it has no members because it doesn't exist. I've tested it on the test wiki, and the current output is visible at Special:Permalink/892521503. I count fewer than two dozen broken category names.

Code available at


  Approved for trial (10 days). xaosflux Talk 03:38, 15 April 2019 (UTC)
@Xaosflux: how many times do you want me to run it within those 10 days? Once a day? --DannyS712 (talk) 03:38, 15 April 2019 (UTC)
You said it is 'run as needed' - so whatever you expect the normal run for this will end up being (keep it under 100 edits). — xaosflux Talk 03:41, 15 April 2019 (UTC)
@Xaosflux: okay, definitely. I expect to run it once every few days (assuming task 27 is approved, I'll run it twice, once before I want to run 27, and once afterwards to see what I may have missed). I've done the first edit if you want to take a look. Thanks so much, --DannyS712 (talk) 03:43, 15 April 2019 (UTC)


Operator: Bradv (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 19:31, Tuesday, April 9, 2019 (UTC)

Automatic, Supervised, or Manual: supervised

Programming language(s): Python

Source code available:

Function overview: Assist the clerks with announcements at the Arbitration Committee Noticeboard.

Links to relevant discussions (where appropriate):

Edit period(s): Continuous

Estimated number of pages affected: 2+

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: This bot will handle announcements from the Arbitration Committee per the clerk procedures. It automatically creates the appropriate talk page section, adds the "Discuss" link at the bottom of the post, and crossposts to WP:AN and the talk pages of any users mentioned within the announcement. To prevent duplicate entries or the propagation of inappropriate posts, the bot checks to ensure that the last user to edit the noticeboard is an arbitrator or clerk before acting, and will not create sections with duplicate names on target pages. The clerks will be encouraged to verify that the crossposting was done correctly after each announcement.
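The two safeguards described above (acting only when the last editor is an arbitrator or clerk, and never creating a section whose name already exists on the target page) could be sketched like this. The user names in the test are hypothetical, and fetching the page history and wikitext is left out.

```python
import re

# Matches "== Title ==" through "====== Title ======" on their own lines.
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$", re.MULTILINE)


def section_titles(wikitext):
    """Set of section headings present in a page's wikitext."""
    return {match.group(2) for match in HEADING_RE.finditer(wikitext)}


def should_crosspost(last_editor, trusted_users, section_title, target_wikitext):
    """True only if a trusted user made the last edit and the section is new on the target."""
    if last_editor not in trusted_users:
        return False  # ignore announcements not posted by an arbitrator or clerk
    return section_title not in section_titles(target_wikitext)
```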


Hi @Bradv:, can you link to any arbcom related discussion where this was requested/desired? — xaosflux Talk 19:41, 9 April 2019 (UTC)
Xaosflux, there has been discussion about this on the clerks' mailing list, but none onwiki so far. Other clerks and arbitrators are welcome to comment here. – bradv🍁 19:46, 9 April 2019 (UTC)
@Bradv: would you please drop a link in from the noticeboards in case anyone wants to know about this. — xaosflux Talk 19:53, 9 April 2019 (UTC)
Xaosflux, done. – bradv🍁 20:06, 9 April 2019 (UTC)
I've been following the clerks-l discussion, and the bot would take care of a number of tedious and time-consuming tasks. Miniapolis 20:00, 9 April 2019 (UTC)
@Bradv: will you want to be using the 'bot' flag on these edits (hiding from recent changes/watchlists?) — xaosflux Talk 20:21, 9 April 2019 (UTC)
Announcements crossposted to AN or to user talk pages should not be marked with the bot flag, but the changes to the noticeboard could. This is identified in the code and tested on testwiki, but a bot flag is not strictly necessary for this task. – bradv🍁 20:26, 9 April 2019 (UTC)
Can you estimate your volume (in edits/interval)? — xaosflux Talk 20:21, 9 April 2019 (UTC)
There have been 5 posts to ACN in the past month, each of which would have required 3 or 4 edits by this bot. – bradv🍁 20:26, 9 April 2019 (UTC)
I commented on the discussion on clerks-l and can say that this rather clearly had consensus among the Committee/clerks. I hadn't thought about the bot flag decision, but I'd lean toward no bot flag. There's really no need for one, since this bot wouldn't be doing any high-volume editing. ~ Rob13Talk 20:42, 9 April 2019 (UTC)
It seems unnecessary to run this every minute (as in the source code currently). Every five minutes is more reasonable, and even every 15 minutes seems sufficient, considering how few times per month the bot actually has to post. Running parser.parse every time is also a bit intensive; I'd suggest first checking whether the last edit to WP:ACN occurred after the bot's last run.
TBH, while a bot would work of course, this seems more suited to a user script that clerks could use when posting an announcement. Galobtter (pingó mió) 09:28, 10 April 2019 (UTC)
Galobtter, thanks for your feedback. I had the interval set for 60 seconds for testing, but can certainly slow that down when this goes live. Checking to see if the page has changed before calling the parser is a good idea as well - I will make that change.
The main reason to implement this functionality as a bot rather than a script is so that it does not require extra action by the arbitrators when posting announcements to this page, or extra scripts for them to install. Urgent notices from the committee occasionally get posted here, and a clerk may not be around to do the actual crossposting. We also have plans to extend this bot into other areas of the clerk procedures, so it makes sense to set this up as a bit of a framework. – bradv🍁 14:37, 10 April 2019 (UTC)
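Here is a minimal sketch of the "only parse when the page has changed" check. It assumes a caller-supplied function that returns the timezone-aware timestamp of the noticeboard's latest revision. That function is hypothetical; a real bot would fetch the timestamp through the API. The five-minute interval follows the suggestion above.

```python
import time
from datetime import datetime, timezone
from typing import Callable

POLL_INTERVAL_SECONDS = 5 * 60  # every five minutes, per the suggestion above


def poll_once(
    fetch_latest_timestamp: Callable[[], datetime],
    last_seen: datetime,
    handle_new_revision: Callable[[], None],
) -> datetime:
    """Run the expensive parse/crosspost step only if the page changed since last_seen."""
    latest = fetch_latest_timestamp()
    if latest > last_seen:
        handle_new_revision()
    return max(latest, last_seen)


def run_forever(
    fetch_latest_timestamp: Callable[[], datetime],
    handle_new_revision: Callable[[], None],
) -> None:
    # Assumption: start from "now", so posts made before the bot started are not re-handled.
    last_seen = datetime.now(timezone.utc)
    while True:
        last_seen = poll_once(fetch_latest_timestamp, last_seen, handle_new_revision)
        time.sleep(POLL_INTERVAL_SECONDS)
```

The cheap timestamp comparison runs on every poll. The full parse only happens when a new revision exists, which with about five announcements a month is rare.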
Hopefully I can help by weighing in as a current committee member. While a user script would not be unhelpful, I agree with Bradv that a bot is the superior solution. Among other factors, a script would need all committee members to install it – the bot works seamlessly in the background. AGK ■ 23:10, 13 April 2019 (UTC)
  Approved for trial (50 edits or 30 days). go ahead and try it out, report back the results of your testing here when done. — xaosflux Talk 15:11, 14 April 2019 (UTC)
@Bradv, AGK, and BU Rob13: I just discovered the bot's first bug. Special:Diff/892477425 has an improper section link to the discussion, likely caused by the section link in the noticeboard post's heading. As far as I can tell, every edit needs to be fixed, including the one to WP:A/N. The fix is either to escape the brackets in the heading or, more easily, to remove the link from the heading. I didn't fix them myself since they are official notices from the committee. Also, bug #2: the bot should respect redirects and follow them (and avoid double posting when it does follow them). See Special:Diff/892477395. Thanks, --DannyS712 (talk) 20:25, 14 April 2019 (UTC)
I migrated from wikitextparser to mwparserfromhell, which has the capability of stripping the template code from the title, and I added code to resolve the redirect. This has been updated and tested on testwiki, and should work fine unless GorillaWarfare finds a new way to trip me up. – bradv🍁 04:49, 15 April 2019 (UTC)
I'm sure I'll think of something... GorillaWarfare (talk) 05:12, 15 April 2019 (UTC)
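The two fixes can be sketched as follows: reducing a heading to plain text for use as a #section anchor, and resolving redirects without posting twice to the same page. The bot reportedly switched to mwparserfromhell for the first part. To stay dependency-free, this sketch uses plain regular expressions instead, which only handle simple, non-nested links and templates. The redirect mapping passed in is a hypothetical stand-in for API lookups.

```python
import re

# [[Target]] or [[Target|Label]], without nested brackets
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# {{...}} with no nested braces; dropped entirely, roughly as strip_code() does
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")


def plain_heading(heading: str) -> str:
    """Reduce wikitext in a heading to plain text usable as a section anchor."""
    text = TEMPLATE_RE.sub("", heading)
    text = LINK_RE.sub(lambda m: m.group(2) if m.group(2) is not None else m.group(1), text)
    return " ".join(text.split())  # collapse leftover whitespace


def resolve_targets(pages: list[str], redirects: dict[str, str], max_hops: int = 5) -> list[str]:
    """Follow redirects (max_hops guards against loops) and drop duplicates, keeping order."""
    resolved: list[str] = []
    for page in pages:
        for _ in range(max_hops):
            if page not in redirects:
                break
            page = redirects[page]
        if page not in resolved:
            resolved.append(page)
    return resolved
```

The deduplication runs after resolution. A redirect and its target therefore collapse into one crosspost, which is the double-posting case from bug #2.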

DannyS712 bot 13

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:01, Sunday, March 10, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: User:DannyS712 test/GTK.js

Function overview: Easily tag all of a category's subcategories with notices that they are being discussed at CfD

Links to relevant discussions (where appropriate): Wikipedia talk:Categories for discussion/Archive 17#Tagging bot

Edit period(s): As needed

Estimated number of pages affected: Varies, I'd guess ~50 per run

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Traverse the subcategories within a category and add tags to them. Examples of when this would be really useful are listed in that discussion, but in general it would be for mass CfDs ("Rename XYZ and all of its subcategories", etc.)
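The traversal step can be sketched as a breadth-first walk over a category tree. This is an illustration only. The `children` mapping is a hypothetical in-memory stand-in for however the subcategory lists are actually fetched. The `limit` parameter is an assumed per-run cap, one of the questions raised in this discussion. The `seen` set matters because category trees on the wiki can contain loops.

```python
from collections import deque


def collect_subcategories(root: str, children: dict[str, list[str]], limit: int) -> list[str]:
    """Breadth-first list of root and all its subcategories.

    `children` maps a category to its direct subcategories (hypothetical data source).
    `seen` prevents revisiting categories when the tree contains a loop.
    `limit` caps the total number returned (root included); exceeding it raises
    so that a human decides whether the nomination is too broad.
    """
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for sub in children.get(current, []):
            if sub in seen:
                continue
            seen.add(sub)
            order.append(sub)
            if len(order) > limit:
                raise ValueError(f"more than {limit} categories under {root}; review manually")
            queue.append(sub)
    return order
```

Raising instead of silently truncating means a partly tagged nomination is never produced by accident.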


How would you ever know it is time to run this task? — xaosflux Talk 03:20, 11 March 2019 (UTC)
@Xaosflux: It would run as-needed (I'd leave a note at CfD saying that if people wanted to nominate an entire category tree, or a list of categories that's really long, and don't want to tag them manually, I could do it for them). --DannyS712 (talk) 03:24, 11 March 2019 (UTC)
@DannyS712: this almost feels like it would be better as a user script (e.g. 'xfd-batch' for twinkle, with a 'recurse' option), any thoughts?
  • Do you plan on any sort of max-tag-per-request limits here? — xaosflux Talk 03:33, 11 March 2019 (UTC)
    @Xaosflux: this would be a user script, and I would be willing to run it from my account rather than the bot's. I don't know what xfd-batch is (admin only?), but WP:BOTDICT defines automated editing as Refers to editing that is done automatically, without human review, i.e. editing done by bots. Since I do not intend to review each edit, I thought I should err on the safe side and file a BRFA. --DannyS712 (talk) 03:37, 11 March 2019 (UTC)
    As for max-tag-per-request, since it would be triggered manually I would decide for each request whether it seems too broad --DannyS712 (talk) 03:38, 11 March 2019 (UTC)
  • 'xfd-batch' doesn't exist in twinkle, I was comparing it to some options like delete-batch, protect-batch. — xaosflux Talk 03:46, 11 March 2019 (UTC)
    @Xaosflux: Then yes, that is very similar to what I would be doing, but I filed this BRFA to be on the safe side (see explanation above) --DannyS712 (talk) 10:58, 11 March 2019 (UTC)
  • As far as I can see, this is not a bot request. It's a user script, which may or may not be shared to users beyond its creator.
From what I can see, the script above is way too simplistic, and is suitable only for some very simple cases at CFDS. I am concerned that releasing it for wider use will lead to it being used in the many more complex cases where it will produce inaccurate output.
The task which it performs is one which I encounter several times a month, for full CFD discussions, CFDS nominations, and WP:RM nominations. I do it by using a set of AWB custom modules which I hack on a per-case basis. My experience is that
  • In a bit less than half the cases, the tagging can be achieved by a plaintext replace function
  • In the rest, one or more regexes are needed
  • In all cases, some care is needed to ensure that all 3 tasks are performed accurately:
    • tag so that the tag includes the name of the target category, e.g.
      {{cfr}} → a tag saying "rename to some other title"
      .. but {{cfr|MyNewTitle}} → a tag saying "rename to Category:MyNewTitle"
    • tag so that the name of the discussion section is included, otherwise the links will point to the wrong place. e.g.
      Wrong: {{cfr|MyNewTitle}}
      Right: {{cfr|MyNewTitle|DiscussionSectionHeading}}
    • a meaningful edit summary. The edit summary should both describe the proposed action, and the location of the discussion
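The three requirements above can be sketched for a full CFR on a daily log page. The function names and the exact wording of the edit summary are illustrative assumptions. Only the {{cfr|NewTitle|SectionHeading}} shape comes from the list above.

```python
def build_cfr_tag(new_title: str, section: str) -> str:
    """Tag naming the target category and the discussion section, e.g. {{cfr|New|Section}}."""
    return "{{cfr|" + new_title + "|" + section + "}}"


def edit_summary(new_title: str, log_page: str, section: str) -> str:
    """Describe the proposed action and link to where it is discussed (wording is illustrative)."""
    return (
        f"Tagging for CFD: proposed rename to [[:Category:{new_title}]]; "
        f"discussion at [[{log_page}#{section}]]"
    )


def tag_page(text: str, new_title: str, section: str) -> str:
    """Prepend the tag once; re-running on an already-tagged page changes nothing."""
    tag = build_cfr_tag(new_title, section)
    if tag in text:
        return text
    return tag + "\n" + text
```

Keeping the tag and summary in one place means a wrong section name produces a wrong link in both. That makes the error easy to spot on the first edit.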
Some examples:
  1. CFDS plus subcats:
    • CFDS listing:[3]
    • tagging example [4]
    • code needed: plain text replace
  2. Full CFR discussion of by-year cats for British Empire / British Overseas territories
  3. Full CFR ~650 "Republic of Macedonia" categories:
    • CFD discussion: WP:Categories for discussion/Log/2019 February 16#North_Macedonia
    • tagging examples: [6], [7]
    • code needed: a single regex, to accommodate the fact that some of the old titles were of the form "Republic of Macedonia foo" and some of the form "Foo in the Republic of Macedonia". The word "the" needed to be removed if present, so the regex was s/([tT]he +)?Republic +of +Macedonia/North Macedonia/
Code such as this can probably do a good job in some simple cases. Unfortunately, there are many other cases where it risks mistagging dozens of categories.
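The substitution can be checked with Python's re module, which accepts this pattern with the same meaning. The sketch only demonstrates the regex on two made-up category names. It is not the AWB module used for the actual tagging run.

```python
import re

# Optional leading "the "/"The " followed by the old name; the optional group
# lets one pattern cover both title forms described above.
MACEDONIA_RE = re.compile(r"([tT]he +)?Republic +of +Macedonia")


def new_category_name(old_name: str) -> str:
    """Return the proposed new title for a category (page name in, page name out)."""
    return MACEDONIA_RE.sub("North Macedonia", old_name)


print(new_category_name("Category:Republic of Macedonia footballers"))
# Category:North Macedonia footballers
print(new_category_name("Category:Sport in the Republic of Macedonia"))
# Category:Sport in North Macedonia
```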
I also think that javascript is not a good tool for these uses, because it does not allow a test of the first edit before proceeding, manual intervention for edge cases, etc. When I use AWB, I do the first edit, then stop and check its effects: is that tag correct? Is it linking to the right discussion section on the right page? Is the edit summary accurate, and does it link correctly? I then check a few more variants before whacking the save button repeatedly through the rest of the list.
AFAICS, a javascript tool will just proceed through the list in one go, with no possibility of intervention if there is an unforeseen error (which in my experience there often is).
I have huge regard for Danny's skills and conscientiousness both as an editor and as a programmer (he really should be an admin), but in this case I think he is using the wrong tool, and has not taken enough account of the many variations which arise in this sort of group nomination. --BrownHairedGirl (talk) • (contribs) 09:02, 21 March 2019 (UTC)
@BrownHairedGirl: This task would extend to types 1 and 3 listed above, both of which are done using a single regex. --DannyS712 (talk) 18:53, 21 March 2019 (UTC)
@DannyS712, my concern is that any script will end up being used by other editors for tasks where it wouldn't do the job properly ... as indeed with your own proposal of it at WP:Categories for discussion/Log/2019 March 13#Category:American_female_rappers. --BrownHairedGirl (talk) • (contribs) 20:21, 21 March 2019 (UTC)
@BrownHairedGirl: By "if @Xaosflux or another BAG member wants to trial this task", I meant "send it to trial for me to run". All of my tasks that are done in javascript are hosted on-wiki, so their source code is visible to users (I haven't figured out how to use toolforge yet). The implicit understanding is that using them without permission is just like using any other tool to make bot-like edits without permission - against policy. I won't venture into BEANS territory, but any script I make will, as far as I am concerned, only be run by me - if another editor tries to use the script, they are responsible. As for doing the job properly, I would set the regex for each run, and would manually check (from the bot account) a few edits to see that it works before setting the bot loose on an entire category. The reason I didn't do it with AWB is that the regex relies on the name of the category itself, which as far as I am aware can't be accessed from the regex within AWB. I hope this explanation allays your concerns. Thanks, --DannyS712 (talk) 21:40, 21 March 2019 (UTC)
AWB can access the pagename through custom modules. I am using a simple one right now (with 2 alternative plaintext replaces) to tag >1000 categories for WP:Categories for discussion/Log/2019 March 21#Places_of_worship. --BrownHairedGirl (talk) • (contribs) 22:08, 21 March 2019 (UTC)
@BrownHairedGirl: in that case, would you mind sending me that custom module? I'll look into it, and maybe change this task to be AWB-based --DannyS712 (talk) 22:13, 21 March 2019 (UTC)
@DannyS712: Email sent. --BrownHairedGirl (talk) • (contribs) 22:46, 21 March 2019 (UTC)
@BrownHairedGirl: Thanks, but how does one use a custom module in the first place? --DannyS712 (talk) 23:01, 21 March 2019 (UTC)
@DannyS712: Menu bar → Tools → Make module. Then paste in your module, enable module at top left, then "make module" on the right. --BrownHairedGirl (talk) • (contribs) 23:05, 21 March 2019 (UTC)
@BrownHairedGirl: After 1 mistake, I managed to use the module to tag a number of categories for CfD (a nomination had been made without tags) - contribs: here. Given that this worked so easily, I'm changing this task to AWB. @Xaosflux does this satisfy your concerns? --DannyS712 (talk) 02:22, 25 March 2019 (UTC)
@DannyS712, I'm glad that worked. But it was a relatively simple case, without any need for multiple regexes. I'd be happier to see you deploying an adapted version of the module on more complex cases before this gets the bot flag. --BrownHairedGirl (talk) • (contribs) 09:09, 26 March 2019 (UTC)
@BrownHairedGirl: Sure, let me know when you have another mass-nomination. --DannyS712 (talk) 20:16, 27 March 2019 (UTC)
To be honest, @DannyS712, I have a pile of tools already made, so whenever I have a mass nom, I find it quick and easy to just tag them myself. I usually use AWB to create the lists for the nom, so it's a tiny extra bit of work to then chuck in the appropriate module and tag; probably less effort than explaining it. But I'll try to remember to pass one your way so that you can test the bot. --BrownHairedGirl (talk) • (contribs) 20:34, 27 March 2019 (UTC)
@BrownHairedGirl: Thanks, --DannyS712 (talk) 20:39, 27 March 2019 (UTC)

Request, copied from wp:botreq: "There was a request to move categories with "eSports" to "esports" per WP:C2D at WT:VG, but that list is sizable. Is there someone here who can take care of the listing and tagging? (Avoid the WikiProject assessment categories.)" (made by @Izno:) - can I do this with my bot as a trial? --DannyS712 (talk) 19:50, 1 April 2019 (UTC)

also, tagging the category (and sub categories) discussed at: Wikipedia talk:Categories for discussion#Anyone wants to standardize the naming in Category:Criminals by occupation? --DannyS712 (talk) 23:51, 1 April 2019 (UTC)
{{BAGAssistanceNeeded}} no BAG input since Xaosflux's questions on 11 March. I'd like to request a trial with one or both of the category sets I mention above. Thanks, --DannyS712 (talk) 23:20, 3 April 2019 (UTC)
  Approved for trial (Up to 3 test cases) (that is, 3 mass nominations). Preferably use categories that have relatively few pages. This is to be done as a semi-automated task during at least the first 10 edits of each mass nomination, with high levels of scrutiny. And since this concerns deletion tagging, the bot flag should not be set; we want those popping up in watchlists. Headbomb {t · c · p · b} 04:44, 9 April 2019 (UTC)
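The trial condition can be sketched as a small driver loop: human review of at least the first 10 edits of each mass nomination, then automatic saving. The callbacks for building, confirming, and saving an edit are hypothetical placeholders, not AWB or pywikibot APIs. The point is only that the first rejected edit stops the run before anything further is saved, matching the check-then-proceed workflow described earlier in this request.

```python
from typing import Callable, Iterable


def run_with_review(
    pages: Iterable[str],
    make_edit: Callable[[str], str],
    confirm: Callable[[str, str], bool],
    save: Callable[[str, str], None],
    reviewed_edits: int = 10,
) -> int:
    """Save edits, asking a human to confirm each of the first `reviewed_edits`.

    A rejection during the review phase stops the whole run, so an unforeseen
    error can be fixed before anything else is saved. Returns the number saved.
    """
    saved = 0
    for index, page in enumerate(pages):
        new_text = make_edit(page)
        if index < reviewed_edits and not confirm(page, new_text):
            break
        save(page, new_text)
        saved += 1
    return saved
```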
@Headbomb: how do I turn off the bot flag? --DannyS712 (talk) 04:45, 9 April 2019 (UTC)
Actually, I don't know. @Magioladitis and Reedy:, any help here? Headbomb {t · c · p · b} 04:50, 9 April 2019 (UTC)
Just run the code from your main account (it does not have the bot flag) to check that it actually works. -- Magioladitis (talk) 18:15, 9 April 2019 (UTC)
@Izno: I'm going to start with your request - I made a list at User:DannyS712 bot/Task 13 - you want the first 49 categories tagged for moving from "eSports" to "esports" per C2D, right? Headbomb, what do you mean by having few articles? I'm only editing category pages. --DannyS712 (talk) 04:57, 9 April 2019 (UTC)
I meant pages, fixed above. Headbomb {t · c · p · b} 04:59, 9 April 2019 (UTC)
@Headbomb and Izno: I tagged all 48 eSports to esports categories. I'll do the second trial run soon --DannyS712 (talk) 00:37, 10 April 2019 (UTC)
@DannyS712: I'll review soon, but any issue with things as far as you can tell? Headbomb {t · c · p · b} 00:39, 10 April 2019 (UTC)
@Headbomb: some (2) of them were soft redirects to "esports" already, but I tagged them anyway so that it can be official. --DannyS712 (talk) 00:41, 10 April 2019 (UTC)
I don't see the edits on the bot contribs? Headbomb {t · c · p · b} 00:44, 10 April 2019 (UTC)
@Headbomb: Contribs were in my main account per Magioladitis' suggestion - see [8] --DannyS712 (talk) 00:45, 10 April 2019 (UTC)
@Headbomb: it wasn't fun to click the same button 48 times. I understand your concern about appearing on watchlists, but that will also apply if the task is approved, right? Meaning that this task should never be run from User:DannyS712 bot. Should I make another bot account for unflagged tasks? And, if so, can I use that for the rest of this trial? --DannyS712 (talk) 00:44, 10 April 2019 (UTC)
Up to you from which account you want to run this, DannyS712 bot, or DannyS712 bot 2 or whatever, but if it's an AWB bot, you'll either have to do it semi-automatically with a non-flagged bot account, or figure out how to not use the bot flag from a bot account with AWB. Magioladitis or Reedy would have insight here, so if they don't reply, I suggest talking to them on their talk page. Headbomb {t · c · p · b} 00:52, 10 April 2019 (UTC)
@Headbomb: I'll just use User:DannyS712 bot 2 - can you give it AWB bot rights (but not a bot flag) so it can be automatic? --DannyS712 (talk) 01:06, 10 April 2019 (UTC)
Again, I'm not sure that you can do that. Magioladitis or Reedy would know. Headbomb {t · c · p · b} 01:14, 10 April 2019 (UTC)
@Headbomb: what do you mean? Magioladitis' response above was Just run the code from your main account (it does not have the bot flag) - no, I have a separate account that won't have a bot flag, like you suggested --DannyS712 (talk) 01:15, 10 April 2019 (UTC)
I mean let's wait for @Magioladitis and Reedy: to tell us if there's a way to run an AWB bot without flagging edits with the bot flag. Unless you don't mind a clickfest. Headbomb {t · c · p · b} 01:18, 10 April 2019 (UTC)
@Headbomb: Oh, okay. I'll wait for a response for a few days before running the next trial --DannyS712 (talk) 01:23, 10 April 2019 (UTC)
The first trial run was a success, and the categories were successfully speedy renamed - see Special:Diff/892143917. I'm still waiting to hear about how I should proceed though --DannyS712 (talk) 21:30, 12 April 2019 (UTC)
@Headbomb: can I just use User:DannyS712 bot 2 with AWB, and have the bot account not have a bot flag? That way I can "run an AWB bot without flagging edits with the bot flag" (since the account wouldn't have a bot flag) and could proceed with the trial. If we ever discover a way to not use the bot flag even when an account has it, we can migrate the task to User:DannyS712 bot, but until then, I'd like to be able to proceed. Thanks, --DannyS712 (talk) 21:17, 14 April 2019 (UTC)
If you're willing to mash the save button three hundred billionty times yourself or setup a drinking bird to do it for you, that's fine by me. Headbomb {t · c · p · b} 21:58, 14 April 2019 (UTC)
@Headbomb: or, the second account can be added to Wikipedia:AutoWikiBrowser/CheckPage#Bots --DannyS712 (talk) 22:07, 14 April 2019 (UTC)
@Headbomb: Can the second account please be added so I can continue the trial? --DannyS712 (talk) 00:46, 17 April 2019 (UTC)
@DannyS712: can't, I'm not an admin. No objection to someone else adding the 2nd account though. Not sure it'll work, but it's worth a try. Headbomb {t · c · p · b} 05:35, 17 April 2019 (UTC)
@Headbomb: Per Wikipedia:Requests for permissions/AutoWikiBrowser For bots, please consult the Bot Approvals Group or bureaucrats' noticeboard about adding AWB access. - can you make a request there? Or should I? --DannyS712 (talk) 05:47, 17 April 2019 (UTC)
Going to ping @Primefac and Xaosflux: I believe both are 'crats. Headbomb {t · c · p · b} 05:57, 17 April 2019 (UTC)

───────────────────────── @Headbomb: I have not read through all of this, but three hundred billionty times seems like a lot of volume to be running an unflagged bot at. What are the actual volumes this may generate? Flooding of recent changes is another concern for sure. As far as the AWB access list goes, if it is in the AWB 'bots' section of AWBCB it will get the bots tab last I checked. — xaosflux Talk 11:35, 17 April 2019 (UTC)

Eleventy billion times really being a few dozen to a few hundred times. And can't appearing in recent changes be bypassed through a confirmed account or similar whitelisting? Headbomb {t · c · p · b} 12:02, 17 April 2019 (UTC)
Not without every single user manually doing that. So here are my thoughts on this request. If you want it to work in AWB without tagging everything as 'bot', then it will need a second account. Without a bot flag it could possibly get tripped up on an abuse filter or something, so add +confirmed/+extendedconfirmed as well. From a bot-task-approval point of view, running this on the 'low end' doesn't seem to be an issue. For example, Category:Internationalization and localization and all of its subcategories seems sane. Running it on something like Category:Scientists and all of its recursive subcategories would lead to massive flooding of recent changes and watchlists, not to mention requiring huge work to undo everything in the event of a "keep" closure. Just thinking of the work that will be pushed onto other editors in the event of CFD "keep" results makes me wary of this task as well. Do the CFD people really need to see every page tagged when dealing with some sort of massive nomination? If not, what is a realistic upper bound for such a nomination? — xaosflux Talk 12:56, 17 April 2019 (UTC)
I'm not sure there is a bound. If someone proposes a rename of the 180 subcategories of Category:Scientists by nationality from "Nationality Scientists" to "Scientist from Nation", then per CFD process, all categories need to be tagged. Flooding recent changes isn't useful, but there are ways to bypass that without the bot flag. Flooding watchlist isn't really a concern, or at least not one that takes precedence over advertising the discussion on the relevant categories. Headbomb {t · c · p · b} 14:59, 17 April 2019 (UTC)
@Headbomb: Also, for major CfDs I think if they are advertised fully (like the Macedonia or Organization discussions) they aren't each tagged individually, but otherwise I agree that there isn't really a bound. However, I can agree to be reasonable in my tagging runs --DannyS712 (talk) 15:11, 17 April 2019 (UTC)
@Xaosflux asks Do the CFD people really need to see every page tagged when dealing with some sort of massive nomination? Answer: yes. There is no upper bound.
The biggest set I know of recently is CFD:2019 March 21#Places_of_worship with 2057 categories to be renamed. All tagged by me without drama.
It seems to me that concerns about flooding watchlists are misplaced. This is not like a bot edit to a featured article, which may be on hundreds of watchlists; these are edits to almost-unwatched pages. Most lower-level categories have been edited by about two or three editors, unlike articles which may have been edited by dozens or hundreds of editors. I make several noms per month which involve large numbers of categories, and my tagging never raises eyebrows. So I am surprised to see so many eyebrows raised here at Danny's proposal. It's just a routine issue which should be automated as Danny proposes. --BrownHairedGirl (talk) • (contribs) 17:44, 17 April 2019 (UTC)
@BrownHairedGirl: thanks for the note, I don't dive into CfD much. I assume that some sort of realistic bound would exist (at the extreme, something like "let's rename 'stubs' to 'shorts' and tag every single lower category in Category:Stub categories"), right? I'm less worried about flooding watchlists than about Special:RecentChanges - but it's also not like this is something expected to happen all the time. Even in your example above, I personally think that is insane: was someone going to refuse to let the discussion proceed if one of the subcategories in that series didn't have a tag on it? Do you expect a unique argument that only applies to Category:Former places of worship in Oregon, for example? Especially if these are "almost unwatched", who do you expect to benefit from all the tagging? — xaosflux Talk 18:04, 17 April 2019 (UTC)
@Xaosflux: for one thing, each category is usually created by someone else, and thus has different watchers. It would be similar to Wikipedia:Miscellany for deletion/Mass-created portals based on a single navbox, except not tagging the portals for deletion - to any watchers, it would appear that the target is deleted (or in the case of CfD merged / renamed) without any discussion, because no notice was posted. Unless a page is tagged for discussion, it can only be deleted by CSD (or prod, which requires a notice). Even for "speedy renaming" a notice is needed for 48 hours. In short, I agree that may seem insane, but this is the current practice (and policy I believe) and until it changes I would like to help automate it. --DannyS712 (talk) 18:10, 17 April 2019 (UTC)

─────────────────────────I understand, this is just a pause for a sanity check. If we have a wasteful process that will be exacerbated by throwing bots at it, we should fix it instead. I'm not convinced one way or the other if the current process is wasteful or useful though - thus this discussion. — xaosflux Talk 18:16, 17 April 2019 (UTC)

(ec) @Xaosflux, this is primarily not about unique arguments relating to e.g. Category:Former places of worship in Oregon. (That can happen, but it is rare.) It's about ensuring that the proposed change is advertised on the pages which readers actually visit.
Readers rarely watchlist categories. They are watchlisted when edited, and they are rarely edited. So the way that editors get to know about a change is when they see a tag on a category which they visit. Most people view the lowest-level categories, because that's where the actual pages are: there should be almost no pages in e.g. Category:Places of worship in the United States, because they should be diffused to the bottom rung, e.g. Category:Places of worship in Mercer County, Pennsylvania. So the category which actually appears on an article is the lowest-level one.
No, it's unlikely that a big discussion would be derailed by one missed tag. But it is common for nomination of a big set to be rejected if a significant chunk of the subcats are either not listed or not tagged, because the omission means that those potentially interested will be unlikely to know about it. The tagging also triggers article alerts system, which is crucial.
This is not insane, and I am surprised to see that label applied to a process you seem to be unfamiliar with. It's about building consensus by ensuring that all those interested have a reasonable chance of being aware of the discussion. Otherwise, the shit hits the fan after a change, when editors pile in to say "you merged or renamed dozens of categories within the scope of WikiProject FooBar, but because there was no tagging it didn't trigger article alerts, so none of the editors here knew about it".
A systemic change like renaming 'stubs' to 'shorts' would probably need to be handled at RFC. That happens periodically when the set size is well into the thousands, e.g. in the just-closed RFC I started at RFC: spelling of "organisation"/"organization" in descriptive category names (permalink) ... but CFD is well used to handling set sizes in the hundreds. --BrownHairedGirl (talk) • (contribs) 18:49, 17 April 2019 (UTC)
@BrownHairedGirl: I suppose insane was a bit of a hyperbole, are you suggesting that there could still be an upper bound (it could certainly be ">=1000")? On that Org..[S|Z].. one we certainly don't expect individual category tags, correct? — xaosflux Talk 18:58, 17 April 2019 (UTC)
(ec) @Xaosflux wrote if we have a wasteful process that will be exacerbated by throwing bots at it, we should fix it instead. I'm not convinced one way or the other if the current process is wasteful or useful though - thus this discussion.
This is a bit tedious. If you want to propose a change to the CFD process, then WT:CFD is the place to go. But I do find it fairly exasperating that when editors with very significant experience of CFD start to explain why the process is as it is, the response here is to pejoratively label it as insane.
The reality is simple:
  1. this is how CFD works, for sound reasons
  2. There might be other ways to ensure adequate notification, but I have seen no proposal which suggests specific, workable alternatives. If some such proposal emerges and if it gains RFC consensus, then we can reconsider the need for a bot, but that's a lot of ifs
  3. Right now, this tagging is handled either by individual editors using their own AWB, or by one-off requests to WP:BOTREQ where other AWB users help
  4. Danny's proposal would be a big help by allowing one single bot to be advertised as doing this, rather than having random AWB owners relearn from the ground up each time. That would significantly lower the barriers to creating mass CFDs (which I define as those of more than a dozen categories), allowing a lot more big issues to be tackled
Sorry to be blunt, but if the word insane is going to be used, then the only insanity I see here is the huge bureaucratic hurdle placed in Danny's way. --BrownHairedGirl (talk) • (contribs) 19:06, 17 April 2019 (UTC)
@Xaosflux, the ORG issue is not a CFD. It's about a change in policy, which will be implemented by CFDS nominations, and yes, those will be tagged with {{subst:cfr-speedy}}. Take a guess why. --BrownHairedGirl (talk) • (contribs)
@BrownHairedGirl: when we engage bots in processes it usually makes things better and more reliable, and I agree this seems like a useful process if there were for example 100 categories to deal with in a single nomination, but I think it would be a bad idea to do it for 10,000 - and I really don't expect any human editor to tag 10,000 pages for renaming or deleting in a nomination. Generally, if an edit shouldn't be made by a live editor it shouldn't be made by a bot either. I expect the realistic bound is somewhere short of that 10,000 number as well - do you really think that nominations of that size or even larger require that many edits? If not, what is a good count to lay down some guidance? — xaosflux Talk 19:20, 17 April 2019 (UTC)
The place to raise concerns about limits on CFD tags is at CFD and not here. Even if 10,000 cats are tagged, there's no actual problem caused by tagging (save perhaps causing WP:AALERTS to choke on some pages, maybe). Suppose my interest is in a page like First Universalist Church of Sharpsville, categorized in Category:Places of worship in Mercer County, Pennsylvania. How else am I going to know about the discussion if Category:Places of worship in Mercer County, Pennsylvania isn't tagged? That's why advertisement is required. Normally this is done by humans, but bots should get involved in the bigger cases to save humans the hassle. Headbomb {t · c · p · b} 19:48, 17 April 2019 (UTC)
(ec) @Xaosflux, I'm very wary of setting hard numbers, because they can become targets. There's a danger that a limit of X thousand becomes a sorta green light for up to that number, as happens with speed limits on roads.
I think that it's much better to apply discretion. For example, suppose the proposed change is a simple and relatively straightforward issue (e.g. thousands of Category:Retards from Foo to Category:People from Foo with learning disabilities). Then I'd set a higher threshold than if the proposal was likely to be controversial or involve discussion of possible alternatives. In those cases I'd suggest a pre-discussion or possibly an RFC.
So there are cases where I'd say 100 was way too many, and other cases where 5,000 was fine. Suppose for example the issue was whether to abandon the use of demonyms in category names (e.g. Category:Bhutanese people, Category:French novelists, Category:Irish astronomers) and use instead the Commons format of "people from" (Commons:Category:People from Bhutan, Commons:Category:Novelists from France, Commons:Category:Astronomers from Ireland). In that case, I'd object to a full CFD discussion on the whole set. I'd say: first an RFC, then nominate each country one at a time to check carefully that we avoid any glitches (some uses will be proper names relating to awards, job titles etc.). Big category sets (such as Category:American people+subcats) should be done in chunks.
So what I'd really like to see here would be that rather than setting a mathematical formula, Danny should be constrained to use his experience, wisdom and judgement. Danny has plenty of all three qualities, so I have great confidence that Danny would know when to say "no prob", when to say "no way", and when to say "better discuss whether this is the right path". --BrownHairedGirl (talk) • (contribs)
  • I'm inclined to let the trial progress and approve this task (assuming there are no major issues found during said trial). BrownHairedGirl is making some very reasonable arguments for having this bot task run, to a degree that I personally feel that the (equally reasonable) concerns raised by xaosflux have been answered. While I do agree that the process seems unnecessary and maybe heavy-handed, if it's what the CFD folks use on a regular basis and there haven't been any major issues so far, I'd say go for it until such time as those practices change. If concerns are raised about this bot running this task, then like all BRFAs we can rescind or revisit the approval at BOTN. Primefac (talk) 20:44, 17 April 2019 (UTC)
    I'm pretty fine with a trial occurring as well; as noted above, it will require a different account to run under, but that's no big deal. This BRFA isn't the right forum to determine what any tagging thresholds should be, and BHG is vastly more familiar with CFD than I am, so if this is standard practice then it shouldn't be an issue. Basically: so long as these edits wouldn't be considered disruptive if made by a 'live' editor, using a bot account should be equally fine; that is what I was really trying to get at. — xaosflux Talk 21:02, 17 April 2019 (UTC)
    Thanks, @Xaosflux. Just for context, I have probably made over 10,000 such tagging edits in the last twelve months alone. Possibly over 20,000.
    The only objections have been when I have made mistakes (I think that was once, but possibly twice; I usually spot and fix my own errors), and once when there was a concern about a brief flooding of recent changes. So no, I don't see any sign that these edits are perceived as disruptive.
    I should say that my responses here might be v difft if the request was from someone other than @DannyS712. I have found Danny to be a skilled coder who is quite exceptionally meticulous and conscientious, and very ready to discuss and review problems. (The inverse of Betacommand, who was a v skilled coder with abysmal communication and consensus-building skills). He's the sort of problem-solving person who I would trust to have authorisation way beyond his experience, because he is clearly v responsible about how he extends his experience. If you have any expertise in sadistic coercion techniques, that would be very helpful in persuading Danny to accept an RFA nomination. His one glaring vice is that he repeatedly refuses requests to turn this redlink blue. --BrownHairedGirl (talk) • (contribs) 21:25, 17 April 2019 (UTC)
    @BrownHairedGirl: thanks for dealing with explaining the CfD process and how helpful the bot is. However, can I ask that the discussion here focus on the bot and related (like the CfD process, etc) and not on me personally as an editor? Thanks, --DannyS712 (talk) 21:32, 17 April 2019 (UTC)
    OK. --BrownHairedGirl (talk) • (contribs) 21:57, 17 April 2019 (UTC)
    @BrownHairedGirl, Xaosflux, and Primefac: since Primefac said I'm inclined to let the trial progress and Xaosflux said I'm pretty fine with a trial occurring as well, can someone please add "DannyS712 bot 2" to Wikipedia:AutoWikiBrowser/CheckPage#Bots, so that I can continue? Thanks, --DannyS712 (talk) 21:49, 18 April 2019 (UTC)

AWB/+c/+ec added. — xaosflux Talk 21:57, 18 April 2019 (UTC)

Thanks - I should be able to finish the trial today or tomorrow (UTC). Thanks --DannyS712 (talk) 22:00, 18 April 2019 (UTC)
@Piotrus: I'm going to do your request for standardizing the format of "convicted of crimes" vs "with criminal convictions" for the second run. I'm posting here because it involves tagging 187 different categories, so I want to make sure that doing so is okay with Xaosflux/Primefac/Headbomb. The list is at User:DannyS712 bot/Task 13#Trial - convicted of crimes. Are there any issues with the list or the proposed run? Thanks, --DannyS712 (talk) 06:50, 20 April 2019 (UTC)
Should be fine. — xaosflux Talk 14:07, 20 April 2019 (UTC)
Done. Will do the third run soon. --DannyS712 (talk) 19:05, 20 April 2019 (UTC)

WikiCleanerBot 2

Operator: NicoV (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:25, Monday, February 25, 2019 (UTC)

Function overview: To fix ISSN with an incorrect syntax. As described in ISSN#Code format, the correct syntax for an ISSN is "an eight digit code, divided by a hyphen into two four-digit numbers"

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (Wikipedia:WPCleaner)

Source code available: On Github

Links to relevant discussions (where appropriate): Maintenance task for CW Error #106

Edit period(s): At most, twice a month, following the dump analysis that I already perform, see Wikipedia:Bots/Requests for approval/WikiCleanerBot.

Estimated number of pages affected: At most a few hundred pages for the first complete run (originally estimated at around a thousand; pages with such problems are listed in Wikipedia:CHECKWIKI/WPC 106 dump, which currently contains 420 pages, down from 1315 initially), and probably no more than a few dozen on each subsequent run, given how the number of pages in the list has evolved.

Namespace(s): Main namespace

Exclusion compliant (Yes/No): No, because there's no reason to use an incorrect syntax for an ISSN instead of the correct one.

Function details: Based on the list generated on Wikipedia:CHECKWIKI/WPC 106 dump, the bot will only fix trivial problems (like a missing hyphen in the ISSN, extra whitespace characters...) and will leave the more complex ones to be fixed by a human. This will greatly shorten the list, so human editors can focus on the remaining problems.

As for the bot flag, I currently don't have one, and I would like to keep it that way (or, if need be, have it added only temporarily for the first run).
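A minimal Python sketch of the kind of "trivial" ISSN normalization described above, assuming the ISSN value has already been extracted from the page. The extraction step and the name normalize_issn are hypothetical and not taken from WPCleaner, which is written in Java. The sketch only rewrites values made of four digits plus three digits and a check character, separated by nothing, by whitespace, or by a dash-like character. It returns None for anything else so that case can be left to a human. It does not verify the check digit. Per the discussion below, a real run would also need to skip CS1 citation templates, which already add the hyphen when displaying the ISSN.

```python
import re
from typing import Optional

# Characters accepted in place of the hyphen (an assumed list): hyphen-minus,
# hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign.
_DASHES = "-\u2010\u2011\u2012\u2013\u2014\u2212"

# Four digits, an optional separator (with optional whitespace around it),
# then three digits and a final digit or X (the check character).
_TRIVIAL_ISSN = re.compile(
    r"^\s*([0-9]{4})\s*[" + _DASHES + r"]?\s*([0-9]{3}[0-9Xx])\s*$"
)


def normalize_issn(value: str) -> Optional[str]:
    """Return the ISSN as NNNN-NNNC, or None if it is not a trivial case."""
    match = _TRIVIAL_ISSN.match(value)
    if match is None:
        return None  # anything unusual is left for a human editor
    return f"{match.group(1)}-{match.group(2).upper()}"
```

For example, "00062510" becomes "0006-2510", and "0317-847x" becomes "0317-847X".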


If you will be operating from the dump, could you not do a dry run outputting to Wikipedia:CHECKWIKI/WPC 106 dump so its handling of the pathological cases there can be inspected? --Xover (talk) 17:48, 25 February 2019 (UTC)

Hi Xover. The dump analysis is performed independently and produces several analyses (Wikipedia:CHECKWIKI/WPC all); I would prefer to keep it separate from the automatic fixing. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)
But if you want to know which pages won't be fixed by the bot, I can do a dry run on my computer and give the list of fixed pages. --NicoV (Talk on frwiki) 18:06, 25 February 2019 (UTC)
@NicoV: I was more interested in seeing the before→after list. Several of the instances listed in the WPC 106 dump looked like they would be hard to fix automatically, so if the output of a dry run could be inspected it might provide a priori confidence that the task won't mess anything up. A dry run might be more efficient / reduce the need for a trial period with live edits (but I speak only for myself: the BAG may see it differently). --Xover (talk) 18:24, 25 February 2019 (UTC)
@Xover: Ok, I understand. I will see if I can do something. The idea is to fix only trivial cases automatically, the hard ones will be left to human editors, and I will check what the results are before doing an actual run. --NicoV (Talk on frwiki) 09:28, 26 February 2019 (UTC)

Comment: The dump list appears to have some false positives on it. I picked one page at random, Pocket Dwellers, and there is an ISSN of 00062510 listed within a citation template. This ISSN is valid within a CS1 template; articles with invalid ISSNs are placed in Category:CS1 errors: ISSN. The template handles this unhyphenated ISSN format with no trouble, displaying properly with a hyphen. It should not be "corrected"; the bot would be making a cosmetic edit, leaving the rendered page unchanged. Perhaps the dump analysis should be corrected before this bot attempts to modify articles based on the list. – Jonesey95 (talk) 17:56, 25 February 2019 (UTC)

Hi Jonesey95. On other wikis like frwiki, the templates don't add the hyphen by themselves. If ISSNs missing the hyphen have to be considered correct on enwiki for some templates, then I will first need to add an option in WPCleaner for this (and then regenerate the page Wikipedia:CHECKWIKI/WPC 106 dump to check that the false positives are removed) before implementing the automatic replacement. I will post here when this part is done. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)
Thanks. It looks like {{ISSN}} does not add the hyphen, but the CS1 citation templates do so. Just to see if I had gotten unlucky, I picked four more articles at semi-random from the list, limiting my "random" choices to articles that were displaying eight digits as the erroneous string. All four articles: Acritogramma metaleuca, Capri (cigarette), David Mba, and Ensoniq VFX contain no ISSN errors. I believe that the dump analysis needs to be debugged before this task can be run. It is possibly telling that there are only 65 pages in the three ISSN error categories combined. – Jonesey95 (talk) 18:16, 25 February 2019 (UTC)
Jonesey95, I've modified my code so that WPCleaner can be told that some templates automatically add the hyphen if it's missing, so the articles you mentioned won't be reported anymore. I'm currently running an update of Wikipedia:CHECKWIKI/WPC 106 dump to see what will be left. --NicoV (Talk on frwiki) 09:24, 26 February 2019 (UTC)

Page Wikipedia:CHECKWIKI/WPC 106 dump has been updated to avoid reporting a missing hyphen when the template automatically adds it to the displayed result; there are only 420 pages remaining, compared to 1315 initially. I could probably also remove reports for internal links to pages like ISSN 1175-5326 which exist, but even if they are reported, the bot won't fix anything there. With the current algorithm, a dry run modifies 115 of the 420 pages.

Pages that would be currently modified by the bot

--NicoV (Talk on frwiki) 12:36, 26 February 2019 (UTC)

That list looks much more reasonable. There are still some weird ones in there, like You Are Happy, where |issn= was being used in a {{WorldCat}} template, which doesn't support that parameter. Also, it looks like dashes, as in Iran–Iraq War and The Mauritius Command and Resonant inductive coupling, are also silently converted to hyphens by CS1 templates, so those don't need to be fixed and should be removed from the WPCleaner report.
I can also add an option to ignore such cases where the dash is automatically replaced, like I did for the missing hyphen. But is it a good idea to keep incorrect syntax just because the template itself will fix it?
For the non-existing parameter in a {{Worldcat}}, I think I will leave it like that and a hyphen will be added, there are only a few pages like that. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)
In a case like Tytthaspis sedecimpunctata, will the bot/script apply the ISSN template, making the ISSN actually useful, or will it just replace the dash with a hyphen? – Jonesey95 (talk) 13:23, 26 February 2019 (UTC)
Currently, it will simply replace the dash with a hyphen, but I can add a feature to use a template instead. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)
I think replacing a plain-text ISSN with a template is a good idea in nearly every case.
I don't want to rain on your parade, but at this point, it looks like a periodic supervised AWB run, combined with a bit more tweaking of the WPCleaner report, might be the best option. The risk of cosmetic edits by the bot (and AWB, unless it is watched carefully) is high. With considerably fewer than 100 pages fixable by the proposed bot, a script may be better. If you still want to get this task bot-flagged in order to avoid cluttering people's watchlists, of course, I would support that. – Jonesey95 (talk) 14:04, 26 February 2019 (UTC)
I will try several modifications to limit the number of false positives in the generated list (which is good in itself), and we'll see then what is the best course of action. --NicoV (Talk on frwiki) 16:38, 26 February 2019 (UTC)
Even if it means fixing fewer than 100 pages in the end, I'm still interested in running at least a test run. For the test run, if it's accepted, I will proceed one page at a time (after each modification, WPCleaner will ask me whether it should proceed, so I will be able to check thoroughly before going to the next article). Running a script would be a good idea, but as no one has offered to write and run one (the list has been available for years), I think it's worthwhile to run WPCleaner on this. After the test run, we can still decide whether it's worth running it periodically. --NicoV (Talk on frwiki) 11:05, 23 March 2019 (UTC)
{{BAG assistance needed}}: can I do a test run? As explained, after each modification WPCleaner will ask me whether it should proceed, so I will check each edit before letting it make the next one. If it's OK, tell me how many modifications I can make. --NicoV (Talk on frwiki) 13:21, 1 April 2019 (UTC)

  Approved for trial (50 edits). Primefac (talk) 20:32, 4 April 2019 (UTC)

HostBot 9

Operator: Maximilianklein (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:50, Monday, January 7, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available:

Function overview: User:Jtmorgan and User:Maximilianklein have planned, and received consent to run, an A/B experiment comparing the current version of HostBot with a newly developed AI version. The AI version uses a machine-learning classifier based on ORES to prioritize which users should be invited to the Teahouse, whereas the current version uses rules. The point is to see whether we can improve user retention by turning our attention to the most promising users.

The two versions would operate simultaneously. Both versions would log in as "User:HostBot", so that end users would be blinded as to which process they were interacting with.

The A/B experiment would run for 75 days (calculated by statistical power analysis).

Links to relevant discussions (where appropriate): Wikipedia_talk:Teahouse#Experiment_test_using_AI_to_invite_users_to_Teahouse

Edit period(s): Hourly (AI-version) and Daily (rules-version)

Estimated number of pages affected: ~11,000

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: All technical details on meta:Research:ORES-powered_TeaHouse_Invites.
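The request does not give the inputs behind the 75-day figure; those are on the linked meta page. As a rough illustration only, a common normal-approximation formula for the per-arm sample size when comparing two proportions is n = (z_{1-α/2} + z_{1-β})² · [p₁(1-p₁) + p₂(1-p₂)] / (p₁ - p₂)². The Python sketch below computes that and converts it to a run length. The function names, the retention rates (10% vs 15%) and the 20 eligible users per day used in the usage note are hypothetical, not the experiment's numbers. The experiment may also have used a different test.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per arm for a two-sided test of two proportions."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


def days_needed(n_per_arm: int, arms: int, eligible_users_per_day: float) -> int:
    """Days of enrolment needed if eligible users are split evenly across the arms."""
    return math.ceil(n_per_arm * arms / eligible_users_per_day)
```

With those hypothetical inputs, sample_size_per_arm(0.10, 0.15) gives 683 users per arm, and days_needed(683, 2, 20) gives 69 days.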


Just posting here to confirm that I am excited to be collaborating with Maximilianklein on this experiment. I've been wanting to improve HostBot's sampling criteria for a while now, and other Teahouse hosts have asked for it. J-Mo 19:33, 7 January 2019 (UTC)

Thought I'd drop by to voice my support, both for the experiment and for Maximilianklein. During the earlier discussion, I posted a couple of questions on their talk page and got a timely and thoughtful reply. I'm also interested in learning about the outcomes of this experiment, looking forward to them! Cheers, Nettrom (talk) 15:20, 15 January 2019 (UTC)

  Comment: - HostBot seems to be having a few issues. Which version is this? See here. RhinosF1(chat)(status)(contribs) 08:21, 18 February 2019 (UTC)
  Resolved- They were on the trial version. RhinosF1(chat)(status)(contribs) 07:54, 19 February 2019 (UTC)
Next check in: April 7. — xaosflux Talk 12:02, 19 March 2019 (UTC)
@Xaosflux, Maximilianklein, and Jtmorgan: It's after April 7; the 75 days should be over, right? --DannyS712 (talk) 00:55, 10 April 2019 (UTC)
  A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) what are the results of the trial? — xaosflux Talk 00:56, 10 April 2019 (UTC)
@DannyS712 and Jtmorgan: thanks for the ping. Indeed, the 75 days are over. Is it possible to ask for a 25-day extension? The reason is that it took some time to debug the new bot while it was live, so it was not operating 100% correctly for the first few days. The 75 days came from a power analysis, so I would like to have a clean 75 days of data to analyse. If that's not possible, I understand, and Jtmorgan and I can put things back as they were until we can analyse the results. The result of the trial so far is that our plan for coordinating the two bot versions worked in practice. As for whether the AI-powered HostBot was more effective at inviting question-asking users, or users who survived longer, I still have to crunch those numbers. Maximilianklein (talk) 23:16, 10 April 2019 (UTC)
I'd also appreciate a 25-day extension, if possible. If we have a clean 75 day sample, we can make stronger claims. Better Teahouse invite targeting could have a substantial positive impact on the Teahouse, and on retaining good faith newcomers in general. Cheers, J-Mo 18:39, 11 April 2019 (UTC)

PkbwcgsBot 5

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:15, Thursday, December 13, 2018 (UTC)

Function overview: The bot will fix ISBN syntax per CW Error #69 (ISBN with incorrect syntax) and PMID syntax per CW Error #102 (PMID with incorrect syntax).

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Once a week

Estimated number of pages affected: 150 to 300 a week

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot is going to fix incorrect ISBN syntax per WP:ISBN. So, if the syntax is ISBN: 819345670X, it will take off the colon and make it ISBN 819345670X. The other case of incorrect ISBN syntax this bot is going to fix is when the ISBN number is preceded by "ISBN-10" or "ISBN-13". For example, in ISBN-10: 995341775X, it will take off "-10:" and that will make it ISBN 995341775X. The bot will only fix those two cases of ISBN syntax; any other cases of incorrect ISBN syntax will not be fixed by the bot. The bot will also fix incorrect PMID syntax. So, for example, if it is PMID: 27401752, it will take off the colon and convert it to a template per WP:PMID. It will not leave it as a bare PMID 27401752 magic link, because that format is deprecated.


Please make sure to avoid ISBNs within |title= parameters of citation templates. Also, is there a reason that you are not proposing to use the {{ISBN}} template? Magic links have been deprecated and are supposed to go away at some point, although the WMF seems to be dragging their feet for some reason. There is another bot that converts magic links to templates, but if you can do it in one step, that would probably be good. – Jonesey95 (talk) 12:05, 13 December 2018 (UTC)

@Jonesey95: The bot will convert to the {{ISBN}} template and it will not touch ISBNs in the title parameters of citations. Pkbwcgs (talk) 15:19, 13 December 2018 (UTC)
What about the PMID's? Creating more deprecated magic words isn't ideal. — xaosflux Talk 19:16, 14 December 2018 (UTC)
@Xaosflux: I did say in my description that they will be converted to templates. However, I now need to code this in RegEx, and I have been trying, but my RegEx skills are unfortunately not very good. Pkbwcgs (talk) 19:52, 14 December 2018 (UTC)
I have tried coding it in RegEx, but I gave up soon after as it is too difficult. Pkbwcgs (talk) 21:14, 14 December 2018 (UTC)
@Pkbwcgs: After removing the colon you can use Anomie's regex from Wikipedia:Bots/Requests for approval/PrimeBOT 13: \bISBN(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++((?:97[89](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?)?(?:[0-9](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?){9}[0-9Xx])\b and \b(?:RFC|PMID)(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++([0-9]+)\b, or you can adjust them to account for the colon. Primefac could advise if he made any changes to them. — JJMC89(T·C) 06:27, 15 December 2018 (UTC)
@JJMC89: Thanks for the RegEx. I will be able to remove the colon easily. It is the RegEx for the ISBN that I struggled with. Thanks for providing it. Pkbwcgs (talk) 09:49, 15 December 2018 (UTC)
When I tested the RegEx on my own AWB account (without making any edits), it reported "nested identifier" and did not replace anything. Pkbwcgs (talk) 09:53, 15 December 2018 (UTC)
@Pkbwcgs: The regex comes from PHP, but AWB (C#) doesn't support possessive quantifiers (e.g. ++). Replacing ++ with + in the regex should work. — JJMC89(T·C) 18:57, 15 December 2018 (UTC)
@JJMC89: I have tested the find RegEx on my AWB account without making any edits and it works. I also worked out the replace RegEx and it is {{ISBN|$1}}. That works too. I think this is ready for a trial. I will also request a small extension for this task which is to clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links. That will be PkbwcgsBot 7. Pkbwcgs (talk) 20:15, 15 December 2018 (UTC)
I adjusted the RegEx to accommodate ISBNs with a colon. Pkbwcgs (talk) 20:33, 15 December 2018 (UTC)
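For anyone reproducing this outside AWB, here is a Python sketch of the same find/replace, assuming Python's built-in re module. It follows Anomie's regex from PrimeBOT 13 with three deliberate changes. First, the possessive ++ becomes +, as JJMC89 suggested for AWB. Second, an optional "-10"/"-13" suffix and a colon are allowed after ISBN, and a colon after PMID. Third, \p{Zs} is narrowed to a plain or no-break space, since re does not support \p{...} classes. The names SEP, ISBN_RE, PMID_RE and templatize are hypothetical. Like the first trial edits, the sketch does not skip ISBNs inside external links or |title= parameters, so those cases still need a separate guard.

```python
import re

# Accepted separators between the prefix and the number: tab, &nbsp;, numeric
# no-break-space entities, or a plain / no-break space. Anomie's original uses
# \p{Zs} here; it is narrowed because Python's re has no \p{...} classes.
SEP = r"(?:\t|&nbsp;|&#0*160;|&#[Xx]0*[Aa]0;|[ \u00a0])"

# "ISBN", an optional -10/-13 suffix and colon, separators, then a 10- or
# 13-digit ISBN with optional hyphens/spaces. "++" is written as "+", as for AWB.
ISBN_RE = re.compile(
    r"\bISBN(?:-1[03])?:?" + SEP + "+"
    + r"((?:97[89](?:-|" + SEP + r")?)?"
    + r"(?:[0-9](?:-|" + SEP + r")?){9}[0-9Xx])\b"
)

# "PMID", an optional colon, separators, then the numeric ID.
PMID_RE = re.compile(r"\bPMID:?" + SEP + r"+([0-9]+)\b")


def templatize(text: str) -> str:
    """Turn ISBN/PMID magic links (with or without a colon) into templates."""
    text = ISBN_RE.sub(r"{{ISBN|\1}}", text)
    return PMID_RE.sub(r"{{PMID|\1}}", text)
```

Identifiers that are already templated, such as {{ISBN|...}}, are left alone because a pipe is not an accepted separator.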
This diff from my account is good and clearly illustrates what this bot is going to do for this task. Is this good enough? Pkbwcgs (talk) 20:53, 15 December 2018 (UTC)
This is what it will look like if the bot handles an ISBN with the "ISBN-10" prefix. That diff is also from my account. Pkbwcgs (talk) 21:08, 15 December 2018 (UTC)
{{BAG assistance needed}} There is a huge backlog at Wikipedia:WikiProject Check Wikipedia/ISBN errors at the moment. This task can cut down on that backlog through replacing the colon with the correct syntax. It has also been waiting for two weeks. Pkbwcgs (talk) 22:12, 27 December 2018 (UTC)

  Approved for trial (25 edits). --slakrtalk / 20:43, 4 January 2019 (UTC)

The first thirteen edits are here. Pkbwcgs (talk) 09:54, 12 January 2019 (UTC)
This edit put the ISBN template inside an external link, which is an error. This one has the same error. The other eleven edits look good to me. I recommend a fix to the regex and more test edits. – Jonesey95 (talk) 19:51, 12 January 2019 (UTC)
@Jonesey95: I fixed those errors. Pkbwcgs (talk) 19:57, 12 January 2019 (UTC)
  Approved for extended trial (25 edits). OK try again. — xaosflux Talk 04:10, 30 January 2019 (UTC)
I apologise for the delay to the trial of this task. I will do the trial as soon as I can. Pkbwcgs (talk) 11:00, 22 February 2019 (UTC)
  A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) any update on the trialing? — xaosflux Talk 18:49, 12 March 2019 (UTC)
@Xaosflux: I will go forward with the trial this week. Pkbwcgs (talk) 19:06, 12 March 2019 (UTC)
@Pkbwcgs: Did you mean to disable the template? --DannyS712 (talk) 09:36, 22 March 2019 (UTC)
@DannyS712: I will disable the template once I have done the trial. Pkbwcgs (talk) 19:18, 22 March 2019 (UTC)

Bots that have completed the trial period

GreenC bot 14

Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 14:09, Thursday, April 11, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): GNU Awk

Source code available: Yes

Function overview: Add {{Spain metadata Wikidata}} to infoboxes, along with equivalent templates for other countries as they become available. This BRFA mirrors Wikipedia:Bots/Requests for approval/GreenC bot 12, except that it will be for Spain instead of Austria, and for any other countries that become available under the terms set out in the Austria BRFA.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Population_for_Spanish_municipalities

Edit period(s): one time

Estimated number of pages affected: TBD

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Example edit. It may operate on other countries besides Spain as those templates and data become available.


  • Hi @Underlying lk: in the previous Austria BRFA you generated a query; I am having trouble adapting it to Spain because I can't find the Spanish equivalent of P964. -- GreenC 14:12, 11 April 2019 (UTC)
@GreenC: Here's the updated query: link.--eh bien mon prince (talk) 17:54, 11 April 2019 (UTC)
Ok, the bot is pretty much ready to go. @Underlying lk: can you confirm the Wikidata has been updated and is the most recent available so we don't overwrite any in-line sourcing that is more recent? -- GreenC 14:42, 12 April 2019 (UTC)
They are from 2018, so they're definitely the most recent ones.--eh bien mon prince (talk) 19:44, 12 April 2019 (UTC)

  Approved for trial (50 edits). As usual, please post a link to the contribs (preferably permanent) and take all the time you need, GreenC. --TheSandDoctor Talk 20:10, 14 April 2019 (UTC)

  Trial complete. Diffs -- GreenC 17:13, 17 April 2019 (UTC)

A couple things came up in the test run:

  1. A Coruña - A hard-coded |population_metro= is displayed in the same sub-box as the Spain population footnote, but that footnote is not the right reference for that number. Some other fields, like |population_density_km2=, may also no longer be accurate.
  2. Castellón de la Plana - has a |population_density_km2=auto which seems to automatically generate the density based on calculating |population_total= divided by |area_total_km2=. It seems to work with the new template. Wondering if the bot should check for the existence of |area_total_km2= and if available set |population_density_km2=auto.
  3. Montejo, Salamanca - the reference the bot replaced was used in multiple places creating a missing citation error.


  1. The |population_metro= appears to be verifiable in the new Footnote. The number is outdated, but that is beyond the scope of the bot to fix as it preexists.
  2. The bot will now convert to |population_density_km2=auto if the |area_total_km2= exists and has a valid number. Same with the sq_mi versions. This works.
  3. The other bot at work here is which is a sophisticated bot and beyond the scope of this bot to replicate. There will be some temporary breaking changes, but they will be fixed within 24 hours.

-- GreenC 17:17, 17 April 2019 (UTC)
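The rule in point 2 can be sketched in Python as follows. This assumes the infobox parameters have already been parsed into a dictionary. The actual bot is written in GNU Awk, and the parsing step, the name set_density_auto and the definition of a "valid number" used here are all assumptions.

```python
import re
from typing import Dict

# Assumed definition of a "valid number": digits with optional thousands
# commas and an optional decimal part. The bot's own test may differ.
_NUMBER = re.compile(r"^\s*[0-9][0-9,]*(?:\.[0-9]+)?\s*$")

# (total-area parameter, density parameter) pairs, km2 and sq_mi versions.
_PAIRS = (
    ("area_total_km2", "population_density_km2"),
    ("area_total_sq_mi", "population_density_sq_mi"),
)


def set_density_auto(params: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the infobox parameters with density set to 'auto'
    wherever the matching total-area parameter holds a valid number."""
    updated = dict(params)  # leave the caller's dictionary untouched
    for area_key, density_key in _PAIRS:
        if _NUMBER.match(updated.get(area_key, "")):
            updated[density_key] = "auto"
    return updated
```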

FastilyBot 14

Operator: Fastily (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:07, Wednesday, March 20, 2019 (UTC)

Function overview: Leave courtesy notifications for PROD'd files if the tagger has not done so.

Automatic, Supervised, or Manual: automatic

Programming language(s): Java

Source code available: after I write it

Links to relevant discussions (where appropriate):

Edit period(s): daily

Estimated number of pages affected: 0-10 daily

Namespace(s): User talk

Exclusion compliant (Yes/No): Yes

Function details: Leaves courtesy notifications for PROD'd files if the tagger has not done so. This task is an extension to Task 6 and Task 12. -FASTILY 23:07, 20 March 2019 (UTC)


  Approved for trial (50 edits or 21 days). go ahead and trial and let us know how it goes. — xaosflux Talk 23:32, 20 March 2019 (UTC)
@Fastily: Shouldn't the bot follow redirects when leaving notifications? See User talk:The Singing Badger for example. --DannyS712 (talk) 05:51, 30 March 2019 (UTC)
Thanks for pointing that out. This should be fixed now -FASTILY 08:04, 30 March 2019 (UTC)
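A minimal Python sketch of following redirects before leaving a notice. It assumes a hypothetical redirects mapping from page title to redirect target; in practice, the MediaWiki API can resolve redirects server-side through the redirects parameter of action=query. FastilyBot is written in Java, and this is not its code. Loops and overly long chains return None, so the bot can skip the notice rather than post to the wrong page.

```python
from typing import Dict, Optional


def resolve_talk_page(title: str, redirects: Dict[str, str], max_hops: int = 5) -> Optional[str]:
    """Follow redirects from `title` and return the final page, or None on a
    redirect loop or an overly long chain (the notice is then best skipped)."""
    seen = {title}
    current = title
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return current  # not a redirect: this is where the notice goes
        if target in seen:
            return None  # redirect loop
        seen.add(target)
        current = target
    return None if current in redirects else current
```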
{{OperatorAssistanceNeeded|D}} was the trial completed? What were the results? Can you link to the series of diffs? — xaosflux Talk 21:50, 7 April 2019 (UTC)
Hi Xaosflux. This task has been in trial since 28 March 2019. All daily deletion notifications the bot has made since then are here. Some examples of edits: 1, 2, 3. Everything is working as expected thus far. -FASTILY 23:47, 10 April 2019 (UTC)

  Trial complete. Everything worked as expected. Task has been temporarily disabled via config change. A few more sample edits: 1, 2, 3 -FASTILY 08:56, 17 April 2019 (UTC)

DannyS712 bot 27

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:08, Friday, April 12, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): JavaScript

Source code available: Yes, once written

Function overview: Remove categories from drafts, per WP:DRAFTNOCAT

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/DannyS712 bot 3, Wikipedia:Bots/Requests for approval/DannyS712 bot 11

Edit period(s): Likely weekly

Estimated number of pages affected: Probably around 250 per run, though the first will be higher

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Currently, task 3 allows me to remove categories from AfC submissions, and task 11 allows me to remove categories from other userspace pages that are categorized with articles in polluted categories. I'd like to extend this functionality to also allow removing categories from drafts. quarry:query/34864 can be used to retrieve a list of "polluted" categories (categories with drafts in them that are not meant to hold drafts), and I would paste this into a page on enwiki and then the bot would go through each category and remove drafts automatically. I haven't written the source code yet, but the functionality would be most similar to that of task 11 (polluted categories with user pages), except without the checks for root user pages.


  Approved for trial (40 edits).xaosflux Talk 15:09, 14 April 2019 (UTC)
@Xaosflux:   Trial complete. 40 edits made ([9] search for Task 27 if needed, because there are a few from task 3 listed too.) I looked through all of the edits, and didn't find any errors. Code located at User:DannyS712 test/Draft cleaner bot.js. Thanks, --DannyS712 (talk) 21:06, 14 April 2019 (UTC)
@DannyS712: just to be clear as to the scope, this type of removal should only be removing pages from "content categories" - correct? — xaosflux Talk 15:43, 15 April 2019 (UTC)
@Xaosflux: yes, it should only be removing categories that are defined in the source of the draft (not those that arise from infobox errors, etc.). Is that what you mean? DannyS712 (talk) 15:46, 15 April 2019 (UTC)
@DannyS712: no. The text source of the categories shouldn't be the discriminator; the type of category should be. If there is an appropriate manually added non-content category, it shouldn't be removed either. While these are normally provided via templates, if the template has been subst'd or the non-content category has been added manually by an editor, it shouldn't be removed (just as you wouldn't remove it when manually editing the page). — xaosflux Talk 15:57, 15 April 2019 (UTC)
@Xaosflux: okay, I can whitelist any specific categories I come across that shouldn't be removed. I haven't seen any yet, but I'll keep an eye out. But if a template is substituted, the page shouldn't be categorized as a template, so I feel that removing that category too is a benefit. --DannyS712 (talk) 16:08, 15 April 2019 (UTC)
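A Python sketch of the removal step with a whitelist, assuming the draft's wikitext is already in hand. The bot itself is JavaScript; strip_draft_categories and the example category name in the tests are hypothetical. The sketch only sees category links written in the wikitext, so categories added by templates are untouched. Whether a given category is a content category still has to be decided by the whitelist, which is the point xaosflux raises above.

```python
import re
from typing import Iterable

# [[Category:Name]] or [[Category:Name|sort key]], plus one trailing newline.
# Disabled links such as [[:Category:Name]] do not match (leading colon).
CATEGORY_LINK = re.compile(
    r"\[\[\s*category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]\n?",
    re.IGNORECASE,
)


def _normalize(name: str) -> str:
    """Underscores to spaces and first letter upper-cased, as in page titles."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def strip_draft_categories(wikitext: str, keep: Iterable[str]) -> str:
    """Remove category links from a draft, except for whitelisted categories."""
    keep_set = {_normalize(k) for k in keep}

    def replace(match: re.Match) -> str:
        if _normalize(match.group(1)) in keep_set:
            return match.group(0)  # whitelisted: leave the link as it is
        return ""

    return CATEGORY_LINK.sub(replace, wikitext)
```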

PkbwcgsBot 7

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:27, Saturday, December 15, 2018 (UTC)

Function overview: This is an extension to Wikipedia:Bots/Requests for approval/PkbwcgsBot 5 and I will clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links.

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate): This RfC

Edit period(s): ISBNs will be once a fortnight and PMIDs will be once a month.

Estimated number of pages affected: 300-500 pages per run (ISBNs) and 50-100 pages per run (PMIDs)

Namespace(s): Most namespaces (Mainspace, Article Talkspace, Filespace, Draftspace, Wikipedia namespace (most pages), Userspace and Portalspace)

Exclusion compliant (Yes/No): Yes

Function details: The bot will replace ISBN magic links with templates. For example, ISBN 978-94-6167-229-2 will be replaced with {{ISBN|978-94-6167-229-2}}. In task 5, it fixes incorrect ISBN syntax and replaces the magic link with the template after that. This task only replaces the ISBN magic link with the template using RegEx.
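The replacement can be sketched as a single regular expression. This is an approximation, assuming a pattern modelled on MediaWiki's ISBN magic-link syntax (optional 978/979 prefix, nine digits, then a check digit, with optional spaces or hyphens); it is not the AWB rule actually used by the bot, and the function name is hypothetical.

```python
import re

# Approximation of MediaWiki's ISBN magic-link syntax: "ISBN", whitespace,
# an optional 978/979 prefix, nine digits and a check digit (0-9 or X),
# each optionally followed by a space or hyphen.
ISBN_MAGIC_LINK = re.compile(
    r"\bISBN[ \t]+((?:97[89][ -]?)?(?:[0-9][ -]?){9}[0-9Xx])\b"
)


def replace_isbn_magic_links(wikitext: str) -> str:
    """Wrap each bare ISBN magic link in the {{ISBN}} template."""
    return ISBN_MAGIC_LINK.sub(lambda m: "{{ISBN|" + m.group(1) + "}}", wikitext)
```

Existing {{ISBN|...}} templates are left alone because "ISBN" is followed by a pipe rather than whitespace. The pattern cannot tell deliberate magic links (as on Wikipedia:ISBN) from accidental ones, which is why the namespace restriction discussed below matters.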


Working in article space only? – Jonesey95 (talk) 23:48, 15 December 2018 (UTC)

@Jonesey95: The problem is in multiple namespaces, not just the article namespace. Pkbwcgs (talk) 09:39, 16 December 2018 (UTC)
Since Magic links bot is already handling article space, it looks like this bot's focus will be in other spaces. I think those spaces will require manual oversight in order to avoid turning deliberate magic links into templates. Happily, there are only 4,000 pages, down from 500,000+ before the first couple of bots did their work. – Jonesey95 (talk) 12:10, 16 December 2018 (UTC)
I can distinguish deliberate magic links and not touch them. There are very few deliberate ones; an example is at Wikipedia:ISBN which shouldn't be changed. Pkbwcgs (talk) 13:06, 16 December 2018 (UTC)
"I can distinguish" -- how can you do this automatically? This is WP:CONTEXTBOT. —  HELLKNOWZ   ▎TALK 21:45, 16 December 2018 (UTC)

We don't generally approve bots for non-mainspace unless there is a specific problem. Especially without a discussion or consensus. In short, the problem with non-mainspace namespaces is that there is no expectation that article policies or guidelines should apply or are even necessary. Userspace is definitely not a place for bots to run without opt-in. You also cannot automatically work on talk pages with a task like this -- users can easily be discussing syntax and no bot should be changing their comments. The discussion may very well be archived. Same goes with Wikipedia and there are many guideline and help and project pages where such a change may not be desired. Draft, File and Portal seem fine. To sum up, we either need community consensus for running tasks in other namespaces or bot operator assurance (proof) that there are minimal to none incorrect/undesirable edits. —  HELLKNOWZ   ▎TALK 21:45, 16 December 2018 (UTC)

@Hellknowz: I have struck the namespaces which I feel will cause problems. I can assure you that there won't be any incorrect edits. Pkbwcgs (talk) 21:51, 16 December 2018 (UTC)
Looks good then. Will wait for resolution at Wikipedia:Bots/Requests for approval/PkbwcgsBot 5. —  HELLKNOWZ   ▎TALK 21:57, 16 December 2018 (UTC)
I think the revised list of spaces (at this writing: Main, Draft, Portal, File) makes sense. – Jonesey95 (talk) 01:46, 17 December 2018 (UTC)
@Hellknowz: Task 5 has gone to trial, and this has been waiting for over one month. Pkbwcgs (talk) 22:08, 27 January 2019 (UTC)
@Pkbwcgs: I think it's like 3 months by now - maybe tag as BagAssistanceNeeded? --DannyS712 (talk) 01:59, 22 March 2019 (UTC)
Ping @Pkbwcgs: to start trial Hhkohh (talk) 08:11, 11 April 2019 (UTC)
  Approved for trial (100 edits). (50 of each). — xaosflux Talk 14:42, 7 April 2019 (UTC)
@Xaosflux:   Trial complete. The edits are here. The ISBN RegEx worked but the PMID RegEx didn't work so I only did ISBNs. Pkbwcgs (talk) 08:28, 12 April 2019 (UTC)
I reviewed all of the edits from April 12 and found no errors. In this edit, the ISBN is inside the title of a work in a reference, but the magic link and the template rendered the same, so no problem. – Jonesey95 (talk) 09:11, 12 April 2019 (UTC)
@Pkbwcgs: do you want to work on fixing the PMID's or limit the scope of this task to only the ISBN's? — xaosflux Talk 15:15, 14 April 2019 (UTC)
@Xaosflux: I will limit the scope of this task to only ISBNs. Pkbwcgs (talk) 18:25, 14 April 2019 (UTC)

qbugbot 3

Operator: Edibobb (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 05:01, Friday, March 29, 2019 (UTC)

This will edit pages created by qbugbot 2, updating references, photos, common names, and a few minor edits. Not all changes will be made to all pages, and some pages will not be changed.

Automatic, Supervised, or Manual: Automatic

Programming language(s):

Source code available: Yes. I will update User:Qbugbot/source before the first test.

Links to relevant discussions (where appropriate): There have been some comments, requests, and edits over the past year that have motivated me to do this, but I have not requested a consensus at ToL. I think it will be non-controversial.

Edit period(s): 8-24 hours per day.

Estimated number of pages affected: 17,000

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details:

Qbugbot2 created around 18,000 pages about a year ago. I'd like to make corrections and updates to these pages. These changes are a result of comments and page edits. Edits made to these pages since they were created will be preserved. The first 100+ edits by this bot will be reviewed manually.

1. "Further reading" and "External link" references will be updated, and in most cases cut back or eliminated. Any references in Further reading and External links that were created with the page will be removed and replaced with the new references from the current qbugbot database. This will provide fewer and more specific references in these areas. Any reference added by other editors will be retained as is. References are matched by title, or by authors and year. This item will affect most pages, and has been the source of most negative comments about qbugbot articles.

2. If the prose, infobox, and inline references have not been edited since an article was created, it will be updated with the following changes:

  • Wording in the prose may be updated, usually for the distribution range or common names, sometimes to correct errors.
  • Inline references will be updated. Sometimes more specific references will be added, and sometimes non-specific references may be removed (such as EOL, some redundant database references, and some database references without specific data on the article.)
  • The database sources for lists of taxonomic children (species list, etc.) will be removed. While this information might be handy, it makes it difficult for people to update the list: when the list is edited, the source database information tends to be omitted.
  • Occasionally, the taxonomic information and children will be updated.

3. Photos will be added if they are available and not already on the page. This will affect a minority of pages. The photos have been manually reviewed.

4. Unnecessary orphan and underlinked tags will be removed.

5. External links to Wikimedia Commons will be updated to handle disambiguation links properly, without displaying the "(beetle)" in something like "Adelina (beetle)".

6. The formatting of many references has been improved, correcting errors, adding DOIs, etc. These will be updated in most cases. If a reference has been edited since creation, it will not be changed.
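A sketch of the "keep editor-added references" rule from item 1, assuming each reference has already been parsed into a title, an author list and a year (the Ref type and the function names are hypothetical, not qbugbot's actual code). References are matched by title, or by authors and year, as described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ref:
    title: str
    authors: Tuple[str, ...]
    year: str


def _norm(text: str) -> str:
    # Case- and whitespace-insensitive comparison key.
    return " ".join(text.lower().split())


def same_ref(a: Ref, b: Ref) -> bool:
    """References match by title, or by authors and year."""
    if a.title and _norm(a.title) == _norm(b.title):
        return True
    return (
        bool(a.authors)
        and a.year == b.year
        and [_norm(x) for x in a.authors] == [_norm(x) for x in b.authors]
    )


def update_further_reading(
    current: List[Ref], original: List[Ref], replacement: List[Ref]
) -> List[Ref]:
    """Replace the bot's original references, keeping editor-added ones."""
    # Anything not matching a reference from page creation was added by an editor.
    editor_added = [r for r in current if not any(same_ref(r, o) for o in original)]
    # Skip new references that would duplicate an editor-added one.
    new_refs = [r for r in replacement if not any(same_ref(r, e) for e in editor_added)]
    return new_refs + editor_added
```

Placing the new references before the editor-added ones is an arbitrary choice in this sketch.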

Here is an example of a page edited manually using qbugbot 3 content: Muellerianella


  • You say that this will edit around "17,000" pages, despite creating ~18,000 - why not edit the other 1,000? --DannyS712 (talk) 20:29, 29 March 2019 (UTC)
Some pages have been changed so much that the bot can't successfully revise them without altering other people's edits, something I'd rather not do automatically and something that's probably not necessary in pages with significant additions. Some other pages won't need any of these changes, either because the changes have already been made through manual edits, or because the original pages happened not to need them. I am just estimating the 1,000 pages. It could be more or less than that. Bob Webster (talk) 00:38, 30 March 2019 (UTC)
I looked at this and decided to postpone it for another update. The main problem is that I could see no easy way to determine what was described in 1956 (or any year) -- insects? moths? spiders? animals? beetles? North American millipedes? I was also considering narrowing down some of the categories (bees to sweat-bees, etc.) as some editors have been doing, but I haven't found a reliable list of categories to use. The same thing applies to the -stub templates. I would prefer to do these three tasks in another bot session. Bob Webster (talk) 03:00, 31 March 2019 (UTC)
Having done some of this categorisation [caveat: not so much recently], I have to agree that this problem exists, and there are various schemes of parent categories that are in use if the category you are assigning needs to be created. One could put everything into a higher level category, to await sorting, but I see no great advantage. I would accept it as WP:WORKINPROGRESS. William Avery (talk) 08:56, 1 April 2019 (UTC)
That's correct. Bob Webster (talk) 22:02, 7 April 2019 (UTC)

  Approved for trial (50 edits). Since I fixed a crapton of those citations myself, I'm rather enthusiastic about qbugbot cleaning up after its own mess. Headbomb {t · c · p · b} 05:25, 9 April 2019 (UTC)

  •   Trial complete. 50 pages were updated, and are listed on the bot talk page. I found and fixed a couple of bugs. One prevented the introduction from being updated sometimes, and the other was a minor line spacing error. Bob Webster (talk) 23:04, 10 April 2019 (UTC)
{{BAG assistance needed}} Bob Webster (talk) 04:04, 17 April 2019 (UTC)
@Edibobb: what exactly are the criteria for removal/addition here? Headbomb {t · c · p · b} 05:54, 17 April 2019 (UTC)

Also, see...

Headbomb {t · c · p · b} 05:42, 17 April 2019 (UTC)

I've significantly reduced the number of references in further reading in new pages created by qbugbot. This was a manual, subjective process. The pages edited in qbugbot3 will have the original further reading references replaced with this new set. If a further reading reference has been added by an editor since page creation, it will be included in the edited page. The inline citations of qbugbot have also been updated. If the text of a page has not been edited, the original set of inline citations will be replaced.
I've corrected the references you listed, and fixed the problem of ending up with the same references in both inline citations and further reading.
Bob Webster (talk) 14:53, 17 April 2019 (UTC)
Also, EOL inline citations are removed even if the text has been edited. Bob Webster (talk) 14:56, 17 April 2019 (UTC)
@Edibobb: I think that given the number of articles/citations affected, it would be a good idea to have a sandbox version of all references that will be used. Then you (or I, if you don't know how) could run citation bot on them, and see what the improvements are, and those could get implemented, reducing the future cleanup load. Headbomb {t · c · p · b} 15:12, 17 April 2019 (UTC)
I think that would be good. I don't know how to run the citation bot, but I've copied all the citations to these sandbox pages. Can you run the bot on them? (A few of the citations are leftover and will never be used. It's easier to fix them all than sort them out, so don't worry if you see a few weird titles and dates.) Thanks!
Bob Webster (talk) 15:56, 17 April 2019 (UTC)
User:Citation bot/use explains the various methods. Right now the bot is blocked, so only the WP:Citation expander gadget works. I'll run the bot on these pages though. There's an annoying bug concerning italics and titles though, so just ignore that part of the diffs that will result. Headbomb {t · c · p · b} 16:01, 17 April 2019 (UTC)
@Edibobb: could you upload in batches of 250 citations? The bot chokes on pages so massive. Headbomb {t · c · p · b} 16:15, 17 April 2019 (UTC)
No problem, they're up now on User:Edibobb/sandbox/ref1 through User:Edibobb/sandbox/ref15 Bob Webster (talk) 17:36, 17 April 2019 (UTC)
@Edibobb: The bot still crashes. Could you do 100 per page? Headbomb {t · c · p · b} 19:58, 17 April 2019 (UTC)
@Headbomb: They're up now, 100 per page, User:Edibobb/sandbox/ref1 to User:Edibobb/sandbox/ref36 Bob Webster (talk) 23:09, 17 April 2019 (UTC)

WikiCleanerBot 3

Operator: NicoV (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:40, Tuesday, April 2, 2019 (UTC)

Function overview: To fix some simple cases of square brackets without correct beginning.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (Wikipedia:WPCleaner)

Source code available: On Github (and especially algorithm 46)

Links to relevant discussions (where appropriate): Maintenance task for CW Error #46

Edit period(s): Twice a month, preceding the dump analysis that I already perform, see Wikipedia:Bots/Requests for approval/WikiCleanerBot.

Estimated number of pages affected: Probably a few hundred for the first complete run (an estimate, as only simple cases in the list will be fixed; pages with such problems are listed in Check Wiki #46, which currently contains 2420 pages), and probably no more than a few dozen on each run after that, given how the number of pages in the list has evolved.

Namespace(s): Main namespace

Exclusion compliant (Yes/No): No, cases that will be fixed are simple enough to fix them in each article.

Function details: The function will fix simple cases of problems detected by Check Wiki #46 (Square brackets without correct beginning). The cases identified so far are the following situations:

  • an external link ending with 2 square brackets ([https://... ...]]): remove the extra square bracket, unless one of the following applies:
    • it starts by 2 square brackets ([[https://... ...]])
    • it contains another square bracket ([https://... ...[...]])
    • there's a stray opening square bracket before in the line ([...[https://... ...]])
  • an internal link or a category ending with 4 square brackets ([[...]]]]): remove the extra 2 square brackets
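A Python sketch of the rules above, for illustration only: WPCleaner itself is written in Java, and this is not its algorithm 46. It works one line at a time; the pattern and function names are hypothetical.

```python
import re

# External link opened with one bracket but closed with two: [https://... text]]
EXT_LINK_DOUBLE_CLOSE = re.compile(r"\[(https?://[^\s\[\]]+(?: [^\[\]]*)?)\]\]")
# Internal link or category closed with four brackets: [[Target]]]]
INT_LINK_QUAD_CLOSE = re.compile(r"(\[\[[^\[\]]+\]\])\]\]")


def fix_line(line: str) -> str:
    def fix_external(m: re.Match) -> str:
        before = m.string[: m.start()]
        # Skip [[https://... ...]]: the link starts with 2 square brackets.
        if before.endswith("["):
            return m.group(0)
        # Skip when a stray opening bracket appears earlier in the line.
        if before.count("[") > before.count("]"):
            return m.group(0)
        return "[" + m.group(1) + "]"

    # Links whose text contains another square bracket never match the
    # pattern (brackets are excluded from the link text), so they are skipped.
    line = EXT_LINK_DOUBLE_CLOSE.sub(fix_external, line)
    return INT_LINK_QUAD_CLOSE.sub(r"\1", line)
```

Excluding brackets from the link text in the pattern covers the "contains another square bracket" exception without a separate check.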

You can tell me if it would be ok to add other fixes for #46 later if I find other cases where I'm sure of the fix.

I already ran this fix on frwiki (sometimes together with additional modifications); results can be seen in WikiCleanerBot's contributions ("Lien interne mal ouvert" is the French translation for this problem). For enwiki, I don't plan to make additional modifications (unless you think it's better to also fix some additional CW errors in the process).

For the test run, I can stop WPCleaner after a few modifications to let you check what the results are.

For the first articles in the list provided by Check Wiki #46, WPCleaner should do the following:

  • 1 John 5: nothing, not a simple case ([[First Epistle of John]] 4:11–12, 14–17]])
  • 10 nanometer: removing the extra square bracket by replacing <ref>[ triple patterning for 10nm metal]]</ref> by <ref>[ triple patterning for 10nm metal]</ref>
  • 11AM (TV program): nothing, not a simple case ([Vincent Smith (television presenter)|Vincent Smith]]), but it may be a candidate for more ways of fixing ([XXX (YYY)|[XXX]] replaced by [[XXX (YYY)|XXX]]) in a later version
  • 12 Scorpii: nothing, not a simple case ({{odlist | B=c<sup>1</sup>]])
  • 16-cell: nothing, not a simple case ([<nowiki/>[4,2<sup>+</sup>,4]]), and it would be a cosmetic fix only to remove a false positive
  • 1746 English cricket season: nothing, not a simple case (*14 July - Addington & Bromley]])
  • 185th Air Refueling Wing: nothing, not a simple case (the group received General Dynamics F-16 Fighting Falcon]]s)
  • ...
  • 1970 Law on dangerousness and social rehabilitation: removing the extra square bracket by replacing twice <ref name="BOE">[ Ley 16/1970, de 4 de agosto, sobre peligrosidad y rehabilitación social]]. Boletín oficial del estado español (B.O.E) nº187 de 6/8/1970. Incluye un PDF con el texto de la ley y su análisis jurídico]</ref> by <ref name="BOE">[ Ley 16/1970, de 4 de agosto, sobre peligrosidad y rehabilitación social]. Boletín oficial del estado español (B.O.E) nº187 de 6/8/1970. Incluye un PDF con el texto de la ley y su análisis jurídico]</ref>
  • 1970–71 Iraqi Central Premier League: removing the extra square bracket by replacing twice <ref>[ Iraq 1970/71]]</ref> by <ref>[ Iraq 1970/71]</ref>


What happens in a case like

? Could it handle conversions to &#91; / &#93; Headbomb {t · c · p · b} 12:44, 3 April 2019 (UTC)

Hi Headbomb. With the current code, nothing happens because it fails the check that the link doesn't contain another square bracket. So for a first run, it would simply be ignored. It may fall under the other fixes I could add for #46, but I would first have to try it to see whether there are situations where this replacement shouldn't be used. --NicoV (Talk on frwiki) 16:21, 3 April 2019 (UTC)
@NicoV: Alright, then that case can always be handled separately.   Approved for trial (50 edits). Headbomb {t · c · p · b} 16:25, 3 April 2019 (UTC)
@Headbomb: I ran WPCleaner on a few pages, one at a time, and everything was good. I let it run on an extra dozen, and I found several articles with Source: [Source: [ Soccerway]], which was replaced by Source: [Source: [ Soccerway] as specified, whereas the correct human fix would be Source: [ Soccerway]. I wonder if I should make this an exception too (a stray [ before the external link, in the same line). What do you think? --NicoV (Talk on frwiki) 07:36, 4 April 2019 (UTC)
I think those would be malfunctions, and that they need to be addressed, either by skipping those cases, or handling them correctly. But it is not at all clear to me that [Source: link] is something that needs to be 'fixed' in the first place, so unless a clear and compelling argument can be made, I say skip it. Headbomb {t · c · p · b} 14:02, 4 April 2019 (UTC)
Ok. I will modify WPCleaner to skip such cases, and I will run it on a few pages. I'll keep you posted. --NicoV (Talk on frwiki) 14:17, 4 April 2019 (UTC)

@NicoV: remember that you need to post the trial results here once they are completed. Headbomb {t · c · p · b} 14:49, 4 April 2019 (UTC)

@Headbomb:, I ran WPCleaner for 49 edits (miscounted, stopped 1 too early), here are the results (comment of the edits is "v2.01b - Task 3 - WP:WCW project (Square brackets without correct beginning)"):
  • First part mentioned above, with 17 edits, where I detected the case with a stray opening square bracket before the external link: I modified WPCleaner to apply no automatic fix in such cases, and updated the description above accordingly. I also manually fixed the few pages where the automatic fix wasn't the best one, by removing the extra "Source: [".
  • Second part with the extra 32 edits, which seems ok to me.
The trial edits seem to confirm the estimated number of pages that will be modified if run on the entire list: a few hundred. --NicoV (Talk on frwiki) 10:53, 6 April 2019 (UTC)
@NicoV: if the trial is complete you need to add {{BotTrialComplete}} so that AnomieBOT moves this from "in trial" to "trial complete" at WP:BRFA. Based on what you wrote, it seems like the trial is done; is it? --DannyS712 (talk) 01:01, 10 April 2019 (UTC)
Thanks DannyS712.   Trial complete. --NicoV (Talk on frwiki) 11:35, 10 April 2019 (UTC)
  • [24], [25], [26], [27] are tricky. I'll need to think a bit, but you're welcome to brainstorm about them in the meantime, or come up with improved logic for those cases. Headbomb {t · c · p · b} 16:46, 10 April 2019 (UTC)
    Ok Headbomb. Here's my thoughts about them:
    • Adam Davies (footballer, born 1992):
      • Before the edit the reference was really malformed, [ {{Webarchive|url= |date=7 September 2018 }} Barnsley F.C. Adam Davis]], which gives [ Archived 7 September 2018 at the Wayback Machine Barnsley F.C. Adam Davis]]
      • The fix doesn't make it worse, by only balancing the square brackets, but leaving out the real problem. Right now, I don't see any improvement to the logic or to avoid modifying it, but the current fix is still slightly better than the original. So, I think we could let it be that way...
      • Obviously, the correct fix would be something like the following, but I can't see any way to make that fix by bot... Barnsley F.C. Adam Davis Archived 7 September 2018 at the Wayback Machine
    • Air America (airline):
      • Before the edit it was malformed, [|Charles Charles Herrick]] which gives Charles Herrick]
      • The fix removes the extra square bracket, which gives Charles Herrick.
      • This fixes one of the problem, the other one being the incorrect target of the external link. It's still an improvement.
    • April 1913 and April 1915 are very similar to the previous one: the link has 2 problems (incorrect target, extra square bracket), and the bot fixes one of them.
    --NicoV (Talk on frwiki) 20:06, 10 April 2019 (UTC)
I agree they're both improvements. Just wondering if those other errors are tracked somewhere. It's not really a strike against this bot, just wondering out loud if this is something this bot should also tackle, or if it's best left to another bot/tool. Headbomb {t · c · p · b} 20:09, 10 April 2019 (UTC)
Hi Headbomb.
For the first one, I don't see any way to fix it automatically, or even detect it properly (for example some templates can be used inside external links texts, some can't...)
For the other ones, I can easily add code to track them with WPCleaner (pipe inside an external link target). Automatic fixing is probably possible for some of them: a pipe at the end of the target seems a good first candidate, and repetition of the first word may be another. I can try to add that too, but it will require testing, and I suggest making it a separate request for approval: in the end, if they are all accepted, I will combine them with the dump analysis task (automatic fixing of some errors based on CW lists, dump analysis to create WPC lists, automatic fixing of some errors based on WPC lists). --NicoV (Talk on frwiki) 07:56, 11 April 2019 (UTC)
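A sketch of the tracking and the two candidate fixes mentioned above, assuming a pipe inside an external link target looks like [https://example.org|Charles Charles Herrick]. The pattern and function names are hypothetical and not part of WPCleaner; anything that is not one of the two candidates is only reported, not fixed.

```python
import re
from typing import List, Optional

# External link whose target contains a pipe, e.g. [https://example.org|Charles Charles Herrick]
PIPED_EXT_LINK = re.compile(r"\[(https?://[^\s\[\]|]+)\|([^\s\[\]]*)((?: [^\[\]]*)?)\]")


def find_piped_links(wikitext: str) -> List[str]:
    """Tracking: list external links with a pipe inside the target."""
    return [m.group(0) for m in PIPED_EXT_LINK.finditer(wikitext)]


def suggest_fix(link: str) -> Optional[str]:
    """Return a fix only for the two candidate patterns, else None."""
    m = PIPED_EXT_LINK.fullmatch(link)
    if m is None:
        return None
    url, after_pipe, rest = m.group(1), m.group(2), m.group(3).strip()
    if after_pipe == "":
        # Pipe at the end of the target: [url| text] -> [url text]
        return f"[{url} {rest}]" if rest else f"[{url}]"
    if rest.split(" ", 1)[0] == after_pipe:
        # First word repeated: [url|Charles Charles Herrick] -> [url Charles Herrick]
        return f"[{url} {rest}]"
    return None  # not a safe candidate; leave for manual review
```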

GreenC bot 13

Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:24, Wednesday, April 3, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): GNU Awk and BotWikiAwk framework

Source code available: Yes

Function overview: Convert instances of [[Batting average]] to either [[Batting average (cricket)]] or [[Batting average (baseball)]] as appropriate to the topic of the article where the wikilink occurs.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Deal_with_links_to_split_article_(Batting_average)

Edit period(s): one time

Estimated number of pages affected: ~ 15,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Find all articles containing [[Batting average]] (with or without a pipe) and check page categories to determine whether it should link to [[Batting average (cricket)]] or [[Batting average (baseball)]]. If the strings "baseball" and "cricket" both appear in an article, skip and log it. To generate the target article list, for cricket check: Category:Cricketers, Category:Seasons in cricket and Category:Years in cricket (about 3k links). For baseball: Category:Baseball players, Category:Seasons in baseball, Category:Years in baseball (about 12k links).
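The skip-and-log decision can be sketched as follows, assuming the page has already been taken from either the cricket or the baseball category list. The function name and the sport argument are hypothetical. Lower-casing the source is also an assumption, chosen so that team names such as "Battle Creek Crickets" trigger the skip.

```python
from typing import Optional

TARGETS = {
    "cricket": "Batting average (cricket)",
    "baseball": "Batting average (baseball)",
}


def choose_target(wikitext: str, sport: str) -> Optional[str]:
    """Pick the new link target for a page from the cricket or baseball list.

    Returns None when the page should be skipped and logged.
    """
    text = wikitext.lower()
    # Case-insensitive, so "Battle Creek Crickets players" counts as "cricket".
    if "baseball" in text and "cricket" in text:
        return None
    return TARGETS.get(sport)
```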


Typo, fixed. -- GreenC 00:33, 3 April 2019 (UTC)
@GreenC: also, what do you mean by If categories in both skip and log - is this supposed to be if in both categories? --DannyS712 (talk) 00:36, 3 April 2019 (UTC)
Actually I adjusted per BOTREQ discussion, string anywhere in the article not just category. -- GreenC 01:15, 3 April 2019 (UTC)
I thought I'd do some preliminary checks on this proposed solution. I ran through the first 10% of the 12k baseball players which link to Batting average in AWB, searching for "cricket". I found 2 categories Category:Battle Creek Crickets players and Category:Binghamton Cricket players which are for baseball teams with "cricket" in their title. I'll do these via AWB now so they hopefully don't get caught up in the Bot run. I did find 1 proper mention of cricket (a baseball player from Bahamas who also played cricket in his youth), which I've manually processed. I'll try the same but for cricket players. Spike 'em (talk) 10:30, 9 April 2019 (UTC)
The first 1k cricketers threw up 6 or so articles which contain "baseball", some just in hat-notes, so between the groups there is less than 1% that may need manual checks. Spike 'em (talk) 10:56, 9 April 2019 (UTC)
The bot will catch/log/skip these if it sees the words "baseball" and "cricket" anywhere in the wiki source (article, hatnotes, categories, etc.). -- GreenC 14:06, 9 April 2019 (UTC)
  • Not sure if you've dealt with this already, but I think it should be replacing [[batting average]] with [[Batting average (cricket)|batting average]] or [[Batting average (baseball)|batting average]] Mmitchell10 (talk) 09:42, 6 April 2019 (UTC)
This is the purpose of the bot. -- GreenC 18:58, 7 April 2019 (UTC)
At the moment it says it will replace [[batting average]] with [[Batting average (cricket)]], whereas it should be replacing it with [[Batting average (cricket)|batting average]]. I'm just flagging that we need to make sure we don't change the text that appears. But maybe that's obvious/implicit and I'm being too picky! Mmitchell10 (talk) 19:33, 8 April 2019 (UTC)
Yeah sorry thought it was evident, it will be obvious what it does if it had permission to do a trial run :) -- GreenC 20:31, 8 April 2019 (UTC)
  • Also, we don't want to upset the capitalisation, so need to replace Batting with Batting, and batting with batting. Mmitchell10 (talk) 09:42, 6 April 2019 (UTC)
That is what it does. -- GreenC 18:57, 7 April 2019 (UTC)
  Approved for trial (50 edits). This is WP:CONTEXTBOT, but a well-constrained one. Let's trial. Headbomb {t · c · p · b} 05:08, 9 April 2019 (UTC)
  • A few more piped terms to add to the search and replace : [[Batting average#Major League Baseball| and [[Batting average#Baseball| to [[Batting average (baseball)| and [[Batting average#Cricket| to [[Batting average (cricket)| Spike 'em (talk) 16:21, 9 April 2019 (UTC)
Ah just saw this, posted the same question on your talk page :) Ok can convert these then. -- GreenC 16:40, 9 April 2019 (UTC)
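A sketch of the link rewrite discussed above, assuming a regular expression over the wikitext. It keeps the visible text (and its capitalisation) by always piping to the original wording, and maps the section anchors listed by Spike 'em to a fixed target. The pattern and function names are hypothetical, not GreenC bot's Awk code. Anchors other than the three listed, and anchor links without a pipe, are left untouched.

```python
import re

# [[Batting average]] or [[batting average]], optionally with a section
# anchor and/or a pipe, e.g. [[Batting average#Cricket|average]].
LINK = re.compile(r"\[\[([Bb]atting average)(#[^|\]]*)?(\|[^\]]*)?\]\]")

# Piped anchor links that point at a specific sport.
ANCHOR_TARGETS = {
    "#Major League Baseball": "Batting average (baseball)",
    "#Baseball": "Batting average (baseball)",
    "#Cricket": "Batting average (cricket)",
}


def retarget(wikitext: str, default_target: str) -> str:
    """Point batting average links at default_target without changing the displayed text."""

    def repl(m: re.Match) -> str:
        title, anchor, pipe = m.group(1), m.group(2), m.group(3)
        if anchor:
            if anchor not in ANCHOR_TARGETS or not pipe:
                return m.group(0)  # unknown or unpiped anchor: leave as-is
            new_target = ANCHOR_TARGETS[anchor]
        else:
            new_target = default_target
        # Reuse the piped text, or the original link text with its capitalisation.
        display = pipe[1:] if pipe else title
        return f"[[{new_target}|{display}]]"

    return LINK.sub(repl, wikitext)
```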
  •   Trial complete. - Diffs. -- GreenC 17:03, 9 April 2019 (UTC)
The edit summaries for the first 45 were a little off, fixed in the final 5 edits. Otherwise see no problem, but appreciate anyone taking a look. -- GreenC 17:03, 9 April 2019 (UTC)
Looks good to me. What happens if you run across things like Category:Battle Creek Crickets players? Headbomb {t · c · p · b} 17:10, 9 April 2019 (UTC)
The bot will detect the article contains both strings "baseball" and "cricket" then log and skip. Spike'em estimates there might be 1% like this which is doable for manual context edits. -- GreenC 17:28, 9 April 2019 (UTC)
  Approved for extended trial (Category:Battle Creek Crickets players and Category:Binghamton Cricket players). Then let's try to break the bot and see if it makes any edits there. Headbomb {t · c · p · b} 17:44, 9 April 2019 (UTC)

Done, no edits made.

bot log
  • Clint Rogge ---- error Contains baseball and cricket
  • Clint Rogge ---- error : batting average wikilink not found
  • Pat Duncan (baseball) ---- error Contains baseball and cricket
  • Pete Fahrer ---- error Contains baseball and cricket
  • Pat Duncan (baseball) ---- error : batting average wikilink not found
  • Pete Fahrer ---- error : batting average wikilink not found
  • Oscar Graham ---- error Contains baseball and cricket
  • Oscar Graham ---- error : batting average wikilink not found
  • Harry LaRoss ---- error Contains baseball and cricket
  • Dutch Zwilling ---- error Contains baseball and cricket
  • Ray Brubaker ---- error Contains baseball and cricket
  • Charlie Krause ---- error Contains baseball and cricket
  • Larry Gilbert (baseball) ---- error Contains baseball and cricket
  • Ray Brubaker ---- error : batting average wikilink not found
  • Charlie Krause ---- error : batting average wikilink not found
  • Wese Callahan ---- error Contains baseball and cricket
  • Wese Callahan ---- error : batting average wikilink not found
  • Baby Doll Jacobson ---- error Contains baseball and cricket
  • Harry LaRoss ---- error : batting average wikilink not found
  • Dutch Zwilling ---- error : batting average wikilink not found
  • Garland Nevitt ---- error Contains baseball and cricket
  • Garland Nevitt ---- error : batting average wikilink not found
  • Baby Doll Jacobson ---- error : batting average wikilink not found
  • Grover Hartley ---- error Contains baseball and cricket
  • Bill Culp ---- error Contains baseball and cricket
  • Bill Culp ---- error : batting average wikilink not found
  • Katsy Keifer ---- error Contains baseball and cricket
  • Pete Compton ---- error Contains baseball and cricket
  • Katsy Keifer ---- error : batting average wikilink not found
  • Pete Compton ---- error : batting average wikilink not found
  • Larry Gilbert (baseball) ---- error : batting average wikilink not found
  • Grover Hartley ---- error : batting average wikilink not found
  • Hardy Richardson ---- error Contains baseball and cricket
  • Hardy Richardson ---- error : batting average wikilink not found
  • Art Allison ---- error Contains baseball and cricket
  • Art Allison ---- error : batting average wikilink not found
  • John Morrissey (baseball) ---- error Contains baseball and cricket
  • John Morrissey (baseball) ---- error : batting average wikilink not found
  • Buttercup Dickerson ---- error Contains baseball and cricket
  • Buttercup Dickerson ---- error : batting average wikilink not found
  • Bobby Clack ---- error Contains baseball and cricket
  • Bobby Clack ---- error : batting average wikilink not found
  • Bill Smiley ---- error Contains baseball and cricket
  • Bill Smiley ---- error : batting average wikilink not found
  • Blondie Purcell ---- error Contains baseball and cricket
  • John Richmond (shortstop) ---- error Contains baseball and cricket
  • Blondie Purcell ---- error : batting average wikilink not found
  • John Montgomery Ward ---- error Contains baseball and cricket
  • Frank Heifer ---- error Contains baseball and cricket
  • John Montgomery Ward ---- error : batting average wikilink not found
  • Frank Heifer ---- error : batting average wikilink not found
  • Harry Arundel ---- error Contains baseball and cricket
  • John Richmond (shortstop) ---- error : batting average wikilink not found
  • Harry Arundel ---- error : batting average wikilink not found
  • Jim Whitney ---- error Contains baseball and cricket
  • John Shoupe ---- error Contains baseball and cricket
  • Horace Phillips (baseball) ---- error Contains baseball and cricket
  • John Shoupe ---- error : batting average wikilink not found
  • Horace Phillips (baseball) ---- error : batting average wikilink not found
  • Ed Kennedy (outfielder) ---- error Contains baseball and cricket
  • Ed Kennedy (outfielder) ---- error : batting average wikilink not found
  • Jim Whitney ---- error : batting average wikilink not found
  • John McGuinness (baseball) ---- error Contains baseball and cricket
  • John McGuinness (baseball) ---- error : batting average wikilink not found
  • Hal McClure ---- error Contains baseball and cricket
  • Hal McClure ---- error : batting average wikilink not found
  • Pop Smith ---- error Contains baseball and cricket
  • Pop Smith ---- error : batting average wikilink not found

-- GreenC 18:07, 9 April 2019 (UTC)

I checked through a reasonable set of them and they look good to me too. Spike 'em (talk) 10:40, 10 April 2019 (UTC)
  •   Trial complete. -- GreenC 19:30, 9 April 2019 (UTC)

GreenC bot 11

Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:02, Sunday, March 3, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): GNU Awk

Source code available: TBU

Function overview: Add {{Unreferenced}} template to target articles. {{Unreferenced}} currently has about 220,000 instances; the bot will add about 25,000 more, an increase of about 10%.

Links to relevant discussions (where appropriate): Wikipedia:Village_pump_(proposals)#Bot_to_add_Template:Unreferenced_and_Template:No_footnotes_to_pages_(single_run) (RFC)

Edit period(s): one time run

Estimated number of pages affected: 25,000 (est)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details:


Informing previous BRFA participants of new BRFA: @Xaosflux, Headbomb, Ajpolino, MZMcBride, SD0001, Xover, DannyS712, and Wugapodes: -- GreenC 17:24, 3 March 2019 (UTC)

  • BAG notes, I encouraged GreenC to restart this BRFA. Avoiding false positives (FPs) is important, and this is not an "easy" filter. The prior RfC is supportive in general of the tagging - but it has to be accurate. There could be a small margin of error, but we need to focus on reducing it. Feedback on FP avoidance and examples is extremely welcome below, thank you! — xaosflux Talk 17:47, 3 March 2019 (UTC)

Hi GreenC, for "Skip if section named "External links", "References", "Sources", "Further reading", "Bibliography", "Notes", "Footnotes"": if you have that as an exact (equals) match, can you change it to contains? I've run across pages with sections such as "Literature and References". — xaosflux Talk 17:35, 3 March 2019 (UTC)

Will do. -- GreenC 18:13, 3 March 2019 (UTC)
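Here is a minimal sketch of the "contains" rule requested above. It assumes the bot works on raw wikitext and matches heading words case-sensitively, since GreenC notes further down that the section title words are case- and plural-sensitive. The keyword tuple and the name `has_reference_like_section` are illustrative and do not come from the bot's source:

```python
import re

# Keywords from the skip list above, plus "Literature" (suggested later in this thread).
REFERENCE_SECTION_WORDS = (
    "External links", "References", "Sources", "Further reading",
    "Bibliography", "Notes", "Footnotes", "Literature",
)

# A wikitext heading such as "== References ==" or "=== Notes ===";
# group 2 is the heading text with surrounding spaces trimmed.
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$", re.MULTILINE)


def has_reference_like_section(wikitext: str) -> bool:
    """Return True if any heading *contains* one of the keywords (case-sensitive)."""
    for _, title in HEADING_RE.findall(wikitext):
        if any(word in title for word in REFERENCE_SECTION_WORDS):
            return True
    return False
```

An exact comparison (`title in REFERENCE_SECTION_WORDS`) would let a heading like "Works and References" slip through. The substring test catches it.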

I've had a look at a dozen or so of the articles identified at User:GreenC/data/noref. Here are a few articles that point to ways to potentially adjust the selection criteria:

  1. Vemuri – this is a surname article and it doesn't need sources as it mostly serves as a disambiguation page
  2. Dom Aleixo Timorese – not unsourced. I guess it needs to be taken into account that the bibliography section might have a different title ("Literature" in this case). Also, if articles with external links are to be excluded, then articles with {{Authority control}} will need to be excluded as well.
  3. Callichirus – similarly, it has a {{Taxonbar}}.
  4. Fukushima's Theorem – it has hand-formatted citations in a section called "Journal articles"
  5. Cordichelys – weren't stubs meant to be excluded?
  6. There are quite a few articles on films, TV episodes, books or music albums (like Parade (Bottom) or The Platinum Collection (Blue album)) that indeed list no sources, but a fair amount of whose content – plot synopses, track listings and the like – is obviously sourced to the publication that is the subject of the article. I don't think tagging with {{unsourced}} is a good idea, but there certainly is an underlying issue and that's the fact that they don't use any secondary sources. A more appropriate tag would probably be {{Primary sources}}, though its use normally entails some form of editorial judgement. – Uanfala (talk) 17:36, 3 March 2019 (UTC)
  1. It will now filter anything with "surname" in a category name. Normally it would have been filtered out by one of the index templates in Category:Set index article templates, but the page has none, which is an error.
  2. {{Authority control}} can be filtered. "Literature" can be added to the section title list.
  3. {{Taxonbar}}, {{Authority control}} and others are now removed via Category:External link templates to linked data sites with reciprocal links
  4. Section titles with "Articles" are now filtered (the section title words are case- and plural-sensitive)
  5. It does not tag articles marked as stubs, in an abundance of caution, but that doesn't mean stubby articles without sources can't or shouldn't be tagged. The article is unsourced and should be tagged. It was actually tagged previously, but some sort of deletion-by-redirect reversal caused the tag to be lost. The bot uncovered this problem.
  6. There is no source, primary or otherwise. The presumption of a source is not the same as a literal source, i.e. what is the name of the source, where is it located, who is the author, what date was it accessed, etc. All of that is missing. There is no verifiable source. That is why we have this tag, so the community can be made aware of articles like this that need a source. -- GreenC 18:13, 3 March 2019 (UTC)
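The template-based exclusions in points 2, 3 and 5 could look roughly like the sketch below. It is a simplified stand-in that looks only at wikitext. The bot reportedly takes its list from Category:External link templates to linked data sites with reciprocal links, whereas `SKIP_TEMPLATES` here is a hypothetical hard-coded set. Stub detection assumes that stub templates end in "-stub". Names are lowercased before comparison, which is looser than MediaWiki's rule of ignoring case only on the first letter:

```python
import re

# Hypothetical hard-coded stand-in for the template category the bot consults.
SKIP_TEMPLATES = {"authority control", "taxonbar"}

# Captures a template name: the text after "{{" up to the first "|" or "}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*[|}]")


def skip_for_templates(wikitext: str) -> bool:
    """Return True if the page uses a skip-listed template or any "-stub" template."""
    names = {name.lower() for name in TEMPLATE_RE.findall(wikitext)}
    if names & SKIP_TEMPLATES:
        return True
    return any(name.endswith("-stub") for name in names)
```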

{{BAGAssistanceNeeded}} - the bot is ready to begin trials. -- GreenC 14:08, 6 March 2019 (UTC)

  Approved for trial (50 edits or 14 days). Go ahead and run a trial with your adjusted parameters. — xaosflux Talk 00:46, 7 March 2019 (UTC)
  Trial complete. diffs. -- GreenC 18:03, 13 March 2019 (UTC)
I skimmed these pretty quickly so I may have missed some. Thoughts:
  1. Sawsan, looks like the same problem as Vemuri discussed above. Can you also just skip articles with "Given name" categories? (same for Gaurav)
  2. Municipalities of Central Finland is basically a list article, but I can't think of a clever way to skip it. Maybe it's best that articles like that have a reference anyway...
  3. Communes_of_the_Aisne_department is a bona fide list. Maybe you can skip articles with "List" in the category names? This one was in Category:Lists of communes of France. (Members of the 5th Dáil and Duchess of Brabant (by marriage) would also be skipped with this).
Otherwise looks great! Ajpolino (talk) 20:43, 13 March 2019 (UTC)
Given-name articles have sources (see the Category tree for example Abdul Hamid or William or Alexander). List-of articles also have sources eg. List of counties in New York. -- GreenC 21:27, 13 March 2019 (UTC)
@Ajpolino: Courtesy ping ^ --TheSandDoctor Talk 16:58, 16 March 2019 (UTC)
I'm not arguing that any kind of article ought not have references, but we pitched this in the RfC as a conservative bot skipping stubs, lists, et al. So if it's not too much trouble (and maybe it is), I think it'd be best if we skipped lists even if they aren't titled "List of..."... Also someone added a source to one of the articles you tagged in your most recent test run. So that's somewhat validating. That was kind of the point of all this. Thanks for all your work! Ajpolino (talk) 17:41, 16 March 2019 (UTC)
There was no 'pitch' to skip lists, nor can I think of any reason to skip them; they have sources just like any other article. -- GreenC 18:40, 16 March 2019 (UTC)
Also, the most recent run suggests there will be around 10,000 edits, not the 25,000 originally thought, due to the additions of filters suggested by Uanfala. Each filter causes a significant reduction. To put 10,000 in perspective, that is 0.00175 of all articles (about one-fifth of one percent), or an increase in {{unreferenced}} of about 5%. These to me are conservative numbers. -- GreenC 18:52, 16 March 2019 (UTC)
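This is a quick check of the ratios in that comment, using only the numbers quoted in it. A 0.00175 share implies a base of roughly 5.7 million articles. Adding 10,000 new tags to about 220,000 existing ones is an increase of about 4.5%, which the comment rounds to 5%:

```python
# Figures quoted in the comment above.
new_tags = 10_000         # estimated edits after the added filters
existing_tags = 220_000   # current {{Unreferenced}} transclusions
share_of_all = 0.00175    # stated fraction of all articles

implied_total = new_tags / share_of_all   # about 5.71 million articles
increase = new_tags / existing_tags       # about 0.045, i.e. roughly 4.5%

print(f"implied article count: {implied_total:,.0f}")
print(f"increase in tagged articles: {increase:.1%}")
```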
Ah, sorry to be stuck on this point, but just to clarify does the bot in its current configuration skip articles that have titles "List of..."? I think that was in your original exclusion list (per the old BRFA) but perhaps you've decided against it. Ajpolino (talk) 20:00, 16 March 2019 (UTC)
Ah, indeed it is filtering 'list of' articles, sorry! Not sure what I was thinking; I was losing track. OK, more filtering can be done on the category layer as you suggested. My code notes say the reason for filtering 'list of' articles was that it was picking up too many false positives. Also rethinking given-name articles: those are already filtered by way of the Set Index templates, and those showing up here are edge cases that are not properly templated, so they should also be filtered on the category level. Thanks for your better memory keeping this straight :) -- GreenC 22:19, 16 March 2019 (UTC)
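The category-layer filtering discussed here could be done along these lines. The sketch assumes the bot has the article title and its category names as plain strings, and the patterns are hypothetical. Word boundaries let "Lists of communes of France" match, but they keep "Journalists from Kerala" from matching on its "lists" ending:

```python
import re

# Hypothetical patterns for the category-layer filter discussed above.
# \b word boundaries stop "Journalists" from matching "lists".
CATEGORY_SKIP_RE = re.compile(r"\b(?:surnames?|given names?|lists?)\b", re.IGNORECASE)


def skip_by_title_or_category(title: str, categories: list[str]) -> bool:
    """Skip "List of ..." titles and pages in surname, given-name or list categories."""
    if title.startswith("List of"):
        return True
    return any(CATEGORY_SKIP_RE.search(cat) for cat in categories)
```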

{{BAGAssistanceNeeded}} - above new filters added, ready for next trial, recommend another 50. -- GreenC 14:09, 17 March 2019 (UTC)

  Approved for extended trial (300 edits). @GreenC: I'd like to see a bigger trial here, odd cases can be hard to find until this has more volume. — xaosflux Talk 17:47, 22 March 2019 (UTC)

Hi Ajpolino and anyone else: Trial complete. The bot's contrib history is mixed in with other tasks, so I made User:GreenC/data/noref/trial March 28 for 300. Feel free to edit this page with notes and comments. I have not checked it yet but would appreciate help finding problems and possible solutions. The run scanned about 100k articles. -- GreenC 15:34, 29 March 2019 (UTC)

Made it through the first 50. In general looks great! There was one disambiguation page that has a category that's a sub-cat of Category:Disambiguation pages (I didn't know that category had sub-categories; learn something new every day). It doesn't need a ref, so maybe you could either skip all sub-cats of Category:Disambiguation pages, or if it's easier just skip categories with "Disambiguation" in the name? Annotated it on your list. Might get a chance to look through the rest of them in a bit. Ajpolino (talk) 20:59, 29 March 2019 (UTC)

  Trial complete. In the future, please remember to use the template (or others relevant) GreenC. Otherwise, this sort of thing can go unnoticed (even when directly viewing this BRFA) and lead to unnecessary waits. --TheSandDoctor Talk 21:13, 29 March 2019 (UTC)

I'm not sure if Line of succession to the Moroccan throne should have been tagged - it clearly says "According to..." before the info given. --DannyS712 (talk) 22:25, 30 March 2019 (UTC)
More analysis:
I have looked through all 301 pages edited, and there are a number of false positives. I didn't list a few of them because they were the same issue (list, dab, etc.), but a number had inline sources or a note at the bottom explaining the source and were still tagged. --DannyS712 (talk) 22:58, 30 March 2019 (UTC)
DannyS712, I have copied your comments as in-line annotations to the list. When I am done responding/fixing, I will ping. -- GreenC 00:55, 31 March 2019 (UTC)
@DannyS712: - inline response to above. Feel free to continue the inline discussions it helps me to keep it organized in one place/line. -- GreenC 02:10, 31 March 2019 (UTC)
@GreenC: Seen - I left a note on your talk page with a question. Thanks for pointing out the outlines and deaths issues. --DannyS712 (talk) 02:13, 31 March 2019 (UTC)
Answered there; if there's anything else, let me know. Thanks for checking through these, great improvements. -- GreenC 16:47, 31 March 2019 (UTC)

Because of the number of false positives discovered by DannyS712, and new filters added, I think it would be a good idea to next test with a dry run, i.e. post a list of 300 like before, but without the final step of adding the tag, only listing which articles it would tag. I'll start this process now and post results when ready. -- GreenC 14:33, 2 April 2019 (UTC)

@Ajpolino and DannyS712: Next round dry-run trial results to test the latest filters: User:GreenC/data/noref/trial April 3 for 300 -- GreenC 15:25, 3 April 2019 (UTC)

I checked all 300, added inline comments, and added new filters. Two unable to resolve: 1920 Toronto municipal election and Gaius (biblical figure) (might be OK to tag). -- GreenC 16:40, 3 April 2019 (UTC)

Started another 300. -- GreenC 16:47, 3 April 2019 (UTC)

@GreenC: I didn't get to it between the "another 300" and the next dry-run trial, but I analyzed the trial April 3 for 300 list you added - see my notes there --DannyS712 (talk) 05:42, 4 April 2019 (UTC)

New trial results: User:GreenC/data/noref/trial April 4 for 300 -- GreenC 16:24, 4 April 2019 (UTC)

@GreenC: reviewed --DannyS712 (talk) 01:52, 5 April 2019 (UTC)
I saw your responses - I guess I would be more conservative with tags, but that's just my preference, and it shouldn't be interpreted as coming from my role as the closer of the discussion. --DannyS712 (talk) 01:59, 8 April 2019 (UTC)
DannyS712, I appreciate your help thus far; this is a difficult bot and a lot of work. I was assuming your involvement here was a personal interest. You have made a lot of good filter recommendations, which have been implemented and have improved the bot. There are some I disagree with, and I would be happy to demonstrate any of those in two ways: by adding sources to them, and by showing other articles like them that have sources. There is also the problem that some of these can't be effectively filtered, so tagging of those might be a little less conservative, yet they can be justified as needing sources. -- GreenC 14:24, 8 April 2019 (UTC)

@Ajpolino and DannyS712: Next 300: User:GreenC/data/noref/trial April 8 for 300 -- GreenC 00:31, 9 April 2019 (UTC)

Danny checked already. Some new filters added. The number of problems is much lower. Will start processing the next 300. I think we should consider tagging the previous 900 since they have been manually verified, minus the ones identified as a problem. It is about 9% complete out of 5.5 million. -- GreenC 00:31, 9 April 2019 (UTC)
@GreenC: if you want I can do that with AWB - I personally checked almost every page, and definitely the last 600 --DannyS712 (talk) 00:53, 10 April 2019 (UTC)
@DannyS712: Sounds great. Thanks for the offer! There is also another 300 ready to post. Would you do all 1200? -- GreenC 14:57, 10 April 2019 (UTC)
@GreenC: Maybe at the end.   Doing... User:GreenC/data/noref/trial April 4 for 300 --DannyS712 (talk) 20:51, 10 April 2019 (UTC)
  Done --DannyS712 (talk) 21:12, 10 April 2019 (UTC)
Verified about 60%, skipped the obvious sports and music articles, found one problem with a dab page. -- GreenC 15:32, 10 April 2019 (UTC)
I assumed you did the first 60%, so I went through 200-300. In general, looks great! One wishy-washy kinda disambiguation page. One list article that should probably be skipped (or better yet, replaced by a category), and two lists that should probably have references. Thanks again for all the leg work you're doing on this GreenC!! Ajpolino (talk) 19:35, 10 April 2019 (UTC)
  • @Ajpolino and DannyS712:. Taking stock, thinking the bot is running pretty clean now and maybe it's time to ping for BAG help. What do you think, or do you want to do more trial-trials? -- GreenC 22:10, 10 April 2019 (UTC)
    @GreenC: I think it's good to go - the last run didn't have any true false positives --DannyS712 (talk) 22:21, 10 April 2019 (UTC)
    Also I'm not going to tag manually the April 8 and 10 results - maybe have those as an extended trial that the bot actually edits, after implementing the filters, so that anyone who notices a reference we missed has a chance to speak up? --DannyS712 (talk) 22:22, 10 April 2019 (UTC)
    Ok great. Ajpolino is probably offline but stated "generally looks great", so I'm taking that as encouragement to keep going. Good idea re: having the bot do it. -- GreenC 23:28, 10 April 2019 (UTC)
  •   A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}. The bot is ready to resume live trials. It might be the April 3, 8 & 10 trial-trials comprising ~900 articles (which have been manually verified but not edited yet) and/or new ones. -- GreenC 23:28, 10 April 2019 (UTC)

JJMC89 bot III

Operator: JJMC89 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:23, Sunday, February 24, 2019 (UTC)

Function overview: Process WP:CFD/W and its subpages excluding WP:CFD/W/M

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available:

Links to relevant discussions (where appropriate): WP:BOTREQ#Categories for Discussion bot (permalink)

Edit period(s): Hourly

Estimated number of pages affected: Millions

Namespace(s): Many

Exclusion compliant: delete/move: no; edit: yes

Adminbot: Yes

Function details: Process WP:CFD/W and its subpages excluding WP:CFD/W/M, moving and deleting categories and re-categorizing pages as specified



Since Cyde is inactive, I am requesting to take over the task so that bugs can be fixed and feature requests implemented. Additionally, Cydebot will stop functioning at the end of March due to the Toolforge Trusty deprecation unless it is migrated to Stretch. The code is based on the code that Cydebot is running. — JJMC89(T·C) 07:23, 24 February 2019 (UTC)

You say it's "based on" Cydebot's code, presumably meaning that you made changes. Can you please summarize the effects of these changes? עוד מישהו Od Mishehu 08:45, 24 February 2019 (UTC)
Cyde's code is part of Pywikibot, but Cydebot is not using the current version. I haven't made changes to the scripts yet, but other maintainers have. There haven't been any changes that would change the functionality of this task. — JJMC89(T·C) 22:07, 24 February 2019 (UTC)
I've now rewritten the code that parses the working page. The code that moves/deletes the categories and re-categorizes the pages is still the same as Cydebot's. — JJMC89(T·C) 07:32, 15 March 2019 (UTC)

Fluxbot is already approved for this task. {{3x|p}}ery (talk) 20:00, 24 February 2019 (UTC)

@Pppery: As far as I can tell the last edit Fluxbot made under task1 (cfds) was in July 2017. Pinging @Xaosflux who may be able to fill us in --DannyS712 (talk) 20:06, 24 February 2019 (UTC)
As an AWB bot, Fluxbot doesn't operate without operator intervention. It also isn't an adminbot. — JJMC89(T·C) 22:07, 24 February 2019 (UTC)
@JJMC89, Pppery, and DannyS712: yea, Fluxbot for CFDS is only on-demand, I used to process CFDS regularly, but more efficient bots came around. — xaosflux Talk 22:27, 24 February 2019 (UTC)
Thanks, @JJMC89.
In principle, I very much welcome the idea of a new bot with extended functionality (there are some major gaps in the current feature set). However, I think that @Black Falcon was a bit hasty in posting the request at WP:BOTREQ when the discussion at WT:Categories_for_discussion/Working#Cydebot_replacement had few participants (I think only 4) and had not been widely notified; the suggested extra functionality needs more discussion.
However, it now turns out that a replacement is needed soon, so thanks to JJMC89 for stepping up in the nick of time.
So I hope that any new bot will run initially with the same functionality as CydeBot. Any enhancements need a clear consensus, which we don't yet have. --BrownHairedGirl (talk) • (contribs) 05:05, 10 March 2019 (UTC)
At a minimum, the new bot should process the main /Working page:
  • Deleting, merging, and renaming (i.e. moving) categories, as specified, with appropriate edit summaries.
  • Deleting the old category with an appropriate deletion summary.
  • In the case of renaming, removing the CfD notice from the renamed category.

Ideally, it would also do some or all of the following:

  • Process the /Large and /Retain subpages.
  • Accept manual input when a category redirect should be created—for example, by recognizing leading text when a redirect is wanted, such as * REDIRECT Category:Foo to Category:Bar.
  • Recognize and update category code in transcluded templates. This would need to be discussed/tested to minimize errors and false positives.
  • Recognize and update incoming links to the old category. This would need to be discussed/tested to minimize errors and false positives.

-- Black Falcon (talk) 20:48, 18 February 2019 (UTC), Wikipedia:Bot requests#Categories for Discussion bot
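The "REDIRECT" instruction suggested in this request is only a proposal. The code below is therefore a hypothetical parser for that exact line shape (`* REDIRECT Category:Foo to Category:Bar`), not for the format WP:CFD/W actually uses. Lines without the keyword are treated as moves where no redirect was requested:

```python
import re
from typing import NamedTuple, Optional


class MoveInstruction(NamedTuple):
    source: str
    target: str
    leave_redirect: bool


# Hypothetical syntax: "* [REDIRECT ]Category:Old to Category:New".
LINE_RE = re.compile(
    r"^\*\s*(?P<redirect>REDIRECT\s+)?"
    r"(?P<src>Category:.+?)\s+to\s+(?P<dst>Category:.+?)\s*$"
)


def parse_line(line: str) -> Optional[MoveInstruction]:
    """Parse one instruction line; return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return MoveInstruction(match["src"], match["dst"], match["redirect"] is not None)
```

The source pattern is lazy, but it must be followed by " to Category:". Category names that themselves contain " to " (for example, ambassadors to a country) therefore still split at the right place.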

@BrownHairedGirl: Thanks for pinging me, and I don't disagree with you, fundamentally. More participation would undoubtedly have been better, and I probably should not have assumed many people watchlisted WP:CFD/W, but all the manual work required to close out Cydebot's processing was becoming quite tiresome after two months. In terms of the functionality I requested, only the last three items are enhancements compared to what Cydebot currently does or did when it was functioning properly, and I did note the last two needed more discussion/testing. As I mentioned at WP:BOTREQ, I would be happy if a bot did just the first three items properly. -- Black Falcon (talk) 06:06, 10 March 2019 (UTC)
Thanks, @Black Falcon.
I haven't so far seen anything on the list of suggested improvements, which I would dislike, and I see much I am v keen on.
In particular, the work required to close out CydeBot's efforts is v onerous, and I dearly wish that could be improved. I have been cleaning up after WP:CFD 2019 February 16#North_Macedonia, which renamed ~650 categories, and so far it has taken me about 8 hours to process about 400 of those. There has to be a better way somehow.
I agree that the redirects thing should be simple. However, the precise details of what a bot would do in the next two cases need a lot of scrutiny. It would be easy for an ill-specified bot to wreak much havoc in templates. --BrownHairedGirl (talk) • (contribs) 06:24, 10 March 2019 (UTC)
This BRFA is to take over Cydebot's current functionality. The enhancements, particularly the last two points, are out of scope. After this is up and running (with any bugs worked out), I'll be happy to look at making enhancements that have consensus. — JJMC89(T·C) 18:36, 10 March 2019 (UTC)
Thanks, @JJMC89. Sounds great. --BrownHairedGirl (talk) • (contribs) 18:38, 10 March 2019 (UTC)
Understood, and thank you. -- Black Falcon (talk) 20:19, 10 March 2019 (UTC)

{{BAG assistance needed}} I would like to get this up and running before Cydebot stops functioning. — JJMC89(T·C) 07:32, 15 March 2019 (UTC)

  Note: Cyde has migrated Cydebot to Stretch, so it is no longer in danger of dying at the end of the month. — JJMC89(T·C) 00:54, 21 March 2019 (UTC)
@JJMC89: I suspect there could be issues with both of these bots trying to run the same page on top of each other - do you want to proceed as a backup plan, or do you want to try to work in tandem going forward? — xaosflux Talk 19:40, 21 March 2019 (UTC)
@Xaosflux: I'm not sure, but there could be. It would probably be best to disable Cydebot since Cyde isn't fixing any of the bugs, which is the original reason it was requested for someone else to take over the task. — JJMC89(T·C) 03:28, 22 March 2019 (UTC)
Comment. Cydebot stopped running more than 24h ago, and we are currently paralyzed with categories. It would be great if this request could be approved.--Ymblanter (talk) 19:04, 23 March 2019 (UTC)
I agree with @Ymblanter. Please can we make the switchover ASAP? --BrownHairedGirl (talk) • (contribs) 19:42, 23 March 2019 (UTC)
Let me reactivate {{BAG assistance needed}} Galobtter (pingó mió) 19:55, 23 March 2019 (UTC)
@Galobtter: this request is already 'active' - the task is not, as it hasn't even trialed, yet. — xaosflux Talk 20:10, 23 March 2019 (UTC)
Xaosflux The "reactivating" was of the BAG assistance needed template, to get a BAG to come by and issue a trial (since Cydebot is crucial to WP:CFD and so it'd be good for this to be trialed and approved soon as mentioned above). Galobtter (pingó mió) 20:15, 23 March 2019 (UTC)
@JJMC89: do you have "human friendly" documentation (e.g. what you will do, when you will do it, and under what conditions) for this? Since this bot is going to almost entirely be edits and actions you are responsible for, but will be triggered by other admins - it should be clear to other admins what their requests will do. — xaosflux Talk 20:43, 23 March 2019 (UTC)
@Xaosflux: If the information at WP:CFD/AI and WP:CFD/W, including the HTML comment, aren't sufficient I can write something more detailed. The intent is to function as Cydebot is intended to function. — JJMC89(T·C) 21:21, 23 March 2019 (UTC)
@BrownHairedGirl: did you have concerns that getting started under those "rules" would be an issue (would it be worse than just not running at all?) — xaosflux Talk 21:41, 23 March 2019 (UTC)
(ec) @Xaosflux: Cydebot has done its job almost unchanged for ~13 years. AFAIK, the only significant change was when category pages became movable in 2014. So the bot's functions are stable and well understood by the admins who instruct it. Since JJMC89's bot is designed to be a drop-in-replacement, I don't see any need for more docs as a precondition -- provided the new bot passes its test, without which this discussion is moot anyway.
Of course, if more functions are added, then docs will be appropriate. But right now a backlog is building up, so we need a bot and need it urgently. --BrownHairedGirl (talk) • (contribs) 21:47, 23 March 2019 (UTC)
  Approved for trial (250 edits). (edits/action) OK, do a short trial for proof of concept. — xaosflux Talk 23:05, 23 March 2019 (UTC)
  Trial complete. The trial was run on this version of WP:CFD/W. edits (deleted) logs Since the bot isn't an admin yet, categories were tagged for deletion instead of deleted. Some of them got tagged multiple times, but this is just an artifact of not being an admin and me stopping and starting the script to limit the trial edits. It has the same problem that Cydebot does where it doesn't delete (tag for deletion) the original category like it should. This should happen right after it finishes emptying the category. I think the issue is either caching or MediaWiki thinks the category still has members. (It checks that the category is empty before trying to delete.) You'll see for the categories being emptied, if the script is run again it will delete (tag for deletion) the second time around. This doesn't happen for moves/merges since there is a check that the destination category does not exist at the start of processing. (It will always exist a second time around.) This is something I can work on fixing. The fix could be 1) adding a delay before checking that the original category is empty, 2) moving categories without leaving a redirect, or 3) removing the check that the destination category must exist. — JJMC89(T·C) 01:55, 24 March 2019 (UTC)
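Fix 1 (a delay before re-checking whether the category is empty) could be wrapped in a small retry helper like the one below. This is a sketch. `is_empty` is a hypothetical callable that should fetch fresh member counts rather than cached ones. The attempt count and backoff values are arbitrary assumptions that would need live testing to tune:

```python
import time
from typing import Callable


def wait_until_empty(
    is_empty: Callable[[], bool],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_empty() up to `attempts` times, doubling the wait after each miss.

    Returns True as soon as the category reports empty, or False if it never does.
    `sleep` is injectable so the helper can be tested without real waiting.
    """
    for attempt in range(attempts):
        if is_empty():
            return True
        if attempt < attempts - 1:  # no point sleeping after the final check
            sleep(delay * 2 ** attempt)
    return False
```

If the helper returns False, the bot would leave the category alone, as it does today. The next run would then pick it up.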
Awesome! I have two thoughts, and hopefully they are both quick fixes. First, the summary should link only to the daily log page, not the section—partly, this would be to keep the summary shorter (which will be important when dealing with long category names), and partly because the section name is not always the same as the category name (e.g. group nominations). Second, and more importantly, moving categories should be done without leaving a redirect. Another bot automatically recategorizes any pages placed in a redirected category, and so auto-creating category redirects dramatically increases the likelihood for miscategorization—see WT:CFD/W#Cydebot replacement for just a few examples. This will take care of the issue in cases of renaming but not merging, where solutions 1 or 3 may be needed. -- Black Falcon (talk) 03:01, 24 March 2019 (UTC)
@Black Falcon: Changing the edit summary is easy. Personally, I dislike having to find the correct section when a daily log is linked without a section. The code doesn't assume that the section is the name of the category. It looks for the category as a heading, or for a link to the category in the nomination part of each discussion on the daily log page, and links to that section. I don't think there is a danger of running out of space in the edit summary, but if the concern is appearance, what about piping it like [[Wikipedia:Categories for discussion/log/<date>#Section|Wikipedia:Categories for discussion/log/<date>]]? If not, I'll just use the link directly from CFDW. Something like this should suppress redirects when moving. Since we're moving without leaving a redirect, I could just delete the category for merge/empty regardless of whether it has members. Otherwise, I will have to do some live testing to figure out a good delay before checking if it is empty. — JJMC89(T·C) 04:56, 24 March 2019 (UTC)
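Suppressing the redirect corresponds to the `noredirect` flag of the MediaWiki Action API's `action=move` module. The sketch below builds the request parameters. The function name is hypothetical, and `token` must be a CSRF token fetched beforehand. Using `noredirect` also requires the `suppressredirect` right, which administrator accounts have:

```python
def build_move_params(old: str, new: str, reason: str, token: str,
                      leave_redirect: bool = False) -> dict[str, str]:
    """Parameters for a MediaWiki Action API action=move request (sent as a POST)."""
    params = {
        "action": "move",
        "from": old,
        "to": new,
        "reason": reason,
        "movetalk": "1",   # move the category talk page too
        "token": token,    # CSRF token obtained beforehand
        "format": "json",
    }
    if not leave_redirect:
        # Boolean API flags are "on" when present, whatever their value.
        params["noredirect"] = "1"
    return params
```

The `leave_redirect` switch keeps today's Cydebot behaviour (leaving a redirect) available while the redirect question is settled elsewhere.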
@JJMC89: Many thanks for running the trial. That looks good.
I presume that the inclusion of section links in the edit summaries is an artefact of me having left the section links in the WP:CFDW pages: here[29] and [30].
I did so because I had noticed that CydeBot just used whatever was in that header, and it seemed to me that there is no advantage in omitting them now that the length of edit summaries is almost unconstrained. However, I had not foreseen that it would coincide with the bot trial, so sorry for any confusion.
I don't see any benefit in piping the summaries to hide the section link. That just wastes space in the summary and misleads the readers.
As to redirects, there are many situations where it is appropriate to have them, and it is much easier (and less error-prone) to delete the un-needed redirects than to create them when omitted. Personally, I would favour a flexible approach with some new language of bot instruction. So instead of an abrupt change from always-on to always-off, we need some discussion on how best to handle them.
The big picture is that on both edit summaries and redirects, JJMC89's bot seems to be behaving exactly as CydeBot does. @Black Falcon, I thought we had agreed to leave the possibility of changes to the bot's behaviour for a separate discussion? I agree that we need some changes, but I don't agree that the issues in either case are quite as you describe them, and I am disappointed to see them being raised here and now. They both need more discussion at WT:CFDW, with more people involved.
So please can we keep this process focused on the v urgent task of getting a drop-in replacement for the now apparently-defunct Cydebot? That was the basis on which Xaosflux authorised this trial, so I am surprised to see proposed changes.--BrownHairedGirl (talk) • (contribs) 08:02, 24 March 2019 (UTC)
Section links do get carried over from CFDW; however, I did add a feature to try to add section links when not provided because I find them helpful. (See some details in my reply to Black Falcon above.) Based on Cydebot's code, Cydebot is supposed to be deleting the redirects, but it isn't. My bot inherited that issue since it uses the same code for that. What we do with the redirects right now isn't a big concern for me. A granular system can be worked out later. For now, I can either keep doing what Cydebot has been doing or fix the code to delete the redirects. — JJMC89(T·C) 08:35, 24 March 2019 (UTC)
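The section-link feature described here might work roughly as follows. This is a simplified sketch. It assumes plain-text headings and searches each whole section body rather than only the nomination part. The log date format follows page names like "2019 February 16", and both function names are hypothetical:

```python
import re
from typing import Optional

HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$", re.MULTILINE)


def find_section(log_wikitext: str, category: str) -> Optional[str]:
    """Return the heading of the discussion that covers Category:<category>.

    A section matches if its heading is "Category:<category>" or its body
    links to [[:Category:<category>]] (optionally piped).
    """
    link_re = re.compile(r"\[\[:Category:" + re.escape(category) + r"(?:\||\]\])")
    headings = list(HEADING_RE.finditer(log_wikitext))
    for i, heading in enumerate(headings):
        title = heading.group(2)
        end = headings[i + 1].start() if i + 1 < len(headings) else len(log_wikitext)
        body = log_wikitext[heading.end():end]
        if title == f"Category:{category}" or link_re.search(body):
            return title
    return None


def summary_link(date: str, section: Optional[str]) -> str:
    """Link to the daily log, with a section anchor when one was found."""
    page = f"Wikipedia:Categories for discussion/log/{date}"
    return f"[[{page}#{section}]]" if section else f"[[{page}]]"
```

This also covers group nominations, where the section name differs from the category name, because it falls back to finding a link inside the section.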
I also do not think redirects should be indiscriminately deleted. We are checking the backlinks anyway, and deleting a redirect takes much less time than checking backlinks.--Ymblanter (talk) 09:39, 24 March 2019 (UTC)
Please can we just stick for now with what Cydebot was actually doing? --BrownHairedGirl (talk) • (contribs) 10:21, 24 March 2019 (UTC)
This is exactly what my remark was about.--Ymblanter (talk) 10:35, 24 March 2019 (UTC)
Sorry, @Ymblanter. Indeed it was, and I meant to acknowledge that but didn't. (I'm a bit sleepy today). --BrownHairedGirl (talk) • (contribs) 11:25, 24 March 2019 (UTC)
  Done Now back to matching Cydebot's code for redirects. — JJMC89(T·C) 16:41, 24 March 2019 (UTC)
@BrownHairedGirl: As far as I can tell, auto-creating tens of thousands of cat-redirects was never an approved function for Cydebot, and a bot should not be mass-creating pages when there is such a high error rate. I agree that there are some situations where a cat-redirect is appropriate but strongly disagree that it is much easier (and less error-prone) to delete the un-needed redirects than to create them when omitted. It is, in my experience (having spent countless hours deleting inappropriate cat-redirects), more time-consuming to properly delete a page than to undelete it or just type {{Category redirect|Foo}}, and the risk of error is inherently greater when an action is necessary to avoid errors (i.e. deleting an inappropriate cat-redirect) than when an action is optionally helpful (i.e. creating an appropriate cat-redirect).
@Ymblanter: Re: checking backlinks, unfortunately, I think that will continue to have to be performed manually regardless of whether a cat-redirect is retained. A bot may be able to help, but I don't see a bot being able to account for and fix all possible scenarios of incoming links. -- Black Falcon (talk) 17:37, 24 March 2019 (UTC)
@JJMC89: I'm sorry, my intention was not to create more work for you. I will be appreciative of what you're doing however the code ends up, but I struggle with including a function that was never approved to begin with. In general, I've always struggled with the idea of bots auto-creating pages without narrow parameters and/or human input. As you pointed out, Cydebot was not supposed to be leaving redirects but started to do so at some point, and for very good reason in my view. -- Black Falcon (talk) 17:37, 24 March 2019 (UTC)
@Black Falcon, Leaving redirects is a normal function of page moves. It is appropriate in some cases, inappropriate in others, with lots of grey in between. I personally think you are being way too absolutist about this, and that there are better solutions than simply disabling all creation of redirects. That's why any change needs to be discussed.
But this page is not the place to be having that discussion.
And the effect of your insistence on using this discussion to change the behaviour of the bot is to delay the approval of the new bot at a point when the old bot is apparently gone.
Please can you explain why you object to having a proper consensus-forming discussion on this, e.g. at WT:CFDW? And why instead you are trying to use BRFA to unilaterally force a change in the way that the bot had worked for nearly 5 years? This is not the way that consensus is built. --BrownHairedGirl (talk) • (contribs) 18:07, 24 March 2019 (UTC)
Categories are not the same as other pages when it comes to leaving redirects after moves, because bots patrol cat-redirects and move any contents into the target category. When inappropriate cat-redirects are created, this leads to pages being miscategorized.
Please don't misunderstand me, I do agree that there is "lots of grey in between". Between the two extremes of clearly appropriate cat-redirects (e.g. hyphen-to-dash redirects) and clearly inappropriate cat-redirects (acronym-to-full name redirects), there are many others that are harder to categorize and even more that are better judged based on usefulness than appropriateness. This is precisely why, in my view, a bot should not be mass-creating cat-redirects without human intervention.
I do not object to a consensus-forming discussion, and I agree we should probably continue this discussion elsewhere. What I do object to is purposely retaining a bug that performs a function that was never approved in the first place. When it comes to auto-creation of pages, consensus historically has been to take the more cautious approach that minimizes the likelihood of errors. And from a BRFA standpoint, it is simply not feasible to demonstrate that the bug results in an acceptably low error rate. -- Black Falcon (talk) 18:47, 24 March 2019 (UTC)
@Black Falcon, when the bot has been doing that for five years, it's a WP:IMPLICITCONSENSUS. A discussion may produce a different result, but it does need a discussion.
Having the bot not create redirects would also create a high error rate. I am very well aware of what RussBot does, and why ambiguous redirects should not be retained, but simply switching from one binary option to another creates as many problems as it solves. Describing the current process as a bug with a simple fix misrepresents a complex reality.
so ... please can you clarify whether you oppose or support JJMC89's bot going live to do exactly as Cydebot did? --BrownHairedGirl (talk) • (contribs) 20:16, 24 March 2019 (UTC)
I do not think there is a "simple fix", but I think not creating redirects is a better fix and disagree it would "create as many problems as it solves". I acknowledge that this will result in some appropriate redirects not being created, which inconveniences navigation. However, the alternative results in inappropriate redirects being created, which produces actual errors through miscategorization. I would rather take inconvenience over outright errors. And, yes, we can consider other (non-binary) options, but those have not been tried before and would need to be discussed and tested.
Regarding consensus... a lack of any alternative is not indicative of consensus—implicit or otherwise. Quite simply, a malfunctioning bot was better than none at all. So, in response to your ultimate question, I will not oppose JJMC89's bot going live no matter what. My preference remains for the bot to do what Cydebot was approved to do, and not to continue mass page-creation on the basis of inertia stemming from an unfixed bug, but ultimately I will defer to BAG on the appropriate course of action. -- Black Falcon (talk) 02:51, 25 March 2019 (UTC)
It's a bit tedious that @Black Falcon persists in describing as a "bug" something for which there is no consensus on optimal behaviour. I have seen no evidence that CydeBot was approved either to create redirects or not to create redirects: WP:Bots/Requests for approval/Cydebot_4 does not mention the possibility of redirects, either in the proposal or in the discussion. CydeBot was approved long before category pages became movable, and its actions since moving was enabled have conformed to the default process for moving pages, i.e. leave a redirect.
This is simply an issue which has not been resolved since category pages became movable in 2014. It needs a consensus, but consensus-building is not helped by one editor trying to sway BAG by describing the bot's failure to conform to their personal preference as a "bug". --BrownHairedGirl (talk) • (contribs) 15:31, 25 March 2019 (UTC)
  A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}. — JJMC89(T·C) 04:27, 3 April 2019 (UTC)

DannyS712 bot 12

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:55, Tuesday, March 5, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Solve CW Error #17 - Category duplication

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/PkbwcgsBot

Edit period(s): One time run

Estimated number of pages affected: ~8000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Currently, PkbwcgsBot only fixes a maximum of 300 instances of this error per week. While this certainly helps with the backlog, I'd like to do a one-time run to clean it out. Using AWB, I would do find-and-replace on the regex (\[\[Category:.*\]\])((.|\n)*)\1\n, replacing it with $1$2. I did a few of these manually to perfect the regex (eg [31], [32], [33]). While gen-fixes would fix this issue, they would not be activated, so no other edits would be made.
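The AWB find-and-replace above can be approximated in Python to check its behaviour offline. This is a hypothetical port, not the operator's AWB settings: Python's re module writes the backreferences as \1\2 where AWB uses $1$2, and the category part is tightened from .* to [^\]\n]* (an assumption) so that a single match cannot span two category links sitting on the same line.

```python
import re

# Hypothetical port of the AWB rule described above, not the bot's own config.
# Group 1: one category link (name plus optional sort key) on a single line.
# Group 2: everything up to a later, byte-identical copy of that link.
# The later copy and its trailing newline are dropped by the replacement.
DUPLICATE_CATEGORY = re.compile(r"(\[\[Category:[^\]\n]*\]\])([\s\S]*)\1\n")


def remove_duplicate_categories(wikitext: str) -> str:
    """Remove repeated identical category links, keeping the first copy.

    Links whose sort keys differ are not identical, so they never match and
    such pages are left unchanged.
    """
    while True:
        # Group 2 is greedy, so each pass removes the last identical copy;
        # repeating until nothing changes leaves only the first one.
        new_text = DUPLICATE_CATEGORY.sub(r"\1\2", wikitext, count=1)
        if new_text == wikitext:
            return new_text
        wikitext = new_text
```

One consequence of requiring \1\n is that a duplicate on the very last line of a page with no trailing newline is not matched.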


  Approved for trial (50 edits). Primefac (talk) 19:42, 10 March 2019 (UTC)

@Primefac: Should I have AWB autosave, or hit save manually? --DannyS712 (talk) 19:55, 10 March 2019 (UTC)
Does it matter? The results are the same. Primefac (talk) 19:55, 10 March 2019 (UTC)
  Trial complete. - 50 edits made. [34] (search for "Category duplication"). 2 issues: marking the pages as fixed within the wikiproject (I posted on the discussion page to figure out if the list is regenerated, or how to mark the pages automatically from AWB); and the regex doesn't work if the categories have different sort keys. So, the current regex would work on ~2300 pages. Once I finish those, I can look into a different regex that removes the second instance of a category, even if it has a different sortkey, but that is a separate issue, and should probably be a separate task --DannyS712 (talk) 21:30, 10 March 2019 (UTC)
Update - marking as done is taken care of (the list is automatically updated at the end of the day), so issue 1 is resolved. Issue 2 doesn't prevent the bot from running, but rather just limits the scope, so as far as I can tell I should be able to run the bot overnight (once it's approved). Forgot to @Primefac last time. Thanks, --DannyS712 (talk) 22:33, 10 March 2019 (UTC)
So does that mean the bot skips pages that have duplicate cats but different sortkeys? Primefac (talk) 22:35, 10 March 2019 (UTC)
@Primefac: yes, as it currently stands, I have the bot skip pages where no changes are made. The only change that is made is based on the find-and-replace regex, which relies on either identical sortkeys or having no sortkeys at all. --DannyS712 (talk) 23:15, 10 March 2019 (UTC)
  • This task smells pretty badly of WP:COSMETICBOT - arguably fixing sortkey collisions would be more reader-useful, but that isn't even happening here. The page output before and after this task gets done has no change for readers. I see a minor benefit for editors. Is there room to combine this with another task? — xaosflux Talk 13:31, 12 March 2019 (UTC)
    @Xaosflux: WikiProject Check Wikipedia says that solving this error is Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT. I'll try to create a regex for sort key collision, but for now I'd prefer to avoid combining my tasks, since I'm still only starting out as a bot-op. --DannyS712 (talk) 18:34, 12 March 2019 (UTC)
    @Xaosflux: I think I have a working regex to fix duplicate categorization even if one or both of the categories have sort keys:


Which is replaced with $1$2$3. This is, as you said, more reader-useful. What do you think of an extended trial? --DannyS712 (talk) 03:16, 14 March 2019 (UTC)

As a comment, I'm the one that added "Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT." back then. And the reason is that I felt this is a future-proofing situation, because someone that wants to update a sort key might only do it in one place, and it won't kick in because there's a dual listing of the category. Or they might remove the category in one place, thinking they removed the category from the article, unaware there's a duplication of it. This wasn't RFC'd or BRFA'd before however. Headbomb {t · c · p · b} 23:46, 20 March 2019 (UTC)
Also is there a particular reason why genfixes are disabled for this? They'd seem worth making on top of the main task, IMO. Headbomb {t · c · p · b} 23:49, 20 March 2019 (UTC)
@Headbomb: I'd prefer not to automatically run genfixes, but if you'd like them enabled I can supervise an extended trial --DannyS712 (talk) 00:11, 21 March 2019 (UTC)
In my experience genfixes have been pretty stable and well tested for a while now. But it's your bot, so it's your call ultimately about whether or not you want to enable them. It just seems to me that if you're going to make some genfix-like edits (duplicate category removal is covered by them after all), you might as well enable the full suite of genfixes. Headbomb {t · c · p · b} 00:15, 21 March 2019 (UTC)
@Headbomb: in that case, sure. Would you be willing to approve an extended trial with both regexes (to also fix category duplication) and also genfixes? --DannyS712 (talk) 00:17, 21 March 2019 (UTC)
  Approved for extended trial (50 edits). I'll approve for further trial, but since I'm the one that added "Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT." back then, I'll recuse myself from final approval. Headbomb {t · c · p · b} 00:21, 21 March 2019 (UTC)
@Headbomb:   Trial complete. 50 edits, see [35] - for the first 10, I forgot to enable genfixes. I didn't see any errors, except for one where there were multiple repeated categories, which I fixed. Thanks, --DannyS712 (talk) 00:56, 21 March 2019 (UTC)

How does the bot handle cases like this [36]? Should it? Headbomb {t · c · p · b} 01:08, 21 March 2019 (UTC)

@Headbomb: I don't really understand the first question - that case is a bot edit, and I think it should handle it exactly as it did. --DannyS712 (talk) 01:20, 21 March 2019 (UTC)
There are two clashing sortkeys. How does the bot decide which to remove? Headbomb {t · c · p · b} 01:25, 21 March 2019 (UTC)
@Headbomb: it always removes the second instance of a category. If one or both have sortkeys, it still just removes the second instance of the category and keeps the first, regardless of whether the second had a sort key and the first didn't, etc. --DannyS712 (talk) 01:28, 21 March 2019 (UTC)
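The sort-key-tolerant regex itself is not preserved in this copy of the page, so what follows is an independent, hypothetical sketch of the rule described here (keep the first link to a category, drop later links to the same category whatever their sort keys). It works line by line, only looks at lines that contain nothing but a category link, and approximates MediaWiki title normalisation (underscores as spaces, case-insensitive first letter); all of these are simplifying assumptions.

```python
import re

# Hypothetical: a line consisting solely of one category link, with an
# optional |sort key. The "Category" keyword is matched case-insensitively.
CATEGORY_LINE = re.compile(
    r"^\[\[\s*Category\s*:\s*([^|\]]+?)\s*(\|[^\]]*)?\]\]\s*$", re.IGNORECASE
)


def normalise(name: str) -> str:
    """Rough MediaWiki-style title normalisation for comparing categories."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def keep_first_category(wikitext: str) -> str:
    """Drop every later link to an already-seen category, ignoring sort keys."""
    seen = set()
    kept = []
    for line in wikitext.split("\n"):
        match = CATEGORY_LINE.match(line)
        if match:
            key = normalise(match.group(1))
            if key in seen:
                continue  # later copy: removed even if its sort key differs
            seen.add(key)
        kept.append(line)
    return "\n".join(kept)
```

As Primefac points out later in the thread, this assumes the later sort key is the wrong one, which may not hold when an editor added a second link to correct the sort key.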
@Xaosflux: does running it with genfixes and fixing sortkeys too allay your concern about cosmetic-bot? If so, would you be willing to approve this? DannyS712 (talk) 02:39, 22 March 2019 (UTC)
"Adding genfixes" is never a selling point for me, maybe one of the other BAGers. — xaosflux Talk 02:44, 22 March 2019 (UTC)
Well, the genfixes are not guaranteed to be made, so it's rather moot. The real thing to look at is whether future-proofing is enough of a reason to make the edits. Headbomb {t · c · p · b} 02:28, 23 March 2019 (UTC)
@Headbomb: I believe that it is, since it also enables users to set sortkeys that actually work (in addition to fixing the error itself, and any genfixes) --DannyS712 (talk) 03:19, 26 March 2019 (UTC)
{{BAGAssistanceNeeded}} the trial has been over for almost a week --DannyS712 (talk) 06:26, 28 March 2019 (UTC)

I'm actually not thrilled with the idea of removing same-category-different-sortkeys, since in my experience people (incorrectly) try to add new ones when they want to correct the sortkey. I'm actually a little surprised no one commented on this. Are you just assuming that the second one is wrong? Primefac (talk) 14:03, 7 April 2019 (UTC)

@Primefac: yes. However, I could try to change it so that it removes the first category rather than the second, if that would be better. --DannyS712 (talk) 17:22, 7 April 2019 (UTC)
Falls afoul of CONTEXT. I'd rather just see them left as-is for someone to adjust manually. Primefac (talk) 17:22, 7 April 2019 (UTC)
@Primefac: then I can go back to the original regex that skipped sort key collisions - I only changed it because of Xaosflux's suggestion above. --DannyS712 (talk) 17:25, 7 April 2019 (UTC)
That's probably for the best. Collisions could be logged somewhere else, though, and this task could focus on preventing collisions rather than fixing them. Headbomb {t · c · p · b} 16:51, 10 April 2019 (UTC)
@Headbomb: Once I make the run, all remaining WCW errors would be due to collisions, so I don't really see the need to log --DannyS712 (talk) 17:06, 10 April 2019 (UTC)
  A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}. I can do this with skipping pages with different sortkeys, or with removing the second instance of a category regardless of different sortkeys, or remove the first, but I'd like to do it. Can one of the options please be approved? Thanks, --DannyS712 (talk) 21:15, 14 April 2019 (UTC)


Operator: Lkolbly (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:57, Monday, December 24, 2018 (UTC)

Function overview: This bot automatically updates Alexa rankings in website infoboxes by querying the Alexa Web Information Service.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: (presently, the actual saving is commented out, for testing)

Links to relevant discussions (where appropriate): Previous bot that performed this task: OKBot_5

Edit period(s): Monthly or so

Estimated number of pages affected: 4,560 articles are in the current candidate list. A subset of these pages will be updated each month. Other pages could be pulled into the fray over time if someone adds Alexa information to a page. Also, there will be a whitelist, copied from User:OsamaK/AlexaBot.js, of pages that will be edited (presently containing 1,412 pages).

Namespace(s): Articles

Exclusion compliant (Yes/No): Yes (via whatever functionality is already in pywikipedia)

Function details: This bot will scan all pages (using a database dump as a first pass) to find pages which have the "Infobox website" template with both "url" and "alexa" fields.

It will parse the domain from the url field using a few heuristics and query the domain with AWIS. Domains that have subdomains return incorrect results from AWIS (a query for a subdomain returns the result for just the parent domain), so these domains are discarded (and the page is not touched). It will then perform an AWIS query to determine the current website rank and trend over 3 months.
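The bot's actual heuristics are not listed, so the following is only a hypothetical sketch of the kind of domain extraction described above. It assumes the url field holds a bare host, a full URL, a {{URL|...}} template, or a bracketed external link; it drops a leading "www." and skips anything that still has more than two labels, treating it as a subdomain.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical: the {{URL|example.com}} form often used in |url=.
URL_TEMPLATE = re.compile(r"\{\{\s*URL\s*\|\s*([^|}]+)", re.IGNORECASE)


def domain_for_awis(url_field: str) -> Optional[str]:
    """Return a bare two-label domain to query, or None to skip the page."""
    value = url_field.strip()
    match = URL_TEMPLATE.search(value)
    if match:
        value = match.group(1).strip()
    # "[http://example.com Example]" -> "http://example.com"
    tokens = value.lstrip("[").split()
    if not tokens:
        return None  # empty |url=: nothing to query
    value = tokens[0].rstrip("]")
    if "//" not in value:
        value = "//" + value  # let urlsplit treat a bare host as a netloc
    host = (urlsplit(value).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    labels = [label for label in host.split(".") if label]
    if len(labels) != 2:
        return None  # subdomain (or unparseable): leave the page untouched
    return ".".join(labels)
```

The two-label rule is deliberately conservative: it also skips domains under multi-part suffixes such as .co.uk, trading coverage for avoiding wrong rankings.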

Websites will be classified into {{Increase}}, {{Decrease}}, and {{steady}}. A site increasing in popularity will gain the {{Increase}} tag, even though its rank is numerically decreasing (previously, many sites were also classified into IncreaseNegative and DecreasePositive, which I didn't understand).
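A minimal sketch of that classification, assuming the 3-month trend is reduced to two rank numbers (how AWIS actually reports the trend is not stated here). The key point is that a smaller rank number means more traffic, so a falling number maps to {{Increase}}.

```python
def trend_template(rank_three_months_ago: int, current_rank: int) -> str:
    """Map an Alexa rank change to the trend template used in the infobox.

    Rank 1 is the most visited site, so a numerically smaller rank is shown
    as {{Increase}} (more popular), not {{Decrease}}.
    """
    if current_rank < rank_three_months_ago:
        return "{{Increase}}"
    if current_rank > rank_three_months_ago:
        return "{{Decrease}}"
    return "{{steady}}"
```

Whether "more popular" should instead use IncreaseNegative is exactly the question xaosflux later raises for Template talk:Infobox website.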

Then, in the text of the article, whatever the current alexa data is will be replaced by something like:

{{Increase}} 169,386 ({{as of|2018|12|24}})<ref name="alexa">{{cite web|url= | publisher= [[Alexa Internet]] | Traffic, Demographics and Competitors - Alexa |accessdate= 2018-12-24 }}</ref> <!-- Updated monthly by LkolblyBot -->

(e.g.   169,386 (As of 24 December 2018)[1] )
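A hypothetical sketch of how the replacement value above could be assembled. The cite's URL and title are not preserved in this copy of the request, so they are caller-supplied placeholders here; the date style ("dmy", "mdy", or ISO as the fallback) follows the approach agreed later in the discussion, and month names are hard-coded to avoid locale-dependent strftime output.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_access_date(day: date, style: str) -> str:
    """Format a date as dmy, mdy, or ISO 8601 (the fallback)."""
    month = MONTHS[day.month - 1]
    if style == "dmy":
        return f"{day.day} {month} {day.year}"
    if style == "mdy":
        return f"{month} {day.day}, {day.year}"
    return day.isoformat()  # zero-padded, e.g. 2019-01-27


def alexa_wikitext(trend: str, rank: int, checked: date, url: str,
                   title: str, style: str = "iso") -> str:
    """Build '{{Increase}} 169,386 ({{as of|...}})<ref name="alexa">...</ref>'."""
    as_of = f"{{{{as of|{checked.year}|{checked.month}|{checked.day}}}}}"
    cite = (f"{{{{cite web|url={url}|title={title}"
            "|publisher=[[Alexa Internet]]"
            f"|accessdate={format_access_date(checked, style)}}}}}")
    return f'{trend} {rank:,} ({as_of})<ref name="alexa">{cite}</ref>'
```

Using isoformat() for the ISO case also avoids the missing-leading-zero problem reported after the first trial.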

There are two as-yet untested test cases that I'll test (and fix if necessary) before any full-scale deployment:

  • Apparently some infoboxes have multiple |alexa= parameters? I have to go find one and see what the bot does with it. (probably the right thing to do is to not touch the page at all in that situation)
  • Some pages have an empty |alexa= parameter, which should be fine, but worth testing anyway.


Please make the bot's talk page.

"whatever the current alexa data is will be replaced" - how do you know there isn't more than just the previous value? Or that there isn't a reference that is used elsewhere?

I imagine many pages that copy-paste the template code will have an empty |alexa= parameter. This would not be any different to not having it at all.

Do you preserve template's formatting?

The particular citation style the bot uses may not match the article's, especially the date format. (I wonder why we don't have an Alexa citation template still.) —  HELLKNOWZ   ▎TALK 21:26, 24 December 2018 (UTC)

The format of the template code overall is preserved, the value is replaced by replacing the regex r"\|\s*alexa\s*=\s*{}".format(re.escape(current_alexa)), so the rest of the template is unaffected. (the number of spaces before the equal sign goes from "any number" to "exactly one", though)
Yeah, I was debating having it skip empty alexa parameters. There's value in adding it (as much as updating it), though for very small sites the increase/decrease indicator may not be particularly useful.
I didn't think to check whether there's more than the previous value, though I can't think of what else would be there. There's at least two common formats for this data, basically the OKBot format, and a similar format with parenthesized month/year instead of the asof (see - note lack of a reference). I guess it would be safest to check that the value is in a whitelisted set of alexa formats to replace, I'll bet a small number of regexes could cover 90% of cases (and the remaining 10% could be changed to a conforming case by hand :D)
The reference is interesting, because it's basically a lie. It's a link to the alexa page, but that isn't where the data was actually retrieved from, it was retrieved from their backend API. As for if someone's already using that reference, it shouldn't be too hard to check for that, I would think. I imagine (with only anecdotal evidence) that most of those cases will be phrases like "as of 2010, had an alexa rank of 5". Updating that reference to the present value may not make sense in the context of the article (myspace isn't as big as it used to be, an article talking about how big it was in 2008 won't care how big it is now). But either way they should probably be citing a source that doesn't automatically change as soon as you read it.
The ethnologue page already looks like it has diverging date formats? I don't know how common that is, I'll have to go dig up the style guide for citations (maybe we should have a bot to make that more uniform). What would it take to make a template? (also, would that solve the uniformity issue? I guess at least it'd be uniform across all alexa rankings)
Lkolbly (talk) 14:52, 25 December 2018 (UTC)
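The in-place substitution described in the reply above can be shown in runnable form. This sketch follows the quoted pattern with one deliberate change (an assumption, not the bot's code): only spaces and tabs are consumed after the equals sign, so an empty |alexa= does not swallow the line break before the next parameter. The "| alexa = " replacement prefix is likewise assumed.

```python
import re


def replace_alexa_value(wikitext: str, current_alexa: str, new_value: str) -> str:
    """Swap the |alexa= value in place, leaving the rest of the infobox alone."""
    # Same shape as the quoted pattern, but [ \t]* after "=" instead of \s*,
    # so an empty value cannot consume the following newline.
    pattern = r"\|\s*alexa\s*=[ \t]*{}".format(re.escape(current_alexa))
    # A replacement function keeps any backslashes in new_value literal.
    return re.sub(pattern, lambda _match: "| alexa = " + new_value, wikitext, count=1)
```

As noted above, whatever spacing the original parameter had around "=" is normalised to single spaces.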
WP:CITEVAR and WP:DATEVAR is the relevant stuff on date and citation differences. On English wiki, changing or deviating from citation or date style without a good reason is very controversial. The short answer is "don't". Bots are generally expected to do the same, although minor deviations are somewhat tolerated. But bots are expected to follow templates, like {{use dmy dates}} or |df= parameters. —  HELLKNOWZ   ▎TALK 16:36, 25 December 2018 (UTC)
Okay, it looks like it should be pretty straightforward to just check for the two Template:Use ... dates tags and set the |df= parameter. Lkolbly (talk) 14:39, 26 December 2018 (UTC)
Updated the bot so that it follows mdy/dmy dates, updating the accessed date and asof accordingly. Also constrained the pages that will be updated to a handful of matching regexes and also pulled a list from User:OsamaK/AlexaBot.js, which eventually I'll make a copy of. Lkolbly (talk) 18:20, 1 January 2019 (UTC)
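Detecting the article's date style can be as simple as looking for the two maintenance templates. A minimal sketch, assuming only the canonical {{Use dmy dates}} and {{Use mdy dates}} names are checked (template redirects are not handled):

```python
import re

USE_DATES = re.compile(r"\{\{\s*use\s+(dmy|mdy)\s+dates\b", re.IGNORECASE)


def article_date_style(wikitext: str) -> str:
    """Return 'dmy' or 'mdy' from the first {{Use ... dates}} tag, else 'iso'."""
    match = USE_DATES.search(wikitext)
    return match.group(1).lower() if match else "iso"
```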
  •   Approved for trial (50 edits). Primefac (talk) 00:43, 20 January 2019 (UTC)
  Trial complete. Ran bot to edit 50 randomly selected pages. So far I've noticed two bugs that cropped up, one involving leading zeros in the access dates and another where the comment "Updated by LKolblyBot" got repeated. I'm going to go through and fix the issues by hand for now and apply fixes to the bot. Lkolbly (talk) 20:20, 27 January 2019 (UTC)
Also, looking closer, some pages got a "Retrieved" date format that doesn't match the rest of the page (e.g. Iraqi News), but I'm pretty sure it's because those pages aren't annotated with dmy or mdy. Lkolbly (talk) 20:47, 27 January 2019 (UTC)
I have questions.
  • First, Special:Diff/880480890 - is there a reason it chooses http over https?
  • Second, why do some diffs use ISO formatting for the date while others actually change it to dmy?
  • Third, are OKBot and Acagastya still updating these pages, and would it make sense to remove those names from the comments?
My fourth/fifth questions were going to be what you were going to do about duplicate names, but it looks like you noticed that and are taking care of it, along with a lack of leading zeros issue with the 2019-01-27.
Also, as a minor point, even if you've only done 44 edits with the bot, please make sure when you finish with a trial that you link to the specific edits, since while "Contribs" might only show those 44 edits now, after you've made thousands they won't be the first thing to look at.
Actually, I do have another thought - for brevity, it might be best to have a wikilink in the edit summary instead of a full URL. Primefac (talk) 20:12, 28 January 2019 (UTC)
I have answers.
  • There's no particular reason it uses http over https for the link, I hadn't given it a second thought. I can change it to https.
  • The variations in date formatting are an attempt to stick with the articles predominant style. The default style being ISO format, and if there's a use dmy or use mdy tag it uses the respective format.
  • OKBot appears defunct. I wasn't aware of Acagastya, though from their user page it looks like they've left English Wikipedia at least. It does make sense to remove the (now duplicate) comments; that was ultimately the goal, but it didn't work as planned.
  • Good point on the making a list of the trial edits, conveniently it looks like I can search the contribs to make a view of just the trial edits.
  • Yeah, the wikilink idea occurred to me a few minutes too late, it looks terrible in the commit message :/ Lkolbly (talk) 23:32, 28 January 2019 (UTC)
With the constant modification that Alexa goes through, it is not a good idea to put manual labour for updating the ranks.
acagastya 08:53, 29 January 2019 (UTC)
  • Regarding the 'Updated monthly by ...' lines - as is being demonstrated here there are stale entries - and it can be expected as no bot should ever be expected to operate in the future. To that end I don't think this should be added, and would support having the next update remove any existing such comment codes. — xaosflux Talk 15:21, 7 February 2019 (UTC)
      Approved for extended trial (50 edits). Please implement the above changes in this run. Primefac (talk) 21:30, 14 February 2019 (UTC)
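Removing the stale "Updated monthly by ..." markers that xaosflux mentions could be a small regex pass. This is a hypothetical sketch: it assumes the markers are single-line HTML comments reading "Updated by <name>" or "Updated monthly by <name>"; other bots' exact wording is not given here, so the pattern may need widening.

```python
import re

# Hypothetical wording; also strips spaces/tabs directly before the comment.
UPDATED_BY_COMMENT = re.compile(
    r"[ \t]*<!--\s*Updated (?:monthly )?by\b.*?-->", re.IGNORECASE
)


def strip_updated_by_comments(wikitext: str) -> str:
    """Remove 'Updated (monthly) by ...' HTML comments, keeping other comments."""
    return UPDATED_BY_COMMENT.sub("", wikitext)
```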
was the trial completed? What are the results (please link to diffs as well). — xaosflux Talk 18:51, 12 March 2019 (UTC)
Sorry I've been dead in the water this last month, time hasn't been on my side (I figured I'd re-architect my server before I ran the trial, and have everything nice and containerized, but that didn't work out and then one thing led to another). I haven't done the trial yet, I plan to run it this coming weekend though. Lkolbly (talk) 19:10, 12 March 2019 (UTC)
  Trial complete. Okay, ran the bot on these 50 pages. Some notes:
  • r.e. the "Updated by" comments: So it turns out the framework I'm using (pywikibot) strips out the comments, which is why they were being duplicated. This run did not add "updated by" comments. Removing existing comments could be done but would have to be a separate script.
  • I think I'll change the change comment to "Bot: Update Alexa ranking (link to a list of sites that the bot maintains)"
  • Some sites (e.g. Gothamist) list a URL in the infobox that is not ostensibly the site's actual (or main) URL, which gives an inaccurate alexa ranking. I think this is beyond my control though.
  • The original formatting of the infobox is unfortunately lost in pywikibot. The spacing varies - some (Adventure Gamers) use no spaces after the vertical bar, most one space, some align the equals signs, some don't (or do so inconsistently). Regardless, the information is gone at rewrite time.
  • A large number of sites had an "April 2014" style alt text specified for the "as of" tag. This script eliminates those.
  • One page (Shutterfly) had the "alexa" ref specified in a separate infobox references section at the bottom of the infobox, which led to a duplicate reference name error.
Otherwise, everything seemed to run fairly smoothly. The last point I might be able to handle by searching for name="alexa" or something in the page text. I think it's a fairly rare occurrence though.
Lkolbly (talk) 02:13, 20 March 2019 (UTC)
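The Shutterfly case could be caught with the check suggested above. A minimal sketch, assuming a page should be skipped whenever a <ref name="alexa"> definition exists outside the current |alexa= value (self-closing reuses like <ref name="alexa" /> are not counted as definitions):

```python
import re

# Matches a named definition such as <ref name="alexa"> but not <ref name="alexa" />.
ALEXA_REF_DEFINITION = re.compile(r"<ref\s+name\s*=\s*[\"']?alexa[\"']?\s*>", re.IGNORECASE)


def defines_alexa_ref_elsewhere(page_text: str, current_alexa: str) -> bool:
    """True if the page defines the 'alexa' ref somewhere besides |alexa=."""
    in_page = len(ALEXA_REF_DEFINITION.findall(page_text))
    in_param = len(ALEXA_REF_DEFINITION.findall(current_alexa))
    return in_page > in_param
```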
@Lkolbly: regarding the increase vs increasenegative difference, my reading is that this is a Numerical/Desirable field (see Template:Infobox website and related talk pages). Moving a Ranking from 5th to 1st is a "decrease" in value, but an increase in desirability. Arguably "1st place" is an increase from "2nd place" though so normal increase could be fine here - this should be sorted out at Template talk:Infobox website, and that template documentation should be updated to match before this begins. You don't want your bot to be warring with human editors over the direction of a triangle. — xaosflux Talk 13:59, 28 March 2019 (UTC)
  A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) please let us know any results from: Template_talk:Infobox_website#Bot_Job_and_arrows. — xaosflux Talk 14:05, 2 April 2019 (UTC)

Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.

Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.