Wikipedia:Bots/Requests for approval


BAG member instructions

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you, rather than running your own.

New to BRFA? Read these primers!
 Instructions for bot operators

Current requests for approval

Pi bot 5

Operator: Mike Peel (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 21:12, 3 August 2020 (UTC)

Function overview: Synchronise short descriptions with Wikidata

Automatic, Supervised, or Manual: Automatic

Programming language(s): python (pywikibot)

Source code available: on bitbucket

Links to relevant discussions (where appropriate): Wikipedia_talk:Short_description#Copying_short_descriptions_to_Wikidata

Edit period(s): Daily

Estimated number of pages affected: TBC; there are 2.1 million articles with local short descriptions, but perhaps quite a lot of those will be the same.

Namespace(s): Most namespaces

Exclusion compliant (Yes/No): No

Function details: The bot looks for articles where we have no short description, or where the short description does not match the Wikidata description beyond capitalisation differences. It then has two options:

  1. If there is no short description here, then import the English description from Wikidata.
  2. If the short description here does not match Wikidata, then replace the short description here with that from Wikidata.

This is the opposite of d:Wikidata:Requests for permissions/Bot/Pi bot 14, at least for option 2, although option 1 is complementary. The code is preliminary; I'll improve it to match the description soon. Related discussions are at d:Wikidata:Project_chat#Importing_short_descriptions_from_enwp and Wikipedia_talk:Short_description#Copying_short_descriptions_to_Wikidata. Thanks. Mike Peel (talk) 21:12, 3 August 2020 (UTC)
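
For illustration only, a minimal pywikibot sketch of the two options described above; the actual code is the Bitbucket source linked above, and the template regex and edit summaries here are simplified assumptions:

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')

def sync_short_description(title):
    page = pywikibot.Page(site, title)
    text = page.text
    match = re.search(r'\{\{[Ss]hort description\|([^}|]*)', text)
    local_sd = match.group(1).strip() if match else None

    item = pywikibot.ItemPage.fromPage(page)  # the linked Wikidata item
    item.get()
    wikidata_sd = item.descriptions.get('en')
    if not wikidata_sd:
        return  # nothing on Wikidata to import

    if local_sd is None:
        # Option 1: no local short description, so import the English one from Wikidata
        page.text = '{{Short description|%s}}\n%s' % (wikidata_sd, text)
        page.save(summary='Importing short description from Wikidata')
    elif local_sd.lower() != wikidata_sd.lower():
        # Option 2: mismatch beyond capitalisation, so replace the local value
        page.text = text.replace(match.group(0), '{{Short description|' + wikidata_sd)
        page.save(summary='Synchronising short description with Wikidata')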

  • I read the discussion over at Wikidata, which had a very WP:OWN attitude (I personally would be pretty pissed if my descriptions were overwritten by a bot based on content taken from another project). For point #2, since any short description added here was manually added by someone and then (hopefully) vetted by other editors, it's probably safe to assume it's good enough. I don't see any value in replacing it. This proposal will also probably cause inconsistencies with infoboxes that handle the short description. I'm also not sure how much #1 is worth it when it's done by a bot with no manual vetting. --Gonnym (talk) 22:12, 3 August 2020 (UTC)
  • This bot request appears to defeat the very reason for the existence of {{short description}}. It should not proceed without a much wider discussion. * Pppery * it has begun... 22:40, 3 August 2020 (UTC)
  •   Needs wider discussion. The two tasks are somewhat contradictory. #1 would certainly speed up the process of adding short descriptions, but overriding a local consensus with #2 as Gonnym describes is problematic and wastes edits. I wasn't really involved in the short descriptions discussion when it first came around, but I seem to recall it was fairly contentious. Setting a notification isn't the same as having a consensus for this sort of task. I'm not going to decline outright but until I see a local consensus this isn't going anywhere. Primefac (talk) 23:20, 3 August 2020 (UTC)
  • If memory serves, the whole reason we set up local short descriptions was because of various problems with using Wikidata for this (e.g. undetected vandalism, edits there not showing up in our history). So while I could see support emerging for #1 with more discussion, #2 would require some wider debate as noted by Primefac. Jo-Jo Eumerus (talk) 07:54, 4 August 2020 (UTC)
    I'm happy to start a wider discussion beyond the ones I linked to above, any suggestion of where? Thanks. Mike Peel (talk) 11:14, 4 August 2020 (UTC)
    I suggest WP:VPR. --BrownHairedGirl (talk) • (contribs) 11:19, 4 August 2020 (UTC)
  • @Mike Peel: Two questions:
  1. As far as I can see, you have not supplied any evidence of a consensus on en.wp for this task, let alone a consensus result of a well-advertised RFC. Is that correct?
  2. If your goal is that en.wp should always use the short description from Wikidata, why copy the data rather than making the template import that description? --BrownHairedGirl (talk) • (contribs) 11:19, 4 August 2020 (UTC)
  • The bot proposal on Wikidata, and the project chat thread I started there, were aimed at starting the discussion there. This bot proposal came out of that, with the similar thinking that it would start the discussion at Wikipedia_talk:Short_description#Copying_short_descriptions_to_Wikidata. I'll post something at WP:VPR soon, since the short description page wasn't sufficient. On (2), I'd have no objection to that, but I don't know if that would be acceptable to others. Thanks. Mike Peel (talk) 12:45, 4 August 2020 (UTC)
    @Mike Peel, apart from the lack of notification, the discussion labelling isn't very clear.
    WT:Short_description#Copying_short_descriptions_to_Wikidata is headlined about copying to Wikidata. However, this bot proposal is about copying from Wikidata.
    Those two propositions are very different. Many editors will shrug and say that Wikidata can import from en.wp whatever it likes ... but that importing from Wikidata to en.wp, and overwriting local content on en.wp is a much bigger issue. That sort of thing has been highly controversial in the past.
    I hope that at the very least an RFC will make it explicitly clear that this bot proposal is about overwriting en.wp short_descriptions with whatever is on Wikidata. --BrownHairedGirl (talk) • (contribs) 12:57, 4 August 2020 (UTC)

Unless I am misunderstanding the proposal, item 2 is probably not a good idea, and probably fails WP:CONTEXTBOT. Many short descriptions on Wikidata are far too long to comply with en.WP's guidance. See this import and this import, for example, where I had to shorten the Wikidata descriptions. – Jonesey95 (talk) 20:41, 5 August 2020 (UTC)

Usernamekiran BOT 3

Operator: Usernamekiran (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:33, Sunday, July 26, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB, regex

Function overview: Insert banner on the talkpages of pages that come under the scope of wikiproject organised crime.

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/Usernamekiran BOT 2

Edit period(s): 3 to 4 times per week (each run around 10 minutes)

Estimated number of pages affected: more than 5000

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: The issues that occurred during Wikipedia:Bots/Requests for approval/Usernamekiran BOT 2 have now been rectified. While making lists, I have been making them very carefully, so there should be no objections or additions of incorrect articles/subjects. The list-making process can be seen in the edit history of Wikipedia:WikiProject Organized crime/Bot tagging categories, and Wikipedia:WikiProject Organized crime/Bot tagging categories/documentation.

Previously occurred error: special:diff/843217671, similar edit without error: special:diff/969699784.

The bot will not make any changes other than adding this banner, except for basic things like adding a banner shell if the banners exceed 3, or an old PROD notice.

I am aware there are other bots already approved for this task (including Anomie's). But as I would be creating lists to be tagged sporadically, I do not want to keep nagging/bothering other bot-ops. And as of now, I estimate there are somewhere around 5,000 to 8,000 pages that would need to be tagged. —usernamekiran (talk) 23:33, 26 July 2020 (UTC)
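
For reference, a rough Python sketch of the banner-placement logic described above; the live task runs as an AWB custom module, so the regexes here are simplified assumptions, and the banner-shell creation case is left out:

import re

BANNER = '{{WikiProject Organized crime}}'

def add_banner(talk_text):
    """Return the talk page text with the project banner added, if needed."""
    if re.search(r'\{\{WikiProject Organized crime', talk_text, re.IGNORECASE):
        return talk_text  # already tagged, nothing to do
    shell = re.search(r'\{\{WikiProject banner shell\s*\|\s*1=', talk_text, re.IGNORECASE)
    if shell:
        # Place the new banner inside the existing shell's |1= parameter,
        # rather than outside it (the 2018 glitch)
        pos = shell.end()
        return talk_text[:pos] + '\n' + BANNER + talk_text[pos:]
    # Creating a banner shell when more than three banners accumulate is
    # omitted from this sketch
    return BANNER + '\n' + talk_text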

Discussion

  Comment. @Usernamekiran: please can you describe:

  1. the problems which occurred before, and how you have fixed them.
  2. How you are making the lists. --BrownHairedGirl (talk) • (contribs) 14:27, 27 July 2020 (UTC)
  1. @BrownHairedGirl: In the original error edit special:diff/843217671 the {{WikiProject Banner Shell| parameter didn't have |1= in it, so the banner was placed outside the banner shell. The condition was the same for special:diff/969699784. How did I fix it? I fixed it, but honestly speaking, I don't remember now how. I had been trying to fix it for a few weeks after the last BRFA, and even before that I had been constantly improving it. Here is a discussion with Primefac on their talkpage before the BRFA: special:permalink/824793019#AWB_module. Right after the BRFA, at WT:AWB: special:permalink/854668134#adding talkpage banner. There were many changes in the code, and eventually it was fixed. That was around October-November 2018 (I think). Even though I did not use AWB from my alt account after August 2018, I am pretty sure I had found the fix before January 2019.
  2. I am currently working on making lists only from Wikipedia:WikiProject Organized crime/Bot tagging categories/documentation. As I am very familiar with the wikiproject, I can easily identify which topics/articles fall under the scope, and which categories can be misleading. For example, Category:American drug traffickers might seem unproblematic to tag, but a lot of celebrities have been placed in this category simply for possession of some recreational drug (including marijuana). This includes Jay-Z. If someone adds the "organised crime" banner on a celebrity's talkpage, the fanboys will go mad. A long time ago, I added the banner to a well reputed and respected journalist's talkpage. An editor got sort of offended, I think, but they politely asked on my talkpage why I was associating the good man with organised crime. I responded on the talkpage of the article (with DNAU): because of his regular, quality coverage of organised crime, the wikiproject is interested in him. This is also one of the reasons I don't want someone not familiar with the project working on this. To put it in very short and simple words: I add the categories to AWB's make list and choose "category", that is, the non-recursive/base category. No scope for confusion, but because of familiarity and practice, I have got very fast at it.
If there are further doubts, please ask :) —usernamekiran (talk) 15:51, 27 July 2020 (UTC)
Many thanks, usernamekiran, for that detailed and thoughtful reply.
You evidently know the topic area very well, and understand the many pitfalls. You are clearly v conscientious and know what you are doing, so I hope that the bot gets the go-ahead. --BrownHairedGirl (talk) • (contribs) 16:16, 27 July 2020 (UTC)
thank you. Also, this is one more advantage over an uninvolved bot-op. Whenever I make maintenance changes to an article, I tend to read the article. It increases general knowledge through reading, and in this case, just in case the bot errs at some point, I will be able to see it. Although most likely that will not happen, as I have been adding the banner through AWB and no errors have come up yet, even though I came across multiple differences in the source code of talk pages. —usernamekiran (talk) 18:22, 27 July 2020 (UTC)

  update: Apparently, the parameter |1= being missing wasn't the problem. I can't find out what is causing the problem. I have asked Anomie for help. Kindly do not close this BRFA; I am working to find a solution. If you can, please help; it will be appreciated a lot. —usernamekiran (talk) 20:04, 28 July 2020 (UTC)

  update 2: the problem has been fixed by updating the module.
  1. I recreated the scenarios where the problem was encountered, and the problem did not resurface: page 1 with glitch, page 1 without glitch.
  2. first glitch which was encountered two years ago during trial run of the bot: special:diff/843217671. Edit with glitch on July 29: special:diff/970021245. Successful edit on August 3: special:diff/970980182. Successful edit with exact circumstances that of two years ago: special:diff/970981547.
  3. page 3: added banner with one other banner previously present: special:diff/970974657. with two banners already present + banner shell: special:diff/970974779. successful edit with 1= missing: special:diff/970980576.
In short: all the scenarios that previously occurred, and those I could think of, have been tested successfully without any errors. The bot is ready for a trial run. —usernamekiran (talk) 14:37, 3 August 2020 (UTC)

Bots in a trial period

MDanielsBot 5

Operator: Mdaniels5757 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:01, Wednesday, July 22, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: Same as HasteurBot, may fork, but will be open source if I do so.

Function overview: Background: This is one of User:HasteurBot's AfC tasks. HasteurBot was deactivated due to the death of its owner. I now have access to the Toolforge account that it ran on. This will be a direct replacement. Another BRFA, for a takeover of his User:DRN clerk bot, has been filed at Wikipedia:Bots/Requests for approval/MDanielsBot 6, others have been/will be at MDanielsBot 7 and MDanielsBot 8.

The actual task: Nominate for speedy deletion, under CSD G13 (stale/abandoned), Articles for Creation submissions that have not been modified in 6 months. Notify the creator of the AfC submission that their submission is being nominated. Log nominations in a userspace page for auditing after the fact. (This last bit may not have been happening, and I don't think it's needed, so the logging may not happen.)
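
As a rough sketch only (the production code is the inherited HasteurBot codebase on Toolforge; the exact tag and notice wording below are assumptions), the nomination step might look something like:

from datetime import timedelta
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
cat = pywikibot.Category(site, 'Category:G13 eligible AfC submissions')

for draft in cat.articles():
    age = site.server_time() - draft.latest_revision.timestamp
    if age < timedelta(days=180):
        continue  # edited within the last six months, so not eligible
    draft.text = '{{db-g13}}\n' + draft.text
    draft.save(summary='Nominating stale AfC submission for deletion per CSD G13')
    creator = draft.oldest_revision.user
    talk = pywikibot.Page(site, 'User talk:' + creator)
    # notice template name is an assumption; HasteurBot used its own wording
    talk.text += '\n{{subst:Db-afc-notice|%s}} ~~~~' % draft.title()
    talk.save(summary='Notifying creator of G13 nomination')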

Links to relevant discussions (where appropriate): BRFA 1

Edit period(s): A couple times a day

Estimated number of pages affected: As many as there are drafts

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: See above.

Discussion

There are three different tasks being combined here. I understand you're literally taking over the code, but I think filing three separate BRFAs is the best idea (the scopes of the old tasks are related but still wildly different in implementation). This will allow better bug-catching and changes should there be concerns over implementation of any given task. For example, the G13-nominating task was rescinded for a while during a dispute, and having to say "well this part of Task 5 is acceptable but that part isn't" is problematic. Primefac (talk) 22:15, 2 August 2020 (UTC)

I concur with Primefac, Mdaniels5757. Please re-do this request and make two others for the other functions. --TheSandDoctor Talk 00:20, 3 August 2020 (UTC)
OK. I've done HasteurBot's first BRFA as this one, and will make the others shortly. --Mdaniels5757 (talk) 00:44, 3 August 2020 (UTC)

  Approved for trial (30 edits). I know this is a direct copy, but it never hurts to ensure everything's good. Primefac (talk) 23:30, 3 August 2020 (UTC)

@Primefac: OK. Due to the way the program's database is populated, this is actually dependent on MDanielsBot 6. From that trial, the bot managed these two edits. Absent a refactor of the task, further nominations would require approval of MDanielsBot 6. Best, --Mdaniels5757 (talk) 23:40, 5 August 2020 (UTC)

MDanielsBot 8

Operator: Mdaniels5757 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:14, Monday, August 3, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: [1]

Function overview: Background: This is one of User:HasteurBot's AfC tasks. HasteurBot was deactivated due to the death of its owner. I now have access to the Toolforge account that it ran on. This will be a direct replacement for HasteurBot 3.

The actual task: Once a day, traverse Category:AfC submissions with missing AfC template and evaluate its membership, looking for any AfC template that applies the "AfC submissions by date" category. If at least one such template exists, remove the defect tracking category. If the defect tracking category exists on the page but is disabled, remove it.
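
A simplified pywikibot sketch of that daily sweep (not the inherited HasteurBot 3 code; the template regex and the way the tracking category is attached to pages are assumptions):

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
DEFECT_CAT = 'Category:AfC submissions with missing AfC template'

for page in pywikibot.Category(site, DEFECT_CAT).members():
    text = page.text
    # Does an {{AfC submission}} template (which applies the dated category) exist?
    if not re.search(r'\{\{AfC submission\b', text, re.IGNORECASE):
        continue  # genuinely missing the template, so leave the tracking category
    new_text = re.sub(r'\[\[' + re.escape(DEFECT_CAT) + r'[^\]]*\]\]\n?', '', text)
    if new_text != text:
        page.text = new_text
        page.save(summary='AfC template present: removing defect tracking category')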

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/HasteurBot 3

Edit period(s): Daily

Estimated number of pages affected: Variable

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: See above and Wikipedia:Bots/Requests for approval/HasteurBot 3.

Discussion

  Approved for trial (30 edits). I know this is a direct copy, but it never hurts to ensure everything's good. Primefac (talk) 23:27, 3 August 2020 (UTC)

Yapperbot 3

Operator: Naypta (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 14:13, Saturday, June 20, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Golang

Source code available: https://github.com/mashedkeyboard/yapperbot-scantag

Function overview: Scans every article on Wikipedia, and checks for configured patterns. When it finds a pattern, it tags the article with an appropriate maintenance tag.

Links to relevant discussions (where appropriate): Wikipedia:Bot requests#Populate tracking category for CS1|2 cite templates missing "}}" (now archived here)

Edit period(s): Continuous

Estimated number of pages affected: Hard to say; as it's configurable, potentially the entire corpus of Wikipedia articles

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: GreenC was looking for a bot that would populate a tracking category for templates that are unclosed and contained within a <ref> tag. I started out making this, but then realised that, rather than having people make a mishmash of bots that did regex scanning of all articles with different regexes, it would be a better idea to have one bot parsing the pages and doing the regex scanning for all the regexes people wanted to match over pages. So, that's what I've made.

Scantag, the name I've given to this task, is a dynamic system for creating rules for pattern matching articles that need maintenance using regular expressions, and then maintenance tagging them according to the associated matches. At present, it is only configured for the specific request that GreenC has made; however, its configuration (as is becoming a recurring theme with my bots!) is entirely on-wiki, so it can be reconfigured on-the-fly. You can see its configuration file here. This is in a user JSON file, so it is only editable by administrators and myself through the bot account; I think this should be a sufficient threshold to prevent abuse, but was considering getting the content model changed to JS to make it interface protected, instead, due to the potential for danger inherent in the task. Thoughts on this would be appreciated.

Whilst the edit filter also regex-matches changes, it is designed only to be used for preventing serious issues that actively harm the wiki, and there's a limit to the number of rules that it can have - after all, a user is waiting. Scantag, on the other hand, is a deliberately slow process - it runs with a short maxlag, a high number of retries for maxlag, and after every edit it waits a full ten seconds before continuing. This brings with it the advantage that, while it may be a slow process, it can be used for a lot more than the edit filter would ever be. Because it's looking over every single article, it can also be useful for finding and tagging articles that would be impossible to find through a standard Elasticsearch regex search, because it would simply time out. Case in point: the maintenance tagging that we're talking about here - but potentially, the same system could be useful for a number of other applications that involve matching patterns in articles.

The task works as follows:

  1. The bot examines the User:Yapperbot/Scantag.json file, reads the rules, and compiles the regexes.
  2. The bot then iterates through the latest database dump's list of page titles in NS0.
  3. For every title in NS0, the bot retrieves the wikitext.
  4. The bot matches each regex (currently only the one) specified in Scantag.json against the wikitext of the article.

If there is no match, the bot skips to the next article. If the bot matches the regex, however, it performs the following steps:

  1. Check the "noTagIf" regex specified corresponding to the main regex. This rule is designed to check whether the article has already been tagged with the correct maintenance tag.
  2. Prefix the article with the corresponding "prefix" property in the JSON file, if there is one.
  3. Suffix the article with the corresponding "suffix" property in the JSON file, if there is one.
  4. Edit the page, with an edit summary linking to the task page, and listing the "detected" parameter as the reason.
  5. Wait ten seconds before moving on. This is a safety mechanism to prevent a situation where a badly-written regex causes the bot to go completely haywire, editing every single article it comes across.
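
To make that flow concrete, here is an illustrative Python rendering of the steps above (the actual bot is written in Go and linked above; the rule field names follow the description, but the JSON layout and title-list handling are assumptions):

import json
import re
import time
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
rules = json.loads(pywikibot.Page(site, 'User:Yapperbot/Scantag.json').text)

compiled = [(re.compile(r['regex']),
             re.compile(r['noTagIf']) if r.get('noTagIf') else None,
             r.get('prefix', ''), r.get('suffix', ''), r['detected'])
            for r in rules]

with open('enwiki-latest-all-titles-in-ns0') as titles:  # database dump title list
    for title in titles:
        page = pywikibot.Page(site, title.strip())
        text = page.text
        for regex, no_tag_if, prefix, suffix, detected in compiled:
            if not regex.search(text):
                continue  # no match: move on to the next rule/article
            if no_tag_if and no_tag_if.search(text):
                continue  # already carries the maintenance tag
            page.text = prefix + text + suffix
            page.save(summary='Scantag: tagging article; detected ' + detected)
            time.sleep(10)  # safety delay after every edit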

In common with other Yapperbot tasks, the bot respects the kill page at User:Yapperbot/kill/Scantag, so in the event of an emergency, it could be turned off that way.

Because writing the regexes involved requires not only a good knowledge of regex, but for those regexes to be JSON escaped as well to stay in the JSON string correctly, and because of the potential for issues to come up as a result, there is also a sandbox for the rules. Myself or any other administrator configuring a Scantag rule would be able to set one up to test in here. Rules in the sandbox generate a report at User:Yapperbot/Scantag.sandbox, explaining exactly what the bot has understood from the JSON it's been given, and rendering an error if there are any obvious problems (e.g. failure to compile one of the regexes, noTagIf being set to anything other than a regex or false, etc). Each rule also can have a "testpage" parameter, specifying a test page with the prefix "User:Yapperbot/Scantag.sandbox/tests/", which is designed as a place to set up tests to make sure that the corresponding regex is matching when it's supposed to, and not matching when it's not. An example of one of these is here.

I appreciate that this is a fair bit more complicated than the previous bot tasks, so I'm absolutely happy to answer any questions! There are specific instructions for admins on how to deal with Scantag rule requests on the task page. I think there is also an open question here as to whether each rule would require a separate BRFA. Fundamentally, what's going on here isn't all too different from a "retroactive edit filter", of sorts, so I should think either the default restriction for JSON files to only admins editing, or changing the content model to JS so only interface admins can edit, should be sufficient to protect from misuse; however, I'd definitely like to hear BAG members' thoughts on this.

Discussion

  • This bot is currently proposing to check for the "CS1|2 non-closed }} issue". How do you propose that "new changes" be proposed/vetted/implemented? Primefac (talk) 18:31, 30 June 2020 (UTC)
    @Primefac: I'd envisage the process to be similar to that which is used for edit filters, and indeed have modelled the system around many of the same assumptions, but I'm absolutely open to any better suggestions! To give an overview of what I'm thinking of, though:
    Proposing new changes would happen through User talk:Yapperbot/Scantag, where I've set up a requests system very similar to that of WP:EFR. In much the same way, I'd expect that anything that is posted there has a clearly-demonstrated need, and in cases where it is envisaged to affect a large number of pages, a clear consensus so to do. Any editor would be welcome to propose and discuss rules there, just like EFR, and as discussed below, myself or any sysop would then be able to implement them.
    Vetting changes would take place in two stages: community review of the rule requests from any interested users (much like a BRFA or an EFR) if applicable, as well as (hopefully!) other experienced technical editors and myself, and then implementation review - i.e. checking that the regexes that are written are sane and will run correctly. I'll talk a bit more about this below, as it leads into:
    Implementing changes, which would be done by myself through the Yapperbot account or by any other administrator who edits the JSON file containing the rules. Because this process is non-urgent by its very nature, I would expect that even a sysop making a request would go through the same processes as any other request - there's no reason for them to directly skip to editing the JSON file. As I've mentioned in the instructions up at User:Yapperbot/Scantag, it would be expected that all changes would be tested in the sandbox first before actually being implemented; I'm also considering adding a separate "live" parameter to the actual JSON file, which would notate whether or not a rule should be live, or on a dry run. This would allow for more complex regexes to be tested on the entire Wikipedia text, and have the bot save to a page a list of the pages the regex would match, prior to it actually making those changes.
    Hopefully that clears things up a bit, let me know if there's anything that's not clear though! All of this is just "how it's struck me as being best", not "how it is definitively best", so any thoughts are definitely appreciated. As I mentioned in the original BRFA text, I'm particularly interested in thoughts on whether this is actually better to be restricted to interface administrators only rather than all administrators (or perhaps the sandbox should be admins, and the real rules intadmins? or perhaps even the sandbox and "dry run" rules being admins only, and the real rules intadmins?)
    PS. I appreciate that this is a chunky and annoying wall of text; sorry this BRFA is a bit more complex than the others! Naypta ☺ | ✉ talk page | 18:52, 30 June 2020 (UTC)
  • This bot appears to be fetching page texts from the API individually for every page. If it's going to do that for 6 million pages, that's horribly inefficient. Please batch the queries - it's possible for bots to query the texts of up to 500 pages in one single request, which is more efficient for the server. See mw:API:Etiquette. I see you're already handling edit conflicts, which is great (as they would occur more often because of the larger duration between fetching and editing).
Regarding the editing restrictions, I don't think there's a need to restrict it to intadmins. Just put a banner as an editnotice asking admins not to edit unless they know what they're doing. (non-BAG comment) SD0001 (talk) 14:05, 2 July 2020 (UTC)
@SD0001: I had a chat with some of the team in either #wikimedia-cloud or #wikimedia-operations on IRC (one or the other, I don't recall which, I'm afraid) who had indicated that there wouldn't be an issue with doing it this way, so long as maxlag was set appropriately (which is deliberately low here, at 3s). I didn't initially want to do too many page requests in a batch, for fear of ending up with a ton of edit conflicts towards the end of the batch; even with the ability to handle edit conflicts, it's expensive both in terms of client performance and also in terms of server requests to do so. That being said, batching some of the requests could be an idea - if either you or anyone else has a feel for roughly what that batch limit ought to be, I'd appreciate any suggestions, as this is the first time I'm building a bot that parses the entire corpus. Naypta ☺ | ✉ talk page | 14:38, 2 July 2020 (UTC)
@Naypta: Now I actually read the task description. Since the bot is only editing the very top or bottom of the page, it is unlikely to run into many conflicts. Edit conflicts are only raised if the edits touched content in nearby areas; the rest are auto-merged using diff3. I'd be very surprised if you get more than 5-10 editconflicts in a batch of 500. So if you want to reduce the number of server requests (from about 1000 to about 510 per 500 pages), batching is what I'd use. If you choose to do this, you'd want to give the jsub command enough memory to avoid an OOM. SD0001 (talk) 16:09, 2 July 2020 (UTC)
@SD0001: Sure, thanks for the recommendation - I'll plonk them all into batches then. You're right that it's only editing the very top and bottom, but it does need to do a full edit (because of maintenance template ordering) rather than just a prependpage and appendpage, which is unfortunate, so the edit conflict issues might still come about from that. No harm in giving it a go batched and seeing how it goes though!   I'll make sure it gets plenty of memory assigned on grid engine to handle all those pages - a few gigabytes should do it in all cases. Naypta ☺ | ✉ talk page | 16:13, 2 July 2020 (UTC)
  Done - batching implemented. I've also changed the underlying library I use to interact with MediaWiki to make it work with multipart encoding, so it can handle large pages and large queries, like these batches, an awful lot better. Naypta ☺ | ✉ talk page | 22:20, 3 July 2020 (UTC)

  Approved for trial (50 edits). Primefac (talk) 22:22, 2 August 2020 (UTC)

DannyS712 bot III 72

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:26, Sunday, July 26, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Javascript

Source code available: Not written yet

Function overview: Automatically unpatrol pages created by global rollbackers

Links to relevant discussions (where appropriate): Wikipedia talk:New pages patrol/Reviewers#Autopatrol and global rollback

Edit period(s): Continuous

Estimated number of pages affected: Likely no more than a handful per day

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: See Wikipedia talk:New pages patrol/Reviewers#Autopatrol and global rollback. Users with global rollback that create pages here on enwiki are autopatrolled. For those users that lack local autopatrol, automatically un-patrol the pages that they create that are in the page curation system, so that they can be reviewed normally.

Pages to unpatrol will be fetched from the replicas using the query below. This filters for pages that last had their review status changed by global rollbackers where

  • The current status corresponds to autopatrolled pages (some global rollbackers, myself included, are new page reviewers, and so there are many entries where the status was changed by a global rollbacker, but in the context of patrolling others' creations, rather than creating their own)
  • The user lacks both local autopatrol and adminship (yes, bots also have autopatrol, so technically this wouldn't properly filter out global rollbackers that are also local bots and should not be unpatrolled, but there aren't any, and it's unlikely there will be in the future)
  • The page is in the main namespace (I don't think the concerns raised in the linked discussion suggest that we should be concerned about the user (sub-)pages created by global rollbackers, and user pages are also a part of the page curation system)
/* Value for pagetriage_page.ptrp_reviewed for autopatrolled pages */
SET @autopatrol_status = 3;

/* Value for global_user_groups.gug_group for global rollbackers */
SET @global_group = 'global-rollbacker';

SELECT
    pg.page_id AS 'Target page id',
    gu.gu_name AS 'Creator (name)',
    a.actor_user AS 'Creator (user id)',
    ptrp.*,
    pg.*
FROM centralauth_p.global_user_groups gug
JOIN centralauth_p.globaluser gu
    ON gu.gu_id = gug.gug_user
JOIN actor a
    ON a.actor_name = gu.gu_name
JOIN pagetriage_page ptrp
    ON ptrp.ptrp_last_reviewed_by = a.actor_user
JOIN page pg
    ON pg.page_id = ptrp.ptrp_page_id
WHERE gug.gug_group = @global_group

/* Global rollbackers can be new page reviewers; only care about pages that they autopatrolled */
AND ptrp.ptrp_reviewed = @autopatrol_status

/* The focus is on articles; global rollbackers can be trusted not to abuse user pages */
AND pg.page_namespace = 0

/* Global rollbackers can also be locally autopatrolled. Exclude users in the relevant local groups */
AND NOT EXISTS (
    SELECT 1
    FROM user_groups ug
    WHERE ug.ug_user = a.actor_user
    AND ug.ug_group IN ('autoreviewer', 'sysop')
)

Testing the query correctly flagged some recent pages I and other global rollbackers created.
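
For the un-patrolling itself, a hedged sketch of the PageTriage API call (the operator's bot will be in JavaScript and is not yet written; the parameter values shown here are assumptions based on the ApiPageTriageAction code linked in the discussion below):

import pywikibot

site = pywikibot.Site('en', 'wikipedia')

def unpatrol(page_id):
    """Mark a page as unreviewed in the page curation (PageTriage) system."""
    request = site.simple_request(
        action='pagetriageaction',
        pageid=page_id,
        reviewed=0,              # 0 = unreviewed; value is an assumption
        skipnotif=True,          # parameter name is an assumption
        token=site.tokens['csrf'],
    )
    return request.submit()

# page IDs would come from the replica query above
for page_id in (12345678,):      # hypothetical page ID
    unpatrol(page_id)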

Discussion

@DannyS712: to unpatrol a page we need phab:T22399.--GZWDer (talk) 10:52, 30 July 2020 (UTC)
@GZWDer: Not with page curation - that is for rc patrolling. See https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/PageTriage/+/refs/heads/master/includes/Api/ApiPageTriageAction.php DannyS712 (talk) 10:53, 30 July 2020 (UTC)
  Approved for trial (25 edits). And yes, I realize these aren't "edits", but please post the log of un-patrols here when complete. Primefac (talk) 22:06, 2 August 2020 (UTC)
Will do. Unfortunately, the replicas are currently lagging, so it might be a bit before I can do this DannyS712 (talk) 22:08, 2 August 2020 (UTC)

Roccerbot

Operator: Philroc (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:32, Wednesday, April 15, 2020 (UTC)

Function overview: Remove {{ShadowsCommons}} from local files transcluding it whose corresponding Commons files no longer exist

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: Here

Links to relevant discussions (where appropriate): Wikipedia:Bot requests#Follow up task for files tagged Shadows Commons by GreenC bot job 10

Edit period(s): Daily

Estimated number of pages affected: <20 per day

Namespace(s): Files

Exclusion compliant (Yes/No): Yes (handled automatically by Pywikibot)

Function details: The bot will scan through the files in Category:Wikipedia files that shadow a file on Wikimedia Commons (which pages using ShadowsCommons are automatically added to) and determine if a page with the same title as each file exists on Commons; if one doesn't, a regex will be used to detect and remove ShadowsCommons from the file's wikitext.
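
A minimal Python/Pywikibot sketch of that check (the actual source is linked above; the regex here is a simplified assumption):

import re
import pywikibot

enwiki = pywikibot.Site('en', 'wikipedia')
commons = pywikibot.Site('commons', 'commons')
shadow_cat = pywikibot.Category(
    enwiki, 'Category:Wikipedia files that shadow a file on Wikimedia Commons')

for local_file in shadow_cat.members(namespaces=6):  # File: namespace
    if pywikibot.FilePage(commons, local_file.title()).exists():
        continue  # still shadows a Commons file, so leave the tag in place
    new_text = re.sub(r'\{\{\s*ShadowsCommons[^{}]*\}\}\n?', '', local_file.text)
    if new_text != local_file.text:
        local_file.text = new_text
        local_file.save(summary='No Commons file of this name: removing {{ShadowsCommons}}')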

Discussion

  •   Approved for trial (50 edits or 7 days). Whichever comes first. Primefac (talk) 21:47, 19 April 2020 (UTC)
    This will probably take some time. Category:Wikipedia files that shadow a file on Wikimedia Commons is a category that is updated weekly and only a few instances of a mistag are present at any time. I can carry out my weekly cleanup in a manner that leaves "testing material", though. Jo-Jo Eumerus (talk) 08:26, 20 April 2020 (UTC)
    That was kind of the point of the trial lengths. If it's common enough to merit a bot run, it'll hit 50 before the end of a week (after all the task says up to 20 per day). If it's going to take a month to hit 50 edits, it makes me wonder if a bot is necessary. Primefac (talk) 00:39, 15 May 2020 (UTC)
    @Philroc: {{OperatorAssistanceNeeded}} Please see the above. --TheSandDoctor Talk 06:56, 28 May 2020 (UTC)
    Hi Primefac – school-related activities have taken up most of my free time over the past few weeks, so I apologize for not being able to respond to your message until now. I could try running the bot weekly or monthly instead of daily, though I'm glad to withdraw this request if you feel that the task doesn't necessitate a bot run. Philroc (c) 19:38, 28 May 2020 (UTC)

Bots that have completed the trial period

MDanielsBot 7

Operator: Mdaniels5757 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:09, Monday, August 3, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: [2]

Function overview: Background: This is one of User:HasteurBot's AfC tasks. HasteurBot was deactivated due to the death of its owner. I now have access to the Toolforge account that it ran on. This will be a direct replacement for HasteurBot 2, as modified by HasteurBot 9

The actual task: Traverse Category:AfC_submissions_by_date and its dated subcategories (starting with October 2008) to

  1. Identify AfC submissions (using the page prefix Wikipedia talk:Articles for creation/) that have not been edited in more than 180 days
  2. Perform a null edit on the page so that templates and categories may be re-evaluated and populate the Category:G13 eligible AfC submissions category if appropriate
  3. Give notice to the page author that the page is eligible for deletion and could be deleted in the near future
  4. Notify opted-in users who've touched an affected page as well. (This is from the speedily approved HasteurBot 9)
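
A rough outline of steps 1–3 above as a pywikibot sketch (a simplified re-implementation, not the inherited HasteurBot 2/9 code; the notice template name is hypothetical, and the opt-in notifications of step 4 are omitted):

from datetime import timedelta
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
parent = pywikibot.Category(site, 'Category:AfC submissions by date')

for dated_cat in parent.subcategories():
    for draft in dated_cat.articles():
        if not draft.title().startswith('Wikipedia talk:Articles for creation/'):
            continue
        age = site.server_time() - draft.latest_revision.timestamp
        if age < timedelta(days=180):
            continue
        draft.touch()  # null edit so templates and categories are re-evaluated
        creator = draft.oldest_revision.user
        talk = pywikibot.Page(site, 'User talk:' + creator)
        # hypothetical notice template; the real wording came from HasteurBot
        talk.text += '\n{{subst:AfC draft G13 warning|%s}} ~~~~' % draft.title()
        talk.save(summary='Notice: draft may soon be eligible for G13 deletion')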

Links to relevant discussions (where appropriate): HasteurBot 2, HasteurBot 9

Edit period(s): Daily

Estimated number of pages affected: As many as there are drafts

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: See above links.

Discussion

  Approved for trial (30 edits). I know this is a direct copy, but it never hurts to ensure everything's good. Primefac (talk) 23:27, 3 August 2020 (UTC)

@Primefac:   Trial complete. Edits at [3]. The only issue was that I failed to catch some mentions of HasteurBot in the output (Example), now fixed (Example). Best, --Mdaniels5757 (talk) 15:57, 4 August 2020 (UTC)

BHGbot 7

Operator: BrownHairedGirl (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:10, Tuesday, July 28, 2020 (UTC)

Function overview: Mass create {{Category redirect}}s to resolve the WP:ENGVAR variations in category names using the word "organisation(s)" or "organization(s)".
e.g. if we have a Category:Anti-Foobar organizations, then the page Category:Anti-Foobar organisations would be created with the content {{Category redirect|Anti-Foobar organizations|bot=BHGbot}}

Automatic, Supervised, or Manual: Automatic

Programming language(s): Bash and AutoWikiBrowser

Source code available: Yes. There are two components:

  1. Wikipedia:Bots/Requests for approval/BHGbot 7/Make-BHGbot7-edit-list.sh
  2. Wikipedia:Bots/Requests for approval/BHGbot 7/BHGbot-7-AWB-module

Links to relevant discussions (where appropriate): WT:WikiProject Categories#Organi[SZ]ations_category_redirects (permalink; discussion is ongoing). This discussion was notified to WP:VPP[4] and WP:VPR[5].
Previous related discussion: WP:Bots/Requests for approval/BHGbot 3 (a similar proposal in 2017, which ran into the sands due to lack of prior consensus. My bad)

Edit period(s): Initial run to handle the backlog. Then a followup every few months.

Estimated number of pages affected: ~12,500 in the initial run.

Namespace(s): Category

Exclusion compliant (Yes/No): Yes

Function details: This task supports MOS:COMMONALITY by resolving the s/z WP:ENGVAR variation in the spelling of "organisation"/"organization", by creating a soft {{category redirect}} to the title which is in use. This corresponds with the MOS:COMMONALITY guideline to create such redirects in article space.

The word "organisation"/"organization" is one of the most common ENGVAR variants in category titles, and the current lack of redirects is a long-standing nuisance for both readers and editors.
The bot works in three stages:
  1. A set of quarry queries to generate lists of pages
  2. A bash script to process these lists and generate a list of category redirect titles to be created
  3. An AWB run to create the category redirect pages
1. Get lists
The first part of the bot is three quarry queries.
2. Process the lists
The bash script Make-BHGbot7-edit-list.sh:
  • inverts the S/Z spelling in the list of organisation categories
  • removes from that list titles which are in the list of all pages in the category namespace
  • removes from that list titles which are in the list of all pages in the main (article) namespace
  • wikilinks the resulting edit list
3. Create the redirects
Using the edit list created in step 2, AWB
  • skips any existing pages (there should be none, but some may have been created since the list was made)
  • applies the AWB custom module BHGbot-7-AWB-module to create the redirect with an explanatory edit summary as in this test edit[6]
    • If the page title to be created is "Foo organisations" (with an S), a {{category redirect}} is created to "Foo organizations" (with a Z). And vice versa.
    • Per a request by User:Hellknowz at the 2017 BRFA, the redirect template includes the parameter |bot=BHGbot
The module includes sanity checks to:
  • skip any pages whose title does not match the regex /^(.*?\b[oO]rgani)[sz](ations?\b.*)$/
  • skip any case where it is about to create a self-redirect
I have done a dry run (AWB in pre-parse mode) on a deliberately-polluted list of test pages, and it correctly skipped them all. I did another test of the full list of ~12,500 pages, where no pages were skipped, which indicates the accuracy of the list-making.
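
For comparison, the same inversion and page-creation logic expressed as a Python sketch (the live task uses the bash script and AWB module linked above; only the regex is taken verbatim from the module, and the edit summary is an assumption):

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
TITLE_RE = re.compile(r'^(.*?\b[oO]rgani)[sz](ations?\b.*)$')

def invert_sz(title):
    """Swap the s/z spelling of organisation(s) in a category title."""
    m = TITLE_RE.match(title)
    if not m:
        return None  # sanity check: title doesn't match the expected pattern
    swapped = 'z' if title[m.end(1)] == 's' else 's'
    return m.group(1) + swapped + m.group(2)

def create_redirect(existing_title):
    redirect_title = invert_sz(existing_title)
    if redirect_title is None or redirect_title == existing_title:
        return  # regex didn't match, or it would be a self-redirect: skip
    page = pywikibot.Page(site, redirect_title)
    if page.exists():
        return  # page created since the list was made: skip
    target = existing_title.replace('Category:', '', 1)
    page.text = '{{Category redirect|%s|bot=BHGbot}}' % target
    page.save(summary='Creating {{Category redirect}} to the ENGVAR spelling in use')

create_redirect('Category:Anti-Foobar organizations')
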
Differences from BHGbot 3
This proposal tackles the same problem as the 2017 proposal BHGbot 3, but it uses a different approach. The 2017 proposal drew its list from recursing the category tree. This proposal uses quarry to collect lists of category titles. Using quarry gives a complete list, whereas category recursion is usually woefully incomplete. The quarry-generated lists allow rigorous checks against error.

Discussion

  Approved for trial (50 edits). Primefac (talk) 22:03, 2 August 2020 (UTC)

  Trial complete. Thanks, @Primefac.

I used the linux shuf command to randomly select 50 pages from a list of 12,461 categories which I had built last week while testing the list-making:
50 randomly-selected redirects to create in trial run
  1. Category:Religious organisations established in 1928
  2. Category:Organisations established in 1718
  3. Category:Films about organisations
  4. Category:Wikipedia categories named after organizations based in Iran
  5. Category:Environmental organisations based in Europe
  6. Category:Organisations based in San Diego
  7. Category:Organisations based in Mayotte by subject
  8. Category:Transport organisations based in Lithuania
  9. Category:Organisations based in Oceania by country and subject
  10. Category:Student organisations established in 1917
  11. Category:Scientific organisations established in 1857
  12. Category:Organisations based in American Samoa by subject
  13. Category:British Cadet organizations
  14. Category:State history organisations of the United States
  15. Category:Missing people organisations
  16. Category:Arts organisations established in 1887
  17. Category:Environmental organizations based in the Bahamas
  18. Category:Horticultural organizations based in India
  19. Category:Organisations disestablished in 1950
  20. Category:Transport organizations based in Gibraltar
  21. Category:Defunct organizations based in Zambia
  22. Category:Humanitarian aid organisations of World War I
  23. Category:Defunct organizations based in the Cook Islands
  24. Category:Wikipedia categories named after organisations based in Romania
  25. Category:Religious organizations based in Chile
  26. Category:Cultural organisations based in Moldova
  27. Category:Cultural organizations based in Portugal
  28. Category:Ethnic organisations based in the Czech Republic
  29. Category:Religious organisations based in the Marshall Islands
  30. Category:Animal welfare organizations based in Peru
  31. Category:Women's organizations based in Pakistan
  32. Category:Islamic organizations based in Mali
  33. Category:Arts organisations established in 1988
  34. Category:Housing rights organisations
  35. Category:Sports organizations of South Ossetia
  36. Category:Religious organisations disestablished in 2010
  37. Category:National Taiwan University organisations
  38. Category:Sports organisations disestablished in 1954
  39. Category:Paramilitary organisations based in South America by country
  40. Category:Business and industry organisations based in Chicago
  41. Category:Music organisations based in the State of Palestine
  42. Category:Organizations based in Bhopal
  43. Category:Private and independent school organisations in the United States
  44. Category:Film organizations in Belgium
  45. Category:Organisations based in Orange County, California
  46. Category:Members of the Parliamentary Assembly of the Collective Security Treaty Organisation
  47. Category:Religious organizations based in Gibraltar
  48. Category:Business organisations based in Turkmenistan
  49. Category:Research organisations by country
  50. Category:Migration-related organisations based in the United States
Here are the 50 trial edits.
No pages were skipped, and I have reviewed each of the 50 edits. The redirects are all as intended. --BrownHairedGirl (talk) • (contribs) 10:17, 4 August 2020 (UTC)

MDanielsBot 6

Operator: Mdaniels5757 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:04, Wednesday, July 22, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: Same as DRNClerkBot, may fork, but will be open source if I do so.

Function overview: Perform User:DRN clerk bot's task. DRN clerk bot was deactivated due to the death of its owner. This will be a direct replacement. Another BRFA, for a takeover of his User:HasteurBot, has been filed at Wikipedia:Bots/Requests for approval/MDanielsBot 5. Since it's down now, speedy approval or trial would be nice.

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/DRN clerk bot

Edit period(s): Twice per hour

Estimated number of pages affected: 1

Exclusion compliant (Yes/No): No. (Only edits one page, which should render exclusion compliance unneeded.)

Already has a bot flag (Yes/No): Yes

Function details: See above.

Discussion

  Approved for trial (10 edits). Just to make sure everything's working as intended. Primefac (talk) 22:11, 2 August 2020 (UTC)

ProcBot 2

Operator: ProcrastinatingReader (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:23, Saturday, July 18, 2020 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Ruby

Source code available:

Function overview: Change text dates to use date templates. Only in infoboxes for Template:Infobox television (and associated templates, e.g. Template:Infobox television season), change manual dates (e.g. 23 June 2019) to ones using {{start date}} (in this case, {{start date|2019|6|23}})

Links to relevant discussions (where appropriate): Wikipedia:Bot requests#Automatically format TV run dates

Edit period(s): Continuous

Estimated number of pages affected: ~20k initial run

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: The template already states: Use {{Start date}} (e.g. {{Start date|2019|6|23}}) so that the date is included in the template's hCalendar microformat, and is properly formatted according to a reader's Wikipedia date and time preferences. Add |df=y if the article uses the dmy date format.

Quite a few articles don't do this. This will convert plaintext dates (e.g. "23 June 2019") to use the relevant template. DMY/MDY dates are handled in the following order: if {{use dmy dates}} or {{use mdy dates}} is present on the page, it'll add a df=y param as appropriate. If neither is present, it'll default to the format currently used for the date. If it cannot parse the date, it'll skip (including values like 'present'). This will only handle the first_aired and last_aired params.
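
A hedged Python sketch of the date handling only (the actual bot is written in Ruby; the accepted input formats and the df logic below are simplifying assumptions):

import re
from datetime import datetime

def convert(value, page_text):
    """Return a {{start date}} call for a plain-text date, or None to skip."""
    value = value.strip()
    for fmt, is_dmy in (('%d %B %Y', True), ('%B %d, %Y', False)):
        try:
            date = datetime.strptime(value, fmt)
            break
        except ValueError:
            continue
    else:
        return None  # not a single full date (e.g. 'present'): skip
    lowered = page_text.lower()
    if '{{use dmy dates' in lowered:
        use_df = True
    elif '{{use mdy dates' in lowered:
        use_df = False
    else:
        use_df = is_dmy  # default to the format the date is already in
    df = '|df=y' if use_df else ''
    return '{{start date|%d|%d|%d%s}}' % (date.year, date.month, date.day, df)

print(convert('23 June 2019', '{{Use dmy dates|date=June 2019}}'))
# -> {{start date|2019|6|23|df=y}}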

Discussion

The bot script was changed from making a search query to fetching all articles transcluding the template and doing a regex on their content. Following discussion at WP:BOTREQ, a tracking category, Category:Pages using infobox television with nonstandard dates, was added to track this. Bot has been adjusted to use this category instead when fetching results. ProcrastinatingReader (talk) 13:04, 21 July 2020 (UTC)

{{BAGAssistanceNeeded}} ProcrastinatingReader (talk) 12:42, 25 July 2020 (UTC)

  Idea is not well explained. The scope of this task is unclear. Will it:

  1. Operate on all plain text dates, even in body text?
  2. Operate only on plain text dates in templates?
  3. Operate only on plain text dates in selected templates? (If so, which templates?)
@ProcrastinatingReader, please clarify. --BrownHairedGirl (talk) • (contribs) 15:23, 27 July 2020 (UTC)
@BrownHairedGirl: sorry, it's explained more fully at the BOTREQ link. It will only apply in television infoboxes. So not including body text. For point 3, the specific templates would be the two in my comments above. ProcrastinatingReader (talk) 15:29, 27 July 2020 (UTC)
Thanks, @ProcrastinatingReader. Please can you amend the Function overview above, to make that scope clear? --BrownHairedGirl (talk) • (contribs) 15:32, 27 July 2020 (UTC)
Done :) ProcrastinatingReader (talk) 15:34, 27 July 2020 (UTC)
Thanks. That's clearer. --BrownHairedGirl (talk) • (contribs) 18:03, 27 July 2020 (UTC)
  Approved for trial (50 edits). I know I was involved in the discussion of this bot, and will recuse from final accept/decline duties, but this is clear-cut enough that at the very least we can send to trial (now that BHG's concerns have been clarified). Primefac (talk) 22:17, 2 August 2020 (UTC)
Primefac, looks like I'm getting a captcha error? ProcrastinatingReader (talk) 23:31, 2 August 2020 (UTC)
@ProcrastinatingReader: It's because you aren't autoconfirmed or flagged yet. To get autoconfirmed, you can log in as the bot, then add and remove a period from its user page 5x (10 edits total). Alternatively, request the confirmed permission (from any admin or at WP:PERM/C). Cheers, --Mdaniels5757 (talk) 01:42, 3 August 2020 (UTC)
Heh. --Mdaniels5757 (talk) 01:44, 3 August 2020 (UTC)
Just figured it out at the same moment you responded hehe. Requested at PERM, since 'gaming' the edits in userspace seemed slightly questionable. ProcrastinatingReader (talk) 13:40, 3 August 2020 (UTC)

  Trial complete. 50 edits. Got a good variety of dates in. A couple of small bumps early on:

  • The bot tried to parse "MMMM YYYY", e.g. "September 1980". Fixed by adding an extra check which will reject any month-only, year-only, or month/year-only combination (or, more accurately, will reject anything not in a full DMY or MDY date format) [7] (3 errors while I was testing the fix for this, [8][9][10])
  • The bot tried to interpret "March 26, 2009 (Craft awards)<br />March 28, 2009" [11][12]. Apparently Ruby's Date thinks it can make sense of that. Fixed by using regex to skip anything more than a single date.
  • When one parameter was to be edited but the other was empty, the bot removed the empty one. Now it won't touch it.[13]. Fix: [14]

The affected pages were re-run with the updated bot code, and it correctly skipped them. Then about 35 more were processed, with a good variety of date formats and cases, all dealt with correctly. ProcrastinatingReader (talk) 13:40, 3 August 2020 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.