
Wikipedia:Bots/Requests for approval



If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming, it may be a good idea to ask someone else to run a bot for you, rather than running your own.



Current requests for approval

PkbwcgsBot 7

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:27, Saturday, December 15, 2018 (UTC)

Function overview: This is an extension to Wikipedia:Bots/Requests for approval/PkbwcgsBot 5 and I will clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links.

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate): This RfC

Edit period(s): ISBNs will be once a fortnight and PMIDs will be once a month.

Estimated number of pages affected: 300-500 pages per run (ISBNs) and 50-100 pages per run (PMIDs)

Namespace(s): Most namespaces (Mainspace, Article Talkspace, Filespace, Draftspace, Wikipedia namespace (most pages), Userspace and Portalspace)

Exclusion compliant (Yes/No): Yes

Function details: The bot will replace ISBN magic links with templates. For example, ISBN 978-94-6167-229-2 will be replaced with {{ISBN|978-94-6167-229-2}}. Task 5 fixes incorrect ISBN syntax and then replaces the magic link with the template; this task only replaces ISBN magic links with the template, using regex.
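
A minimal Python sketch of the replacement (illustrative only; the bot itself uses AWB find-and-replace rules, and the simplified regex below is an assumption - see the fuller expressions discussed under task 5 below):

import re

# Simplified illustration; a production rule must also avoid ISBNs inside
# |title= parameters of citation templates and similar contexts.
ISBN_MAGIC = re.compile(r"\bISBN[ \t]+((?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx])\b")
PMID_MAGIC = re.compile(r"\bPMID[ \t]+([0-9]+)\b")

def replace_magic_links(wikitext: str) -> str:
    """Convert ISBN/PMID magic links to their template equivalents."""
    wikitext = ISBN_MAGIC.sub(r"{{ISBN|\1}}", wikitext)
    wikitext = PMID_MAGIC.sub(r"{{PMID|\1}}", wikitext)
    return wikitext

print(replace_magic_links("ISBN 978-94-6167-229-2"))  # {{ISBN|978-94-6167-229-2}}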

Discussion

Working in article space only? – Jonesey95 (talk) 23:48, 15 December 2018 (UTC)

PkbwcgsBot 5

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:15, Thursday, December 13, 2018 (UTC)

Function overview: The bot will fix ISBN syntax per WP:WCW error 69 (ISBN with incorrect syntax) and PMID syntax per WP:WCW error 102 (PMID with incorrect syntax).

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Once a week

Estimated number of pages affected: 150 to 300 a week

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot is going to fix incorrect ISBN syntax per WP:ISBN. So, if the syntax is ISBN: 819345670X, it will take off the colon and make it ISBN 819345670X. The other case of incorrect ISBN syntax this bot is going to fix is when the ISBN number is preceded by "ISBN-10" or "ISBN-13". For example, in ISBN-10: 995341775X, it will take off "-10:" and that will make it ISBN 995341775X. The bot will only fix those two cases of ISBN syntax; any other cases of incorrect ISBN syntax will not be fixed by the bot. The bot will also fix incorrect PMID syntax. So, for example, if it is PMID: 27401752, it will take off the colon and convert it to {{PMID|27401752}} per WP:PMID. It will not convert it to the magic-link form PMID 27401752, because that format is deprecated.
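
A rough Python sketch of the two ISBN cases and the PMID case described above (illustrative only; the bot itself uses AWB find-and-replace rules, and the fuller regexes actually adopted are worked out in the discussion below):

import re

def fix_isbn_pmid_syntax(wikitext: str) -> str:
    # "ISBN: 819345670X" -> "ISBN 819345670X" (drop the colon)
    # "ISBN-10: 995341775X" / "ISBN-13: 978..." -> "ISBN ..." (drop the prefix)
    wikitext = re.sub(r"\bISBN(?:-1[03])?:\s*(?=[0-9])", "ISBN ", wikitext)
    # "PMID: 27401752" -> "{{PMID|27401752}}" (template rather than a deprecated magic link)
    wikitext = re.sub(r"\bPMID:\s*([0-9]+)\b", r"{{PMID|\1}}", wikitext)
    return wikitext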

Discussion

Please make sure to avoid ISBNs within |title= parameters of citation templates. Also, is there a reason that you are not proposing to use the {{ISBN}} template? Magic links have been deprecated and are supposed to go away at some point, although the WMF seems to be dragging their feet for some reason. There is another bot that converts magic links to templates, but if you can do it in one step, that would probably be good. – Jonesey95 (talk) 12:05, 13 December 2018 (UTC)

@Jonesey95: The bot will convert to the {{ISBN}} template and it will not touch ISBNs in the title parameters of citations. Pkbwcgs (talk) 15:19, 13 December 2018 (UTC)
What about the PMIDs? Creating more deprecated magic words isn't ideal. — xaosflux Talk 19:16, 14 December 2018 (UTC)
@Xaosflux: I did say in my description that they will be converted to templates. However, I now need to code that in RegEx, and I have been trying to, but my RegEx skills are unfortunately not very good. Pkbwcgs (talk) 19:52, 14 December 2018 (UTC)
I have tried coding it in RegEx but gave up soon after, as it is too difficult. Pkbwcgs (talk) 21:14, 14 December 2018 (UTC)
@Pkbwcgs: After removing the colon you can use Anomie's regex from Wikipedia:Bots/Requests for approval/PrimeBOT 13: \bISBN(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++((?:97[89](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?)?(?:[0-9](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?){9}[0-9Xx])\b and \b(?:RFC|PMID)(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++([0-9]+)\b, or you can adjust them to account for the colon. Primefac could advise if he made any changes to them. — JJMC89(T·C) 06:27, 15 December 2018 (UTC)
@JJMC89: Thanks for the RegEx. I will be able to remove the colon easily. It is the RegEx for the ISBN that I struggled with. Thanks for providing it. Pkbwcgs (talk) 09:49, 15 December 2018 (UTC)
It is saying "nested identifier" and it is not replacing when I tested the RegEx on my own AWB account without making any edits. Pkbwcgs (talk) 09:53, 15 December 2018 (UTC)
@Pkbwcgs: The regex comes from PHP, but AWB (C#) doesn't support possessive quantifiers (e.g. ++). Replacing ++ with + in the regex should work. — JJMC89(T·C) 18:57, 15 December 2018 (UTC)
@JJMC89: I have tested the find RegEx on my AWB account without making any edits and it works. I also worked out the replace RegEx and it is {{ISBN|$1}}. That works too. I think this is ready for a trial. I will also request a small extension for this task which is to clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links. That will be PkbwcgsBot 7. Pkbwcgs (talk) 20:15, 15 December 2018 (UTC)
I adjusted the RegEx to accommodate ISBNs with a colon. Pkbwcgs (talk) 20:33, 15 December 2018 (UTC)
This diff from my account is good and perfectly justifies what this bot is going to do for this task. Is this good enough? Pkbwcgs (talk) 20:53, 15 December 2018 (UTC)
This is what it will look like if the bot handles an ISBN with the "ISBN-10" prefix. That diff is also from my account. Pkbwcgs (talk) 21:08, 15 December 2018 (UTC)

MusikBot 15

Operator: MusikAnimal (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:16, Sunday, November 25, 2018 (UTC)

Function overview: Add/remove {{Current TFA}} to Today's Featured Article, so that it can be referenced in edit filters.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Ruby

Source code available: GitHub

Links to relevant discussions (where appropriate): Special:PermaLink/870355366#RfC: should we automatically pending-changes protect Today's Featured Articles?

Edit period(s): Once daily at 00:00 UTC

Estimated number of pages affected: Two a day (previous TFA, current TFA)

Namespace(s): Mainspace

Exclusion compliant (Yes/No): No - not applicable.

Function details: At 00:00 UTC, the bot will place {{Current TFA}} at the bottom of the new TFA, and remove the template from the previous TFA. TFAs are identified by looking at [[Template:TFA title/date]], e.g. Template:TFA title/November 27, 2018.

The template itself will have no visual effect. It is used solely so that we can create filters that target the current TFA. A second filter (or perhaps the same one) will ensure only an admin or MusikBot can add/remove the template. This functionality is similar to Special:AbuseFilter/803.

Note an attempt was made to have TFA Protector Bot do this at User talk:Legoktm/November 2018#TFA template but I received no reply.
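
The bot itself is written in Ruby; as a rough illustration of the intended logic, a pywikibot-style sketch (helper names and edit summaries here are assumptions):

import pywikibot
from datetime import datetime, timedelta, timezone

site = pywikibot.Site('en', 'wikipedia')
MARKER = '{{Current TFA}}'

def tfa_title(day):
    # Read the article title from [[Template:TFA title/<date>]],
    # e.g. Template:TFA title/November 27, 2018.
    date_str = f"{day.strftime('%B')} {day.day}, {day.year}"
    return pywikibot.Page(site, f'Template:TFA title/{date_str}').text.strip()

def rotate():
    today = datetime.now(timezone.utc).date()
    old = pywikibot.Page(site, tfa_title(today - timedelta(days=1)))
    new = pywikibot.Page(site, tfa_title(today))
    if MARKER in old.text:
        old.text = old.text.replace(MARKER, '').rstrip() + '\n'
        old.save(summary='Removing {{Current TFA}}; no longer the featured article')
    if MARKER not in new.text:
        new.text = new.text.rstrip() + '\n\n' + MARKER + '\n'
        new.save(summary="Tagging today's featured article with {{Current TFA}}")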

Discussion

  • The bot relies on the Toolforge grid job queue. I have a similar bot task, User:MusikBot/RotateTDYK that also runs at midnight. It almost always runs no later than 00:02, which I think is acceptable. Look for "Rotating nomination headings..." in the contributions. If we think it's super duper important for this to happen at exactly midnight, we may have to come up with a different strategy. Thoughts? MusikAnimal talk 22:16, 25 November 2018 (UTC)
    I guess a cheap way around this is to schedule a cronjob for 23:58, 23:59, and 00:00. Each run we check the current time before making the edits; If it's on or after 00:00 proceed, otherwise abort. MusikAnimal talk 22:29, 25 November 2018 (UTC)
    You probably know this already, but Template:TFA title always lists the article name of the current day's TFA. This page is populated from the pages named Template:TFA title/November 1, 2018, etc. - Dank (push to talk) 22:45, 25 November 2018 (UTC)
    I did not know that, thank you :) I will instead go off of [[Template:TFA title/date]], since I need the previous TFA, too. MusikAnimal talk 23:01, 25 November 2018 (UTC)
  • Unlike the userspace one, we should expect lots of edit/edit attempts to TFA. Couple of concerns to look at: (1) will this placement be violating some MOS/mass cleanup list wherein bots and scripts will try to remove it or move it somewhere else on the page? (assuming the bot shouldn't care if it is on the wrong part of the page). (2) Can you build out the filter first for review? — xaosflux Talk 04:02, 26 November 2018 (UTC)
    @Xaosflux: I am not aware of any bots or MOS guidelines that would conflict with this. If using a template is at all a problem, maybe we could just use a comment? I can't imagine that would cause any problems. I just figured {{Current TFA}} looks more "official" than <!-- Current TFA -->, but it's true the template otherwise doesn't make sense as it isn't used in more than one place.
    The filter that restricts addition/removal of the template (or comment) except by sysops/MusikBot will closely resemble Special:AbuseFilter/803. That will probably be a standalone filter. Per the RfC, we'd have another filter that prevents addition of imagery, which will be similar to Special:AbuseFilter/history/943/item/20158. I don't think this BRFA is much about this filter, specifically, rather making it possible for filters to target TFA. With this, the door is open to prevent other common vandalism, such as blanking the article (which wouldn't ever make sense on TFA, at least by a new user). We might also see other LTA-specific filters, at least in the short-term. MusikAnimal talk 04:36, 26 November 2018 (UTC)
  • Another possible concern: It's possible that Toolforge downtime or the like could mean the bot task fails to start. This should be very, very rare. The User:MusikBot/RotateTDYK task also runs at 00:00 UTC every day, and since October 2015 when that task first came about, I think it failed to start twice. For this reason, maybe using a template is better than a comment, as we'd be able to see the transclusions, and immediately tell if it's being used somewhere it shouldn't be. The bot could even check transclusions and remove all of them before adding the template to the new TFA. How does that sound? MusikAnimal talk 04:53, 26 November 2018 (UTC)
  • Taking this a step further... what about an additional template, say {{Main Page article}}, that is placed on all articles linked to on the Main Page? Similar to TFA, they all are subject to more disruption. We don't necessarily need to do anything contentious, at least without discussion, but again the idea here is that it will become possible for filters to target these pages, should we ever need to. I'm mostly thinking about LTA-related abuse. MusikAnimal talk 04:53, 26 November 2018 (UTC)
    Actually, I think only WP:DYK's and the other "Today's featured..." are predictable. WP:ITN for instance could change at any time, so probably not best to let the bot handle that, at least for this task. Let's just save the {{Main Page article}} idea for a separate discussion/BRFA. MusikAnimal talk 05:00, 26 November 2018 (UTC)

RonBot 14

Operator: Ronhjones (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:29, Wednesday, November 14, 2018 (UTC)

Function overview: Upscales the nominal size of a non-free SVG (without exceeding the NFC guideline) following a manual request.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: Based on RonBot4 - User:RonBot/4/Source1 (used to change the nominal size of oversized NFC svg files, so the resultant png files are below NFC guideline), will change to use a different category as input, to be set by a new template.

Links to relevant discussions (where appropriate): (Copied from my Talk Page)...

Hi Ron, I had an idea for new functionality in RonBot. Your bot already fixes images that are too large to meet the NFC guidelines, but another frequent problem with SVG files is that they'll be too small. What often happens is, someone will extract a logo from a PDF where it appeared very small, and as a result, their SVG will have a tiny nominal size. Someone reading an article will click on the logo, expecting to see it bigger, and instead they'll see a much smaller version of it. An example of this right now is the logo on Charlotte Independence (used to affect maybe half of the teams in that league, but I've manually fixed most of them).

Would you consider developing new functionality for RonBot to raise SVGs to the maximum allowed resolution for NFC? Obviously, you wouldn't want to automatically scale up every SVG, since some are presumably intended to be so small. Instead, an editor would have to manually tag the image to be upsized. The current process of manually fixing this is quite tedious, so being able to simply click a button in Twinkle and add a template would make it much easier to eradicate these unnecessarily-tiny logos. Let me know what you think of my idea. Thanks, IagoQnsi (talk) 19:13, 14 November 2018 (UTC)

Edit period(s): Daily

Estimated number of pages affected: Low number of images, just those that are tagged manually

Namespace(s): Files

Exclusion compliant (Yes/No): Yes

Function details: Based on a manually added template (with an associated category), the bot will just adjust the width and height parameters in the "<svg" tag of the image, so that the resultant PNG is more readable while of course staying below the NFC guideline.
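
A minimal Python sketch of the width/height adjustment (assumptions: the size budget is taken as roughly 0.1 megapixels per the usual non-free-content rule of thumb, and width precedes height in the <svg> tag; the real task would reuse RonBot 4's handling):

import re, math

MAX_PIXELS = 100_000  # assumed NFC rule-of-thumb budget (~0.1 megapixels)

def upscale_svg(svg_text: str) -> str:
    m = re.search(r'<svg\b[^>]*?\bwidth="([\d.]+)[a-z]*"[^>]*?\bheight="([\d.]+)[a-z]*"',
                  svg_text, flags=re.IGNORECASE)
    if not m:
        return svg_text  # no explicit nominal size; leave the file alone
    w, h = float(m.group(1)), float(m.group(2))
    scale = math.sqrt(MAX_PIXELS / (w * h))
    if scale <= 1:
        return svg_text  # already at or above the budget
    new_w, new_h = int(w * scale), int(h * scale)
    svg_text = re.sub(r'(<svg\b[^>]*?\bwidth=")[\d.]+[a-z]*(")', rf'\g<1>{new_w}\g<2>',
                      svg_text, count=1, flags=re.IGNORECASE)
    svg_text = re.sub(r'(<svg\b[^>]*?\bheight=")[\d.]+[a-z]*(")', rf'\g<1>{new_h}\g<2>',
                      svg_text, count=1, flags=re.IGNORECASE)
    return svg_text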

Discussion

With all due respect, is this *really* necessary, given that MediaWiki already allows SVGs to be scaled arbitrarily? -FASTILY 01:37, 15 November 2018 (UTC)

Well the requester thought it would be useful, and since there is virtually zero coding to do, I did not see an issue. It does stop items such as File:Gun Owners of America Logo.svg looking rather silly. Ronhjones  (Talk) 21:59, 15 November 2018 (UTC)
Ronhjones, Is there anything in place to keep the bot from edit warring with a human editor? Looking at the example you mention here, the bot would re-size the SVG again, after it's already resized once, albeit from a different task. I would note that in this case - that's a good thing! I could see this irritating some editors, however. SQLQuery me! 09:52, 20 November 2018 (UTC)
SQL I doubt if there is anyone who really wants a smaller SVG - but we could add a check if necessary (examine page history for the edit summary used to upscale the image). The example shown, for some reason the author wanted a whole lot of white space above and to the right of the logo. Looking at it now, it might well be a vandal edit - it makes the article page look weird - I'm going to revert it anyway - editor seems to have vanished. Ronhjones  (Talk) 15:40, 20 November 2018 (UTC)

I can see there is a need for this sort of functionality, but I'm not convinced that a bot is the best way to provide it. Have you considered rolling this into a web application and putting it on toolforge? -FASTILY 02:29, 28 November 2018 (UTC)

No idea how to do that, my bots only run from my PC. We are only talking about a small number of manually tagged files. The code to edit the svg is going to be the same as RonBot 4, just upscales rather than downscales. Ronhjones  (Talk) 21:53, 28 November 2018 (UTC)

RonBot 12

Operator: Ronhjones (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:23, Monday, October 15, 2018 (UTC)

Function overview: Tags pages that have broken images, and sends a neutral message to the last editor.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: User:RonBot/12/Source1

Links to relevant discussions (where appropriate): Request at Wikipedia:Bot_requests#CAT:MISSFILE_bot by KatnissEverdeen

Edit period(s): Twice daily.

Estimated number of pages affected: on average, we estimate 70 articles a day are affected, so that will be 70 articles and 70 talk pages.

Namespace(s): Articles, User Talk space

Exclusion compliant (Yes/No): Yes

Function details:

Step 1 - Bot will get the list of articles at Category:Articles with missing files. It will check for the presence of {{BrokenImage}}. If not present, then it will (a) add that template, and (b) add {{Broken image link found}} to the talk page of the last editor. NB: This message will be adjusted for the first runs, as the time from the broken image to the last edit might be a while - it will be better when up to date.
Step 2 - Bot will get the list of articles at Category:Wikipedia articles with bad file links (i.e. pages containing {{Broken image}} or {{BrokenImage}}). It will check that each page is still in Category:Articles with missing files - if not, it will remove the template. This allows for cases where some other action (e.g. the image being restored) has fixed the problem without the need to edit the article.
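
A rough pywikibot-style sketch of the two steps (illustrative only; the actual source is at User:RonBot/12/Source1, and the naive substring check here is exactly the weakness raised in the discussion below):

import pywikibot

site = pywikibot.Site('en', 'wikipedia')
missing = pywikibot.Category(site, 'Category:Articles with missing files')
tagged = pywikibot.Category(site, 'Category:Wikipedia articles with bad file links')
TAG = '{{BrokenImage}}'
missing_titles = {p.title() for p in missing.articles()}

# Step 1: tag articles listed as having a missing file (talk-page notice omitted here).
for page in missing.articles():
    if TAG not in page.text:          # naive substring check; see discussion below
        page.text = TAG + '\n' + page.text
        page.save(summary='Tagging page listed in Category:Articles with missing files')

# Step 2: untag pages that have dropped out of the maintenance category.
for page in tagged.articles():
    if page.title() not in missing_titles:
        page.text = page.text.replace(TAG + '\n', '').replace(TAG, '')
        page.save(summary='Removing {{BrokenImage}}; page no longer has a missing file')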

Discussion

  • I'm not sure leaving a TP message with the last editor is a good idea. I can think of several scenarios where the last editor might not have had anything to do with the image link breaking. I'd really like to hear other opinions on this. SQLQuery me! 04:31, 16 October 2018 (UTC)
I'm actually   Coding... to message the user who broke the link Galobtter (pingó mió) 06:23, 16 October 2018 (UTC)
{{BOTREQ}} ping Ronhjones Galobtter (pingó mió) 10:08, 16 October 2018 (UTC)
  • If a file is deleted (here or on Commons), then it may take several months until a page using the file shows up in Category:Articles with missing files. Whenever someone edits the page the next time, it immediately shows up in the category. Deleted files are typically removed from articles by bots, but they sometimes fail.
As a first step, I propose that you generate a database report of broken images (use Wikipedia's imagelinks table to find file use and then Wikipedia's+Commons's image tables to see if the file exists) and then purge the cache of those pages so that the category is updated. Also consider purging the cache of all pages in Category:Articles with missing files, as files might not otherwise disappear from the category if a file is undeleted.
If the file is missing because it was deleted, then the latest editor to the article presumably doesn't have anything to do with this. I think that {{Broken image link found}} risks confusing editors in this situation. Consider reformulating the template.
Category:Wikipedia articles with bad file links seems to duplicate Category:Articles with missing files so I suggest that we delete Category:Wikipedia articles with bad file links and change {{Broken image}} so that the template doesn't add any category.
This is bad code:
if "{{Broken image}}" not in pagetext:
Someone might add the template manually as {{broken&#x20;image}} or some other variant and then you would add the template a second time. Consider asking the API if the template appears on the page instead of searching for specific wikicode. If the bot is unable to remove the template because of unusual syntax, then it may be a good idea if the bot notifies you in one way or another. --Stefan2 (talk) 10:23, 16 October 2018 (UTC)
Using mw:API:Templates for the existence of {{Broken image}} would seem the better way of doing it.
"use Wikipedia's imagelinks table to find file use and then Wikipedia's+Commons's image tables to see if the file exists" - with tens of millions of files in each table, I wonder how feasible doing that would be. Galobtter (pingó mió) 11:02, 16 October 2018 (UTC)
Also, the issue with deleted files would seem resolved once Wikipedia:Bots/Requests_for_approval/Filedelinkerbot_3 goes through Galobtter (pingó mió) 12:46, 16 October 2018 (UTC)

{{BotWithdrawn}} In view of the better system by Wikipedia:Bots/Requests_for_approval#Galobot_2 - I'll delete the unneeded cats and templates. Ronhjones  (Talk) 22:39, 16 October 2018 (UTC)

Restart

I have undone the withdrawal at the request of Galobtter. But I have cut down the actions to a simple tagging (and de-tagging) of articles based on Category:Articles with missing files, as requested. User pages will not be edited. The {{BrokenImage}} template no longer generates a categorisation - instead I have used mw:API:Templates to find the list of transclusions (and removed the space in the template name to make life easier). I have also replaced the if "X" not in Y check with better matching code. Ronhjones  (Talk) 22:53, 17 October 2018 (UTC)
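
For reference, a minimal pywikibot sketch of that kind of API-based template check (the API reports every transclusion, so spacing and capitalisation variants in the wikitext are all caught; the page title below is hypothetical):

import pywikibot

site = pywikibot.Site('en', 'wikipedia')

def has_template(page: pywikibot.Page, template_title: str) -> bool:
    # Ask the API which templates the page transcludes instead of
    # searching the wikitext for one particular spelling.
    target = pywikibot.Page(site, f'Template:{template_title}')
    return any(t.title() == target.title() for t in page.itertemplates())

page = pywikibot.Page(site, 'Example article')   # hypothetical page
print(has_template(page, 'BrokenImage'))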

If {{BrokenImage}} no longer categorizes an article, then why add it? Also, pages which already have many maintenance templates will suffer from increased instruction/template creep. -FASTILY 07:39, 22 October 2018 (UTC)
@Fastily: Galobtter suggested it will still be useful - Special:Diff/864517326. It will highlight the fact that there is a broken image link. Passing editors may not realise there is a broken link otherwise - not all editors will be showing hidden categories (where Category:Articles with missing files is located), and how many editors will check the categories anyway? Ronhjones  (Talk) 21:40, 22 October 2018 (UTC)
With all due respect, {{BrokenImage}} states an obvious fact and does not even add a dated maintenance category. IMO, this does not improve the editor/reader experience. Given that maintenance tags are frequently ignored and/or annoying to editors (evidenced in discussions such as this), mass-tagging articles with yet another maintenance tag isn't a good task for a bot. -FASTILY 01:41, 15 November 2018 (UTC)

Galobot 2

Operator: Galobtter (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:06, Tuesday, October 16, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python/Pywikibot

Source code available: here

Function overview: Message users who add broken file links to articles

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#CAT:MISSFILE_bot; Wikipedia:Bots/Requests_for_approval/RonBot_12

Edit period(s): Daily

Estimated number of pages affected: ~10-20 a day

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Sends a talk page message to auto-confirmed users who add broken (red linked) file links to mainspace pages, by scanning CAT:MISSFILE. Mechanism is similar to Wikipedia:Bots/Requests for approval/DPL bot 2. Runs daily, seeing what new red linked files have been added, and messages the user who added them if they are auto-confirmed; doesn't message non-autoconfirmed users as they are likely vandals/wouldn't know how to fix the link. Most people who break file links are IPs/non-autoconfirmed so of the 70 or so broken links added each day I estimate only ~10 people will be messaged per day.

The bot figures out which image is broken and who added it, using mw:API:Images and mw:API:Parse to get file links and to find the revision in which the broken file link was added.
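
A rough pywikibot sketch of the detection step (illustrative only; the actual source is linked above, and find_adding_revision() below is a hypothetical stub standing in for the revision-scanning and autoconfirmed checks described here):

import pywikibot

site = pywikibot.Site('en', 'wikipedia')
commons = pywikibot.Site('commons', 'commons')
cat = pywikibot.Category(site, 'Category:Articles with missing files')

def broken_files(page):
    # Yield file links that exist neither locally nor on Commons.
    for f in page.imagelinks():
        if not pywikibot.FilePage(site, f.title()).exists() \
           and not pywikibot.FilePage(commons, f.title()).exists():
            yield f.title()

def find_adding_revision(page, filename):
    # Hypothetical placeholder: the real bot walks the last day's revisions
    # and returns the autoconfirmed user (if any) whose edit added the red link.
    return None, None

for article in cat.articles(namespaces=0):
    for filename in broken_files(article):
        user, diff = find_adding_revision(article, filename)
        if user:
            pass  # post the {{subst:...}} notification on the user's talk page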

Message sent will be something like:

Hello. Thank you for your recent edits. An automated process has found that you have added a link to a non-existent file File:Hydril Compact BOP Patent.jpg to the page Blowout preventer in this diff. If you can, please remove or fix the file link.

You may remove this message. To stop receiving these messages, see the opt-out instructions. Galobtter (pingó mió) 10:06, 16 October 2018 (UTC)

Discussion

  • Consider this scenario: User A uploads a file and adds it to an article. A vandal (User B) blanks the page and User C reverts. Later, User D deletes the file. Who would be notified?
Note that it may take forever before pages with recently deleted files show up in Category:Articles with missing files so consider obtaining a list of articles from a database report and purging those so that the category is updated before you start notifying users. --Stefan2 (talk) 10:38, 16 October 2018 (UTC)
Thanks for the comment! In this case, nobody, because it skips cases where the file has been added and then removed and then added, i.e. where the file has been added more than once. However, if User A adds a file and later User B deletes the file, it'll notify User A, but only if that revision occurred within 24 hours before being listed in CAT:MISSFILE, as it only checks the revisions that have occurred since the last run 24 hours ago. I was previously wondering whether it should skip cases where the file has been deleted after a user adds it (it can check the deletion logs). Galobtter (pingó mió) 11:27, 16 October 2018 (UTC)
Actually, checking the deletion logs seems pretty necessary since the bot probably shouldn't spam people if FileDelinkerBot/CommonsDelinkerBot goes down. Will add Galobtter (pingó mió) 11:51, 16 October 2018 (UTC)

Not a good task for a bot. This is effectively equivalent to messaging someone every single time they make a typo and will likely be perceived as spam and/or be irritating to established editors. At 10-20 edits/day, this is pretty low impact, and comes off as a solution in search of a problem. -FASTILY 07:24, 22 October 2018 (UTC)

As it runs daily, it'll only message if people leave the broken file link for at least a few hours. I wouldn't want to be messaged every time I made a typo, but I certainly would if I broke a link to a file and so caused an easily fixed problem in an article. And there is a definite problem it is trying to help solve: CAT:MISSFILE is steadily rising, and people spend quite a bit of time every day getting it down (because someone has to eventually fix the file link). That it'd only message 10-20 people a day shows that the number of people who break file links is quite low, so people are unlikely to be messaged so repeatedly that it becomes an irritant. Galobtter (pingó mió) 07:45, 22 October 2018 (UTC)
I'll split my response for clarity:
CAT:MISSFILE steadily rising and people spending quite a bit of time every day getting it down (because someone has to eventually fix the file link).
Unless CAT:MISSFILE is primarily populated by editors making typos, this is not a legitimate reason to run this task.
it'd only message 10-20 people a day shows that the number of people who break file links is quite low
Sounds like we don't need this task then
and so people are unlikely to messaged repeatedly that it becomes an irritant
It's irritating to people that do get messaged, especially if you're bothering them over minor things. In fact, this is one of the reasons I am opposed to this task. -FASTILY 03:55, 23 October 2018 (UTC)
I think the number here is somewhat underestimated - Wikipedia:Bot_requests#CAT:MISSFILE_bot says a 10 day trial generated 681 pages with broken file links. It is hardly a minor thing if someone has broken a file link in an article; I think they would want to know. Some of these errors are definitely known to be due to a poor search and replace with AWB; if the editor is not aware, then there is the strong possibility that the editor will use the same setup and create even more broken links. Ronhjones  (Talk) 19:50, 25 October 2018 (UTC)
The reason for that number is that it is mostly IPs or non-autoconfirmed users breaking links and many errors are from failures of the delinker bots upon deletion of files. Galobtter (pingó mió) 20:00, 25 October 2018 (UTC)
As a regular patroller of CAT:MISSFILE, I can say definitively that many red-linked files are due to a poor search and replace with AWB or other script-assisted editors. See these two edit histories (1 and 2) for recent examples of red-linked images caused by script-assisted editing. I'm a less active patroller now than I used to be but I'm sure KatnissEverdeen and Sam Sailor can provide other examples. - tucoxn\talk 07:09, 27 October 2018 (UTC)

─────────────── I definitely would agree with Ronhjones and Tucoxn. However, while Tucoxn is definitely right that a lot of red-linked files are because of 'find and replace' AWB/script edits, I would also add that people (especially new editors) often don't realize that editing a filename breaks the image. I would argue that a message would be helpful, as I have received many confused messages on my talk page legitimately asking why I reverted them and what they did wrong. Here are a few other examples to illustrate this point (all of these people messaged on my talk page later saying they didn't know they had done something wrong). 1 2 3. Happy to provide other examples if you like. Cheers, Katniss May the odds be ever in your favor 16:17, 27 October 2018 (UTC)

Ronhjones and Tucoxn are both right here. Seasoned editors running AWB/scripts and overlooking changes to filenames is a common mistake. I am no saint myself: my first interaction with KatnissEverdeen was when she made me aware that I had overlooked a script-assisted change of a dash to emdash endash in a filename. The more "permanent" solution to these scenarios is to create redirects on Commons. I wish we had a little script for doing that, and if any of you have a good idea where to propose it, I would appreciate your feedback. Galobtter, thanks for coding the bot, I for one would like to know when I screwed something up. Sam Sailor 21:55, 27 October 2018 (UTC) (Amended. Sam Sailor 20:37, 29 October 2018 (UTC))
@Sam Sailor, KatnissEverdeen, and Galobtter: As a commons admin - I know that will be - c:Commons:Bots/Work_requests to request someone to invent/run a bot, and c:COM:BRFA for bot approvals. Ronhjones  (Talk) 00:40, 28 October 2018 (UTC)
@Ronhjones and Galobtter: I would agree with the script idea, not sure of the technical lingo I would need to use to request it though. I'm sure you all would be much better at wording the request than I would  . Sam Sailor "I am no saint myself: my first interaction with KatnissEverdeen was when she made me aware that I had overlooked a script-assisted change of a dash to emdash endash in a filename." - Haha, I totally forgot about that...very easy thing to screw up and nobody's perfect  . Katniss May the odds be ever in your favor 15:39, 29 October 2018 (UTC) (Amended "emdash" to "endash" in quote per WP:TPO for clarity. Sam Sailor 20:37, 29 October 2018 (UTC))
@Sam Sailor, KatnissEverdeen, and Galobtter: I'm not sure that commons would like such a bot. With 50 million images on site, it might be quite a few redirects! I'll post a question over there and see what they say. Ronhjones  (Talk) 15:45, 29 October 2018 (UTC)

──────────@Ronhjones: I have no clue if that is a job for a bot, I was thinking about a script that would make it a bit easier to create redirects on Commons.
Suppose you patrol CAT:MISSFILE, and you "correct" a spelling correction only to be undone which again causes a redlinked file. Here it would save some seconds with a script that could load up https://commons.wikimedia.org/wiki/File:Nutrient_absorbtion_to_blood_and_lymph.png and pop up a box containing the string File:Nutrient absorbtion to blood and lymph.png where you could change it to File:Nutrient absorption to blood and lymph.png, press Create redirect, and a redirect would be created from the latter to the former. Sam Sailor 20:37, 29 October 2018 (UTC)

Love this idea! I think this would be a super easy solution to quite a few of our issues here. Katniss May the odds be ever in your favor 20:40, 29 October 2018 (UTC)
@Sam Sailor, KatnissEverdeen, and Galobtter: Interesting. I don't write scripts very well at all, and I've no idea how well a script on en-wiki would work with commons - there are still some old users who have different usernames on commons - might cause issues! However, you don't need a commons redirect - it could be a local redirect on en-wiki (does not matter if it redirects to a commons image), and that would keep it much simpler. Maybe you should ask at Wikipedia:User scripts/Requests Ronhjones  (Talk) 21:28, 29 October 2018 (UTC)
@Ronhjones: thank you, I created a local redirect at File:Nutrient absorption to blood and lymph.png to File:Nutrient absorbtion to blood and lymph.png, but it did not work. Are there special requirements to the syntax of redirects in file space? Sam Sailor 12:06, 8 November 2018 (UTC)
@Sam Sailor: Very odd and very unusual page. How did you create it? Wikitext or visual editor or dummy upload? See User:Ronhjones/Sandbox2 - three images are File:Testorientation.jpg, File:Testorientation.JPG, File:Testorientationtest.JPG - compare the last one to File:Nutrient absorption to blood and lymph.png Ronhjones  (Talk) 16:48, 8 November 2018 (UTC)

──────────@Ronhjones: Yours are working, mine are not. I tried substituting underscores for spaces in the filename in the redirect (diff), it did not change a thing. The redirect was created with Sagittarius+, but that should not be the culprit, and starting File:Nutrient absorption to blood and lymph TEST.png "manually" in the normal editor did not change anything. (I hardly ever use Visual Editor.)
I notice two things:

I wonder if my lack of the movefile flag is causing this. Would you grant me, at least temporarily, the file mover right? If you do, would you also delete File:Nutrient absorption to blood and lymph.png, so I can recreate it with the file mover right, and in any case delete File:Nutrient absorption to blood and lymph TEST.png, thanks. Sam Sailor 19:39, 8 November 2018 (UTC)

@Sam Sailor: I think you have been here long enough not to go mad with it   Done (and page deleted) Ronhjones  (Talk) 20:00, 8 November 2018 (UTC)
Thanks, Ronhjones. Recreated File:Nutrient absorption to blood and lymph.png, but the problem persists. Any ideas? Ask at VPT? Sam Sailor 20:05, 8 November 2018 (UTC)
@Sam Sailor: Bonkers! It won't work for me. I made a redirect for my balloon pic with a space - no problem, and I took out the spaces File:Nutrientabsorptiontobloodandlymph.png. The only difference I can see is that mine is jpg and yours is a png. Let me find a different png and try something. Ronhjones  (Talk) 20:26, 8 November 2018 (UTC)
@Sam Sailor:Not the png - made File:7 and 35 shields.png, all OK. Anything based on File:Nutrient absorbtion to blood and lymph.png fails. Suggest VPT, I'm now lost... :-( Ronhjones  (Talk) 20:34, 8 November 2018 (UTC)
Redirects on enwiki to files on Commons do not work. Redirects to Commons's files must be created on Commons. — JJMC89(T·C) 04:03, 9 November 2018 (UTC)

─────────────────────────Ahh, of course, thank you JJMC89. Could you, with your expertise in programming, by any chance write a script that facilitates creating redirects on Commons? Sam Sailor 08:33, 13 November 2018 (UTC)

Sam, A bot or a user script? — JJMC89(T·C) 02:37, 14 November 2018 (UTC)
JJMC89, a script something like this. Sam Sailor 08:23, 14 November 2018 (UTC)
Sorry, my knowledge of JS is insufficient to write a user script. I would try posting at WP:SCRIPTREQ or asking someone like Enterprisey or Writ Keeper. — JJMC89(T·C) 21:41, 17 November 2018 (UTC)

Bots in a trial period

PkbwcgsBot 4

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 11:41, Sunday, December 9, 2018 (UTC)

Function overview: The bot will fix two high priority WP:WCW errors using AWB.

Automatic, Supervised, or Manual: Supervised (Error 62 will be mostly manual)

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Weekly

Estimated number of pages affected: 100 to 500 a week

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot will fix WP:CHECKWIKI high priority errors 3 (Reference list missing) and 62 (URL without http://). General fixes will be switched on but spelling fixes will be turned off. Every single edit will be previewed before saving, as this task will always be supervised. An example of an error 3 fix is this, and an example of an error 62 fix is this, where AWB added "http://" before a weblink. General fixes were switched on for both of the edits, which is why AWB did general fixes on this edit.
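
As a rough illustration of the error 62 case (the actual changes come from AWB's general fixes; the regex below is an assumption, and per the discussion that follows each resulting link still needs to be checked by hand for the right protocol):

import re

# Add "http://" to bracketed external links that lack a protocol, e.g.
# "[www.example.org Example]" -> "[http://www.example.org Example]".
BARE_LINK = re.compile(r'\[(?!(?:https?|ftp)://)(www\.[^\s\]]+)')

def add_protocol(wikitext: str) -> str:
    return BARE_LINK.sub(r'[http://\1', wikitext)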

Discussion

I assume that your changes will be taking into account the possibility that there are multiple reflists or that the External Links section comes before the References section? Primefac (talk) 15:28, 12 December 2018 (UTC)

How are you determining the protocol designation to use for links? (i.e. http:// vs https:// vs ftp:// etc etc - are you actually checking every link manually to ensure it is valid?) — xaosflux Talk 15:35, 12 December 2018 (UTC)
@Primefac: Every edit will be checked to make sure that there won't be multiple reflists and that the references and the external links are correctly placed. @Xaosflux: I shouldn't need to check every time as http:// works most of the time before any link but I still do check manually to make sure the link works. Pkbwcgs (talk) 20:16, 12 December 2018 (UTC)
Thank you, as long as you are checking, no worries. This is just along the lines of: you wouldn't want your bot to make an edit that a sensible editor wouldn't also make (such as adding an invalid link). — xaosflux Talk 20:20, 12 December 2018 (UTC)

In that example AWB added http://, but the site redirects to https://annielouisaswynnerton.com/. —  HELLKNOWZ   ▎TALK 12:09, 13 December 2018 (UTC)

@Hellknowz: I have fixed that. Pkbwcgs (talk) 15:23, 13 December 2018 (UTC)
Is your intention to fix all such cases manually? —  HELLKNOWZ   ▎TALK 15:57, 13 December 2018 (UTC)
@Hellknowz: My intention is to fix them using AWB and then check them afterwards so the task is going to be supervised. Pkbwcgs (talk) 16:32, 13 December 2018 (UTC)
So that's a yes? In other words, you'll be needing to make two edits - one that is mostly right, and the second that gets it the rest of the way there? Primefac (talk) 22:34, 13 December 2018 (UTC)
@Primefac: When I need to change from http:// to https:// then yes. Otherwise, there is no need to make a second edit. Pkbwcgs (talk) 16:14, 14 December 2018 (UTC)
Out of curiosity, do you have any idea how frequently that will be? Just in general (often, rarely, half the time, etc). Primefac (talk) 16:56, 14 December 2018 (UTC)
I'm going to guess "no", so   Approved for trial (100 edits (url protocol addition)).. —  HELLKNOWZ   ▎TALK 18:09, 14 December 2018 (UTC)
@Primefac: It will be done often once a week. @Hellknowz: URL protocol addition is done through general fixes so is it okay for you if I turn on general fixes? Pkbwcgs (talk) 18:56, 14 December 2018 (UTC)

ProgrammingBot 2

Operator: ProgrammingGeek (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:41, Wednesday, November 14, 2018 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): JavaScript (nodejs)

Source code available: GitHub

Function overview: Adds {{WikiProject Protected areas}} to talk pages in categories:

That do not already have the template.

Links to relevant discussions (where appropriate): WP:Bot requests#Add a wikiproject template to New York City parks articles

Edit period(s): Daily

Estimated number of pages affected: ~650

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: Adds the template to talk pages of articles in the categories above, provided they do not already have the template. Will fill in the class= field if there is another template with it filled out.
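
The bot is written in node.js; as a rough illustration of the logic, a pywikibot-style sketch (note that, per the discussion below, a fuller version should place the banner inside {{WikiProjectBannerShell}} when present and respect WP:TALKORDER):

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')

def tag_talk_page(article: pywikibot.Page):
    talk = article.toggleTalkPage()
    text = talk.text if talk.exists() else ''
    if re.search(r'\{\{\s*WikiProject Protected areas\b', text, re.IGNORECASE):
        return  # already tagged
    # Reuse an existing banner's class= assessment if one is present.
    m = re.search(r'\|\s*class\s*=\s*([^|}\s]+)', text)
    rating = m.group(1) if m else ''
    banner = '{{WikiProject Protected areas|class=%s}}' % rating
    talk.text = banner + '\n' + text
    talk.save(summary='Adding {{WikiProject Protected areas}} banner')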

Discussion

  • Seems uncontroversial.   Approved for trial (100 edits). SQLQuery me! 09:23, 20 November 2018 (UTC)
  • This bot should put the WikiProject template inside {{WikiProjectBannerShell}} if it exists on the talk page. In situations where the banner shell does not exist, the bot should respect WP:TALKORDER as best as possible (putting the banner at the end of the existing templates is probably better than putting it at the top). --AntiCompositeNumber (talk) 17:54, 21 November 2018 (UTC)
  • A few notes from some things that have already happened
    • The bot started editing before it logged in. Whoops. The edits have been oversighted and the call to begin editing is now in the callback for the login function. Although I spotted the error and contacted the oversight team almost immediately, thank you to Natureium for bringing it to the bot noticeboard. I've already thanked Xaosflux for showing me the assert functionality of the API (link).
    • Thank you to AntiCompositeNumber for your input, I'm working on implementing that and will do so before continuing with the trial
    • Due to some admittedly lazy programming on my part, the bot did not properly detect templates on pages, meaning that many times the bot tagged the page with multiple templates. I'm working on fixing that issue as well, and the erroneous edits have been rolled back (see here).
  • Thank you for your continued patience, it's been fun learning to program the bot and my skills are improving. Kind regards, ProgrammingGeek talktome 19:24, 22 November 2018 (UTC)
  • Work has now resumed (I took last week off to recover from a grueling few weeks at school). Thanks, ProgrammingGeek talktome 15:59, 26 November 2018 (UTC)
    ProgrammingGeek, No problem, thanks for keeping us updated. Take all the time you need. SQLQuery me! 20:52, 27 November 2018 (UTC)

Bots that have completed the trial period

TheSandBot 2

Operator: TheSandDoctor (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:38, Thursday, November 29, 2018 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: [1]

Function overview: Removes {{orphan}}, {{uncategorized}}, and {{underlinked}} where present in the draft namespace.

Links to relevant discussions (where appropriate): N/A

Edit period(s): Periodic, as necessary

Estimated number of pages affected: Variable. Currently around 561

Exclusion compliant (Yes/No): No (N/A)

Already has a bot flag (Yes/No): Yes

Function details: Removes {{orphan}}, {{uncategorized}}, and {{underlinked}} where present in the draft namespace. This is done because these templates are not applicable to drafts and do not make sense in the draft namespace. Though I have not started working on the code for this (I could probably have that done within a few minutes, I just don't have time right this moment), it would essentially take the draft-namespace transclusions of each template and go through them. Once it finds the template, it would simply remove it with an edit summary similar to "rm X template, N/A in the draft namespace".
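
A rough pywikibot sketch of the approach described above (illustrative only; the actual source is linked above, and the regex and edit summary here are assumptions):

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
TEMPLATES = ['Orphan', 'Uncategorized', 'Underlinked']
DRAFT_NS = 118  # Draft namespace on the English Wikipedia

for name in TEMPLATES:
    tpl = pywikibot.Page(site, f'Template:{name}')
    for draft in tpl.getReferences(only_template_inclusion=True, namespaces=DRAFT_NS):
        new_text = re.sub(r'\{\{\s*%s\s*(\|[^{}]*)?\}\}\n?' % re.escape(name),
                          '', draft.text, flags=re.IGNORECASE)
        if new_text != draft.text:
            draft.text = new_text
            draft.save(summary='rm %s template, N/A in the draft namespace' % name)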

Discussion

TheSandDoctor - Seems like a pretty easy task, that makes a lot of sense.   Approved for trial (100 edits).. Please link to this brfa in the edit summary. SQLQuery me! 00:58, 2 December 2018 (UTC)

@SQL: So far the bot has updated 1 page (successfully) but ran into programmatic issues afterwards that I have not had the time to resolve, as I am bogged down with final projects and upcoming final exams. Is it okay if this waits for a week? Everything will be cleared up for me by December 13th and then I will be able to look into it in greater detail and resolve the issues to resume the trial. --TheSandDoctor Talk 18:13, 4 December 2018 (UTC)
TheSandDoctor, Yep, not a problem! SQLQuery me! 18:17, 4 December 2018 (UTC)
@SQL: Thanks! --TheSandDoctor Talk 18:23, 4 December 2018 (UTC)
  •   Trial complete. @SQL: Programmatic problems resolved. It turned out I forgot to import the errors library and to check for blank titles(?), and my getTransclusions method was returning a multidimensional list, which I wasn't exactly expecting and which was a holdover from a proof of concept. --TheSandDoctor Talk 20:30, 13 December 2018 (UTC)
Source code added. --TheSandDoctor Talk 20:41, 13 December 2018 (UTC)

SQLBot-AmazonAffiliateRemoval

Operator: SQL (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:11, Tuesday, December 11, 2018 (UTC)

Function overview: Per Wikipedia:Spam#Affiliate_links, removes the affiliate (tag=) portion from amazon links.

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP

Source code available: Once complete, yes.

Edit period(s): Daily

Estimated number of pages affected: 673 in the first run

Namespace(s): Mainspace

Exclusion compliant (Yes/No): No

Function details: Looks for tag= in Amazon URLs using this query:

SELECT page_title, 
                el_to 
FROM   externallinks 
       JOIN page 
         ON el_from = page_id 
WHERE  ( el_to LIKE "%amazon.%tag=%" 
          OR el_to LIKE "%amzn.%tag=%" ) 
       AND page_namespace = 0;

This finds the affected pages and URLs. Once found, the bot removes the tag= parameter from every Amazon URL found above.

The function used to strip the tag= portion of the url is:

$p_url = parse_url( $url );                  // split the URL into components (host, path, query, ...)
parse_str( $p_url['query'], $out );          // parse the query string into an array
unset( $out['tag'] );                        // drop the affiliate tag= parameter
$q = array();
foreach( $out as $o=>$t ) {
        array_push( $q, "$o=$t");            // rebuild the remaining key=value pairs
}
$stripped = "https://" . $p_url['host'] . $p_url['path'] . "?" . implode( "&", $q );  // reassemble the URL

Discussion

  Approved for trial (25 edits).xaosflux Talk 01:28, 11 December 2018 (UTC)
  • I note that SQL query is not very efficient. There's no way to really avoid the table scan, but you should be able to greatly improve subsequent runs by remembering the maximum el_id value from just before the current run and only looking at rows with higher values next time. Anomie 18:28, 11 December 2018 (UTC)
    • Also you might consider batching the query: select with AND el_id BETWEEN X AND X+10000 or something like that, process whichever rows you got from that, and repeat with increasing X until it's greater than the current MAX(el_id). Ideally adjust the "10000" there so each batch doesn't take more than a second or two to return. Anomie 18:36, 11 December 2018 (UTC)
  •   Trial complete. I ended up implementing batching as suggested above. 100000 ids at a time, each query takes between 1 and 2 seconds to run. In the beginning, there were some issues with url encoding, but those have been resolved. SQLQuery me! 23:09, 11 December 2018 (UTC)
  • Q: Is 673 pages the estimated total that have a tag, or just a small trial set? Asking because if it's a large number, it should be aware of archive URLs. Changing an archive URL will break the URL. We have many millions of archive URLs. There are a couple fairly simple ways to avoid archive URLs I can pass along if you would like. -- GreenC 23:42, 11 December 2018 (UTC)
    GreenC, That should be everything, and that's a very good catch. I could check the "host" portion of parse_url to make sure it contains either "amazon" or "amzn". I believe that would be sufficient.
    By the way - I imagine that we could probably reduce this to a one-time run as well, if someone wanted to make an edit filter or spam blacklist entry to stop these before they get started. SQLQuery me! 23:50, 11 December 2018 (UTC)
Not sure how it's parsing the article for URLs - might it still pick up URLs in the query or path portion of another URL? The two main types lead with either a "/" or "?url=", like archive.org/web/20181210010101/http://... or webcitation.org/456hsdus?url=http://; if it checked back for those leading character(s) it should be safe. User:Headbomb came up with a regex for this that I can try to track down if you'd like. -- GreenC 00:08, 12 December 2018 (UTC)
Don't really know what I can do to help here with regexes. However, I'll comment on blocking affiliates with an edit filter. Most often, those are just good faith copy-pastes of URLs. That shouldn't really be blocked, although an edit summary tag might be appropriate. Headbomb {t · c · p · b} 01:35, 12 December 2018 (UTC)
I've updated it to require that the hostname returned by parse_url contains either "amazon.", or "amzn." - which will handle any issues surrounding archives. SQLQuery me! 04:25, 13 December 2018 (UTC)
User:SQL: question: If the Wikisource contains http://archive.org/web/20181210010101/http://amazon.com/.. a regex for an amazon URL would match on http://amazon.com/.. portion, and parse_url would OK it since it has an amazon hostname. -- GreenC 04:41, 13 December 2018 (UTC)
GreenC, If you aren't familiar with the PHP's parse_url(), you can play with it here.
The results I get from your specific example are:
array (
'scheme' => 'http',
'host' => 'archive.org',
'path' => '/web/20181210010101/http://amazon.com/',
)
Of which, 'archive.org' would return false on a strpos( "archive.org", "amazon." ) call, skipping that URL. SQLQuery me! 04:46, 13 December 2018 (UTC)
Understand parse_url() takes a URL to be parsed so question was how the URL is retrieved from the wikisource in the first place, I assumed regex. But looking again, it is not parsing from the wikisource, rather from an SQL query. Then it modifies the URL, and presumably does a search/replace in the wikisource. In which case my initial concern is answered, there is no problem :) -- GreenC 05:06, 13 December 2018 (UTC)
─────────────────GreenC, Yep, sorry - I didn't realize that's where we hit a loop. It gets the incoming url directly from the Externallinks table. SQLQuery me! 05:12, 13 December 2018 (UTC)
Yep, my misunderstanding. FYI, the externallinks table is not complete. I've done tests before and found quite a few missing URLs; one reason is that URLs contained in some templates are not parsed, and there are other causes too. Probably for this application it's OK. You could possibly supplement with a CirrusSearch afterwards to check whether any were missed. -- GreenC 05:36, 13 December 2018 (UTC)
  • As this appears to be a single-purpose account, please build out the user page a bit more to explain why this bot is doing what it does, so that people looking at it will understand that Wikipedia isn't in some us-vs-Amazon fight. — xaosflux Talk 03:16, 12 December 2018 (UTC)
    Xaosflux,   Doing..., but while I'm doing so - I thought it might be appropriate to rename the bot as well, removing the 'amazon' bit. Both for the reason you point out above, and that I might use it later for other affiliate link removals. What do you think? SQLQuery me! 04:16, 13 December 2018 (UTC)
    @SQL: User:SQLBot-AffiliateRemoval for example sounds fine? — xaosflux Talk 12:33, 13 December 2018 (UTC)

Ahechtbot 4

Operator: Ahecht (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 21:26, Sunday, November 25, 2018 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB, replacement strings posted at User:Ahechtbot#Task 4

Function overview: Fixes the specific signatures and substituted template text with unclosed formatting tags listed at User:Ahechtbot#Task 4. These are now causing linter errors and/or formatting issues on entire pages due to the Change from HTML Tidy to RemexHTML. Also fixes unclosed <s>...</s> tags where found on pages where other changes are already being made.

Links to relevant discussions (where appropriate): Continuation of approved tasks at Wikipedia:Bots/Requests for approval/Ahechtbot, Wikipedia:Bots/Requests for approval/Ahechtbot_2, and Wikipedia:Bots/Requests for approval/Ahechtbot_3.

Edit period(s): Will be run in several batches, due to Wikipedia search having a 10,000 item limit.

Estimated number of pages affected: ~450,000 (the vast majority of which will be fixes to substitutions of templates such as {{Afd bottom}})

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Continuation from Task 3 with another batch of strings to replace. These strings are listed at User:Ahechtbot#Task 4. No "Automatic changes" (genfixes, etc.) will be enabled. All edits will be marked as "bot" and "minor".
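
A small Python sketch of the generic unclosed-<s> repair mentioned in the function overview (illustrative only; the signature and template fixes themselves use the specific find-and-replace strings listed at User:Ahechtbot#Task 4):

import re

def close_unclosed_s_tags(wikitext: str) -> str:
    # Heuristic: close any line that opens more <s> tags than it closes.
    fixed = []
    for line in wikitext.splitlines():
        opened = len(re.findall(r'<s\b[^>/]*>', line, re.IGNORECASE))
        closed = len(re.findall(r'</s\s*>', line, re.IGNORECASE))
        fixed.append(line + '</s>' * max(0, opened - closed))
    return '\n'.join(fixed)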


Discussion

  Approved for trial (50 edits). SQLQuery me! 20:35, 27 November 2018 (UTC)

  Trial complete. Special:Contributions/Ahechtbot. At least four pages were run for each replacement string. I had to make a few tweaks to the regexes to simplify them a bit, and I added one to catch an additional bad template transcluded by Journalist along with his signature. --Ahecht (TALK PAGE) 04:21, 28 November 2018 (UTC)
{{BAG assistance needed}} It's been a week and a half since the trial was completed. --Ahecht (TALK PAGE) 20:50, 8 December 2018 (UTC)
@Ahecht: in the Spinningspark section of your fix sheet, it appears you will be creating an unmatched '''. — xaosflux Talk 01:03, 16 December 2018 (UTC)
@Xaosflux: Removing the ''' balances a bold started earlier in the signature (outside the search string). See Special:Diff/870965498 for an example. --Ahecht (TALK PAGE) 01:20, 16 December 2018 (UTC)

Bot1058 5

Operator: Wbm1058 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:53, Monday, November 26, 2018 (UTC)

Function overview: Remove pages from Category:Long monitored short pages, Category:Monitored short pages length 901 to 1000, Category:Monitored short pages length 801 to 900, and Category:Monitored short pages length 701 to 800

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP

Source code available:

Links to relevant discussions (where appropriate): Template talk:Short pages monitor#Need to define and possibly rethink this template

Edit period(s): Hourly, after initial run to clear the categories

Estimated number of pages affected: 2600 on initial run, then as needed when short pages become longer

Namespace(s): Mainspace/Articles

Exclusion compliant (Yes/No): No

Function details: Remove the text added by {{subst:long comment}} from pages listed in Category:Long monitored short pages; these pages no longer need this text to keep the page off the top of the Special:ShortPages listings. No reason to make this process exclusion compliant.
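
A rough pywikibot sketch of the removal (illustrative only; the marker phrase used to locate the substituted comment is an assumption and would need to match the actual text left behind by {{subst:long comment}}):

import re
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
cat = pywikibot.Category(site, 'Category:Long monitored short pages')

# Assumed marker phrase inside the substituted HTML comment; adjust as needed.
MARKER = 'long comment'
# Match a single HTML comment that contains the marker phrase.
LONG_COMMENT = re.compile(r'\n?<!--(?:(?!-->).)*?%s(?:(?!-->).)*-->' % re.escape(MARKER),
                          re.DOTALL | re.IGNORECASE)

for article in cat.articles(namespaces=0):
    new_text = LONG_COMMENT.sub('', article.text)
    if new_text != article.text:
        article.text = new_text
        article.save(summary='Task 5: removing long comment no longer needed to keep this page off Special:ShortPages')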

Discussion

  Approved for trial (50 edits). please post link to trial edits and summary of your trial here when complete. — xaosflux Talk 16:21, 26 November 2018 (UTC)
  Trial complete. 50 edits, no issues. Forgot to ID this as "Task 5" in the edit summary, will do that before the next run. – wbm1058 (talk) 21:43, 26 November 2018 (UTC)
@Wbm1058: I don't normally maintain ShortPages - can you explain a bit more why this task should be done, what is it cleaning up after - and why that cleanup is no longer needed? Opening a random short page (Lobed cactus coral) - it looks like a fairly normal page, maybe missing a template? — xaosflux Talk 18:16, 29 November 2018 (UTC)
@Xaosflux: Right. It's not particularly intuitive, is it? It took me a while to figure it out, too. This is my understanding. Some editors, at some time (I don't know who, or whether anyone else still does – I just started doing it myself recently) monitor the list at Special:ShortPages. Look at the header on Wikipedia talk:Special:ShortPages for a link to the explanation for "Long comment to avoid being listed on short pages": There is a list of short pages that people monitor regularly. Often one can find vandalism - removing text and replacing with "skahfsakfl" or such - that makes the list, or other junk that makes it past the recent changes patrol. So, when I (and others) go through the list and something is "ok" like a short dab page, we add a long comment so that it doesn't get listed at short pages, so we don't do the same work over and over again. Just making another tool to remove vandalism more efficient for those who use it. Carlossuarez46 17:05, 11 October 2007 (UTC)
Looking at the top of the short pages list, I typically see pages that have been tagged for speedy deletion. Of course these will also be marked in Category:Candidates for speedy deletion. I rarely see anything that looks like vandalism ("skahfsakfl" or such) – maybe ClueBot or some other vandalism patroller takes care of these promptly. So the usefulness of this work queue seems marginal; maybe it was more useful ten years ago. In any event, the first four pages on the list are marked for speedy deletion. The fifth item on the list, Rupf, is 161 bytes long, and is the first legitimate "long page" on the list. There are a bunch of short set-indexes near the top, like Lobed cactus coral – that's 186 bytes long. So if an editor creates a new set index or disambiguation page that's less than 160 bytes long, then {{subst:long comment}} may be written to that page to make it long enough to get out of the "below 160 bytes range", i.e. whitelist it. However, as more line items are added to set indices and dabs, they naturally become long enough to no longer need to be whitelisted. Editors set up the categories like Category:Monitored short pages length 801 to 900 to point these out for manual cleanup, I suppose with the highest priority for cleanup given to the longest "short" pages. The instructions on the category pages give the caveat to check that the article content is acceptable; a formerly very short article that has grown to more than 900 characters can indicate a mass introduction of text that may not belong, in which case the vandalism or bad edit should be reverted rather than removing the long comment. No, my bot doesn't check for this sort of bad edit; it just removes the long comment. I've also seen where editors add the long comment to pages longer than 160 bytes, because they think it's a short page, when it really isn't by the current definition.
Last month I cleared Category:Long monitored short pages and Category:Monitored short pages length 901 to 1000 using JWB, and didn't find any "gotchas" that would indicate these cats couldn't be cleared by a bot. My October edit history showing the JWB edits
I plan on consolidating these four maintenance categories into the single category Category:Long monitored short pages after the other three are cleared; with processing being done by a bot, there is no need to prioritize them into various length ranges for humans. – wbm1058 (talk) 20:40, 29 November 2018 (UTC)
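A minimal sketch of the removal step described above, assuming the work is done with pywikibot and that the text substituted by {{subst:long comment}} consists of the {{Short pages monitor}} template followed by the long HTML comment (the exact marker pattern and the 160-byte check below are assumptions for illustration, not the bot's actual code):

import re
import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site("en", "wikipedia")
cat = pywikibot.Category(site, "Category:Long monitored short pages")

# Assumed pattern for the substituted long comment: the
# {{Short pages monitor}} template plus the long HTML comment.
MARKER = re.compile(r"\{\{Short pages monitor\}\}\s*<!--.*?-->\s*", re.DOTALL)

for page in pagegenerators.CategorizedPageGenerator(cat):
    new_text = MARKER.sub("", page.text)
    # Only remove the marker when the page is past the 160-byte
    # Special:ShortPages threshold without it.
    if new_text != page.text and len(new_text.encode("utf-8")) > 160:
        page.text = new_text
        page.save(summary="Remove long comment: page is no longer short")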
@Wbm1058: so no one is ever actually doing the "check that the article content is acceptable" - and the only reason you are making this edit is to remove the template once the pages are 'longer'? I don't see anyone discussing this; can you at least start a section at Template talk:Short pages monitor or somewhere more appropriate to see if anyone cares? — xaosflux Talk 22:05, 29 November 2018 (UTC)
@Xaosflux: Thanks for pointing me to that template talk page, which I hadn't looked at before. I've added a link to the relevant discussion at Template talk:Short pages monitor#Need to define and possibly rethink this template, which I think is what you're looking for. I think that discussion pretty much comes to the same conclusions I did. From that discussion I found that Dcirovic appears to have run an unauthorized bot that mindlessly added short-page monitoring to 4,950 pages over a span of 6 hours, 40 minutes – that was 4950 edits ÷ 400 minutes = 12.375 edits/min. But you should pardon Dcirovic as the statute of limitations has expired on that, even though they never responded after being called out for it in the talk section I just linked you to. I also added a new subsection Very short new article: edit filter 98 which notes there's an edit filter performing the equivalent function to Special:ShortPages. While I won't go so far as MZMcBride and Anomalocaris in saying that "long comments" should be entirely eliminated, as some may prefer using the special page over the edit filter, or may not know how to use edit filters, I agree that the instructions need clarification, and I think a bot to manage this so as to keep it under control is essential. Hoping that after reading this, you can go ahead and approve this bot task. I'd rather not try calling in some editors who don't understand the purpose of "long comments", and either make them waste time reading to understand the issue before they vote, or have them take a minute and say "seems useful, make someone clear this manually" without taking enough time to understand it. I've already done my time clearing another category that was populated by a relatively "dumb" bot, which took me months to clear, without much of any help from the voters who insisted that the work should be done. I agree with the statement that Short Pages Monitor pages are no more likely to need cleanup than any other page and thus there's no rationale for making someone manually clear these categories. – wbm1058 (talk) 01:26, 1 December 2018 (UTC)
I left a notice of this BRFA, soliciting comments at both Template talk:Long comment (where you will see more comments from editors confused about its purpose) and Template talk:Short pages monitor. wbm1058 (talk) 03:16, 1 December 2018 (UTC)

PkbwcgsBot

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 13:55, Saturday, October 27, 2018 (UTC)

Function overview: The bot will make fixes to some WP:WCW errors using WPCleaner.

Automatic, Supervised, or Manual: Automatic, but I can do supervised on request (for some errors)

Programming language(s): WPCleaner

Source code available: WPCleaner bot tools

Links to relevant discussions (where appropriate):

Edit period(s): Each error, three times a week as clarified below

Estimated number of pages affected: Around 900 pages per one-hour editing session at 15 epm; three one-hour sessions a week will make this approximately 2,700 pages fixed. Approximately six to seven minutes will be spent fixing each error in an editing session at 15 epm.

Namespace(s): Mainspace/Articles

Exclusion compliant (Yes/No): Yes

Function details: I am going to run WPCleaner using the bot account and it will make WP:WCW fixes for errors: 1 (Template contains useless word Template:), 2 (Tag with incorrect syntax), 6 (DEFAULTSORT with special characters), 9 (Multiple categories on one line), 16 (Unicode control characters), 17 (Category duplication), 20 (Symbol for dead), 37 (DEFAULTSORT missing for titles with special letters), 54 (Break in list), 64 (Link equal to linktext), 85 (Tags without content), 88 (DEFAULTSORT with a blank at first position), 90 (Internal link after external link), 91 (Interwiki link written as external link or used as a reference) and 524 (Duplicate arguments in template calls). The bot will not do 45 (Interwiki duplication) because automatic fixing for interwiki duplication is causing errors. Most of the errors are being done by other bots. The bot is going to use the bot tools provided by WPCleaner. Each error will be run three times a week, on Monday, Thursday and Sunday, with a one-hour editing session on each of the three days and the aim of fixing approximately 900 pages per session at 15 epm. There are nine errors, so the bot will stick to a maximum of 100 fixes per error (900/9 = 100) in a single editing session. Because over 6,000 pages are reported for error 90, more time will eventually be spent on it (that will come in a future BRFA), but for the time being I will stick to a maximum of 100 fixes per editing session for this error. With a maximum of 100 fixes per error in each one-hour editing session, that makes 300 fixes per error per week using this bot.
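For illustration only (this is not WPCleaner's internal logic), a fix such as error 16 essentially amounts to stripping invisible Unicode control and format characters from the wikitext. A rough sketch, using an assumed, non-exhaustive character set:

import re

# Assumed, non-exhaustive set of invisible characters for a
# "Unicode control characters" cleanup: zero-width space, directional
# marks and embeddings, soft hyphen, and byte-order mark.
CONTROL_CHARS = re.compile("[\u200b\u200e\u200f\u202a-\u202e\u00ad\ufeff]")

def strip_control_chars(wikitext):
    # Remove only the invisible characters; visible text is untouched.
    return CONTROL_CHARS.sub("", wikitext)

print(strip_control_chars("Exam\u200bple"))  # -> "Example"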

Discussion

  • You will need to register this account; also, please make a userpage for your bot. You may want to redirect its talk page to yours. — xaosflux Talk 14:55, 27 October 2018 (UTC)
  • List of "errors": 1,2,6,9,16,17,20,37,54,64,85,88,90,91,524. — xaosflux Talk 14:55, 27 October 2018 (UTC)
    A large portion of these are marked as "cosmetic only" - making only cosmetic updates with a bot is generally not supported; can you talk a bit about your strategy here? — xaosflux Talk 14:58, 27 October 2018 (UTC)
    @Xaosflux: Some of the errors listed are not cosmetic. Yes, I agree that fixing error 64 is cosmetic as it has no visible change. However, errors 90 and 91 are not cosmetic because they change an internal/interwiki link written as an external link into a proper internal link. Errors 524, 2 and 16 are also not cosmetic. I will strike error 64 as I know that is definitely cosmetic. Pkbwcgs (talk) 15:07, 27 October 2018 (UTC)
    Bot account has now been created. Pkbwcgs (talk) 15:12, 27 October 2018 (UTC)
    @Xaosflux: I have struck off some more errors that I felt would be cosmetic. Of the errors I listed, 2 (Tag with incorrect syntax), 90 (Internal link after external link) and 91 (Interwiki link written as external link or used as a reference) are either high priority or middle priority. Error 524 is also important, as is error 16, which strips out Unicode control characters and reduces the size of the page in bytes. I can do error 20 manually on my account, as there is rarely a backlog for error 20. I don't know what you think about the other errors. Pkbwcgs (talk) 18:06, 27 October 2018 (UTC)
    Wikipedia:WikiProject_Check_Wikipedia/List_of_errors doesn't have error numbers above 113, please point to a current documentation of error numbers you are dealing with. — xaosflux Talk 18:12, 27 October 2018 (UTC)
    The errors at Wikipedia:WikiProject Check Wikipedia/Translation has error 524 and WPCleaner is also configured to fix error 524. Pkbwcgs (talk) 18:15, 27 October 2018 (UTC)
  • You have requested an edit rate of 50 epm; will you be configuring MAXLAG? — xaosflux Talk 18:11, 27 October 2018 (UTC)
    @Xaosflux: Yes, as that is a requirement when editing a high number of pages per minute. However, how do I enable MAXLAG? Pkbwcgs (talk) 18:13, 27 October 2018 (UTC)
    mw:Manual:Maxlag parameter; however, if you don't know how to do this, you will need to just throttle down to a slower level like 10 epm (a rough sketch of the maxlag mechanism follows this thread). — xaosflux Talk 18:17, 27 October 2018 (UTC)
    I can come down to 10epm to 20epm and make the editing time longer. I will amend this in "Estimated number of pages affected:". Also, based on what I can see at WP:WCW and the number of pages that need fixing, I plan to run this bot two to three times a week. One hour for each editing session; that way, I can make it 10epm to 20epm and have things fixed quickly. Pkbwcgs (talk) 18:22, 27 October 2018 (UTC)
    @Xaosflux: I have amended the data above. I plan on doing 15epm without going over. Pkbwcgs (talk) 18:25, 27 October 2018 (UTC)
    15epm is OK. — xaosflux Talk 18:30, 27 October 2018 (UTC)
    Okay. The bot will stick to 15epm. Pkbwcgs (talk) 18:31, 27 October 2018 (UTC)
    @Xaosflux: Is there any update on this BRFA yet? Pkbwcgs (talk) 13:26, 18 November 2018 (UTC)
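As a rough sketch of the maxlag mechanism discussed above (AWB and WPCleaner handle this through their own settings; this is only an illustration of the underlying API behaviour): the bot passes maxlag with each request, and when replication lag exceeds that value the API returns a "maxlag" error together with a Retry-After header, so the bot waits and retries instead of piling on load.

import time
import requests

API = "https://en.wikipedia.org/w/api.php"

def api_get(params, maxlag=5, max_retries=3):
    # Send maxlag so the servers can tell the bot to back off when lagged.
    params = dict(params, maxlag=maxlag, format="json")
    for _ in range(max_retries):
        response = requests.get(API, params=params)
        data = response.json()
        if data.get("error", {}).get("code") == "maxlag":
            # Server is lagged: wait as instructed, then retry.
            time.sleep(int(response.headers.get("Retry-After", "5")))
            continue
        return data
    raise RuntimeError("Gave up after repeated maxlag errors")

# Example usage: a harmless metadata query that respects maxlag.
print(api_get({"action": "query", "meta": "siteinfo"}))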
  • Comment: Fixing duplicate arguments in template calls requires some tricky logic, and (IIRC) Sporkbot is already handling this fix for bot-fixable instances. Do you propose an improvement on what Sporkbot is doing? – Jonesey95 (talk) 20:18, 3 November 2018 (UTC)
    • @Jonesey95: It will handle ones which can be fixed automatically by the bot. Pkbwcgs (talk) 20:44, 3 November 2018 (UTC)
      • More detail is needed. "Handle" does not describe what the bot will do. Please see the discussion at Wikipedia:Bots/Requests for approval/SporkBot 5. – Jonesey95 (talk) 21:23, 3 November 2018 (UTC)
        • @Jonesey95: The ones that can be fixed by WPCleaner. For example, if the same blank parameter appears twice in an infobox, it will eliminate one of them. For instance, in this diff (on my account), there were two duplicate arguments in the infobox (the parameter |membership = appeared twice in this instance), so it eliminated one. Another instance of WPCleaner fixing duplicate arguments is this diff, where there are two duplicate parameters that both have the same value (| name = Augusto Heleno Ribeiro Pereira), so WPCleaner eliminated one of them, as that diff shows (a rough sketch of these safe cases follows this thread). It also fixed link equal to linktext, but when I am running the bot, I will not allow that to happen. I hope that helps. Thanks. Pkbwcgs (talk) 09:14, 4 November 2018 (UTC)
          • Thanks. I believe that the key to bot eliminations of duplicate parameters is that the edit must not have an effect on the rendered page, except for the elimination of the hidden duplicate parameters category. As long as the bot adheres to this condition, it should be fine. – Jonesey95 (talk) 09:57, 4 November 2018 (UTC)
            • {{BAGAssistanceNeeded}} Is this bot ready for trial? Pkbwcgs (talk) 18:47, 11 November 2018 (UTC)
              • I've been a bit busy, but anyone from BAG can move this along, I've added a tag to attract attention. — xaosflux Talk 15:21, 18 November 2018 (UTC)
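A rough sketch of the "safe" duplicate-parameter cases described in the thread above (identical values, or a redundant blank copy), written with mwparserfromhell rather than WPCleaner's actual algorithm. It leans on the fact that MediaWiki only uses the last occurrence of a duplicated parameter, so dropping an earlier copy that is blank or identical to the last one cannot change the rendered page:

import mwparserfromhell

def drop_safe_duplicate_params(wikitext):
    # Remove duplicate template parameters only when removal cannot
    # affect rendering: the earlier copy is blank, or identical to the
    # last copy (the one MediaWiki actually uses).
    code = mwparserfromhell.parse(wikitext)
    for template in code.filter_templates():
        by_name = {}
        for param in template.params:
            by_name.setdefault(str(param.name).strip(), []).append(param)
        for params in by_name.values():
            last_value = str(params[-1].value).strip()
            for param in params[:-1]:
                value = str(param.value).strip()
                if value == "" or value == last_value:
                    template.remove(param, keep_field=False)
                # Conflicting non-blank values are left for a human to review.
    return str(code)

# Hypothetical example: the blank duplicate |membership= is dropped.
print(drop_safe_duplicate_params("{{Infobox organization|membership=|membership=42}}"))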
  •   Approved for trial. I'd like to see a short trial of exactly what the bot does please. Could you please make 3 edits to each error you plan to fix, and link them below? Please link to this BRFA in the edit summary. SQLQuery me! 09:46, 20 November 2018 (UTC)
    • Error 2: 1, 2, 3
    • Error 6:
    • Error 16: 1, 2, 3
    • Error 17: 1, 2, 3
    • Error 37:
    • Error 85: 1, 2, 3
    • Error 88: 1, 2, 3
    • Error 90: 1, 2, 3
    • Error 91: 1, 2, 3
    • Error 524: 1, 2, 3
  • @SQL:   Trial complete. Here is the summary: it went fine, just as expected. However, I had to do error 91 manually, as there was no automatic fixing for it. I didn't do error 6 or error 37, because there was no automatic fixing for those two and I didn't feel they were necessary, so we are down to eight errors. I didn't use the bot tools for this, as my trial was approved for only a few edits per error, so I felt I wouldn't use the bot tools for now. Also, I think it is not possible to stop WPCleaner from applying fixes for other algorithms. So, for example, if I was fixing error 2 and it found a page that also contains error 64 (which I haven't put in this BRFA), it will automatically fix error 64 on that page as well. However, I will specifically not use the bot tools to fix error 64 directly. Apart from that, it all went fine. I feel that another, extended trial may be required with the use of the bot tools for each error. Pkbwcgs (talk) 21:32, 20 November 2018 (UTC)
    •   A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}. What is the next step? Trial has been completed and functions have been updated as per the result of the trial. Pkbwcgs (talk) 16:56, 22 November 2018 (UTC)
      • I have waited a week for a response and it has been over a month since I first opened this BRFA. Pkbwcgs (talk) 21:18, 29 November 2018 (UTC)
        • @Xaosflux: Can you please give some input on this. This has been waiting for a while. Pkbwcgs (talk) 19:55, 14 December 2018 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, because information required of the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily reflect on the merits of the request. For example, a bot that was approved for a trial but never tested, or one for which the results of testing were not posted, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.