
Wikipedia:Bots/Requests for approval



If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming it may be a good idea to ask someone else to run a bot for you, rather than running your own.



Current requests for approval

Muhbot

Operator: Muhandes (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:30, Thursday, February 14, 2019 (UTC)

Automatic, Supervised, or Manual: supervised

Programming language(s): AWB

Source code available: AWB

Function overview: Replace obsolete parameters used in {{Single chart}}. Pages located at Category:Singlechart usages for Germany2


Links to relevant discussions (where appropriate): Template talk:Single chart#Cleanup effort, Template talk:Single chart#Parameter name

Edit period(s): Twice

Estimated number of pages affected: About 6,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: The bot will first replace id with songid to avoid future confusion. This is achieved by a simple regexp search/replace of (Germany2.*)\|id= to $1|songid=. The term is sufficiently obscure, and I made a quick run of 150 manual edits with no problems.

For the first step it will use AWB parameter replacement to replace |id= with |songid=, but only when Germany2 is within the template (a subrule).

Once Category:Singlechart usages for Germany is cleaned up, the bot will be used to replace Germany2 with Germany and make that the default mode. If required, this can be done in a separate task request.
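
For concreteness, the same replacement expressed outside AWB as a minimal Python sketch (the helper name is mine, not part of the bot):

    import re

    def rename_singlechart_id(wikitext):
        # Rename |id= to |songid= in {{Single chart}} calls that use Germany2,
        # using the same pattern as in the request; assumes the Germany2
        # argument and the id parameter sit on the same line of the call.
        return re.sub(r"(Germany2.*)\|id=", r"\1|songid=", wikitext)

    print(rename_singlechart_id("{{Single chart|Germany2|5|id=123456|artist=Example}}"))
    # -> {{Single chart|Germany2|5|songid=123456|artist=Example}}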

Discussion

  • Would it not make more sense to just use AWB's parameter replacement tool to replace |id= with |songid= and |Germany2= with |Germany=? Also, as a minor note, this task falls under the purview of User:PrimeBOT/30 if you'd rather go that way than manually supervise 6k edits. Primefac (talk) 16:39, 14 February 2019 (UTC)
Thanks, it would make more sense to use parameter replacement. I changed the function details. For the second step this is an unnamed parameter, so a search and replace within template code will do. --Muhandes (talk) 17:14, 14 February 2019 (UTC)

Dreamy Jazz Bot 3

Operator: Dreamy Jazz (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:59, Monday, February 4, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: On request

Function overview: Extending Task 2 to create portal talk pages with WikiProject tags from the associated root article.

Links to relevant discussions (where appropriate): User_talk:Dreamy_Jazz#Suggestion_for_User:Dreamy_Jazz_Bot

Edit period(s): Daily on new portals, monthly on all portals (as Task 2 already does)

Estimated number of pages affected: At most an extra 20 pages created per day; the first run with the extension would create around 600 pages

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The Task 2 code would be extended to create root portal talk pages (it will not create subpage talk pages). The bot would take the WikiProject tags from the root article only, creating the talk page with these templates (with class=portal) and adding {{WikiProject Portals}} and {{Portal talk}} after these WikiProject tags. The bot won't affect existing portal talk pages. If the bot sees that there are more than 3 WikiProject banners, it will encase them in the {{WPBS}} template. The root article would be detected from the portal using the same code as already used for the currently approved Task 2.

Edits would be marked as Task 2.1, as the creation of portal talk pages is different enough from the original purpose of Task 2. For the purposes of disabling, Task 2.1 would be disabled by the Task 2 shutdown page (the shutdown page will shut down both tasks, so that errors serious enough to require a shutdown don't affect the other task's edits).

For example the bot would create Portal talk:The Incredibles with:
{{WPBS|{{WikiProject Film|class=portal}}
{{WikiProject Animation|class=portal}}
{{WikiProject Disney|class=portal}}
{{WikiProject Comics|class=portal}}
{{WikiProject Portals}}}}
{{Portal talk}}
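
A rough Python sketch of how that talk page text could be assembled (the function and regex below are illustrative, not the bot's actual code):

    import re

    BANNER_RE = re.compile(r"\{\{\s*(WikiProject [^|}]+)")

    def build_portal_talk_text(root_article_talk_wikitext):
        # Copy the WikiProject banners found on the root article's talk page,
        # force class=portal, append {{WikiProject Portals}}, wrap everything
        # in {{WPBS}} when there are more than 3 banners, and end with
        # {{Portal talk}}. (Whether {{WikiProject Portals}} counts toward the
        # 3-banner cutoff, and skipping an existing {{WPBS}} wrapper, are
        # details of the real bot, not shown here.)
        banners = ["{{%s|class=portal}}" % name.strip()
                   for name in BANNER_RE.findall(root_article_talk_wikitext)]
        banners.append("{{WikiProject Portals}}")
        body = "\n".join(banners)
        if len(banners) > 3:
            body = "{{WPBS|%s}}" % body
        return body + "\n{{Portal talk}}"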

Discussion

  A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{tl|BAG assistance needed}}. Dreamy Jazz 🎷 talk to me | my contributions 12:46, 12 February 2019 (UTC)

MusikBot II 3

Operator: MusikAnimal (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:02, Saturday, February 2, 2019 (UTC)

Function overview: Automatically protect high-risk templates and modules

Automatic, Supervised, or Manual: Automatic

Programming language(s): Ruby

Source code available: GitHub

Links to relevant discussions (where appropriate): Special:Permalink/881367182#Bot proposal: automatically protect high-risk templates and modules

Edit period(s): Daily

Estimated number of pages affected: ~500 on the first run. Variable for future runs, perhaps 0 to 5 pages daily.

Namespace(s): Template, Module, Wikipedia, User

Exclusion compliant (Yes/No): Yes, going by the exclusions hash in the bot configuration.

Adminbot (Yes/No): Yes

Function details: Every day, a query is run to identify templates and modules that have at least N transclusions, and the bot will protect them accordingly based on the bot configuration. Here is an explanation of each option and the initial values (per the WP:AN discussion):

  • The thresholds option specifies what protection level should be applied for what transclusion count. For now this will be set to 500 transclusions for semi-protection (autoconfirmed), and 5000 for template protection. extendedconfirmed and sysop are available as options but for now will be left null (unused).
  • The exclusions (and regex_exclusions for regular expressions) option is a list of pages that the bot will ignore entirely. The keys are the full page titles (including namespace), and the values are an optional space to leave a comment summarizing why the page was excluded.
  • The ignore_offset option specifies the number of days the bot should wait after a previous protection change (by another admin) before taking further action. The initial value for now will be 7 days.
  • The namespaces option specifies which namespaces to process. For now this includes Template, Module, Wikipedia, and User.

For now, the bot will not lower the protection level to conform to the settings. The bot will also ignore any page specified at MediaWiki:Titleblacklist which includes the noedit flag. This is easy thanks to the action=titleblacklist API.
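
To make the thresholds option concrete, a minimal Python sketch of the decision it drives (the bot itself is Ruby; the names below are only illustrative, following the pr_level values used in the queries further down):

    # Illustrative sketch only, not the bot's code.
    THRESHOLDS = {
        "autoconfirmed": 500,        # semi-protection
        "templateeditor": 5000,      # template protection
        "extendedconfirmed": None,   # unused for now
        "sysop": None,               # unused for now
    }

    def protection_level_for(transclusion_count):
        # Return the strictest level whose threshold is met, or None to leave the page alone.
        best = None
        for level, minimum in THRESHOLDS.items():
            if minimum is not None and transclusion_count >= minimum:
                if best is None or minimum > THRESHOLDS[best]:
                    best = level
        return best

    assert protection_level_for(480) is None
    assert protection_level_for(1200) == "autoconfirmed"
    assert protection_level_for(80000) == "templateeditor"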

Discussion

  • If you're running a query like SELECT tl_namespace, tl_title, COUNT(*) AS ct FROM templatelinks GROUP BY tl_namespace, tl_title HAVING COUNT(*) >= 500 to find the templates to protect, I note that'll be a pretty expensive query and I wonder whether it can be run less often than daily. Anomie 03:26, 2 February 2019 (UTC)
    Anomie, I imagine (as you have suggested to me in the past) that this could be batched, like:
    SELECT tl_namespace, 
           tl_title, 
           Count(*) AS ct 
    FROM   templatelinks 
    WHERE  tl_from BETWEEN 1 AND 32500 
    GROUP  BY tl_namespace, 
              tl_title 
    HAVING Count(*) >= 500;
    
    The above takes under 2s to complete, and the size of the batches could in theory be adjusted on the fly. I know there isn't a tl_id field, which would be ideal, but this would make the overall query much less expensive. SQLQuery me! 05:37, 2 February 2019 (UTC)
    I've been doing something similar to:
    SELECT page_title
    FROM page
    JOIN templatelinks ON page_title = tl_title
    	AND page_namespace = tl_namespace
    LEFT JOIN page_restrictions ON pr_page = page_id
    	AND pr_level IN (...)
    	AND pr_type = 'edit'
    WHERE tl_namespace = 10
    	AND pr_page IS NULL
    GROUP BY page_namespace, page_title
    HAVING COUNT(*) >= 500;
    
    Which usually takes around 30 seconds, and only a few seconds for the Module namespace. If you've any suggestions to improve it, please enlighten :) I don't think it's crazy long for a task of this nature. Do I need to check other namespaces, too?
    Another thing I should bring up: There are subpages of Template:POTD for each individual day, and the current day always has a lot of transclusions. The following day it's removed from whichever template and the count goes back down again. Should I exclude these templates? Or add some special code to protect/unprotect accordingly, every day? MusikAnimal talk 07:43, 2 February 2019 (UTC)
    POTD only has ~600 transclusions on low-visibility and low-vandalism-target userpages, so I think those templates should be excluded. Galobtter (pingó mió) 07:58, 2 February 2019 (UTC)
    Per Wikipedia:Administrator's noticeboard/Incidents#The Signpost vandalized, could the bot run the query on the Wikipedia space/other namespaces? (perhaps only weekly if the query is too slow?)
    And I note that Wikipedia:Database reports/Unprotected templates with many transclusions appears to run a very similar query. Galobtter (pingó mió) 10:00, 2 February 2019 (UTC)
    @Galobtter: Sure, we can include the Wikipedia namespace. It after all seems like the only other namespace that would contain highly-transcluded pages. MusikAnimal talk 10:04, 2 February 2019 (UTC)
    I can see some userboxes and user templates having high transclusion counts - e.g. there's User:Resoru/UBX/VG/ with 1700 transclusions etc. Galobtter (pingó mió) 10:17, 2 February 2019 (UTC)
    Userboxes... of course. Sure, we can check the userspace too. MusikAnimal talk 19:10, 2 February 2019 (UTC)
    @MusikAnimal: Wow, that's surprisingly fast. Looks like it touches about 1e7 rows, finding the possible titles first then diving into the templatelinks tl_namespace index for each one. ... The ~4% of templates that are already protected account for ~98% of all template transclusions, so the query only has to look through the remaining ~2% of transclusions. I withdraw my concern, but you might have it give you some sort of warning if that query starts taking significantly more time. Anomie 13:42, 2 February 2019 (UTC)

Now that you've posted the source code, I've given it a quick review. Note I don't actually know <s>python</s> Ruby, so I mainly looked at the general logic.

  • L70-L81: The query you have here seems significantly slower than the one you posted earlier. Among other things, there should be no need for "DISTINCT(page_title)" nor for ordering the results.
          SELECT page_title AS title, COUNT(*) AS count
          FROM page
          JOIN templatelinks ON page_title = tl_title
            AND page_namespace = tl_namespace
          LEFT JOIN page_restrictions ON pr_page = page_id
            AND pr_level IN ('autoconfirmed', 'templateeditor', 'extendedconfirmed', 'sysop')
            AND pr_type = 'edit'
          WHERE page_namespace = #{ns}
            AND pr_page IS NULL
          GROUP BY page_title
          HAVING COUNT(*) >= #{threshold}
    
  • L77, L80, L93: I don't see anything obvious that prevents SQL injection if #{ns}, #{threshold}, or #{@mb.config[:ignore_offset]} are set to unexpected values. Yes, MediaWiki's restriction of editing .json pages helps, but it doesn't hurt to double check it. Simply casting them to integers before interpolating would be good.
  • L88-L93: Seems like you could add LIMIT 1 to the query to avoid fetching extra rows when all you care about is whether any rows exist.
  • L99: Does the tbnooverride parameter to action=titleblacklist not work here?

HTH. Anomie 14:06, 4 February 2019 (UTC)
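
The parameterization point sketched in Python for concreteness (the bot itself is Ruby; pymysql is just an assumed stand-in driver, and the query follows the one suggested above):

    import pymysql  # assumed DB-API driver; any DB-API module works the same way

    def unprotected_highly_transcluded(conn, ns, threshold):
        # Bind ns and threshold as query parameters instead of interpolating
        # them, so a bad configuration value raises an error rather than
        # changing the SQL.
        sql = """
            SELECT page_title, COUNT(*) AS count
            FROM page
            JOIN templatelinks ON page_title = tl_title AND page_namespace = tl_namespace
            LEFT JOIN page_restrictions ON pr_page = page_id
                AND pr_level IN ('autoconfirmed', 'templateeditor', 'extendedconfirmed', 'sysop')
                AND pr_type = 'edit'
            WHERE page_namespace = %s AND pr_page IS NULL
            GROUP BY page_title
            HAVING COUNT(*) >= %s
        """
        with conn.cursor() as cur:
            cur.execute(sql, (int(ns), int(threshold)))
            return cur.fetchall()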

Ha, it is clear that you don't actually know python, because the code is in Ruby :) Galobtter (pingó mió) 14:30, 4 February 2019 (UTC)
And I don't know Ruby either! ;) Anomie 13:01, 5 February 2019 (UTC)
@Anomie: Thanks for the code review! I have made some changes based on your feedback. I am using prepared statements now, but am not doing any type casting. I think it's better for it to fail entirely in this case (and be logged to User:MusikBot II/TemplateProtector/Error log). You were right that the main query is a little slower, apparently due to the COUNT in the SELECT clause? It still maxes out at around 1 to 2 minutes of run time, which I don't think is terrible. The whole task takes about 5 minutes to complete. "Does the tbnooverride parameter to action=titleblacklist not work here?" -- it does not appear to. I always get "ok" when logged in as the bot. Regards, MusikAnimal talk 20:34, 4 February 2019 (UTC)
The selecting of COUNT(*) isn't the problem, the problems were the ORDER BY (which you fixed) and GROUPing BY tl_title instead of page_title (which you didn't fix yet). Sometimes MySQL can figure out things are equivalent based on join or where clauses and sometimes it can't, and this seems to be one where it can't.
Switching to a parameterized query for self.recently_protected? should be sufficient, as it should result in an SQL error being thrown on bad input rather than an SQL injection.
What's the exact query you're trying with tbnooverride? It works when I try something like this, both with this account and with AnomieBOT's account. Anomie 13:15, 5 February 2019 (UTC)
@Anomie: An example would be for Template:Taxonomy/Doridina, e.g. [1]. I get "ok" while logged in and "blacklisted" while logged out. I guess it's just an issue for titles restricted to autoconfirmed? MusikAnimal talk 17:25, 5 February 2019 (UTC)
Yeah, it looks like there's no way to override the "autoconfirmed" restriction. Anomie 21:58, 5 February 2019 (UTC)
  • Another issue I've encountered: Sometimes there is a highly visible Wikipedia page that is managed by a bot, for instance Wikipedia:Good article nominations/Topic lists/Video games. If MusikBot were to template-protect it, the maintaining bot could no longer edit it. In my opinion, we should just add template editor rights to such bots. If the transclusion count is really that high, I don't think it's safe to leave it under mere semi-protection. Another option is to check the revision history and try to deduce if it is bot-maintained. That seems error-prone and would be rather expensive, so I'm going to advise against this strategy. Finally, we could just ignore the Wikipedia namespace altogether. I have not encountered a bot-maintained Template or Module, and I suspect such bots would be handed template editor rights anyway. Thoughts? MusikAnimal talk 01:09, 6 February 2019 (UTC)
    Ugh, there's also WikiProject to-do templates, e.g. Wikipedia:WikiProject Bangladesh/to do. Many include constructive edits from unconfirmed users. I can exclude these using the regex_exclusions option, since all seem to end in "to do", "ToDo", or "to-do", etc. But again... some have an awfully high transclusion count. What to do? MusikAnimal talk 01:18, 6 February 2019 (UTC)
    IANA BAG member or BOTP expert, but if you were to generate a one-time list of guesses for such bots, I'd be more than comfortable granting template-editor to such bots, assuming their operators have the equivalent. As you say, high-risk transclusions should be protected, and bots should be made to work within that system, not the other way around. (As an aside, this would be/have been a good argument for allowing ECP in this task.) ~ Amory (utc) 01:23, 6 February 2019 (UTC)
    @Amorymeltzer: There's Legobot for Wikipedia:Good article nominations/Topic lists/Video games and WugBot for Wikipedia:Good article nominations/backlog/items. Those are the only two I've encountered thus far. MusikAnimal talk 01:34, 6 February 2019 (UTC)
    Sweet! WP:GAN/backlog/items has only ~700, so safely far from the TE level, and I think we can trust Legoktm   ~ Amory (utc) 01:40, 6 February 2019 (UTC)
    It might make more sense to just put such pages on your exclude list than to give random bots templateeditor. Anomie 01:57, 6 February 2019 (UTC)
    @Anomie: I've already semi'd both. But Wikipedia:Good article nominations/Topic lists/Video games has nearly 80,000 transclusions. That's a lot! At some point we have to draw the line... or use extended-confirmed protection? MusikAnimal talk 02:03, 6 February 2019 (UTC)
    Given the circumstances (Legobot isn't a template editor), I went ahead and broke the rules by adding ECP to Wikipedia:Good article nominations/Topic lists/Video games. The issue of what MusikBot should do in this scenario still stands. I guess we'll just handle it on a case by case basis. MusikAnimal talk 04:16, 6 February 2019 (UTC)
    I think the easiest thing would be to, after the initial run, wait about a week for the next run, so that people can point out these edge cases to be added to the exclusions. Another thing you'd want to do is prepopulate the exclusion list with templates that have been ECP-protected, because they will almost all be ones where people like Primefac have lowered the protection after (batch) template-protecting templates, and the bot shouldn't annoy people by template-protecting them again.
    You'd also definitely want to exclude WikiProject banner templates from template-protection - Primefac batch-protected all templates with ~2000+ transclusions nearly a year ago but reduced WikiProject templates to semi-protection, as they don't really need TPE. Galobtter (pingó mió) 06:31, 6 February 2019 (UTC)
    Would it be better to just flat-out exclude all WikiProject banner templates, since they're likely all semi'd by now? I assume new WikiProjects aren't created that often. I'd prefer to leave this special handling to humans. Otherwise we'd need to further complicate the configuration by allowing you to specify protection levels for each of the exclusions and regex_exclusions.
    Right now the bot only targets unprotected templates/modules, so we wouldn't be template-protecting anything that Primefac had lowered to ECP. Again if we want options like "exclude this page from template-protection, but do include it for semi-protection", it will complicate the configuration, which I'm hoping to avoid. MusikAnimal talk 18:14, 6 February 2019 (UTC)
  • My general thought is that counting transclusions isn't a very good metric for "highly visible template/module". I think page views are a significantly better metric. Legoktm (talk) 05:32, 6 February 2019 (UTC)
    Page views would be extremely slow to query, but I suppose the bot could set the threshold for template-protection as either 2000+ article-space transclusions - which are very disproportionately viewed - or 10000+ non-article-space transclusions, because vandalism or disruption on templates transcluded on talk pages is lower, and semi-protection would stop most of it. Though I think the blanket 5000 threshold works well enough, and I'm not sure it should be complicated. Galobtter (pingó mió) 06:31, 6 February 2019 (UTC)
    I thought about going by pageviews. It would be interesting to see the results, to say the least! Though I question how feasible it is to go through every Template/Module/User/Wikipedia page and get the pageviews of all the transclusions :( Depending on the circumstances, it could take days to finish and be error-prone. Pageviews anomalies happen a lot too: e.g. false traffic from an undeclared bot, or recent deaths/incidents that can overnight send a generally unpopular page to the top of the charts. Defining the conditions for pageviews and working out all the edge cases is going to be a nightmare, let alone how slow it would be and challenging to implement. I'd love to rope in some pageviews logic, but hopefully we can save that for version 2 :)
    But I do like Galobtter's compromise of going by the namespaces of the transclusions. That is a simple tweak to the query, and might even make the task as a whole faster (or slower... ;). It will complicate the configuration, though. I guess it would look something like:
    "thresholds": {
        "sysop": null,
        "template": {
            "mainspace": 2000,
            "non_mainspace": 10000
        },
        "extendedconfirmed": null,
        "autoconfirmed": {
            "mainspace": 500,
            "non_mainspace": 500
        }
    }
    
    a little ugly :/ I'm a bit hesitant to change the thresholds at this time. Shouldn't we go back to WP:AN for further input? I'd argue we should go with the current consensus, and see how people react after the first round of protections. I really like how simple the system is right now. MusikAnimal talk 17:54, 6 February 2019 (UTC)
  • Regarding The bot will also ignore any page specified at MediaWiki:Titleblacklist which includes the noedit flag., the bot should still template-protect taxonomy templates like Template:Taxonomy/Embryophytes, being used on 10000+ articles and regularly getting disruption from people not getting a consensus for their changes (and as autotaxobox gets more widely used over manual taxobox, the transclusions of these templates are rising pretty quickly). Galobtter (pingó mió) 06:31, 6 February 2019 (UTC)
    That seems reasonable. Maybe we should compare against the Titleblacklist protection level. So for taxonomy templates, if there are less than 5000 transclusions, we don't protect at all since it is already done by the Titleblacklist (as specified with autoconfirmed). If the template has >= 5000 transclusions, we template-protect as we would any template. That keeps it simple; basically checking the Titleblacklist is only done to avoid redundant protections. I think this is what Od Mishehu was going for when they commented on the WP:AN discussion. MusikAnimal talk 18:04, 6 February 2019 (UTC)
  • Regarding multiple protection types - how will you handle these pages? (e.g. if the page has different move protections and edit protections) — xaosflux Talk 15:44, 7 February 2019 (UTC)
    Only edit protection is applied, though we could do move as well if you think it makes sense to do so? Note also we're only looking for templates/modules that are completely unprotected (for editing, not moving). MusikAnimal talk 17:05, 8 February 2019 (UTC)
  • Redirect handling? How are you going to handle redirects? (e.g. {{CLEAR}} vs {{Clear}}) ? — xaosflux Talk 15:44, 7 February 2019 (UTC)
    Redirects are not followed. MusikAnimal talk 17:06, 8 February 2019 (UTC)
    When a redirect is transcluded, MediaWiki includes both the redirect and the target page in the templatelinks table. So if 700 pages transclude {{CLEAR}} and 900 transclude {{Clear}} directly (and no pages transclude any other redirect), the bot would see 700 for Template:CLEAR and 1600 for Template:Clear. And, I presume, it would protect each page accordingly? Anomie 21:12, 8 February 2019 (UTC)
    Yep! The bot goes by whatever the count is in templatelinks, regardless if the page is a redirect. That is the intended behaviour, I hope? MusikAnimal talk 21:22, 8 February 2019 (UTC)
    Sounds like good behavior to me. Anomie 12:31, 9 February 2019 (UTC)
  • Are you implementing downgrade prevention? Under what circumstances would you downgrade protection? — xaosflux Talk 15:50, 7 February 2019 (UTC)
    Nope. Protection levels are never lowered by the bot. Future iterations of the bot may do this, pending discussion. For now, I'd like to get a simple solution deployed and see how the community reacts. MusikAnimal talk 17:13, 8 February 2019 (UTC)

DannyS712 bot 4

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 02:39, Saturday, February 2, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Javascript

Source code available: User:DannyS712 test/bilateral.js

Function overview: Add short descriptions to pages for bilateral relations between countries

Links to relevant discussions (where appropriate): Wikipedia talk:WikiProject Short descriptions#Proposal: Standard format of short description for bilateral relations

Edit period(s): One time run

Estimated number of pages affected: <5961

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Go through Category:Bilateral relations by country and its subcategories, and for pages in the form of "Country-Country relations" add a short description if one is not already there. I calculated that there are ~6000 pages in these categories, not including duplicates, that may qualify, but I'm not sure. I posted about this at the short descriptions wikiproject, and received no comments (I also added a note about this at the international relations wikiproject page, with no comments there either).
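
A minimal Python sketch of the title parsing involved (the helper and the exact description wording are placeholders pending the outcome of the linked discussion):

    import re

    # Hypothetical sketch: derive a short description from a "Foo-Bar relations" title.
    TITLE_RE = re.compile(r"^(?P<a>.+?)[–-](?P<b>.+?) relations$")

    def short_description_for(title):
        m = TITLE_RE.match(title)
        if not m:
            return None   # not a bilateral-relations title; skip the page
        return "{{Short description|Bilateral relations between %s and %s}}" % (
            m.group("a").strip(), m.group("b").strip())

    print(short_description_for("Canada–France relations"))
    # -> {{Short description|Bilateral relations between Canada and France}}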

Discussion

There's no issue with this sitting open, but I think that discussion should have some actual participation (i.e. there is currently no consensus for this) before we even think about going to trial. Editing 6k pages based on a reasonable-but-undiscussed idea isn't really a good thing. Primefac (talk) 21:54, 5 February 2019 (UTC)

@Primefac: Should I leave messages on the talk pages of people involved in both wikiprojects? --DannyS712 (talk) 22:11, 5 February 2019 (UTC)
It looks like you've already pinged everyone, so I'm not sure of the best way to drum up a conversation. I do suppose if no one has anything to say it's a SILENT consensus, but again it's not like we want to have to undo 6k edits if someone does complain. Primefac (talk) 02:51, 10 February 2019 (UTC)
@Primefac: Well, I have my notifications set up to tell me if a ping goes through, or if it fails, and I wasn't alerted either way, so the pings might not have gone through. Can I suggest a limited trial, followed by me posting with a link to this discussion and to trial edits, to see if there is opposition? --DannyS712 (talk) 03:06, 10 February 2019 (UTC)

PkbwcgsBot 7

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:27, Saturday, December 15, 2018 (UTC)

Function overview: This is an extension to Wikipedia:Bots/Requests for approval/PkbwcgsBot 5 and I will clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links.

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate): This RfC

Edit period(s): ISBNs will be once a fortnight and PMIDs will be once a month.

Estimated number of pages affected: 300-500 pages per run (ISBNs) and 50-100 pages per run (PMIDs)

Namespace(s): Most namespaces (Mainspace, Article Talkspace, Filespace, Draftspace, Wikipedia namespace (most pages), Userspace and Portalspace)

Exclusion compliant (Yes/No): Yes

Function details: The bot will replace ISBN magic links with templates. For example, ISBN 978-94-6167-229-2 will be replaced with {{ISBN|978-94-6167-229-2}}. In task 5, it fixes incorrect ISBN syntax and replaces the magic link with the template after that. This task only replaces the ISBN magic link with the template using RegEx.
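
As a rough illustration of the replacement (the bot uses AWB; this simplified regex is mine, not the bot's actual rule, and is narrower than the one discussed in task 5):

    import re

    # Illustration only: convert a bare ISBN magic link into the {{ISBN}}
    # template. A real run would also need to skip ISBNs already inside
    # templates, citation titles, and deliberate magic links.
    ISBN_MAGIC = re.compile(r"\bISBN\s+((?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx])\b")

    def replace_isbn_magic_links(text):
        return ISBN_MAGIC.sub(lambda m: "{{ISBN|%s}}" % m.group(1), text)

    print(replace_isbn_magic_links("ISBN 978-94-6167-229-2"))
    # -> {{ISBN|978-94-6167-229-2}}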

Discussion

Working in article space only? – Jonesey95 (talk) 23:48, 15 December 2018 (UTC)

@Jonesey95: The problem is in multiple namespaces, not just the article namespace. Pkbwcgs (talk) 09:39, 16 December 2018 (UTC)
Since Magic links bot is already handling article space, it looks like this bot's focus will be in other spaces. I think those spaces will require manual oversight in order to avoid turning deliberate magic links into templates. Happily, there are only 4,000 pages, down from 500,000+ before the first couple of bots did their work. – Jonesey95 (talk) 12:10, 16 December 2018 (UTC)
I can distinguish deliberate magic links and not touch them. There are very few deliberate ones; an example is at Wikipedia:ISBN which shouldn't be changed. Pkbwcgs (talk) 13:06, 16 December 2018 (UTC)
"I can distinguish" -- how can you do this automatically? This is WP:CONTEXTBOT. —  HELLKNOWZ   ▎TALK 21:45, 16 December 2018 (UTC)

We don't generally approve bots for non-mainspace unless there is a specific problem. Especially without a discussion or consensus. In short, the problem with non-mainspace namespaces is that there is no expectation that article policies or guidelines should apply or are even necessary. Userspace is definitely not a place for bots to run without opt-in. You also cannot automatically work on talk pages with a task like this -- users can easily be discussing syntax and no bot should be changing their comments. The discussion may very well be archived. Same goes with Wikipedia and there are many guideline and help and project pages where such a change may not be desired. Draft, File and Portal seem fine. To sum up, we either need community consensus for running tasks in other namespaces or bot operator assurance (proof) that there are minimal to none incorrect/undesirable edits. —  HELLKNOWZ   ▎TALK 21:45, 16 December 2018 (UTC)

@Hellknowz: I have struck the namespaces which I feel will cause problems. I assure you that there won't be any incorrect edits. Pkbwcgs (talk) 21:51, 16 December 2018 (UTC)
Looks good then. Will wait for resolution at Wikipedia:Bots/Requests for approval/PkbwcgsBot 5. —  HELLKNOWZ   ▎TALK 21:57, 16 December 2018 (UTC)
I think the revised list of spaces (at this writing: Main, Draft, Portal, File) makes sense. – Jonesey95 (talk) 01:46, 17 December 2018 (UTC)
@Hellknowz: Task 5 has gone to trial and this request has been waiting for over a month. Pkbwcgs (talk) 22:08, 27 January 2019 (UTC)

Bots in a trial period

PkbwcgsBot 23

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:18, Monday, January 28, 2019 (UTC)

Function overview: The bot will fix pages with Template:RomanianSoccer with deprecated parameters. The pages with deprecated parameters are located at Category:RomanianSoccer template with deprecated parameters.

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): One-time run

Estimated number of pages affected: 739

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot will fix deprecated parameters in Template:RomanianSoccer. An example edit is located here. The bot is going to change the old_id parameter to id if id is not defined in the template. For example, * {{RomanianSoccer|old_id=a/achim_sebastian}} is wrong because old_id has no accompanying id parameter. This was changed in my later edit to * {{RomanianSoccer|id=a/achim_sebastian}}, which is correct. It is stated quite clearly in Template:RomanianSoccer's documentation: "The "old_id" parameter may contain an ID such as a/augustin_ionel, which is the ID portion of http://www.romaniansoccer.ro/players/a/augustin_ionel.shtml or http://www.statisticsfootball.com/players/a/augustin_ionel.shtml. This parameter is optional if the "id" parameter (or unnamed parameter "1") is used." Update: The "a/" before the name of the player will change to "97/" in the template, as stated at Template:RomanianSoccer#Examples, and the name will be reversed, so "achim_sebastian" will become "sebastian-achim"; this task will also update the links. However, some regex may be needed to make those changes.
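
A minimal Python sketch of the parameter fix described above (illustrative only; the bot itself uses AWB rules):

    import re

    def fix_romaniansoccer(wikitext):
        # Rename |old_id= to |id= in {{RomanianSoccer}} calls that have no |id=;
        # the URL-format update ("a/" -> "97/", reversed name) is left out, as
        # the request now sticks to the deprecated-parameter fix.
        def fix_one(match):
            call = match.group(0)
            if re.search(r"\|\s*id\s*=", call):
                return call                      # an |id= already exists; leave it
            return re.sub(r"\|\s*old_id\s*=", "|id=", call)
        return re.sub(r"\{\{\s*RomanianSoccer\b[^}]*\}\}", fix_one, wikitext)

    print(fix_romaniansoccer("* {{RomanianSoccer|old_id=a/achim_sebastian}}"))
    # -> * {{RomanianSoccer|id=a/achim_sebastian}}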

Discussion

I got the URL updates wrong so I will stick with fixing the deprecated parameters. Pkbwcgs (talk) 18:43, 28 January 2019 (UTC)

  •   Approved for trial (50 edits). Pkbwcgs, please link to this BRFA in the edit summaries. This was actually a task that I was thinking of taking on with DFB a couple of months ago but ultimately did not have the time, so I thank you for taking this on. --TheSandDoctor Talk 06:11, 29 January 2019 (UTC)

HostBot 9

Operator: Maximilianklein (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:50, Monday, January 7, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/notconfusing/hostbot-ai

Function overview: User:Jtmorgan and User:Maximilianklein have planned, and received consent to run, an A/B experiment between the current version of HostBot and a newly developed AI version. The AI version uses a machine-learning classifier based on ORES to prioritize which users should be invited to the Teahouse, whereas the current version uses rules. The point is to see if we can improve user retention by turning our attention to the most promising users.

The two versions would operate simultaneously. Both versions would log-in as "User:HostBot" so that the end-users would be blinded as to what process they were interacting with.

The A/B experiment would run for 75 days (calculated by statistical power analysis).



Links to relevant discussions (where appropriate): Wikipedia_talk:Teahouse#Experiment_test_using_AI_to_invite_users_to_Teahouse

Edit period(s): Hourly (AI-version) and Daily (rules-version)

Estimated number of pages affected: ~11,000

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: All technical details on meta:Research:ORES-powered_TeaHouse_Invites.

Discussion

Just posting here to confirm that I am excited to be collaborating with Maximilianklein on this experiment. I've been wanting to improve HostBot's sampling criteria for a while now, and other Teahouse hosts have asked for it. J-Mo 19:33, 7 January 2019 (UTC)

Thought I'd drop by to voice my support, both for the experiment and for Maximilianklein. During the earlier discussion, I posted a couple of questions on their talk page and got a timely and thoughtful reply. I'm also interested in learning about the outcomes of this experiment and look forward to them! Cheers, Nettrom (talk) 15:20, 15 January 2019 (UTC)

PkbwcgsBot 5

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 09:15, Thursday, December 13, 2018 (UTC)

Function overview: The bot will fix ISBN syntax per WP:WCW error 69 (ISBN with incorrect syntax) and PMID syntax per WP:WCW error 102 (PMID with incorrect syntax).

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Once a week

Estimated number of pages affected: 150 to 300 a week

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot is going to fix incorrect ISBN syntax per WP:ISBN. So, if the syntax is ISBN: 819345670X, it will take off the colon and make it ISBN 819345670X. The other case of incorrect ISBN syntax this bot is going to fix is when the ISBN number is preceded by "ISBN-10" or "ISBN-13". For example, in ISBN-10: 995341775X, it will take off "-10:" and that will make it ISBN 995341775X. The bot will only fix those two cases of ISBN syntax. Any other cases of incorrect ISBN syntax will not be fixed by the bot. The bot will also fix incorrect PMID syntax. So, for example, if it is PMID: 27401752, it will take off the colon and convert it to {{PMID|27401752}} per WP:PMID. It will not make it the magic link PMID 27401752 because that format is deprecated.
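
A short Python sketch of the two syntax fixes described above (illustrative only; the follow-up conversion to templates uses the regex quoted in the discussion below):

    import re

    def fix_isbn_pmid_syntax(text):
        # Strip the stray colon and the "-10:"/"-13:" prefixes described above.
        text = re.sub(r"\bISBN(?:-1[03])?:\s*", "ISBN ", text)  # "ISBN:", "ISBN-10:", "ISBN-13:"
        text = re.sub(r"\bPMID:\s*", "PMID ", text)             # "PMID:"
        return text

    print(fix_isbn_pmid_syntax("ISBN-10: 995341775X and PMID: 27401752"))
    # -> ISBN 995341775X and PMID 27401752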

Discussion

Please make sure to avoid ISBNs within |title= parameters of citation templates. Also, is there a reason that you are not proposing to use the {{ISBN}} template? Magic links have been deprecated and are supposed to go away at some point, although the WMF seems to be dragging their feet for some reason. There is another bot that converts magic links to templates, but if you can do it in one step, that would probably be good. – Jonesey95 (talk) 12:05, 13 December 2018 (UTC)

@Jonesey95: The bot will convert to the {{ISBN}} template and it will not touch ISBNs in the title parameters of citations. Pkbwcgs (talk) 15:19, 13 December 2018 (UTC)
What about the PMID's? Creating more deprecated magic words isn't ideal. — xaosflux Talk 19:16, 14 December 2018 (UTC)
@Xaosflux: I did say in my description that they will be converted to templates. However, I now need to write the RegEx for this, and I have been trying, but my RegEx skills are unfortunately not very good. Pkbwcgs (talk) 19:52, 14 December 2018 (UTC)
I have tried coding the RegEx but gave up soon after, as it is too difficult. Pkbwcgs (talk) 21:14, 14 December 2018 (UTC)
@Pkbwcgs: After removing the colon you can use Anomie's regex from Wikipedia:Bots/Requests for approval/PrimeBOT 13: \bISBN(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++((?:97[89](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?)?(?:[0-9](?:-|(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs}))?){9}[0-9Xx])\b and \b(?:RFC|PMID)(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})++([0-9]+)\b, or you can adjust them to account for the colon. Primefac could advise if he made any changes to them. — JJMC89(T·C) 06:27, 15 December 2018 (UTC)
@JJMC89: Thanks for the RegEx. I will be able to remove the colon easily. It is the RegEx for the ISBN that I struggled with. Thanks for providing it. Pkbwcgs (talk) 09:49, 15 December 2018 (UTC)
When I tested the RegEx on my own AWB account without making any edits, it said "nested identifier" and did not replace anything. Pkbwcgs (talk) 09:53, 15 December 2018 (UTC)
@Pkbwcgs: The regex comes from PHP, but AWB (C#) doesn't support possessive quantifiers (e.g. ++). Replacing ++ with + in the regex should work. — JJMC89(T·C) 18:57, 15 December 2018 (UTC)
@JJMC89: I have tested the find RegEx on my AWB account without making any edits and it works. I also worked out the replace RegEx and it is {{ISBN|$1}}. That works too. I think this is ready for a trial. I will also request a small extension for this task which is to clean out Category:Pages using ISBN magic links and Category:Pages using PMID magic links. That will be PkbwcgsBot 7. Pkbwcgs (talk) 20:15, 15 December 2018 (UTC)
I adjusted the RegEx to accommodate ISBNs with a colon. Pkbwcgs (talk) 20:33, 15 December 2018 (UTC)
This diff from my account is good and perfectly justifies what this bot is going to do for this task. Is this good enough? Pkbwcgs (talk) 20:53, 15 December 2018 (UTC)
This is what it will look like if the bot handles an ISBN with the "ISBN-10" prefix. That diff is also from my account. Pkbwcgs (talk) 21:08, 15 December 2018 (UTC)
{{BAG assistance needed}} There is a huge backlog at Wikipedia:WikiProject Check Wikipedia/ISBN errors at the moment. This task can cut down on that backlog by replacing the colon syntax with the correct syntax. It has also been waiting for two weeks. Pkbwcgs (talk) 22:12, 27 December 2018 (UTC)

  Approved for trial (25 edits). --slakrtalk / 20:43, 4 January 2019 (UTC)

The first thirteen edits are here. Pkbwcgs (talk) 09:54, 12 January 2019 (UTC)
This edit put the ISBN template inside an external link, which is an error. This one has the same error. The other eleven edits look good to me. I recommend a fix to the regex and more test edits. – Jonesey95 (talk) 19:51, 12 January 2019 (UTC)
@Jonesey95: I fixed those errors. Pkbwcgs (talk) 19:57, 12 January 2019 (UTC)
  Approved for extended trial (25 edits). OK try again. — xaosflux Talk 04:10, 30 January 2019 (UTC)

Bots that have completed the trial period

GreenC bot 10

Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:25, Wednesday, February 6, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Awk

Source code available: TBU

Function overview: Add {{Shadows Commons}} to candidate File pages.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Shadows_Commons

Edit period(s): Weekly

Estimated number of pages affected: 30

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Add the {{Shadows Commons}} template to File: pages on EnWiki that have the same name on Commons. It uses Quarry 18894 to find candidate pages.

Discussion

  Approved for trial (30 edits).xaosflux Talk 15:12, 7 February 2019 (UTC)
Comment - I would very strongly suggest the bot use its own query, run directly, rather than relying on a manually updated Quarry one. ShakespeareFan00 (talk) 18:32, 7 February 2019 (UTC)
I hadn't planned on that. Just downloading the JSON with each run. What problem do you foresee? -- GreenC 18:52, 7 February 2019 (UTC)
There's no guarantee that a Quarry query is updated in a timely way. ShakespeareFan00 (talk) 01:00, 8 February 2019 (UTC)
What does 'timely' mean for Quarry? The database on Tools has a replication lag, also. The tool is only running once a week or so. -- GreenC 01:22, 8 February 2019 (UTC)
@GreenC: are you doing any checks if the shadow template is already in place (to avoid placing a second one), and/or that the file is actually shadowing? If so it won't really matter too much if this is delayed or using older data. This would be for cases where on edit someone else has already tagged the file, or the commons file has since been moved or deleted (i.e. the same checks we would expect of a human editor). — xaosflux Talk 13:42, 10 February 2019 (UTC)
Those are good points. I was going to check whether the template is already present, but hadn't thought to check that the shadowed Commons file still exists. Both are relatively easy and not costly, and yeah, it would resolve any problem with delays in the replication server pool. -- GreenC 15:52, 10 February 2019 (UTC)
Looks like Quarry is not stable: the link to the JSON file changes with each run of Quarry. The bot will connect to the DB directly. -- GreenC 19:29, 10 February 2019 (UTC)
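
A rough pywikibot sketch of the two checks discussed above (the bot itself is written in Awk; names here are illustrative, and the string test for the template is crude, since the real bot also checks aliases):

    import pywikibot

    enwiki = pywikibot.Site("en", "wikipedia")
    commons = pywikibot.Site("commons", "commons")

    def should_tag(title):
        # title includes the "File:" prefix.
        local = pywikibot.FilePage(enwiki, title)
        if not local.exists() or "{{Shadows Commons" in local.text:
            return False                                    # gone, or already tagged
        return pywikibot.FilePage(commons, title).exists()  # still shadowing?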

Images are the same

  • @ShakespeareFan00: one of the images is File:Mosh kashi self portrait.jpg (Commons). According to the template instructions, when the images are the same, the template should not be used. This is not the only case; File:Léon-Vasseur.jpg and probably others are affected too. What would happen in these cases? A bot can't determine that the images are the same. Should it add the template anyway - or is the bot not viable? -- GreenC 07:23, 11 February 2019 (UTC)
One solution: add the template regardless. The burden will be manual removal of the template. This is less work than manual addition of the template, as the ratio of additions to removals is high. It can also leave instructions in the template like:
{{Shadows Commons |bot=Added by shadows bot. Remove this template if the images are the same. The bot will remember.}}
The bot will keep a record and not add a second time. As a bonus the bot will now have a list of images that are the same, if ever needed. -- GreenC 08:12, 11 February 2019 (UTC)
That sounds reasonable. Identifying images for CSD F8 (i.e. identical images) would be a related task. You could use an image hash to check, IIRC. ShakespeareFan00 (talk) 09:31, 11 February 2019 (UTC)
As with File:Mosh kashi self portrait.jpg, the two files have different dimensions, so it's complicated. Will keep image comparison in mind; it would probably require a machine learning API and some other work. Currently the bot is skipping images with the templates {{Shadows Commons}}, {{Keep Local}}, {{Now Commons}} and {{Do not move to Commons}} (+ aliases), as well as anything with the magic word {{PROTECTIONLEVEL:(edit|move|copy)}}. Anything else to avoid? -- GreenC 16:31, 11 February 2019 (UTC)
Huh. I was expecting that someone would file such a bot in due time. Having worked on Shadows Commons cases in the past, I have a few thoughts:
Jo-Jo Eumerus (talk, contributions) 17:01, 11 February 2019 (UTC)
Hi Jo-Jo Eumerus, thanks for the info. Pages with {{PROTECTIONLEVEL:(edit|move|copy)}} are skipped as they are high-risk (used on the main page etc.), so renaming or moving to Commons would likely be avoided? I'm on board with |keeplocal= as a replacement for {{keep local}}. Not positive about {{Do not move to Commons}}, as that template is further embedded in 8 other templates. Something like |reason={{Do not move to Commons|reason=Original reason}}}} and moving any of those 8 templates creates complexity of embedded templates and |reason= (for future bots and tools). It would still work with separate templates, I believe. -- GreenC 18:28, 11 February 2019 (UTC)
It is confusing with all the moving parts. Current thinking on what action to take when the bot encounters:
Thoughts / comments? -- GreenC 22:16, 11 February 2019 (UTC)
I would ignore anything tagged {{Now Commons}} , as those have already been identified. ShakespeareFan00 (talk) 17:49, 12 February 2019 (UTC)
Done. -- GreenC 17:59, 12 February 2019 (UTC)
@ShakespeareFan00: Actually it was done in the SQL you gave me, but I added a few more aliases and a backup regex check in the source. The current SQL list follows; the additions are all aliases.
  ('ShadowsCommons',
   'Shadows_commons',
   'Shadows_Commons',
   'Now_Commons',
   'NowCommons',
   'Nowcommons',
   'NowCommonsThis',
   'Now_commons',
   'CommonsNow',
   'NC',
   'NCT',
   'Nct',
   'Db-now-commons',
   'Db-nowcommons',
   'Uploaded to Commons',
   'Pp-template',
   'Keep_local_high-risk',
   'Pp-upload',
   'C-uploaded',
   'C-upload',
   'C uploaded',
   'C-uploaded',
   'M-protected',
   'Main page protected',
   'Mpimgprotected',
   'Mprotect',
   'Mprotected',
   'PP-main',
   'PP-main-page',
   'PP-mainpage',
   'ProtectedMainPageImage',
   'Uploaded_from_Commons',
   'Protected_sister_project_logo',
   'Rename_media',
   'lfr',
   'Image_move',
   'Media_rename',
   'Rename_file',
   'Rename_image',
   'Rename-image',
   'Rename_media',
   'RenameMedia',
   'Renamemedia',
   'Ffd',
   'FFD',
   'lfd',
   'Imagevio',
   'PUF',
   'Puf',
   'PUi',
   'Pui',
   'PUIdisputed'
  )

Trial results


  Trial complete. I accidentally issued a "-continuous" to jsub which circumvented the bot's internal halts, so it processed all available (44) instead of 33. I forgot the |bot= message, which is now included. Question about a few cases like File:Garlin Gilchrist II in Ann Arbor (cropped).jpg that have {{Copy to Wikimedia Commons}} and have been copied, but the image still exists on Enwiki. Should it be tagged? @Jo-Jo Eumerus and ShakespeareFan00: -- GreenC 17:41, 12 February 2019 (UTC)

I think yes, they should still be tagged. Jo-Jo Eumerus (talk, contributions) 17:43, 12 February 2019 (UTC)
Ok. -- GreenC 18:00, 12 February 2019 (UTC)

DannyS712 bot 2

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 02:35, Tuesday, January 8, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB

Function overview: Add {{WikiProject Soil}} to talk pages per request (https://en.wikipedia.org/w/index.php?title=Wikipedia:Bot_requests&diff=877156145&oldid=877118167&diffmode=source)

Links to relevant discussions (where appropriate):

Edit period(s): One time run

Estimated number of pages affected: <1261 pages (not accounting for duplicated pages or those already having the wikiproject tag)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details:

AWB:

  1. Make list from category
  2. Convert to talk pages
  3. Preparse to remove if {{WikiProject Soil}} already present
  4. Prepend wikiproject tag.
Full list of categories requested:
Analyzed list of categories, including individual sizes

Total (not accounting for duplicated): 1261

Discussion

How many talk pages are expected to be edited with this run? Primefac (talk) 20:14, 8 January 2019 (UTC)

@Primefac: Less than 1261 pages will be edited. For a trial, you may want to choose one of the small categories listed in the "Analyzed list" (which includes individual sizes) and I can check every edit manually afterwards. --DannyS712 (talk) 21:12, 8 January 2019 (UTC)
Followup questions:
  1. What do you mean by Preparse to remove if {{WikiProject Soil}} already present?
  2. Are you just adding the template, or are you also assessing?
Primefac (talk) 21:28, 8 January 2019 (UTC) (please do not ping on reply)
Here is what I would do with AWB:
  1. Tell AWB to skip a page if it already contains {{WikiProject Soil
  2. Turn on "pre-parse" mode, and run
    This means that, rather than editing each article that is not skipped before proceeding, it saves articles that should be edited, and skips those that shouldn't be
    Thus, at the end, I know that the list I have does not contain any pages that already have {{WikiProject Soil}}
    For more information, see Wikipedia:AutoWikiBrowser/User manual#Options
  3. Turn off pre-parse mode, and prepend {{WikiProject Soil |class= |importance= }} to the remaining talk pages
    This would not actually assess the article itself. I can see a future task where, for pages already tagged as stubs, |class= | is replaced with |class=stub |, but that is not part of this request.
Thanks, --DannyS712 (talk) 00:42, 9 January 2019 (UTC)
{{BAGAssistanceNeeded}} --DannyS712 (talk) 03:00, 18 January 2019 (UTC)
  •   Approved for trial (25 edits). Primefac (talk) 00:36, 20 January 2019 (UTC)
      Trial complete. I'm going to review each of the edits, and will link to my analysis. --DannyS712 (talk) 01:15, 20 January 2019 (UTC)
    Edits and issues documented at User:DannyS712 bot/Task 2 Trial analysis --DannyS712 (talk) 01:32, 20 January 2019 (UTC)
    Initial thought looking at your analysis - a PETSCAN would fix the majority of the "duplicate template" issues. Also, TALKLEAD can be fixed by doing genfixes after adding the new text. Primefac (talk) 20:39, 20 January 2019 (UTC)
    For the petscan - what do you mean? I'm already scanning for the pages to edit, and then removing the false positives with AWB, but how would I remove them with petscan? Also, for the TALKLEAD thing - I just want to verify that you want me to run AWB with genfixes enabled, and have genfixes run after adding the template so that it is in the correct location? Thanks, --DannyS712 (talk) 21:36, 20 January 2019 (UTC)
    @Primefac: ^^ --TheSandDoctor Talk 17:40, 21 January 2019 (UTC)

─────────────────────────You don't need to remove anything, just create your list based on https://petscan.wmflabs.org/?psid=7313284 and then you won't have to add or remove anything. As for TALKLEAD, yes, if you append the text and then do genfixes, it will make sure all of the GA stuff etc will end up in the right place. To be honest I don't know if it will take care of the {{WPBS}} issue, but it sounds like your logic works. Primefac (talk) 20:08, 22 January 2019 (UTC)

Okay. I will use that list instead. Does genfixes also work if I prepend the template, or does it matter? --DannyS712 (talk) 20:35, 22 January 2019 (UTC)
I converted that list to a awb friendly format at User:DannyS712 bot/Task 2 list#Better format --DannyS712 (talk) 20:45, 22 January 2019 (UTC)
  Approved for extended trial (25 edits). I'd like to see if the issues presented above have been worked out. Primefac (talk) 19:53, 28 January 2019 (UTC)
  Trial complete. I misunderstood the trial approval - I thought it was 25 pages; I'm sorry. I made 45 edits: 25 additions of the template, and then 20 more on the same pages to apply genfixes to correct the location. I haven't figured out how to have genfixes run in the same edit but after the addition of the template. Sorry, I'll analyze the results soon. --DannyS712 (talk) 20:31, 28 January 2019 (UTC)
45 edits: https://en.wikipedia.org/w/index.php?title=Special:Contributions&target=DannyS712%20bot&start=2019-01-28&end=2019-01-28
The only issue I found, other than the 2 passes needed, was this page. I guess I missed it with genfixes.
If approved, I would, in batches of ~50:
  1. Go through and add the template
  2. Go through and apply genfixes
Thanks, --DannyS712 (talk) 20:37, 28 January 2019 (UTC)
That's not a good way to do it. If you're going to add the templates, it should be done in one edit. Unfortunately, I've done some dry runs and it doesn't look like AWB's genfixes will deal with the TALKLEAD issues, so you'll likely have to make do with some regex. Of course, with only 350 pages left that need the template, it might be best to just do this supervised and manually fix any that need it before saving the page. Primefac (talk) 03:48, 30 January 2019 (UTC)
@Primefac: I'm willing to try that --DannyS712 (talk) 04:00, 30 January 2019 (UTC)

─────────────────  Approved for extended trial (25 edits). All righty then. Primefac (talk) 04:00, 30 January 2019 (UTC)

@Primefac: I don't have time in the next few days though - is that okay? --DannyS712 (talk) 04:25, 30 January 2019 (UTC)
There's no rush; take all the time you need. Primefac (talk) 04:27, 30 January 2019 (UTC)
  Trial complete. (Contributions are all listed here.) The edits were made manually (correcting the mistakes before saving) per the discussion above. Should this be relabeled as a manual or supervised task? --DannyS712 (talk) 22:40, 9 February 2019 (UTC)

PkbwcgsBot 17

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:16, Friday, December 28, 2018 (UTC)

Function overview: The bot is going to fix the high-priority lint error "Tidy bug affecting font tags wrapping links", which currently has over 4,000,000 pages. There is no intention to fix them all at once; the bot will run twice a week.

Automatic, Supervised, or Manual: Manual (due to automatic fixing not working on lint errors)

Programming language(s): WPCleaner

Source code available: WPCleaner

Links to relevant discussions (where appropriate):

Edit period(s): Twice a week

Estimated number of pages affected: 500 pages will be fixed per session; twice a week

Namespace(s): All namespaces

Exclusion compliant (Yes/No): Yes

Function details: The bot is going to fix the high-priority lint error "Tidy bug affecting font tags wrapping links". The bot will go to "linter categories" in WPCleaner, select "Tidy bug affecting font tags wrapping links" (without templates) and then do automatic fixing on the first five hundred pages that contain this lint error. It will move the font tags from outside the link brackets to inside the link in the signature, as shown here. So, an example would be:

  • Before: <font color="red">[[User talk:Example|Example]]</font>
  • After: [[User talk:Example|<font color="red">Example</font>]]
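
A simplified Python illustration of the transformation (the bot itself uses WPCleaner's fixes; this regex only handles the basic piped-link case):

    import re

    # Move a <font> tag that wraps a piped wikilink to inside the link label,
    # so the intended styling keeps applying after the Tidy-to-Remex switch.
    # Unpiped links and nested markup are not handled here.
    FONT_WRAPPED_LINK = re.compile(
        r"(<font\b[^>]*>)\s*\[\[([^\]|]+)\|([^\]]+)\]\]\s*</font>", re.IGNORECASE)

    def move_font_inside_link(text):
        return FONT_WRAPPED_LINK.sub(r"[[\2|\1\3</font>]]", text)

    print(move_font_inside_link('<font color="red">[[User talk:Example|Example]]</font>'))
    # -> [[User talk:Example|<font color="red">Example</font>]]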


Discussion

Apparently, automatic fixing is not working on the lint errors. I posted a comment here. However, this can be done manually through the WPCleaner CheckWiki fixing interface and fairly quickly still. Pkbwcgs (talk) 15:32, 28 December 2018 (UTC)

  Approved for trial (25 edits or 7 days). - please be careful with the automatic fix but this seems pretty non-controversial. -- Tawker (talk) 00:37, 29 December 2018 (UTC)

@Tawker: How many trial edits would you like made? Pkbwcgs (talk) 20:23, 29 December 2018 (UTC)
As I have discussed with User:Xaosflux in task 18, the bot will also fix "Misnested tags" and "Missing end tag" if they are found on the same talk page, so that no fixable lint errors are left behind. Pkbwcgs (talk) 21:22, 31 December 2018 (UTC)
{{BAG assistance needed}} I would like to know how many edits my bot should make in this trial. Pkbwcgs (talk) 12:46, 4 January 2019 (UTC)
I set a 25 count for you. — xaosflux Talk 20:13, 4 January 2019 (UTC)
@Xaosflux: I have made preparations for the trial. Like you said in task 18, if "Missing End Tag" or "Misnested Tag" is found on the same page as this lint error, shall I get WPCleaner to fix those? Pkbwcgs (talk) 20:59, 4 January 2019 (UTC)
{{BAG assistance needed}} I will need an answer to my above question before I can proceed with the trial. Pkbwcgs (talk) 18:21, 28 January 2019 (UTC)
There is zero reason to have your bot make multiple edits to fix multiple issues on the same page if it can handle them all in one go. So I think the short answer to your question is "yes". Primefac (talk) 20:00, 28 January 2019 (UTC)
  Trial complete. 50 edits have been made here. Pkbwcgs (talk) 18:49, 30 January 2019 (UTC)
Excellent, thanks. Offwiki life took me offline for a bit -- Tawker (talk) 02:01, 6 February 2019 (UTC)

LkolblyBot

Operator: Lkolbly (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 18:57, Monday, December 24, 2018 (UTC)

Function overview: This bot automatically updates Alexa rankings in website infoboxes by querying the Alexa Web Information Service.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: https://github.com/lkolbly/alexawikibot (presently, the actual saving is commented out, for testing)

Links to relevant discussions (where appropriate): Previous bot that performed this task: OKBot_5

Edit period(s): Monthly or so

Estimated number of pages affected: 4,560 articles are in the current candidate list. A subset of these pages will be updated each month. Other pages could be pulled into the fray over time if someone adds alexa information to a page. Also, there will be a whitelist copied from User:OsamaK/AlexaBot.js of pages that will be edited (presently containing 1,412 pages).

Namespace(s): Articles

Exclusion compliant (Yes/No): Yes (via whatever functionality is already in pywikipedia)

Function details: This bot will scan all pages (using a database dump as a first pass) to find pages which have the "Infobox website" template with both "url" and "alexa" fields.

It will parse the domain from the url field using a few heuristics, and query the domain with AWIS. Domains that have subdomains return incorrect results from AWIS (e.g. mathmatica.wolfram.com returns the result for just wolfram.com), so these domains are discarded (and the page not touched). It will then perform an AWIS query to determine the current website rank and trend over 3 months.
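As a rough illustration of the domain heuristics just described (a sketch only; the function name and exact rules are my simplification, not the code in the linked repository):

    from urllib.parse import urlparse

    # Sketch of the heuristic described above: pull the hostname out of the
    # |url= value and skip anything that looks like a subdomain, since AWIS
    # would return data for the parent domain instead.
    def extract_domain(url_field):
        url = url_field.strip()
        if not url.startswith(('http://', 'https://')):
            url = 'http://' + url
        host = urlparse(url).hostname or ''
        if host.startswith('www.'):
            host = host[4:]
        # Crude subdomain test: more than two labels means skip. (This would
        # also skip two-level TLDs such as .co.uk, so it errs on the side of
        # not touching the page.)
        if host.count('.') > 1:
            return None
        return host

    print(extract_domain('https://www.darwinawards.com/'))  # darwinawards.com
    print(extract_domain('http://sub.example.org/'))        # None (skipped)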

Websites will be classified into {{Increase}}, {{Decrease}}, and {{steady}}. A site increasing in popularity will gain the {{Increase}} tag, even though its rank is numerically decreasing (previously, many sites were also classified into IncreaseNegative and DecreasePositive, which I didn't understand).
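A minimal sketch of how the three-month trend maps onto the arrow templates as described above (the function and its arguments are my own framing, not the bot's actual code):

    # Hypothetical sketch: map the change in Alexa rank over three months to the
    # arrow template. Note the inversion: a numerically smaller rank means the
    # site became *more* popular.
    def trend_template(rank_now, rank_three_months_ago):
        if rank_now < rank_three_months_ago:
            return '{{Increase}}'   # more popular, even though the number went down
        if rank_now > rank_three_months_ago:
            return '{{Decrease}}'
        return '{{steady}}'

    print(trend_template(169386, 180000))  # {{Increase}}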

Then, in the text of the article, whatever the current alexa data is will be replaced by something like:

{{Increase}} 169,386 ({{as of|2018|12|24}})<ref name="alexa">{{cite web|url= http://www.alexa.com/siteinfo/darwinawards.com | publisher= [[Alexa Internet]] |title=Darwinawards.com Traffic, Demographics and Competitors - Alexa |accessdate= 2018-12-24 }}</ref> <!-- Updated monthly by LkolblyBot -->

(e.g., rendered as an increase arrow followed by 169,386 (As of 24 December 2018)[1])

There are two as-yet untested test cases that I'll test (and fix if necessary) before any full-scale deployment:

  • Apparently some infoboxes have multiple |alexa= parameters? I have to go find one and see what the bot does with it. (probably the right thing to do is to not touch the page at all in that situation)
  • Some pages have an empty |alexa= parameter, which should be fine, but worth testing anyway.

Discussion

Please make the bot's talk page.

"whatever the current alexa data is will be replaced" - how do you know there isn't more than just the previous value? Or that there isn't a reference that is used elsewhere?

I imagine many pages that copy-paste the template code will have an empty |alexa= parameter. This would not be any different to not having it at all.

Do you preserve the template's formatting?

The particular citation style the bot uses may not match the article's, especially the date format. (I wonder why we don't have an Alexa citation template still.) —  HELLKNOWZ   ▎TALK 21:26, 24 December 2018 (UTC)

The overall format of the template code is preserved; the value is replaced by matching the regex r"\|\s*alexa\s*=\s*{}".format(re.escape(current_alexa)), so the rest of the template is unaffected. (the number of spaces before the equals sign goes from "any number" to "exactly one", though; a sketch of this replacement is shown below this reply)
Yeah, I was debating having it skip empty alexa parameters. There's value in adding it (as much as updating it), though for very small sites the increase/decrease indicator may not be particularly useful.
I didn't think to check whether there's more than the previous value, though I can't think of what else would be there. There's at least two common formats for this data, basically the OKBot format, and a similar format with parenthesized month/year instead of the asof (see https://en.wikipedia.org/wiki/Ethnologue - note lack of a reference). I guess it would be safest to check that the value is in a whitelisted set of alexa formats to replace, I'll bet a small number of regexes could cover 90% of cases (and the remaining 10% could be changed to a conforming case by hand :D)
The reference is interesting, because it's basically a lie. It's a link to the alexa page, but that isn't where the data was actually retrieved from, it was retrieved from their backend API. As for if someone's already using that reference, it shouldn't be too hard to check for that, I would think. I imagine (with only anecdotal evidence) that most of those cases will be phrases like "as of 2010, foobar.com had an alexa rank of 5". Updating that reference to the present value may not make sense in the context of the article (myspace isn't as big as it used to be, an article talking about how big it was in 2008 won't care how big it is now). But either way they should probably be citing a source that doesn't automatically change as soon as you read it.
The ethnologue page already looks like it has diverging date formats? I don't know how common that is, I'll have to go dig up the style guide for citations (maybe we should have a bot to make that more uniform). What would it take to make a template? (also, would that solve the uniformity issue? I guess at least it'd be uniform across all alexa rankings)
Lkolbly (talk) 14:52, 25 December 2018 (UTC)
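A short runnable sketch built around the regex quoted in the reply above. The wrapper function, the replacement spacing, and the sample template are assumptions of mine, not taken from the bot's repository:

    import re

    # Replace the current |alexa= value while leaving the rest of the template
    # untouched. The search pattern is the one quoted above; the replacement
    # spacing ("| alexa = ") is an assumption and may differ from the bot's.
    def replace_alexa_value(template_text, current_alexa, new_alexa):
        pattern = r"\|\s*alexa\s*=\s*{}".format(re.escape(current_alexa))
        return re.sub(pattern, "| alexa = " + new_alexa, template_text)

    before = "{{Infobox website\n| url   = http://example.org\n| alexa =  {{steady}} 200,000\n}}"
    print(replace_alexa_value(before, "{{steady}} 200,000", "{{Increase}} 169,386"))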
WP:CITEVAR and WP:DATEVAR is the relevant stuff on date and citation differences. On English wiki, changing or deviating from citation or date style without a good reason is very controversial. The short answer is "don't". Bots are generally expected to do the same, although minor deviations are somewhat tolerated. But bots are expected to follow templates, like {{use dmy dates}} or |df= parameters. —  HELLKNOWZ   ▎TALK 16:36, 25 December 2018 (UTC)
Okay, it looks like it should be pretty straightforward to just check for the two Template:Use ... dates tags and set the |df= parameter. Lkolbly (talk) 14:39, 26 December 2018 (UTC)
Updated the bot so that it follows mdy/dmy dates, updating the accessed date and asof accordingly. Also constrained the pages that will be updated to a handful of matching regexes and also pulled a list from User:OsamaK/AlexaBot.js, which eventually I'll make a copy of. Lkolbly (talk) 18:20, 1 January 2019 (UTC)
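"Follows mdy/dmy dates" presumably amounts to something like the sketch below (a hedged illustration; the bot's actual tag detection and formatting may differ, and this hypothetical helper only covers the access date):

    import datetime
    import re

    # Hypothetical sketch: choose an access-date format from {{use dmy dates}} /
    # {{use mdy dates}} tags in the article text, defaulting to ISO format when
    # neither is present.
    def format_access_date(article_text, date):
        if re.search(r'\{\{\s*use\s+dmy\s+dates', article_text, re.IGNORECASE):
            return '{} {} {}'.format(date.day, date.strftime('%B'), date.year)   # 1 January 2019
        if re.search(r'\{\{\s*use\s+mdy\s+dates', article_text, re.IGNORECASE):
            return '{} {}, {}'.format(date.strftime('%B'), date.day, date.year)  # January 1, 2019
        return date.isoformat()                                                  # 2019-01-01

    print(format_access_date('{{Use dmy dates|date=January 2019}} ...', datetime.date(2019, 1, 1)))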
  •   Approved for trial (50 edits). Primefac (talk) 00:43, 20 January 2019 (UTC)
  Trial complete. Ran bot to edit 50 randomly selected pages. So far I've noticed two bugs that cropped up, one involving leading zeros in the access dates and another where the comment "Updated by LKolblyBot" got repeated. I'm going to go through and fix the issues by hand for now and apply fixes to the bot. Lkolbly (talk) 20:20, 27 January 2019 (UTC)
Also, looking closer, some pages got a "Retrieved" date format that doesn't match the rest of the page (e.g. Iraqi News), but I'm pretty sure it's because those pages aren't annotated with dmy or mdy. Lkolbly (talk) 20:47, 27 January 2019 (UTC)
I have questions.
  • First, Special:Diff/880480890 - is there a reason it chooses http over https?
  • Second, why do some diffs use ISO formatting for the date while others actually change it to dmy?
  • Third, are OKBot and Acagastya still updating these pages, and would it make sense to remove those names from the comments?
My fourth/fifth questions were going to be what you were going to do about duplicate names, but it looks like you noticed that and are taking care of it, along with a lack of leading zeros issue with the 2019-01-27.
Also, as a minor point, even if you've only done 44 edits with the bot, please make sure when you finish with a trial that you link to the specific edits, since while "Contribs" might only show those 44 edits now, after you've made thousands they won't be the first thing to look at.
Actually, I do have another thought - for brevity, it might be best to have a wikilink in the edit summary instead of a full URL. Primefac (talk) 20:12, 28 January 2019 (UTC)
I have answers.
  • There's no particular reason it uses http over https for the alexa.com link, I hadn't given it a second thought. I can change it to https.
  • The variations in date formatting are an attempt to stick with the article's predominant style: the default style is ISO format, and if there's a use dmy or use mdy tag it uses the respective format.
  • OKBot appears defunct; I wasn't aware of Acagastya, though from their user page it looks like they've left English Wikipedia at least. It does make sense to remove the (now duplicate) comments; that was ultimately the goal but it didn't work as planned.
  • Good point on the making a list of the trial edits, conveniently it looks like I can search the contribs to make a view of just the trial edits.
  • Yeah, the wikilink idea occurred to me a few minutes too late, it looks terrible in the commit message :/ Lkolbly (talk) 23:32, 28 January 2019 (UTC)
With the constant modification that Alexa goes through, it is not a good idea to put manual labour for updating the ranks.
acagastya 08:53, 29 January 2019 (UTC)
  • Regarding the 'Updated monthly by ...' lines - as is being demonstrated here there are stale entries - and this is to be expected, as no bot can be counted on to keep operating indefinitely into the future. To that end I don't think this should be added, and would support having the next update remove any existing such comment codes. — xaosflux Talk 15:21, 7 February 2019 (UTC)
      Approved for extended trial (50 edits). Please implement the above changes in this run. Primefac (talk) 21:30, 14 February 2019 (UTC)

PkbwcgsBot 12

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 13:10, Monday, December 24, 2018 (UTC)

Function overview: The bot will fix a range of unicode control characters in articles. This is WP:WCW error 16.

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Five times a week

Estimated number of pages affected: 100-250 at a time

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: This is an extension to Task 1, as I am already fixing Unicode control characters there. However, this task does more fixes for error 16 and handles a range of Unicode control characters that WPCleaner can't fix. The following will be removed:

  • U+200E - Left-to-right mark (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+FEFF - Byte order mark (all instances of this can be safely removed)
  • U+200B - Zero-width space (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+2028 - Line separator (all instances of this can be safely removed)
  • U+202A - Left-to-right embedding (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+202C - Pop-directional formatting (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+202D - Left-to-right override (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+202E - Right-to-left override (the bot will be careful when it comes to Arabic text and other foreign text as this is a supervised task)
  • U+00AD - Soft hyphen (all instances of this can be safely removed)

The following will be turned into a space:

  • U+2004 - Three-per-em space
  • U+2005 - Four-per-em space
  • U+2006 - Six-per-em space
  • U+2007 - Figure space
  • U+2008 - Punctuation space
  • U+00A0 - Non-breaking space (any cases of U+00A0 that are okay per MOS:NBSP will not be removed; this is the most frequent Unicode character in WP:WCW error 16)

The bot will use regex. General fixes will be switched on, but typo fixing will be turned off as it is not required for this task.
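Purely as an illustration of the substitution rules listed above, here is a minimal Python sketch. The bot itself uses AWB find-and-replace rules, and the context-sensitive characters and MOS:NBSP exceptions are handled under the operator's supervision rather than by anything this simple; U+00AD is omitted because handling it was later dropped from the task:

    import re

    # Sketch of the character handling listed above (illustrative only).
    REMOVE = '\u200e\ufeff\u200b\u2028\u202a\u202c\u202d\u202e'   # deleted outright
    TO_SPACE = '\u2004\u2005\u2006\u2007\u2008\u00a0'             # turned into a plain space

    def strip_control_characters(text):
        text = re.sub('[{}]'.format(REMOVE), '', text)
        text = re.sub('[{}]'.format(TO_SPACE), ' ', text)
        return text

    print(strip_control_characters('Example\u200b title\u00a0(2019)'))  # Example title (2019)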

Discussion

I'm not sure about some of these. In particular, U+00AD may have been added by editors to specify the proper place for long words to be broken, and U+00A0 should more likely be turned into the &nbsp; entity than changed into U+0020. The same might apply to the other space characters, editors may have specifically used these in preference to U+0020. Anomie 17:06, 24 December 2018 (UTC)

@Anomie: After going through the WP:WCW list, there are no instances of U+00AD anywhere. However, if it does come up, then I will replace it with a hyphen. U+00A0 takes up more bytes than a regular space (U+0020) so it is easier to leave a space. The other space characters can be safely replaced as they are unnecessary and they mostly come up in citations. See 1 which is taking out U+2005 which is four-per-em space, 2 which is taking out U+2008 which is punctuation space, 3 which is taking out U+2005 again, 4 which is taking out U+2008 again and 5 which is also taking out U+2008. All these occurred inside citations. Pkbwcgs (talk) 17:43, 24 December 2018 (UTC)
Replacing U+00AD with a hyphen would not be correct either. You'd want to replace it with {{shy}} or the like. For NBSP "takes up more bytes" is a very poor argument, and replacing it with a plain space could break situations described at MOS:NBSP. A figure space might be intentionally used to make columns of numbers line up properly where U+0020 would be a different width, and so on. I don't object to fixing things where specific fancy spaces don't make a difference, but you're arguing that they're never appropriate and that strikes me as unlikely. Anomie 17:55, 24 December 2018 (UTC)
@Anomie: There are no cases of U+00AD so the bot doesn't need to handle that. In terms of U+00A0, I will make sure my RegEx replaces the cases described at MOS:NBSP with &nbsp or otherwise skip them. Pkbwcgs (talk) 18:04, 24 December 2018 (UTC)
If you're not intending to handle U+00AD after all, you should remove mention of U+00AD from the task entirely. (I see you struck it) As for "the cases described", good luck in managing to identify every variation of those cases. It would probably be better to just make that part of the task be manually evaluated rather than "always replace". Anomie 18:09, 24 December 2018 (UTC)
@Anomie: The bot will still strip U+00A0 in wikilinks because replacing them with &nbsp is not going to work. Pkbwcgs (talk) 18:15, 24 December 2018 (UTC)
Replacing the cases stated at MOS:NBSP is trickier than I thought so I am going to skip those cases manually. This task is supervised. Pkbwcgs (talk) 18:20, 24 December 2018 (UTC)
{{BAG assistance needed}} I have made some amendments to this task including reducing down to five times a week and added general fixes so the removal of unicode control characters and general fixes can be combined together. I have also specified that non-breaking space will not be removed in cases described at MOS:NBSP and the bot will replace those cases with "&nbsp" with the general fixes. Pkbwcgs (talk) 20:10, 17 January 2019 (UTC)
  •   Approved for trial (50 edits). Primefac (talk) 00:45, 20 January 2019 (UTC)
    @Primefac:   Trial complete. The edits are located here. WP:GenFixes were switched on as stated for this task. I will point out a couple of good edits. This edit removed the Unicode non-breaking space character in the infobox. Because of that character, the "distributor" parameter had disappeared from the infobox; once the bot removed the character, it re-appeared, which makes it a good edit. There were some good general fixes in this edit as well as the removal of a non-breaking space character. This edit is also a good edit because it changed the direction of text from right-to-left to left-to-right. Before, the right-to-left text would have been confusing, but now the direction is corrected. That edit removed the U+202E character, which is "Right-to-left override". Some edits removed non-breaking spaces within citations, U+200E was also removed in some edits in Arabic text, and a few edits removed U+2008, which is punctuation space. Pkbwcgs (talk) 20:02, 20 January 2019 (UTC)
    It might take me a few days to be able to verify any of these (and I have zero issue if another BAG gets to it first), but as a note it's much more helpful to point us to the bad/incorrect edits. In other words, we know how the bot is supposed to run, and pointing us to runs where the bot did what it was supposed to is... kind of pointless. Primefac (talk) 20:14, 22 January 2019 (UTC)
    Anomie, I don't know if you wanted to go through these or not, given your previous interest/concerns. Primefac (talk) 19:52, 28 January 2019 (UTC)

GreenC bot 7

Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 01:14, Friday, December 28, 2018 (UTC)

Function overview: Add {{Unreferenced}} and {{No footnotes}} to the tops of pages that have no references or are missing in-line footnotes.

Automatic, Supervised, or Manual: Automatic

Programming language(s): GNU Awk and BotWikiAwk framework

Source code available: Yes (TBU)

Links to relevant discussions (where appropriate):

Edit period(s): one time run

Estimated number of pages affected: ~ 130,000

Namespace(s): Mainspace articles

Exclusion compliant (Yes/No): Yes


Function details:

As background, members of New Page Patrol (WP:NPP) have caught up tagging the backlog of new pages. However, there is still an older backlog of articles, created from day 1 up to about 2012, which still contains many untagged articles; estimates run to half a million or more untagged. A request was made on BOTREQ by an NPP member. I took a try at creating an algorithm to detect when a page could reasonably be tagged. Dry-run tests on 10,000 articles show it to be successful. Discussion at Wikipedia_talk:WikiProject_Unreferenced_articles#List_of_unreferenced_articles shows support for an automated bot to help find articles needing attention and tag the pages.

Test results are available at User:GreenC/data/noref

The bot will start slowly and be fully supervised initially, running in batches, checking results.

Discussion

What kinds of main namespace pages are you exempting? (e.g. Redirects, Disambiguation pages?) — xaosflux Talk 01:56, 28 December 2018 (UTC)

User:GreenC/data/noref#Algorithm lists stubs, redirects, any containing one of 1500+ templates (eg. {{EB1911}} and dabs), the HTML tag <ol class="references">, any that begin with "List of ..", "Index of .." or "<year> in .." -- GreenC 02:22, 28 December 2018 (UTC)
Personally, I would also skip stubs and articles with a dedicated 'further reading/external links' from the run. Headbomb {t · c · p · b} 17:51, 2 January 2019 (UTC)
Right "stubs" are already skipped. If there is an external links section they get {{no footnotes}}. -- GreenC 13:41, 9 January 2019 (UTC)

  Approved for trial (10 edits for each tag.). Let's see it in action then. Headbomb {t · c · p · b} 16:45, 9 January 2019 (UTC)

  • The sum total of three editors from a niche project who've supported this so far isn't really representative of the community and we'd really need a broader discussion, ideally at the village pump. There are segments of the community that take issue with the perceived indiscriminate tag-bombing performed by human editors, so I'm not sure having a bot take up this activity could be completely uncontroversial. There are differences of opinion on whether {{unreferenced}} should be placed on any unreferenced article, or only on those where the fact of being unreferenced is not immediately obvious to readers (ex. the article is long) and there is good reason to doubt the veracity of the content. And even if we assume that all unreferenced articles should be tagged, I'm not sure how a bot could do that within acceptable bounds of the rate of false positives. The bot currently employs a good deal of nuance (I like that it excludes lists and articles with external links), but I don't see how it could reasonably detect all types of references. Sources can be present without templates or ref tags (an article might only have a bibliography list at the end, and in-text attribution like "According to Strand's lengthy article in the 1953 issue of the JBF" is acceptable even in the absence of such a bibliography), or they may be implicit in the external links to standard identifiers in some types of infoboxes, or in the authority control templates at the end of articles. Heck, I've seen even human editors do a poor job of figuring out if an article is unreferenced, so I'm not confident a bot could do that either. – Uanfala (talk) 14:33, 12 January 2019 (UTC)
  • Maybe so, but that does not preclude a small scope trial to see exactly what is being proposed in action. And keep in mind a small amount of false positives is acceptable. Headbomb {t · c · p · b} 15:23, 12 January 2019 (UTC)
For the next round I'm going to make more entries to the test data results. An admirable User:Boleyn tagged some of the previous test results so more would be good. Boleyn is a great example who has already benefited from this bot to make improvements as part of the NPP process. -- GreenC 14:05, 13 January 2019 (UTC)
  • Rather than asking if the bot can deal with these things or how, Uanfala declares the bot "could not do that". It is unclear Uanfala has looked at the test data results. Every issue raised was already encountered when it showed up during testing, I coded for it, the bot edits like a discriminate person would, it's been trained and can be further trained. The question is, which article in the test data do you take exception to? If the position is no tags at all, why does {{unreferenced}} have 200,000 and why does NPP add tags systemically every day (this tool might add another 20k so a 10% increase). There is consensus for tagging and WP:NPP has been widely lauded for their work which involves significant tagging. If the position is zero mistakes then that is unreasonable for bot or person. If the position is it makes too many mistakes, that is pure conjecture, see test results, and misses the fact this bot is being carefully run by a programmer who is checking results, taking feedback and continually improving it. -- GreenC 14:05, 13 January 2019 (UTC)
I acknowledge you've done great work with the bot, and apologies if I haven't been clear enough, and for having to repeat what I've already written above. The major issue is that there is so far no meaningful consensus for allowing a bot to tag thousands of articles: to get consensus for something that affects so many articles, there needs to be a well-attended discussion at a place like the village pump. And no, the fact that two editors have so far stated this would be a good idea (especially given that one of them is known for her extreme views as to what constitutes appropriate tagging) is very far from that. Yes, I did look at the test results (and incidentally, that's where I came across an article with {{Authority control}} that was earmarked for tagging – I wouldn't have thought of that otherwise). And again, I'm sorry if this sounds like I'm simply postulating without seeing the data, but I think we would all agree that, even in this age of AI optimism, it's unlikely for a bot to be able to make good judgements as to which articles need to be tagged: again, the tag is not meant to be placed on every article without blue superscript numbers (though some NPP reviewers seem to do just that), but only where there's good reason to alert the reader to it. – Uanfala (talk) 18:16, 13 January 2019 (UTC)
The bot is not "AI". I am not an "optimist" who thinks computers are the solution for everything. The bot does not put a tag on every article "without a blue subscript", it is more nuanced than that. The question again is why anyone (bot or person) would not tag the articles identified. If you think {{Authority control}} should be skipped, that is a trivial feature. -- GreenC 19:23, 13 January 2019 (UTC)

@GreenC: feel free to proceed with the trial whenever you want. Headbomb {t · c · p · b} 18:49, 13 January 2019 (UTC)

@Headbomb: - please withdraw the BRFA. It is going to prove too controversial. Not that I agree (otherwise i wouldn't have made the BRFA) but there are evidently some old wounds in the community about tagging and this bot will reopen old battle scars. And there is more than 1 way to make use of the tool, it's purpose is to discover and identify potential candidate articles, information others can do with as they please. If there is support to re-open the BRFA it should go through VP or an RFC first. -- GreenC 15:28, 14 January 2019 (UTC)

Well, it's your BRFA, so you can withdraw it if you want (simply put {{BotWithdrawn}} somewhere here). I still think you should proceed with the trial personally. Headbomb {t · c · p · b} 15:31, 14 January 2019 (UTC)
Well thank you for the support. There are trial "edits" in the test data so we can see what it does (would do) after processing 16,000 articles. If you or anyone else would like to start an RfC that is fine by me, but I don't fancy leading this fight; it will not be pretty. This was an interesting technical challenge and the data it produces can still be posted either way. I will hold off withdrawing for a bit in case anyone wants to initiate a consensus discussion. -- GreenC 15:43, 14 January 2019 (UTC)
Will you proceed with a trial or not? Because if not, there is little point in keeping this open. Headbomb {t · c · p · b} 15:45, 14 January 2019 (UTC)
You mean just adding the tag? Normally page edits can be complex and/or error prone so they need be trialed, but dropping a tag at the top of a page is trivial and not error prone. In this case the problem is more about consensus. -- GreenC 15:51, 14 January 2019 (UTC)
I mean doing edits yes. That's what a trial is. Headbomb {t · c · p · b} 15:52, 14 January 2019 (UTC)

I have opened a discussion at Wikipedia:Village_pump_(proposals)#Bot to add Template:Unreferenced and Template:No footnotes to pages (single run) to ease or confirm GreenC's fears. Anyone watching this page may be interested in commenting there. Thanks all for comments. Sorry to get in the middle of the BRFA here. Cheers! Ajpolino (talk) 17:20, 14 January 2019 (UTC)

Sigh... Headbomb {t · c · p · b} 18:11, 14 January 2019 (UTC)

Hi. Would a page such as List of United States Supreme Court cases, volume 586 be tagged by this bot?

I'm not convinced that tagging pages such as Grand Ducal Highness with {{no footnotes}} is particularly helpful, but shrug. If it's a one-time run, people can presumably just remove the ugly tags if they don't like them. --MZMcBride (talk) 01:41, 15 January 2019 (UTC)

To my understanding, it would be no to both: the first is a list (excluded) and the second has external links (also excluded). Headbomb {t · c · p · b} 01:43, 15 January 2019 (UTC)
Right the first one is no since it is a "List of" .. the second would get {{no footnotes}} assuming there is consensus. In this page, User:GreenC/data/noref/14001-15000 it shows three independent bot algorithms. Which algorithm(s) the bot deploys is up to the community. It can do the first, or all three, or some combo. By default it will do all three, but if there is concern about the two {{no footnotes}} algos one or both could be dropped. -- GreenC 02:52, 15 January 2019 (UTC)

The bot doesn't have support for parenthetical referencing, either bare parentheticals or ones generated by {{harv}}. It lists Nummer 5 as an article to be tagged as type 2 {{no footnotes}} though the article has inline citations generated by {{harv}} and {{harvnb}}. It also lists Fossilized affixes in Austronesian languages to be tagged as type 2 {{no footnotes}} though it uses parenthetical citations with page numbers. Wugapodes [thɑk] [ˈkan.ˌʧɹɪbz] 08:25, 17 January 2019 (UTC)

Thanks. The {{harv}} etc. case is fixed. The parenthetical citation method is not common and hard to check for; it may be a good idea to tag these anyway for community attention so they can be converted. More developed articles using the parenthetical method will likely get skipped, as they will probably contain other bits of info that will flag them for bypass. -- GreenC 15:50, 17 January 2019 (UTC)

  Trial complete. - 20 edits at Special:Contributions/GreenC_bot on January 17 ("via noref bot") -- GreenC 20:11, 19 January 2019 (UTC)

GreenC, Easy link to this trial's edits: [2] SQLQuery me! 05:09, 24 January 2019 (UTC)
@GreenC: Note that the tags must be added after any hatnote templates (if any exist). The following code may help: pageText = pageText.replace(/^\s*(?:((?:\s*\{\{\s*(?:about|correct title|dablink|distinguish|for|other\s?(?:hurricaneuses|people|persons|places|uses(?:of)?)|redirect(?:-acronym)?|see\s?(?:also|wiktionary)|selfref|the)\d*\s*(\|(?:\{\{[^{}]*\}\}|[^{}])*)?\}\})+(?:\s*\n)?)\s*)?/i, "$1" + tagText); (source: Twinkle) SD0001 (talk) 15:46, 25 January 2019 (UTC)
That probably will not occur, as currently the bot skips anything with a preexisting banner in an abundance of caution to avoid over-tagging; presumably some-one/thing has looked at it and decided it needed that banner and not this one. We've got lower hanging fruit than piling on banners. Eventually they can be revisited when the tracking categories are reduced. -- GreenC 19:54, 31 January 2019 (UTC)
The list is extensive, thousands, I made a separate program to auto-generate which templates to avoid (including their alias/redirect names), then a function to autogenerate a lengthy regex. -- GreenC 19:57, 31 January 2019 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.



Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.