Open main menu

Wikipedia:Bots/Requests for approval

< Wikipedia:Bots  (Redirected from Wikipedia:BRFA)

BAG member instructions

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming it may be a good idea to ask someone else to run a bot for you, rather than running your own.

 Instructions for bot operators

Contents

Current requests for approval

RonBot 12

Operator: Ronhjones (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:23, Monday, October 15, 2018 (UTC)

Function overview: Tags pages that have broken images, and sends a neutral message to the last editor.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: User:RonBot/12/Source1

Links to relevant discussions (where appropriate): Request at Wikipedia:Bot_requests#CAT:MISSFILE_bot by KatnissEverdeen

Edit period(s): Twice daily.

Estimated number of pages affected: on average, we estimate 70 articles a day are affected, so that will be 70 articles and 70 talk pages.

Namespace(s): Articles, User Talk space

Exclusion compliant (Yes/No): Yes

Function details:

Step 1 - Bot will get the list of articles at Category:Articles with missing files. It will check for the presence of {{BrokenImage}}. If not present, then it will (a) Add that template, and (b) add {{Broken image link found}} to the talk page of the last editor. NB:This message will be adjusted for the first runs as the time from the broken image to the last edit might be while - it will be better when up to date.
Step 2 - Bot will get the list of articles at Category:Wikipedia articles with bad file links (i.e. pages containing {{Broken image}}) with {{BrokenImage}}. It will check that the page is still in Category:Articles with missing files - if not, it will remove the template - this allows for cases where some other action (e.g. image restored) has fixed the problem, without the need to edit the article.

Discussion

  • I'm not sure leaving a TP message with the last editor is a good idea. I can think of several scenarios where the last editor might not have had anything to do with the image link breaking. I'd really like to hear other opinions on this. SQLQuery me! 04:31, 16 October 2018 (UTC)
I'm actually   Coding... to message the user who broke the link Galobtter (pingó mió) 06:23, 16 October 2018 (UTC)
{{BOTREQ}} ping Ronhjones Galobtter (pingó mió) 10:08, 16 October 2018 (UTC)
  • If a file is deleted (here or on Commons), then it may take several months until a page using the file shows up in Category:Articles with missing files. Whenever someone edits the page the next time, it immediately shows up in the category. Deleted files are typically removed from articles by bots, but they sometimes fail.
As a first step, I propose that you generate a database report of broken images (use Wikipedia's imagelinks table to find file use and then Wikipedia's+Commons's image tables to see if the file exists) and then purge the cache of those pages so that the category is updated. Also consider purging the cache to all pages in Category:Articles with missing files as files might not otherwise disappear from the category if a file is undeleted.
If the file is missing because it was deleted, then the latest editor to the article presumably doesn't have anything to do with this. I think that {{Broken image link found}} risks confusing editors in this situation. Consider reformulating the template.
Category:Wikipedia articles with bad file links seems to duplicate Category:Articles with missing files so I suggest that we delete Category:Wikipedia articles with bad file links and change {{Broken image}} so that the template doesn't add any category.
This is bad code:
if "{{Broken image}}" not in pagetext:
Someone might add the template manually as {{broken&#x20;image}} or some other variant and then you would add the template a second time. Consider asking the API if the template appears on the page instead of searching for specific wikicode. If the bot is unable to remove the template because of unusual syntax, then it may be a good idea if the bot notifies you in one way or another. --Stefan2 (talk) 10:23, 16 October 2018 (UTC)
Using mw:API:Templates for the existence of {{Broken image}} would seem the better way of doing it.
use Wikipedia's imagelinks table to find file use and then Wikipedia's+Commons's image tables to see if the file exists With 10s millions of files in each table I wonder how feasible doing that would be. Galobtter (pingó mió) 11:02, 16 October 2018 (UTC)
Also, the issue with deleted files would seem resolved once Wikipedia:Bots/Requests_for_approval/Filedelinkerbot_3 goes through Galobtter (pingó mió) 12:46, 16 October 2018 (UTC)

{{BotWithdrawn}} In view of the better system by Wikipedia:Bots/Requests_for_approval#Galobot_2 - I'll delete the unneeded cats and templates. Ronhjones  (Talk) 22:39, 16 October 2018 (UTC)

Restart

I have undone the withdraw at the request of Galobtter. But I have cut down the actions to a simple tagging (and de-tagging) of images based on the Category:Articles with missing files as requested. User pages will not be edited. The {{BrokenImage}} no longer generates a categorisation - instead I have used mw:API:Templates to find the list of transclusions (and removed the space in the template name to make life easier). Also removed the If "X" not in Y, for a better match code. Ronhjones  (Talk) 22:53, 17 October 2018 (UTC)

Tom.Bot 7

Operator: Tom.Reding (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:18, Wednesday, October 17, 2018 (UTC)

Automatic, Supervised, or Manual: manual

Programming language(s): AWB

Source code available:

Function overview: Replace non-standard non-keyboard apostrophes with the WP:MOS#Apostrophes-preferred keyboard apostrophe '.

Links to relevant discussions (where appropriate): WP:AWB/Typos#Apostrophe S

Edit period(s): One-time bulk run with sparse follow-ups

Estimated number of pages affected: ~565,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: An expansion to the "apostrophe-s" AWB typo rule by Smasongarrison has identified at least 500k pages with this fix (the database scanner maxed out, but I can get a better estimate if requested). As pointed out by John of Reading, the frequency of this rule firing may often swamp other changes to the same page.

Due to the size, simplicity, and inseverity of this fix, it seems like an ideal bot task.

The regex will resemble (\w)[´ˈ׳᾿’′Ꞌꞌ`]s\b(?<!\.[^\s\.]{0,999}), to be replaced with $1's. Additional non-keyboard apostrophes may be added, with the exception of the {{okina}} (for just-in-case-ness, as brought up by Certes).

WP:GenFixes can be left on.

All participants of the originating discussion have already been pinged, above, for their information and possible input to questions raised here.

Discussion

  Needs wider discussion If you want to make such a minor change on half a million pages you will need to establish a strong consensus in a well attended location first. Suggest you open a discussion at WP:VPR. — xaosflux Talk 03:28, 17 October 2018 (UTC)
Especially as you want to run genfixes blindly on all the pages as well. — xaosflux Talk 03:34, 17 October 2018 (UTC)
I said "WP:GenFixes can be left on" (my emphasis).   ~ Tom.Reding (talkdgaf)  04:08, 17 October 2018 (UTC)
Xaosflux, what # of pages would you say would not require a WP:VPR? The reason being that there will be some histogram of pages with this correction (e.g. many with 1 correction, fewer with 2, still fewer with 3, etc.), and I can restrict this BRFA only to those pages with, say, 5 or more corrections (whatever number-of-corrections is required to bring the total # of pages affected down to BFRA-level). Then, if/when I want to complete the rest of them, the ones with, say 1-4 corrections, I can go to WP:VPR later.   ~ Tom.Reding (talkdgaf)  04:08, 17 October 2018 (UTC)
Technical trial can certainly be tested. If you think this is worthwhile, what issue do you see with getting wider discussion? — xaosflux Talk 04:11, 17 October 2018 (UTC)
The pages with a large number of corrections each is the main issue here; I don't want perfection to get in the way of progress. I'd rather take care of the worst cases quickly than all of them slowly.   ~ Tom.Reding (talkdgaf)  04:17, 17 October 2018 (UTC)
@Tom.Reding: perhaps an example would help illustrate this, please provide a list of 10 pages with 10+ instances below for general review. — xaosflux Talk 13:02, 17 October 2018 (UTC)
Xaosflux, as requested:

Questions: How will this bot avoid introducing unintended italic and bold formatting into pages? How will it avoid replacing prime marks and other intentional punctuation that resembles quotation marks but should not be replaced? – Jonesey95 (talk) 05:05, 17 October 2018 (UTC)

Jonesey95, I think what I'll do is use that regex to crudely estimate how many uses there are per page, then use AWB's typo fixing on the worst ones. Since AWB employs many of its own checks to make sure typos are applied in the right spots, I won't bother trying to replicate it all. This means it won't be in automatic mode but manual. This is what I've been doing, but people such as Materialscientist have objected, at least to pages where this is the only change. Materialscientist, what #, if any, of these apostrophe changes on a page, assuming worse-case that they are the only changes to a page, would not cause objection?   ~ Tom.Reding (talkdgaf)  13:07, 17 October 2018 (UTC)
That sounds reasonable to me. BTW, I support these edits. The reason for my question is that I have done similar edits on a small scale, and I occasionally run into a curly quote mark immediately following italic markup, which then becomes bold formatting if you don't catch it.
I do not think Materialscientist's objections are based on any valid policy or rule; I change curly quotes to straight quotes as my only edit with some frequency; the edits may be minor, but they absolutely change the rendered page. A minor edit to fix one visible problem on a mainspace page is a valid edit in nearly every circumstance.
Have you considered working on quotation marks as well? – Jonesey95 (talk) 15:26, 17 October 2018 (UTC)
Then I think I'll add a slightly safer precaution '\w to the live rule's negative lookbehind, which can be changed to ''\w after the bulk are complete.
Yes, I considered working on quotations for all of...a few minutes...
I would be useful if there was some way to programatically determine, before saving in AWB, how many times a rule fired, like parsing the final edit summary, but I'm not sure how to do this without recreating all of the scaffolding around typo fixes.   ~ Tom.Reding (talkdgaf)  17:01, 17 October 2018 (UTC)
I see you have changed your request to purely manual, how do you propose to reduce the scope/# of pages to something that can be done manually? — xaosflux Talk 02:05, 18 October 2018 (UTC)
Xaosflux, I've been running a scan, which will still take about another day to finish. In fact, being in manual, and after seeing Jonesey95's comments & support, I might just withdraw the request until some later time when I'm more confident to put the bot in at least supervised mode, basically by isolating this particular typo so that others don't fire. That request would be for the vast majority of corrections (i.e. < 5 or < 10 corrections per page) and after much testing.   ~ Tom.Reding (talkdgaf)  12:53, 18 October 2018 (UTC)
@Tom.Reding: OK, let us know one way or the other. (Or eventually this will just move to 'expired' if discussion dies out). — xaosflux Talk 12:57, 18 October 2018 (UTC)
Xaosflux, I don't know how long it'll take before I get around to this (could be a few days, could be a few weeks). If it's ok with you, I'd prefer to keep this request active until either it expires or I'm more sure of my time/interest, whichever comes first. If not, I can withdraw.   ~ Tom.Reding (talkdgaf)  17:02, 18 October 2018 (UTC)
Note the other bot requests on this page RonBot 12 and Galobot 2, trying to fix the problems listed in CAT:MISSFILE. Some of these are known to occur by semi-auto edits changing file names with odd punctuation. An example is File:Forrest Guth, Clancy Lyall and Amos “Buck” Taylor 117380.jpg, where an AWB edit (diff) changed the "fancy quotes" to "normal quotes" (edit summary = standard quote handling in WP;standard Apostrophe/quotation marks in WP; MOS general fixes), and of course that broke the link. How will the bot avoid changing file names? - Which will be either of the form of the full file name or without the "File:" prefix in an infobox. Ronhjones  (Talk) 01:27, 18 October 2018 (UTC)
Ronhjones, I used the 'before' version of your example diff to isolate exactly what AWB's WP:GenFixes & WP:AWB/Typos fixed, and what was done by that user. AWB ignores files, images, and a host of other potential-problem-areas, which have been found over years of development. However, a user can write their own rules/code that aren't as well thought out, which was the case here, and which won't be the case for this bot.   ~ Tom.Reding (talkdgaf)  12:53, 18 October 2018 (UTC)
Thanks for that. Apparently this is not an unusual event. Nice to know that your bot will not add to the problems. Ronhjones  (Talk) 15:38, 18 October 2018 (UTC)

TheSandBot

Operator: TheSandDoctor (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:47, Wednesday, October 17, 2018 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/TheSandDoctor/election_converter

Function overview: Looks through the linked csv file, converting from the old title format to the new one.

Links to relevant discussions (where appropriate): [1], RfC on election/referendum article naming format

Edit period(s): Run until done

Estimated number of pages affected: Approximately 35, 227

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: The bot goes through the compiled csv file (in GitHub repo, more easily read form is the Excel document, which is also included). The bot then pulls up the individual page objects, double checks that they are not themselves redirects (ie that they haven't been moved) and exist. If both conditions are satisfied, the bot moves the page (leaving behind a redirect) to the corresponding title in column B (.xlsx doc). This corresponds with the latest RfC on election/referendum article naming format and was created per request of Number 57.

The code itself is relatively straight forward, with most of the heavy lifting being handled by the mwclient Python library's move function, which is a part of the page object.

Due to the large number of page moves required, I would also request that the bot flag be assigned should this request be approved. The bot is not exclusion compliant as that is non-applicable given the context.

Discussion

  • Would it be possible for you to generate a randomized list of 100-500 articles to be moved, and the titles that they would be moved to in your userspace please? SQLQuery me! 01:11, 17 October 2018 (UTC)
    @SQL: Certainly! It can be viewed here: User:TheSandDoctor/sandbox2. The first 150 (of 151) were generated at complete random (though I did ensure no duplicates were chosen) by a Python script. Since there are over 35 thousand and only 150 selected at (near as computers can get) random (so ~0.43% of the articles, assuming my math is correct), I included a variant not shown by the randomly generated list (#151). --TheSandDoctor Talk 01:49, 17 October 2018 (UTC)
    Here is the random generation code, if you are interested. Incase it wasn't clear before, this is the script itself. I should really clean up the repo... --TheSandDoctor Talk 01:54, 17 October 2018 (UTC)
    This is a very large request. I'd love some more opinions from other recently active BAG members. Xaosflux, MusikAnimal, Anomie, The Earwig, Headbomb, Cyberpower678. SQLQuery me! 02:12, 17 October 2018 (UTC)
    If it is determined that throttling would be required, the addition of a wait timer would be relatively trivial. All I would need is the time you want it to wait between moves and that could be added rather painlessly. --TheSandDoctor Talk 02:04, 17 October 2018 (UTC)
    @SQL: my concern is determining if there was sufficient input to that RfC, it closed with ~69% support over only 16 editors despite a claim that it would be advertised to "as many relevant WPs as I can find". At the very least notice of this at WP:VPR would be a good "last chance" notification to editors at large. — xaosflux Talk 03:17, 17 October 2018 (UTC)
    I would support that. The absolute number of pages involved isn't very large for a bot, but it's the kind of change that can be very contentious, especially if people feel they weren't notified. I generally support the proposal, for what it's worth, but consensus isn't as clear as I would've liked. — Earwig talk 03:22, 17 October 2018 (UTC)
    It seems like a WP:VPR discussion and people reviewing this list would both be helpful here. Anomie 12:34, 17 October 2018 (UTC)
    Number 57 would you like to do the honours, or would you like me to? I agree that bringing attention to this BRFA at WP:VPR might be a good idea. --TheSandDoctor Talk 20:34, 17 October 2018 (UTC)
    I'll give it a go. Please amend what I've done if it's not right (never started a discussion there before). Cheers, Number 57 21:06, 17 October 2018 (UTC)
    I'll keep an eye out for it Number 57   --TheSandDoctor Talk 21:14, 17 October 2018 (UTC)
  • No objections here at running it at full speed. Getting it done as fast as possible would present minimal disruption to people's watchlists. We just need to make sure the renames work correctly. We wouldn't want a bunch of moves going to "Test move" or "Oopsie".—CYBERPOWER (Trick or Treat) 02:37, 17 October 2018 (UTC)
    "Test move" and "oopsie" are not in the list, I can assure you that   . --TheSandDoctor Talk 02:41, 17 October 2018 (UTC)
    Conventional wisdom is that you want to go slowly enough to allow for manual review and to minimize disruption if something goes wrong, especially for a task that is not time-sensitive like this one. Unless the bot flag doesn't work for moves (?), watchlist disruption shouldn't be an issue. I would recommend at least a few seconds between edits. — Earwig talk 03:16, 17 October 2018 (UTC)
    @Earwig: Easily done. The script is currently configured with a 4 second delay, but that could be changed in less time than it took to write this sentence. --TheSandDoctor Talk 04:02, 17 October 2018 (UTC)
  • I'm not convinced all of the proposed wordings are correct. For example, Oregon Ballot Measure 58 (2008) should probably not be moved. I see a lot of similar issues in the .csv with other propositions/ballot measures; e.g. California Proposition 10 (1998) going to 1998 California Proposition 10 seems wrong. There are some other strange wordings: should Polish presidential election, 1922 (special) go to 1922 Polish presidential election (special) as suggested or 1922 Polish presidential special election or similar? Perhaps we can come up with a tighter definition of the grammar for acceptable renames, like leaving titles with parentheses for manual review. — Earwig talk 02:54, 17 October 2018 (UTC)
    @The Earwig: If a title contains parenthesis anywhere, it could certainly be compiled into its own list, recorded, and skipped over in the move. Would only add a couple of lines. The thing is, for that sort of thing, I need well defined and clear rules in order to write the regex to test for. --TheSandDoctor Talk 02:57, 17 October 2018 (UTC)
    Right, and I'm not sure what that would look like yet. I noticed another strange phrasing, which would currently move Ohio's 13th congressional district election, 2006, to 2006 Ohio's 13th congressional district election. So at the very least, we can probably have extra eyes on titles with parentheses or apostrophes, and titles without "election" or "referendum"? — Earwig talk 03:16, 17 October 2018 (UTC)
  • The change file includes moving non-year values to the start of the title, the MOS doesn't appear to address this, nor did the RfC. e.g. French constitutional referendum, October 1946 (Guinea) --> October 1946 French constitutional referendum (Guinea). Do you mean to sort these with "O"? — xaosflux Talk 03:21, 17 October 2018 (UTC)
    @Xaosflux: The list was compiled by Number 57, so they would probably be the best to ask. That said, it does appear to be the case and does make logical sense, given the RfC and its approaches with the postfix years. The whole purpose of the RfC appears to be moving like this. Moving otherwise would not make sense. This is not to comment on the above point or "(Guinea)" not being moved, just "October 1946". --TheSandDoctor Talk 04:01, 17 October 2018 (UTC)
    @TheSandDoctor: so in this example, the SORTKEY is currently under "French con..", now it will be under "October" (any very specifically NOT under "1946") - unless additional sortkey adjustments are made. What is the category sorting goal? — xaosflux Talk 04:04, 17 October 2018 (UTC)
    @Xaosflux: Number 57 is going to have to answer that one. I just saw the WP:BOTREQ and made the bot. I will happily answer or give my assessment on the technical side of things (related to the script), but the excel document and the RfC was Number's brainchild. --TheSandDoctor Talk 04:09, 17 October 2018 (UTC); expanded for clarity 04:22, 17 October 2018 (UTC)
en dash issue is OK, covered with another bot. — xaosflux Talk 19:26, 17 October 2018 (UTC)
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

  • For new pages with "–" in the titles, now at the start for ranges, are you going to make redirects from the common redirect "-"? — xaosflux Talk 03:25, 17 October 2018 (UTC)
    Unless I'm missing something, or don't understand - I don't see any titles matching " - " in the docx. SQLQuery me! 03:52, 17 October 2018 (UTC)
    @SQL: For example new page will be July–August 1990 Bulgarian presidential election, may need a redirect from July-August 1990 Bulgarian presidential election created. — xaosflux Talk 04:02, 17 October 2018 (UTC)
    @Xaosflux: The word "may" could be problematic here as scripts can't do human thought. That would probably necessitate output to be posted in the user space for human editors to look over and make the call if we go down that route (definitely possible, would just need to decide between the bot posting it or it generating a file and myself periodically updating the page). Aside from that though (ignoring it momentarily, if you will), a regex could be crafted to scan a title looking for '–'s and then launch another method to create a redirect without (assuming '–'s are present). --TheSandDoctor Talk 04:14, 17 October 2018 (UTC)
    I'm only calling it out for commentary here, I'm not that current on MOS about dashes and don't want to fall asleep on my keyboard reading the MOS right now! — xaosflux Talk 04:20, 17 October 2018 (UTC)
    @Xaosflux: Not a problem and I hear you  . I am happy to work with the community on this and share the technical knowledge I have. I am hoping that we can iron out the details regarding this. I still believe that some sort of a bot is needed for this, should the RfC stand, since 35k+ articles is a tad too much to do by hand very easily. --TheSandDoctor Talk 04:27, 17 October 2018 (UTC)
    Isn't this already covered by Anomie Bot? ~ Amory (utc) 15:22, 17 October 2018 (UTC)
    I think you're right; as an example, when 2018–19 Southern Football League was created, Anomie Bot created 2018-19 Southern Football League a few hours afterwards. Number 57 16:11, 17 October 2018 (UTC)
    I looked through the bot's user page and did not see anything covering this BRFA. That said, after reading Number's response (which occurred while I was looking), I realize that that appears to have not been what you meant. In that case, then there probably wouldn't be any issues whatsoever on this particular point/thread. --TheSandDoctor Talk 16:22, 17 October 2018 (UTC)
    Yes, sorry — this was threaded to be in reply to Xaosflux. It's AnomieBOt 74, and works like a charm (too well, really; I see plenty of these at G8 patrolling). ~ Amory (utc) 19:24, 17 October 2018 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
  • To respond to some of the various points above:
  1. The RfC was advertised to the following WikiProjects: Elections and Referendums, Politics, Politics of the United Kingdom, U.S. Congress, Pakistani politics, New Zealand/Politics, Indian politics, Chinese politics and Australian politics; these are all the politics-related WikiProjects that I could find. On a vote-counting basis it is 69% in favour, but a couple of the oppose !votes are on dubious grounds (one being because an editor didn't believe that redirects appear in search, and another one who claimed they had never seen a year at the start of an article title), so I think the consensus (in terms of what a closing admin would determine) from the discussion is pretty undeniable.
  2. The proposed moves to 1998 California Proposition 10 etc are in line with the naming guideline (see the last bullet at WP:NC-GAL#Elections and referendums). The Oregon ones are currently incorrectly titled, so the move is to bring them in line with the guideline. We could have a conversation at a later date about whether the year is required at all for these types of articles (I'm not convinced it works), but currently they are in the guideline as such. If it's really a problem, perhaps we could drop them from the run?
  3. Polish presidential election, 1922 (special) is at the wrong title (one a few days before is at Polish presidential election, 9 December 1922, so the other one should be at Polish presidential election, 20 December 1922). This should therefore probably be moved to 20 December 1922 Polish presidential election;
  4. 2006 Ohio's 13th congressional district election is again a correct move in terms of the guideline (sixth bullet of WP:NC-GAL#Elections and referendums). The issue here is more around the awkward naming of the districts (e.g. Ohio's 13th congressional district) and is perhaps something that should be raised separately;
  5. The RfC did include discussion about titles that would start with a month (see Impru20's comments). The sortkey for articles like this would still be the year, followed by a numeral representing the month (e.g. "1946 1", "1946 2" for elections held in two separate months in 1946)
Cheers, Number 57 09:02, 17 October 2018 (UTC)
Looks like some bot coding is going to be needed to add/alter sort keys following the moves for new titles not starting with years. — xaosflux Talk 10:56, 17 October 2018 (UTC)
I'm not sure this is really needed; the current format means that articles don't automatically sort by year, so in many cases a sortkey has already been added. For instance, French constitutional referendum, October 1946 (Guinea) mentioned above is sorted in Category:Referendums in Guinea by the key 1946. This might be something better to do manually for the 380 articles with months in the year if not already in place. Number 57 13:00, 17 October 2018 (UTC)
For the other 4 categories it is in it is sorted only by page title such as Category:October 1946 events. I'm not sure what the 'best' answer for this is, but if doing it manually is the way that could be done prior to the page title moves to prevent issues in category view. — xaosflux Talk 13:08, 17 October 2018 (UTC)
I think the question there is what is the best category sortkey for the article in Category:October 1946 events. Number 57 13:10, 17 October 2018 (UTC)
Pages can be sorted differently in each category with a directive, but they only get one "default sort" for all undirected categories, so if in general sorting these be "month name" is undesirable a default sort should be defined for what the best general sorting should be. — xaosflux Talk 20:43, 17 October 2018 (UTC)
There doesn't seem to be any firm conventions around this – some articles starting with a year are sorted by the year in the events categories, and others by the first word after the year. However, in election categories sorting by year would definitely be desirable, so I guess the year followed by the month would be the best sorting (e.g. 1946 01, 1946 02 to 1946 12). Happy to add a DEFAULTSORT manually to these articles if it will resolve this concern for you? Cheers, Number 57 21:06, 17 October 2018 (UTC)
WP:NCGAL is a guideline, which like other guidelines says prominently at the top It is a generally accepted standard that editors should attempt to follow, though it is best treated with common sense, and occasional exceptions may apply. The use of a bot on such a huge scale precludes the consideration of exceptions. --BrownHairedGirl (talk) • (contribs) 03:03, 19 October 2018 (UTC)
    • PS The discussion above about some exceptions such as the naming of some Polish elections should not be conducted on this page. I am sure that @Number 57 is making recommendations on a well-reasoned basis, but decisions on the titles of individual articles should not buried in a technical page such as this; they belong at WP:RM. --BrownHairedGirl (talk) • (contribs) 03:08, 19 October 2018 (UTC)
      • This isn't RFA. Bolded "votes" aren't needed, or helpful. I do believe that this idea needs more discussion, and I'm surprised (in a good way!) at how much attention, and the overall quality of comments that this request is getting. I'm not sure about listing ~35,000 pages at WP:RM. I could see some parties seeing that as disruptive. SQLQuery me! 03:28, 19 October 2018 (UTC)
        • @SQL: no mater how many good comments there are here, this remains a technical page whose remit is to decide whether and how to use bots to implement a consensus. It is not a suitable place to form a consensus on whether to bypass WP:RM on this scale; that is right outside the remit of WP:BAG. --BrownHairedGirl (talk) • (contribs) 03:42, 19 October 2018 (UTC)
          • Right. I have an idea of how this works. I've been around a BRFA or two. You can see comments above questioning the RFC by other BAG members. I'm not sure where you get the idea that this is just going to pass without addressing that. SQLQuery me! 04:15, 19 October 2018 (UTC)
            • My point is simply that BRFA is not the place to address those issues. BRFA's role starts when they have been resolved. --BrownHairedGirl (talk) • (contribs) 04:50, 19 October 2018 (UTC)
  • I have raised this at WP:Village pump (policy)#Mass_renaming_of_election_articles,_bypassing_WP:Requested_moves. --BrownHairedGirl (talk) • (contribs) 03:36, 19 October 2018 (UTC)
  • Using some regex magic, I have split the original csv (which is still in the repository) into two camps. format1.xlsx (and .csv) contain the "odd ball" formats which could conceivably be the more contentious of the two groups, given the above discussion. format2.xlsx (and .csv) contain the "election(s), year" (where "year" is the end of the title), which appear to be less contentious per the above. While that doesn't necessarily lessen the problems above, they are now in two distinct datasets easier analyzed. It appears that 21972 are in the latter of the two, with 13253 being in format1. It should, however, be noted that the format1 dataset could be trimmed down by multiple thousands further if the words "referendum", "measures", and each of the state names were considered in the format (instead of just "election(s)"). --TheSandDoctor Talk 05:06, 19 October 2018 (UTC)
    If anyone wants it, I will split based on those words in 2 other files as well. A bot could conceivably run through format2 and leave the first for human intervention as it appears few (if any) concerns were raised about that case directly. That said, I made this bot based on the original bot request above, which was simple to implement, and submitted it accordingly. If another RfC is desired to further solidify the consensus and address concerns raised, I am all for it and do not wish to rush anything. Pinging all active participants: Number 57, Xaosflux, MusikAnimal, Anomie, The Earwig, Headbomb, Cyberpower678, SmokeyJoe, BrownHairedGirl, SQL. (hope that's everyone, if I missed anyone I apologize). --TheSandDoctor Talk 05:40, 19 October 2018 (UTC)
    Thanks, @TheSandDoctor. I am sure that you acted in good faith in respinse to the bot request. However, I do think that this request should be placed on hold pending a fresh RFC, and that the bot should not be run unless and until it is clear that there is a v broad consensus on a) the guidelines, and b) the use of a bot to bypass RM for >35k articles. I really don't see any basis for asserting that there is a consensus to rename e.g. the 860 relevant articles under WikiProject Ireland's scope with zero notification to WP:IRE or on any one of the 860 article pages. --BrownHairedGirl (talk) • (contribs) 05:55, 19 October 2018 (UTC)
    User:TheSandDoctor, I suggest putting a random selection, maybe ten, through the standard RM process. This will draw in critical comments from RM regulars. --SmokeyJoe (talk) 06:48, 19 October 2018 (UTC)
    @SmokeyJoe: I think that's a good idea, but wildly insufficient. This needs to go way beyond the RM regulars, who are few in number. And Joe, you are rightly critical of how CFD tends to be dominated by regulars. Same goes here.
    This needs to draw in editors who who have sufficient experience of each sub-topic (e.g. Spanish local elections, or Kenyan parliamentary elections) to assess how the broad principle works in their field and hopefully to look for any exception. --BrownHairedGirl (talk) • (contribs) 07:40, 19 October 2018 (UTC)
    Put ten through RM this week, then go back to RFC next week. The previous RFC was pretty sad in drawing attention, despite the attempt at publicising. —SmokeyJoe (talk) 08:14, 19 October 2018 (UTC)
  • The RfC supposedly supporting these mass moves looks very very dubious on a quick glance. The closing statement is terribly inadequate. This looks a tad overenthusiastic. 35,226 page moves with nothing mentioned at WP:RM? --SmokeyJoe (talk) 05:14, 19 October 2018 (UTC)
  • I will echo what Joe already said. That discussion is clearly insufficient for a change of this magnitude coupled with the bland closure. To avert unnecessary crisis I will suggest a new RFC on Village pump with detailed rationale and be well advertised . –Ammarpad (talk) 08:31, 19 October 2018 (UTC)
  • I knew nothing about this proposed task until today when it was mentioned at WT:RM. Given the number of pages involved it would be best to advertise this in all the relevant places where editors of such articles would be watching and get further input. Thanks. — Frayæ (Talk/Spjall) 08:39, 19 October 2018 (UTC)
  • Number 57 and others, for the record I have now changed my vote from oppose to support. However, I don't agree with my oppose vote being characterised in that way. I would generally advise against editors attempting to explain the reasons for other people's votes. Onetwothreeip (talk) 11:23, 19 October 2018 (UTC)
  • Strong oppose. This proposal does not have an adequate consensus given the large number of pages concerned, long-standing titles, and the high profile nature of the pages. The proposal should be tested in a few RMs first to see if it really has consensus.  — Amakuru (talk) 22:52, 19 October 2018 (UTC)
  • Number 57, I'm sorry, but I feel a bit misled that you cite WP:NCGAL as justification for moving California Proposition 10 (1998) to 1998 California Proposition 10, when it was you who changed the guideline in response to the RFC four days ago. I don't see any discussion in the RFC that supports this unnatural wording. Perhaps I am being pedantic, but I think this is an important distinction because "California Proposition 10" is a proper noun that external sources use directly, while "1998 California Proposition 10" really isn't. If there was only one Proposition 10 in California, I see a strong argument for excluding the year (c.f. California Proposition 46, though there aren't many examples), further supporting that the year acts as disambiguation and not as part of the proper name. I'm open to discussing this point further, but I don't feel it's clear-cut enough for the bot. — Earwig talk 02:36, 20 October 2018 (UTC)
    • Hi Earwig. I'm not really sure what the issue is here. The guideline previously stated that propositions should be of the title format "California Proposition 10, 1998" (the article itself is not named correctly according to the guideline by using parentheses). The RfC proposal was to move the year from the end to the start of the title, so California Proposition 10, 1998 would therefore become 1998 California Proposition 10. Number 57 10:40, 20 October 2018 (UTC)

Galobot 2

Operator: Galobtter (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 10:06, Tuesday, October 16, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python/Pywikibot

Source code available: https://github.com/galobtter/galobot

Function overview: Message users who add broken file links to articles

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#CAT:MISSFILE_bot; Wikipedia:Bots/Requests_for_approval/RonBot_12

Edit period(s): Daily

Estimated number of pages affected: ~10-20 a day

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Sends a talk page message to auto-confirmed users who add broken (red linked) file links to mainspace pages, by scanning CAT:MISSFILE. Mechanism is similar to Wikipedia:Bots/Requests for approval/DPL bot 2. Runs daily, seeing what new red linked files have been added, and messages the user who added them if they are auto-confirmed; doesn't message non-autoconfirmed users as they are likely vandals/wouldn't know how to fix the link. Most people who break file links are IPs/non-autoconfirmed so of the 70 or so broken links added each day I estimate only ~10 people will be messaged per day.

Figures out what image is broken and who did it using mw:API:Images and mw:API:Parse to get file links and finds out the revision in which the broken file link was added.

Message sent will be something like:

Hello. Thank you for your recent edits. An automated process has found that you have added a link to a non-existent file File:Hydril Compact BOP Patent.jpg to the page Blowout preventer in this diff. If you can, please remove or fix the file link.

You may remove this message. To stop receiving these messages, see the opt-out instructions. Galobtter (pingó mió) 10:06, 16 October 2018 (UTC)

Discussion

  • Consider this scenario: User A uploads a file and adds it to an article. A vandal (User B) blanks the page and User C reverts. Later, User D deletes the file. Who would be notified?
Note that it may take forever before pages with recently deleted files show up in Category:Articles with missing files so consider obtaining a list of articles from a database report and purging those so that the category is updated before you start notifying users. --Stefan2 (talk) 10:38, 16 October 2018 (UTC)
Thanks for the comment! In this case, nobody, because it skips cases where the file has been added and then removed and then added, i.e where the file has been added more than once. However if User A adds a file and later User B deletes the file, it'll notify User A, but only if that revision occurred within 24 hours before being listed in CAT:MISSFILE as it only checks the revisions that have occurred since the last run 24 hours ago. I was thinking previously, whether it should skip cases where the file has been deleted after a user adds a file? (can check deletion logs). Galobtter (pingó mió) 11:27, 16 October 2018 (UTC)
Actually, checking the deletion logs seems pretty necessary since the bot probably shouldn't spam people if FileDelinkerBot/CommonsDelinkerBot goes down. Will add Galobtter (pingó mió) 11:51, 16 October 2018 (UTC)

Bots in a trial period

EranBot 3

Operator: ערן (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:07, Saturday, September 15, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: source on github

Function overview:: This bot submits newly added text to the iThenticate API which determines if other sources are similar to it. Suspected copyvios (>50% similarity) can then reviewed manually (copypatrol; top reviewers: Diannaa, Sphilbrick, L3X1). In this BRFA I would like to ask to join it to copyviobot group, to access pagetriagetagcopyvio API which will be used by PageCuration extension aka Special:NewPagesFeed (see phab tasks).

Links to relevant discussions (where appropriate): prev BRFA, copyviobot, Epic task for copyvio in new pages feed (and subtasks)

EditActive period(s): Continuous

Estimated number of pages affected: N/A. The bot will tag suspected edits using API. This may be used by special page Special:NewPagesFeed.

Namespace(s): main namespace and drafts (the bot is not editing them, but may check them for copy)

Exclusion compliant (Yes/No): N/A

Function details:

  • any diff (except rollbacks) in main and draft NS which adds large chunck of text may be a subject for copyvio check
  • Copyvio check is done using iThenticate service (WP:Turnitin who kindly provided us access to their service)
  • Changes that are similar to existing text in external source are reported (can be reviewed in https://tools.wmflabs.org/copypatrol/en ) so users can further review them manually.
  • (new) By adding the bot to copyviobot group, it will be possible to access to suspected diffs more easily from Special:NewPagesFeed later

Eran (talk) 16:07, 15 September 2018 (UTC)

Discussion

48% of the edits reported as suspected copyvio required additional follow up ("page fixed"). In tools.labsdb:select status, count(*) from s51306__copyright_p.copyright_diffs group by status;

The full details how it is going to be shown in Special:NewPagesFeed would probably need to be discussed with community and with Growth team (MMiller, Roan Kattouw) - however, it is already possible to see an example in beta test wiki (search for "copyvio"). It would be important to note tagged page just means an edit may contain copied text (such edits may be OK [CC-BY content from government institutions], copyright violation [copy & paste from commercial news service] or promotional content [may be legally OK sometimes, but violates WP:Promo). Eran (talk) 16:07, 15 September 2018 (UTC)

It isn't sinking in how this fits in with the CopyPatrol activities. I'd like to discuss this further. Please let me know if this is a good place to have that discussion or if I should open up a discussion on your talk page or elsewhere.--S Philbrick(Talk) 18:03, 15 September 2018 (UTC)
Sphilbrick: I think it is relevant in this discussion, can you please elaborate? thanks, Eran (talk) 19:30, 15 September 2018 (UTC)
I start with a bit of a handicap. While I understand the new pages feed in a very broad sense, I haven't actually worked with it in years and even then had little involvement.
It appears to me that the goal is to give editors who work in the new page feed a heads up that there might be a copyvio issue. I've taken a glance at the beta test wiki — I see a few examples related to copyvios. I see that those entries have a link to CopyPatrol. Does this mean that the new page feed will not be directly testing for copyright issues but will be leaning on the copy patrol feed? I checked the links to copy patrol and found nothing in each case which may make sense because those contrived examples aren't really in that report, but I would be interested to know exactly how it works if there is an entry.
The timing is coincidental. I was literally working on a draft of a proposal to consider whether the copy patrol tools should be directly making reports to the editors. That's not exactly what's going on here but it's definitely related.
What training, if any is being given to the editors who work on the new pages feed? Many reports are quite straightforward, but there are a few subtleties, and I wonder what steps have been taken to respond to false positives.--S Philbrick(Talk) 19:57, 15 September 2018 (UTC)
CopyPartol is driven by EranBot with checks done by iThenticate/Turnitin. This BRFA is to send revision IDs with possible violations to the API, which will cause the CopyPatrol links to be shown in the new pages feed. — JJMC89(T·C) 04:53, 16 September 2018 (UTC)
Sphilbrick: thank you for the good points.
  • Regarding training for handling new pages feed and copyvios - I was about to suggest to document it, but actually it is already explained in Wikipedia:New pages patrol#Copyright violations (WP:COPYVIO) quite well (but we may want to update it later)
  • Directly making reports to the editors - This is good idea, and actually it was already suggested but was never fully defined and implemented - phab:T135301. You are more than welcome to suggest how it should work there (or in my talk page and I will summarize the discussion on phabricator).
Eran (talk) 18:40, 16 September 2018 (UTC)
Thanks for the link to the training material. I have clicked on the link to "school" thinking it would be there, but I now see the material in the tutorial link.
Regarding direct contacts, I'm in a discussion with Diannaa who has some good reasons why it may be a bad idea. I intend to follow up with that and see if some of the objections can be addressed. Discussion is [[|User_talk:Diannaa#Copyright_and_new_page_Patrol|here]].--S Philbrick(Talk) 18:54, 16 September 2018 (UTC)
@Sphilbrick: thanks for the questions, and I'm sorry it's taken me a few days to respond. It looks like ערן has summarized the situation pretty well, but I'll also take a stab. One of the biggest challenges with both the NPP and AfC process is that there are so many pages that need to be reviewed, and there aren't good ways to prioritize which ones to review first. Adding copyvio detection to the New Pages Feed is one of three parts of this project meant to make it easier to find both the best and worst pages to review soonest. Parts 1 and 2 are to add AfC drafts to the New Pages Feed (being deployed this week), and to add ORES scores on predicted issues and predicted class to the feed for both NPP and AfC (being deployed in two weeks). The third part will add an indicator next to any pages who have a revision that shows up in CopyPatrol, and those will say, "Potential issues: Copyvio". Reviewers will then be able to click through to the CopyPatrol page for those revisions, investigate, and address them. The idea is that this way, reviewers will be able to prioritize pages that may have copyvio issues. Here are the full details on this plan. Xaosflux has brought up questions around using the specific term "copyvio", and I will discuss that with the NPP and AfC communities. Regarding training, yes, I think you are bringing up a good point. The two reviewing communities are good at assembling training material, and I expect that they will modify their material as the New Pages Feed changes. I'll also be continually reminding them about that. Does this help clear things up? -- MMiller (WMF) (talk) 20:32, 20 September 2018 (UTC)
Yes, it does, thanks.--S Philbrick(Talk) 21:37, 20 September 2018 (UTC)
  • User:ערן how will your bot's on-wiki actions be recorded (e.g. will they appear as 'edits', as 'logged actions' (which log?), etc?). Can you point to an example of where this get recorded on a test system? — xaosflux Talk 00:22, 16 September 2018 (UTC)
    Xaosflux: For the bot side it is logged to s51306__copyright_p on tools.labsdb but this is clearly not accessible place. It is not logged on wiki AFAIK - If we do want to log it this should be done in the extension side. Eran (talk) 18:40, 16 September 2018 (UTC)
    phab:T204455 opened for lack of logging. — xaosflux Talk 18:48, 16 September 2018 (UTC)
Thanks, Xaosflux. We're working on this now. -- MMiller (WMF) (talk) 20:33, 20 September 2018 (UTC)
  • I've never commented on a B/RFA before, but I think that another bot doing copyvios would be great, esp if it had less false positives than the current bot. Thanks, L3X1 ◊distænt write◊ 01:12, 16 September 2018 (UTC)
    • L3X1: the Page Curation extension defines infrastructure for copyvio bots - so if there are other bots that can detect copyvios they may be added to this group later. AFAIK the automated tools for copyvio detection are Earwig's copyvio detector and EranBot/CopyPatrol and in the past there was also CorenSearchBot. The way it works is technically different (one is based on a general purpose search using Google search, one is based on Turnitin copyvio service) and they are completing each other with various pros and cons for each. I think Eranbot works pretty well (can be compared to Wikipedia:Suspected copyright violations/2016-06-07 for example)
    • As for the false positives - it is possible to define different thresholds for the getting less false positives but also missing true positives. I haven't done a full Roc analysis to tune all the parameters but the arbitrary criteria is actually works pretty well somewhere in the middle ground. Eran (talk) 18:40, 16 September 2018 (UTC)
  • Follow up from BOTN discussion, from what has been reviewed so far, the vendor this bot will get results from can check for "copies" but not necessarily "violations of copyrights" (though some copies certainly are also copyvios), as such I think all labels should be limited to descriptive (e.g. "copy detected"), as opposed to accusatory (humans should make determination if the legal situation of violating a copyright has occured). — xaosflux Talk 01:30, 16 September 2018 (UTC)
    That would be part of the new pages feed, which the bot doesn't control. Wikipedia talk:WikiProject Articles for creation/AfC Process Improvement May 2018 or Phabricator would be more appropriate venues for discussing the interface. — JJMC89(T·C) 04:53, 16 September 2018 (UTC)
    @JJMC89: what I'm looking for is where is a log of what this bot does control. As this is editor-managed, its not unreasonable to think another editor may want to run a similar or backup bot in the future. — xaosflux Talk 05:14, 16 September 2018 (UTC)
  • Would it be possible to assign a number of bytes to "large chunck of text"? SQLQuery me! 02:25, 16 September 2018 (UTC)
    500 bytes. — JJMC89(T·C) 04:53, 16 September 2018 (UTC)
  • Procedural note: The components for reading changes, sending data to the third party, and making off-wiki reports alone do not require this BRFA; making changes on the English Wikipedia (i.e. submitting new data to our new pages feed, etc) are all we really need to be reviewing here. Some of this may have overlap (e.g. what namesapces, text size, etc), however there is nothing here blocking the first 3 components alone. — xaosflux Talk 18:54, 16 September 2018 (UTC)
  • @MMiller (WMF): any update on verbiage related to phab:T199359#4587185 ? — xaosflux Talk 18:35, 2 October 2018 (UTC)
    • @Xaosflux: If the verbiage issue is resolved, I was wondering if we could move ahead with a trial for this BFRA. The way that PageTriage works is that it won't allow bots to post copyvio data to it unless the bot belongs to the "Copyright violation bots" group. So for the trial, you'll need to add EranBot to the group with whatever expiration time you like. It would be good to have at least a couple days so that we can make sure everything is running properly on our end as well. Ryan Kaldari (WMF) (talk) 17:35, 4 October 2018 (UTC)
    • @Xaosflux: ping. Ryan Kaldari (WMF) (talk) 17:12, 10 October 2018 (UTC)
  • Thank you, pending community closure at WP:VPP. As far as a trial goes, any specific day you would like to do the live run? — xaosflux Talk 03:54, 14 October 2018 (UTC)
  • Closed the VPP thread as succesful. WBGconverse 06:23, 14 October 2018 (UTC)
{{OperatorAssistanceNeeded}} In prepping for a live trial, what day(s) would you like to do this? I want to make sure we send notices to Wikipedia:New pages patrol and perhaps a note at MediaWiki:Pagetriage-welcome. — xaosflux Talk 13:48, 15 October 2018 (UTC)
Xaosflux: Would it work to run with reports between 16 October ~16:00 UTC time - 17 October ~16:00 UTC ? Eran (talk) 15:23, 15 October 2018 (UTC)
That sounds good for us. What do you think Xaosflux? Ryan Kaldari (WMF) (talk) 17:24, 15 October 2018 (UTC)

Trial

  •   Approved for trial (7 days). I've added the cvb flag for the trial and let the NPP/Reviewers know. Do you have a good one line text that could be added to MediaWiki:Pagetriage-welcome to help explain things and point anyone with errors here? — xaosflux Talk 18:41, 15 October 2018 (UTC)
User:Ryan Kaldari (WMF), I don't actually see an option for using this filter in Special:NewPagesFeed - is it hidden because there are none currently? — xaosflux Talk 19:30, 15 October 2018 (UTC)
I'm not seeing on betalabs either - how is anyone going to actually make use of this? — xaosflux Talk 19:32, 15 October 2018 (UTC)
I was guessing it would show in the filters under "potential issues", but there's nothing there. FWIW, "attack" also has no articles, but is still shown there. I think I might be misunderstanding how this works altogether. Natureium (talk) 19:39, 15 October 2018 (UTC)
@Natureium: just regarding the "attack" filter having no pages, that is behaving correctly. It is very rare that a page gets flagged as "attack", because whole pages meant as attack pages are rare. It's much more common for pages to be flagged as "spam", and you can see some of those in the feed. To see some flagged as attack, you can switch to "Articles for Creation" mode and filter to "All" and "attack". -- MMiller (WMF) (talk) 22:52, 15 October 2018 (UTC)
@Xaosflux and Natureium: During the trial period, they will need to add "copyvio=1" to the Special:NewPageFeed URL to see the interface changes. So https://en.wikipedia.org/wiki/Special:NewPagesFeed?copyvio=1. Nothing has been tagged as a potential copyvio yet, so not much to see at the moment. Ryan Kaldari (WMF) (talk) 20:16, 15 October 2018 (UTC)
I added a note at Wikipedia talk:New pages patrol/Reviewers with the above info. Ryan Kaldari (WMF) (talk) 20:20, 15 October 2018 (UTC)
@Ryan Kaldari (WMF): thank you, I included a link to that in the header for Special:NewPagesFeed to help guide any testers. — xaosflux Talk 20:53, 15 October 2018 (UTC)
@Xaosflux: thanks for your help here and for adding that editor. Roan Kattouw (WMF) edited it to link to the project page for this work so that people coming across it have some more context. -- MMiller (WMF) (talk) 22:52, 15 October 2018 (UTC)
  • Regarding the scope of this bot, User:ערן / User:Ryan Kaldari (WMF) the function overview calls for this to run against "newly added text", but the trials suggest it is only running against newly added pages - is this limited to new pages? — xaosflux Talk 13:37, 17 October 2018 (UTC)
    • Xaosflux: the bot runs against any added text and reports for suspected edits to CopyPatrol and to pagetriage. Page triage accepts only edits for pages in the NewPagesFeed (e.g new pages). Eran (talk) 14:43, 17 October 2018 (UTC)
      • Thanks, I'm not concerned with updates offwiki (such as to CopyPatrol) for this BRFA, just trying to clarify when activity will actually be made on-wiki. For example with page [[Sarkar (soundtrack)]: it was created on 20181007, and a revision made today (20181017) triggered the bot action. Are you only attempting actions for pages where the creation is within a certain timeframe? — xaosflux Talk 15:07, 17 October 2018 (UTC)
        • Xaosflux: No it doesn't care for the page creation time (page creation time isn't that meaningful for drafts). However this is viewable only in Special:NewPagesFeed which is intended for new pages, but I'm not sure what is the definition of new page for the feed (User:Ryan Kaldari (WMF) do you know?). Eran (talk) 16:32, 17 October 2018 (UTC)
          • Will these be usable for recent changes or elsewhere, to catch potential copyvios being introduced to 'oldpages' ? — xaosflux Talk 16:36, 17 October 2018 (UTC)

Handling Wikipedia mirrors

(  Buttinsky) I know very little about the technical aspect of this, so if I need to just pipe down I will, but No it doesn't care for the page creation time (page creation time isn't that meaningful for drafts) is one of the main problems that exists with Turnitin-based CopyPatrol. Dozens upon dozens of revisions are flagged as potential copyvios even though the diff in question did not add the text that is supposedly a copyvio; most of the time it seems that if anyone edits a page that has a Wikipedia mirror (or whose text someone else has wholesale lifted), the edit will be flagged no matter how small it is. Most of the 6000-some cases I've closed have been false positives along those lines, and I think it might be of some use to make the software skip any cases where the editor has more than 50,000 edits. Thanks, L3X1 ◊distænt write◊ 02:46, 18 October 2018 (UTC)

L3X1, thank you for the comment. I think this is a good comment that should be addressed and discussed in a separate subsection; I hope this is OK.
Historically, EranBot detects Wikipedia mirrors (see the example in User:EranBot/Copyright/rc/44; look for "Mirror?"), where the intention is also to handle copyvio within Wikipedia. That is, if a user copies content from another article, they should give credit in the edit summary (e.g. a sufficient credit in a summary: "Copied from Main page"; a minimal sketch of such a credit check follows below).
This is a common case, and somewhat different from copying from an external site/book. I think the CopyPatrol interface doesn't show this mirror indication (or other indications, such as CC-BY). So how should we address it:
  • Does the community want reports on "internal copyvio" (copying within Wikipedia/Wikipedia mirrors) without credit? (If not, this can be disabled, and we will no longer get reports on such edits.)
  • If the community does want reports for "internal copyvio":
    • We can add the hints on the CopyPatrol side (Niharika and MusikAnimal) if this isn't done already. (I think it isn't.)
    • It is up to the community whether we want a distinct label for this in the NewPagesFeed.
(based on community input here, this will be tracked technically in phab:T207353)
Eran (talk) 05:29, 18 October 2018 (UTC)
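
For illustration only, a tiny Ruby sketch in the spirit Eran describes (hypothetical, not EranBot's actual code), treating an internal copy as credited when the edit summary mentions copying from another page:

    # Returns true if the edit summary appears to credit an internal copy,
    # e.g. "Copied from [[Main page]]" or "copied content from Main Page".
    def credited_copy?(edit_summary)
      !!(edit_summary =~ /copied (?:content )?from/i)
    end

    credited_copy?('Copied from [[Main page]]')  # => true
    credited_copy?('fix typo')                   # => false
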
Though I suspect this will be a minority viewpoint, I don't think detecting copying within Wikipedia is as important as catching copying from external sources. Thanks, L3X1 ◊distænt write◊ 16:56, 18 October 2018 (UTC)
  • If these are only page creations, I think this would be useful for finding pages that have been copy-pasted from other articles, because this also requires action. Natureium (talk) 16:59, 18 October 2018 (UTC)

MusikBot II 2

Operator: MusikAnimal (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 03:23, Monday, September 24, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Typically I use Ruby, but here it may have to be PHP or possibly Node.js.

Source code available: GitHub

Function overview: Syncs Wikipedia:Geonotice/list.json (to be created, fully-protected) with MediaWiki:Gadget-geonotice-list.js.

Links to relevant discussions (where appropriate): Special:PermaLink/862124571#Geonotices closed discussion, support of usage at Wikipedia talk:Interface administrators (see also RFC for IAdmins at top of that page allowing bot access where bot operator is also an IAdmin)

Edit period(s): Continuous

Estimated number of pages affected: 1

Namespace(s): MediaWiki

Exclusion compliant (Yes/No): No, not applicable.

Adminbot (Yes/No): Yes

Function details: First, some background: With the advent of the interface administrator user group, sysops can no longer edit MediaWiki:Gadget-geonotice-list.js. Many of these users are not particularly tech-savvy, and have no use for editing site-wide JS outside configuring geonotices for outreach purposes, etc. The configuration is literally just a JavaScript object with key/value pairs, so using a JSON page, which they would be able to edit, makes much more sense. However, we currently cannot put JSON pages behind ResourceLoader (phab:T198758), so for performance reasons we need to continue to maintain the JS page. The proposed workaround is to have a bot sync a JSON page with the JS page. This is in our best interests for security reasons (fewer accounts with access to site JS), and JSON is also easier to work with and gives you nice formatting, making it less prone to mistakes.

Implementation details:

  1. Check the time of the last edit to Wikipedia:Geonotice/list.json.
  2. If it is after the time of the last sync by the bot (tracked by local caching), process the JSON.
  3. Perform validations, which include full JSON validation, validating the date formats, country code (going off of ISO 3166), and format of the corners.
  4. If validations fail, report them at User:MusikBot II/GeonoticeSync/Report (example) and do nothing more.
  5. If validations pass, build the JS and write to MediaWiki:Gadget-geonotice-list.js (example), and update the report stating there are no errors (example).

The comment block at the top of MediaWiki:Gadget-geonotice-list.js can be freely edited. The bot will retain this in full.
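
As a rough illustration of the steps above, a minimal Ruby sketch (helper names such as fetch_last_edit_time, fetch_page, validate, build_js, write_report, and write_js_page are hypothetical placeholders, not MusikBot's actual code):

    require 'json'
    require 'time'

    CACHE_FILE = 'last_sync.txt'  # hypothetical local cache of the last sync time

    def sync_geonotices
      # Steps 1-2: only process if the JSON page changed since the last sync
      last_edit = fetch_last_edit_time('Wikipedia:Geonotice/list.json')  # hypothetical API helper
      last_sync = File.exist?(CACHE_FILE) ? Time.parse(File.read(CACHE_FILE)) : Time.at(0)
      return if last_edit <= last_sync

      # Step 3: full JSON validation happens implicitly in JSON.parse;
      # field-level checks (dates, country codes, corners) happen in validate
      config = JSON.parse(fetch_page('Wikipedia:Geonotice/list.json'))
      errors = validate(config)

      if errors.any?
        write_report(errors)  # Step 4: User:MusikBot II/GeonoticeSync/Report
      else
        write_js_page(build_js(config))  # Step 5: MediaWiki:Gadget-geonotice-list.js
        write_report([])
        File.write(CACHE_FILE, Time.now.utc.iso8601)
      end
    end
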

Discussion

Per Xaosflux I've preemptively started this BRFA. I haven't done any coding but I think this is a good time to discuss implementation details. Concerns that come to mind:

  • What to do when there are syntax errors. The native JSON editor should mean admins won't introduce syntax errors, because it won't even let you save. But it can happen -- say the admin ironically has JavaScript disabled. As a safeguard, the bot can validate the JSON, too (easy, thanks to existing libraries). Similar to User:MusikBot/PermClerk/Report, the bot could have a status report page, transcluded at Wikipedia talk:Geonotice/list.json. This way they can get some debugging info should something go wrong. If we want to get real fancy, the bot could also report when the configuration doesn't match the expected format, as described in the comments at MediaWiki:Gadget-geonotice-list.js. I think that would be a nice feature, but not a requirement.
  • After deployment, we'd need to update the JS page to clearly say it should not be edited directly. We could do two-way syncing, but I'd prefer not to, just to keep it simple.
  • I can confirm MusikBot II's account is secure and 2FA is enabled (with some caveats). The bot solution still puts us on the winning end, as there will be fewer int-admin accounts than if we added it to all who manage geonotices.
  • Anything else? MusikAnimal talk 03:23, 24 September 2018 (UTC)
  • @MusikAnimal: for the directional sync concerns, a defined "section" (delineated by comments) should be the only area edited - this section should be marked "do not edit directly" - and the bot should only edit within the section. This way if other changes to the page are needed they won't interfere. — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • This should work fine, just like her PERMclerking, right? Would be good if there are rush edits, last-minute-changes, etc. ~ Amory (utc) 16:22, 24 September 2018 (UTC)
    • Yeah, we can definitely reserve a part of the JS page for free editing, much like we do at WP:AWB/CP. MusikAnimal talk 16:41, 24 September 2018 (UTC)
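
A minimal Ruby sketch of such section-scoped editing (the marker strings are hypothetical; the point is that the bot refuses to touch anything outside its delineated region):

    BOT_START = '// BOT MANAGED SECTION -- DO NOT EDIT DIRECTLY'.freeze
    BOT_END   = '// END BOT MANAGED SECTION'.freeze

    # Replaces only the bot-managed region of the JS page, leaving the
    # freely editable parts (e.g. the comment block) untouched.
    def replace_bot_section(page_text, generated_js)
      pattern = /#{Regexp.escape(BOT_START)}.*?#{Regexp.escape(BOT_END)}/m
      raise 'Markers not found; refusing to edit' unless page_text.match?(pattern)
      page_text.sub(pattern, "#{BOT_START}\n#{generated_js}\n#{BOT_END}")
    end
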
  • I'd like to see some tests over at testwiki that can be used to demonstrate the edits. — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • No problem. Though I don't think we need to test Geonotice itself (could be tedious), rather just that the JS was generated properly. MusikAnimal talk 16:41, 24 September 2018 (UTC)
      • Agree, don't need to actually implement the geonotice, just that things work as expected in the namespaces and content types. — xaosflux Talk 01:21, 25 September 2018 (UTC)
  • Syntax errors could still occur in the data - will you validate this as well? For example putting start/end dates in their own cells, validate that this is date data and not something else? Everything should be validated (e.g. this should not be a route to inject new javascript). — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • Perhaps make the mock-up json page to demonstrate? — xaosflux Talk 04:25, 24 September 2018 (UTC)
    • JS injection shouldn't be possible, unless there are vulnerabilities in Geonotice itself. I would hope it doesn't use eval on the strings. Arbitrary code (e.g. begin: alert('foo!')) isn't valid JSON and hence would fail on the initial validation (and the MediaWiki JSON editor won't let you save it, either). We can still validate it ourselves, to be sure. As I said this would be a nice feature. I don't know that I want to validate things like the country, though. We could validate the 'begin'/'end' date format, in particular, but for everything else I think the bot will just look for permissible keys and the data type of the values ('country' is a string, 'corners' is an array of two arrays, each with two integers). MusikAnimal talk 16:41, 24 September 2018 (UTC)
      • Injection would be if you accepted arbitrary "text" and just made it js, where the text could contain characters that would terminate the text field and then continue along in javascript. — xaosflux Talk 17:11, 24 September 2018 (UTC)
  • For the JSON page, not the bot: we'll also have to move the normal explanation text into an editnotice or regular notice, since comments are stripped on save for pages with the JSON content model. Enterprisey (talk!) 23:32, 24 September 2018 (UTC)
  • Got a prototype working, see Special:Diff/861109475. There are quotation marks around all the keys, but JavaScript shouldn't care. Maybe we should test against testwiki's Geonotice to be sure. This does mean the rules have changed -- you don't need to escape single quotes ', but you do for double quotation marks ". This is just a consequence -- that's how JSON wants it. I think single quotes are probably more commonly used in the geonotice text anyway, so this might be a welcome change. The bot could find/replace all "'s to ', but this would be purely cosmetic and error-prone when it is not really needed. Other formatting has changed, mostly whitespace. Also, in the edit summary we're linking to the combined diff of all edits made to the JSON page since the last sync. That way we can easily verify it was copied over correctly. We do lose attribution here (as opposed to linking to individual diffs), but I think that's okay? Source code (work in progress) is on GitHub. I've made this task translatable, should other wikis be interested in it. I'm going to stop here until the bot proposal discussion has closed. MusikAnimal talk 04:55, 25 September 2018 (UTC)
    I agree with the quoting change. You may want to specify the number of edits if it's more than one, but I don't know if that's required for attribution. (And it's displayed on the diff page anyway.) Enterprisey (talk!) 06:16, 25 September 2018 (UTC)
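
A toy Ruby demonstration of the quoting rules being discussed (not the bot's code): round-tripping the JSON shows single quotes passing through unescaped while double quotes keep their backslashes, and the output remains a valid JS object literal.

    require 'json'

    data = JSON.parse(<<~'JSON')
      [{ "begin": "25 September 2018 00:00 UTC",
         "text": "It's fine to use single quotes, but \"double\" quotes must be escaped" }]
    JSON

    puts JSON.pretty_generate(data)
    # Keys and strings come out double-quoted, e.g.
    #   "text": "It's fine to use single quotes, but \"double\" quotes must be escaped"
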
  • I started adding some 'directions' at Template:Editnotices/Page/User:MusikBot II/GeonoticeSync/Test config.json; please fill it out with more directions, requirements, etc. As far as attribution, in the edit summary at least pipe to the name of the source page to make it clear where the source is without having to follow the diff. — xaosflux Talk 15:21, 28 September 2018 (UTC)
    • Another option there is to put the whole attribution (source, diff, time, user of diff) into the comments of the .json, and only minimal in the edit summary (revid, sourcepage). --Dirk Beetstra T C 13:59, 2 October 2018 (UTC)

─────────────────────────

  • Update I've resumed work on this and am ready for more feedback. The current implementation is described in the "function details" above. I still need to work on filling out Template:Editnotices/Page/User:MusikBot II/GeonoticeSync/Test config.json; please feel free to help. That page will be moved to Template:Editnotices/Page/Wikipedia:Geonotice/list.json when we're ready to go live.

    For validations, see Special:PermaLink/863494086 for an example invalid config (with lots of errors!) and Special:PermaLink/863494234 for the generated report. A few notes:

    • I'm using Ruby internal methods to tell if the date is valid. This works for "Invalid date" or "35 January 2018 00:00 UTC", but not for invalid month names as with "15 Foobar 2018 00:00 UTC": by some logic I don't understand, it chooses some other valid month. I could use regular expressions to ensure the month names are valid, but I want this bot task to work in other languages, where I assume they're able to put in localized month names, if not a different format entirely (which Ruby should still be able to validate). Anyway, I think this is fine -- there were no validations before, after all :) (see also the strptime-based sketch below)
    • Validating the country code actually works! It's going off of the ISO 3166 spec, which is what's advertised as the valid codes Geonotice accepts.
    • Coordinates are validated by ensuring there are two corners, and each with two values (lat and lng), and that the values are floats and not integers or strings.
    • The keys of each list item are also validated, ensuring they only include "begin", "end", "country", and either "corners" or "text".
    • I added code to check if they escaped single quotation marks (as with \'), since Geonotice admins are probably used to doing this. Amazingly, MediaWiki won't even let you save the JSON page if you try to do this, as indeed it is invalid JSON. So it turns out no validation is needed for this, or any other JSON syntax errors for that matter. This should mean we don't need to worry about anyone injecting malicious code.
    • The comment block at the top of the JS page is retained and can be freely edited.
    • Back to the issue of attribution in the edit summary, I went with Xaosflux's recommendation and am using a combined diff link, piped with the title of the JSON page. I'm not sure it's worth the hassle of adding in comments in the generated JS code directly, but let me know if there are any strong feelings about that.

Let me know if there's anything else I should do, or if we're ready for a trial! MusikAnimal talk 04:35, 11 October 2018 (UTC)
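
A condensed Ruby sketch of the validations described above (hand-rolled and hypothetical; for instance, the real bot checks country codes against the full ISO 3166 list, and the date format below is only an assumed one):

    require 'date'

    ALLOWED_KEYS = %w[begin end country corners text].freeze
    COUNTRY_CODES = %w[US GB DE].freeze  # stub; the bot uses the full ISO 3166 list

    def validate_notice(id, notice, errors)
      (notice.keys - ALLOWED_KEYS).each { |k| errors << "#{id}: unknown key '#{k}'" }

      %w[begin end].each do |key|
        begin
          # strptime is strict: '15 Foobar 2018 00:00 UTC' raises, unlike Date.parse
          DateTime.strptime(notice[key].to_s, '%d %B %Y %H:%M %Z')
        rescue ArgumentError
          errors << "#{id}: invalid date in '#{key}'"
        end
      end

      if notice['country'] && !COUNTRY_CODES.include?(notice['country'])
        errors << "#{id}: unknown country code '#{notice['country']}'"
      end

      corners = notice['corners']
      if corners && !(corners.is_a?(Array) && corners.size == 2 &&
                      corners.all? { |c| c.is_a?(Array) && c.size == 2 &&
                                         c.all? { |v| v.is_a?(Float) } })
        errors << "#{id}: corners must be two [lat, lng] pairs of floats"
      end
    end
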

MediaWiki won't let you save the page with invalid JSON even if you turn off JS or use the API, right? Because if it does you may want to validate for that case. Enterprisey (talk!) 04:43, 11 October 2018 (UTC)
Luckily it's server side. It shows the error "Invalid content data", even if you have JS turned off. I haven't tested the API yet, but if it does work it's probably a bug in MediaWiki :) MusikAnimal talk 16:38, 11 October 2018 (UTC)
But I should clarify, the bot does validate JSON content, but I haven't tested to see if this works because I am unable to create invalid JSON :) At any rate, we would not end up in a situation where an invalid JS object is written to MediaWiki:Gadget-geonotice-list.js, because the core JSON methods that we're using would error out before this happens. MusikAnimal talk 20:15, 11 October 2018 (UTC)
This seems fine to me. My main concern (and it is fairly minor) is making sure the bot's role is clear in documentation/notices, and that people will know how to look for errors if something doesn't get updated because validation failed (because there won't be immediate user feedback as there is with the basic MW-side JSON validation). I'm giving this a two-week trial, pending completion of the editnotice(s) and related pages and granting of the i-admin flag; based on history, that should allow for at least a handful of edits to test with, but feel free to extend if more time is required.   Approved for trial (14 days). — Earwig talk 05:51, 14 October 2018 (UTC)
@The Earwig: will this be trialing on the actual pages or in userspace? Ping me if you need a flag assigned for trialing. — xaosflux Talk 00:06, 15 October 2018 (UTC)
@Xaosflux: My intention is for a full trial. I saw there were already reasonable tests done in the userspace, so given that MA feels comfortable working with the actual pages now, I'm fine with that too. As for the IA flag, it's not clear to me from the policy whether we can do that here or whether a request needs to be explicitly made to BN. I would prefer MA post something to BN to be safe, but I suppose one interpretation of the policy would let you grant it immediately without the waiting period. — Earwig talk 03:22, 15 October 2018 (UTC)
@MusikAnimal: what authentication options do you have configured for this bot account? (e.g. 2FA, BotPasswords, OAuth) — xaosflux Talk 11:57, 15 October 2018 (UTC)
@Xaosflux: 2FA is enabled. Historically I have not had a good solution for OAuth, but times have changed. I'll try to look into this today. For the record the current consumer for MusikBot II can only edit protected pages, all other admin rights are not permitted. We will use a different consumer here, and all related edits will be tagged with the application name. We could use "GeonoticeSync 1.0" (what I've dubbed the task, and then a version number), or is there a better name? For permissions, I believe the consumer only needs editsiteconfig.
So no need to grant int-admin just yet -- although it should be safe to do so, because we have 2FA enabled and the current consumer can't edit sitewide or user JS/CSS.
The outstanding to-dos:
  1. Create OAuth consumer and rework the bot to use it.
  2. Create Wikipedia:Geonotice/list.json to reflect current configuration, fully protect it, and move Template:Editnotices/Page/User:MusikBot II/GeonoticeSync/Test config.json to Template:Editnotices/Page/Wikipedia:Geonotice/list.json.
  3. Update documentation at Wikipedia:Geonotice and also describe the new process in the comment block at MediaWiki:Gadget-geonotice-list.js.
  4. Ping all the current Geonotice admins to make sure they know about the new system, and the new rules (don't escape single quotes, but do for double, etc.).
  5. Grant int-admin to MusikBot II, and enable the task.
I'll ping you when I'm done with steps 1-3, and once given the final okay we'll do 4-5. Sound like a plan? If we have to roll back or abandon the new system, I'll be sure to let everyone know that they can go back to editing MediaWiki:Gadget-geonotice-list.js directly. MusikAnimal talk 16:58, 15 October 2018 (UTC)
Sounds fine, let us know when you are ready. — xaosflux Talk 18:37, 15 October 2018 (UTC)
@Xaosflux and The Earwig: I spoke with Bawolff, a security expert working for the Foundation. It would seem that in this case, bot passwords are no less secure than OAuth. OAuth is more about editing on behalf of users, or authorizing users to use some centralized service. This is fantastic news, because I found that the library I was going to use is more for webservices (specifically Ruby on Rails or similar), which doesn't apply here; I would have to implement my own client. To illustrate the complexity, have a look at this "simple" example written in PHP.
So if it's alright, I'd like to move forward with the current bot infrastructure. I have gone ahead and set up the JSON config to reflect the current JS config, and filled in the edit notice. I'm going to be out of town this weekend, so I can start the trial early next week if we're ready to move forward (first doing steps 3-5 above). MusikAnimal talk 03:09, 19 October 2018 (UTC)
@MusikAnimal: it's not "as good", but it is still much better than using standard passwords. Please let us know what BP grants you are including (you don't have to disclose the allowed IP ranges, which you should also use if you can). — xaosflux Talk 03:15, 19 October 2018 (UTC)
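
For reference, a bare-bones Ruby sketch of the Action API bot-password login flow (the app name "MusikBot II@GeonoticeSync" is a hypothetical example, and error handling is omitted):

    require 'net/http'
    require 'json'
    require 'uri'

    uri = URI('https://test.wikipedia.org/w/api.php')
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true

    # Step 1: fetch a login token; its session cookies must be carried forward
    res = http.get("#{uri.path}?action=query&meta=tokens&type=login&format=json")
    token = JSON.parse(res.body)['query']['tokens']['logintoken']
    cookies = res.get_fields('set-cookie').map { |c| c.split(';').first }.join('; ')

    # Step 2: action=login with the bot password ("User@AppName" format)
    req = Net::HTTP::Post.new(uri.path, 'Cookie' => cookies)
    req.set_form_data(
      'action' => 'login', 'format' => 'json',
      'lgname' => 'MusikBot II@GeonoticeSync',  # hypothetical bot-password name
      'lgpassword' => ENV.fetch('BOT_PASSWORD'),
      'lgtoken' => token
    )
    puts http.request(req).body  # expect {"login":{"result":"Success",...}}
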

ZackBot 10

Operator: Zackmann08 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:36, Friday, September 28, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Ruby

Source code available: User:ZackBot/Infobox-needed

Function overview: The goal is to scan pages that are in Category:Wikipedia articles with an infobox request and remove the request from any pages that already have an infobox.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Bot_to_update_'Needs_infobox'

Edit period(s): One-time run for now.

Estimated number of pages affected: Very difficult to say. Per PetScan, there are currently 88,074 talk pages that fall in the category. I'd guess that somewhere between 3% and 8% of those have infoboxes and thus would be affected by this script, so a guess would be somewhere around 7,000-8,000 pages. But that is a TOTAL guess; it will be greatly dependent on how many of the subcategories I run the script against.

Namespace(s): Main

Exclusion compliant (Yes/No): yes

Function details:

The functionality is pretty straight forward:

  1. Take a list of pages from a PetScan search. These will be Talk pages that are marked as needing an infobox.
  2. Check the text of the page and search for the word infobox. My research thus far has indicated that just looking for the word infobox should be good enough as it is not a term used in any other context that I can find. However, if granted a trial run, this will be an area I will be focusing my attention on.
  3. If the page is found to contain the word, then go back to the talk page, look for the param 'needs-infobox', and remove it from the templates.
  4. In the event that the needs-infobox parameter is not found, an error is raised and logged for manual inspection.

The ONLY change that this script will be making is to Talk pages, and it will be to remove text matching \|\s*needs-infobox\s*=\s*y(?:es){0,1}\s* (a sketch of this logic follows below).

--Zackmann (Talk to me/What I been doing) 20:36, 28 September 2018 (UTC)
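
For illustration, a minimal Ruby sketch of steps 2-4 above (helper names such as fetch_page, save_page, and log_error are hypothetical):

    # The removal pattern quoted above; y(?:es){0,1} is equivalent to y(?:es)?
    NEEDS_INFOBOX = /\|\s*needs-infobox\s*=\s*y(?:es)?\s*/

    def process(talk_title)
      # Step 2: does the article already contain the word "infobox"?
      article_text = fetch_page(talk_title.sub(/\ATalk:/, ''))  # hypothetical API helper
      return unless article_text =~ /infobox/i

      # Step 3: remove the needs-infobox parameter from the talk page templates
      talk_text = fetch_page(talk_title)
      if talk_text =~ NEEDS_INFOBOX
        save_page(talk_title, talk_text.sub(NEEDS_INFOBOX, ''), 'Removing needs-infobox request')
      else
        log_error("#{talk_title}: needs-infobox parameter not found")  # Step 4
      end
    end
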

Discussion

  • Could it regex for something like [{][{][ \n\t]*[Ii]nfobox ? -- GreenC 21:34, 28 September 2018 (UTC)
  • @GreenC: so technically speaking it can search for any regex. I think you are on the right track, but that has a few problems. Not all infoboxes start with {{infobox.... But perhaps something like /\{\{[\s\w\n]*infobox/i. See my testcase: here --Zackmann (Talk to me/What I been doing) 21:41, 28 September 2018 (UTC)
Great. -- GreenC 23:35, 28 September 2018 (UTC)

I'd like to see a short trial to see how it works.   Approved for trial (100 edits). SQLQuery me! 03:30, 14 October 2018 (UTC)

Filedelinkerbot 3

Operator: Krd (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 06:09, Tuesday, October 2, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Perl

Source code available: No

Function overview: The bot has been active as a CommonsDelinker clone since 2014, removing links to files deleted at Commons. It was requested at Wikipedia:Bot requests#CAT:MISSFILE bot that the bot also remove links to files deleted locally; this has been activated as a trial and appears to work without problems.

Links to relevant discussions (where appropriate): none

Edit period(s): Continuous

Estimated number of pages affected: 10 per day

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes.

Function details: n/a

Discussion

Does it check to see if the file exists locally before de-linking? Unlikely, I know, but possible all the same. SQLQuery me! 23:01, 10 October 2018 (UTC)

Yes, of course, but irrelevant for this request as this is about local deletions. --Krd 04:50, 11 October 2018 (UTC)

Good task for a bot. ImageRemovalBot used to remove red-linked images until it went AWOL last month. -FASTILY 16:50, 13 October 2018 (UTC)

  Approved for trial (7 days). SQLQuery me! 03:29, 14 October 2018 (UTC)

Bots that have completed the trial period

FRadical Bot

Operator: FR30799386 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 17:58, Thursday, September 20, 2018 (UTC)

Automatic, Supervised, or Manual: Manual

Programming language(s): AutoWikiBrowser

Source code available: AWB

Function overview: This bot task will try to replace all instances of MiszaBot, MiszaBot I, MiszaBot II, and MiszaBot III in the parameters of the template {{Auto archiving notice}} with Lowercase sigmabot III.

Links to relevant discussions (where appropriate):

Edit period(s): (Irregular) As and when I get time to run the bot. I will try not to exceed a 15 edits/minute edit rate.

Estimated number of pages affected: ~4,294 pages

Namespace(s): Talk: namespace

Exclusion compliant (Yes/No): No

Function details: On most article talk pages, the manually set |bot= parameter of the template {{Auto archiving notice}} points to the long-inactive set of MiszaBots, namely MiszaBot, MiszaBot I, MiszaBot II, and MiszaBot III. I will, via this bot account (using AWB), try to make the notice point to the right bot, namely Lowercase sigmabot III. The logic used is outlined below:

  • First, all the pages transcluding the template Auto archiving notice are extracted using the Make List function of AWB.
  • These pages are then filtered to include only those in the Talk: namespace.
  • The pages are then pre-parsed to remove those already containing \|bot *\= *Lowercase\ sigmabot\ III

  • Finally, the pages are checked for the strings = *MiszaBot (regex), MiszaBot I, MiszaBot II, and MiszaBot III, which are then replaced with =Lowercase sigmabot III for the first and Lowercase sigmabot III for the rest.

  • Find instances of \{\{([Aa]utoarchivalnotice|[Aa]utoarchive|[Aa]utoArchivingNotice|[Aa]utoarchivingnotice|[Aa]uto[ _]+archiving[ _]+notice)(.*?)\|bot\=( *)MiszaBot *I* and replace it with {{$1$2|bot=$3Lowercase sigmabot III

Additionally, each and every edit will be reviewed by the operator (me) via AWB. Regards — fr+ 17:58, 20 September 2018 (UTC)
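
Since the replacement is pure regex, it can be sanity-checked outside AWB; here is a Ruby rendition of the find/replace above (a sketch; the /m flag mirrors AWB's single-line option so that (.*?) can cross newlines):

    PATTERN = /\{\{([Aa]utoarchivalnotice|[Aa]utoarchive|[Aa]utoArchivingNotice|[Aa]utoarchivingnotice|[Aa]uto[ _]+archiving[ _]+notice)(.*?)\|bot=( *)MiszaBot *I*/m

    text = '{{Auto archiving notice|small=yes|bot=MiszaBot III|age=30}}'
    puts text.gsub(PATTERN, '{{\1\2|bot=\3Lowercase sigmabot III')
    # => {{Auto archiving notice|small=yes|bot=Lowercase sigmabot III|age=30}}
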

Discussion

The bot is currently configured to run based on MiszaBot's template transclusions. I could, in theory, reconfigure it to use a new (as of yet nonexistent) template for lowercase sigmabot, but I intentionally did not do so to avoid making hundreds of thousands of needless edits to change a transclusion. I would not recommend proceeding further with this BRFA. Σσς(Sigma) 22:34, 22 September 2018 (UTC)
Whoops. I misread. As far as lowercase sigmabot's behaviour is concerned, this looks fine, I'll let the BAG decide what's best. Σσς(Sigma) 22:43, 22 September 2018 (UTC)
  • Not to ask the stupid question, but if you're doing it totally manually, why not just get an "AWB account"? Primefac (talk) 19:59, 23 September 2018 (UTC)
Primefac The bot will edit pages that have an extremely high number of watchers (for example, Talk:Mahabharata, which has 622 watchers, 69 of whom watch recent changes regularly). Since the (bot) flag allows edits to be hidden from the watchers, I would prefer a bot account over an AWB account, whose edits cannot be hidden from the watchlist. Regards — fr+ 10:51, 24 September 2018 (UTC)
While not in scope of this task, if expanding to user_talk: in a future task a bot flag will be critical to prevent 'new messages' alerts. — xaosflux Talk 12:54, 24 September 2018 (UTC)
  • @FR30799386: is this solely in the "Talk:" namespace, or also in "talk namespaces" (e.g. user_talk, wikipedia_talk)? — xaosflux Talk 02:18, 24 September 2018 (UTC)
Xaosflux In this bot request, Talk: does mean only those pages with the Talk: prefix (i.e. only those in ns:1). However, I have plans to extend the bot functionality to encompass the rest of the talk namespace in a later BRFA. A full list of all pages this bot is expected to edit can be found here — fr+ 10:51, 24 September 2018 (UTC)

Seems like a good task for a bot. All these references to Misza I/II/III Bot are likely confusing for newbies. -FASTILY 05:20, 26 September 2018 (UTC)

  • xaosflux Would it be okay to add the bot to the AutoWikiBrowser check page, so that I can run some mock tests in my userspace? — fr+ 11:16, 27 September 2018 (UTC)
    @FR30799386: OK added to AWBCP, only own-user spaces should be used right now. — xaosflux Talk 12:07, 27 September 2018 (UTC)
  • Xaosflux I have made the mock test in my userspace [diff]. I have also posted the revised RegEx (developed as a result of the mock test) in the function details parameter of the request. Regards — fr+ 15:57, 29 September 2018 (UTC)
  • {{BAG assistance needed}} — fr+ 08:16, 5 October 2018 (UTC)
  • It has been around a week since the last BAG member edited this page. Are there any outstanding queries which I need to resolve? Will it be possible to have a trial this coming week? I am asking because I will be chronically unavailable from the 14th to the 22nd of October, and it would be good if I could finish the trial before that. — fr+ 08:16, 5 October 2018 (UTC)
    I know that it's the 14th now, and I am sorry for the wait. I think this is an excellent task for a bot, and you've addressed everything brought forward. Let's see a good size trial to make sure everything works right,   Approved for trial (250 edits). SQLQuery me! 03:35, 14 October 2018 (UTC)
  • @SQL:  Trial complete. I missed yesterday's bus, so I got a little time off, during which I finished the trial. I accidentally overshot the limit of 250 pages by ~seven pages as a result of my absent-mindedness (I was looking at the diffs and forgot to look at the counter regularly). Additionally, there were two glitches while performing the trial, both of which I think were adequately resolved:
  • The edit summary was truncated at the start of the trial. I changed the edit summary.
  • The bot could not detect pages with | bot= (red spot indicates the pattern it failed to recognize). This occurred within the first five pages. I fixed the bot to recognize those particular patterns and have had no problems throughout the rest of the trial.

All pages edited can be found here. Thanks — fr+ 11:11, 15 October 2018 (UTC)

  • Comment: Aha, thanks for taking on my request! Or was that just a coincidence? :-) Anyhow, I noticed it on my watchlist at Talk:Apricot. In any case, try to keep in mind what I wrote there about other templates containing the term "MiszaBot" and the fact that the bot shouldn't edit beyond the first heading. Graham87 15:18, 15 October 2018 (UTC)
    Looking over the edits, it seems like the trial went pretty well. Looks like you've addressed any problems that came up during the trial run. SQLQuery me! 15:55, 15 October 2018 (UTC)


Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here (edit), while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required from the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.