
Wikipedia:Bots/Noticeboard

Bots noticeboard

This is a message board for coordinating and discussing bot-related issues on Wikipedia (including other programs that interact with the MediaWiki software). Although this page is frequented mainly by bot owners, any user is welcome to leave a message or join the discussion here.

If you want to report an issue or bug with a specific bot, follow the steps outlined in WP:BOTISSUE first. This is not the place for requests for bot approvals or for requesting that tasks be done by a bot. General questions about the MediaWiki software (such as the use of templates, etc.) should be asked at Wikipedia:Village pump (technical).

Archive URLs

In a nutshell: Do not make mass modifications to URLs without accounting for archive URLs, which are complicated and require more than a search-and-replace.

Since about 2015, the number of archive URLs on Enwiki has grown from about 600,000 to 3 million. This is the commendable work of IABot, and the number increases daily. The problem is that user scripts, bots, and tools that modify URLs often do not account for archive URLs.

Examples of why search-and-replace does not work (a code sketch follows the list):

  1. The website http://example.com/... has moved domain to http://example.co.uk/..., so a bot or script changes all occurrences with a search-and-replace. This turns http://web.archive.org/web/20180901010101/http://example.com/... into http://web.archive.org/web/20180901010101/http://example.co.uk/..., but that archive URL does not exist, creating a dead archive URL.
  2. However, even if the archive URL is skipped with a regex, using the same example: {{cite web |url=http://example.com/... |archiveurl=http://web.archive.org/web/20180901010101/http://example.com/... }}. Arguably the correct action in this case is to replace the |url= with http://example.co.uk/... and delete the |archiveurl= (and any |archivedate= and |deadurl=), because the new link works and the old |archiveurl= is no longer an archive of the |url=. Search-and-replace cannot do this.
  3. Even if #2 is not done and the original |archiveurl= is kept, the |deadurl=yes would need to be converted to |deadurl=no, as the |url= is no longer dead.
  4. If there is a {{dead link}} template next to the link, that template would need to be deleted, as the link is no longer dead.
  5. In addition to CS1|2, there are similar issues with {{webarchive}} and bare URLs.
  6. There are similar issues with the 20+ other archive providers listed at WP:WEBARCHIVES. The problem is not isolated to Wayback, which accounts for only about 80% of the archive links.
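To make example 1 concrete, below is a minimal Python sketch (illustrative only; not any production bot's code, and the host list is abbreviated): a bare string replace corrupts archive snapshot URLs, while a guarded replace leaves occurrences embedded in an archive URL alone.

    import re

    OLD = "http://example.com/"
    NEW = "http://example.co.uk/"

    # A few of the 20+ archive hosts listed at WP:WEBARCHIVES.
    ARCHIVE_HOSTS = ("web.archive.org", "webcitation.org", "archive.today")

    def naive_replace(wikitext: str) -> str:
        # WRONG: also rewrites the copy of OLD embedded inside snapshot
        # URLs, producing archive URLs that were never captured (example 1).
        return wikitext.replace(OLD, NEW)

    def guarded_replace(wikitext: str) -> str:
        # Rewrite OLD only where it is not embedded in an archive URL.
        out, pos = [], 0
        for m in re.finditer(re.escape(OLD), wikitext):
            # Run of non-whitespace characters immediately before the match;
            # if it mentions an archive host, the match sits inside a
            # snapshot URL and must be left alone.
            tail = re.split(r"\s", wikitext[:m.start()])[-1]
            out.append(wikitext[pos:m.start()])
            out.append(OLD if any(h in tail for h in ARCHIVE_HOSTS) else NEW)
            pos = m.end()
        out.append(wikitext[pos:])
        return "".join(out)

    text = ("{{cite web |url=http://example.com/page "
            "|archiveurl=http://web.archive.org/web/20180901010101/http://example.com/page }}")
    print(naive_replace(text))    # corrupts |archiveurl=
    print(guarded_replace(text))  # rewrites only |url=

Even the guarded version only avoids breakage; it does not fix the citation templates, which is what examples 2-4 are about.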

My bot, WP:WAYBACKMEDIC, can fully automate URL changes while accounting for archives. It's not a simple bot, so I don't expect anyone else to custom-build something like it, though I hope others will. For now, I'm trying to intercept URL change requests at BOTREQ and to remind bot makers at BRFA.
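For readers curious what "accounting for archives" involves per citation, here is a hedged sketch in Python using the mwparserfromhell wikitext parser. This is not WaybackMedic's actual code; migrate_cites and is_live are illustrative names, and handling {{dead link}} (example 4), {{webarchive}}, and bare URLs (example 5) is omitted for brevity.

    import mwparserfromhell
    import requests

    def is_live(url: str) -> bool:
        # Crude liveness check; a real bot must also verify the content,
        # since soft-404s and parked domains return 200 too.
        try:
            r = requests.head(url, allow_redirects=True, timeout=10)
            return r.status_code == 200
        except requests.RequestException:
            return False

    def migrate_cites(wikitext: str, old_prefix: str, new_prefix: str) -> str:
        code = mwparserfromhell.parse(wikitext)
        for tpl in code.filter_templates():
            if not tpl.name.matches("cite web") or not tpl.has("url"):
                continue
            url = str(tpl.get("url").value).strip()
            if not url.startswith(old_prefix):
                continue
            new_url = new_prefix + url[len(old_prefix):]
            if is_live(new_url):
                # Example 2: the old snapshot no longer archives |url=, so
                # point |url= at the new domain and drop the archive
                # parameters (and |deadurl=, per example 3).
                tpl.get("url").value = new_url
                for p in ("archiveurl", "archivedate", "deadurl"):
                    if tpl.has(p):
                        tpl.remove(p)
            # else: the new URL is dead too; keep the old |url= and its
            # archive, since the snapshot is the only working copy.
        return str(code)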

Should this be in policy ("Archives should be accounted for when modifying URLs")? Should we have a Wikipedia:Bots/URL subpage for requests, or a project for URL change requests? Should a project be on enwiki, or at meta to notify other-language wikis globally? Feedback or thoughts, thanks. -- GreenC 01:14, 30 December 2018 (UTC)

I don't really understand your problem with #2. I have seen no documentation anywhere to indicate that the archived page must reflect the URL of any currently living page. And there are some cases where it cannot or will not, such as a domain-specific archive URL (e.g. The NYT). Our objective is the content at the live URL. --Izno (talk) 17:59, 30 December 2018 (UTC)
Initially I responded with why I believe #2 is a best practice, but I am refactoring because in the end it's up to you what to do. The point of this OP is that mass URL changes require more than a search-replace; they need a special-purpose bot. The hope here is to raise awareness of the issues so that in the future, whenever URL changes come up, there is recognition of what is required, or it is at least taken into account. -- GreenC 22:35, 30 December 2018 (UTC)
I may start to sound like a broken record, but in general: what do we expect human editors to do in these situations? From the samples above, it certainly makes sense that human editors should not be introducing broken links (like in example 1) - so bots should not either. The guidance on this should be general citation/archive guidance rather than just bot guidance, though. — xaosflux Talk 00:44, 31 December 2018 (UTC)
As for "old" archiveurl's - I'm not very up to speed on this, is it meant as only "an archive of the url that is in archive=" or "here is another archive where this reliable source can be found"? — xaosflux Talk 00:48, 31 December 2018 (UTC)
@Xaosflux: sorry, I missed your reply. Agreed, good idea to document the best practice and I hope to do that. I'm having trouble following your second question, but the "old" |archiveurl= just means whatever the archive URL was before it got modified by the search/replace script. Does it make sense why that is a problem? I can step through it if you want. -- GreenC 00:33, 21 January 2019 (UTC)

Cyberbot

Seems like several tasks are broken now:

  1. Sometimes the bot blanks WP:CHUS (pinging @1997kB and K6ka:)
  2. The bot made pointless edit-warring edits in the CratStats task, RfX Reporter, and RfXTallyBot (pinging @DeltaQuad, Amalthea, and Xeno:)
  3. The bot is archiving less than 5 days' worth of RFPP requests to WP:RFPPA (now only 19 January and 20 January are there)
  4. The disable function seems broken
@Cyberpower678: promised to fix the first and the second, but they are still broken. Hhkohh (talk) 23:20, 20 January 2019 (UTC)
As the bot is now ignoring the stop pages, I will be blocking it. It's clearly broken and needs intervention before editing again. -- Amanda (aka DQ) 00:24, 21 January 2019 (UTC)
DeltaQuad, I honestly have no idea what is going on with the bot, other than that it started after the server migration happened in Labs. I haven't gotten an answer about it though. —CYBERPOWER (Around) 02:43, 21 January 2019 (UTC)
Fair, that's why the block was issued though. -- Amanda (aka DQ) 03:55, 21 January 2019 (UTC)
Hhkohh, I've got a massive failure log; I'm downloading it now. —CYBERPOWER (Around) 02:47, 21 January 2019 (UTC)
Well, I just took a look at the logs, and it's caused by a lot of 503 responses from the MW API. Massive amounts, actually; I have 20 GB of data for just the last 7 days. Pinging Anomie: what could be causing this? Here's a snippet of my log:
Extended content

Date/Time: Thu, 27 Dec 2018 01:02:27 +0000
Method: GET
URL: https://en.wikipedia.org/w/api.php
Parameters: Array (

   [action] => query
   [prop] => info
   [inprop] => protection|talkid|watched|watchers|notificationtimestamp|subjectid|url|readable|preload|displaytitle
   [titles] => Wikipedia:Requests for adminship/JJMC89
   [redirects] => 
   [format] => php
   [servedby] => 
   [requestid] => 667437408

)

Raw Data: <!DOCTYPE html> <html lang=en> <meta charset=utf-8> <title>Wikimedia Error</title> <style>/* inline CSS trimmed */</style>

<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" alt="Wikimedia" width="135" height="101"></a> Error

Our servers are currently under maintenance or experiencing a technical problem. Please try again in a few minutes.

See the error message at the bottom of this page for more information.

</html>

UNSERIALIZATION FAILED

@Anomie: I know it says too many requests, but I don't see how that could be. Surely Cyberbot couldn't possibly be hitting the max limit on apihighlimits.—CYBERPOWER (Around) 03:12, 21 January 2019 (UTC)
It often does not do its job at RfPP properly, but now we have nothing there at all, so no archiving is going on. I suppose I'm going to have to archive manually until the bot gets fixed. Enigmamsg 06:12, 21 January 2019 (UTC)
I am glad to help archive and hope Cyberpower678 can fix it. Hhkohh (talk) 09:10, 21 January 2019 (UTC)
Hhkohh, The numerous failures are because the API is throwing 503s at more than half of the requests the bot makes. From what I can tell, unless a process has gone rogue, Cyberbot shouldn’t be anywhere near the API rate limit. —CYBERPOWER (Around) 12:16, 21 January 2019 (UTC)
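For illustration, a generic client-side mitigation might look like the sketch below (ExampleBot and api_get are made-up names; this is not Cyberbot's code): detect that the response is a Varnish HTML error page rather than API output, and retry with backoff instead of trying to unserialize it.

    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def api_get(params: dict, max_tries: int = 5) -> dict:
        """GET from the MW API, retrying transient Varnish/503 errors."""
        params = dict(params, format="json")
        for attempt in range(max_tries):
            resp = requests.get(API, params=params,
                                headers={"User-Agent": "ExampleBot/0.1"})
            if (resp.status_code == 200
                    and "json" in resp.headers.get("content-type", "")):
                return resp.json()
            # A Varnish error arrives as an HTML page with a 503/429
            # status; honor Retry-After if present, else back off
            # exponentially.
            retry_after = resp.headers.get("retry-after", "")
            time.sleep(int(retry_after) if retry_after.isdigit()
                       else 2 ** attempt)
        raise RuntimeError("API still failing after %d tries (last HTTP %d)"
                           % (max_tries, resp.status_code))

    # e.g. api_get({"action": "query", "prop": "info",
    #               "titles": "Wikipedia:Requests for adminship/JJMC89"})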
This is an error returned by Varnish; it has nothing to do with the API rate limit. According to https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/varnish/templates/text-frontend.inc.vcl.erb$262 , Varnish is not supposed to return 429 for Wikimedia networks, including Labs IP addresses. I guess the wikimedia_nets variable needs to be updated with the new IP addresses used by Labs. Nemo 14:45, 21 January 2019 (UTC)
Nemo bis, Well, this certainly confirms that this is not a fault on the bot's end. Any chance we can get this updated, soonish? Labs recently migrated servers, and since then Cyberbot has been really weird in the way it behaves on Wikipedia. I assumed something changed, but never had an opportunity to look into it. —CYBERPOWER (Chat) 16:13, 21 January 2019 (UTC)
No idea how quickly a fix might come. It should be a trivial configuration change, but I see that on a related ticket there is some debate about it (and of course sysadmins would prefer it if every client did what it is told by error 429). Varnish seemingly ignores requests which come with a login cookie, so you could also try that route. Nemo 16:25, 21 January 2019 (UTC)
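For reference, "requests which come with a login cookie" means an authenticated API session; a minimal sketch with a bot password follows (ExampleBot@task1 and the password are placeholders). The requests.Session cookie jar carries the login cookie across subsequent calls.

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    session = requests.Session()  # cookie jar persists across requests

    # 1. Fetch a login token.
    r = session.get(API, params={"action": "query", "meta": "tokens",
                                 "type": "login", "format": "json"})
    token = r.json()["query"]["tokens"]["logintoken"]

    # 2. Log in with Special:BotPasswords credentials (placeholders).
    session.post(API, data={"action": "login",
                            "lgname": "ExampleBot@task1",
                            "lgpassword": "bot-password-here",
                            "lgtoken": token, "format": "json"})

    # API calls made through `session` now carry the login cookie.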
Nemo bis, except that the bot is actually getting a 503 HTTP code, and I’m not going to write code to parse human-readable HTML in which another error code is nested. —CYBERPOWER (Chat) 16:50, 21 January 2019 (UTC)
@Nemo bis and Cyberpower678: Looks like phab:T213475 in that case. Anomie 22:21, 21 January 2019 (UTC)

Cyberbot II

Cyberbot II (talk · contribs)

Please stop this one too.

The last edit in this list happened after notification about the same issue at User_talk:Cyberpower678#Cyberbot_II ~ ToBeFree (talk) 15:57, 22 January 2019 (UTC)

I am curious why the run page is disabled but the bot is still running the disabled task. Hhkohh (talk) 16:06, 22 January 2019 (UTC)
Blocked. @Cyberpower678: please review the critical article-blanking bug, and why the run page is being ignored - if you have changed this to use new functionality, please update your documentation. — xaosflux Talk 16:11, 22 January 2019 (UTC)
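For context, an enwiki bot's run-page kill switch is typically a check like the sketch below before every write (page title, accepted values, and ExampleBot are assumptions, not Cyberbot's actual setup); a bot that crashes or skips this check will keep editing with the run page disabled, which is what appears to have happened here.

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def run_page_enabled(page: str = "User:ExampleBot/Run") -> bool:
        r = requests.get(API, params={
            "action": "query", "prop": "revisions", "rvprop": "content",
            "rvslots": "main", "titles": page,
            "format": "json", "formatversion": "2",
        }, headers={"User-Agent": "ExampleBot/0.1"})
        page_data = r.json()["query"]["pages"][0]
        content = page_data["revisions"][0]["slots"]["main"]["content"]
        return content.strip().lower() in ("true", "run", "enabled")

    # Before every edit:
    # if not run_page_enabled():
    #     raise SystemExit("Run page disabled; stopping.")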
Xaosflux, See one section up. Cyberbot is suffering from a MW web service issue which is causing critical malfunctions. It can only be fixed once the above issue is fixed. —CYBERPOWER (Chat) 21:05, 22 January 2019 (UTC)
Thank you; any admin (including you) should feel free to unblock if you think the issue is resolved and you are ready to perform a supervised restart. — xaosflux Talk 21:09, 22 January 2019 (UTC)
Xaosflux, Of course, and thank you. I will be leaving the bot blocked for a while until I hear from the sysadmins. Cyberbot runs on ancient code, and I am working to rewrite it to use newer code with better error handling. It's still a ways off, though. —CYBERPOWER (Chat) 21:16, 22 January 2019 (UTC)
Xaosflux, with the bot stable again, can you unblock meta and commons? —CYBERPOWER (Be my Valentine) 12:46, 13 February 2019 (UTC)
@Cyberpower678: Cyberbot II wasn't blocked at meta, but Cyberbot I was - I've unblocked that one. I'm not a commons: admin though. — xaosflux Talk 12:59, 13 February 2019 (UTC)
Xaosflux, just a typo. ping yourself? Hhkohh (talk) 13:42, 13 February 2019 (UTC)
Haha fixed thanks, — xaosflux Talk 13:54, 13 February 2019 (UTC)
Xaosflux, thanks. Isn't Fastily a commons admin? —CYBERPOWER (Be my Valentine) 14:27, 13 February 2019 (UTC)
Nevermind. But -revi is one. —CYBERPOWER (Be my Valentine) 14:30, 13 February 2019 (UTC)
(edit conflict) @Cyberpower678: nope, try commons:User:Taivo who blocked it, and is recently active. — xaosflux Talk 14:31, 13 February 2019 (UTC)
And Cyberbot I too
After more than ten years with no nifty Template:Adminstats to create a pretty display of my admin actions, and having no sense of anything missing, I happened upon it somewhere a few months ago and followed instructions to place it on one of my user pages. Now the Cyberbots get cut off at the pass and no more updates!
I'm disappointed in myself for feeling tempted to be a crybaby about it. Thanks for all the great work, Cyberpower, if they never fix the thing that's breaking it we'll still owe you a lot. – Athaenara 05:41, 2 February 2019 (UTC)
Athaenara, It'll be back. MW Varnish just needs updating, but that's something only the sysadmins can do. —CYBERPOWER (Chat) 21:24, 2 February 2019 (UTC)
It's back!! – Athaenara 08:54, 13 February 2019 (UTC)

Discussion Notice

There is a discussion that may be of interest at Wikipedia_talk:Username_policy#UAA_Bot_Work. RhinosF1(chat)(status)(contribs) 17:51, 8 February 2019 (UTC)

Template:Category redirect

What bot(s) currently maintain(s) category redirects? I'd like to chat with the operator(s), but I have no idea whom to address. Nyttend (talk) 00:11, 12 February 2019 (UTC)

@Nyttend: See User:RussBot and Wikipedia:Bots/Requests for approval/RussBot --DannyS712 (talk) 00:14, 12 February 2019 (UTC)
Hi, DannyS712, and thank you. Note now left on R'n'B's talk page. Nyttend (talk) 00:32, 12 February 2019 (UTC)