User talk:GreenC/WaybackMedic


ia save command url

Since this bot is fixing Internet Archive urls in |archive-url=, you should consider adding another task. There is a legitimate url that looks like this:

https://web.archive.org/save/http://www.example.com

That command should never be used in cs1|2 templates because each time a reader clicks it, another copy of the current version of the targeted website is saved at the Internet Archive. When Module:Citation/CS1 detects these urls, it emits an error message and disables the archive link in the final rendering of the citation. cs1|2 templates with this particular error are categorized in Category:Pages_with_archiveurl_citation_errors.

The fix that I would suggest for this bot would be to change /save/ to /*/.

Trappist the monk (talk) 15:35, 28 May 2016 (UTC)
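The suggested fix can be sketched in Python as follows. This is only an illustration of the rewrite described above, not WaybackMedic's actual code; the function name `fix_save_url` and the regular expression are assumptions made for the example.

```python
import re

# Matches the Wayback Machine "save" command prefix at the start of a URL,
# with or without the "web." subdomain, over http or https.
SAVE_PREFIX = re.compile(r"^(https?://(?:web\.)?archive\.org)/save/", re.IGNORECASE)


def fix_save_url(archive_url: str) -> str:
    """Replace a leading /save/ command with /*/; return other URLs unchanged.

    Hypothetical helper for illustration only.
    """
    return SAVE_PREFIX.sub(r"\1/*/", archive_url, count=1)


print(fix_save_url("https://web.archive.org/save/http://www.example.com"))
```

Anchoring the pattern at the start of the string means a target site that happens to contain `/save/` in its own path is left alone.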

@Trappist the monk: Thanks, good info. After this bot is finished running, I hope to make a version #2 that will sweep all IA links (the current one covers a limited subset), and the problems listed in Category:Pages_with_archiveurl_citation_errors can be part of it. I believe /*/ is also an error and should be replaced with the closest available snapshot (which can be retrieved via the IA API). -- GreenC 17:27, 28 May 2016 (UTC)

I disagree. The closest available snapshot (not clear to me what that actually means: closest to what?) may be a 404; the content of the 'closest' may not be the same as, or even substantially similar to, the content of the original site on the date that the citation was added to the article. That is why the module doesn't just replace /*/ with a timestamp concocted from |archive-date=. Replacement of /*/ is a task that is best accomplished by humans who can evaluate the content of the archived page to see that it supports the content of the Wikipedia article.
Trappist the monk (talk) 17:40, 28 May 2016 (UTC)
Closest to the accessdate, the date added to the article. The IA API handles 404s; it only returns snapshots with the requested status codes (200s etc.). -- GreenC 18:48, 28 May 2016 (UTC)
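For reference, a lookup of the snapshot closest to an access date might look like the sketch below. It assumes the public Wayback Machine Availability API at `https://archive.org/wayback/available` with its `url` and `timestamp` query parameters and its `archived_snapshots.closest` response shape; the function names are hypothetical, and nothing here is taken from WaybackMedic itself. A snapshot returned this way is only a candidate and says nothing about whether its content supports the cited text.

```python
import json
from datetime import date
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the public Wayback Machine Availability API.
API = "https://archive.org/wayback/available"


def availability_query(url: str, access_date: date) -> str:
    """Build the query URL asking for the snapshot closest to access_date."""
    params = {"url": url, "timestamp": access_date.strftime("%Y%m%d")}
    return API + "?" + urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL if it is available with status 200, else None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest.get("url")
    return None


def lookup(url: str, access_date: date) -> Optional[str]:
    """Perform the network request (not exercised in the example below)."""
    with urlopen(availability_query(url, access_date), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


print(availability_query("http://www.example.com", date(2016, 5, 28)))
```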
A robot should not be making the choice of which snapshot should be linked by a cs1|2 template. If that were the case, again, the module could do it and we wouldn't need |archive-url= and |archive-date=. Fixing the /save/ urls is sufficient.
Trappist the monk (talk) 23:55, 28 May 2016 (UTC)

At 1970s energy crisis, I just hand-corrected two citations where WaybackMedic inserted incorrect ref links that didn't match the cited content. Seems like a dangerous thing for a bot to be doing without human verification. At a minimum, I'd suggest teaching WaybackMedic to leave detailed Talk page messages in these cases, as Cyberbot II did at Talk:1970s energy crisis#External links modified. —Patrug (talk) 02:40, 31 May 2016 (UTC)

Adding archive date

I'm not sure whether the bot already does this (I certainly don't see it listed as a function), but it may be useful to have it extract |archivedate= from the URL if missing, as editors might sometimes forget to include it. nyuszika7h (talk) 17:20, 3 June 2016 (UTC)
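A minimal sketch of such an extraction, assuming the usual Wayback URL layout in which a timestamp of digits (YYYYMMDD, optionally followed by hhmmss) follows the host. The function name and pattern are hypothetical and are not taken from either bot.

```python
import re
from datetime import date
from typing import Optional

# A Wayback snapshot URL carries a timestamp right after the host (and optional
# "web/" segment), optionally followed by a two-letter modifier such as "id_".
TIMESTAMP = re.compile(
    r"^https?://(?:web\.)?archive\.org/(?:web/)?(\d{8,14})(?:[a-z]{2}_)?/",
    re.IGNORECASE,
)


def archive_date_from_url(archive_url: str) -> Optional[date]:
    """Return the snapshot date encoded in a Wayback URL, or None if there is none."""
    match = TIMESTAMP.match(archive_url)
    if not match:
        return None
    stamp = match.group(1)
    try:
        return date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    except ValueError:
        # Digits present but not a valid calendar date.
        return None


print(archive_date_from_url("https://web.archive.org/web/20160528153500/http://www.example.com"))
```

URLs using /*/ or /save/ carry no timestamp, so the function returns None for them and |archivedate= would be left for a human to fill in.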

This task is done by Cyberbot II. The two bots are similar enough that it might be worth trying to merge them, to combine the best Wayback-related features of both. As another example, Cyberbot II describes its edits on the article's Talk page for human verification, which I also recommended for WaybackMedic in the section above. —Patrug (talk) 07:47, 8 June 2016 (UTC)

False dead link

In this edit, the bot removed the archive link and marked the ref with {{dead link}}. The archive link was bad, but the original URL was still good. Could the bot check the value of the deadurl parameter? If deadurl=no, it should refrain from adding the dead-link template. — Gorthian (talk) 10:24, 6 June 2016 (UTC)

Yes, it shouldn't have added a dead link. I'll have to research what happened there. It doesn't rely on the deadurl value because that could be out of date; it checks the site for return codes. Thanks for the notice. -- GreenC 13:26, 6 June 2016 (UTC)
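The intended behaviour discussed in this thread can be sketched as a small decision function. This is an assumption-laden illustration, not the bot's logic: the names are hypothetical, and treating every status of 400 or above (or no response at all) as dead is a simplification chosen for the example.

```python
from typing import List, Optional


def original_is_dead(status: Optional[int]) -> bool:
    """Assumed rule: no response (None) or any HTTP status >= 400 counts as dead."""
    return status is None or status >= 400


def plan_edit(original_status: Optional[int], archive_works: bool) -> List[str]:
    """List the edits to make for one citation (hypothetical action labels)."""
    actions: List[str] = []
    if not archive_works:
        actions.append("remove archive-url")
        # Tag the reference only when the original URL itself fails.
        if original_is_dead(original_status):
            actions.append("add {{dead link}}")
    return actions


print(plan_edit(200, archive_works=False))
```

Under this rule, a bad archive link with a live original URL leads only to removal of the archive link, which is the outcome requested above.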