User talk:Wbm1058/Continuing null editing

Continuing null editing

Followup to Wikipedia:Village pump (technical)/Archive 196#A page is populating a hidden maintenance category, but the category is empty!
Technical stuff: Tunneling into the replica English Wikipedia database via PHP on Windows

I don't know of a way to check to see if your bot is still null editing, so I am asking here. Have you considered continuing null-editing stale pages, beyond the ones that had no refresh date? As of tstarling's comment of 28 Jan 2021 at T157670, there were about 16 million pages more than 1.5 years stale. Would your bot be able to work your way through those pages from oldest refresh date to newest? – Jonesey95 (talk) 19:07, 28 March 2022 (UTC)

No point in stopping before the task is finished, or runs into a wall. 16.6 million is now down to 3.6 million. You can monitor progress with these links:
So far I've seen two reports of editors noticing the impact of my cache-purging:
Am curious to know whether you've noticed anything else. – wbm1058 (talk) 12:05, 29 March 2022 (UTC)
It came up at WT:Linter as well. Thanks for taking on this task, and thanks for the helpful links. I hope that you will continue operating the bot, ideally with the goal of keeping all pages from getting too old. In one of the tickets, I proposed that no page (or at least no article) be allowed to become more than a month stale. I don't know what that volume of null edits would look like once the bot has caught up with the backlog, but my hope is that it would add no more than a few percent to the overall processing time on the servers. At this point, 2.4 million article-space pages are more than a year out of date. That is definitely leading to many undetected error conditions. – Jonesey95 (talk) 15:39, 29 March 2022 (UTC)
Article-space is now updated to within 50 days: page_links_updated by date, mainspace. The bot is currently null-editing pages last purged on February 19. – wbm1058 (talk) 00:34, 9 April 2022 (UTC)
@Wbm1058 is the bot also null editing other namespaces? I see that while article space is at 22/3/22 other namespaces still have pages from 2020. Gonnym (talk) 13:24, 3 May 2022 (UTC)
Yes, I've been null-editing all namespaces, with highest priority on mainspace. User: and User talk: are the laggards because there are so damn many of them. I think the main benefit there is catching lint errors in user signatures; is there a date after which that shouldn't be an issue? My bot is not yet fully automated as I'm dependent on manual Quarry queries to produce my list of pages to null edit. I need to find a way for the bot to make an API query to get the list of pages to null edit, to make this fully automated. Whatever issue(s) caused T290146 have slowed me down. Would be nice not to be dependent on Quarry. – wbm1058 (talk) 03:33, 5 May 2022 (UTC)
I don't think there is a null-edit date after which we will know that we have refreshed all of the pages with potential Linter errors in them, because there are multiple bug reports in Phabricator dealing with Linter false positives and false negatives. Those get fixed occasionally, and then those changes get rolled into the MediaWiki software, and then every page needs to be refreshed before we know that we have handled all of those newly detected (or ignored) errors correctly. A batch of fixes for the "bogus image options" Linter error was just rolled out over the last few weeks. TL;DR: T157670 still needs to be fixed in a systematic way by the developers or system administrators of the MediaWiki instances. – Jonesey95 (talk) 04:29, 5 May 2022 (UTC)
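
Editorial illustration: the kind of "oldest refresh date first" list discussed above could be produced by a query along the following lines. This is a minimal sketch, assuming the standard MediaWiki `page` table columns `page_namespace`, `page_title`, and `page_links_updated`; the function name `stale_pages_query` and the default batch size are hypothetical, and this is not the bot's actual Quarry query.

```python
def stale_pages_query(namespace: int, limit: int = 1000) -> tuple[str, tuple[int, int]]:
    """Build (sql, params) for the pages whose links were refreshed longest ago.

    Hypothetical helper: the column names come from the MediaWiki `page` table,
    everything else is illustrative.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = (
        "SELECT page_namespace, page_title, page_links_updated "
        "FROM page "
        "WHERE page_namespace = %s "
        # Ascending order lists never-refreshed (NULL) rows first on MariaDB/MySQL,
        # followed by the oldest refresh timestamps.
        "ORDER BY page_links_updated ASC "
        "LIMIT %s"
    )
    return sql, (namespace, limit)


# Example: the 500 stalest mainspace (namespace 0) pages.
sql, params = stale_pages_query(0, 500)
print(sql)
print(params)
```

Passing the namespace and limit as parameters rather than formatting them into the string keeps the query safe to hand to a database driver that uses `%s` placeholders.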

@Jonesey95 and Gonnym: BRFA filed. – wbm1058 (talk) 02:53, 25 June 2022 (UTC)

As my BRFA has recently been approved, it's time to archive this section. – wbm1058 (talk) 17:14, 27 August 2023 (UTC)

Below are several links I'm posting here for my records, so I can close several browser tabs and still be able to easily find this stuff. wbm1058 (talk) 16:58, 25 June 2022 (UTC)

Quarry queries

Links for the historical record

Links documenting apparent "won't fix" issues

Refreshlinks query and possibly bot needed on Commons

Related to these queries and your bot that keeps links refreshed here at en.WP, I have recently started working on Linter errors at Commons, and pages are appearing in formerly empty categories. Some of them had not been edited in more than eight years. This tells me that the links table is terribly out of date. Is there any magic you can work over at Commons to refresh the link tables in the same way that your bot does here? – Jonesey95 (talk) 15:17, 30 October 2022 (UTC)

I've never run a bot on commons and don't even know whether they have a bots requests for approval process. Making a commons bot for this might just be as simple as changing https://en.wikipedia.org/w/api.php to https://commons.wikimedia.org/w/api.php in User:RMCD bot/botclasses.php but it's probably not that easy. Don't know how much you've been following my current BRFA but I should probably work on closing that rabbit hole before starting to dig a new one. – wbm1058 (talk) 20:51, 30 October 2022 (UTC)
Oof, that's a saga. I put in a couple of phab comments today on five-year-old tasks that would probably avoid all of this trouble. Why would the developers not want to keep their sites up to date with some basic cron jobs? Maybe, as you are finding, it is more complicated than that. I really don't know. – Jonesey95 (talk) 01:14, 31 October 2022 (UTC)
@Jonesey95: my bot has finally been approved, and is running nearly fully-automated under Kubernetes on the Toolforge.
I found commons:Commons:Bots and commons:Commons:Bots/Requests, so will look into that.
Linking to the relevant phab comments from October 2022.
page_links_updated by date, on commons
It looks like someone started working on this, as the NULLs and about four years 2014–2017 have been refreshed, but it's still stuck at only March 2018. – wbm1058 (talk) 18:26, 27 August 2023 (UTC)
Thanks for persisting with this useful task. It appears that bots will be needed for a while, until something happens under the hood on the WMF servers to keep pages up to date. – Jonesey95 (talk) 16:02, 28 August 2023 (UTC)

commons:Help:Namespaces: Commons has 33 namespaces now, one fewer than they had on 30 May 2017. Namespaces "490" and "491" are gone.

English Wikipedia only has 25 namespaces. I'll need to change the commons version of my code to support all the commons namespaces. – wbm1058 (talk) 22:30, 31 August 2023 (UTC)
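
Editorial illustration: rather than hard-coding a namespace list per wiki, the set of namespaces can be read from the MediaWiki API (`action=query&meta=siteinfo&siprop=namespaces`). The sketch below only parses a response of that shape; the function name `real_namespace_ids` is hypothetical, the sample data is a made-up fragment rather than real Commons output, and nothing here is taken from the bot's own code.

```python
from typing import Any


def real_namespace_ids(siteinfo: dict[str, Any]) -> list[int]:
    """Return the sorted non-negative namespace ids from a siteinfo response.

    Negative ids (-2 Media, -1 Special) are virtual namespaces with no stored
    pages, so there is nothing in them to null edit.
    """
    namespaces = siteinfo["query"]["namespaces"]
    ids = [int(info["id"]) for info in namespaces.values()]
    return sorted(ns for ns in ids if ns >= 0)


# Made-up fragment in the shape of a siteinfo reply (not real Commons output).
SAMPLE = {
    "query": {
        "namespaces": {
            "-2": {"id": -2},
            "-1": {"id": -1},
            "0": {"id": 0},
            "6": {"id": 6},
            "14": {"id": 14},
        }
    }
}

print(real_namespace_ids(SAMPLE))  # prints [0, 6, 14]
```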

Well now it's refreshed to October 2018! I didn't do that... wbm1058 (talk) 03:29, 1 September 2023 (UTC)

I've been editing a bunch of highly transcluded pages over there, so it's possible that had something to do with the latest updates. There may be another editor gnoming away at those stale pages. I don't know. It looks like it would take only a few thousand null edits to get caught up to 2021. – Jonesey95 (talk) 19:09, 1 September 2023 (UTC)
I'm manually running a script now to make null edits. It's all the File: namespace that needs to be refreshed at the back end, and the process has been slow going, I think because of the size of the files. I was seeing so many of these error messages:
cURL Error: Operation timed out after 30000 milliseconds with 0 bytes received
that I cut the number of pages processed by a single API call in half, from 20 to 10. Still, I think I can automate this; I should create a bot account on commons so that I don't keep doing this from my main account. – wbm1058 (talk) 19:26, 1 September 2023 (UTC)
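
Editorial illustration: the manual adjustment described above (halving the batch from 20 to 10 after repeated timeouts) can also be done automatically. This is a minimal sketch, not the bot's code. It assumes the MediaWiki `action=purge` API with its `titles` and `forcelinkupdate` parameters; `purge_params`, `purge_all`, and the `send` callable (which would POST the parameters to api.php and raise `TimeoutError` on a timeout) are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable


def purge_params(titles: list[str]) -> dict[str, str]:
    """Build the parameters for one purge request covering several titles."""
    return {
        "action": "purge",
        "format": "json",
        "forcelinkupdate": "1",       # also refresh the links tables
        "titles": "|".join(titles),   # the API separates multiple titles with "|"
    }


def purge_all(
    titles: Iterable[str],
    send: Callable[[dict[str, str]], None],
    batch_size: int = 20,
    min_batch: int = 1,
) -> int:
    """Purge every title, halving the batch size whenever a request times out.

    Returns the batch size in use when the run finished.
    """
    pending = list(titles)
    while pending:
        batch = pending[:batch_size]
        try:
            send(purge_params(batch))
        except TimeoutError:
            if batch_size <= min_batch:
                raise  # cannot shrink further; let the caller decide
            batch_size = max(min_batch, batch_size // 2)
            continue  # retry the same titles with a smaller batch
        pending = pending[len(batch):]
    return batch_size
```

A real `send` would need an authenticated session and a POST request; it is left as a parameter here so the batching logic can be exercised without network access.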

@Legoktm: are you working on this? Last updates from you I've seen are resolving T159512 and commenting on the still-open, low priority T157670 on February 20, 2023. – wbm1058 (talk) 11:18, 2 September 2023 (UTC)

Well now 2019 is done and the oldest date needing links refreshed is January 1, 2020. Either Legoktm is running his code on the inside, or someone forked my code and is running it from the outside. I'm just watching at this point. – wbm1058 (talk) 00:35, 4 September 2023 (UTC)

Whoever else was refreshing links on commons took a break after they finished 2019, so I've picked up where they paused. Now refreshed halfway through January 2020. – wbm1058 (talk) 14:07, 5 September 2023 (UTC)

Update: @Jonesey95: I shut down my link-refreshing bots, because they stopped working. See Wikipedia:Village pump (technical)/Archive 208#Purge API broken?. – wbm1058 (talk) 14:12, 24 October 2023 (UTC)

Hmm, T349348 was closed yesterday, with a fix 967304 "remove check for purge right from APISite.purgepages — The right has not been checked by MediaWiki since 2016."
Hopefully this fix gets into the next MediaWiki update, and makes my refresh-links tasks start working again. – wbm1058 (talk) 17:06, 2 November 2023 (UTC)
I will cross my fingers and make a small sacrifice to the Wikigods. – Jonesey95 (talk) 17:47, 2 November 2023 (UTC)
It's still not working, and I mistakenly thought that the T349348 patch would help. I opened a new discussion: Purge API seems to still be broken. – wbm1058 (talk) 20:31, 20 November 2023 (UTC)
I suspect that changes related to T265749 broke my bot's purging. – wbm1058 (talk) 18:20, 27 November 2023 (UTC)
Bugs fixed; my bots are working hard to catch up. – wbm1058 (talk) 01:20, 14 February 2024 (UTC)

Update: @Jonesey95: Talk page management: I moved this discussion section from my talk, so related null-edit bot discussions are on the same talk page. English Wikipedia: My bots are catching up to the point they were at before last September's bugs shut them down temporarily. Mainspace is almost caught up, and should be back on schedule within a week. Other namespaces should be back on schedule in another month or two. Commons: I just restarted my commons bot yesterday. It had been stuck after getting the links refreshed up to September 21, 2020. And now, I've shut down my commons bot again already because Quarry page_links_updated by date, on commons reports a 5 hour replication lag. I don't know whether the lag (https://replag.toolforge.org/) was caused by my restarting the bot, or by the active database maintenance to Gradually drop old pagelinks columns. Not sure I can be bothered to go on IRC or Phabricator to ask. I'll just wait, and restart my commons bot when I notice that the lag has gone away. – wbm1058 (talk) 14:27, 14 February 2024 (UTC)

Replication lag has ended, and I restarted the commons-links refreshing bot. It's up to September 22, 2020 now. – wbm1058 (talk) 01:07, 15 February 2024 (UTC)
Now up to January 1, 2022! wbm1058 (talk) 16:43, 22 March 2024 (UTC)
Now up to November 1, 2022. – wbm1058 (talk) 09:32, 7 May 2024 (UTC)
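
Editorial illustration: the "wait until the replication lag has gone away, then restart" routine described in the 14 February 2024 update could be automated with a simple polling loop. This is a sketch under stated assumptions, not the bot's code: `wait_until_caught_up` and the `get_lag_seconds` callable (which would report the current replica lag by whatever means are available) are hypothetical, and the thresholds are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_until_caught_up(
    get_lag_seconds: Callable[[], float],
    max_lag_seconds: float = 300.0,
    poll_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> int:
    """Block until the reported lag is at or below max_lag_seconds.

    Returns how many times the loop had to wait. Raises TimeoutError if
    max_polls waits pass and the replicas are still lagging.
    """
    polls = 0
    while get_lag_seconds() > max_lag_seconds:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("replicas still lagging")
        sleep(poll_seconds)
        polls += 1
    return polls
```

Injecting `sleep` and `get_lag_seconds` keeps the loop testable without real delays; a bot would call this before each run and start null editing only once it returns.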