This is a page for requesting work to be done by bots per the bot policy. This is an appropriate place to simply put ideas for bots. You might also check {{Botcats}} to see if the bot you are looking for already exists, in which case you can contact the operator directly on his or her talkpage. If you need a piece of software written for a specific article you may get a faster response time at the computer help desk.

If you have a question about one particular bot, it should be directed to the bot owner's talk page or to the Bots Noticeboard. If a bot is acting improperly, a note about that should be posted to the owner's talk page or to the Administrators' Noticeboard (see WP:BOTISSUE for guidance).

If you are a bot operator and you complete a request, note what you did, and archive it. {{BOTREQ}} can be used to give common responses, and to make it easier to see at-a-glance what the response is.

There are a number of common requests which are regularly denied, either because they are too complicated to program, or do not have consensus from the Wikipedia community. Please see Wikipedia:Bots/Frequently denied bots for a list of such requests, and ensure that your idea is not among them.

If you are requesting that a bot be used to add a WikiProject banner to the talkpages of all articles in a particular category or its subcategories, please be very careful to check the category tree for any unwanted subcategories. It is best to give a complete list of categories that should be worked through individually, rather than one category to be analyzed recursively. Compare the difference between a recursive list and a properly vetted one.

Please add your bot requests to the bottom of this page.

Take over GAN functions from Legobot

Legobot is an enormously useful bot that performs some critical functions for GAN (among other things). Legoktm, the operator, is no longer able to respond to feature requests, and is not very active; they've asked in the past if someone would be willing to take over the code. I gather from that link that the code is PHP; see here [1]. There would be a lot of grateful people at GAN if we could start addressing a couple of the feature requests, and if we had an operator who was able to spend more time on the bot. This is not to criticize Legoktm at all -- without their work, GAN could not function; Legobot is a core part of GAN functionality.

I left a note on Legoktm's talk page asking if they would mind a request here for a new operator, and Redrose64 responded there with a link to the note I posted above, so I think it's clear they'd be glad for someone else to pick this up. Any takers? Mike Christie (talk - contribs - library) 23:10, 6 February 2018 (UTC)

I've heard from Legoktm and they would indeed be glad to have someone else take this over. If you're capable in PHP, this is your chance to operate a bot that's critical to a very active community. Mike Christie (talk - contribs - library) 00:21, 8 February 2018 (UTC)
I would like to comment that it would be good to expand the functionality of the bot for increased automation, like automatically adding to the GA lists. Perhaps it would be better to rewrite the bot in a different language? I think Legoktm has tried to get people to take over the PHP for a while with no success. Kees08 (Talk) 04:44, 8 February 2018 (UTC)
The problem with adding to the GA lists is knowing which one. There is no indication on the GAN as to where. All we have is the topic. Even the humans have trouble with this. Hawkeye7 (discuss) 20:20, 16 February 2018 (UTC)
To correct for the past, we could add a parameter to the GA template for the 'subtopic' or whatever we want to call that grouping. A bot could go through the current listing and then add that parameter to the GA template. Then, when nominating, that could be in the template, and the bot could carry that through all the way to automatically adding it to the GA page at the end. Kees08 (Talk) 20:23, 16 February 2018 (UTC)
Nominators would need to know those tiny divisions within the subtopics; as it's not something we have on the WT:GAN page, I doubt most are even aware of the sub-subtopics. Even regular subtopics are sometimes too much for nominators, who end up leaving that field blank when creating their nominations. BlueMoonset (talk) 22:15, 26 February 2018 (UTC)

@Hawkeye7: For what it is worth, due to your bot's interactions with FAC, I think it would be best if you took over the GA bot as well. I think at this point it is better to just write a new bot than salvage the old bot; no one seems to want to work on salvaging. Kees08 (Talk) 21:59, 26 February 2018 (UTC)

We'd need to come up with a full list of functionality for whoever takes this on, not only what we have now but what we're looking for and where the border conditions are. BlueMoonset (talk) 22:15, 26 February 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I might be interested in lending a hand. A features list and functionality details (as mentioned by BlueMoonset) would be nice to affirm that decision though. I shall actively watch this thread. --TheSandDoctor (talk) 21:30, 11 March 2018 (UTC)

Okay, I will attempt to list the features, please modify as needed:

  • Place notifications on nominators' talk pages when their nomination is onreview (diff), onhold (diff), passed (diff), or failed
  • Update the GAN page when the status of a review changes (new, on hold, on review, passed, failed, and also the number of reviews editors have performed) (diff)
  • Update the stats page (related to the last bullet point; this is where the stats are stored) (diff)
  • Transclude the GA review on the article talk page (diff)
  • Add the GA icon to articles that pass (diff)
  • Add the oldid parameter to the GA template (diff)

@BlueMoonset: Are you aware of other functions? Looking through the bot's edit history and going off of what I know of the bot, this is what I came up with. Kees08 (Talk) 22:10, 11 March 2018 (UTC)

Thanks Kees08. Does anyone know if it would be possible to take a look at the database structure? --TheSandDoctor (talk) 22:28, 11 March 2018 (UTC)
@Legoktm: Are you able to answer their question? Thanks! Kees08 (Talk) 23:43, 11 March 2018 (UTC)
TheSandDoctor, it's great that you're interested in this. Kees08, the second item (about updating the GAN page) is much broader. I don't know whether the bot simply updates the GAN page or generates/recreates the contents of all sections on it. It's basically dealing with all the GA nominee templates out there—which indicate what is currently a nominated article that belongs on the GAN page, and the changes to that page. If an entry wasn't on the page last time but is this time, then it's considered new; if it was on last time but a review page has appeared for it, then it's considered under review and the review page is parsed for reviewer information (but if the GA nominee template instead says "status=onhold", then it's noted as being on hold)... there's a lot involved, including cross-checking, and any field in the GA nominee template, including page number, subtopic, status, and note, can change at any time. If the GA nominee template has disappeared and a GA template is there for that same "page" number, then it has passed; if a FailedGA is there for that same "page" number, then it has failed (but the current bot's code doesn't check this properly, so any FailedGA template on the talk page results in the "failed" message being sent to the nominator even if the nomination was just passed with a new GA template showing). Sometimes review pages disappear when they were created illegally or by mistake and are speedy deleted, and the bot realizes their absence and updates the GAN page accordingly, so it's a comprehensive check each time the bot runs (currently every 20 minutes). If the bot doesn't know how to characterize the change it has found, it appears under an edit summary of "Maintenance": status changes to 2ndopinion go here, as do passes and failures where there was something wrong with the article talk page according to its lights. 
For example, it views with suspicion any talk page of a nomination under review that doesn't have a transcluded review on it, so it doesn't send out pass or fail messages for them (and maybe not even hold messages; I've never checked that).
There's a difference here between features and functionality. I think the features (with the exception of the 2ndopinion status and the display of anything in the "notes" field of GA nominee) have been listed here. The actual functions (how it needs to work and what it needs to check) are harder to break down. One thing that was mentioned above is the use of subtopics: we have been unable to add new subtopics for several years now, so new subtopics on the GA page are not yet available on the GAN page. I'm not sure how the bot gets its list of subtopics; I've found more than one possible page where they could be read from, but there may be a database for subtopics and the topics they come under that actually controls them, with the pages I've found being a place for some templates, like GA, FailedGA, and Article history, to figure out what subtopics translate to which topics, and which subtopics are legitimate. GA nominee templates that have invalid subtopics or missing status or note fields (or other glitches) can cause the bot to try every 20 minutes to enter or update a nomination/review and fail to do so; there are times when a transaction is listed dozens of times, one bot run after another, as the GAN edit summary because it needs to happen, but it ultimately doesn't (until someone sees the problem and fixes the problematic GA nominee template or GA review page). I'm hoping any new bot will be smarter about how to handle these (and many other) situations, and maybe there will be an accessible error log to aid us in determining what's wrong. BlueMoonset (talk) 00:55, 12 March 2018 (UTC)
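The status rules described above can be sketched roughly as follows. This is only an illustration of the decision order, not Legobot's actual code: the `TalkPageSnapshot` fields, the `classify` function, and the returned labels are all hypothetical names, and a real bot would have to fill the snapshot by parsing the talk page. Note that "failed" is only reported when a FailedGA template carries the same page number as the nomination, which is the check the current bot is said to get wrong.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TalkPageSnapshot:
    """Hypothetical summary of an article talk page at one bot run."""
    has_nominee: bool                  # a GA nominee template is present
    nominee_status: str = ""           # its |status= value, e.g. "onhold", "2ndopinion"
    review_page_exists: bool = False   # the review subpage exists
    ga_page: Optional[int] = None      # |page= of a GA template, if any
    failed_pages: tuple = ()           # |page= of every FailedGA template


def classify(prev_listed: bool, page_no: int, snap: TalkPageSnapshot) -> str:
    """Return a label for nomination number page_no (labels are illustrative)."""
    if snap.has_nominee:
        # An explicit status on the nominee template wins over inference.
        if snap.nominee_status in ("onhold", "2ndopinion"):
            return snap.nominee_status
        if snap.review_page_exists:
            return "onreview"
        # No review page: either brand new, or still waiting (this also
        # covers a review page that was created and then deleted).
        return "listed" if prev_listed else "new"
    # The nominee template is gone: match on the nomination's own page number
    # so that an old FailedGA does not mask a fresh pass.
    if snap.ga_page == page_no:
        return "passed"
    if page_no in snap.failed_pages:
        return "failed"
    return "maintenance"  # anything the rules above cannot characterize
```

For example, a talk page with an old FailedGA for page 1 and a new GA for page 2 is classified as "passed" for nomination 2 under this sketch.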
Yeah there is a lot in the second bullet point I did not include diffs for, on account of me being lazy. I will try to do that tonight maybe. I tried to limit what I said to the current functionality of the bot and not include my wishlist of new things, including revamping how subtopics are done. There was an error log at some point in time (located here), not sure when we stopped using that, and if it was on purpose or not. Kees08 (Talk) 01:18, 12 March 2018 (UTC)
@TheSandDoctor: Just giving you a ping in case this slipped off your radar. Kees08 (Talk) 07:48, 20 March 2018 (UTC)
Thanks for the ping Kees08. I had not forgotten, but was waiting for other responses. I am still interested (and might be able to port it to Python), we just need to get Legoktm involved in the discussion. --TheSandDoctor Talk 15:31, 20 March 2018 (UTC)
Kees08 BlueMoonset I have started to (somewhat) work on a Python port of the GAN task. There are some libraries that can be taken advantage of to (hopefully) reduce the number of lines (hopefully simplify it) etc. --TheSandDoctor Talk 22:49, 20 March 2018 (UTC)
That's great, TheSandDoctor. I'm very happy you're taking this one on. There are some places where the current code doesn't do what it ought. Here are a few that I've noticed:
  • As mentioned above, even if the review has just concluded with the article being listed as a GA, if the article talk page also has a FailedGA template on it from a prior nomination that was not successful, the bot will send out a "Failed" message rather than a "Passed" message.
  • If a subtopic isn't capitalized exactly right, the nomination is not added to the GAN page even though the edit summary claims it is; for example, the subtopic "songs" isn't written as "Songs", which prevents the nomination from being added to the page until it is fixed.
  • If a GA nominee template is missing the status and/or note fields, a new review is not added to the template, even though it is (ostensibly) added to the GAN page. One example: Abdul Hamid (soldier) was opened for review and appeared on the GAN page as under review, but in actuality, the review page was transcluded but the GA nominee status was not updated because the GA nominee template was missing the "note" field; only after that was manually added did the bot add the "onreview" status. It would make so much more sense for the bot to add the missing field(s) to GA nominee and proceed with adding the status to the template (and the transclusion of the review page on the talk page), instead of leaving its process step incomplete.
  • When an editor opens a GA review, the bot will increment the number of reviews they have, and it will adjust this number on all nominations and reviews that editor has open. Unfortunately, not only does it produce an edit summary that lists the new review, it also includes those other reviews in the edit summary because of that incremented number, when nothing new has happened to the other reviews. This was a problem before, and it's gotten much worse now that edit summaries can be 1024 characters rather than 128 or 256. For example, when Iazyges opened a GA review of Jim Bakker, the edit summary overflowed the 1024 characters, and it shouldn't have; the Bakker review was the only one that should have been listed for Iazyges.
I'm sure there are others; I'll try to think of them and let you know. Thanks again for taking this on. BlueMoonset (talk) 04:52, 21 March 2018 (UTC)
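Two of the points above (subtopic capitalization and edit summaries that list unchanged reviews) lend themselves to a small sketch. Everything here is an assumption for illustration: `KNOWN_SUBTOPICS` is a made-up subset rather than the bot's real list, and the `(status, review_count)` tuples are a hypothetical representation of a GAN entry.

```python
# Illustrative subset only; the real list of subtopics lives elsewhere.
KNOWN_SUBTOPICS = ("Songs", "Warfare", "Sports and recreation")
_CANONICAL = {name.casefold(): name for name in KNOWN_SUBTOPICS}


def canonical_subtopic(raw):
    """Map 'songs' or ' WARFARE ' to the canonical spelling, or None if unknown."""
    return _CANONICAL.get(raw.strip().casefold())


def changed_entries(old, new):
    """Return (article, status) pairs whose status changed between two runs.

    old/new map article title -> (status, reviewer_review_count). A change in
    the review count alone is deliberately ignored, so opening one review does
    not drag the reviewer's other open reviews into the edit summary.
    """
    return [
        (article, status)
        for article, (status, _count) in new.items()
        if article not in old or old[article][0] != status
    ]


def build_summary(changes, limit=1024):
    """Join the changes into an edit summary, truncated to the length limit."""
    summary = "; ".join(f"{status}: [[{article}]]" for article, status in changes)
    if len(summary) > limit:
        summary = summary[: limit - 3] + "..."
    return summary
```

With this approach, opening a review of Jim Bakker would produce a summary naming only that article, even though the reviewer's count changes on every other entry they have open.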
@BlueMoonset: Thanks! At the moment I am just trying to get the current code ported, but once I am confident that it should work, I will see about the rest. (The main issue, of course, is that I cannot actually test or run the ported script. It isn't ready for that stage yet, but once it is, the most I could do would be to output diffs to text files instead of saving edits, as I don't have bot access. Lego needs to be a part of these discussions at some point, as they involve their bot.) --TheSandDoctor Talk 05:17, 21 March 2018 (UTC)


@BlueMoonset:@Kees08: I have emailed Legoktm requesting for a glimpse at the database structure. --TheSandDoctor Talk 16:00, 27 March 2018 (UTC)
TheSandDoctor, that's excellent news. I hope you hear back soon. Incidentally, I noticed that Template:GA/Subtopic was modified by Chris G, who was the GAN bot owner (then called GAbot) prior to Legoktm, back when the Warfare subtopic "War and military" was changed to "Warfare", so I imagine this is one of the files that might need to be updated if/when the longstanding requests to update/expand the subtopics at GAN are acted on, to break up some of the single-subtopic topics (something that's already been done at WP:GA). In particular, the Warfare topic/subtopic and the Sports and recreation topic/subtopic have been on our wishlist for several years, but Legoktm never responded to multiple requests; the last changes we had were under Chris G before his retirement in 2013. I don't know whether Template:GA/Topic is involved, or the underlying Module:Good article topics and the data it loads at Module:Good article topics/data, which would also need to be updated when topics and/or subtopics are revised or added to. BlueMoonset (talk) 23:26, 28 March 2018 (UTC)
Hi there BlueMoonset, I was waiting to hopefully hear from Lego, but have not. Exams have delayed my progress on this (and will continue to do so until next week), but unfortunately, even when I have the bot converted, there is no guarantee it would work (at first), as I don't have a way to test it nor do I have access to the existing database etc. I could probably figure out what the database looks like from the code, but the information contained within would be very useful (especially to get it up and running). It is still unclear if I would gain access to Legobot or have to make a "Legobot 2" (or similar). (cc Kees08) --TheSandDoctor Talk 01:05, 20 April 2018 (UTC)
TheSandDoctor, I don't know what you can do at this point, aside from pinging Legoktm's email account again. I know that Legoktm would like to give over the responsibility for this code, but doesn't seem to be around Wikipedia enough any more to give the time necessary to help achieve such a transition. With luck, one of these pings will eventually catch them when they have time and energy to make it happen. I do hope you hear something soon. BlueMoonset (talk) 03:42, 20 April 2018 (UTC)
@TheSandDoctor: What database are you looking for? This may be a dumb question..but we identified where the # of reviews/user list was, and the GA database is likely just from the GA page itself. Is there another database you are looking for? Kees08 (Talk) 00:04, 22 April 2018 (UTC)
Hi there, and sorry for the delay in my response, Kees08. Legobot uses its own database to keep track of the page states (to know if they have changed). Having access to it, or at least an outline of the structure, would speed things up somewhat, as I would not have to regenerate the database and could have it clarified what exactly is stored about the pages etc. It is not a necessity, but would be a nice convenience, especially if I am to take over the bot's functions and maintenance, to have access to its database (or at least a "snapshot" of its structure). As for further development on the translation to Python, once finals are wrapped up (by Tuesday PST), I should hopefully have more time to dedicate to working on it. In the meantime, I have an important final in between me and programming. I shall keep everyone updated here. I still foresee an issue with verifying that the bot works as expected, though, due to the lack of available testing and a bot account to run it on. Things will sort themselves out in the next while, I am sure. Minus editing, I could always check if it "compiles"/runs and could probably work in a dry-run framework similar to my other projects (where they go through the motions without making actual edits, printing to local text file(s) instead). --TheSandDoctor Talk 05:44, 22 April 2018 (UTC)
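A dry-run mode of the kind mentioned above could look something like the sketch below, which writes a unified diff to a local file instead of saving an edit. The function name `save_page`, its parameters, and the `dry_run` output directory are hypothetical; the live-save branch is deliberately left unimplemented because it depends on whichever wiki client library the port ends up using.

```python
import difflib
from pathlib import Path


def save_page(title, old_text, new_text, summary, dry_run=True, out_dir="dry_run"):
    """Write the would-be edit as a diff file when dry_run is set.

    Returns the path of the diff file. Live saving is intentionally not
    implemented in this sketch.
    """
    if not dry_run:
        raise NotImplementedError("live saving is outside the scope of this sketch")

    diff = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{title} (current)",
            tofile=f"{title} (proposed)",
        )
    )
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Page titles can contain "/" and ":", which are awkward in file names.
    safe_name = title.replace("/", "_").replace(":", "_") + ".diff"
    path = out / safe_name
    path.write_text(f"# summary: {summary}\n{diff}", encoding="utf-8")
    return path
```

Routing every write through one function like this means the rest of the bot does not need to know whether it is running for real, which makes the dry-run output a reasonable stand-in for the edits it would make.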
Sounds good; no rush, just seeing if I can help you hit the ground running when you get to it. Perhaps DatGuy's config structure would help you figure out a way to do dry runs; it is mildly similar, and you would just have to make up some pages and a database structure to get the best dry run possible prior to hitting the real articles. Best of luck on your finals, and if it makes you feel any better, you will still wake up in cold sweats about them several years in the future (note to dreaming self: no, I have no finals. No, it does not matter you did not study.). Kees08 (Talk) 06:21, 22 April 2018 (UTC)
Not sure how this is going, but I have found User:GA bot/Stats to be inaccurate. It simply needs to list the number of pages created by each editor with "/GA" in the title. Most editors have fewer listed than they have done. It might be easier to look into this while the bot is being redone. AIRcorn (talk) 22:13, 11 May 2018 (UTC)
Can't we get the database structure from the code? Enterprisey (talk!) 04:25, 13 June 2018 (UTC)
I committed the current database structure. Let me know if you want dumps too. Legoktm (talk) 07:04, 13 June 2018 (UTC)

Replace architecture= parameter value in Infobox religious building post-merge

{{Infobox Mandir}} and {{Infobox Hindu temple}}, and maybe a couple of other related templates, have been merged into {{Infobox religious building}}. As part of the conversion, the value of the |architecture= parameter in the merged templates has been assigned a different meaning.

In the pre-merge templates, |architecture= could take a value like "Dravidian architecture". In {{Infobox religious building}}, |architecture= takes a value of "yes" to indicate that the infobox should have an Architecture section, and the actual architectural style is placed in |architecture_style=.

In Category:Pages using infobox religious building with unsupported parameters, templates with an unsupported value for |architecture= are listed under the "Α" section heading (note that "Α" is a Greek letter that is listed after "Z" in the category listing).

I am looking for someone who would be willing to run through that section of the tracking category with AWB and replace this:

| architecture = [any value] |

with this:

| architecture = yes | architecture_style = [any value] |

The "[any value]" string should be preserved in each infobox. For example, | architecture = Dravidian architecture | would be changed to | architecture = yes | architecture_style = Dravidian architecture |

Here's a sample edit.

This will have to be a supervised run, since there could be some strange stuff in the parameter values. It looks like there are about 1,000 pages to fix. – Jonesey95 (talk) 16:04, 12 April 2018 (UTC)
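For anyone scripting this rather than using AWB, the replacement could be sketched with a regular expression along these lines. This is a cautious illustration, not a tested AWB rule: the function name `convert` is hypothetical, and the pattern deliberately skips values containing links, templates, or pipes so that the "strange stuff" is left for a human to handle.

```python
import re

# Matches "| architecture = <plain value>" up to the next pipe, newline, or "}}".
# Values containing [, ], {, } or | never match and are left for manual review.
_ARCH = re.compile(
    r"(\|\s*architecture\s*=[ \t]*)([^|\n{}\[\]]+?)([ \t]*)(?=\n|\||\}\})"
)
_HAS_STYLE = re.compile(r"\|\s*architecture_style\s*=")


def convert(wikitext):
    """Move a plain |architecture= value into |architecture_style=."""
    if _HAS_STYLE.search(wikitext):
        return wikitext  # already converted, or needs a human decision

    def repl(match):
        value = match.group(2).strip()
        if value == "" or value.lower() == "yes":
            return match.group(0)  # nothing to move
        return (
            f"{match.group(1)}yes | architecture_style = {value}{match.group(3)}"
        )

    # count=1: only the first occurrence, assumed to be in the infobox.
    return _ARCH.sub(repl, wikitext, count=1)
```

Applied to the example above, `| architecture = Dravidian architecture |` becomes `| architecture = yes | architecture_style = Dravidian architecture |`, while pages whose value is a wikilink are returned unchanged.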

That seems very poor template design. Why isn't the |architecture= automatically set to yes (or something functionally equivalent) when there's a non-null |architecture_style=? Headbomb {t · c · p · b} 01:36, 19 May 2018 (UTC)
That sounds like a good idea, but we need these 800 or so pages fixed first so that the merge can be completed. – Jonesey95 (talk) 14:26, 19 May 2018 (UTC)
Well it seems to me no bots need to be involved here if I understand the situation correctly. Just treat |architecture= as an alias of |architecture_style=. A bot could replace |architecture= with |architecture_style= if the old parameter is to be deprecated, but it seems good to update the template before the bot, rather than after the bot. Headbomb {t · c · p · b} 14:37, 19 May 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Upon reviewing the template, I think your original course of action is better and that my suggestion above isn't adequate for the current functionality. |architecture=yes enables a whole section of the infobox. I still think it'd be good to have the section displayed based on whether its parameters are empty or present, but that's a different discussion entirely. Headbomb {t · c · p · b} 14:41, 19 May 2018 (UTC)

Bot needed for updating introduction section of portals

Many portals lack human editors, and need automated support to avoid going stale.

Most portals have an introduction section with an excerpt from the lead of the root article corresponding to the portal. The content for that section is transcluded from a subpage entitled "Intro".

The problem is that the excerpts are static, and grow outdated over time. Some are many years out of date.

What is needed is a bot to periodically update subscribed portals, by refreshing the excerpts from the corresponding root article leads.

Each excerpt should end similar to this:

...except that the link should go to the corresponding root article, rather than aviation.

There are over 1500 portals, and so it would be quite tedious for a human editor to do this. Some portals are supported, while others aren't updated for years.

Portals are in turmoil, and so, this is needed sooner rather than later.

Of course, they need greater support than this. But, we've got to start somewhere. As the intros are at the tops of the portal pages, it seemed like the best place to start.    — The Transhumanist   07:06, 14 April 2018 (UTC)

Probably better to do section transclusion, i.e, like Portal:Donald Trump/Intro Galobtter (pingó mió) 07:09, 14 April 2018 (UTC)
I tried various forms of transclusion of the lead, and they all require intrusive coding of the source. Either section markers, or noinclude tags.
I think an excerpt-updater would be better, as there would be zero impact in the source pages in article space. Cumulatively, portals include tens of thousands of excerpts. Injecting code for each of those into article space would be unnecessary clutter, when we could have a bot update the portal subpages instead.    — The Transhumanist   00:38, 15 April 2018 (UTC)
Adding code to the mainspace pages to facilitate translations is a really bad idea. GF editors will just strip the coding. I don't support automatically changing the text of portals to match the ledes; I'd rather see them redirected to the matching articles. Short excerpts don't really help the reader, especially for broad concept articles, which is what most portals try to cover. Legacypac (talk) 04:19, 15 April 2018 (UTC)
Such coding generally has comments included with it so that GF editors don't remove it. As for support/oppose, that's irrelevant, as it is allowable code, like all the other wikicode we use. They added an entire extension to MediaWiki, available on all MediaWiki sites, for transcluding content based on inserted code, and it's already a standard feature on Wikipedia. I think such code makes the source less readable, and think it is best practice to avoid it as long as there is an alternative, like bot-updated excerpts in portals.
Redirects would be links. Portals with just links are lists, not portals. To go to merely redirects, the portal design itself would need to be changed via a new consensus. Portals display content by transcluding excerpted content, that's their core design element. One of the biggest problems with portals is that there aren't enough editors to refresh the excerpts manually. Hence, the bot request.
Short excerpts are exactly the point of portals. To let editors dip in to the subtopics of a subject, in exactly the same way the main page does that for the entire scope of Wikipedia. While you may not find them useful, I find the main page highly useful and entertaining. I rarely follow the links to the rest of the article, but am glad I read the excerpts. The thing I love about it most is that the content changes daily. If portals were set up like that, I would visit the portals for my favorite subjects often. I might even assign one as my home page. Bots can accomplish this. But rather than tackling the whole thing at once, focusing on a bot for updating the portion at the topmost part of the page, the intro, seems like a good place to start.    — The Transhumanist   05:49, 15 April 2018 (UTC)
For a way to avoid that, see this revision I did on Portal:Water. Only problem is that it transcludes the entire page which is pretty heavy..then uses regex to find the first section... But yeah, I do agree with you - I don't see how the excerpts help much. Galobtter (pingó mió) 06:04, 15 April 2018 (UTC)
Excerpts are the current design standard of portals. Changing the practice of using excerpts would be a change in the design standard of portals, which is outside the scope of this venue. Bots are for automating routine and tedious tasks. The method for updating excerpts has been for the most part to do it manually. A bot is needed to help with this onerous chore.    — The Transhumanist   06:09, 15 April 2018 (UTC)
Excerpts not helping much is part of my general position that portals don't help much in general haha Galobtter (pingó mió) 06:26, 15 April 2018 (UTC)
Based on the replies, the strongest exception to portals was that they are out of date and unmaintained. Both of which problems can be solved with bots. So, I've come to the experts. I'm sure they can find an automatable solution.    — The Transhumanist   23:21, 15 April 2018 (UTC)
Excerpts are part of the problem, not a solution. Portals are a failed idea and no amount of bot mucking around is going to fix them. Legacypac (talk) 18:08, 16 April 2018 (UTC)
Please keep in mind when transcluding anything from mainspace, fair use media is currently restricted to "articles" and should not be transcluded to Portal space. — xaosflux Talk 19:15, 17 April 2018 (UTC)
You mean, like pictures of book covers, logos, and the like?    — The Transhumanist   04:17, 18 April 2018 (UTC)
I tend to think both bot updating and transclusion of content from article space are problematic approaches. The stated problem that this request is trying to fix is that portal intros become stale over time because no one is paying attention. If some automated process is adopted, portal pages could well become broken and stay that way for long periods of time because no one is paying attention. I'd take stale over broken any day. Of course, the risk of such breakage depends on how the automation is done, but isn't a better solution to simply avoid potentially dated language/information in portal intros, or to mark such stuff with, say, {{as of}} or {{update after}}? This would require an initial round of assessments to add such templates (and fix problematic wording, etc.), but it looks like with all the attention portals are getting there will be a concerted effort to review portals once the current RFC is closed. - dcljr (talk) 22:38, 19 April 2018 (UTC)
To reduce the "brokenness rate", the bot could first add {{historical}} to all the portals which have less than a certain threshold of edits in a certain period, then after a week perform the proposed edit to the existing "intro" section/subpage of the portals which are not marked historical. --Nemo 12:15, 16 May 2018 (UTC)
  • At this point the vast majority of portals have been updated with a variety of templates which transclude content from mainspace directly. This was a good idea at the time, but does not seem to have been the preferred option. JLJ001 (talk) 15:25, 29 May 2018 (UTC)

Bot to tag all remaining disambiguation links.

We developed a consensus a while back to tag all remaining disambiguation links in the project with a {{dn}} tag. In order to avoid excessive tagging, the idea is to generate a list of all links, let it sit for a few weeks, then recheck it and tag everything that has still not been fixed after that interval. Any takers? bd2412 T 22:20, 17 April 2018 (UTC)

@BD2412: - I think I can write this one. Basically, we're looking for links to pages in Category:Disambiguation pages inside articles. Generate a list based on that, and after a couple weeks - rerun with tagging enabled for the links in that list. The query to find those pages should be: quarry:query/26624 if I understood right (Quarry's taking a long time to run it - DBeaver came back with 20,000+ hits) SQLQuery me! 21:56, 23 April 2018 (UTC)
There should be fewer than 8,200 total disambiguation links at this time, per The Daily Disambig; of those, at least 2,100 should already be tagged (you can exclude pages that already have such a tag on them), although many of the articles with tags are likely to include multiple tagged links, so I would think that the task should involve no more than 6,000 links to be tagged. bd2412 T 22:29, 23 April 2018 (UTC)
Good point, I'll rewrite the query to exclude {{dn}}. SQLQuery me! 22:32, 23 April 2018 (UTC)
@SQL: Hi, just following up on this. Cheers! bd2412 T 22:51, 9 June 2018 (UTC)
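The two-pass approach discussed above (list the links, wait, then tag only what is still unfixed) could be sketched as follows. The function names and the `(article, target)` pair representation are hypothetical; the sketch assumes the two link lists come from a database query like the one mentioned, and it does not handle trailing link characters such as `[[Mercury]]s`.

```python
import re


def still_unfixed(first_pass, second_pass):
    """Links present in both snapshots, i.e. not fixed during the waiting period.

    Each snapshot is a set of (article_title, disambiguation_target) pairs.
    """
    return first_pass & second_pass


def tag_link(wikitext, target):
    """Append {{dn}} to each [[target]] or [[target|label]] not already tagged."""
    pattern = re.compile(
        r"(\[\[" + re.escape(target) + r"(?:\|[^\]]*)?\]\])"
        # Skip links already followed by {{dn}} or {{disambiguation needed}}.
        r"(?!\s*\{\{\s*(?i:dn|disambiguation needed)\b)"
    )
    return pattern.sub(lambda m: m.group(1) + "{{dn}}", wikitext)
```

Because already-tagged links are skipped, running `tag_link` twice on the same text gives the same result as running it once, which matters if the bot is interrupted and restarted.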

Invalid fair use media

How are we doing on getting a bot together that detects improper use of non-free media (if not the actual removal from the articles)? That is, the use of non-free media on articles for which the file description page lacks a valid WP:FUR specific to that article - I've just found and removed this, 366 days after this image was added lacking a valid FUR for the article, contrary to WP:NFCCP#10c. --Redrose64 🌹 (talk) 19:10, 20 April 2018 (UTC)

My bot is approved for this, however, there was far too much whining during the brief time that it was running. — JJMC89(T·C) 00:03, 21 April 2018 (UTC)
Whining at being told about copyright issues is a tradition almost as old as Wikipedia itself. I can recall many heated debates and people threatening to quit the project... Someguy1221 (talk) 00:19, 21 April 2018 (UTC)
  • I think that a proposal on a larger forum would not find consensus for this. On the other, producing a list of articles and what images are problematic is still helpful and no one would object to a list. Oiyarbepsy (talk) 00:07, 21 April 2018 (UTC)
    OK, take my original post and ignore the parenthesis "(if not the actual removal from the articles)". Can we at least do the detection? I don't mind if it's a list, a notice placed on the talk page of the file, or a notice on the talk page of the article. The latter two would need some sort of tracking category. --Redrose64 🌹 (talk) 08:19, 21 April 2018 (UTC)

JJMC89, could your bot task be modified to simply log these uses instead of removing them? Oiyarbepsy (talk) 01:12, 22 April 2018 (UTC)

It is easier to write something new than to modify the other script. The issue will be the time needed to check the 602,549+ files. I'm doing some testing. — JJMC89(T·C) 05:28, 23 April 2018 (UTC)
Already at about 5,000 violations and only in the G's. — JJMC89(T·C) 01:23, 24 April 2018 (UTC)
Report available at User:JJMC89 bot/report/NFCC violations (Warning: large page). — JJMC89(T·C) 01:06, 26 April 2018 (UTC)
  Thank you, I'll look at it next time I have a day off work (Saturday?) --Redrose64 🌹 (talk) 07:06, 26 April 2018 (UTC)
  • Redrose64 I've been going through the resulting list (starting at the top), but the large size of the page has left me unable to edit to remove the ones I've completed. If you do work on it, consider starting from the bottom so we don't duplicate each other's work. Oiyarbepsy (talk) 04:20, 28 April 2018 (UTC)
    OK, will do... unfortunately, although today is Saturday, I've been called in to work to cover an absence. Will get round to it ASAP. --Redrose64 🌹 (talk) 10:51, 28 April 2018 (UTC)
    I've updated the report to only list 1000 files at a time to make the page size manageable. This is configurable, so let me know if you want a different limit. — JJMC89(T·C) 07:37, 6 May 2018 (UTC)
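The detection step discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not how JJMC89 bot actually works: it assumes the file description page's wikitext and the list of articles using the file have already been fetched, and the function name and the plain substring heuristic are hypothetical.

```python
import re

def articles_missing_rationale(file_wikitext, using_articles):
    """Return the articles that use a file but are never named on its description page.

    Hypothetical helper: a real check would parse the rationale templates rather
    than rely on a substring match, which can give false negatives.
    """
    # Treat underscores and runs of whitespace as a single space.
    text = re.sub(r"[_\s]+", " ", file_wikitext)
    missing = []
    for article in using_articles:
        title = re.sub(r"[_\s]+", " ", article).strip()
        # The first letter of a page title is case-insensitive on Wikipedia.
        variants = {title[:1].upper() + title[1:], title[:1].lower() + title[1:]}
        if not any(v in text for v in variants):
            missing.append(article)
    return missing
```

The result is only a list of candidates for the report; a human would still need to judge whether any rationale that is present is actually valid for that article.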

WikiProject Athletics tagging

It's been four years since this project last had a tagging run and I'm looking to get Article Alerts to cover the many relevant articles that have not been tagged since. Anyone interested in doing a tagging run of the articles and categories under Category:Sport of athletics? SFB 19:03, 4 May 2018 (UTC)

  Working on this tagging part over the next few days.   ~ Tom.Reding (talkdgaf)  22:11, 4 May 2018 (UTC)
Sillyfolkboy, 2 questions:
  1. There are ~5600 pages to tag. I will propagate the |class= of other WikiProjects, if available. Should I leave |importance= blank, or use |importance=Low? The idea being that if importance were > "Low", it probably would have been tagged as such by now. I can also do this for articles less than a certain size instead.
  2. I'll leave pages alone (for now) which do not have any WikiProject tagged. To make classification of the resulting unclassified pages faster, I can apply |class=Stub to all pages less than 1000, 2000, 3000, etc. bytes. Please take a look at that list of ~5600 and let me know the threshold below which to tag pages as stubs (if at all).
WP Athletics notified for input as well.   ~ Tom.Reding (talkdgaf)  23:47, 4 May 2018 (UTC)
@Tom.Reding: I would recommend propagation of other project's class if available, or mark as stub if under 2000 bytes. You can place importance as low by default. The project is quite well developed now, so the vast majority of important content is already tagged. These will mainly be recent articles on lower level athletes and events.
Category:Triathlon, Category:Duathlon, Category:Foot orienteers‎, Category:Athletics in ancient Greece and Category:Boston Marathon bombing need to be manually excluded. Thanks SFB 01:41, 5 May 2018 (UTC)
PetScan link updated to exclude those cats, ~400 removed. Won't start on this for a few days for possible comments.   ~ Tom.Reding (talkdgaf)  03:15, 5 May 2018 (UTC)
Orienteers were not excluded on the previous run, and consequently a lot of orienteering articles are now tagged as being within the scope of WikiProject Athletics, even though (with some exceptions) they're actually not. Would it be possible to untag them by bot? Sideways713 (talk) 16:20, 5 May 2018 (UTC)
Sideways713, pages < 2000 b mostly done. Will leave |class= blank for those >= 2000 b. Let me know if there's any desired change to the above guidance. Can do the untagging after.   ~ Tom.Reding (talkdgaf)  13:16, 18 May 2018 (UTC)
Re: Orienteering+Athletics, this scan shows 459 which are tagged as both. However, just because someone is in Orienteering doesn't mean they shouldn't be in Athletics, only that they're a candidate for removal. So it's probably best to do this manually, unless there's some rigorous exclusion criteria available?   ~ Tom.Reding (talkdgaf)  17:28, 19 May 2018 (UTC)
If you exclude those in subcategories of Track and field athletes (at any level), and those in subcategories of Sports clubs at level 3 or lower, and possibly those in subcategories of Mountain running (I'm not entirely sure about this one - how does @Sillyfolkboy feel?), and untag the rest, that should be good enough. (That's only a few dozen exclusions.) There will probably still be some false removals - orienteers who dabble in running enough they could be marked as runners on wiki, but aren't yet - but it's a lot less effort to happen upon those later and tag them manually than it is to untag the other 400 pages manually, and the false removals should all be of athletes whose main claim to fame is orienteering and whose articles will be more naturally developed by members of that wikiproject. Sideways713 (talk) 22:43, 19 May 2018 (UTC)
I'm good with the above. There isn't actually a whole lot of crossover between orienteering and elite long-distance running, probably because the latter is much better paying than the former so it isn't something, say, a marathon specialist would consider normally. SFB 23:43, 19 May 2018 (UTC)
Sideways713 & SFB: here is the PetScan (434 results) for these doubly-tagged pages with Category:Track and field athletes & Category:Mountain running, both fully recursed, removed. I've tried removing Category:Sports clubs at level 3 or lower via PetScan and locally via AWB's variably-recursive category utility, but both timeout at depths of 5 and greater. The tree grows very quickly, with ~13,000 unique mainspace pages at a depth of 2, to ~311,000 at a depth of 4. D2's pages subtracted from D4's pages gives a ~298,000 pool of pages to try to remove from the 434, but only 2 pages are removed (Brit Volden & Øyvin Thon), leaving 432, so this isn't a practical approach.   ~ Tom.Reding (talkdgaf)  14:31, 20 May 2018 (UTC)
@Tom.Reding: On that basis, I would leave this to a manual task. Given the small article base, there aren't any major downsides to the accidental inclusion in scope, especially as WikiProject Orienteering seems inactive at the moment. SFB 14:47, 20 May 2018 (UTC)
I meant exclude levels 1, 2 and 3 but don't exclude 4 and up, rather than the opposite. Sorry if that was unclear. Sideways713 (talk) 16:24, 20 May 2018 (UTC)
Right, but it's a distinction without a real difference. It would return a subset of the ~300k I found (since I lumped level 3 into those 300k instead of excluding them), so I decided to not be any more precise, since there's no need - the result would be either the same (i.e. I'd still find those same 2 to be removed from the 434) or worse (I'd find 0 or 1 of those same 2); basically a way for programmers to rationalize exerting least effort...   ~ Tom.Reding (talkdgaf)  19:45, 20 May 2018 (UTC)
No, what I meant is this, which gives 420 results. Sorry if there's a communication problem, Sideways713 (talk) 21:53, 20 May 2018 (UTC)
Sideways713, sorry for the delay. Just to be sure: those 420 results need to have {{WikiProject Athletics}} removed?   ~ Tom.Reding (talkdgaf)  14:25, 5 June 2018 (UTC)
SFB, can you confirm instead?   ~ Tom.Reding (talkdgaf)  21:11, 6 June 2018 (UTC)
@Tom.Reding: above link is down so I can't see the results, but I still think this action is better done manually, given the cross-over in the sports (i.e. just because an orienteer isn't currently in a track athlete category doesn't necessarily mean the athlete has not competed in track). Happy for you to proceed on your rationalized approach per above. SFB 22:27, 6 June 2018 (UTC)
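For reference, the tagging rules agreed earlier in this section (propagate another project's class if available, otherwise Stub under 2000 bytes, importance Low by default) can be sketched as below. The function name is hypothetical, and taking the first available class from the other banners is an assumption; the thread does not say which one to prefer.

```python
STUB_THRESHOLD_BYTES = 2000  # threshold recommended by SFB in the thread

def athletics_banner(page_size_bytes, other_project_classes):
    """Build the banner wikitext for one talk page under the rules discussed above."""
    # Propagate the class from another WikiProject banner when one exists
    # (assumption: the first non-empty class is used).
    known = [c for c in other_project_classes if c]
    if known:
        cls = known[0]
    elif page_size_bytes < STUB_THRESHOLD_BYTES:
        cls = "Stub"
    else:
        cls = ""  # left blank for a human to assess
    return "{{WikiProject Athletics|class=%s|importance=Low}}" % cls
```

For example, `athletics_banner(1500, [])` yields a Stub-class banner, while a 5000-byte page with no other banners gets a blank class.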

Athletics piped links

There is a historical link issue that needs sorting out for the article Sport of athletics.

Would it be possible to amend all piped links to Athletics (sport) (an old title and currently a redirect) to point directly to Sport of athletics? The old title is still ambiguous with Athletics (physical culture), which was the reason for the subsequent move. 99% of the incoming links are valid, as it's a non-natural title choice.

There is also a sub-sport distinction link issue with track and field. I've seen many links in the style [[track and field|athletics]] and [[track and field athletics|athletics]] – these piped links should also be piped to sport of athletics to remove the WP:EASTEREGG aspect. Similarly, links like [[sport of athletics|track and field]] should simply point to track and field. SFB 19:22, 4 May 2018 (UTC)

@Sillyfolkboy: Has consensus been established for this change? It makes sense on its face, but it's best to ask on the article talk page or at the appropriate WikiProject first. Richard0612 22:36, 14 May 2018 (UTC)
@Richard0612: I've added this to the Wikiproject talk page and at Talk:Sport of athletics.
  • [[track and field|athletics]] → [[sport of athletics|athletics]]
  • [[track and field athletics|athletics]] → [[sport of athletics|athletics]]
  • [[sport of athletics|track and field]] → [[track and field]]
  • [[sport of athletics|track and field athletics]] → [[track and field]]
  • [[athletics (sport)|track and field]] → [[track and field]]
  • [[athletics (sport)|track and field athletics]] → [[track and field]]
  • [[athletics (sport)|athletics]] → [[sport of athletics|athletics]]
Better specified the targeted changes too SFB 15:00, 15 May 2018 (UTC)
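The seven replacements listed above could be applied with something like the following sketch. It is illustrative only: the names are hypothetical, it matches targets and labels case-insensitively (an assumption), keeps the original label's capitalization, and ignores edge cases such as underscores in link targets.

```python
import re

# (link target, displayed label) -> new target, following the seven rules above.
RULES = {
    ("track and field", "athletics"): "sport of athletics",
    ("track and field athletics", "athletics"): "sport of athletics",
    ("sport of athletics", "track and field"): "track and field",
    ("sport of athletics", "track and field athletics"): "track and field",
    ("athletics (sport)", "track and field"): "track and field",
    ("athletics (sport)", "track and field athletics"): "track and field",
    ("athletics (sport)", "athletics"): "sport of athletics",
}

# A simple piped link: [[target|label]] with no nested brackets or extra pipes.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")

def fix_links(wikitext):
    def repl(match):
        target, label = match.group(1).strip(), match.group(2).strip()
        new_target = RULES.get((target.lower(), label.lower()))
        if new_target is None:
            return match.group(0)  # not one of the listed patterns; leave untouched
        if new_target == "track and field":
            # Becomes an unpiped link; keep the capitalization of the first letter.
            text = "Track and field" if label[:1].isupper() else "track and field"
            return "[[" + text + "]]"
        return "[[" + new_target + "|" + label + "]]"
    return PIPED_LINK.sub(repl, wikitext)
```

Links to Athletics (physical culture), or any other target not in the table, pass through unchanged.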

Can anyone bulk undo edits by a single user?

Can anyone bulk undo the most recent edit by User:Dispenser to the commented-out list of articles? They were fine edits, but I need the previous state of the article to show up for the public, and the information from those edits can be retrieved later from the article history. Abyssal (talk) 20:30, 4 May 2018 (UTC)

I would recommend asking Dispenser directly to see if they'd be able to mass-rollback the edits in question. If not, ping me and I'll look into it. Richard0612 22:34, 14 May 2018 (UTC)
  Not a good task for a bot. It was only 111 edits, not worth getting a bot operator involved. — Dispenser 19:03, 16 June 2018 (UTC)

WikiSpaces wikis linked from all Wikipedias

Hello! WikiSpaces is closing in July 2018. It would be helpful to have a list of all "" from the external-links table in all Wikipedias (and sister projects too, why not). In WikiTeam we will try to preserve all these open-knowledge sites. Thanks. emijrp (talk) 13:02, 5 May 2018 (UTC)

@Emijrp: You can get this yourself with a really simple search. @Cyberpower678: may want to do a botjob for the domain. --Izno (talk) 13:55, 5 May 2018 (UTC)
The domain needs archiving first. I've submitted a list of discovered Wikispaces URLs that IABot found during the course of its runs to the maintainers of the Wayback Machine for mass archiving.—CYBERPOWER (Chat) 18:11, 5 May 2018 (UTC)
@Cyberpower678: Can you send me a copy of those URLs? Wayback Machine is good (archive HTMLs), but I am coding a bot to export the wikicode from the wikis. emijrp (talk) 07:39, 6 May 2018 (UTC)
Here you go.—CYBERPOWER (Chat) 13:32, 6 May 2018 (UTC)

Cyberbot I Book report updates

The Cyberbot I (talk · contribs) bot used to update all the book reports but stopped in January 2018. It seems its owner is too caught up IRL to fix this. Can anyone help by checking the bot or the code? —IB [ Poke ] 16:13, 5 May 2018 (UTC)

You need to check with User:cyberpower678 - see Wikipedia:Bots/Requests for approval/Cyberbot I 5 - User:cyberbot I says it's enabled. There is no published code. Ronhjones  (Talk) 20:53, 14 May 2018 (UTC)
@Ronhjones: I have tried contacting Cyberpower a number of times, but he/she does not look into it anymore. Although the bot is listed as active for book status, it has stopped updating it. So somewhere it is skipping the update somehow. —IB [ Poke ] 14:30, 21 May 2018 (UTC)
@IndianBio: Sadly the original request Wikipedia:Bots/Requests for approval/NoomBot 2 has "Source code available: On request", so there is no working link to any source code. If User:cyberpower678 cannot fix the current system, then maybe the only option is to write a new bot from scratch. I see user:Headbomb was involved in the original BRFA, maybe he might have some ideas? I can think about a re-write if there's no alternative - I will need a bit more info on what the bot is expected to do. Ronhjones  (Talk) 14:57, 21 May 2018 (UTC)
The modification likely isn't very big, and User:Cyberpower678 likely has the source code. The issue most likely consists of finding what makes the bot crash/not perform, and probably updating a few API calls or something hardcoded into the bot (like a category). Headbomb {t · c · p · b} 15:01, 21 May 2018 (UTC)
Yes, I have the source, but it was modified as needed to keep it operational over time. @Headbomb: If you email me, I can email you a current copy of the source to look at.—CYBERPOWER (Chat) 15:58, 21 May 2018 (UTC)
I suppose I could, but I'm a really shit coder. Is there a reason to not make the source public? Headbomb {t · c · p · b} 16:35, 21 May 2018 (UTC)
It actually is.—CYBERPOWER (Chat) 17:03, 21 May 2018 (UTC)
@Headbomb: will you take a look at the code? I'm sorry I really don't understand the link which Cyberpower has given. I only code in Mainframe lol, but let me know what seems to be the issue. —IB [ Poke ] 08:04, 22 May 2018 (UTC)
Like I said, I'm a shit coder. This looks to be in PHP so presumably anyone that knows PHP could take over the book reports. Headbomb {t · c · p · b} 13:09, 22 May 2018 (UTC)
Someone only needs to file a pull request, and I will deploy it.—CYBERPOWER (Chat) 13:42, 22 May 2018 (UTC)
I can have a look - I'm not a PHP expert by any means (I prefer Python! ;) ) but I've used it extensively in a past life. Richard0612 19:31, 22 May 2018 (UTC)
Richard0612, that will be a real help if you can do it. A great many books are lagging in their updates. —IB [ Poke ] 12:32, 23 May 2018 (UTC)

Hey @Richard0612: was wondering, did you get a chance to look into the code base? —IB [ Poke ] 09:17, 4 June 2018 (UTC)

Not getting any response, so pinging @Cyberpower678: what can be done? —IB [ Poke ] 06:32, 10 June 2018 (UTC)

MeetUp: Women of Library History

Hello There, I would like to send a MeetUp invitation to all active Wikipedians in New Orleans (particularly librarians)--here is our MeetUp page: Wikipedia:Meetup/New_Orleans/WomeninLibraryHistory Please let me know if I need to do anything else--thanks! RachelWex (talk) 01:16, 19 May 2018 (UTC)

@RachelWex: Well, the first thing you need is to craft the message to be sent and have a list of people/pages to notify. Then several people can send those notices. Headbomb {t · c · p · b} 01:24, 19 May 2018 (UTC)
@Headbomb: I can craft the message, but I do not know how to locate the active Wikipedians in New Orleans. Any suggestions? RachelWex (talk) 01:37, 19 May 2018 (UTC)
I'd suggest looking at Wikipedia:WikiProject Louisiana/Wikipedia:WikiProject New Orleans. Headbomb {t · c · p · b} 01:40, 19 May 2018 (UTC)
Not sure why this is at BOTREQ - it seems more of a task for WP:MMS (if you have a list of recipients) or for WP:GN (if you have a geographical area to target). --Redrose64 🌹 (talk) 07:37, 19 May 2018 (UTC)

Alexa rankings / Internet Archive

This isn't really a bot request, in the sense that this doesn't directly have anything to do with the English Wikipedia and no pages will be edited (no BRFA is required), but I'm putting it here nonetheless because I don't know of a better place and it's used 500% more than Wikidata's bot requests page. However, it will benefit both Wikidata and Wikipedia.

I have been archiving (with wget and a list of URLs) a lot of pages onto the Internet Archive and, currently about 75,000 daily (all the same pages). This was originally supposed to be for Wikidata and would have been done once a month on a lot more URLs, but that hasn't materialized. Unfortunately maintaining this automatically would be beyond my rudimentary shell script skills, and to run it as I am doing currently would require time which I do not have.

Originally d:User:Alexabot did this based on some URLs from Wikidata, but the operator seems to have vanished after being harangued on Wikidata's project chat because he added the data to items which were not primarily websites. It follows that in the absence of an established process to add values for the property to Wikidata, the archiving should be done separately, with the data to be harvested where needed. Module:Alexa was to have been used with the data, but the bot only completed three runs so it would be outdated at best, and the Wikidata RFC might end up restricting its use.

Could someone set their Unix-based computer, and/or their bit of the WMF cloud servers, to

  • once a day, archive (to the Internet Archive) and/or download several lists of domain names (e.g. those used on Wikipedia and Wikidata; from CSV files which are sitting on my computer; lists of the top 1 million websites) and combine the lists
  • format those domain names with the regular expression below
  • once a month (those below about ~100,000 in rank) or daily/weekly (those ~100,000 and above), archive (to the Internet Archive or all of the URLs (collected on a given day) between 17:10 UTC and 16:10 UTC the day after (Alexa seems to refresh data erratically between 16:20 and 17:00 each day, independent of daylight saving time)
    • wget allows archival of lists of websites; use -i /path/to/file and -o /path/to/file flags for archival and logging respectively
  • possibly, as an unrelated process, download the archived pages using URL format (where YYYYMMDD is some date) and then harvest the data (Unix shell script regular expressions are almost entirely sufficient)
    • alternatively, just download directly from around the same time (see below)$1\&y=t\&b=ffffff\&n=666666\&f=999999\&p=4e8cff\&r=1y\&t=2\&z=30\&c=1\&h=150\&w=340\&u=$1\&y=q\&b=ffffff\&n=666666\&f=999999\&p=4e8cff\&r=1y\&t=2\&z=0\&c=1\&h=150\&w=340\&u=$1


  • The Wikidata property currently uses the URL access date as the point in time, instead of the date that the data was collected (one day and 17 hours before a given UTC time), and does not require an archive URL or even a date. This might be fine for Google's Wikidata item since it will be number 1 until the end of time, but for everything else it will need to be fixed at some point
  • If you don't archive the graph images at the same time (or you archive pages too quickly), will start throttling connections from the Internet Archive and you will be unable to archive /siteinfo/* for about a week
  • does not allow a large number of incoming connections per IP for either upload or download (only tested with IPv4 addresses – might be better with multiple IPv6 addresses), so you may want to get around this somehow. I have been funneling connections through Tor simply because it seemed easier to configure torsocks, but this is not ideal
  • Given the connection limit, it is only possible to archive about 100,000 pages and 200,000 graphs per day per IP address (and there might be another limit on, which I haven't tried testing)
  • You can use wget's --spider and --max-redirect flags to avoid downloading content
  • Rankings below a certain point (maybe 1 million) are probably not very useful, since the rate of change is high. The best way to check this – which I haven't tried, because it only just occurred to me – is probably to download pages straight from while the data is being archived, and check website rankings that way.
  • Some URLs are inexplicably blocked from archival on the Wayback Machine. Those are …/, …/ and …/ (there may be others but I haven't found any more). (which archives page requisites server-side) seems to block repeated daily archival after a certain point but you can avoid this by using URLs which redirect to the content to be archived
    • isn't supposed to be scriptable, but I did it anyway with a Lynx script
  • Some websites inexplicably disappear from the rankings from day to day, so don't stop archiving websites simply because their ranking disappears
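The scheduling rules in the lists above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, "below ~100,000 in rank" is read as a rank number greater than 100,000, and a site whose rank has disappeared is kept on the monthly schedule rather than dropped.

```python
from datetime import datetime, time, timezone

RANK_CUTOFF = 100_000  # approximate cutoff mentioned in the request

def schedule_for(rank):
    """Pick an archival frequency for a site from its rank (None = rank missing)."""
    if rank is not None and rank <= RANK_CUTOFF:
        return "daily"
    # Lower-ranked sites, and sites that have vanished from the rankings,
    # are still archived, just less often (assumption).
    return "monthly"

def in_archive_window(now_utc):
    """True if a timezone-aware datetime falls between 17:10 UTC and 16:10 UTC the next day.

    The excluded hour (16:10 to 17:10 UTC) covers the period in which the data
    is reported to refresh erratically.
    """
    t = now_utc.astimezone(timezone.utc).time()
    return not (time(16, 10) <= t < time(17, 10))
```

A scheduler would call `in_archive_window(datetime.now(timezone.utc))` before starting a batch and wait if it returns False.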

If you want I can send you the CSV files of archive links that I've accumulated (email me and/or my alternate account, Jc86035 (1)). I have also been archiving and if it would be of any use I've written a shell script for that website.

Notifying Cyberpower678, just because you might know something I don't due to running InternetArchiveBot (or you might be able to get the Internet Archive to do this from their servers). Jc86035 (talk) 15:02, 20 May 2018 (UTC)

Jc86035 - I read all this and understand some things, but don't really understand what the goal is. Can you explain in a sentence or two. Are you trying to archive all URLs obtained through onto / on a daily/weekly basis? What is the connection to wikidata? What is the purpose/goal of this project? -- GreenC 15:53, 26 May 2018 (UTC)

Jc86035 - re: - there are some libraries on GitHub for making page saves, but generally when doing mass uploads you'll want to setup an arrangement with the site owner to feed them links, as it gets better results if he does the archiving in-house, as he can get around country blocks and other things. -- GreenC 15:53, 26 May 2018 (UTC)

@GreenC: Originally the primary motivation was to collect data for Wikidata and Wikipedia. Currently most citations do not even have an archive link (and sometimes don’t even have a date), so the data is completely unverifiable unless Alexa for some reason releases archives of their old data. Websites ranked lower than 10,000 had usually been archived about once before I started archiving. However, I don’t really know what data should be archived (I don’t know how to make a list based on Wikipedia/Wikidata outlinks and haven’t asked anyone for such a list yet), and as such have just archived pages based on other, more easily manipulable lists of websites (such as some CSV file that I found in a web search for the top 1 million websites, which is apparently monthly Alexa data), and because it’s generally difficult and tedious to maintain I’ve just gotten a script to run the same list of about 75,000 archive links at the same time every day. seems to only archive one page every two seconds at maximum, based on its RSS feed. Since the Internet Archive is evidently capable of a much higher throughput I would rather not overwhelm with lots of data which isn’t really all that important. I might ask the website owner to archive those three pages every day, though. Jc86035's alternate account (talk) 15:21, 27 May 2018 (UTC)
Anytime a new external link is added to Wikipedia, the Wayback Machine sees it and archives it. This is done automatically daily with a system created and run by Internet Archive. In addition has done active archiving of all links though I am not sure what the current ongoing status. Between these two most (98%) newly added links are getting archived. I don't know what an Alexa/com citation is, a Special:External links search only shows about 500 URLs on enwiki. -- GreenC 04:14, 30 May 2018 (UTC)
@GreenC: How does the external link harvesting system work? Is the link archival performed only for mainspace, or for all pages? If an added external link has already been archived, is the link archived again? (A list could be created in user space every so often, although there would be a roughly 1/36 chance of a given page's archival being done when the pages are being changed to use the next day's data, which would make the archived pages slightly less useful.)
There are lots of pages which currently do not have Alexa ranks but would benefit from having them added, mostly the lists of websites and the articles of the websites listed (as well as lists of other things which have websites, like newspapers). It would work as a proxy for popularity and importance. Jc86035's alternate account (talk) 08:11, 7 June 2018 (UTC)
@Jc86035: NoMore404. It gets links via the IRC system which I believe is for all spaces. Could test by adding a link to a talk page (not yet on Wayback) and check in 48hrs to see if it's on Wayback. Once a link is in the Wayback it automatically recrawls though how often hard to say. some pages multiple times a day, others once a year, etc.. not sure how they determine freq. -- GreenC 12:48, 7 June 2018 (UTC)
@GreenC: I've added links to Draft:Alexa Internet and User:Jc86035/sandbox, which should be enough for testing. Jc86035's alternate account (talk) 06:18, 8 June 2018 (UTC)
Both those URLs redirect to a page already existing in the Wayback not sure how nomo404 and wayback machine will respond. Redirects are a complication on Wayback. -- GreenC 15:42, 8 June 2018 (UTC)
@GreenC: None of the URLs have been archived. I think I'll probably stick to using the long list of URLs, although I might try putting them in the WMF cloud at some point. Jc86035 (talk) 16:19, 16 June 2018 (UTC)
Jc86035 The test URLs you used won't work, they are already archived on the Wayback. As I said above, "Both those URLs redirect to a page already existing in the Wayback". Need to use URLs that are not yet archived. -- GreenC 18:47, 16 June 2018 (UTC)

@GreenC: Okay. I've replaced those links with eight already-archived links and eight unarchived links; none of them are redirects. Jc86035 (talk) 06:39, 17 June 2018 (UTC)

Ok good. If not working 24-48hr I will contact IA. -- GreenC 14:07, 17 June 2018 (UTC)
Jc86035 - those spotify links previously exist on Wayback, archived in March. Need to find links not yet in the Wayback. -- GreenC 13:50, 19 June 2018 (UTC)
@GreenC: Eight of them (2017-12-14) were archived by me, and the other eight (2018-06-14) are too recent to have been archived by me. Jc86035 (talk) 14:41, 19 June 2018 (UTC)

Tag covers of academic journals and magazines with Template:WikiProject Academic Journals / Template:WikiProject Magazines

The task is "simply"

I believe in both cases, the parameters may be simply the name of the file (e.g. File.svg), or a full [[File/Image:....]] thing.

The task would need to be run daily/weekly. Headbomb {t · c · p · b} 14:31, 24 May 2018 (UTC)

Not to ask the stupid question, but why not just check the talk page of any article using the above infoboxes and place the appropriate talk page tag if necessary? Primefac (talk) 17:42, 24 May 2018 (UTC)
Because it's the (non-free) image files (i.e. pages in the File namespace) that need to be tagged, not the articles. feminist (talk) 03:04, 25 May 2018 (UTC)
Oh, right, my apologies. I misread "the associated file" as "the associated talk page" for some reason. Primefac (talk) 12:49, 25 May 2018 (UTC)
  Coding... should hopefully get this done. Dat GuyTalkContribs 07:16, 29 May 2018 (UTC)
Headbomb want me to also use the class from any wikiprojects in the talk page? Dat GuyTalkContribs 16:13, 5 June 2018 (UTC)
Never mind, stupid question. Dat GuyTalkContribs 16:26, 5 June 2018 (UTC)
  BRFA filed. Dat GuyTalkContribs 16:18, 15 June 2018 (UTC)

Removing succession boxes from song and album articles

The consensus during a recent RFC was to remove succession boxes from song and album articles. Since these appear in over 4,200 song[2] and 2,000 album articles,[3] it seems that this may be a good job for a bot. —Ojorojo (talk) 14:33, 30 May 2018 (UTC)

  BRFA filedRonhjones  (Talk) 15:36, 10 June 2018 (UTC)

Em dashes

I find myself removing spaces around em dashes frequently. Per the MOS, "An em dash is always unspaced (that is, without a space on either side)".

Example of occurrence

Since this is such a black and white issue, a bot to automatically clean this up as it happens would be useful. Kees08 (Talk) 05:49, 31 May 2018 (UTC)

This is a context-sensitive editing task, since some spaced em dashes should be converted to en dashes, not to unspaced em dashes. Others, such as those in file names, should be left alone. – Jonesey95 (talk) 12:46, 31 May 2018 (UTC)
True, there are cases where it matters. Cases such as file names will require exceptions that can be written in the code. As for em dashes that should be en dashes, since en dashes can be spaced or unspaced, switching to spaced will not hurt anything, unless there is a specific context I am missing. Kees08 (Talk) 03:40, 1 June 2018 (UTC)
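A rough sketch of the mechanical part of this request, with the file-name exception mentioned above, might look like this. It is an assumption-laden illustration: the names are hypothetical, only the file name inside [[File:...]] or [[Image:...]] links is protected, and it makes no attempt at the context-sensitive choice between em and en dashes that Jonesey95 raises.

```python
import re

# Protect the file name portion of file links, up to the first pipe or closing bracket.
PROTECTED = re.compile(r"\[\[(?:File|Image):[^|\]]*", re.IGNORECASE)
# An em dash with any surrounding ordinary or non-breaking spaces.
SPACED_EM = re.compile("[ \u00a0]*\u2014[ \u00a0]*")

def unspace_em_dashes(wikitext):
    """Remove spaces around em dashes everywhere except inside file names."""
    out, pos = [], 0
    for match in PROTECTED.finditer(wikitext):
        # Fix the text before the protected span, then copy the span untouched.
        out.append(SPACED_EM.sub("\u2014", wikitext[pos:match.start()]))
        out.append(match.group(0))
        pos = match.end()
    out.append(SPACED_EM.sub("\u2014", wikitext[pos:]))
    return "".join(out)
```

Other contexts that may need protecting (URLs, quotations, template parameters) are not handled here and would have to be added as further exceptions.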

Orphan tags

Hi, could you please give a bot an extra task of removing orphan tags from articles that have at least one incoming link from mainspace articles, lists and index pages but not disambig pages or redirects as per WP:Orphan. The category is Category:All orphaned articles but exclude Category:Orphaned articles from February 2009 as an admin is checking those. A rough estimate is there are at least 10,000 misplaced tags, thanks Atlantic306 (talk) 17:07, 2 June 2018 (UTC)

JL-Bot already removes the orphan tag, but based on the original discussion it requires 5 or more links (ignoring type). This was done as checking the type of link is not always straightforward and adds processing time. The 5 links was a community agreed compromise. The only exception is dab pages which should never be tagged as orphans (it will de-tag those regardless of number of links). That task runs every week or two. If someone wants to build fancier checking, let me know and I will discontinue mine. -- JLaTondre (talk) 22:44, 2 June 2018 (UTC)
This botreq started on my talk page, I suggested posting here first, glad as I didn't know about JL-Bot. I wouldn't know how to improve on JL-Bot other than by using API:Backlinks but it's a wash in terms of functionality. BTW I wrote a command-line utility wikiget (github) that can be hooked through a system call eg. "wikiget -b Ocean -t t" will output all transcluded backlinks for Ocean. It handles all the paging and various API:Backlink options. -- GreenC 23:15, 2 June 2018 (UTC)
Atlantic306, how is this different from the request you made a month ago at Wikipedia:AutoWikiBrowser/Tasks#AWB Request 2, which was decidedly a non-starter? Pinging the other contributors from that discussion, Premeditated Chaos & Sadads. If a large # of orphans have already been manually checked and all that remains for that group is the busywork of removing the tag, then that might be ok if others agree, but we need to see a link to such a discussion.
JLaTondre, do you have a link to the 5+ link discussion?   ~ Tom.Reding (talkdgaf)  12:03, 4 June 2018 (UTC)
Hi, this is different to the AWB proposal as that was for the early category of 9000 articles whereas this proposal leaves that category out as it is being manually checked and refers to all of the remaining orphan categories. As above, a bot is already removing tags but I think this needs to be set at one valid link as per WP:Orphan as the JL-Bot approval was back in 2008 and now consensus has changed that one valid link is sufficient for the tag removal, thanks Atlantic306 (talk) 12:11, 4 June 2018 (UTC)
Discussion is shown here and here. GreenC (talk · contribs) has said his bot can differentiate the links so perhaps his bot could take over the task, thanks Atlantic306 (talk) 12:25, 4 June 2018 (UTC)
WP:ORPHAN says "Although a single, relevant incoming link is sufficient to remove the tag, three or more is ideal...", I would object to a bot removing orphan tags on articles with fewer than 3 links on this basis alone. Headbomb {t · c · p · b} 12:25, 4 June 2018 (UTC)
The conclusions reached at WP:AWB/Tasks#AWB Request 2 apply to most orphans, from 2009 up until some arbitrary time in the near-past.   ~ Tom.Reding (talkdgaf)  12:30, 4 June 2018 (UTC)
(edit conflict) I still think automated removal of orphan tags in general is a bad idea. To me, going through the orphan categories isn't just about making sure something else points there. Orphan-tagged articles often suffer other issues, so the tag is kind of a heads-up that the article needs to be looked at. It's like a sneeze. It could be nothing, but it could mean you have allergies, or a cold.
Same thing with an orphan-tagged article. It could be a great but under-loved topic. But maybe it's a duplicate article or sub-topic and can be merge/redirected. Maybe it's a copyvio that flew under the radar. Maybe it's not actually notable and should be deleted. Maybe the title is wrong and it's orphaned because all the links point to the right (redlinked) title. Maybe the incoming links are incorrect and are trying to point to something else, and need to be changed.
If you just strip the tags without checking the article, you're getting rid of the symptom without checking to see if there's an underlying illness, which essentially reduces the value of the tag in the first place. ♠PMC(talk) 12:55, 4 June 2018 (UTC)
Echoing this from PMC. We don't suffer from having a neverending backlog, and the current bots (per discussion above) and AWB minor fixes already remove templates from pages that are already in the clear. I would much rather that we take the time to go through and find merges or deletes, get these pages added to WikiProjects, and generally do other minor cleanup that happens when human eyes are on the pages. Anything that has lived with few or no links for 9+ years suggests to me that it hasn't been integrated into the Wiki adequately. If we just remove the tag, we remove the likelihood of its discovery again. Sadads (talk) 14:35, 4 June 2018 (UTC)

Someone to take over User:HasteurBot

Hasteur (talk · contribs) has retired; it would be good if someone could take over the bot.

The code is available, with Hasteur stipulating "All I ask is that the credit for the work remains."

@Firefly: Hasteur posted this on your talk page, any interest in taking over? Headbomb {t · c · p · b} 10:40, 4 June 2018 (UTC)

@Headbomb: Yep, I'm happy to do this. Will look at it and submit a BRFA tonight (hopefully!) ƒirefly ( t · c · who? ) 13:27, 4 June 2018 (UTC)

Peer review - periodically contacting mailing list with unanswered reviews

Hi all, could a bot editor help out at peer review by creating a bot that periodically contacts editors on our volunteer list with a list of unanswered reviews? Some details

  • Discussion is here: WT:PR - the problem we are trying to solve is that of a large number of outstanding reviews that haven't been answered
  • List of unanswered reviews is here: WP:PRWAITING
  • List of volunteers is here: WP:PRV
  • We will remove inactive volunteers, and I will reformat the list in a bot readable format similar to this: {{User:Tom (LT)/sandbox/PRV|Tom (LT)|anatomy and medicine|contact=never}}
    • Editors will opt in to the system - all will be set to default to never contact
    • Options for contact will be never, monthly, quarterly, halfyearly, and yearly (unless you can think of a more clever way to do this)

Looking forward to hearing from you soon, --Tom (LT) (talk) 23:12, 4 June 2018 (UTC)

Addit: ping also to Anomie who very kindly helped create the AnomieBot that now runs PR.--Tom (LT) (talk) 23:12, 4 June 2018 (UTC)
In the meantime, I'll mention that WP:AALERTS will report peer reviews requests to Wikiprojects, if the articles are tagged by banners. Headbomb {t · c · p · b} 11:29, 5 June 2018 (UTC)
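
The opt-in list format proposed above could be read by a bot roughly as follows. This is only a minimal sketch: it assumes the entries keep exactly the template name and parameter order shown in the example, and the helper names and the months-since-last-contact bookkeeping are hypothetical, not part of any existing bot.

```python
import re

# Contact frequencies from the proposal, expressed in months.
# "never" is deliberately absent, so such volunteers are never contacted.
CONTACT_INTERVAL_MONTHS = {"monthly": 1, "quarterly": 3, "halfyearly": 6, "yearly": 12}

# Matches entries shaped like the example in the request (assumed format).
ENTRY = re.compile(
    r"\{\{User:Tom \(LT\)/sandbox/PRV\|([^|{}]*)\|([^|{}]*)\|contact=(\w+)\}\}"
)


def parse_volunteers(wikitext):
    """Return one dict per volunteer entry found in the wikitext."""
    return [
        {"user": user.strip(), "topics": topics.strip(), "contact": contact}
        for user, topics, contact in ENTRY.findall(wikitext)
    ]


def due_for_contact(volunteer, months_since_last_contact):
    """True if the volunteer opted in and their interval has elapsed."""
    interval = CONTACT_INTERVAL_MONTHS.get(volunteer["contact"])
    return interval is not None and months_since_last_contact >= interval
```

Defaulting unknown or "never" values to no contact keeps the system opt-in, as requested.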

Popular pages - indexing and WikiProject banners

Could someone help with doing the following to the pages in Category:Lists of popular pages by WikiProject?:

  • add the name of the WikiProject as a sort key to Category:Lists of popular pages by WikiProject
  • add the corresponding WikiProject category, with sort key "Popular Pages"
  • create a talk page (if it doesn't exist) and add the corresponding WikiProject banner

Oornery (talk) 05:14, 6 June 2018 (UTC)

Friendly Search Suggestions

Hi, suggesting that the Template:Friendly search suggestions be added by bot to every stub article talk page to aid the improvement of the articles, thanks Atlantic306 (talk) 20:53, 6 June 2018 (UTC)

That seems like an incredible waste of time and effort, as well as the patience of the community. Is there a consensus that this should be done? Primefac (talk) 12:42, 7 June 2018 (UTC)
  • Disagree, it's completely uncontroversial and helpful to the community as the template has a large number of search options to improve stub articles, and surely that's a very good use of time and effort to improve the Encyclopedia. For something so minor, is consensus really needed? thanks Atlantic306 (talk) 20:30, 7 June 2018 (UTC)
Every stub article talk page - you're talking hundreds of thousands if not millions of stubs (Just checked the cat, which is at 2+ million). InternetArchiveBot and Cyberpower got harassed simply for placing (in my opinion completely relevant) talk page messages on a fraction of that. So yes, I do think you need consensus. Primefac (talk) 22:08, 7 June 2018 (UTC)
I would oppose that with tooth and nail. Absolutely not suitable for a bot task. Headbomb {t · c · p · b} 03:03, 8 June 2018 (UTC)
    • Well, it's certainly too much for a human editor - if it was limited to 300 articles a day it would not cause much disruption, Atlantic306 (talk) 19:17, 14 June 2018 (UTC)
    • Will start an RFC when I have more time, thanks Atlantic306 (talk) 20:38, 16 June 2018 (UTC)

Sort Pages Needing Attention by Popularity/daily views

I suggest, for example, that someone sort the items on this page Category:Wikipedia_requested_photographs by page popularity, similar to how this page is sorted: Wikipedia:WikiProject_Computer_science/Popular_pages. Instead of clicking through random obscure pages, a sorted table would allow people to prioritize the pages that need attention the most. The example bot is found here: User:Community_Tech_bot. Turbo pencil (talk) 00:57, 8 June 2018 (UTC)

@Turbo pencil: Try Massviews. --Izno (talk) 02:23, 8 June 2018 (UTC)
@Izno: Thanks a lot Izno. Super helpful! — Preceding unsigned comment added by Turbo pencil (talkcontribs) 04:12, 8 June 2018 (UTC)

WP:RESTRICT archive bot

I asked for this over a year ago, and one bot op said they would do it... but they never did, so I’m asking again.

WP:RESTRICT is an incredibly bloated list of everyone who is currently sanctioned by arbcom or the community as well as those under “last chance” unblock conditions. In order to reduce the size of these lists and make them easier to navigate, it was decided that any sanction on a user who had been inactive or blocked for more than two years be moved to an archive. The sanction is still valid, just not displayed on the main page anymore, and can be moved back if the user returns to editing.

I did the initial archiving myself 14 months ago. It ranks as pretty much the most tedious thing I have ever done in nearly 11 years of contributing here. I would therefore like to again request that some bot or other be instructed to review listings there once a month or so and remove any fitting the criteria to the archive. If it could move back those that have returned to editing that would be amazing. We seem to be able to auto-generate such data for inactive admins so I am guessing (as someone who admittedly knows nothing at all about programming bots) that this should be fairly straightforward. Thanks for your time. Beeblebrox (talk) 03:32, 8 June 2018 (UTC)

For now, I gave the page WP:RESTRICT#Active_editing_restrictions a spitshine with collapsible tables. Headbomb {t · c · p · b} 13:46, 8 June 2018 (UTC)
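
The archiving rule described in the request (move a sanction to the archive after two years of inactivity, move it back when the user returns) can be sketched as below. This is an illustration under stated assumptions: the function name and the `last_edit` mapping are hypothetical, and a real bot would have to obtain each user's last edit time from the site rather than from a dictionary.

```python
from datetime import datetime, timedelta

# Two years, as stated in the request (approximated as 730 days).
INACTIVITY_LIMIT = timedelta(days=2 * 365)


def partition_sanctions(listed, archived, last_edit, now):
    """Decide which users to move between the main list and the archive.

    listed / archived: user names currently on each page.
    last_edit: hypothetical mapping of user name -> datetime of last edit.
    Returns (to_archive, to_restore).
    """
    cutoff = now - INACTIVITY_LIMIT
    to_archive = [user for user in listed if last_edit[user] < cutoff]
    to_restore = [user for user in archived if last_edit[user] >= cutoff]
    return to_archive, to_restore
```

A long-blocked user cannot edit, so in this sketch they fall out of the main list through the same inactivity check.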

Replace links to AP news hosted by Google with AP website links

Can anyone create a bot to replace links matching the regex*\?docID=([0-9a-f]{32}) with$1. There are about 2800 links to AP news hosted by Google and all the links are dead. I estimate about 20–30% of these links have the docId tag and can be rewritten to link to AP's website. This doesn't always work, but it works often enough to make this worth the effort. You'll need to download the page first and check for absence of the string "The page you’re looking for doesn’t exist. Try searching for a topic." and the presence of a non-empty div of articleBody class. You'll also have to flip the deadurl tag to no after replacement and avoid references that have already been archived. Some examples:

Gazoth (talk) 13:09, 8 June 2018 (UTC)
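
The two checks described above (pull out the 32-character docID, then confirm the fetched page is a real article) might look like the following sketch. It uses only the visible part of the regex from the request; the URL prefixes are not reproduced here, and the function and class names are hypothetical. Fetching the page, flipping the deadurl parameter and skipping already-archived references are left out.

```python
import re
from html.parser import HTMLParser

# Only the docID portion of the pattern quoted in the request.
DOC_ID = re.compile(r"\?docID=([0-9a-f]{32})")

# Error text quoted in the request.
NOT_FOUND = "The page you’re looking for doesn’t exist. Try searching for a topic."


def extract_doc_id(url):
    """Return the 32-hex-digit docID from a link, or None if absent."""
    match = DOC_ID.search(url)
    return match.group(1) if match else None


class _ArticleBodyFinder(HTMLParser):
    """Collects the text found inside a div whose class includes articleBody."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # div nesting depth, counted from the articleBody div
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif "articleBody" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def looks_like_live_article(html):
    """Apply both checks: no error message, and a non-empty articleBody div."""
    if NOT_FOUND in html:
        return False
    finder = _ArticleBodyFinder()
    finder.feed(html)
    return bool("".join(finder.text).strip())
```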

Bot to change redirects to 'Redirect'-class on Talk pages?

As per edits like this one I just did, is it possible to have a bot go around and check all the extant Talk pages of redirect pages, and confirm/change all of their WP banner assessment classes to 'Redirect'-class?... Seems like this should be an easy and doable task(?)... FWIW. TIA. --IJBall (contribstalk) 15:41, 10 June 2018 (UTC)

Just ran a petscan on Category:All redirect categories, which I assume includes all redirects, but which also contains 2.9 million pages. Granted, 1.4mil of these are in Category:Unprintworthy redirects (which likely do not have talk pages of their own), and there are probably another million or so with similar no-talk-page status, but that's still a metric buttload of pages to check. Not saying it can't be done (haven't really even considered that yet), just thought I'd give an idea of the scale of this operation. Primefac (talk) 16:25, 10 June 2018 (UTC)
No one says it needs to be done "fast"!...   Maybe a bot can check just a certain number of redirect pages per day on this, so it doesn't overwhelm any resources. --IJBall (contribstalk) 16:59, 10 June 2018 (UTC)
I believe the banners automatically set the class to redirect when the page is a redirect, so that would fall under WP:COSMETICBOT. Not sure what happens if |class=C is set on a redirect, but that should be easy to test. If |class=C overrides redirect detection, that would be suitable for a task. Headbomb {t · c · p · b} 03:39, 11 June 2018 (UTC)
I'm telling you, I just had to do this twice, earlier today – two pages that had been converted to redirects years ago still had |class=B on their Talk pages. It's possible that this only affects pages that were converted to redirects years ago, but it looks like there is a population of them that need to be updated to |class=Redirect. --IJBall (contribstalk) 03:47, 11 June 2018 (UTC)
Setting |class=something overrides the automatic redirect class. This should be handled by EnterpriseyBot. — JJMC89(T·C) 05:11, 11 June 2018 (UTC)
Yup, as JJMC89 mentioned, this is EnterpriseyBot task 10. It hasn't run for a while, because I let a couple of breaking API changes pass by without updating the code. I'm going to fix the code so it can run again. Enterprisey (talk!) 05:45, 11 June 2018 (UTC)
If such a task is done, it's best to either remove the classification and leave the empty parameter |class=, or remove the parameter entirely. As Headbomb and JJMC89 have noted, redirect-class is autodetected when no class is explicitly set. This is true with all WikiProject banners built around {{WPBannerMeta}} (but see note), so setting an explicit |class=redir just means that somebody has to amend it a second time if the page ceases to be a redirect.
Note: there are at least four that are not built around {{WPBannerMeta}}, and of the four that I am aware of, only {{WikiProject U.S. Roads}} autodetects redir class; the other three ({{WikiProject Anime and manga}}; {{Maths rating}}; and {{WikiProject Military history}}) do not autodetect, so for these it must be set explicitly; moreover, those three only recognise the full form |class=redirect, they don't recognise the shorter |class=redir that many others handle without problem. --Redrose64 🌹 (talk) 07:54, 11 June 2018 (UTC)
Yes, I skip anime articles explicitly, and the bot won't touch the other two troublesome templates due to the regular expressions it uses.
A bigger problem concerns the example diff that started this thread. It's from an article in the unprintworthy redirects category. I thought the bot should have gotten to that category already, so I just went in to inspect the logs. Unbelievably, after munching through all of the redirect categories, it has finally gotten stuck on exactly that category (unprintworthy redirects). Apparently Pywikibot really hates one of the titles in it. I'm trying to figure out which title precisely, so I can file a bug report, but for now the bot task is on hold.
However, all of the other redirect categories that alphabetically come before it should only contain articles that the bot checked already. Enterprisey (talk!) 20:11, 18 June 2018 (UTC)
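
The handling Redrose64 describes (blank the class so autodetection takes over, but set the full |class=redirect on the three banners that do not autodetect) can be sketched as follows. This is not EnterpriseyBot's code. It is a simplified illustration that assumes the talk page is already known to belong to a redirect, touches any innermost template that has a class parameter, and ignores template-name redirects. A real task would restrict itself to known WikiProject banners.

```python
import re

# Banners named in the note above as needing an explicit, full-form class.
NEEDS_EXPLICIT = {
    "wikiproject anime and manga",
    "maths rating",
    "wikiproject military history",
}

# An innermost template: {{name|params}} with no nested braces inside.
TEMPLATE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")

# The class parameter and its value (the value stops at a pipe or newline).
CLASS_PARAM = re.compile(r"(\|\s*class\s*=[ \t]*)[^|{}\n]*")


def fix_redirect_class(wikitext):
    """Blank |class= on autodetecting banners; set redirect on the others."""

    def repl(match):
        name, params = match.group(1), match.group(2)
        if not CLASS_PARAM.search(params):
            return match.group(0)  # no class parameter: leave untouched
        value = "redirect" if name.strip().lower() in NEEDS_EXPLICIT else ""
        new_params = CLASS_PARAM.sub(lambda p: p.group(1) + value, params)
        return "{{" + name + new_params + "}}"

    return TEMPLATE.sub(repl, wikitext)
```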

[r] → [ɾ] in IPA for Spanish

A consensus was reached at Help talk:IPA/Spanish#About R to change all instances of r that either occur at the end of a word or precede a consonant (i.e. any symbol except a, e, i, o, or u) to ɾ inside the first parameter of {{IPA-es}}. There currently appear to be about 1,190 articles in need of this change. Could someone help with this task with a bot? Nardog (talk) 19:24, 12 June 2018 (UTC)
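
Taken literally, the rule above amounts to replacing every r that is not followed by a, e, i, o, or u, which also covers an r at the end of the transcription. A minimal sketch, assuming the first parameter of {{IPA-es}} contains no nested templates (the function names are hypothetical):

```python
import re

# Opening of the template plus its first parameter (no nested templates assumed).
IPA_ES = re.compile(r"(\{\{IPA-es\|)([^|{}]*)")

# An r that is NOT followed by a vowel: before a consonant, a space,
# any other symbol, or at the very end of the string.
TAP_CONTEXT = re.compile(r"r(?![aeiou])")


def retranscribe(ipa):
    """Apply the r -> ɾ rule to one transcription string."""
    return TAP_CONTEXT.sub("ɾ", ipa)


def fix_ipa_es(wikitext):
    """Apply the rule inside the first parameter of every {{IPA-es}}."""
    return IPA_ES.sub(lambda m: m.group(1) + retranscribe(m.group(2)), wikitext)
```

Because "any symbol except a, e, i, o, or u" includes stress marks and semivowels, edits produced this way would still merit a manual spot check.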

Indexing talk page

User:Legobot has stopped indexing talk pages and archives and User:HBC Archive Indexerbot is deactivated. I would like a replacement for that task. --Tyw7  (🗣️ Talk to me • ✍️ Contributions) 20:06, 12 June 2018 (UTC)

Do any of these work? Category:Wikipedia_archive_bots -- GreenC 14:31, 17 June 2018 (UTC)
Most of them are archive bots. I'm looking for index bots to take over from Legobot, which has developed a bug and doesn't index all talk pages. --Tyw7  (🗣️ Talk to me • ✍️ Contributions) 17:52, 17 June 2018 (UTC)
User:Legobot has 33 tasks, not sure which one (Task #15?). Did Legobot say why they stopped or was it abandoned without a word? -- GreenC 20:41, 18 June 2018 (UTC)
It's working only on random pages; many people have reported it, but it hasn't been fixed. See the discussion at User talk:Legobot --Tyw7  (🗣️ Talk to me • ✍️ Contributions) 21:05, 18 June 2018 (UTC)
Looks like Legobot is on vacation until July 7. They should either fix the bugs (if serious) or give permission for someone else to take it over, should anyone wish. -- GreenC 21:17, 18 June 2018 (UTC)
Legobot (talk · contribs) is not on vacation, it is still running (there would be chaos on several fronts if it had stopped completely). It is Legoktm (talk · contribs) that is on vacation, and if you have been following both User talk:Legobot and User talk:Legoktm, you'll know that Legoktm has not been responding to questions concerning Legobot (other than one or two specifics on this page such as #Take over GAN functions from Legobot above) for well over two years. --Redrose64 🌹 (talk) 17:41, 19 June 2018 (UTC)

New York Times archives moved


The new URL can be obtained by following Location: redirects in the headers (in this case two-deep). I bring it up because of the importance of NYT to Wikipedia, uncertainty how long redirects last and the new URL is more informative including the date. -- GreenC 21:46, 13 June 2018 (UTC)
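
Following Location: headers hop by hop, as described above, could be sketched like this. It is a hedged illustration using only the Python standard library: `head_location` and `resolve_redirects` are hypothetical names, the hop limit is arbitrary, and no actual NYT URLs are assumed.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head_location(url):
    """Send one HEAD request; return the Location header on a redirect, else None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        conn.close()


def resolve_redirects(url, get_location=head_location, max_hops=5):
    """Return the chain of URLs visited; the last element is the final URL."""
    chain = [url]
    for _ in range(max_hops):
        location = get_location(chain[-1])
        if not location:
            break
        next_url = urljoin(chain[-1], location)  # Location may be relative
        if next_url in chain:  # guard against redirect loops
            break
        chain.append(next_url)
    return chain
```

Passing `get_location` as a parameter lets the chain logic be exercised without network access, for example with a dictionary's `get` method.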

Move WikiProject Articles for creation to below other WikiProject templates

In Special:Diff/845715301, PRehse moved WikiProject Articles for creation to the bottom and updated the class for WikiProject Video games from "Stub" to "Start". Then, in Special:Diff/845730267, I updated the class for WikiProject Articles for creation, and moved WikiProject Articles for creation back to the top. But then, in Special:Diff/845730984, PRehse decided to move WikiProject Articles for creation to the bottom again. For consistency, we should have a bot move all {{WikiProject Articles for creation}} templates on talk pages to below other WikiProject templates. If the WikiProject templates are within {{WikiProject banner shell}}, then {{WikiProject Articles for creation}} will stay within the shell along with other WikiProject templates. GeoffreyT2000 (talk) 16:44, 14 June 2018 (UTC)

  Needs wider discussion That sounds like a lot of bot edits for questionable benefit. Seek approval at one of the village pumps. Anomie 17:35, 14 June 2018 (UTC)

This change should be fine per Wikipedia:Talk page layout. -- Magioladitis (talk) 18:24, 14 June 2018 (UTC)

If anything, this could be bundled in AWB, assuming it has consensus, so that AWB bots make the change when they do other tasks. However, this very likely wouldn't get consensus to be done on its own. Headbomb {t · c · p · b} 20:08, 14 June 2018 (UTC)
There are no bots doing tasks in this direction, unless we decide that WikiProject tagging bots should also be doing this. Only Yobot used to do this, but right now there is no guideline to ask bot owners to perform this action. So we have two ways: form a strategy or approve a sole task for this. I would certainly support the task being done if there was a discussion held somewhere about this task or similar tasks. -- Magioladitis (talk) 22:54, 14 June 2018 (UTC)
  • I concur this needs wider discussion. Why does the order of the WikiProject banners matter? Primefac (talk) 02:19, 15 June 2018 (UTC)
For instance, we have a loose rule that WikiProject Biography "comes before any other WikiProject banners". At the moment, I do not see why WikiProject Articles for creation should be at the bottom of all Projects, but there is a place to discuss this. If this gets support we should then create bots to do it. It's about 60,000 talk pages with this template. -- Magioladitis (talk) 07:31, 15 June 2018 (UTC)
Dedicated WP:TPL bots never had support as far as I recall. Maybe there was one shoving banners into the metabanner after a certain threshold, but that'd be the only one if it ever was a thing. I don't see what'd be different here. Headbomb {t · c · p · b} 13:05, 15 June 2018 (UTC)
There was a bot that was adding WPBS and was doing that task, and Yobot was doing it as part of WikiProject tagging. My main questions are: a) whether we have a guarantee that current BAG will continue to accept this as a secondary task and b) is there a need to actually do it as a sole task? WP:TPL bots did not have much luck in the past due to the lack of concrete rules (which we now have; I dedicated a lot of time in this direction) and of built-in AWB tools (which we now have, since at some point I did some thousands of edits to rename templates to standard names). -- Magioladitis (talk) 14:12, 15 June 2018 (UTC)
BAG cannot guarantee that any specific thing will be accepted by the community. If a task is proposed and there is consensus for it (or at least a lack of objections after a call for comments/trial), it'll be approved. If there is no consensus for the task to be done, it won't be approved. Headbomb {t · c · p · b} 20:19, 18 June 2018 (UTC)

Potentially untagged misspellings report

Hi! Potentially untagged misspellings (configuration) is a newish database report that lists potentially untagged misspellings. For example, Angolan War of Independance is currently not tagged with {{R from misspelling}} and it should be.

Any and all help evaluating and tagging these potential misspellings is welcome. Once these redirects are appropriately identified and categorized, other database reports such as Linked misspellings (configuration) can then highlight instances where we are currently linking to these misspellings, so that the misspellings can be fixed.

This report has some false positives and the list of misspelling pairs needs a lot of expansion. If you have additional pairs that we should be scanning for or you have other feedback about this report, that is also welcome. --MZMcBride (talk) 02:58, 15 June 2018 (UTC)

Oh boy. With proper names, variations are often 'correct' and usage is context dependent, so a bot shouldn't decide. My only suggestion is to skip words that are capitalized. For the rest, use something like approximate (fuzzy) matching to identify paired words that are only slightly different due to spelling (experiment with the agrep threshold without creating too many false positives), then use a dictionary to determine if one of the paired words is a real word and the other not. At that point there might be a good case for it being a misspelling and not an alternative name. This is one of those problems computers are not good at and is messy. Unless there is an AI solution. -- GreenC 14:23, 17 June 2018 (UTC)
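
The filter GreenC outlines (skip capitalized words, fuzzy-match the pair, then check a dictionary) might be sketched as below. Note the substitution: this uses Python's difflib similarity ratio in place of agrep, the 0.85 threshold is an arbitrary starting point to experiment with, and the dictionary is whatever word set the caller supplies.

```python
from difflib import SequenceMatcher


def likely_misspelling_pairs(pairs, dictionary, threshold=0.85):
    """Filter (redirect_word, target_word) pairs down to probable misspellings.

    pairs: candidate word pairs, e.g. taken from a redirect and its target.
    dictionary: a set of known-good lowercase words (supplied by the caller).
    """
    hits = []
    for redirect_word, target_word in pairs:
        # Skip capitalized words: proper-name variants are often legitimate.
        if redirect_word[:1].isupper() or target_word[:1].isupper():
            continue
        if redirect_word == target_word:
            continue
        # Fuzzy match: keep only pairs that differ slightly.
        ratio = SequenceMatcher(None, redirect_word, target_word).ratio()
        if ratio < threshold:
            continue
        # Dictionary check: the target is a real word, the redirect is not.
        if target_word in dictionary and redirect_word not in dictionary:
            hits.append((redirect_word, target_word))
    return hits
```

Pairs in which both spellings are dictionary words (such as regional variants) are dropped, which should reduce false positives at the cost of missing some real errors.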

Misplaced brace

In this diff, I replaced } with |. As a result, the wikitext went from [[War Memorial Building (Baltimore, Maryland)}War Memorial Building]] to [[War Memorial Building (Baltimore, Maryland)|War Memorial Building]], and the appearance went from [[War Memorial Building (Baltimore, Maryland)}War Memorial Building]] to War Memorial Building. Is a maintenance bot already doing this kind of repair, and if not, could it be added to an existing bot? Nyttend (talk) 13:39, 21 June 2018 (UTC)
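
A repair like the one in that diff could be expressed as a single regular expression. This is a minimal sketch, not the behaviour of any existing bot: it only touches double-bracket links that contain exactly one } and no pipe, and the function name is hypothetical.

```python
import re

# [[Target}Label]] where neither part contains brackets, braces, or a pipe.
BRACE_LINK = re.compile(r"\[\[([^\[\]{}|]+)\}([^\[\]{}|]+)\]\]")


def fix_brace_links(wikitext):
    """Replace a stray } used in place of the pipe inside a wikilink."""
    return BRACE_LINK.sub(r"[[\1|\2]]", wikitext)
```

Links that already contain a pipe, and templates placed inside links, are left alone because the character classes exclude | and braces.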