
Anyone willing to analyse some bot pseudocode?

I'm building a research (i.e., no-edit) bot in C++... since I'm not really that experienced in programming, I was wondering if someone would be willing to check my pseudocode?

The basic concept behind the bot is to identify when a particular string of text was added to an article using a binary search method. In theory it could search through the history of a page with 10,000 edits with fewer than 15 page-requests.

A research program like this will be a helpful tool in tracking down subtle vandals and spammers. So.. I've kinda drifted. Anyone more experienced with OOP languages want to audit my pseudocode? ---J.S (T/C) 23:59, 28 December 2006 (UTC)

Here's the link... User:J.smith/pseudocode. ---J.S (T/C) 00:06, 29 December 2006 (UTC)
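For readers following the thread, the binary search idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the pseudocode linked above; `fetch_revision` is a hypothetical callback standing in for one page request, and the sketch assumes revisions are numbered from 0 (oldest) upward.

```python
from typing import Callable, Optional


def find_insertion(fetch_revision: Callable[[int], str],
                   revision_count: int,
                   needle: str) -> Optional[int]:
    """Return the index of a revision that contains `needle` while the
    revision just before it does not, or None if the newest revision
    lacks it.

    `fetch_revision(i)` is a hypothetical stand-in for one page request
    that returns the text of revision i (0 = oldest).
    """
    if revision_count == 0:
        return None
    newest = revision_count - 1
    if needle not in fetch_revision(newest):
        return None
    # Invariant: lo is treated as "text absent" (-1 is a sentinel before
    # the first revision); hi is known to contain the text.
    lo, hi = -1, newest
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if needle in fetch_revision(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Each loop pass halves the interval, so a history of 10,000 revisions needs at most 14 probes plus the initial check of the newest revision. The result is one point where the text appeared, which is not necessarily the first such point.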
I've written a Perl interpretation of your pseudocode, but am having trouble understanding precisely the context of that block. How will it be used? Is that the 'main' function? Where is the return value used? -- Jmax- 08:41, 29 December 2006 (UTC)
I'm not certain, but I believe the idea is that the user provides the Wikipedia page and a string that's in the current version of the article. The function returns the diff of when that string was added. So I would say that no, this wouldn't be main; this would probably be 'search'. Vicarious 09:26, 29 December 2006 (UTC)
Does it recurse? Where should it recurse, if it does? -- Jmax- 09:31, 29 December 2006 (UTC)
No, I don't think so. Main would take the user's input, run the search function, then either link to or redirect the user to the diff page. Vicarious 09:34, 29 December 2006 (UTC)

Here is a Perl implementation, less the essential bits (which could easily be added). I'm not entirely sure if the algorithm will even work properly, actually. Something seems off about it. -- Jmax- 10:09, 29 December 2006 (UTC)

Well, it does have limitations. If the text was added in and taken out multiple times it won't necessarily find the -first- time the string was added, but it will find one of the times the string was added. There are a number of elements I haven't designed yet so the code is incomplete. ---J.S (T/C) 17:38, 29 December 2006 (UTC)
The basic idea here is that the user would input the name of the article to search and the string of text they were looking for, and then the program would output a link to the first version of the page with that particular string. ---J.S (T/C) 17:43, 29 December 2006 (UTC)
As was hinted at above, a binary search skips over many alterations. A binary search will find one alteration where the string appeared, but it might not be the first time the string appeared. The bot might look at versions 128, 64, 32, 48, 56, 60, 58, 59 and identify version 59 as having the string while 58 does not. But the string might have been inserted in version 34 and deleted in version 35, as well as several other times. (SEWilco 05:45, 30 December 2006 (UTC))
Yes, but even that can be useful information when tracking stuff down...
It occurs to me, this might be useful for tracking down an unsigned post on a talk-page when the date is completely unknown. Hmmm... ---J.S (T/C) 05:48, 30 December 2006 (UTC)
It at least can help in many situations. You wanted comments on the method, and now you know some of the limitations. If you really want to find the first insertion of a string you could examine the article-with-history format which is used in data dumps. (SEWilco 16:00, 30 December 2006 (UTC))
That could be done, but a db dump is quite huge :( Maybe I should chat with the toolserver people on that when they get replication up and running? ---J.S (T/C) 09:35, 31 December 2006 (UTC)
Is the full-with-history available through Export? (SEWilco 15:03, 31 December 2006 (UTC))
Help:Export says the full history for a page is available, but at bottom of page is a note that it has been disabled for performance reasons. If the history was available you'd have a single file where you'd just have to recognize the version header (and a few others such as Talk page) and by remembering the earliest version with the desired text be able to find the version in a single read of one file. At present that's only relevant if you search a mirror with export history enabled. (SEWilco 06:39, 3 January 2007 (UTC))
Although I don't think this is as big an issue as you guys do, I have a relatively elegant solution to the problem of finding the first insertion. Run the exact same search again on only the preceding versions. Have it include a case where, if it never finds the string, it'll let the first search know it found the right one. This method won't work if the string has been absent from most of the versions, but by far the most common reason the original search won't work is that it'll find page blankings and attribute the sentence to the person who reverts them; this solution solves that problem. Vicarious 01:08, 1 January 2007 (UTC)
That's a brilliant solution! I'll certainly include a function for this. ---J.S (T/C) 19:06, 2 January 2007 (UTC)
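One way to read the "search the preceding versions again" suggestion is sketched below in Python. It is only an interpretation of the comment, not anyone's actual bot code; `contains` is a hypothetical callback that fetches revision i and reports whether the text is in it, and the function names are made up for illustration.

```python
from typing import Callable, Optional


def bisect_insertion(contains: Callable[[int], bool],
                     first: int, last: int) -> Optional[int]:
    """Halving search over revisions first..last (inclusive).

    Returns an index where the text is present and the revision before it
    (within the range) is not, or None if no probe ever found the text.
    """
    lo, hi = first - 1, last + 1  # sentinels just outside the range
    found = None
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if contains(mid):
            found = hi = mid
        else:
            lo = mid
    return found


def earliest_insertion(contains: Callable[[int], bool],
                       revision_count: int) -> Optional[int]:
    """Repeat the search on the preceding revisions until it finds nothing."""
    best = bisect_insertion(contains, 0, revision_count - 1)
    while best is not None and best > 0:
        # Revision best - 1 is already known to lack the text,
        # so the repeat search stops at best - 2.
        earlier = bisect_insertion(contains, 0, best - 2)
        if earlier is None:
            break
        best = earlier
    return best
```

With 101 revisions where the text was added in revision 10, blanked in revision 50 and restored in revision 51, the first pass lands on 51 and the repeat pass moves the answer back to 10. As noted above, the repeat pass can still miss an earlier insertion if the text was absent from most of the older revisions, because every probe may land on a revision without it.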

forced autoarchive

I know there's already autoarchive bots running such as the one archiving this page, but I think a bot that operates a little differently could be effectively used to archive all article talkpages. First off, it would only archive talk pages that are very long, so 3 year old comments on a tiny talk page would be left untouched. When the bot runs across a very long talk page it will archive similarly to current bots, but with a high threshold, for example all sections older than 28 days (rather than the typical 7 days). Also, unlike current bots I'd suggest we make this opt out rather than opt in, although very busy talk pages or talk pages that are manually archived wouldn't be touched anyway because they'd either be short enough or would have no inactive sections. Vicarious 03:56, 1 January 2007 (UTC)

If there is interest in this and someone can code it up and get it approved, I'll volunteer to host it and run it under the EssjayBot banner. Essjay (Talk) 03:58, 1 January 2007 (UTC)
Werdnabot is customizable as to how many days of no replies in a section before it archives - If the only feature you want is to only archive after a certain page length is reached, wouldn't it just be easier to put in a feature request to Werdna, rather than re-inventing the wheel? ShakingSpirittalk 04:04, 1 January 2007 (UTC)
Unless I've missed something, Werdnabot is like the EssjayBots, it's an opt-in. Archiving all article talk pages on Wikipedia would require a bit more than just a new feature; it's going to have several hundred thousand talk pages to parse, it's going to need a lot more efficient code than the current opt-in code. Essjay (Talk) 04:08, 1 January 2007 (UTC)
My apologies, I missed that part. Though that does bring up a new point - wouldn't this cause very unnecessary stress on the servers? Crawling through every single article talk page on Wikipedia must be very bandwidth-intensive, even using Special:Export or an API, or some such. It would also create a huge number of new pages - again, putting strain on the database server. Does the small convenience of having a shorter talk page to look through justify this? Maybe I'm playing devil's advocate, but I'm sure this has been debated before and wasn't found to be such a good idea ^_^ ShakingSpirittalk
Well, it'll be a strain on the server that hosts it, parsing all those pages. However, if it's done right, it will only archive pages of a certain length, which should avoid most of the one or two line talk pages that are out there, thus reducing any server load. At this point, with 6,804,491 articles, 60,304,657 pages, and tens of thousands of edits a minute, one little bot archiving pages (and set on a delay, to avoid any problems) is hardly likely to bring the site down. As long as it is given a reasonable delay time on its editing, it should be fine. The real problem will be getting the community signed on to the idea. Essjay (Talk) 04:28, 1 January 2007 (UTC)
As for bandwidth, I don't think it would be an issue. First off, it could run once on a database dump to get the ball rolling, then it could patrol recent changes looking only at "talk:" changes. If it still seems like it could hog bandwidth, I can think of many more ways to cut down the number of pages it checks. First, ignore any pages that just had characters removed instead of added. Secondly, only check every third page (or so); this operates under the premise that big talk pages get big because they're edited often, so a page will pop up again soon if it's going to need archiving. Thirdly, the bot could store a local hash table of page lengths, so rather than loading the page each time it would add (or subtract) the number of characters listed on Special:Recentchanges and would only load the page if it needs archiving. This wouldn't be as hard on a bot as it sounds; the storage space would only be a few megs because all it needs is the page's hash and size. Also, the computation would be easy, because it would hash, not search for, the page, so the lookup time is O(1) and the calculations are all real simple. Vicarious 04:34, 1 January 2007 (UTC)
Ok, the archive bots are great and seem to work really well... but forced archiving of one particular style on a project-wide scale? I'd so rather we keep the opt-in system and some active "recruiting" of large talk pages. ---J.S (T/C) 19:04, 2 January 2007 (UTC)
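The local size table suggested in this thread could look something like the sketch below. It is an illustration only: the class name, the byte threshold and the idea of feeding it `(title, delta)` pairs from recent changes are assumptions, and a Python dict is used as the hash table.

```python
from typing import Dict, Optional


class TalkPageSizeTracker:
    """Keeps an estimated size per talk page, updated from size deltas.

    Hypothetical helper: `threshold` is the page size (in characters) at
    which the bot would bother loading the page to archive it.
    """

    def __init__(self, threshold: int,
                 initial_sizes: Optional[Dict[str, int]] = None) -> None:
        self.threshold = threshold
        # Could be seeded once from a database dump, as suggested above.
        self.sizes: Dict[str, int] = dict(initial_sizes or {})

    def record_change(self, title: str, delta: int) -> bool:
        """Apply a size change and report whether the page is worth loading.

        Edits that only remove characters never trigger a load.
        """
        new_size = max(0, self.sizes.get(title, 0) + delta)
        self.sizes[title] = new_size  # average O(1) dict update
        return delta > 0 and new_size >= self.threshold
```

For example, with a threshold of 100,000 and a page seeded at 99,500 characters, a +400 edit returns `False` and a following +200 edit returns `True`, so only the second edit would cost a page load.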

congratulations bot

I suspect this idea isn't even remotely feasible, but I thought I'd suggest it in case I was wrong. A bot that posts a note on a user's talk page when they reach a milestone edit count (1k, 5k, whatever). It'd say congrats and maybe have a time and link for the thousandth edit. Vicarious 05:26, 1 January 2007 (UTC)

Probably not realistically possible. It would be too much of a strain looking through all users' contributions and counting them. We currently have 3,140,639 registered users, and loading Special:Contributions 3,140,639 times would just be a killer, and continuing to do that over time would just be even worse. —Mets501 (talk) 06:37, 1 January 2007 (UTC)
Users could opt-in by posting their count somewhere, or on irc, and the bot could watch the irc RC channel and just count edits. While it would work, is there enough of a point? ST47Talk 19:48, 2 January 2007 (UTC)
It sounds a bit counter productive actually. Except for making the distinction between a new editor and a regular editor there isn't much value in an edit count... and focusing on edit count has negative impact. ---J.S (T/C) 00:04, 3 January 2007 (UTC)
I think you've missed the point a little. This isn't about telling editors their worth, it's a tiny pat on the back. I enjoy seeing the odometer on my car roll over to an even 10,000 even though it has no significance; this was supposed to be similarly cute and lighthearted. Accordingly because it would be difficult it's not worth it. Vicarious 05:33, 3 January 2007 (UTC)
I think that by having a bot to do this, we'd be giving legitimacy to making edit count matter, which many people feel it does not. ^demon[omg plz] 01:24, 5 January 2007 (UTC)
If the database dump settles down to fortnightly, it may be more feasible, albeit time lagged. Interestingly, someone can pass one of these milestones several times if raw database access is not used, because deletion can cause edit count to go down. Rich Farmbrough, 14:42 13 January 2007 (GMT).

MessageBot

I suspect this idea has been thought of before, but I don't see its fruit, so here goes. When I first discovered the talk page here I couldn't for the life of me understand why Wikipedia couldn't have a normal message box interface, even if it need be public. This simply means showing the thread of an exchange on different talk pages. It would save us having to keep an eye out for a reply on the page where we left a message the day before, etc. A bot can simply thread a talk exchange; of course this would require tagging our talk as we reply to a message. This is more a navigational issue, but since it's not been integrated into the main wiki OS it seems to be left for a bot. I don't know how it would run though. Suggestions? frummer 17:57, 1 January 2007 (UTC)

Why not a TalkBot? User A posts a message on User B's talk page. User B responds on his/her own talk page, and this posting invokes (somehow) the TalkBot (maybe a template in the section, like {{TalkBot}}?). The TalkBot determines that the original posting from A isn't on A's talk page, and so (a) copies the section heading on B's talk page to A's talk page as a new section; (b) adds You wrote: and the text of A's posting, to that page, and (c) copies B's response on B's page to A's talk page (all with proper indentation). John Broughton
It gets trickier if A responds on A's talk page and isn't a subscriber to the TalkBot service, but perhaps the bot could insert a hidden comment in the heading of the new section on A's talk page, such as <--- Section serviced by TalkBot --->, and then watch for that textstring in the data stream?
Thats a good clarification. frummer 14:15, 2 January 2007 (UTC)
Here is an idea... Why not have a bot that can automatically move the conversation to a (new) sub-page and then include the conversation into the talk page. Anyone else who wants the conversation on their talk page can include the conversation as well. A new template can be made to "trigger" the bot. Hmmm ---J.S (T/C) 19:21, 2 January 2007 (UTC)
We have used transcluded subpages in the past, but IIRC it doesn't trigger "new messages" and it needs a "purge" for new messages to show up sometimes. Rich Farmbrough, 14:44 13 January 2007 (GMT).

I was wondering if it would be possible to create a bot that would serve solely to revert the addition of a link to the Oregon Trail article. Once every other week or so a user adds a link for a free game download that we delete from the article. The bot would just have to monitor the External links category, removing the link: http://www.spesw.com/pc/educational_games/games_n_r/oregon_trail_deluxe.html Oregon Trail Deluxe download whenever it appears. Thanks, please let me know on my talk page if this is a possibility that anyone could take up. Thanks again, b_cubed 17:00, 2 January 2007 (UTC)

You might want to see WP:SPAM. Blacklisting the link might be an option...
If it's not, you might want to contact the user who runs User:AntiVandalBot to have that added to the list of things it watches for. ---J.S (T/C) 19:09, 2 January 2007 (UTC)
I've added the link to Shadowbot's spam blacklist. Thanks for the link! Shadow1 (talk) 21:48, 2 January 2007 (UTC)
No, thank you :) b_cubed 21:58, 2 January 2007 (UTC)

May someone please operate this for me? It's already been userpaged, accounted, and flagged. D•a•r•k•nes•s•L•o•r•di•a•n•••CCD••• 22:12, 2 January 2007 (UTC)

Operate it for you? As in execute? -- Jmax- 14:14, 3 January 2007 (UTC)
Yes, you can change the name even, but give me a little credit for creating it before my bot malfunctioned. :( D•a•r•k•nes•s•L•o•r•di•a•n•••CCD••• 00:44, 4 January 2007 (UTC)
Why can't you operate it? -- Jmax- 02:46, 4 January 2007 (UTC)

Children Page Protection Bot

I would like to suggest the creation of a bot to defend articles for children's shows. For some reason these pages appeal to vandals, and I think something needs to help protect them. I'll use an example: before the Dora the Explorer page was put back under protection it was vandalized a lot. The one time that sticks in my mind the most was by a user named Oddanimals, who stated Dora was 47 and had a sex change, along with a few other sex-related comments, and replaced the word Banana in Boots' article with the S curse word. This is not proper, to say the least, and one of the users I talked to said that the Backyardigans article is also vandalized a lot. Parents, kids, and people like me who just enjoy those shows look them up, and this kind of thing should NOT be allowed. Thank you. Superx 23:18, 2 January 2007 (UTC)

Bots are already watching those pages... but bots are dumb and can't catch all types of vandalism. ---J.S (T/C) 00:00, 3 January 2007 (UTC)

True, but those bots are checking other pages as well. That vandalism stuck out like a sore thumb and none of those bots caught it, except for one after I fixed it myself. I think that just one bot whose job it is to check those pages would be better than several others that are checking a bunch of other pages as well. Superx 01:10, 3 January 2007 (UTC)

Wikipedia is not censored. alphachimp. 01:15, 3 January 2007 (UTC)

Yes, but that would only apply here if the stuff I mentioned ACTUALLY HAD SOMETHING TO DO WITH THE SHOW! Curse words and other such stuff are only allowed if they are relevant to the article, and none of that was, so the point you mentioned doesn't apply in this situation. Superx 12:00, 5 January 2007 (UTC)

Quite right. Rich Farmbrough, 14:52 13 January 2007 (GMT).

Finishing a template migration

Need to migrate all the existing transclusions of {{CopyrightedFreeUse}} to {{PD-release}} per discussion here. BetacommandBot started on this a few weeks ago and then mysteriously quit about 7/8ths of the way through, and I haven't been able to get a response from Betacommand since then. Could someone else finish this so that we can finally delete that template? Thanks. Kaldari 01:27, 3 January 2007 (UTC)

Alphachimpbot is on it. alphachimp. 01:35, 3 January 2007 (UTC)
All done. alphachimp. 08:25, 3 January 2007 (UTC)

Popes interwiki

Please add ro interwiki to all popes pages. Just created, Romihaitza 12:31, 3 January 2007 (UTC)

WikiProject France Bot

We need to add the {{WikiProject France}} template to all the articles belonging to the France category and its subcategories. It would be nice if someone could do it for us or tell me how to do it. STTW (talk) 09:45, 4 January 2007 (UTC)

I can do this, please put a list here of the categories and indicate whether subcategories should be included. ST47Talk 11:15, 4 January 2007 (UTC)
4 levels deep, categories with France or French in the name only, 23313 hits, converted to talk, prepending template, skipping if it contains {{WikiProject France ST47Talk 20:22, 5 January 2007 (UTC)
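The run summarised above (a depth-limited category walk, a name filter, and a skip check before prepending the template) might be sketched like this. The in-memory `subcats` mapping stands in for whatever category data the real bot used, and the choice not to descend into categories that fail the name filter is an assumption, since the summary does not say.

```python
from collections import deque
from typing import Dict, List, Sequence


def collect_categories(subcats: Dict[str, List[str]], root: str,
                       max_depth: int, keywords: Sequence[str]) -> List[str]:
    """Breadth-first walk from `root`, at most `max_depth` levels down.

    Only categories whose name contains one of `keywords` are kept, and
    (as an assumption) only those are descended into.
    """
    seen = {root}
    result = []
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if not any(word in category for word in keywords):
            continue
        result.append(category)
        if depth == max_depth:
            continue
        for child in subcats.get(category, []):
            if child not in seen:  # guards against category loops
                seen.add(child)
                queue.append((child, depth + 1))
    return result


def tag_talk_page(text: str) -> str:
    """Prepend the project banner unless the page already carries it."""
    if "{{WikiProject France" in text:
        return text
    return "{{WikiProject France}}\n" + text
```

Calling `collect_categories(subcats, "Category:France", 4, ("France", "French"))` mirrors the "4 levels deep" setting; the substring test in `tag_talk_page` also skips pages where the banner is already present with parameters.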

Page-protecting sysop-bot

People usually do a good job of protecting the templates on the Main Page, but there have been some that slip through the cracks and the results can be disastrous. I propose a bot that would be given sysop status. I know this is controversial, and there was a big discussion about a similar request at the AFD page a while back. As such, anyone allowed to know the password must have already been approved for adminship through conventional means, and it should be open-source. It will protect the next day's templates in advance of them being on the Main Page (say, 24 hours) and then unprotect them afterwards. Preferably, it would make sure the pages stay protected until off the Main Page, and even be able to work with the pictures for POTD, but they'd have to be specified in advance, whereas the templates would run on the {{CURRENTDAY}} magic word system. This would be a big help in reducing the possibility of Main Page vandalism (believe me, it happens).--HereToHelp 03:52, 5 January 2007 (UTC)

see Wikipedia:Bots/Requests for approval/ProtectionBot Betacommand (talkcontribsBot) 05:54, 5 January 2007 (UTC)
Oh. I feel stupid now.--HereToHelp 03:30, 6 January 2007 (UTC)

deletion bot

I have the feeling I'm gonna get yelled at for this one, but how about a bot that deletes articles that have a clear consensus on Wikipedia:Articles for deletion. For example, it's quite obvious that Wikipedia:Articles for deletion/Myspacephobia is going to get deleted, but it's currently waiting for an admin to do the work. Yes I know this would mean an admin bot, but that's not without precedent. Also, this bot would ONLY work on articles with a very obvious consensus. As for vandals abusing the bot, I don't think it would be an issue. First off it'd ignore IPs, secondly it'd have a minimum amount of time for voting, and there are too many legitimate voters to contest a bad faith deletion for the bot to touch it. Btw, this bot would also close candidates that are clearly keep as well. Vicarious 07:39, 5 January 2007 (UTC)

Absolutely not. AFD is not a vote, it's a discussion to achieve consensus. A bot will never be in a position to properly determine whether or not consensus is achieved. I'd strongly oppose both the creation and sysop status of such an account. (Coincidentally, from a purely technical angle, such a bot would probably not be difficult to create...) alphachimp 07:45, 5 January 2007 (UTC)
Although I understand your position, I'm not sure I agree with your argument. I agree that a computer couldn't tell who was winning a debate, but it could if both people were arguing the same side. Similarly this bot couldn't decide what consensus was concluded in an opposed discussion, but I don't see why it couldn't take advantage of the fact that everyone is on the same side of the discussion and that the consensus has already been reached. Vicarious 07:58, 5 January 2007 (UTC)
So you're proposing that we break deletion debates down into purely mechanical decisions? There's a clear difference between achieving consensus and simply "counting the votes". Administrators use discretion to evaluate the weight and strength of the arguments presented, making a decision based not only on those facts that they have surmised, but also on the strength of those arguments. It's quite possible to achieve "no consensus" even with an overwhelming "vote" for deletion. alphachimp 08:08, 5 January 2007 (UTC)
But is it possible to achieve no consensus with 10 votes to delete and 0 to keep? Vicarious 08:14, 5 January 2007 (UTC)
Absolutely, because AFD is not a vote. It's possible that the arguments could be entirely baseless, and all of the "votes" placed afterwards could be founded on those arguments. alphachimp 08:16, 5 January 2007 (UTC)
I understand that it's a discussion not a vote, but I would be astonished if that scenario had happened ever, let alone with any frequency. I confess I don't spend a lot of time on WP:AFD but I've spent a little and I find your argument specious. In fact, I think if that were to happen then even the admin that came along to close the debate would likely miss the same fallacy that the other 10 editors had. Vicarious 08:26, 5 January 2007 (UTC)
I'd certainly hope not, but that's a possibility. It's still a lot more comforting to leave such important decisions up to human judgment. alphachimp 08:34, 5 January 2007 (UTC)
What would be nice is an auto AfD relisting bot that relists articles w/ less than say 5 comments on it -- 64.180.84.87 09:41, 5 January 2007 (UTC)

musical artist template

{{Infobox musical artist 2
->
{{Infobox musical artist

86.201.106.176 13:23, 5 January 2007 (UTC)
Why? ST47Talk 19:23, 5 January 2007 (UTC)

Images on commons with different name

I do not have any programming skills for writing bots. I can handle and run the bot if someone writes the code to replace image links in articles with the existing image on Commons that has a different name. I suppose this type of bot could be useful on Wikipedias other than the English one as well in some cases. Many examples of such images can be found in this category. Shyam (T/C) 19:52, 5 January 2007 (UTC)

Ad-stopping bot

In theory, the bot will look through new articles to try and find key phrases like "our products" and "we are a". It then places a template on the page like this:

  AdBot suspects this page of being blatant advertising, otherwise known as spam.

Please check this page conforms to the neutral point-of-view policy before nominating for speedy deletion, deleting or removing this template.

And places it in a relevant category. A human (or other intelligent individual) would then look through the list and nominate any articles that are blatant ads for WP:SPEEDY.

What do you think? --///Jrothwell (talk)/// 13:15, 29 December 2006 (UTC)

Sure, why not? But "suspects that this page contains" rather than "suspects this page of being" might be a little more neutral, as well as more grammatically correct. And it would be an ad-flagging bot, not an ad-stopping bot. (I'm quibbling, I know.) John Broughton | Talk 15:13, 29 December 2006 (UTC)
Sounds good. There might be some changes in implementation (EG. That flag might cause concern), but I've found phrases like those to be dead giveaways to both commercial intent and notorious copyright infringements.
It's a good idea. Other phrases you could search for include "our company", "visit our website/site/home page", and "we provide". You might also want to add "fixing the article" to the list of suggested options. Proto:: 01:51, 31 December 2006 (UTC)
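The phrase matching discussed so far could be prototyped along these lines. This is a sketch only, using the phrases suggested in this thread; the function name is made up, and matching is case-insensitive on whole words after collapsing whitespace.

```python
import re
from typing import List

# Phrases proposed in the discussion above.
_AD_PATTERN = re.compile(
    r"\b(?:our products|we are a|our company|we provide"
    r"|visit our (?:website|site|home page))\b",
    re.IGNORECASE,
)


def find_ad_phrases(text: str) -> List[str]:
    """Return the distinct ad-like phrases found, in order of appearance."""
    normalized = " ".join(text.split())  # collapse line breaks and runs of spaces
    found: List[str] = []
    for match in _AD_PATTERN.finditer(normalized):
        phrase = match.group(0).lower()
        if phrase not in found:
            found.append(phrase)
    return found
```

A page would be flagged when the returned list is non-empty. The word-boundary anchors keep "we are an island" from matching "we are a", but nothing here distinguishes a quotation from first-person ad copy, so a human check on each flagged page would still be needed.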
I've altered the template slightly to fit in with everyone's suggestions. Here's the revised template:
  AdBot suspects this article contains blatant advertising, otherwise known as spam.

If the subject of the article complies with the Wikipedia notability guidelines, please fix the article if it doesn't conform to the neutral point-of-view policy. If the subject is not notable, please nominate the article for speedy deletion.

I'm also making a template for user pages of people whose pages have been flagged. Any other thoughts? --///Jrothwell (talk)/// 16:06, 31 December 2006 (UTC)
The user-talk template is at User:Jrothwell/Templates/Adbot-note. Is there anyone who'd be willing to code the bot? --///Jrothwell (talk)/// 17:13, 31 December 2006 (UTC)
Sounds like a great idea for a bot. If you haven't found anyone yet, I'd be willing to code it. Best, Hagerman(talk) 19:10, 31 December 2006 (UTC)
Shouldn't it be "If the article does not assert the notability of the subject, please nominate the article for speedy deletion." OR "If the subject is not notable, please nominate the article for deletion." Please see WP:CSD#A7. --WikiSlasher 04:23, 1 January 2007 (UTC)
I don't know if this is a good idea. Googling "we are a" gives mostly legitimate pages where the phrase appears in a quotation. I think at the least there should be a human signing off on each flagging. --24.193.107.205 06:10, 2 January 2007 (UTC)

(undent) The issue of false positives is important. Certainly if a large majority of flaggings related to a particular phrase are in fact in error, that phrase shouldn't be used by the bot. But keep in mind that this flagging will only be used for new articles, which are much more likely to be spam than existing ones, so drawing conclusions from your search of existing articles isn't necessarily a good idea.

In any case, the bot should be tested by seeing what happens using a given phrase for (say) the first ten articles it finds. For example, our products looks like a good phrase to use. A google search on that found Enterprise Engineering Center (user who created article has done nothing else), plus several others (in the top 10 results) that were tagged as appearing to be advertisements.

Finally, the bot is only doing flagging. A human has to actually nominate an article for deletion (and it's easy to remove a template). But your comment does raise a point about there being a link to click on to complain about the bot. John Broughton | Talk 15:40, 2 January 2007 (UTC)

It strikes me that the Bayesian approach commonly used to detect e-mail spam could work here as well. All we'd need (besides a simple matter of programming) is a way to train the bot. I suppose, if the bot is watching the RC feed anyway, that deletion of a tagged article could be seen as a confirmation that it was spam, while removal of the tag could be taken as a sign that it was not (until and unless the article is deleted after all). But there would still need to be a manual training interface, if only for the initial training before the bot is started. —Ilmari Karonen (talk) 03:23, 3 January 2007 (UTC)
I like the idea of a Bayesian approach because of the simplicity. However, the bot training would always have to be manual in my opinion. Having the bot treat the deletion of a tagged article as spam will likely result in it learning some behaviors outside of its design scope. For instance, if it tags an article with patent nonsense that happens to trip the filter and that article is removed while the template is still intact, it will start gobbling up patent nonsense like there is no tomorrow. While that's not a bad thing, the template we'd be leaving on the page wouldn't accurately describe what's wrong with the page.
So... either a manual interface would be necessary to make sure that the bot stays on target or we'd need to change the scope of the bot to encompass every kind of problem there is with a new page (spam, attack pages, patent nonsense, etc.) I think either approach would be good, but would anyone care to offer their feedback? Best, Hagerman(talk) 03:31, 7 January 2007 (UTC)
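To make the Bayesian suggestion concrete, here is a small naive Bayes sketch with a manual training interface. It is a toy under stated assumptions (word-level tokens, add-one smoothing with one extra slot for unseen words, two labels named "spam" and "ham"), not a proposal for the actual bot.

```python
import math
import re
from collections import Counter
from typing import List


class NaiveBayesSpamFilter:
    """Toy two-class naive Bayes text classifier, trained by hand."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Manual training step: a human supplies the label."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        total_docs = sum(self.doc_counts.values())
        if total_docs == 0:
            return 0.5  # untrained: no opinion
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        words = self.tokenize(text)
        log_score = {}
        for label in self.LABELS:
            # Smoothed class prior, then smoothed per-word likelihoods.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_score[label] = score
        # Turn the log-odds into a probability, clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

Because `train` is only ever called with a label chosen by a person, the filter cannot drift toward patent nonsense or attack pages on its own, which is the concern raised above about learning from deletions.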

I suggest this template:

  AdBot suspects this article (or parts of this article) are blatant advertising, otherwise known as spam.

If the subject of the article complies with the Wikipedia notability guidelines, please fix the article if it doesn't conform to the neutral point-of-view policy. If the subject is not notable, please nominate the article for speedy deletion.

Cocoaguycontribstalk 03:42, 3 January 2007 (UTC)

Tagging closed FAC nominations

I proposed this to Raul654 on his talk page, but he'd rather not add it to his workload, though he supported using a bot instead.

I'd like a bot to watch the Featured log (for successful noms) and Featured archive (for failed noms) and automatically tag each one with a line that indicates when they were closed (i.e. added to the archive) and the result. That way, it'll be possible to determine from the page itself what happened.

I'm thinking it should add

Promoted ~~~~~

or

Not Promoted ~~~~~

at the bottom of each, in line with WP:FPC. Night Gyr (talk/Oy) 20:59, 5 January 2007 (UTC)

Nice to see someone working on this idea. Why not have it archive the page like in the XFDs? That way, just looking at it tells you if it is done or not, and the summary is at the top. The Placebo Effect 21:01, 5 January 2007 (UTC)
I figured we should have a more consistent FxC style independent of xFD style. Those big boxes and colored backgrounds make sense if the pages are still going to be transcluded along side live debates, like xFD, but it's a lot of excess formatting to add when people are less likely to be confused. Night Gyr (talk/Oy) 21:07, 5 January 2007 (UTC)
Personally, I think we should add a template at the top that says what day the article passed or failed and mention that the debate is closed. The Placebo Effect 21:10, 5 January 2007 (UTC)

Yeah, top or bottom isn't really a big issue for me, and top (immediately below the section head) is probably better for quick reference. FPC uses {{FPCresult}}, so it needn't be a complicated template. Night Gyr (talk/Oy) 21:15, 5 January 2007 (UTC)

I've thought about this before. A bot could run about once a day and do a number of tasks:
  1. Check the promotion and non-promotion logs for any updates from Raul654
  2. Add a note to the candidate sub page indicating promotion or not
  3. Remove the page from WP:FAC if it is still there (this might simplify one of Raul654's mundane tasks)
  4. Update the {{fac}} on the article talk page for non-promotions
    • This could even be done in a way that would make future fac submissions easy, eliminating quite a bit of needless work the other FA clerks currently handle
  5. Possibly verify/update wikiproject assessments on the article talk page for promoted FAs
Has anyone else set up an account to develop a bot along this line yet? Gimmetrow 04:10, 6 January 2007 (UTC)

Would like to have a similar bot do the same (in reverse) for Featured article review; rather than Promoted or Not Promoted, the bot would return Kept or Removed Featured status, based on the Featured article review archive. SandyGeorgia (Talk) 05:53, 7 January 2007 (UTC)

I will volunteer to write a bot that performs these tasks, presuming no one else would like to or has already started. -- Jmax- 09:19, 7 January 2007 (UTC)

I've already started. See Wikipedia:Bots/Requests_for_approval#GimmeBot. Gimmetrow 11:06, 7 January 2007 (UTC)
Let us know if either will require separation of the FAR archives similar to the FAC archives, or if they can be handled as is. SandyGeorgia (Talk) 13:31, 7 January 2007 (UTC)

DRVbot

I need a bot to do some routine maintenance tasks for deletion review. Possible tasks would include:

  • Create daily and monthly log pages
  • Remove headers and archive a closed daily log
  • Add <noinclude> tags to the text after archiving.

Any help with these tasks is appreciated. ~ trialsanderrors 09:07, 7 January 2007 (UTC)

Abandoned Article bot

There is now a project dealing with articles which have not been modified or viewed recently at Wikipedia:WikiProject Abandoned Articles. Would there be any way to generate a bot which might list only articles which haven't been modified since, say, 2005 (or some other really long time, maybe by year), for the use of this project to help find the most overlooked articles? Badbilltucker 20:35, 6 January 2007 (UTC)

See Special:Ancientpages. —Mets501 (talk) 22:23, 6 January 2007 (UTC)
Ancientpages has two problems. First, it only lists the oldest 1000 articles (that is, 1000 articles with the oldest "most recent edit"). That, of course, is plenty to work on for any project. But the second problem is that at least 90% of the 1000 articles are disambiguation pages - not what members of the project are really interested in. An ideal bot would be able to screen out disambiguation pages from its results.
Alternatively, I guess, a special page (database listing) similar to ancientpages, but excluding disambiguation pages, would suffice. John Broughton | Talk 02:17, 7 January 2007 (UTC)

I'm in the process of importing a database dump and I'll gather these statistics for you. To be clear, you want a list of pages with the oldest most recent edit that are in the main namespace and are not disambiguation pages, correct? -- Jmax- 07:33, 7 January 2007 (UTC)

Correct, we'd like a list; statistics aren't needed. Ideally it would look like Special:Ancientpages, which does exactly what we need, except that it includes disambiguation pages, which we don't want, and turns out (see above) to be quite problematical. Thanks. John Broughton | Talk 20:37, 7 January 2007 (UTC)

replace "often times" and "oftentimes" with "often"

Can someone run a bot to replace all instances of "oftentimes" and "often times" with "often", which is shorter, means exactly the same thing, and doesn't sound so awkward in formal written English? I tried to start doing it manually, but there's too much of it.

You can find the instances via Google:

Obviously, perhaps ignoring all the talk and user namespaces might be a good idea.

Please. Paul Silverman 12:19, 7 January 2007 (UTC)

Hmm, this could be done fairly easily with AWB or pywikipedia, but personally I'm not sure it should be done with a fully automated bot for the same reason that misspellings shouldn't be automatically 'fixed' by a bot - quite often times come up where it would make no sense to change "often times" to "often"...like in that sentence! ^_^ I suggest using AWB and checking each edit yourself before confirming the change. ShakingSpirittalk 22:08, 7 January 2007 (UTC)

UserCheck-Bot

Shouldn't we delete the users who have not contributed to Wikipedia for a long period of time?--Jamesshin92 22:32, 7 January 2007 (UTC)

No, as per WP:USERNAME ShakingSpirittalk 22:34, 7 January 2007 (UTC)
No. Visit the Wikipedia:Usurpation page to support the policy. ST47Talk 00:45, 8 January 2007 (UTC)
or Wikipedia:Delete unused username after 90 days ST47Talk 00:45, 8 January 2007 (UTC)

Heading-Bot

There are many headings that do not follow Wikipedia:Manual of Style (headings) in Wikipedia articles. --Jamesshin92 22:14, 7 January 2007 (UTC)

And...? —Mets501 (talk) 22:38, 7 January 2007 (UTC)
I want to create a program to make sure that the proper "Capitalization" (did I spell it right?) is used for every heading in each article. It is faster for bots to do that job.--Jamesshin92 22:43, 7 January 2007 (UTC)
For instance, the headings "Train Overview" and "USA" should be changed. The bot would notice and report to the owner. The owner decides if the heading is valid under Wikipedia:Manual of Style. --Jamesshin92 22:51, 7 January 2007 (UTC)
Also, headings like "The Dodos" should be flagged by this bot by noticing the word "the" or "a" or "an" at the front. However, the alteration must be done manually, since there are exceptions like "The United States of America." Also, headings like "About Dodos" will be ignored although this does not follow Wikipedia:Manual of Style (it is like a bot correcting spelling).--Jamesshin92 22:59, 7 January 2007 (UTC)
"The Dodos" could be a pop group. "The Dodo" may be correctly capitalised, since we capitalise species names in some contexts. Rich Farmbrough, 15:26 13 January 2007 (GMT).
So, to simplify: the bot will look through every heading one by one and judge whether it follows WP:MSH. The user will make the alterations.
You might think that this bot is useless since the alteration is done by humans. However, clicking into long articles and scrolling down to search for headings is a waste of time for humans. This bot will do that part.--Jamesshin92 23:05, 7 January 2007 (UTC)
The job for this bot is straightforward and useful.--Jamesshin92 23:07, 7 January 2007 (UTC)
WP:AWB allows for the generation of a list of non-conformist headings from a database dump. ST47Talk 00:41, 8 January 2007 (UTC)
Capitalization of section headings is a problem, but it's not clear a bot can handle it well - I'd much rather see a popups approach where an editor makes the final decision. Consider this:
      === White House communications ===
or this:
      === U.S. Senator ===
Humans know that "White House" is where the U.S. President lives, and that "Senator" is a title and is capitalized. A bot doesn't. John Broughton | Talk 03:12, 8 January 2007 (UTC)

I agree. --Jamesshin92 04:02, 8 January 2007 (UTC)

However, I still think that we can still spread this idea in different approach. For example, detecting repeated heading and special characters such as (%+^@~ and such).

We also might think of changing headings to standard headings, such as "Also see" into "See Also," "Links" into "External Links," and so on. Jamesshin92 04:10, 8 January 2007 (UTC)

"See also" and "External links" please! Rich Farmbrough, 15:24 13 January 2007 (GMT).
Yes, that could be useful. John Broughton | Talk 17:23, 8 January 2007 (UTC)
AWB does some of this (partly retro-fitted from my list, I think), and I have done some of it with a largish reg-ex (see User:SmackBot). I also have a project under development to classify headings as "always change, never change, sometimes change or don't know", which will have intimate knowledge of commonly capitalised things like CD and DVD. Even so it will be very labour intensive. Rich Farmbrough, 15:24 13 January 2007 (GMT).
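
The two ideas in this thread (renaming a few well-known headings automatically, and only reporting capitalisation for a human to judge) can be sketched roughly as follows. This is a minimal illustration, not anyone's actual bot: the `STANDARD` table and both function names are hypothetical, and the capitalisation check deliberately changes nothing because headings such as "White House communications" need human judgment.

```python
import re

# A wiki heading line: two to six "=" signs, a title, then the same "=" signs.
HEADING = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)

# Hypothetical table of headings that are safe to rename outright.
STANDARD = {
    "also see": "See also",
    "see also": "See also",
    "links": "External links",
    "external links": "External links",
}

def fix_standard_headings(wikitext: str) -> str:
    """Rename only the headings listed in STANDARD; leave everything else."""
    def repl(m):
        title = m.group(2)
        fixed = STANDARD.get(title.lower())
        if fixed is None or fixed == title:
            return m.group(0)
        # Swap the title in place so the original spacing is preserved.
        return m.group(0).replace(title, fixed, 1)
    return HEADING.sub(repl, wikitext)

def flag_for_review(wikitext: str) -> list:
    """List headings with a capitalised word after the first one.

    Nothing is edited here: a person decides whether each one is a
    proper noun or a style problem.
    """
    flagged = []
    for m in HEADING.finditer(wikitext):
        words = m.group(2).split()
        if any(w[0].isupper() for w in words[1:]):
            flagged.append(m.group(2))
    return flagged
```

Running `flag_for_review` over a page produces a worklist rather than edits, which matches the popups-style approach suggested above.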

Invariable symbols bot

Unit symbols are invariable (unlike abbreviations), but there are nevertheless hundreds if not thousands of articles where "s" has been added improperly. A bot could fix this automatically, as the risk of confusion with correct English is about zero. Specifically:

  • Replace "cms" with "cm" (centimetre)
  • Replace "ins" with "in" (inch) <-- This one needs care (could be e.g. "ins and outs", "sit-ins", etc.)
  • Replace "kms" with "km" (kilometre)
  • Replace "mins" with "min" (minute)
  • Replace "mms" with "mm" (millimetre)
  • Replace "secs" with "s" (second)
  • Replace "yds" with "yd" (yard)

The sought strings should be case sensitive, and the bot should leave instances immediately followed by a period alone (they could be legitimate abbreviations).

Other cases than those listed above probably exist.

Urhixidur 18:55, 8 January 2007 (UTC)

This bot would have to be manually assisted. —Mets501 (talk) 20:24, 8 January 2007 (UTC)
Although it might be prudent to manually assist or at least review, I don't think it's necessary. If it only changes when it's all lower case, not followed by a period, and is preceded by a number, then I think the incidence of incorrect change would be 0. On a side note I'm lukewarm about changing secs to s though. Vicarious 05:11, 9 January 2007 (UTC)
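
The conditions discussed above (all lower case, preceded by a number, not followed by a period) can be expressed as a single regular expression. The sketch below is only an illustration under those assumptions: `UNIT_FIXES` and `fix_unit_symbols` are made-up names, and "ins" is left out because of the "ins and outs" problem noted in the list.

```python
import re

# Plural-looking unit symbols and their invariable forms, from the list above.
# "ins" is omitted on purpose: it collides with ordinary English words.
UNIT_FIXES = {
    "cms": "cm",
    "kms": "km",
    "mins": "min",
    "mms": "mm",
    "secs": "s",
    "yds": "yd",
}

# A number, an optional space, then one of the symbols, not followed by a
# period or another word character. No IGNORECASE flag, so matching is
# case sensitive.
PATTERN = re.compile(
    r"(\d(?:[\d.,]*\d)?\s?)(" + "|".join(UNIT_FIXES) + r")(?![\w.])"
)

def fix_unit_symbols(text: str) -> str:
    """Replace pluralised unit symbols that directly follow a number."""
    return PATTERN.sub(lambda m: m.group(1) + UNIT_FIXES[m.group(2)], text)
```

Dropping the `"secs"` entry from the table is enough to leave that case alone if changing "secs" to "s" turns out to be unwanted.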

Conversion bot

Would it be possible to create a bot that could automatically convert the U.S. Standard System of Measurement into the European Metric System of Measurement? I think a number of articles on here could benefit from such a bot if we do not already have one. Note that I know nothing about operating a bot, this is simply an idea of mine which I got while working on the article USS Wisconsin. —The preceding unsigned comment was added by TomStar81 (talkcontribs) 06:06, 9 January 2007 (UTC).

If I understand your idea correctly I think it's a good one. What I understand is you want to find all standalone imperial measurements and add a parenthesised metric measurement too, and vice versa, i.e. if you find "1 mile" replace it with "1 mile (1.6km)" (or whatever). The bot would have to be very careful/intelligent, but I think it can be done and is a good idea. - PocklingtonDan 08:16, 9 January 2007 (UTC)
Yeah, that's exactly what I have in mind. It seems to me that us Americans will never convert to the metric system, and the rest of the world is not going to change measurements to accommodate us, so I figure it would be best to build a bot that can automatically handle the conversions rather than pester people to consistently make the conversions themselves. TomStar81 (Talk) 08:23, 9 January 2007 (UTC)
I'm in favor of the idea. I have some ideas on how to make this bot so that it doesn't break anything. I'd be willing to write the pseudocode or review someone else's code on the bot. Vicarious 08:33, 9 January 2007 (UTC)
I really wouldn't go there with a bot. Bobblewik was doing just that, and judging from his block log, I don't think the community really liked that. —Mets501 (talk) 12:02, 9 January 2007 (UTC)
All that fuss seems to be to do with date unlinking, not this idea. If the bot is properly set up with intelligent enough rules this would be genuinely useful, I don't see what objections people could have to this bot??? - PocklingtonDan 12:34, 9 January 2007 (UTC)

(undent) It doesn't seem that controversial. Obviously some test runs, and starting slowly, would be appropriate. In general, I think Wikipedia needs more bots like this - human beings just aren't that consistent (much of which comes from not knowing everything in detail), and bots like this can compensate for that. John Broughton | Talk 15:30, 9 January 2007 (UTC)
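
As a rough illustration of the "1 mile" example above, the sketch below appends a parenthesised kilometre value to standalone mile measurements. It is an assumption-laden toy: `add_metric` is a hypothetical name, only miles are handled, one decimal place is an arbitrary choice, and numbers with thousands separators are skipped rather than risk a wrong conversion.

```python
import re

KM_PER_MILE = 1.609344  # exact, by definition of the international mile

# A plain number followed by "mile" or "miles", unless a parenthesis already
# follows (taken to mean a conversion is present). The lookbehind stops the
# pattern from matching the tail of a number such as "1,500".
MILES = re.compile(r"(?<![\d.,])\b(\d+(?:\.\d+)?) (miles?)\b(?! \()")

def add_metric(text: str) -> str:
    """Append a parenthesised kilometre value after each bare mile figure."""
    def repl(m):
        km = float(m.group(1)) * KM_PER_MILE
        return f"{m.group(0)} ({km:.1f} km)"
    return MILES.sub(repl, text)
```

Because measurements that already carry a parenthesis are left alone, running the function twice on the same text makes no further changes, which is one cheap way to keep a test run from compounding its own edits.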

Link Normalization

Bot request: Change article text of the form "[[foo bar|foo]] bar", "foo [[foo bar|bar]]", and "[[foo bar|foo]] [[foo bar|bar]]" to "[[foo bar]]". I call this link normalization. The bot should make one pass of the whole database every month or so.

These oddities exist due to disambiguation. The first editor writes "[[foo]] bar", a disambiguator uses a tool to replace "[[foo]]" with "[[foo bar|foo]]", leaving "[[foo bar|foo]] bar". Clearly "[[foo bar]]" is the better form. It makes the link more sensible if it covers both words. The tools could be altered, but there's more than one, they're already complex, and thousands of these links already exist. -- Randall Bart 07:29, 11 January 2007 (UTC)

Considered a waste of resources. I tried much the same thing, see my talk page, archive 5. ST47Talk 11:18, 11 January 2007 (UTC)
AWB can do this on general fixes. I wrote a reg-ex to do it before AWB had the capability, so if there are any omissions, I can share that. Rich Farmbrough, 15:30 13 January 2007 (GMT).
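
The three forms in the request can each be matched with a backreference-based regular expression. What follows is a sketch under simplifying assumptions, not the reg-ex mentioned above (which is not shown in this discussion): `normalize_links` is a hypothetical name, matching is exact and case sensitive, and the link target is assumed to contain no nested brackets.

```python
import re

PART = r"[^\[\]|]+"  # a run of characters with no brackets or pipes

# "[[foo bar|foo]] [[foo bar|bar]]": two links to one target, labels spell it.
BOTH = re.compile(
    r"\[\[((" + PART + r"?) (" + PART + r"))\|\2\]\] \[\[\1\|\3\]\]"
)
# "[[foo bar|foo]] bar": the rest of the target trails the link.
TRAILING = re.compile(
    r"\[\[((" + PART + r"?) (" + PART + r"))\|\2\]\] \3(?!\w)"
)
# "foo [[foo bar|bar]]": the start of the target precedes the link.
LEADING = re.compile(
    r"(?<!\w)(\w[^\[\]|]*?) \[\[\1 (" + PART + r")\|\2\]\]"
)

def normalize_links(text: str) -> str:
    """Collapse split links into a single [[target]] link."""
    text = BOTH.sub(r"[[\1]]", text)      # must run before the other two
    text = TRAILING.sub(r"[[\1]]", text)
    text = LEADING.sub(r"[[\1 \2]]", text)
    return text
```

For example, `normalize_links("[[foo bar|foo]] bar")` gives `[[foo bar]]`, while `[[foo bar|foo]] baz` is left untouched because the trailing word does not complete the target.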

hot bot action for test wiki

Can someone please go over to http://test.wikipedia.org and with a bot populate Category:Really big category with anything, it doesn't matter what. Just dump every page and every image into the category please to test how the category system works when it is pushed to its limit. Testing man 22:53, 4 January 2007 (UTC)

I started, but then noticed that even if I categorized every single page on the wiki, that only comes to around 600, and we have categories far larger than that already. I can't see you'd get a very useful stress-test when the wiki is so small ^_^ ShakingSpirittalk 06:30, 5 January 2007 (UTC)
Then let the bot create ten or twenty thousand new pages containing random junk and add them. NeonMerlin 20:12, 11 January 2007 (UTC)

Why is this necessary? -- Jmax- 06:56, 12 January 2007 (UTC)

Repeated article-link bot

Could a bot that detects and removes repeats of links to other articles be created? For example the article on Lions might have the word "Africa" in the first paragraph which is linked to an article about Africa and then further down the page there is another link where the word "Africa" appears. The bot could detect this and de-link the second occurrence of the word. —The preceding unsigned comment was added by Mutley (talkcontribs) 06:21, 8 January 2007 (UTC).

I believe WP:AWB can detect this, perhaps you can use it? ST47Talk 11:55, 8 January 2007 (UTC)
While I think the concept is a good idea, it's useful to a reader not to have to go way back (up) in an article to find a wikilink. I suggest that if the separation is more than 3000 characters (that's roughly 40 lines, i.e., almost a full page), then the second occurrence should not be dewikilinked. (Character count does not include characters inside templates, images, references, or hidden/commented text.) John Broughton | Talk 17:22, 8 January 2007 (UTC)
I agree with John, delink only if the separation is small. Also, it'd be useful if it could look out for links caused by templates too, for example template:main. Vicarious 05:14, 9 January 2007 (UTC)
Also, if the repeated instance is in the "see also" section of the article it should be removed completely (or left alone). Vicarious 09:01, 9 January 2007 (UTC)
Since this is a bot without judgment, I'd strongly recommend "left alone" for any apparently duplicate wikilinks where the second instance is in the "See also" section. As for templates, I also suggest leaving them alone, for the same reason - anything requiring judgment isn't a good thing for a bot to do. The goal here should be to absolutely minimize false positives (the bot doing something wrong); if that means it misses (say) 20% of the fixes it could make, that's fine, because it will still be fixing 80%. John Broughton | Talk 16:27, 12 January 2007 (UTC)

Hmm. This idea has been kicked out many times. I believe it could still work with care and attention. Rich Farmbrough, 15:28 13 January 2007 (GMT).
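
The rules proposed in this thread (delink a repeat only when an earlier link to the same article is within about 3000 characters, and leave the "See also" section alone) could be prototyped along these lines. This is a simplified sketch: `delink_repeats` is a hypothetical name, the distance is counted in raw wikitext characters rather than excluding templates and references as suggested above, and links produced by templates are not considered at all.

```python
import re

# [[Target]] or [[Target|label]]; links with several pipes (images) don't match.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")
SEE_ALSO = re.compile(r"^==\s*See also\s*==", re.IGNORECASE | re.MULTILINE)
MAX_GAP = 3000  # characters, the figure suggested in the discussion

def delink_repeats(wikitext: str, max_gap: int = MAX_GAP) -> str:
    """Unlink repeats that sit close to an earlier kept link to the same page."""
    cut = SEE_ALSO.search(wikitext)
    end = cut.start() if cut else len(wikitext)
    body, tail = wikitext[:end], wikitext[end:]  # tail is never touched

    last_kept = {}  # target -> offset of the most recent link left in place

    def repl(m):
        target = m.group(1).strip()
        if ":" in target:  # categories, images, interwiki: leave alone
            return m.group(0)
        label = m.group(2) or m.group(1)
        prev = last_kept.get(target)
        if prev is not None and m.start() - prev <= max_gap:
            return label  # a live link is nearby, so plain text is enough
        last_kept[target] = m.start()
        return m.group(0)

    return LINK.sub(repl, body) + tail
```

Only links that are kept update `last_kept`, so a reader is never more than `max_gap` characters past a working link to the same article.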

Punctuation Bot

Hey, how about a bot that will put all the commas, periods (all punctuation except semi-colons, in fact) inside quotation marks; it looks quite unprofessional to see articles written with punctuation outside quotations. - Unsigned comment added by User:165.82.156.110

Not quite sure what you are proposing. Perhaps you could provide a sample of correct and incorrect punctuation within the context of quotations? - PocklingtonDan 21:17, 20 December 2006 (UTC)
I think he's referring to punctuation within quotations, which is not really an English "rule", more of a matter of style. See Wikipedia:Manual of Style#Quotation marks. There's no real way to automate this, and there's no real reason to, in my opinion -- Jmax- 21:22, 20 December 2006 (UTC)
Does he mean just the trailing full stop/period? If so, then it should always go outside the closing quotation mark, but he seems to suggest you should never use punctuation within quotation marks, which I don't understand - quotation marks are used to quote somebody. If the words you are quoting would reasonably be punctuated when written in prose, then that punctuation is included, regardless of whether it is in quotation marks. The following is perfectly valid punctuation, regardless of style:
My friend said to me "The cat sat on the mat, then it bit me. I don't think it likes me".
- PocklingtonDan 21:34, 20 December 2006 (UTC)
And what about a single word in quotation marks, followed by a comma - the comma obviously shouldn't go inside the quotation marks. John Broughton | Talk 01:17, 28 December 2006 (UTC)
Wow. I was taught in school that the trailing punctuation should go inside the quotations. For example, 'That bastard at the movie said "shhh!" Can you believe it?' (even with the assumption that "shhh" wasn't being exclaimed.) Then again, maybe I wasn't paying good enough attention in class. ---J.S (T/C) 00:12, 29 December 2006 (UTC)
Having the punctuation always inside quotation marks is the American style; British English uses punctuation inside if it belongs to the quotation, and otherwise outside. (http://en.wikipedia.org/wiki/British_and_american_english_differences#Punctuation) Therefore I don't think this bot would be a good idea, since it only represents the American usage. —The preceding unsigned comment was added by CJHung (talkcontribs) 03:16, 1 January 2007 (UTC).
Agreed. Also, I think the British style makes more sense anyways. Builderman 04:09, 14 January 2007 (UTC)

Hello, we are finishing putting in place a new translation project.

There are two things I would greatly appreciate if it was done by a bot.

First, we had to make a small modification of the format of the translation pages which are used for every translation request. The task is : For every page in Category:Translation sub-pages version 1, this kind of change needs to be done.


Second, there are a lot of categories to initialize with a very simple wikicode, 7 for each language and they are 50 of them. All red links of the array on Wikipedia:Translation/*/Lang (except the first column of the array which has a different syntax) should be initialized with the syntax explained on this page.


Let me know if you need any further info.

Jmfayard 18:46, 6 January 2007 (UTC)

I think I can easily do your first request with AWB but I'm a bit unclear about exactly what you want to have done. If I look at the diff you provided, all you need done is one extra parameter added (|Version2) and the instructions to the end of the page? I made a sample diff what I think should be done. Let me know if that's correct, and I'll get working. About your second request, that's also possible, but that'll require some minor custom programming, and I can do that when I'm done with your first request. Jayden54 23:23, 13 January 2007 (UTC)

Automated (non-assisted) spelling bot

Moved to User talk:PocklingtonDan/Spelling bot. Mets501 (talk) 20:35, 13 January 2007 (UTC)

American television series by decade cleanup

Cleaning up from this category move: Wikipedia:Categories for deletion/Log/2006 December 19#American television series by decade where the meaning of the category was changed, there should be no overlap with Category:Anime by date of first release, because by the English definition no US originated-series that we know of is anime.

I'd like a bot to re-categorize with the following rule: If article in Category:Anime series and in Category:XXXXs American television series then remove from Category:XXXXs American television series and add to Category:Anime of the XXXXs instead. (The latter category includes both films and series.) --GunnarRene 05:37, 3 January 2007 (UTC)

If I wanted to run a bot of my own to do this, which one would be appropriate?--GunnarRene 21:01, 9 January 2007 (UTC)
I'd use WP:AWB. You'd need to do a WP:RFBA and create a bot account, and come up with a script to do it. I'd recommend the find and replace be set to:
Category:(....)s American television series
to
Category:Anime of the $1s
To generate a list, just use Category:Anime series, don't worry about the other, and tell it to skip if no replacement made. I'd do it, but I don't think I'm approved for that outside of WP:CFD, but the procedure is pretty easy. ST47Talk 21:56, 9 January 2007 (UTC)
Thanks for the info. Actually, this IS a CFD cleanup, so I'd say it's within your "jurisdiction". But I'm looking into it now. --GunnarRene 22:26, 13 January 2007 (UTC)
Aw. I don't have Windows.... --GunnarRene 22:26, 13 January 2007 (UTC)
Citation please for this definition given Kappa Mikey. --Damian Yerrick () 08:35, 15 January 2007 (UTC)
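
The find-and-replace suggested above for AWB translates directly into a regular-expression substitution. The sketch below is only an illustration of that rule: `recategorize` is a hypothetical name, the four-character wildcard `(....)` is tightened to four digits (an assumption about the decade category names), and Python's `\1` plays the role of `$1`.

```python
import re

# Matches [[Category:Anime series]] with or without a sort key.
ANIME = re.compile(r"\[\[Category:Anime series(\|[^\[\]]*)?\]\]")
# The decade category to be replaced, e.g. "1990s American television series".
DECADE = re.compile(r"Category:(\d{4})s American television series")

def recategorize(wikitext: str) -> str:
    """Move anime articles from the American decade category to the anime one."""
    if not ANIME.search(wikitext):
        return wikitext  # rule applies only to pages in Category:Anime series
    return DECADE.sub(r"Category:Anime of the \1s", wikitext)
```

Pages outside Category:Anime series come back unchanged, which corresponds to the "skip if no replacement made" setting described above.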

Historical data mining

I'm wanting to extract a list of regular text lines from the articles in the millennium, century, decade, and year series (Upper Paleolithic, 10th mil BC, 9th mil BC, 1690s BC, 499 BC, 1 BC, 1st mil AD, 3rd mil, 2066, 2100s, 30th century, and the other 3000 or so articles in between). I've already written Python code that successively downloads user-defined ranges (and throws out formatting lines), but something happened to the database or something this morning, so I want to change my code to grab everything (I think it's approximately 20 MB) from Special:Export, and save it to disk before performing parsing operations on it. I don't know how to interact with the Export function, so I guess this is less of a bot request and more of a bot help request. Xaxafrad 00:06, 15 January 2007 (UTC)

If you don't get any help here, you might want to try Wikipedia:Village pump (technical). John Broughton | Talk 01:41, 15 January 2007 (UTC)
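
For what it's worth, Special:Export can be driven with an ordinary form POST. The sketch below shows one way that might look using only the Python standard library. Treat the details as assumptions to verify against the MediaWiki documentation: the form fields `pages` (newline-separated titles), `curonly` and `action=submit` are the parameter names as I understand them, the URL is the English Wikipedia's, and the function names and `USER_AGENT` string are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"
# Hypothetical identifier; replace with something that names you and your tool.
USER_AGENT = "ExportSketch/0.1 (hypothetical; add contact details)"

def build_export_request(titles):
    """Build (but do not send) a POST request for the given page titles."""
    form = {
        "pages": "\n".join(titles),  # one title per line
        "curonly": "1",              # current revision only, no history
        "action": "submit",
    }
    data = urlencode(form).encode("utf-8")
    return Request(EXPORT_URL, data=data, headers={"User-Agent": USER_AGENT})

def save_export(titles, path):
    """Send the request and write the returned XML to disk for later parsing."""
    with urlopen(build_export_request(titles)) as response:
        with open(path, "wb") as out:
            out.write(response.read())
```

Something like `save_export(["499 BC", "1 BC"], "years.xml")` would then fetch both pages in one request; for thousands of titles it is probably kinder to the servers to send them in batches.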

Shouldn't we have a bot that translates articles from German, or maybe at least a German version of RamBot that uses German-speaking countries' (such as Austria or Germany) information on towns? Tell me what you think. Kitia 00:37, 15 January 2007 (UTC)

For a bot to translate German, it would have to use a machine translation. Machine translations are known for their...er...inaccuracy. Humans are far better at translating than machines. PTO 01:35, 15 January 2007 (UTC)
Yes, so you can use a source like Statistik Austria or something, I don't know. I just don't want to put a lot of manual labor into trying to translate articles on little-known (and well known) Austrian towns. Kitia 01:50, 15 January 2007 (UTC)
If the bot uses free translation software, then a user like you could use the same software. If the bot were to use commercial software, that would be a violation of the licensing terms.
A more serious problem is the matter of responsibility. Wikipedia doesn't get sued (so far) because editors add words to articles; as long as the Wikimedia Foundation responds promptly when notified of problems, it's pretty safe, legally. But running a bot that creates new articles without human review or intervention is asking for trouble - after all, the bot would be authorized (by Wikipedia processes) to do so, which (arguably) makes the Wikimedia Foundation responsible for its edits, not the creator of the bot. (What if an article about a town has a libelous paragraph regarding the mayor of the town?) John Broughton | 16:57, 15 January 2007 (UTC)
I'm just talking about a German/Austrian version of Rambot, that's all. I don't really know much about bots. Kitia 23:01, 15 January 2007 (UTC)
Rambot is a bot used primarily to add and maintain U.S. county and city articles. User:Rambot/translation says The rambot Article Translation Project aims to translate the rambot portions of the U.S. City and County articles into other languages to be added to other Wikipedia projects and The project is currently in the setup stage. So sure, a German/Austrian equivalent would be uncontroversial, though I'm not sure how easy it would be to set up. (Note that rambot isn't intended to translate entire articles, just rambot-controlled portions of articles.) User:John Broughton 01:43, 16 January 2007 (UTC)

Double redirect Bot?

Request for double redirect bot? Running every 30 seconds? --Parker007 13:07, 15 January 2007 (UTC)

There are already 1,000,000,000 double-redirect fix bots running. I don't think there is a need for another one. If the current ones aren't getting all the double redirects fixed, would be a lot easier to run a clone of someone else's bot than to write a new one to do the same thing - PocklingtonDan 15:21, 15 January 2007 (UTC)
Well then let's clone the bot, because I don't see any bots working on redirects today. --Parker007 15:43, 15 January 2007 (UTC)
The bots, such as mine, which fix double redirects do so from Special:DoubleRedirects. This used to update quite often, but now only updates around once a month. —Mets501 (talk) 23:27, 15 January 2007 (UTC)

Category size bot

Hi. I'm looking for a bot which would update this category summary for WP:ALBUM:

User:Bubba hotep/ALBUM

The figures are updated manually at present (using AWB for the large ones). The format itself was borrowed from Dragons flight's Category tracker so I presume it would be quite easy to do this... if you know what you are doing with bots... which I don't... so any help would be appreciated! Bubba hotep 13:58, 16 January 2007 (UTC)

A Bot for another wiki project?!!

hi.. I'm Glacious. I'm currently an admin on dv.wikipedia. there i want to make a Bot which could do various tasks. But i don't know how to make a Bot. So if any one can help me in my project, please leave me a message on my talk page soon. looking forward for a reply. Thanks... --Glacious 14:22, 16 January 2007 (UTC)

Punctuation and Grammar Bot Request

As I was going over and searching articles yesterday, it shocked me to see how many articles had incorrect spelling, grammar and punctuation. I was thinking about a bot that will put all the punctuation (commas, periods, semi-colons, etc.), grammar (correctly capitalised nouns, etc.) and spelling (commonly known spelling errors) into place on any article it may 'crawl' across. I would appreciate anyone willing to create a bot for me. Many thanks, Extranet (Talk | Contribs) 08:21, 14 January 2007 (UTC)

There is already an outstanding request for something of this type at [1]. I think you may find the discussion of the pros and cons, as well as the semantics of implementation, useful in defining a narrow task for said bot. Builderman 21:20, 14 January 2007 (UTC)
And if you look two sections above this one, there was discussion of a bot to fix spelling problems.
In general, it's not trivial to write a bot, and they have to be very narrowly focused in order to work well (and, quite frankly, to get approved in the first place.) It's helpful to have a bit of programming background in order to understand what is feasible and what is not. For example, it would be great to have an EditorBot that would recognize all the errors identified by Strunk and White, and fix them - but no such thing exists, or could possibly exist (and if it did, Microsoft would pay a small fortune to incorporate it into their next version of Microsoft Word.) John Broughton | Talk 02:12, 15 January 2007 (UTC)

So you're saying there isn't really a need for a Punctuation and Grammar Bot? If not, why not suggest a few 'really needed' bots because I have always wanted to run a bot throughout my time here at Wikipedia. --Extranet (Talk | Contribs) 03:24, 15 January 2007 (UTC)

No, I'm saying that (a) be specific and (b) understand the limits of what computers can and can't easily do. Again, I'd love a StrunkAndWhiteEditorBot to fix articles, but it ain't going to happen, no matter how hard or long I hope for it. John Broughton | 17:00, 15 January 2007 (UTC)

Thanks for your comments. I will close this request for now but if anyone has a bot that needs a new owner or has a new bot regarding Punctuation or Grammar editing, please leave a message on my talk page. Thanks! --Extranet (Talk | Contribs) 08:25, 17 January 2007 (UTC)

ref bot

I was thinking how convenient it'd be for a bot to turn all the large amounts of web citations we have into the more formal reference tags. More specifically, it would turn this: [www.wikipedia.org] into:

<ref>{{cite web |url=www.wikipedia.org |title=Main Page - Wikipedia, the free encyclopedia |accessdate=2007-01-16}}</ref>

Note, it would assume the date the link was added was the access date, a reasonable assumption. Vicarious 08:10, 17 January 2007 (UTC)

A better example, please. First, it's against policy to cite Wikipedia as a reference/footnote. Second, any internal link (in the body of an article) should be wikified, such as (in this case) Main Page, rather than being a URL. John Broughton | ♫♫ 16:06, 17 January 2007 (UTC)
I think he was just using wikipedia.org as a neutral example.
What the bot would need to do is visit the website, grab the title and then format it into the Cite Web template. Not sure how easy that would be. ---J.S (T/C/WRE) 16:32, 17 January 2007 (UTC)
Yeah, it would have difficulty getting the title from an external website. However, if the citation was something like [www.espn.com ESPN], then it might have a better shot.--NMajdantalk 17:59, 17 January 2007 (UTC)
Fetching the page's title should be easy for a bot. It would visit the external link, then look in the page's source for the part inside the <title></title> Vicarious 21:46, 17 January 2007 (UTC)

I can see such a bot being extremely valuable. A couple of thoughts: It could go to Special:Linksearch and get a list of all external links for a specific domain (e.g., CNN). It should understand that when a URL appears twice in a page, there should not be two identical footnotes (so, the first ref/footnote gets a name="X" argument). It probably should avoid pages that have both a "Notes" and a "References" section. It probably shouldn't footnote a URL if the URL is in the "External links" section. And, obviously, it should put as much information as possible into the "cite" template. John Broughton | ♫♫ 22:47, 17 January 2007 (UTC)
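
The title-fetching step described in this thread (look in the page source for the part inside the title tags, then fill in the cite web template) might be sketched as below. It is a minimal illustration that works on HTML already downloaded: `TitleParser`, `title_from_html` and `make_ref` are hypothetical names, fetching the page and working out the access date are left out, and the template fields are just the three shown in the example at the top of this section.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def title_from_html(html):
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    title = " ".join("".join(parser.parts).split())
    return title or None

def make_ref(url, title, accessdate):
    """Format a footnote in the style of the example above."""
    safe_title = title.replace("|", "&#124;")  # a bare pipe would split the field
    return ("<ref>{{cite web |url=" + url + " |title=" + safe_title
            + " |accessdate=" + accessdate + "}}</ref>")
```

When no title can be found, `title_from_html` returns `None`, and a cautious bot would probably skip that link rather than guess.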

Default sort key

A bot to change the default sort key for all people to {{DEFAULTSORT:''lastname'', ''firstname''}}. Would make sorting categories much easier. Λυδαcιτγ 22:11, 17 January 2007 (UTC)

And also delete piped sorting keys that become redundant after {{DEFAULTSORT}} is changed.Λυδαcιτγ 22:13, 17 January 2007 (UTC)
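
A naive version of this request could look like the sketch below. It assumes every person's article title has the form "Firstname Lastname" with an optional parenthetical disambiguator, which is often untrue (single names, "Jr.", "van der ...", names written family name first), so the output would need human review. The function names are hypothetical.

```python
import re

def default_sort_key(title):
    """Guess a 'Lastname, Firstname' key from an article title, or None."""
    name = re.sub(r"\s*\(.*\)$", "", title)  # drop e.g. " (politician)"
    parts = name.split()
    if len(parts) < 2:
        return None
    return parts[-1] + ", " + " ".join(parts[:-1])

def apply_defaultsort(wikitext, title):
    """Insert {{DEFAULTSORT}} before the first category and drop redundant keys."""
    key = default_sort_key(title)
    if key is None or "{{DEFAULTSORT:" in wikitext:
        return wikitext
    # Piped keys identical to the new default are now redundant.
    cleaned = re.sub(
        r"\[\[(Category:[^\[\]|]+)\|" + re.escape(key) + r"\]\]",
        r"[[\1]]",
        wikitext,
    )
    pos = cleaned.find("[[Category:")
    if pos == -1:
        return wikitext  # no categories, nothing to sort
    return cleaned[:pos] + "{{DEFAULTSORT:" + key + "}}\n" + cleaned[pos:]
```

Piped keys that differ from the computed default are kept as they are, since they may be deliberate.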

Data-mining please....

WikiProject Spam would like a list of all stub-templates with external links, with external links pointing to wikipedia.org filtered out. We've seen quite a bit of Google bombing on stub templates and a list like this would be quite helpful doing an initial cleanup. ---J.S (T/C/WRE) 22:26, 13 January 2007 (UTC)

I'll see what I can do for you. ST47Talk 14:51, 14 January 2007 (UTC)
Ok, I have a list, can I make a category such as Category:Stubs suspected of containing spam links, inside noincludes, as that's the best way to get AWB to do this. ST47Talk 16:30, 15 January 2007 (UTC)
If you want, you can just send me a list of the article titles. A temp category works too. Either way. ---J.S (T/C/WRE) 16:50, 15 January 2007 (UTC)
Still running, 600 checked, 3 hits, all of them were false alarms. ST47Talk 21:02, 18 January 2007 (UTC)
None in 600? That's good news. How long does a search like this take to finish? Are you botting it or are you using AWB? ---J.S (T/C/WRE) 22:28, 18 January 2007 (UTC)

Presidential candidate protection bot

Please have a look at the discussion going on at Talk:Barack Obama#Consensus on IP edits and let us know there if you can think of ways a bot might help. My first thought is a bot that would detect vandalism, revert, then apply semi-protection for a limited period, automatically toggling it off after a defined interval. --HailFire 20:37, 17 January 2007 (UTC)

Applying semi-protection requires administrator rights. No bot to date has been granted such rights, and it's extremely unlikely to be granted except in extraordinarily narrowly defined circumstances.
More generally, the page you cite isn't unique in being vandalized, and so you're really asking for a general-purpose vandal-fighting bot. The problem, of course, is recognizing what is vandalism and what is not. For example, a few days ago I deleted most of a talk section that violated WP:NPA. It would be hard for a bot to recognize that I wasn't vandalizing that page.
Requests for page protection should go to WP:RPP. My sense is that administrators are pretty quick to respond to such requests, both to initiate and remove semi- or full protection, as appropriate. John Broughton | ♫♫ 22:40, 17 January 2007 (UTC)
Agree that equipping a bot with Administrator rights may not be such a wise move. Thinking further, how about a user activated bot that reverts all subsequent IP edits, then shuts itself down at a random time between 5 minutes and one hour? Apart from whether this is a good idea or not, would this type of bot present any significant technical challenges and/or major policy problems for Wikipedia? --HailFire 06:59, 18 January 2007 (UTC)
Well, it would certainly make a difference in an edit war between a registered user (like you) and an anon IP address - just invoke the bot, and don't worry about WP:3RR violations. It might also be rather frustrating to an anonymous IP editor who actually did have something useful to add to the article, added it, saw the change take place, and then saw a bot mindlessly revert it (with what edit summary - "Sorry, we didn't tell you, but I'm here to revert all anonymous IP edits, no matter how good or bad"?).
In short, I really think you need to make a case that WP:RPP isn't working well, and that handing such functionality to regular editors would result in a net benefit. Without your laying out such a case, I doubt that anyone would seriously consider changing the present system. John Broughton | ♫♫ 15:39, 18 January 2007 (UTC)

reference cleanup job

Anyone with AWB and some regex skill: I was thinking a bot could clean up the kind of problem you see at Super Bowl IV#References. Basically, numbered inline citations (usually of the new style nowadays) are mixed with general bulleted references. It produces an awkward-looking format, and really should be split into inline cited "notes" and more general "references". I haven't figured out how to do this properly, but it shouldn't be that hard for someone more skilled with AWB than me. --W.marsh 18:18, 18 January 2007 (UTC)

Tracking down sneaky spam

A recent sneaky spammer was caught and community banned. The spammer owned a dozen+ domains and was replacing references with references to his sites; he created a dozen or so articles with the only citation being to his site, and he added his address to many External Link sections. Because of the way he obfuscated his activity, it took 8 months for anyone to notice.

So... I thought up a dream bot that would be able to track this crap down.

  • Step 1: Collect every external link in article space on Wikipedia. (I think MediaWiki already does that?)
    • Step 1a: Eliminate links on a white-list (nytimes.com, google.com, etc)
    • Step 1b: Eliminate links on the meta blacklist (send them to bl_links.txt)
    • Step 1c: Condense list (combine links to the same subdomain.domain.tld and ++ a counter)
  • Step 2: Check each link
    • Step 2a: If link is dead, add it to deadlist.txt
    • Step 2b: Check to see if the page has advertising (if not, remove it from list)
    • Step 2c: (Check HTML Comment)
    • Step 2d: Recheck deadlist.txt once, send any that are now responding to step 2b and any that are not to deadlist2.txt
  • Step 3: Analysis
  • Step 4: Report
    • Report some simple stats from step 1 & 2 and the list created by 3a.
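Steps 1a to 1c of the outline above could be sketched roughly as follows. This is only an illustration under assumptions: the whitelist and blacklist entries, the function names, and the idea of matching subdomains against a listed domain are all made up for the example and are not part of the original proposal.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical lists for illustration; real ones would be far longer.
WHITELIST = {"nytimes.com", "google.com"}
BLACKLIST = {"spam.example"}


def host_of(url):
    """Return the lower-cased host (subdomain.domain.tld) of a URL."""
    return (urlsplit(url).hostname or "").lower()


def on_list(host, domains):
    """True if host is a listed domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def condense(urls):
    """Steps 1a-1c: skip whitelisted links, set aside blacklisted
    ones, and count the remaining links per host."""
    counts = Counter()
    blacklisted = []
    for url in urls:
        host = host_of(url)
        if not host or on_list(host, WHITELIST):
            continue  # step 1a: trusted site, nothing to check
        if on_list(host, BLACKLIST):
            blacklisted.append(url)  # step 1b: would go to bl_links.txt
            continue
        counts[host] += 1  # step 1c: one counter per host
    return counts, blacklisted
```

Counting per host before step 2 means each site only needs to be fetched once, however many articles link to it.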

Assuming we end up with a list of 1 million links and spend 5 seconds on each one then it will take slightly less than 2 months to finish running the program. I guess that sorta makes my request a little silly. :( Is there any better way to track this down with a bot? ---J.S (T/C/WRE) 22:52, 18 January 2007 (UTC)
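The run-time estimate above can be checked with a few lines of arithmetic. The `workers` parameter is an added assumption (the estimate in the post is for checking links one at a time); it only shows how the figure would scale if several links were checked in parallel.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_needed(links, seconds_per_link, workers=1):
    """Days to check every link, with `workers` links checked at once."""
    return links * seconds_per_link / (workers * SECONDS_PER_DAY)


# 1 million links at 5 seconds each, one at a time: about 57.9 days.
print(round(days_needed(1_000_000, 5), 1))
```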

I think there is at least one bot working its way through to approval that addresses the dead link issue. (ShakinBot?) For that bot, I suggested adding a tag directly on the article page immediately after the dead link, so that any editor reading the article is aware of that specific bad link problem. Perhaps a similar approach could be taken with spam:
  • Check the recent changes stream, rather than scanning all 1.5 million articles, and look at each new external link.
  • Don't bother checking the link if it's on a list of acceptable sites (.gov domain, .edu domain, washingtonpost.com, etc.)
  • Otherwise, see if the site has advertising
  • If it does, put a tag on the external link that says "Possible spam". The tag should also have a link to WP:SPAM. A 6 or 12 or 24-hour delay in tagging might be good, in order not to interfere with the RC patrol, with the bot rechecking after the delay to see if the link is still there.
This approach has the advantage of enlisting any editor looking at an article as a spam-fighter, so we don't need to try to enlist a separate group of people to do this very mundane chore.
And, I suppose, the bot might also throw the suspect links into a database where their continued existence (say, a week later) could be checked, and if they are still there, added to a "top 100 possible problems" list that would be reviewed by editors (say, Wikipedia:WikiProject Spam). That would also help address the problem of the spammer returning to delete tags. -- John Broughton | (♫♫) 17:15, 19 January 2007 (UTC)


That would be exceptionally useful, but it would also be a huge undertaking. Incidentally, we now have a bot that creates a daily summary of the IRC feed. ---J.S (T/C/WRE) 08:08, 20 January 2007 (UTC)

Twelwe --> twelve

I noticed that there were a fairly large number of articles where the spelling "twelwe" was used instead of "twelve". Would anyone fancy taking this on? MoRsE 11:18, 19 January 2007 (UTC)

I only found 9 usages, and have fixed them. —Mets501 (talk) 04:53, 20 January 2007 (UTC)
Thank you MoRsE 07:57, 20 January 2007 (UTC)

Bot Request for the Non-canon Star Trek Wiki

I am an admin at the Non-canon Star Trek Wiki and we are preparing to have our namespace changed to Memory Beta. Unfortunately, a large number of articles have already been moved into that namespace and we have been told that these articles must be moved back so that the namespace can be changed, or we lose all our content.

Because of this, I wondered if anyone had a bot or could create a bot that would be able to move all the articles starting with Memory Beta: to now begin with Non-canon Star Trek Wiki:, as well as delete the redirects between those pages. All of the articles that begin with the namespace can be found here.

Any help would be greatly appreciated, Thanks. --The Doctor 20:43, 20 January 2007 (UTC)

You might look into the pywikipedia framework; it's on SourceForge. That is what a lot of Wikipedia bot frameworks are based on. Betacommand (talkcontribsBot) 21:46, 20 January 2007 (UTC)

Project banner tasks for Wikipedia:WikiProject hip hop request

I was wondering if it is possible for a bot to:

  • Run through all the talk pages of articles linking to this redirect Template:WikiProject Hip Hop, while replacing every instance of {{WikiProject hip hop}} with {{WikiProject hip hop|class=|importance=}}. This template redirect was the original template page but it got moved due to naming concerns (specifically whether "Hip Hop" should be in capitals).
  • Run through Category:Hip hop including all subcategories, and for every article listed in all these categories, tag the top of the talk page with {{WikiProject hip hop|class=|importance=}}.

Also, a lot of these pages are already tagged, so the bot would have to recognize whether the page is already tagged properly, and not erase the code if it was rated. Is there any existing bot able to do these tasks? Thank you in advance. — Tutmosis 21:31, 16 January 2007 (UTC)
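The per-page logic described in this request might look something like the sketch below. It is an assumption-laden illustration rather than any existing bot's code: the function name and regular expression are invented for the example, it works on talk-page wikitext passed in as a string, and it does not fetch or save pages.

```python
import re

BANNER = "{{WikiProject hip hop|class=|importance=}}"

# Matches the banner under either capitalisation, with or without
# parameters. Group 1 holds the parameters, if there are any.
BANNER_RE = re.compile(
    r"\{\{\s*WikiProject [Hh]ip [Hh]op\s*(\|[^{}]*)?\}\}"
)


def tag_talk_page(text):
    """Return the talk-page text with the project banner in place."""
    match = BANNER_RE.search(text)
    if match is None:
        # Not tagged yet: add the banner at the top of the page.
        return BANNER + "\n" + text
    if match.group(1):
        # Already has parameters (possibly a rating): leave it alone.
        return text
    # Bare banner: swap in the version with empty rating parameters.
    return text[:match.start()] + BANNER + text[match.end():]
```

A page that already carries a rating, such as `{{WikiProject Hip Hop|class=B|importance=High}}`, comes back unchanged, which is the "don't erase the code if it was rated" condition.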

can do and will get on this within a few days Betacommand (talkcontribsBot) 05:36, 17 January 2007 (UTC)
Alright, thank you very much Betacommand. Best regards, — Tutmosis 18:40, 18 January 2007 (UTC)
That would be an idea for WP:WPGC. -- Punk Boi 8 06:57, 23 January 2007 (UTC)

WikiProject MMO

I'm back! I'm afraid that I need some help from a bot for WikiProject MMO's banners again. We are starting up an assessment system and I thought a good way to start off would be to take ratings from WikiProject CVG's banners. Most all MMO articles have both {{WP MMOG}} (WikiProject MMO's banner) and {{Cvgproj}} (WikiProject CVG's banner) on their talk pages. I was hoping that a bot could take what WikiProject CVG assessed the article as and copy it (being {{WP MMOG|class=?}}). The articles that would be assessed are in Category:Massively multiplayer online games as well as its sub-categories. I would prefer that you use this category over the project category as to hopefully get more articles with our banner on it. TIA! Greeves (talk contribs) 02:38, 23 January 2007 (UTC)

Sure, that would not be a problem. Alphachimpbot is on it. alphachimp 05:17, 23 January 2007 (UTC)

"Police Bot"

I would like a bot to tell me off when I edit project namespace. -- Punk Boi 8 06:56, 23 January 2007 (UTC)

Your post is incredibly ironic.... alphachimp 06:59, 23 January 2007 (UTC)
It is... lol... I think... hmm one of the JavaScripters would do you better than a bot. It's theoretically possible to program something like that, but I doubt you could find someone to do it. ---J.S (T/C/WRE) 07:59, 23 January 2007 (UTC)

Please ignore this request - I have explained to the user how to recognize which space they are in based on the page title. --Trödel 15:11, 23 January 2007 (UTC)

subst: nft and fc

The {{fc}} and {{nft}} templates are designed for ease-of-editing, and according to the talk page, they should be subst: when added to the page. At one point, I think a bot had run through them, and for a while I had manually kept nft under control. Now, it's out of control again (350+ on nft page links and 2000+ on fc page links). I'd like to request a bot to go through and subst: all of the fc and nft occurrences. Thanks!! Neier 02:22, 23 January 2007 (UTC)

No problem, getting to it with MetsBot. —Mets501 (talk) 02:24, 23 January 2007 (UTC)
  Done —Mets501 (talk) 03:28, 24 January 2007 (UTC)

Update image descriptions each time image starts or stops being used in an article

It would be rather useful if the description page for images included details of when an image starts and stops being used in articles. Often it's quite hard to identify exactly when an image was added or removed (and by whom) from an article, and this bot would enable people to find out very easily by viewing the image discussion page. I suspect a bot may not be the best way to implement this though, as it would probably involve a substantial number of edits in order to achieve this - maybe this functionality would be better implemented within mediawiki itself...? --Rebroad 19:38, 23 January 2007 (UTC)

Definitely. It should be at bugzilla, not a request for a bot. —Mets501 (talk) 03:30, 24 January 2007 (UTC)
Not practical to do with a bot. The only way to catch every case where an image is added or removed is to have the bot inspect every edit to an article or template -- and Wikipedia gets between 100 and 500 edits a minute. --Carnildo 04:53, 24 January 2007 (UTC)

Orphaned fair use bot

I'd like to see a bot that can scan the contents of Category:Fair use images and all its subcategories, identify images that are older than seven days and are not used in any articles, tag them with {{Orphaned fairuse not replaced}}, and notify the uploader with {{Orphaned}}. It would also be nice if the bot could identify FU images that are used outside of article namespace (such as on userpages or templates) and log those so an editor could later go and remove them and notify the user. There are hundreds of thousands of images in these categories (31,000+ in the main category itself, and the template that places them in the main category is outdated, meaning there should be zero in the main category). It would be really nice to have a bot to search and tag orphaned FU images, as this is now done manually. I believe there used to be a bot that did this but I can't find it now.--NMajdantalk 17:11, 17 January 2007 (UTC)

You can ask Carnildo to expand OrphanBot's contributions to cover orphaned fair use. That would probably be the easiest way. —Mets501 (talk) 12:07, 18 January 2007 (UTC)
On the OrphanBot page, it says the bots that should tag orphaned fair use images are Roomba and Fritzbot. Roomba hasn't had any activity since April 2006 and Fritzbot since November 2006. Anybody know the status of these two bots?--NMajdantalk 15:19, 18 January 2007 (UTC)
See Wikipedia:Bots/Requests_for_approval#BJBot, me and MECU have been working on it for a few days and the code is done. BJTalk 17:42, 25 January 2007 (UTC)
Also, BJBot is not going at the FU in userspace problem just yet but it is planned (the bot is going to use a cross checked list so none of the images will even be used). BJTalk 17:58, 25 January 2007 (UTC)

Changing links when a site moves

I am fairly new to Wikipedia so I have yet to find what to do in this case. There is a large number of links to the FCHD, a database of football (soccer) clubs and statistics. That site has changed its address and I assume the old links will be dead at some point. It is simply a matter of changing the domain from www.fchd.btinternet.co.uk to www.fchd.info, the rest of the URL remains the same. ByteofKnowledge 16:23, 25 January 2007 (UTC)
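Since only the domain changes and the rest of each URL stays the same, the fix amounts to a plain string replacement on the page text. A minimal sketch, assuming the wikitext has already been fetched as a string (the function name and the sample path in the usage line are made up for illustration):

```python
OLD_HOST = "www.fchd.btinternet.co.uk"
NEW_HOST = "www.fchd.info"


def update_fchd_links(wikitext):
    """Point FCHD links at the new domain, keeping the path as-is."""
    return wikitext.replace(OLD_HOST, NEW_HOST)


# Hypothetical example link; the path is not taken from a real article.
print(update_fchd_links("[http://www.fchd.btinternet.co.uk/CLUB.HTM FCHD]"))
```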

  Done —Mets501 (talk) 17:09, 25 January 2007 (UTC)

Recategorise political parties to child categories

This Discussion has moved to User:Betacommand/Bot Tasks Betacommand (talkcontribsBot) 15:54, 26 January 2007 (UTC)

WikiProjects request

This Discussion has moved to User:Betacommand/Bot Tasks Betacommand (talkcontribsBot) 15:54, 26 January 2007 (UTC)

Interwiki image bot?

I run across a lot of images sourced from the "English Wikipedia" in my admin work on commons. These are tagged on commons as PD but on en they are fair use. As these are being uploaded for other projects (like es.wiki) the en page doesn't get {{NCT}} tagged so en admins are not aware of the bad copy, and Commons admins may well miss the copyvio image for some time. Would it be feasible to make a bot to check uploads on Commons for images sourced to en.wikipedia (or synonyms) and check that the commons image is under the same license as the en source image? ({{fairuse}} on commons redirects to copyvio, the other fair use templates could be too if need arose).--Nilfanion (talk) 18:49, 21 January 2007 (UTC)

I'm not completely clear on what you want a bot to do. Let me try and break it down:
  1. Check each upload on Commons
  2. Check if image exists on en.wikipedia
  3. If it does, check if it has the same license
  4. If not, then do what...?
Something like that? If so, I can create a bot that does that, although I would have to go through the Commons bot approval process, instead of here. Cheers, Jayden54 15:58, 27 January 2007 (UTC)

Redirect-Creation Bot

I'm currently creating redirects for all the permutations of the two-hybrid screening title. It covers E. coli and yeast (specifically known as Saccharomyces cerevisiae) two hybrid screening, but the combinations of words I need to put into titles are: (E. coli/Escherichia coli/bacterial/bacteria/yeast/Saccharomyces cerevisiae/S. cerevisiae) (two hybrid/two-hybrid) (*nothing*/test/screen/screening/method/analysis). In total, that's 7x2x6 = 84 possible permutations! Even with clever use of the back button and copy and paste, that's going to be a drawn-out, laborious task to do by hand. Could someone please write a bot for redirect making where you enter the variables and then it goes to work? It should probably be operated only by the creator as per requests on the bot's talk page, to prevent abuse. Also, it would need to return an error/fail silently when one of the permutations already exists, rather than overwrite, in case that permutation contained a full separate article or a redirect to another article. --Seans Potato Business 22:41, 18 January 2007 (UTC)

Why is my very important request not receiving any attention? :( --Seans Potato Business 09:41, 27 January 2007 (UTC)
I'm heading off to sleep, but when I wake up I will have a look into it. BJTalk 09:48, 27 January 2007 (UTC)
I can do this with Microsoft Excel and AWB :D ST47Talk 12:38, 27 January 2007 (UTC)
Where are we redirecting to? ST47Talk 12:38, 27 January 2007 (UTC)
Redirect to Two-hybrid screening! Thanks! I have OpenOffice. Don't know what AWB stands for. Is there some way I can do this redirect trick myself? Ideally, it'd talk to Wikipedia directly, rather than having to copy and paste permutations from a spreadsheet into the search box, click 'create page' and manually create a re-direct... let me know how it works out, 'cause I've created a lot of redirects in my time, sitting thinking what possible ways someone might type a subject in. Thank you!! --Seans Potato Business 05:12, 28 January 2007 (UTC)
AWB is AutoWikiBrowser (WP:AWB). BJTalk 05:22, 28 January 2007 (UTC)
Thanks! Now, how do I get all the permutations using a spreadsheet and then how do I use these to make redirects with AWB? --Seans Potato Business 05:22, 29 January 2007 (UTC)
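As an alternative to a spreadsheet, the title combinations asked about here can be generated with a short script. This is only a sketch: the word lists and the redirect target come from the request in this thread (with the species abbreviation spelled S. cerevisiae), while the function name and the choice to capitalise the first letter of each title are assumptions. It only builds the list of titles; it does not create pages or check whether a title already exists.

```python
from itertools import product

ORGANISMS = [
    "E. coli", "Escherichia coli", "bacterial", "bacteria",
    "yeast", "Saccharomyces cerevisiae", "S. cerevisiae",
]
HYBRID = ["two hybrid", "two-hybrid"]
SUFFIXES = ["", "test", "screen", "screening", "method", "analysis"]

# Text each redirect page would contain.
REDIRECT_TEXT = "#REDIRECT [[Two-hybrid screening]]"


def redirect_titles():
    """Build every organism / hybrid / suffix combination as a title."""
    titles = []
    for parts in product(ORGANISMS, HYBRID, SUFFIXES):
        # Skip the empty suffix so titles have no trailing space.
        title = " ".join(part for part in parts if part)
        titles.append(title[0].upper() + title[1:])
    return titles


titles = redirect_titles()
print(len(titles))  # 7 x 2 x 6 = 84
```

The resulting list can be saved one title per line and loaded into whatever tool ends up creating the pages.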

WikiProject banners

Could somebody with a bot add {{WikiProject GeorgiaUS}} to the talk pages of all of the articles in Category:Georgia (U.S. state)? Thanks in advance, PTO 20:09, 27 January 2007 (UTC)

including subcategories? ST47Talk 20:28, 27 January 2007 (UTC)
Yes, if that isn't too much of a problem. (Some of the articles may already have banners on them because of other users. Will that be a problem?) PTO 02:51, 28 January 2007 (UTC)
Ok, no problem, I'm running now, check User:STBot's contribs to make sure it is good. ST47Talk 12:18, 28 January 2007 (UTC)
Sigh. Suspended, see my user talk and please take a look at the link there so I know what to do. ST47Talk 17:56, 28 January 2007 (UTC)
Hmm...How did articles on Hurricane Katrina get into the Georgia category? Hurricane Katrina didn't hit Georgia. Anyway, I'll do a sweep of the subcats and look for strange entries such as the one that triggered this. (Damn, this isn't good.) PTO 18:01, 28 January 2007 (UTC)
Somebody added Category:Hurricane Katrina as a subcat of Category:Georgia hurricanes, which probably wasn't too good an idea. It would be a good idea to ignore Category:Companies based in Georgia (U.S. state) to avoid annoying those who maintain those articles. Those two categories are the only categories that have subcats that stray from the original topic. Please accept my sincerest apologies; I should have checked the categories before I made the request. PTO 18:10, 28 January 2007 (UTC)
NP, removed those and I will regenerate a list ST47Talk 18:25, 28 January 2007 (UTC)