User talk:ClueBot Commons/Archives/2011/August


The problem with false positives

No doubt this bot does a good job, overall. However, there is an argument that the false positives that it produces outweigh the good it does. In the last 24 hours I have had two false positives (and reported them) and have received no apology or explanation from the bot operators. The first edit was this; and I received a daunting warning; now how any properly programmed bot could consider this as vandalism escapes me. Now, I am a person of reasonable firmness but friends of mine who have received warnings from this bot will never edit Wikipedia again. Why do users who have been targeted by this bot get no explanation when they report false positives? Either the programming needs radical improvement or the role of this bot needs reconsideration. The Whispering Wind (talk) 22:35, 31 July 2011 (UTC)

Edits such as the example provided are most likely reverted due to a lack of information in the ANN. All reports are manually reviewed as there are a lot of edits submitted which are clear vandalism. If the edit is in fact a false positive it gets sent to the review interface where it is further reviewed and a classification determined. Once this is complete the report interface gets updated and the data is used to train the ANN. As for the warning - the template is standard for warning users of vandalism, there is also a very clear message that the bot /may/ have made a mistake and if so the edit should be reported and the warning removed from the talk page. If the user is un-able to read the warning given then the bot cannot do anything to resolve that. - DamianZaremba (talkcontribs) 22:41, 31 July 2011 (UTC)
Sorry; this doesn't really help inexperienced users such as me. I have no knowledge of what an 'ANN' is. What I know is that I now have in my user page history two vandalism warnings that could come back to haunt me. Statements such as "If the user is un-able to read the warning given then the bot cannot do anything to resolve that" are frankly missing the point. Many inexperienced users simply will be too put out to work their way through the verbiage of the warning. Wikipedia's problem is that it is fine for experienced users but newbies get all too readily bitten. The Whispering Wind (talk) 22:56, 31 July 2011 (UTC)
Hello. The warning delivered was a result from the backlog of false positive reports. That has reduced the bot's accuracy at catching vandalism. Know that there was no soul behind the act of giving the warning to you - just remove the warning and redo your edit. We'll take care of the backlog eventually. --Σ talkcontribs 23:00, 31 July 2011 (UTC)
It just gave me a false positive in a sense. I have a non-static IP address and it gave me a warning when my IP switched. Not sure how this should be handled. — Preceding unsigned comment added by 71.255.89.65 (talk) 04:22, 1 August 2011 (UTC)

Giving the ANN score

Is giving the ANN score to vandals a good idea? The more vandals know how ClueBot works, the less likely it is to be effective over time. I realize that probably most vandals have no clue what the ANN score is (I assume I do, from my reading of the doc: it's a confidence indicator). It's good for editors in the edit summary, but putting it on the talk page of the vandal I'm not so sure about. Just wanted to throw that out there. Not here to create any controversy :) --TimL (talk) 22:36, 31 July 2011 (UTC)

I'm not 100% sure why it is included other than for making it really easy to see how bad the vandalism looks from the bot's point of view. As for not including it, you can actually get the data from multiple other sources such as the report and review interfaces (and their APIs), so a determined vandal would have an easy job finding it. As for the actual exposure I don't think there is any real risk - the way the ANN works is very complex and being able to tell what type of vandalism it scores lower would be very hard if not impossible to do imo; aside from that, the ANN gets updated every so often with new data. - DamianZaremba (talkcontribs) 22:44, 31 July 2011 (UTC)
But the less the vandals know, the better for Wikipedia. Security through obscurity. --Σ talkcontribs 22:48, 31 July 2011 (UTC)
I disagree that obscurity provides any form of security that is worth mentioning! - DamianZaremba (talkcontribs) 22:50, 31 July 2011 (UTC)
security through obscurity --Σ talkcontribs 22:57, 31 July 2011 (UTC)
... is not security. Consider for a moment the hundreds of inputs into the ANN, and the thousands of nodes in the middle layers. Even with access to the inputs, and the weights on the nodes, it would take me hours to show how those inputs map up to form the one output. Giving vandals an oracle is not going to help them much in this situation. -- Cobi(t|c|b) 04:59, 1 August 2011 (UTC)
Hm. "The following is the log entry regarding this warning: Luddan was changed by The Whispering Wind (u)[odp] (t) ANN scored at 0.91773 on 2011-07-31T02:49:58+00:00 " is really confusing. Maybe sth like "all bot action are being logged and reported errors are being reviewed" (but with proper grammar) Bulwersator (talk) 15:53, 1 August 2011 (UTC)
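For readers puzzled by the log entry quoted above, here is a minimal Python sketch that splits it into its parts. The pattern is inferred only from the single line quoted in this thread, so the real log format may differ; `LOG_PATTERN` and `parse_log_entry` are hypothetical names, not part of ClueBot NG.

```python
import re
from datetime import datetime

# Pattern inferred from the one log line quoted in this thread; the field
# names are illustrative and the real format may vary.
LOG_PATTERN = re.compile(
    r"^(?P<page>.+?) was changed by (?P<user>.+?) \(u\).*? ANN scored at "
    r"(?P<score>\d+(?:\.\d+)?) on (?P<timestamp>\S+)\s*$"
)


def parse_log_entry(line):
    """Return the page, user, score and timestamp of a log line, or None."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "page": match.group("page"),
        "user": match.group("user"),
        # The score is the ANN output; per the thread it acts as a confidence value.
        "score": float(match.group("score")),
        "timestamp": datetime.fromisoformat(match.group("timestamp")),
    }


entry = parse_log_entry(
    "Luddan was changed by The Whispering Wind (u)[odp] (t) "
    "ANN scored at 0.91773 on 2011-07-31T02:49:58+00:00"
)
```

Read this way, the entry says which page was edited, by whom, how the ANN scored the edit, and when; the bracketed link markers after the user name are skipped.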

I would have to agree now that the ANN score does not really give vandals any valuable information. Thanks. --TimL (talk) 23:04, 1 August 2011 (UTC)

Do Bots read their talk page?

The instructions regarding how to report a "false positive" are hopelessly complicated. I just reverted something at Kálmán Kubinyi but certainly won't be bothered with whatever other protocol there may be.

Botz away!! Calamitybrook (talk) 16:46, 1 August 2011 (UTC)

Could you explain how looking on the edit history/talk page to get the revert ID and putting it in a box is complicated? There are even pictures. - DamianZaremba (talkcontribs) 17:04, 1 August 2011 (UTC)
It's not the bots that read them, as it is actually reported to the bot owner(s) and bot admin(s). The revision id stuff is likely complicated code, (I don't know myself because I've never made a WikiBot) but it is best NOT (random link to a page not related to what context I was using the word not in) to worry about it. If anything, a lot of WikiBots on Wikipedia use api.php, or at least, a lot of em that I know of. api.php likely also returns each revision id for each revision, but that is just a guess. I do know a lot about computers, but I can still be wrong about some points. (I always like to say that Wikipedia is a very good source for correct information on things, but that doesn't mean it is always right (and I'm excluding false positives anti-vandalism bots sometimes might have; that, and also excluding vandalism itself)) LikeLakers2 (talk) 01:20, 2 August 2011 (UTC)
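The guess about api.php can be illustrated with a short Python sketch. It builds a MediaWiki API query for a page's recent revisions and reads the revision IDs out of a JSON reply. It makes no network call; the helper names and the sample reply (including its numbers) are made up for illustration, and this is not a description of how any ClueBot fetches data.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title, limit=5):
    """Build an api.php URL asking for the latest revisions of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_revision_ids(response):
    """Collect every revid from a parsed JSON reply to the query above."""
    ids = []
    for page in response.get("query", {}).get("pages", {}).values():
        for revision in page.get("revisions", []):
            ids.append(revision["revid"])
    return ids


# Hand-written sample shaped like an api.php reply; all values are invented.
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "title": "Example",
                "revisions": [
                    {"revid": 1001, "parentid": 1000, "user": "A"},
                    {"revid": 1000, "parentid": 999, "user": "B"},
                ],
            }
        }
    }
}

url = build_revisions_url("Kálmán Kubinyi")
revision_ids = extract_revision_ids(sample_response)
```

The revision ID that the false-positive form asks for is the same kind of number as the `revid` values here.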

Fixing archives at Talk:Virginity

Due to a template error at Talk:Virginity, ClueBot III archived its discussions to "Talk:Virginity/Archives 23" and "Talk:Virginity/Archives 24" for nearly three years, until I fixed it up recently. I've tried to change all the on-wiki pages to take this into account, but I'll need some way of making the bot redo the master index (mostly indexing archive 1). Can somebody help me to do this? Hope I haven't broken anything :-) ... Graham87 16:07, 1 August 2011 (UTC)

Cool, it seems to have almost fixed itself. But it really shouldn't index redirects. Graham87 09:23, 2 August 2011 (UTC)
I've done the same thing at Talk:The Troubles and Talk:Windows Live. Hopefully neither of these cases will need any more intervention. Graham87 10:19, 2 August 2011 (UTC)

ClueBot NG Elsewhere

Do clones of CBNG exist in other languages? --Σ talkcontribs 22:59, 2 August 2011 (UTC)

None that we know of - there was some interest but the amount of work to get it to a point where it would pass the BRFA kinda put people off. - DamianZaremba (talkcontribs) 23:03, 2 August 2011 (UTC)
On plwiki the vandalism rate is waaaay lower; moreover, we enabled PC. So ClueBot would only result in false positives. Less than 0.1% of edits are vandalism and anyway, edits by untrusted users are reviewed. Bulwersator (talk) 07:58, 3 August 2011 (UTC)

Vandalism reversion rate vs false positives

Is there maybe a graph of vandalism reversion rate vs false positives? The BRFA contains only an outdated dead link. Bulwersator (talk) 08:00, 3 August 2011 (UTC)

Not that I am aware of, the link was to one used during CBNGs trial. - Rich(MTCD)T|C|E-Mail 11:49, 4 August 2011 (UTC)

Sanjay Chandra

Please block the page of Sanjay Chandra as there are a few people who keep deleting external references and adding lines like "Sanjay Chandra is often regarded as a visionary among the young industrialists from India. He is credited with bringing about some fundamental changes in the business strategy of the Unitech. At the time of his joining in 2001, Unitech Ltd. was a real estate player with a lot of unutilized potential. Sanjay Chandra’s policies and programmes catapulted it into the big league within the short span of a few years. Currently, Unitech Ltd is the second-largest real estate player in India after DLF Limited. The promoters of the Company, i.e., the Chandras, hold 48.57 percent stake in it. Boldness of vision and action has been the hallmark of his career", turning it into a free public relations site. Also, the references that are being given are from sites that let you upload press releases: http://www.free-press-release.com/news-sanjay-chandra-the-managing-director-of-unitech-group-the-visionary-among-the-young-industrialists-from-india-1312375558.html .... this was uploaded on 3rd August and the same day it was linked to the wiki page. 121.245.131.152 (talk) 10:56, 4 August 2011 (UTC)

ClueBot NG is not an administrator or human and therefore cannot protect a page. You may wish to take your request to Request for Page Protection. - Rich(MTCD)T|C|E-Mail 11:50, 4 August 2011 (UTC)

Reverts

Your bot is reverting useful edits. 97.115.60.76 (talk) 01:18, 6 August 2011 (UTC)

Report the false positive then. The instructions are on your talk page. --Σ talkcontribs 01:40, 6 August 2011 (UTC)

Thank you...

Thank you very much for catching the vandalism of Mechanical advantage. Prof McCarthy (talk) 04:58, 6 August 2011 (UTC)

Portals

Is it possible for the original ClueBot be enabled in the portal namespace? Nobody watches portals, and vandalism on one can remain for years or more. If the information on its original BRFA is correct, then simple heuristics may be an effective defence against portal vandalism. --Σ talkcontribs 07:10, 6 August 2011 (UTC)

ClueBot NG Scores for more of/ all the English Wikipedia

Hi, I already posted a request for the logs of ClueBot NG containing the scores calculated so far, which you kindly provided me with.

It is used for my research in the RENDER project, conducted together with Wikimedia Germany (read more about it on the RENDER project page on Meta).

As I tested some vandalism indicator scores, including WikiTrust, CluebotNG's seems to work the best. The problem is that it's not available for most edits in the dumps.

As I don't know how to set it up (as you say this is very complicated) my question would be if you could let Cluebot NG run over a larger amount of edits to generate scores for them and make those logs accessible for research purposes (not only for us, but also future researchers). Best would of course be all the English Wikipedia, but if that is not possible, calculating it from a certain point in time onwards or for a specified set of articles would do as well. (If computing resources are an issue, we could help with this). I'm just asking you this because it's unfeasible for me to set it up myself. Maybe you are interested in helping out, would be great. Best, --Fabian Flöck (talk) 13:47, 5 August 2011 (UTC)

You could just download, compile then run the bot - the edits for training etc can be downloaded via the scripts in SVN. It wouldn't be possible for us to do it due to how the bot works, you would need to replace the interface that listens on the RC relay to pull the data from another source such as the toolserver databases. - DamianZaremba (talkcontribs) 22:43, 6 August 2011 (UTC)

Shpongle

Excuse me, but I didn't edit Shpongle. — Preceding unsigned comment added by 64.125.236.10 (talk) 13:26, 15 August 2011 (UTC)

If you didn't edit the article it looks as if you've got a warning message that was meant for someone else then. If that's the case please accept an apology on behalf of ClueBot.--5 albert square (talk) 19:47, 15 August 2011 (UTC)

You made a big mistake

What's your problem? I did not Vandalize anything! You are by erasing traitor when Sentinel betrayed his allies over Cybertron. SHAME ON YOU!96.240.94.62 (talk) 23:28, 16 August 2011 (UTC)

Hi IP.
OK first things first, ClueBot is not human and therefore can't respond to messages left on the talk page. ClueBot is in fact a robot designed to help combat vandalism on Wikipedia. As ClueBot is a robot it has no knowledge whatsoever about the article subject betraying allies. I am guessing that in your edit there was something that set off one of ClueBot's filters and that is why ClueBot reverted the edit as possible vandalism.
However, you might be interested to know that after you re-inserted your edit, it was then removed by a human editor.--5 albert square (talk) 23:50, 16 August 2011 (UTC)

light-hearted, but pointed, response to a very irritating revert to a minor page about a minor band, The Vandals, that was designed to sharpen it all up, encyclopedia-ise it (hey, sounds like a Dalek-type command), etc. May contain comments worth considering by a non-bot humanoid type thing in a quiet, relaxed moment; definitely contains nuts. Of course it could be ignored, because life's like that (is this microphone on?, oh, damn, I forgot to use capitals!!!!)

Oh, this one I've been breeding for a while. OK - bots I like. They tidy up and make Wikipedia nice and shiny in places (though they never quite get into those corners inhabited by nasty denizens), and one day, hopefully before I snuff it, they'll help clean up while being cute running round the floor. In the meantime - ClueBot NG (who looks in my head to be a bit tattooed and has trouble walking in a non-simian way at times) says it "produces very few false positives" but seems to, on occasion, happily just wander about picking up loads of clearly relevant stuff and reverting it BECAUSE IT CAN, OH YES, MATEY. And WITH A WARNING, like that makes it more acceptable or, more correct, right. Now I'm all for auto-cleanups, but something that looks like it just picks up major rewrites for a good reason and nukes them at random, instead of a nice little letter saying 'hey, I'm just doing a job here, and I've come across something that looks a little like you've been walking your dog on someone else's lawn, not like I'm accusing you of trespass or anything, but can we just have a teensy look back and either get the pooper scooper out or explain that actually the fertiliser is lovely for the grass and thank you very much'. And I'd like a bit of interaction before a complete wipeout of correct information, so it can be discussed in a nice, cup of tea type, human way (oh dear - there's the problem). Any response - nope I doubt it. Sometimes notacluebot needs a re-wire. Any chance we could re-wire it to clean my garage and sort the books out? :-)) Brieflysentient (talk) 12:58, 17 August 2011 (UTC)

The bot /does/ produce very few false positives. If you take the time to look at how much data it processes on a daily basis and the thousands of human-reviewed edits that the ANN is checked against then you would probably see that. The bot states it is possible vandalism; vandalism comes in many forms. The bot then CLEARLY states that the revert MAY be a false positive and TELLS you to revert the edit and REPORT it if you believe it is. Reports are human verified then human reviewed and used to TRAIN the ANN. If you want the bot to improve then help the bot to improve. Do you have any contribution to improving the bot in any way? No. Please only use my time for positive contributions rather than just going on about something totally unrelated. - DamianZaremba (talkcontribs) 13:12, 17 August 2011 (UTC)

Apologies - I had already reported it the correct way, just my warped sense of humour got the best of me. Of course in a perfect world we wouldn't need to waste our time correcting stuff anyway, as it would be done to a t the first time, and I do understand how useful these bots can be. Sorry for wasting your time and probably irritating you. Brieflysentient (talk) 19:08, 17 August 2011 (UTC)

Archiving Question

Hello could someone confirm I have setup the archive template correctly at Talk:Billboard_Hot_100_50th_Anniversary_Charts? I believe the syntax is correct, but the bot doesn't seem to have done anything yet. ~ Don4of4 [Talk] 21:06, 19 August 2011 (UTC)

The bot will archive it soon. What worries me is that it hasn't done anything for four hours yet. --Σ talkcontribs 23:45, 19 August 2011 (UTC)

ClueBot or MiszaBot archiving?

Both are almost equal in how well they archive. MiszaBot's "how old before archiving" param is easier to read without having to know how to set up that same param. I can't exactly think of something that ClueBot has an advantage over MiszaBot with. Perhaps it's more well known due to ClueBot NG? But what I am basically asking here: Do you prefer ClueBot archiving or MiszaBot archiving? For me, I prefer MiszaBot archiving. (not trying to advertise here; remove this message if it doesn't exactly belong here) LikeLakers2 (talk) 02:06, 20 August 2011 (UTC)

I clearly prefer ClueBot III. It has better support for different archive parameters and has advanced features that MiszaBot does not have. ClueBot III also fixes backlinks that it breaks. Furthermore, ClueBot III will keep track of archive indexes for you. And ClueBot III's format syntax is more advanced than MiszaBot's. It also supports things like tags to tell it to archive the discussion on next run. The list goes on. -- Cobi(t|c|b) 02:24, 20 August 2011 (UTC)
Of course you prefer it, you created it. I guess we could say that MiszaBot is easier to set up, (is for beginners) while ClueBot is more advanced. (has more options) LikeLakers2 (talk) 02:40, 20 August 2011 (UTC)
Yeah, you asked which I preferred, though. -- Cobi(t|c|b) 03:30, 20 August 2011 (UTC)
MiszaBot still hasn't died once, as far as I can remember (but note that I haven't been here very long). --Σ talkcontribs 03:34, 20 August 2011 (UTC)

Cluebot III is not working?!

Could someone confirm it's working? I do not believe it is. It only has a handful of edits recently and my page still is waiting for it to come by... ~ Don4of4 [Talk] 03:17, 20 August 2011 (UTC)

It seems to have stopped archiving User_talk:Citation_bot - it's not edited that page for many weeks. Could the parameters have changed at some point? Martin (Smith609 – Talk) 03:46, 20 August 2011 (UTC)
I've let Rich know. The bot should be back up soon. --Σ talkcontribs 03:49, 20 August 2011 (UTC)

Why no bots pick this one up?

[1] seems like something that would normally have been picked up by a bot of one kind or another; I'm just interested why it wasn't. Not complaining or anything, my interest is purely academic! Egg Centric 20:32, 21 August 2011 (UTC)

ClueBot specs?

(I talk on talk pages way too much...lol) Just some questions I wanted to ask:

  • What are the specs of the servers that the ClueBots run on? If you wish, I don't mind if you give separate specs for each.
  • Which ClueBot do you think is your best creation, other than ClueBot NG?
  • If ClueBot NG were to have human characteristics, what would each characteristic be? (i.e. What would height, weight be; what religion would he have, if any; etc.)

No need to answer them all. In fact, no need to answer any of them if you didn't want to. They were just some random questions I thought of. LikeLakers2 (talk) 16:01, 21 August 2011 (UTC)

I can answer the first one: ClueBot NG runs on a Linode based in Newark, NJ, which has 512MB of RAM, and ClueBot III runs on a dedicated server from OVH in Roubaix, France, which has a 1.20GHz Celeron (Atom) processor and 2 GB of RAM. I'm thinking about porting CBNG over to the dedicated server as well, but that will take some doing (as the database is HUGE). - Rich(MTCD)T|C|E-Mail 16:21, 21 August 2011 (UTC)
Well, I question why the anti-vandalism bot has less RAM than the archiving bot. But thanks for telling me! LikeLakers2 (talk) 16:47, 21 August 2011 (UTC)
It's because ClueBot III used to run on the same sort of system... and moving CBNG over would take AGES! - Rich(MTCD)T|C|E-Mail 16:49, 21 August 2011 (UTC)
Why not just run the SQL server from a separate server? That way you can upgrade and/or move CBNG to a newer system without needing to move the DB. Assuming MySQL is what database thing is used, which is likely right. LikeLakers2 (talk) 17:13, 21 August 2011 (UTC)
It makes more sense to keep the databases local to the bot due to latency and the lack of redundancy. - DamianZaremba (talkcontribs) 00:08, 22 August 2011 (UTC)

Cluebot III is bugging on all of the indexes!

ClueBot III is messing up all of the indexes! See my talk page for an example. Don4of4 [Talk] 23:48, 21 August 2011 (UTC)

Set the index parameter to no. Alternatively, you can use the almighty built-in archive box. --Σ talkcontribs 23:58, 21 August 2011 (UTC)
That's what happens when you use underscores instead of spaces in the namespace in the template. I've fixed it for you, and CB3 will fix it next run. -- Cobi(t|c|b) 00:59, 22 August 2011 (UTC)
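MediaWiki treats underscores and spaces in page titles as equivalent, which is why a template value written one way can fail a plain string comparison against the other. A minimal Python sketch of normalising a title before comparing it follows; `normalize_title` is a hypothetical helper, not ClueBot III's actual code.

```python
def normalize_title(title):
    """Replace underscores with spaces and collapse repeated whitespace."""
    return " ".join(title.replace("_", " ").split())


# Both spellings name the same page once normalised.
with_underscores = normalize_title("User_talk:Don4of4")
with_spaces = normalize_title("User talk:Don4of4")
same_page = with_underscores == with_spaces
```

Writing the namespace with spaces in the template, as in the fix described above, avoids depending on any such normalisation.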

Um

Shouldn't this be protected so only admins and above can edit it? People keep changing it to False. 69.228.93.236 (talk) 21:40, 28 August 2011 (UTC)

How so? - DamianZaremba (talkcontribs) 21:43, 28 August 2011 (UTC)
By changing its protection log to Edit=Admin(indefinite)Move=Admin(indefinite)69.228.93.236 (talk) 21:46, 28 August 2011 (UTC)
My question (based on the assumption that you are talking about the run page) is how are people changing it to false, the last person to change it was me and I was testing something. - DamianZaremba (talkcontribs) 21:55, 28 August 2011 (UTC)

listen bot talk back to me or else — Preceding unsigned comment added by Sabrina1908 (talkcontribs) 12:25, 29 August 2011 (UTC)

Hi Sabrina
The bot can't talk back to you because it's a Wikipedia robot and therefore not human. However, if you're wondering why ClueBot reverted this edit then it's because you removed the entire article and replaced it with something advertising your own website. Please read WP:PROMOTION and abide by it, thanks.--5 albert square (talk) 21:19, 29 August 2011 (UTC)

More info at User:ClueBot_NG/FalsePositives please

Are false positives only supposed to be reported by the person claiming not to be a vandal? If so, please state that. If not, I suggest you change the text "From your talk page" (here) to something like "From your talk page (if you were the user whose edit was reverted)". (Maybe you think that's obvious, but it took me several seconds to realize that's what you meant.) Also, are all reports useful no matter how old? If not, please state the range (e.g. "False positives reverted by ClueBot_NG in the last 6 months"). Thanks. (N.B. My interest comes from noticing your revert ID 100355 in response to this, an edit which might not be correct, but is surely not vandalism.) Open4D (talk) 14:59, 29 August 2011 (UTC)

It doesn't matter how old the edit was, it is still helpful in the false positive case :) - Rich(MTCD)T|C|E-Mail 13:49, 30 August 2011 (UTC)

Hello

I removed the edit of the previous editor because he edited some players as playing for SC Vaslui because of some rumours started by the Romanian press, which are completely untrue. Also, the club can't perform any transfer until 10 September because of a transfer ban received last year. I can't show you any links because they are all in Romanian, and I don't think you'll understand. Besides, the table with the players with national caps is useless because there is already a table presenting the same information in the page "FC Vaslui players". And the part with all the foreign players is also useless because many of them played only a single match, or none, and they represent nothing for this team. I hope you see this, and you will understand my last correction. I tried to talk with the anonymous user and tell him to stop adding premature information about Vaslui's transfers, but he still continues to add the same information, over and over. Please excuse my English, if I made mistakes. Thanks! Alexynho (talk) 00:01, 31 August 2011 (EET) — Preceding unsigned comment added by 79.112.231.152 (talk)

Later Edit

I'm sorry, I just realised that I wasn't logged on when I sent you the message. Alexynho (talk) 00:13, 31 August 2011 (EET)

Recursion

Recursion tends to get hit with recursive links to itself; it's cute the first 88 times or so, but one grows tired of the joke. Since it's a single article, the edit filter isn't a good choice for automated patrol: would Cluebot be able to deal with this, or is there another bot that might be a better candidate to remove recursive links to recursion? Acroterion (talk) 03:43, 31 August 2011 (UTC)

User:ClueBot NG/AngryOptin. --Σ talkcontribs 05:41, 31 August 2011 (UTC)
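As an illustration of the pattern being discussed, here is a minimal Python sketch that finds links in a page's wikitext pointing back at the page itself. It is an assumption-laden example and says nothing about how ClueBot NG or the AngryOptin list work; `find_self_links` and its helpers are hypothetical names.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target title without any section or label.
LINK_PATTERN = re.compile(r"\[\[([^\[\]|#]*)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize(title):
    """Underscores to spaces, collapse whitespace, upper-case the first letter."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def find_self_links(wikitext, page_title):
    """Return every wikilink in wikitext whose target is page_title itself."""
    target = normalize(page_title)
    return [
        match.group(0)
        for match in LINK_PATTERN.finditer(wikitext)
        if normalize(match.group(1)) == target
    ]


sample = (
    "See [[recursion]] and [[Recursion|this]] but not "
    "[[Recursion (computer science)]] or [[Loop]]."
)
self_links = find_self_links(sample, "Recursion")
```

A check like this only flags candidates; whether a given self-link is a joke or a legitimate edit would still need human or bot judgment.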

report for Vellore

The link for prakaram has been reverted in the page stating vandalism, but it is not actually. The term is more commonly used for circular path around a sanctum - modern term references restrict to Hindu temples S Sriram 12:16, 31 August 2011 (UTC)ssriram_mt