Wikipedia talk:Wikipedia Signpost/Single/2014-07-30


Comments

The following is an automatically-generated compilation of all talk pages for the Signpost issue dated 2014-07-30. For general Signpost discussion, see Wikipedia talk:Signpost.

Book review: Knowledge or unreality? (10,358 bytes)

"I've never seen anyone wonder why there's no dedicated noticeboard where one goes for help in figuring out whether questionable information in an article is accurate or not." That's what the article and its talk page are for. If you find a questionable claim, just WP:CHALLENGE it. You can {{cn}} it, delete it, or post on the talk page. Paradoctor (talk) 17:20, 2 August 2014 (UTC)

Of course that's the answer to the question no one's asked. And that works well when an article has a good number of watchers. But hoaxes and bad information are more prevalent on less-watched pages, and a c/n tag or a talkpage hoax there can last unaddressed for months. Newyorkbrad (talk) 17:48, 2 August 2014 (UTC)
He who repeats The Word without The Tag shall be Made Fun Of. The same goes for not following up on threads you start. (I presume you meant "post" rather than "hoax".) If there is any lesson to be gained, I'd say it is "Be bold in deleting stuff you find strange" as well as Wizard's First Rule.
Of course, Wikipedia being the pragmatical beast it is, we have to live with reality. IMO, in light of WP:WIP, the best we can do is to enable a souped-up version of the metadata gadget for all users. And maybe the WMF could fork over some grant money to homeless former Britannica employees for quality control-for-hire. Paradoctor (talk) 19:09, 2 August 2014 (UTC)
I thought the comment about the accuracy noticeboard (a Wikipedia 'fact check' perhaps) was a really insightful one. It seems like centralising this process could work well to increase visibility for low-visibility articles, for example, where the talk page might not always work? --hfordsa (talk) 15:39, 3 August 2014 (UTC)
  • Well, from my experience I'd say practically all reference works have hoaxes, or errors so blatant as to lead one to suspect they are hoaxes. Harvey Einbinder's The Myth of the Britannica opens with a chapter listing some of the more egregious errors known to have been found in that work that beg to be considered intentional hoaxes, then proceeds to point out the other flaws in the Encyclopedia Britannica. (Have a look at my review for the Signpost for more about Einbinder's book.) Another example might be the article on "Gremlins" in the Funk and Wagnalls Standard Dictionary of Folklore, Legend and Mythology -- although I'd be surprised if anyone mistook that as anything but a joke. And then there's the book I've been using to revise Eponymous archon & provide reliable sources for that article -- Alan E. Samuel's Greek and Roman Chronology, a carefully researched & written book by a tenured academic: over 2 or 3 consecutive pages of this book the word "calendar" is frequently misspelled. Or maybe it is just a sign that the reader has begun to master a subject when she/he starts to catch mistakes in the reliable sources used... -- llywrch (talk) 07:05, 3 August 2014 (UTC)
Taking a hint from software engineering, I'd say expecting stuff made by humans to be perfect is unrealistic. Transport the software figures to long mathematical proofs, and note that they, generally, are only checked by a handful of experts, rather than undergoing formal quality testing.
The question is not whether there are problems with our content, but whether the number of problems is acceptable. Paradoctor (talk) 10:53, 3 August 2014 (UTC)
This brings up the old question of how to measure the accuracy of Wikipedia. Of course, it will always be a comparison of the accuracy of two sources, always an A vs. B. Wikipedia vs. Britannica, or perhaps Wikipedia medical articles vs. medical textbooks. Brad's text "how Wikipedia's completeness and fairness and accuracy compare, not only to traditional media sources, but to the other information available on the Internet," suggests to me that the most relevant comparison is Wikipedia vs. the rest of the internet. So for example, we could find, say, 300 journalists and assign each an article. They would then read the article and compare that to what they learned in an hour on the rest of the internet (TRotI). My guess is that in many subject areas WP will come out on top. Smallbones(smalltalk) 23:25, 3 August 2014 (UTC)
That depends on what parts of the Internet these journalists are allowed to access. Based on my experience researching various topics, if they are limited to the parts where content is free (as in zero cost of access, & no registration needed) Wikipedia would clearly be the winner. If resources accessible thru the Internet -- such as LexisNexis & JSTOR -- are included, the comparison would be much, much closer; resources like those will always provide better quality coverage of specific topics, although those specific topics are slowly decreasing in number. -- llywrch (talk) 15:28, 4 August 2014 (UTC)
  • Thanks for the informative reviews and commentary. I have a nit to pick, however, with the following statement:
"Instead, people in the wikiless world would still perform the same Google searches that today bring up their subject's Wikipedia article as a top-ranking hit. They would find the same results, minus Wikipedia, ... "
There is a significant omission from the second sentence: they would find the same results minus Wikipedia and its clones and adaptations and web pages which have mindlessly regurgitated its content. Surprisingly, to me at least, this doesn't seem to have much effect on searches on terms referring to broad general subjects. For searches on the terms "Leibniz", "Vera Lynn" and "mind-body problem", to take three I just threw up off the top of my head, the only obviously Wikipedia-influenced results in the first two pages from Google are Wikipedia itself and Google's own knowledge graph.
The results are altogether different, however, if you do a search on terms designed to find sources for dubious factoids which Wikipedia has got wrong. A Google search on the expression "cadamekela | durkeamynarda", for instance, returns 18 pages of results which, apart from one or two now flagging these as a Wikipedia hoax, have simply reproduced Wikipedia text verbatim, or regurgitated it with some form of paraphrase.
This effect is not limited to hoax material, however. More concerning to me is Wikipedia's power to increase enormously the web impact of cranks. The first page returned by a Google search on the expression ""Jafar al-Sadiq" heliocentric" currently contains links to three web pages reproducing some version of the absurd fiction that an 8th-century Islamic scholar, Ja'far al-Sadiq, had proposed a heliocentric model of the solar system. Web pages peddling this nonsense had certainly already existed before the notorious Jagged 85 added it to Wikipedia's article Heliocentrism, but one result of that addition was a rapid massive increase in the number of such pages.
David Wilson (talk · cont) 05:20, 6 August 2014 (UTC)
(@User:David J Wilson) This point is quite correct and extremely important; it is a point I have made on-wiki several times before, and which I didn't stress in this book review only because the review had become far too long already. Errors, questionable assertions, and unfair characterizations contained in Wikipedia articles almost immediately propagate all over the Internet, and may remain on Wikipedia-based mirror and derivative sites for years, even after an error is fixed on Wikipedia itself. For the hypothetical "Wikipedia versus" search, you are right that all these sites would need to be assumed away as well. Conversely, for the real world, this adds to my view that in prioritizing our goals for Wikipedia, accuracy and BLP compliance need to be consistently emphasized. Thanks for your input (and thanks to everyone else who has posted here as well). Newyorkbrad (talk) 22:25, 7 August 2014 (UTC)
  • I'd clarify the introduction of the concept of ethnography a bit further to say that it's a qualitative research tradition of embedding oneself in a culture (say that of WP editors) so as to learn how their culture works and, ultimately, to write about it. That would explain his position or intent a little better for new readers. Usually it's interesting to hear to what extent he embedded himself (did he interview or mainly use historical talk page data? did he have any offline interaction? what kind of consent did he get for his data collection?), and though it was covered, a bit more about why he did it and what he found (ethnographers constantly need to "gain trust"; how did he view that as he took on more permissions within WP?). One of the other common issues in this type of research is the relationship with the participants: did he run any of his conclusions past his participants so as to have a discussion about their accuracy? Anyway, some thoughts. czar 14:36, 22 November 2014 (UTC)

Featured content: Skeletons and Skeltons (0 bytes)

Wikipedia talk:Wikipedia Signpost/2014-07-30/Featured content

News and notes: How many more hoaxes will Wikipedia find? (22,855 bytes)

The fallacy that "vandalism gets reverted immediately" is a long-standing one. It is why I keep a list of "Things that stayed too long" on my user page, as a salutary reminder to us all. There are steps we can take to catch much more of this sort of thing, by appropriate use of technology, but little appetite for doing it, it seems. All the best: Rich Farmbrough 05:14, 2 August 2014 (UTC).

Generally speaking, I find that obvious vandalism usually gets caught pretty quickly, where someone uses swears or the word "gay". But if it lasts five minutes (and thus drops off the bottom of Recent Changes), then it can last a long time. The same goes for subtle vandalism like the example provided here, which would need someone with knowledge of the subject to identify it. Lankiveil (speak to me) 06:26, 2 August 2014 (UTC).
Wouldn't the problem be mostly solved by focusing on IPs and accounts with only one edit? Do the vandal tools and bots already look for this? A lot (but not all) of sneaky vandals generally make one edit. Viriditas (talk) 09:26, 2 August 2014 (UTC)
I don't think so, unless it was also in a way that reduced the growth of the encyclopedia. IPs create a lot of article content (more than most accounts that I see), and my experience is that those creating convincing hoax entries do so with accounts with several edits. Of course, this can also make it easier to catch them, if you make sure to check all of someone's edits once you catch a single suspect entry. 86.129.14.23 (talk) 15:58, 6 August 2014 (UTC)
Just last night, I came across an article with a section that had been clearly vandalized and was only reverted after more than a month and 700+ views. This is just another downside of continuously losing long-term editors for the past 6 years. How many of those 700+ views came from an editor who didn't pick up on the vandalism? Veteran editors would have spotted it in an instant. This change escaped recent changes patrol, bots and 6 page watchers (that's not a lot of page watchers, but better than a lot of stub articles that have 0 watchers). If you have more editors, they will catch these problems much quicker. Had I not stumbled upon this article and reverted the vandalism, I honestly don't know how long it would have stayed in the article until someone else picked it up. OhanaUnited Talk page 20:00, 6 August 2014 (UTC)
What about applying Wikipedia:Pending changes to IP edits and accounts with fewer than, say, 100 or 250 edits (new editors need guidance anyway)? They can still edit, but the edits can be systematically reviewed. I don't want to discourage IPs like 86.129.14.23 from contributing; hoaxes generally come from IPs or new accounts.--Milowenthasspoken 20:36, 6 August 2014 (UTC)

I've come across section blanking that stood for almost a year and bogus edits uncaught for even longer, but it doesn't help when Jimbo himself feeds the bears by praising the "funny" vandalism. It only encourages more bears to come pawing through our campsite looking for tasty morsels of attention. - Dravecky (talk) 05:45, 2 August 2014 (UTC)

  • The mother of all hoaxes was the whole Siberian Wikipedia, based on the non-existent Siberian language. To the best of my memory, it took nearly a year of heated shutdown discussion to convince meta.wikimedia bureaucrats that it was a fake. Notably, it truthfully claimed to contain thousands of articles. I accidentally discovered that 95% were year articles and other pages that were not even stubs, probably generated by bot, and half of the rest was rabid Muscovitephobia. (Did I forget to mention it was 99% unreferenced?) It was shut down only after the perpetrator declared in some message board post that he got bored with the prank. The perpetrator herded numerous sock- and meat-puppets and was supported by some wiki-Russophobes, as well as by well-meaning "fellow travelers": enthusiastic conlang fans. I was planning to write a "sign post" about this curiosity, but got distracted and forgot it, and now most of the traces have faded away. And you are telling us here about subtle vandalism... I bet there are several thousand pieces of deliberate bullshitting in areas nobody cares about. -No.Altenmann >t 09:12, 2 August 2014 (UTC)
Wow, that's one debacle I've never had cause to look at. Ed [talk] [majestic titan] 17:59, 2 August 2014 (UTC)
  • See also the recent Chicken Korma case on Reddit: [1]. I think it is safe to assume that Wikipedia's demographics have a bearing on which hoaxes are likely to last longest (cf. different outcomes of the Bicholim conflict hoax, set in India, vs. the Upper Peninsula War, set in the US). Pending changes would help; the German Wikipedia has had its hoaxes, but generally fewer than the English one. Andreas JN466 10:12, 2 August 2014 (UTC)
  • Earlier this week I stumbled across yet another hoax, which had lasted for more than 7 years. I would be extremely surprised if there were not many many more like this which have yet to be detected.
David Wilson (talk · cont) 10:43, 2 August 2014 (UTC)
  • Wouldn't it be possible to see how many pages (and which ones) have no watchers? I'd think those are the pages most likely to have false information. kosboot (talk) 01:34, 3 August 2014 (UTC)
  • That is correct, I cannot see that page. Can you see how many articles are listed on it?--Milowenthasspoken 02:32, 3 August 2014 (UTC)
  • Well, dusting off my sooper 37173 Admin powers, I looked at that page & found there are ... a lot. It shows the pages in alphabetical order, but only shows the first 2000, ending at 1975–76 Israeli League Cup. -- llywrch (talk) 06:07, 3 August 2014 (UTC)
    • User:Llywrch, don't you mean it shows the first 200, rather than the first 2,000? Andreas JN466 08:32, 3 August 2014 (UTC)
      • I selected the view by 500 option. I was able to view 4 pages in the category, which allowed me to view a total of 2000 articles. They were arranged in alphabetical order.

        Now I know that sometimes I express myself in a very verbose, rambling & awkward manner, so I'm not surprised at your question. However, it should be clear from my report that this tool is useless. Unless you are interested in protecting articles beginning with the word "1975". IIRC, I'm not the first Admin to point out how this tool is handicapped, so I assume this limit was imposed for performance reasons. But there is a pressing need for Wikipedians to learn which pages lack watchers -- probably more pressing than some of the software projects the Foundation have taken on. -- llywrch (talk) 17:52, 3 August 2014 (UTC)

        • Thanks, User:Llywrch. If we look at the alphabetical listing in Category:All articles needing cleanup, we find that the category contains 23,725 articles in total, and 1975–76 Israeli League Cup would be the 58th article in that list. Extrapolating from this we can guess that the articles up to 1975–76 Israeli League Cup in an alphabetical category listing of Wikipedia articles represent about 58/23,725 of the total x, and 58/23,725 = 2000/x. This gives us an estimate of well over 800,000 Wikipedia articles that no one is watching, or more than one in six. I guess we could do the same exercise with some other large categories that contain hundreds of thousands of articles, to see whether these yield similar estimates. Do you agree with the maths? Andreas JN466 23:41, 4 August 2014 (UTC)
          • I'm not 100% comfortable about using that category as a base for extrapolation, but I don't have a better one to suggest. (Many obvious ways to create large numbers of articles are limited to prevent an unsustainable load on the servers.) And it is likely accurate as to the magnitude: the number is doubtlessly somewhere between 100,000 & one million. Until someone comes up with a better way to arrive at the number, it'll do. -- llywrch (talk) 16:36, 5 August 2014 (UTC)
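The proportional estimate discussed in this sub-thread can be checked with a few lines of Python. This is only a sketch of the reasoning above: the function and parameter names are made up for illustration, and the inputs (58, 23,725 and 2,000) are the figures quoted by the commenters, not independently verified.

```python
def estimate_unwatched_total(rank_in_category, category_size, listed_unwatched):
    """Scale a partial alphabetical listing up to a full-list estimate.

    Assumes, as the thread does, that the cleanup category is spread
    across the alphabet in the same way as the unwatched-pages list.
    """
    if rank_in_category <= 0:
        raise ValueError("rank_in_category must be positive")
    # rank_in_category / category_size = listed_unwatched / x, solved for x
    return listed_unwatched * category_size / rank_in_category


# Figures quoted in the thread (hypothetical inputs for this sketch)
estimate = estimate_unwatched_total(58, 23_725, 2_000)
print(f"Estimated unwatched articles: {estimate:,.0f}")
```

With the thread's figures the result is roughly 818,000, which matches "well over 800,000". The estimate moves a lot if the rank is off by a few places, so the caveat about using a single category as the base still applies.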
  • One week later... "Welcome to Wikipedia, the free encyclopedia that anyone can edit. Now with 73,296 articles in English!" - Dravecky (talk) 07:09, 4 August 2014 (UTC)
  • The percentage of articles with hoaxes is no doubt small compared to the entire size of the project. From July 2010-September 2011 the WP:UBLPR project reviewed every BLP tagged as unreferenced (I wish I could remember the number; it was over 20,000 articles) - we did find some hoaxes, but very very few percentage-wise.--Milowenthasspoken 12:27, 4 August 2014 (UTC)
  • PROD is hopelessly clogged with crap and should not be the suggested solution to any problem. Any system that fails deadly is, by definition, broken. And who cares anyway? There is no problem to solve here. Graffiti isn't a hoax in the first place, and every single statement in the entire corpus is an act of faith anyway. And that's a good thing. Maury Markowitz (talk) 18:11, 4 August 2014 (UTC)
  • Considering how long it took just to get support to replace the April Fools' front page "jokes" with misleading but technically correct "jokes," the number of narcissistic editors who think it's funny to mislead people from cultures who don't celebrate April Fools' Day and who can't afford expensive encyclopedic sources is pretty high. Olive - Preceding unsigned comment added by 93.186.16.244 (talk) 06:29, 4 August 2014 (UTC)
Considering the kind of abuse and trolling that I received on my talk page over briefly blocking the IP that created this hoax and semi-protecting the article, I feel vindicated. Yes, the number of hoaxes is small compared to the size of the project. But trolls, vandals, and pro se defamation plaintiffs don't understand the meaning of the word compared, as in "compared to what?" I don't know what the ultimate remedy is -- certainly I would not mass-prod poorly-sourced articles -- but we need to take it more seriously. In the meanwhile, brief blocking of IPs and semi-protection are the only tools we have. I'd love to see a bot that could seek out some of the more obvious vandalism. Bearian (talk) 17:17, 4 August 2014 (UTC)
You mean, like ClueBot NG? Paradoctor (talk) 18:05, 4 August 2014 (UTC)
Bearian, you're putting the cart before the horse. You simply state that such and such a problem exists and it has to be fixed. Really? What is it that needs to be remedied, exactly? What is the scale of the problem, in numbers? Why do we need to take "it" more seriously? You don't propose any sort of metric to any of these obvious questions. Maybe we should start there before jumping to conclusions? Maury Markowitz (talk) 18:17, 4 August 2014 (UTC)
This is a good discussion, but first of all, IMO it is in the wrong place. If the problem is serious, it must be discussed in an appropriate WP venue (WT:V?). Otherwise it is just keystrokes straight into the bit bucket. Now quick answers: What? - (a) false info is bad; (b) reputation. In numbers: (a) probably tiny; (b) large: lots of noise is generated, with high visibility for the tiniest of tiny fractions of hoaxes detected by non-Wikipedians. Why? - Perfectionism? Why do we bother with FA/GA, etc.? In any case, the first step in an approach to any problem is to decide whether the effort is manageable. Solution: a team of bots: one tags every added unreferenced statement with {{cn}}, another removes all bot-tagged pieces after, say, a week passes, and a third watches that tags are not removed without a ref being added. Problem: vandals will find a way, e.g., adding fake refs. Theory: This is but a special case of the problem of database integrity. Judging by the primitive implementation of whatever DB handles interwiki now, the problem of Wikipedia integrity will not be solved any time soon. New buzzword: Wikipedia:Integrity (WP:INTEGRITY): is anybody willing to jump-start it? Staszek Lem (talk) 19:19, 4 August 2014 (UTC)
  • Superb points, Staszek. But let's not forget that the difference between medicine and poison is the dosage. Given plummeting participation, largely due (IMHO and being guilty of exactly the same thing) to rampant perfectionism, I'd hate to burn down the Wiki to save the Wiki. And I strongly believe this is a far more dangerous problem than minor graffiti in an unimportant article. Maury Markowitz (talk) 13:30, 5 August 2014 (UTC)
I've always felt that one way to cut down on vandalism is to allow editing only by registered users. Sure, determined vandals will register, but the point is requiring registration will still reduce the number of vandals. I know the majority disagrees. -- kosboot (talk) 19:41, 4 August 2014 (UTC)
The topic is not vandalism, but hoaxes. The motivation to perpetrate a hoax is higher than to simply add "asshole" into text, so that registering an account is not really an obstacle for a "true" hoaxer. Staszek Lem (talk) 20:01, 4 August 2014 (UTC)
  • If all IP edits, and edits from new users with fewer than 250 edits (who need guidance even when acting in good faith!), were subject to Pending Changes, quality would be improved, and new hoaxes minimized.--Milowenthasspoken 02:28, 5 August 2014 (UTC)
  • I think this is the sort of thing that is a realistic improvement. However, as someone that sometimes edits from an IP because I'm on my phone or at someone's house, let's put a little effort into being able to link back to your account after you realize your edits will otherwise be anon. I wouldn't want to have to do anything but click a button before seeing that edit. Maury Markowitz (talk) 13:30, 5 August 2014 (UTC)
@Maury Markowitz:, I think the simplest solution is to consider the harm a hoax might do. If the Foundation publicly reminds everyone that as a public carrier, it is not legally responsible for the content of Wikipedia, & announces it will not assist in the legal defense of anyone who adds a hoax that defames someone, that will limit the harm as much as can be done. That is, beyond the usual browsing of articles & dealing with any questionable material. -- llywrch (talk) 16:43, 5 August 2014 (UTC)
Indeed! This seems like the best suggestion among many excellent ones. Maury Markowitz (talk) 16:47, 5 August 2014 (UTC)
  • Should hoax editors be sanctioned? Should those who step forward after the fact be cut some slack? How about those whose hoaxes are long in the past? Reasons for sanctioning: 1) It sends a message that hoaxing Wikipedia is a bad thing, and 2) should anyone who perpetrated a hoax be able to have a "clean" Wiki-reputation? Reasons for not sanctioning for obviously-ex/reformed hoaxters: 1) If the hoaxters are trolls masquerading as reformed hoaxters, it just feeds them, and 2) if they truly are reformed, it would be WP:POINTy to sanction them especially years later. Sometimes forgiveness[2] is the order of the day. My personal take - if the person hasn't been disruptive recently, let the past be the past, but allow any "buried" past to be resurrected if the person becomes disruptive in the future. This doesn't have to be specific and it doesn't have to even name the account or IP address, but it should be clear enough to indicate the scope of the damage. For example, an administrator can step in and say "in 2006, this editor created a hoax which stood for 1-2 years before being corrected. There was some off-Wiki publicity which arguably damaged Wikipedia's reputation in a minor way" rather than "On March 2, 2006 at 13:45:56 UTC, the editor used the IP address 123.45.67.89 to edit Foobar. This edit stood until April 3, 2007 at 15:06:07. It was the subject of discussion at Slashdot. The resulting publicity arguably hurt Wikipedia's reputation." Recommend but don't necessarily require that the person self-disclose the broad outline of any hoaxes he's done in the past and what he's learned since then if he asks for any administrator-assigned user-rights that normally require trust (e.g. template-editor rights) or if he becomes heavily involved in an area of Wikipedia where people look to him for wisdom and advice (e.g. some project-space areas). 
Very strongly recommend or even require that he make a similar disclosure if he is the subject of a community discussion and the past or the perception that he is hiding the past is likely to be considered relevant by at least some discussion participants. The latter can typically be done as a positive: "In 2006, I perpetrated a hoax on Wikipedia. Since then, I've learned that this was a bad idea because .... I've also learned that .... and I strongly recommend against perpetrating hoaxes on Wikipedia for these reasons and because ...." Where sanctions are warranted against active hoaxters, they should be done pretty much as they are now against any disruptive editor: The more often the disruptive behavior is repeated and/or the more likely it looks like a "disruption-only account" or an editor who is "not here to build an encyclopedia", the more severe the sanction; the more likely it is that this is a one-off thing and that the behavior will not be repeated, the less severe the sanction or warning. davidwr/(talk)/(contribs) 18:42, 5 August 2014 (UTC)

Recent research: Shifting values in the paid content debate; cross-language bot detection (999 bytes)

It's a little late because I see we have a new edition, but I just discovered this in the Help Desk archives, which I am almost finished with. How far we have come! - Vchimpanzee • talk • contributions • 16:31, 9 August 2014 (UTC)

Traffic report: Doom and gloom vs. the power of Reddit (0 bytes)

Wikipedia talk:Wikipedia Signpost/2014-07-30/Traffic report

Wikimedia in education: Success in Egypt and the Arab world (307 bytes)

I'm tremendously happy to see this flourish! --Frank Schulenburg (talk) 04:12, 3 August 2014 (UTC)