Wikipedia talk:Wikipedia Signpost/2015-11-25/Op-ed

Discuss this story

How old was someone who was born in 1821 and died in 1881? Perhaps 1881 - 1821 = 60 years old. But born 1821-01-01 and died 1881-12-31 gives almost 61 years, while born 1821-12-31 and died 1881-01-01 gives barely 59 years. There are also countries where a child is counted as one year old at birth, and there are lunar years. What remains is something between 58 and 63 years old. When someone is reported as 1821--1881, this is even worse. The question is therefore not about what is written in the database, but about the confidence we can place in the way the data were collected to build it. E.g. what does Wikidata say about the death of Kim Hong-do? Pldx1 (talk) 08:25, 30 November 2015 (UTC)
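
The year-only arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the proleptic Gregorian calendar of the standard `datetime` module, counts completed birthdays in the usual Western way, and ignores the lunar-year and age-one-at-birth conventions the comment mentions. The function names `completed_years` and `age_bounds_from_years` are made up for this example.

```python
from datetime import date
from typing import Tuple


def completed_years(born: date, died: date) -> int:
    """Number of birthdays reached on or before the day of death."""
    years = died.year - born.year
    # If the death falls earlier in the calendar year than the birthday,
    # the last birthday was never reached.
    if (died.month, died.day) < (born.month, born.day):
        years -= 1
    return years


def age_bounds_from_years(birth_year: int, death_year: int) -> Tuple[int, int]:
    """Youngest and oldest possible age when only the two years are known."""
    # Latest possible birth combined with earliest possible death.
    youngest = completed_years(date(birth_year, 12, 31), date(death_year, 1, 1))
    # Earliest possible birth combined with latest possible death.
    oldest = completed_years(date(birth_year, 1, 1), date(death_year, 12, 31))
    return youngest, oldest


print(age_bounds_from_years(1821, 1881))
```

Under these assumptions the two years alone already leave a one-year ambiguity in completed years, before any question about calendars or counting conventions is considered.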

“doctors who graduated before they turned 20” – What would this query look like? --Kopiersperre (talk) 15:41, 30 November 2015 (UTC)

  • What do you mean by "and that Western culture is essentially inherited"? 4nn1l2 (talk) 16:45, 30 November 2015 (UTC)
    • My guess is, it's an assertion that other cultures are new or have a new essence. This would be appropriately, pretentiously, silly. Jim.henderson (talk) 22:40, 30 November 2015 (UTC)
  • "For this reason among many others, in 2012 the Wikimedia Foundation created Wikidata": I don't really want to be that guy, but this is false. We either write Wikimedia Deutschland or "the Wikimedia movement". Aubrey (talk) 17:47, 30 November 2015 (UTC)
    • It is probably not the best sentence. The post was originally written for a non-wiki audience and was an attempt at storifying the message. I do agree with you. --Kippelboy (talk) 09:56, 7 December 2015 (UTC)

Wikimedia-l discussion, Slate article

There is an ongoing discussion about Wikidata's quality issues and their wider implications on the Wikimedia-l mailing list: http://www.gossamer-threads.com/lists/wiki/foundation/654001

A key fact here is that at present, only about 20% of Wikidata content is referenced to a reliable source. About half is unreferenced, and about a third is only referenced to a Wikipedia. [1]

For wider context, see yesterday's article in Slate exploring the links between Wikidata and Google's Knowledge Graph: "Why Does Google Say Jerusalem Is the Capital of Israel?" Andreas JN466 15:54, 1 December 2015 (UTC)

To be fair, regarding the 20% number: let's take a random Featured Article in Wikipedia, such as Emma Goldman. Looking at the four-paragraph intro, it contains tons of information, but only 3 of the claims made in the intro have references. Her founding an anarchist journal? No reference. Her being sentenced to 22 years in prison? No reference. Her date of birth? No reference. There are many more than 15 claims in the intro, but only 3 references. So the 20% of facts in Wikidata having a reference could also be interpreted as a much higher number than what Wikipedia offers. Far more than half of all claims in Wikipedia are without reference, probably far more than 90%. Now, obviously, this is no reason to say all is rosy for Wikidata, because Wikipedia is even worse - but I am questioning whether the metric, as presented here, is very valuable. --denny vrandečić (talk) 22:25, 2 December 2015 (UTC)
Denny, that's based on a lack of familiarity with citing conventions for article leads. See WP:CITELEAD; it is longstanding practice to use citations sparingly in the lead paragraphs. The lead is intended to summarise the article content; it should not contain anything that isn't covered, and sourced, in the article body. That is where the sources for those statements are found. Andreas JN466 08:53, 3 December 2015 (UTC)
You are right, I was unfamiliar with that citing convention (and I like the convention a lot). Of the three claims that I mentioned, two do indeed have references later (the founding of the magazine and the prison sentence) and one does not (the date of birth). But many claims in the body of the article remain without reference - her list of publications, for example. Or if you take the first paragraph of the article body, it has two references but many more claims (although it is admittedly hard to discern what exactly a reference covers).
I do not say that each of these has to have a reference. That would make it so much harder to read, and some claims are just obvious. In Wikidata there are claims like "the first name of Emma Goldman is Emma", which, I mean, does it really need a reference? Or "Living my Life was written by Emma Goldman". Again, does this really need a reference?
What I want to say is - the percentages you mention are hard to interpret. What would be a good number? Is it really captured in a simple number? What is the comparable figure for Wikipedia? A lot of the referencing and citation rules on Wikidata still need to mature. What is a good reference? What needs to be referenced, and what not? Etc. Wikidata is still a young project, and it needs to find its rules. Wikipedia's citation rules were not as developed in 2004 as they are today, and Wikidata needs the time and the opportunity to find the correct set of rules as well. And every Wikipedian is invited to help at Wikidata.
Does this make any more sense? --denny vrandečić (talk) 17:46, 3 December 2015 (UTC)
The Emma Goldman article became a featured article in 2007, nearly 8 years ago. Quite possibly, it needs some work to make it conform to present-day standards. The birth date certainly should be referenced. Arguably, it is verifiable from the reference present at the end of these three sentences: Emma Goldman was born on June 27, 1869. Her father used violence to punish his children, beating them when they disobeyed him. He used a whip only on Emma, the most rebellious of them.<ref>Chalberg, p. 13.</ref> Chalberg gives the birth date in the same passage (though it is on page 12, not page 13). Would I think that a birth date like that should be referenced in Wikidata? Absolutely. Similarly, most of the bibliography is verifiable, given that each of her works bar one has its own article, complete with bibliographical data. If the biography were at WP:FAC today, I would argue for holding promotion back until at least the ISBN numbers for Goldman's works are included, making verification that these works actually exist a matter of a single click on the ISBN number. Again, if we were in Wikidata, I would consider the addition of a reference like that (i.e. the ISBN number of the book's first edition) essential.
As was recently pointed out by another contributor in the mailing list discussion, Wikidata's role makes it all the more vital that its statements be referenced, because their content is likely to be copied. Given wikis' open structure, it is not uncommon for people to add false information. See for example "Wikipedia, the 25-year-old student and the prank that fooled Leveson": "An American man wrongly named in the Leveson Report as a founder of The Independent newspaper has expressed surprise that a judge would accept without question information on Wikipedia." Or see the case of Hannibal Fogg, which involved the invention of an author and of books that had never existed. Or see the invention of a film director who had never lived, except on the pages of Wikipedia: "The greatest movie that never was". (That is a really, really good article, worth reading for its writing as well as the story it's telling.) Or see the Amelia Bedelia hoax, whose content could conceivably have been included as a statement in Wikidata. See the Brazilian aardvark story, told in the New Yorker; again this concerns a snippet of information that could easily have been accommodated in Wikidata's statement structure. (As I pointed out on Wikimedia-l, Wikidata said for five months last year that Franklin D. Roosevelt was also known as "Adolf Hitler" – too obvious to be copied by anyone, unlike the Brazilian aardvark moniker that entered multiple "reliable" sources.) Just today, there is this story on dozens of major news sites: "This 'legend' changed a Wikipedia page to sneak backstage at gig".
Wikidata need not and should not fall into the same ditches that plagued Wikipedia during its early years, and still continue to plague it to some extent today. Instead, Wikidata would do well to take the lessons learned in Wikipedia's early years on board, because the danger is that anything present in Wikidata may come to be copied not just across several Wikipedias, but also by Google and multiple third-party sources taking either Google's or Wikidata's or Wikipedia's statement on faith. This could lead to widespread contamination of sources everywhere ("citogenesis on steroids"). Insisting on strict sourcing standards is, in my opinion, absolutely vital, given the role envisaged for Wikidata. Otherwise you are not just creating intractable problems for yourselves, some months or years down the line, but also for all reusers.
One thing I will now go and do, Denny, is insert the reference for Goldman's birth date at the end of that sentence naming it. ;) Andreas JN466 19:25, 3 December 2015 (UTC)
Also, Andreas, another difference between Wikipedia and Wikidata is that the latter is growing much faster than the former ever did. Otherwise +1 to your points, especially "Insisting on strict sourcing standards is, in my opinion, absolutely vital, given the role envisaged for Wikidata." (emphasis mine). Ed [talk] [majestic titan] 02:31, 4 December 2015 (UTC)

Mass updates

Wikidata has some way to go but has the potential to be a massive help in building and maintaining Wikipedia. For me, the biggest advantage is the ability to store information in one place that is referenced in many Wikipedia articles and can be updated everywhere at once. The example was given of election results; I'm still finding many articles that list incorrect members of parliament or local councillors because they haven't been updated and there's no central record of which articles contain such information. Another prime example is census data; many UK geography articles still list the population as at the 2001 census, not the (more recent) 2011 census or any of the subsequent population estimates from the Office for National Statistics.

Working through articles to find such information and update it is time-consuming and mind-numbingly dull. Because we prefer to write information in prose, writing a bot to do it isn't really an option; using templates could work but would be much harder to update than Wikidata's slick user interface is. Out-of-date governance and demographic information is a big problem in geographical articles and Wikidata solves that problem for us; that alone is reason enough to embrace it and welcome it with open arms. Yes, it has flaws, but let's remember it's in its infancy. When someone views an article and sees a population figure that's 14 years out of date, it doesn't make us look good. So I say let's put the effort in to make Wikidata work for us. WaggersTALK 11:26, 4 December 2015 (UTC)