Wikipedia talk:Verifiability

Our Zhemao vulnerability

Last year the Chinese Wikipedia uncovered an incident in which an editor, Zhemao, had created a web of hoax articles. It was particularly insidious because it exploited a social gap in the community and hurt public trust in the Wikipedia project. Mainly, Zhemao used foreign, offline sources and the community assumed on faith (i.e., blind faith) that the invented sources both existed and were accurate. I haven't seen us reckon with how we're vulnerable or liable for the same loophole.

I often work from non-English language and out-of-print sources myself. I am very aware that adding a rare citation makes it likely that few people will ever undertake the effort to verify the citation. I also often work with articles that attract rare/non-English source text from editors who are not (or were never) active, which leaves reams of text that is, for practical purposes, unverifiable and susceptible to this Zhemao vulnerability. (For context, when I say practical, I mean that I have the means to verify rare citations and if it's inaccessible to me, it's definitely inaccessible to our readership.)

The easiest solution would be to fortify the existing WP:NONENG recommendation into a standard expectation, treating these sources as implicitly challenged rather than just challengeable:

Current wording: As with sources in English, if a dispute arises involving a citation to a non-English source, editors may request a quotation of relevant portions of the original source be provided, either in text, in a footnote, or on the article talk page.

Proposed wording: When working with highly inaccessible (e.g., rare or non-English) sources, editors should proactively provide a quotation of the relevant portions of the original source, either in text, in a footnote, or on the article talk page.


If you quote a non-English reliable source (whether in the main text or in a footnote), a translation into English should accompany the quote. ... The original text is usually included with the translated text in articles when translated by Wikipedians, and the translating editor is usually not cited.

This would close a gap in our social practice where inaccessible citations meet WP:AGF. If we have the appetite for such a change, I think it would help forestall the inevitable fallout of an editor pulling a Zhemao-style exploitation of community trust on the English Wikipedia. Given our size, it's a matter of time, if not already underway and undetected. czar 08:10, 7 January 2023 (UTC)Reply[reply]

If the Zhemao vulnerability lies with an editor who is inventing sources, how would this proposed change help protect against that same editor from inventing a quote and translation? It would be more work for them but a committed fabulist could still meet the proposed standard without necessarily increasing the possibility of detection. — Carter (Tcr25) (talk) 13:02, 7 January 2023 (UTC)Reply[reply]
It lowers a natural barrier to verification. It's much easier for a non-conversant individual to request a page scan of an inaccessible offline volume and verify that a string of non-Latin text matches the one quoted in WP citation than it would be to find a literate third party who can read, interpret, and verify the citation. That's why our guideline already says to provide quotations upon being challenged—it aids in verification. My suggestion is that we hold inaccessible sources to be inherently challenged. We can't change the fact that a bad actor can make it all up but we can make it easier to verify. czar 15:33, 7 January 2023 (UTC)Reply[reply]
Can you clarify what "The original text is usually included with the translated text in articles when translated by Wikipedians, and the translating editor is usually not cited." means? -- asilvering (talk) 03:45, 20 January 2023 (UTC)
When quoting non-English text directly, whether in a footnote or the article, the current guidance is to include the non-English original text next to the Wikipedian-translated English text. And the person doing the translation is not credited. I think that can be put more clearly, but that's a secondary point to the larger question at hand. czar 05:42, 20 January 2023 (UTC)
Oof. I don't think this is "usually" the case at all, though I wish it were. I think your change is sensible, with the caveat that editors shouldn't be copy-pasting entire pages of text since copyright restrictions may limit how much can be reproduced. If a sentence is cited to "pp. 104-118" or something we wouldn't want all 14 pages of it. But, frankly, I think it would already be a huge improvement if current guidelines like "machine translation is worse than no translation" and "verify the sources when you translate" were more commonly known/enforced. -- asilvering (talk) 05:54, 20 January 2023 (UTC)Reply[reply]
I don't think this will help. Your theory of change seems to be that:
  1. People will read the directions.
  2. The occasional bad-faith "committed fabulist" (a lovely phrase) will be at least as likely to follow the directions as the main good-faith contributors.
  3. If someone cites such a source, another editor will ask them to deliver a "scan of an inaccessible offline volume".
  4. It will actually be possible for this to happen, which means that both editors will have e-mail enabled (I'd estimate a 75% chance of this being true), because you can't upload pages of copyrighted sources to the wiki under our fair-use rules, so offline delivery of the scanned page(s) is likely necessary, and also the contributing editor will still have access to the sources at the time the reviewing editor makes the request (probably depends on how soon the request appears).
...all so that one reviewer can check one page in one source.
IMO #1 is unlikely to happen often; #2 is probably backwards to a significant extent, and #3 is very unlikely to happen, and even if #3 happens, #4 has logistical challenges.
In practice, even this could be gamed. It'd be simpler, faster, and easier for your committed fabulist to create a sock account and post a fake request on the talk page, which they would pretend to fulfill, and then – because the problem here is that we trust editors – every editor after that would trust the sock account's public claim that it had received and checked the scanned page.
Then, having established that this is appropriate for "inaccessible" sources, we'll see more editors feeling entitled to demand free copies of anything that isn't already available online and free-as-in-beer, which will probably lead to worse sources. Why should I buy a book, pay for a paywalled article, go to the museum and read the sign on the wall, etc., if citing such a source will only lead to the belief that in doing so, I have WP:VOLUNTEERed to provide scans of all of these "inaccessible" sources to other editors? Once established as the principle, it will not be limited to non-English sources. You take the first step towards that in this proposal, by indicating that not only non-English, but also "rare" sources are included in this rule. WhatamIdoing (talk) 07:07, 20 January 2023 (UTC)Reply[reply]
My theory of change is that editors more frequently challenge citations from inaccessible sources without quotations, and then more inaccessible sources have quotations. I.e., encourage enforcement rather than a recommendation. I'm more concerned with our general practices that protect hoaxes rather than magically rooting out hoaxes. This wouldn't involve scanned pages but text quoted in references, as our current guideline recommends. czar 08:49, 20 January 2023 (UTC)Reply[reply]
I thought you were saying above that "it's much easier for a non-conversant individual to request a page scan of an inaccessible offline volume..." WhatamIdoing (talk) 21:24, 28 January 2023 (UTC)Reply[reply]
Agree with WhatamIdoing on all the above. Here's my take:
  1. This would be ineffective at reducing the problem.
  2. A lot of the time, entire paragraphs are being paraphrased with a single sentence. This proposal would require copyvio for any long passages.
  3. Past instances seem to have been caused by a handful of bad actors, rather than being a systemic issue that would require a policy change. It's extremely bad practice to change policies to address edge-cases; that always leads to a vicious cycle of ever-increasing bureaucracy. Non-systemic problems must be met with non-systemic solutions.
  4. As for the general issue of Wikipedia reliability: Wikipedia is inaccurate, and it's caused by editor laziness, carelessness, or emotional bias. Policy can't fix that. Either increase the number of eyeballs on each article by an order of magnitude (instead, we've been losing editors), or wait 5-10 years until AI gets good (not like the CNET AI), and then take humans completely out of the equation. The latter is inevitable anyway.
DFlhb (talk) 05:06, 26 January 2023 (UTC)
I'm not sure AI will inevitably take over here, but I agree that trying to write policy based upon the potential for a rare bad actor to act badly is likely to limit good contributions in a way that doesn't improve the project. — Carter (Tcr25) (talk) 14:33, 26 January 2023 (UTC)Reply[reply]
No one is arguing that verification rules stop bad actors... and stopping bad actors isn't even the point. In my experience, WP:TSI mistakes are far more often accidental error than overt "laziness, carelessness, or emotional bias". The point is to reduce the potential for mistakes by providing ready proof. It's already a recommendation in our guidelines but has no enforcement. Lack of verification for non-English and rare refs is absolutely a systemic issue: Inaccessible refs clearly go unchecked. For any author (AI or human) to be trusted, they need accessible sources, not just citations, and the fastest way there is through partial quotation. czar 02:47, 27 January 2023 (UTC)Reply[reply]
In re "Inaccessible refs clearly go unchecked": I wonder what percentage of accessible refs actually get checked. WhatamIdoing (talk) 21:46, 28 January 2023 (UTC)Reply[reply]
I just know that many times when I try to look at an on-line source, I discover that the link has been dead for years. I suspect that on-line sources are most likely (but, how likely is that) to be checked very shortly after they are added, and very seldom after that. Donald Albury 23:43, 28 January 2023 (UTC)Reply[reply]
Seconding czar here. I'm greatly surprised to see this problem described as "caused by a handful of bad actors, rather than being a systemic issue." -- asilvering (talk) 09:02, 29 January 2023 (UTC)Reply[reply]

A related (perhaps more common) problem is that when an editor cites an obscure source that actually exists, it can be nearly impossible to review compliance with the other aspects of verifiability (e.g., whether or not the source truly supports what they wrote). One "middle of the road" solution covering both would be that whoever is advocating retention of the source, or of the text which it purportedly supports, needs to make reasonable efforts to respond to inquiries about it: regarding the existence of the source, where they know it is available, etc.; and regarding what is in the source, to provide details, page numbers, etc. The proposal above would be a subset of this. If that is not done upon request (including when nobody responds to the request), then the material would be considered to fail verifiability, and that would also be a basis for removal of the source. BTW, structurally, "removal of the source" is a separate question from wp:verifiability and rightly so. North8000 (talk) 16:23, 20 January 2023 (UTC)

Retaining sources is a problem for editors who use libraries. We really can't require it.
Also, I find that in many cases, we don't actually need to double-check the source. I already know that smoking tobacco causes lung cancer; I don't need to check a cited source to make sure that's a verifiable statement. One wants to be able to check some source (not necessarily the specific cited source) for claims that seem unexpected, doubtful, contrary to prior knowledge, etc. WhatamIdoing (talk) 21:45, 28 January 2023 (UTC)Reply[reply]
@WhatamIdoing: I agree. I was digressing a bit. In essence, WP:VER excludes text (based on its sources or lack thereof), not sources. North8000 (talk) 15:10, 30 January 2023 (UTC)

Re "will probably lead to worse sources", it's worth remembering that more expensive sources are often less reliable: Prestigious Science Journals Struggle to Reach Even Average Reliability. Nemo 12:41, 29 January 2023 (UTC)Reply[reply]

That article says that Publish or perish causes scientists to submit shoddy work to scientific journals. It bases this conclusion on comparing the number of retracted articles in the 0.5% of most prestigious scientific journals against the average, and concludes that scientific fraud happens in journals of all levels of prestige, not just the unpopular ones (though some very bad ones have an especially high rate of fraud). More relevantly for us, it also doesn't say that scientific journals are less reliable than social media posts, or that we're better off with random websites than with books from the library. WhatamIdoing (talk) 23:05, 29 January 2023 (UTC)Reply[reply]
Retractions are not the main focus of the article. The findings of the article are about statistical findings in the section "The Other 99.95% of the Literature". And it's not a case of "it happens everywhere"; the higher the journal impact factor, the lower the quality of a journal. Nemo 06:35, 30 January 2023 (UTC)Reply[reply]
The Conclusion section says things like this:
  • Biomedicine – Image duplications: Higher ranking journals show a lower incidence of image duplications
  • Neuroscience, psychology – Statistical power: Either no correlation of journal rank with statistical power or a negative correlation
  • In vivo animal experimentation in disease models – Reporting of randomization and blinded assessment of outcome: Lower reporting of randomization in higher ranking journals and no correlation with reporting of blinded assessment of outcome
  • Medicine – Criteria for evidence-based medicine: Two studies found that higher-ranking journals met more criteria, while two failed to detect such an effect
  • Biomedicine – Reproducibility of experiments: Reproducibility is low, not even “top” journals stand out
These are the subject areas that happen to interest me, and I interpret this list like this:
  • Good for them, but if they're comparing the 0.5% at the top against the whole 99.5%, which contains some pay-to-play predatory journals, then it's hardly surprising.
  • "Top-ranking journals would rather publish scientifically interesting pilots than Phase III trials." (Phase III's raison d'être is statistical power, but it's often boring scientifically.)
  • When you're talking to someone who can speak knowledgeably (and probably passionately) about whether a third-party telephone-based randomization service is better than a random number generator on the researcher's computer, or if they have developed views about whether you randomize per animal instead of per rack location, then this is doubtless a huge big deal. But for a lot of other purposes, describing this is just wasted space in a paper. Most people need to know that an attempt was made at randomization, but they don't actually care about the details. (If you get stuck talking to one of those people, then the correct answer for human trials is to hire a specialty service.)
  • This seems to favor higher-ranking journals.
  • This seems to say more about the replication crisis than about the journals.
What the paper doesn't say is: Why compare the 0.5% against the 99.5%? Do you really expect journal quality at 99.4% to be materially different from 99.6%? Why not break them up into quintiles? The research that compares the top quintile against the bottom quintile produces a more favorable impression. WhatamIdoing (talk) 01:14, 31 January 2023 (UTC)Reply[reply]
I'm afraid this entire comment is the result of a misunderstanding. The article doesn't say anything about a "0.5% at the top", nor about a 0.5% of anything. Instead it points out that "retractions cover only about 0.05% of the literature", and therefore proceeds to look for data about the entirety of the literature (as opposed to past studies which focused on the narrow set of works which got extra scrutiny and got retracted). Nemo 23:54, 9 February 2023 (UTC)Reply[reply]
You're right; I misunderstood that. I see no information at all about how many journals fall into their category of "highly selective tier of the most prestigious journals". WhatamIdoing (talk) 04:03, 15 February 2023 (UTC)Reply[reply]

Change is hard

The first sentence says:

In the English Wikipedia, verifiability means other people using the encyclopedia can check that the information comes from a reliable source.

I am wondering today whether this might be better:

In the English Wikipedia, verifiability means other people using the encyclopedia can check that the information you added comes from a reliable source.

I feel a little more strongly about removing the superfluous words than about adding the other two. WhatamIdoing (talk) 17:41, 9 February 2023 (UTC)

I like this idea. It makes it a bit clearer that verifiability isn't just for those using the encyclopedia (implying Wikipedia editors); however, verifiability isn't just for information that is added. ― Blaze Wolf Talk Blaze Wolf#6545 17:44, 9 February 2023 (UTC)
I agree. Maybe, "In the English Wikipedia, verifiability means other people using the encyclopedia can check that information appearing in an article comes from a reliable source."? I considered "published" rather than "appearing", but I was worried that might create quibbling over what 'published' means...if folks don't feel that's a concern, I think I'd lean more toward that word. DonIago (talk) 18:05, 9 February 2023 (UTC)Reply[reply]
  • I very much disagree on the addition of "you added". It doesn't matter who added the information. If it doesn't have a reliable source to check against, it isn't verifiable. I also like keeping the "using the encyclopedia" in the sentence, as it makes it clear this is policy related to the encyclopedia. Neither of the proposed changes seems like an improvement to me. --Jayron32 18:33, 9 February 2023 (UTC)Reply[reply]
    I think the issue with "using the encyclopedia" is that it can imply only active editors of the encyclopedia should be checking that the information comes from an RS. I also don't really see a need to specify that it's related to the encyclopedia; why wouldn't it be? I don't see someone interpreting the lack of "using the encyclopedia" to mean regarding their homework. ― Blaze Wolf Talk Blaze Wolf#6545 18:37, 9 February 2023 (UTC)
    No it doesn't. If I grab a volume of my old Encyclopedia Americana off my shelf, I'm using it to find information. Using doesn't now, nor did ever, mean "writing the encyclopedia". To use a reference work is to find information in it. I use Wikipedia to find information. I've never heard of "using" to mean "authorship". That seems a novel and peculiar understanding of the phrase "using the encyclopedia". --Jayron32 18:44, 9 February 2023 (UTC)Reply[reply]
    I'm mainly referring to the context used here. Outside of the context of Wikipedia, it makes perfect sense. But I understand the point you are making. ― Blaze Wolf Talk Blaze Wolf#6545 18:46, 9 February 2023 (UTC)
    Even in this context, there's nothing that implies "using" = "editing". Readers, editors, everyone should be assured that the content is accurate, and that assurance comes through the presence of WP:RS. -- Carter (Tcr25) (talk) 19:51, 9 February 2023 (UTC)
    About "using the encyclopedia": I'm concerned about readers who are getting information indirectly. Our material should be verifiable regardless of whether you're seeing it on Wikipedia itself ("using the encyclopedia") or seeing it in a Google Knowledge Graph ("using Google"). WhatamIdoing (talk) 04:32, 15 February 2023 (UTC)

Let's try this:

How's that? -- Valjean (talk) (PING me) 20:39, 9 February 2023 (UTC)

I'd lean towards "the information" over "our content" because content implies to me the words as written, which should be based on reliable sources, but not directly taken (i.e., plagiarized or otherwise copied wholesale). -- Carter (Tcr25) (talk) 21:03, 9 February 2023 (UTC)
That's a legitimate option. What do others think? -- Valjean (talk) (PING me) 21:07, 9 February 2023 (UTC)
I like it. It's short and gets to the point while still getting the point across. ― Blaze Wolf Talk Blaze Wolf#6545 21:10, 9 February 2023 (UTC)
  • I like it with the "information" option. I agree that "content" isn't the best option. I also like the use of "anyone" as an improvement.--Jayron32 21:28, 9 February 2023 (UTC)Reply[reply]
  • Like Jayron, I think we should retain "using an encyclopedia" and avoid the possessive "our content". However, I do like the idea of changing "other people" to "anyone". --GoneIn60 (talk) 21:31, 9 February 2023 (UTC)Reply[reply]

My goal is to shorten and quickly "get to the point", while ensuring we are talking about Wikipedia's content, not information in general everywhere. This is a policy at Wikipedia to guide editors in their work. We are not seeking to create a general definition of the word. So here's my thinking for the three elements I changed:

  1. I chose the words "our content" over "the information" because to me "our content" means "the information at Wikipedia", but is shorter. It also covers more than just words, but also images and other things we host here. All of our content (except "the sky is blue" type of info) must be based on RS. So I still lean toward "content". Any "information" here is Wikipedia's "content". Maybe I'm being too pedantic. Feel free to offer your views.
  2. I prefer "anyone" over "other people using the encyclopedia" because it's much shorter and applies to readers AND editors. Readers have a right to expect that our content isn't editorial opinion. Editors check each other all the time. We insist that other editors also base their additions and edits on what RS say, not on their own opinions. So "anyone" is all-inclusive.
  3. I also wrote "verify" over "check". Maybe they mean the same thing, in which case "check" might be better, just to avoid the repetition of "verifiability means anyone can verify". But isn't that what we mean? Again, feel free to discuss.

Let's talk about each element and figure out what sounds best. I'll create numbered areas for each. -- Valjean (talk) (PING me) 22:27, 9 February 2023 (UTC)

ONE: "our content" vs. "the information" -- Valjean (talk) (PING me) 22:27, 9 February 2023 (UTC)

TWO: "anyone" vs "other people using the encyclopedia". -- Valjean (talk) (PING me) 22:27, 9 February 2023 (UTC)

THREE: "verify" vs "check". -- Valjean (talk) (PING me) 22:27, 9 February 2023 (UTC)

  • 1) the information 2) anyone 3) check. --Jayron32 12:54, 10 February 2023 (UTC)
1) our content - "the information" seems vague.
2) anyone - This is better on every level.
3) check - By checking the reference we verify it. -- LCU ActivelyDisinterested transmissions °co-ords° 14:05, 10 February 2023 (UTC)
1) "information in the article" (as per Carter (Tcr25)) would also be good. -- LCU ActivelyDisinterested transmissions °co-ords° 16:08, 10 February 2023 (UTC)
  • 1) the information, 2) anyone, 3) check (I'd also lean towards keeping "using the encyclopedia," which might reinforce that this is about the information at Wikipedia; or maybe say "information in the article"). ―Carter (Tcr25) (talk) 15:32, 10 February 2023 (UTC)Reply[reply]
  • 1) the information 2) anyone 3) check. notes: "content" is more how editors, rather than non-editor users, describe our content. Users come here to find "information." "verify" comes off appearing to be circular, even if it is a valid form of definition. SPECIFICO talk 15:42, 10 February 2023 (UTC)Reply[reply]

Okay, let's see if we're getting closer:

I have exchanged "that the information comes from a reliable source" with "that article content is based on reliable sources." How's that? -- Valjean (talk) (PING me) 20:55, 10 February 2023 (UTC)

"The information" in the Some changes version is rather vague. Did you mean something like, "the information in an article"? DonIago (talk) 20:57, 10 February 2023 (UTC)Reply[reply]
DonIago, the original is indeed vague, so I replaced it with "article content". That cannot be misunderstood, plus it's shorter. -- Valjean (talk) (PING me) 21:12, 10 February 2023 (UTC)Reply[reply]
I'd still lean away from "content" (although "article content is based on" does address my previous concern) and go with something simpler like "In the English Wikipedia, verifiability means anyone can check that the information in the article comes from a reliable source.". ―Carter (Tcr25) (talk) 21:02, 10 February 2023 (UTC)Reply[reply]
Carter (Tcr25), "article content" means exactly the same thing, is shorter, and cannot be misunderstood. -- Valjean (talk) (PING me) 21:12, 10 February 2023 (UTC)Reply[reply]
"information in the article" is slightly shorter than "article content is based on" (it's one more syllable, but one fewer character and one fewer word), but I also find it a more straightforward statement that feels less jargon-y. I don't see the potential for confusion or misunderstanding that you're seeing when the "in the article" clause is added. ―Carter (Tcr25) (talk) 21:32, 10 February 2023 (UTC)Reply[reply]
Carter (Tcr25), you're comparing the wrong words. The original is "the information". DonIago proposes adding the words "in an article" to that, which is not necessary. "article content" serves the purpose well and is the shortest version. -- Valjean (talk) (PING me) 21:55, 10 February 2023 (UTC)
I'm looking at the entire sentence, not just the two words. I'm sorry, but I don't see shifting from talking about "information" to "content" as improving clarity. ―Carter (Tcr25) (talk) 22:22, 10 February 2023 (UTC)Reply[reply]

I also lean away from the current "comes from" as we don't copy all content. Content is "based on" RS, regardless if a quote is copied from a RS, or our wording is based on a RS. "Based on" is all-inclusive. -- Valjean (talk) (PING me) 21:15, 10 February 2023 (UTC)Reply[reply]

That seems reasonable to me. ―Carter (Tcr25) (talk) 21:34, 10 February 2023 (UTC)

Now it seems to boil down to two options:

  1. In the English Wikipedia, verifiability means anyone can check that the information comes from a reliable source.
  2. In the English Wikipedia, verifiability means anyone can check that article content is based on reliable sources.

Which one seems better? -- Valjean (talk) (PING me) 02:58, 11 February 2023 (UTC)

I worry about the change from "other people" to "anyone". That can be misunderstood as saying that all references we use must be directly available to anyone who wants to check them. Many of our sources are behind paywalls, or only available in academic libraries, or are in foreign languages, or otherwise not directly available to most people. A change to using "anyone" needs a wider discussion than just this talk page. StarryGrandma (talk) 07:25, 11 February 2023 (UTC)Reply[reply]
Valid concern. Perhaps “someone”? Blueboar (talk) 13:56, 11 February 2023 (UTC)
Good point. Maybe "anyone with the ability can check" is better. The point of "anyone" is that we are not talking only about editors. Anyone must know the location of the source, even if they don't personally have direct access to it. -- Valjean (talk) (PING me) 17:32, 11 February 2023 (UTC)Reply[reply]
"Someone" is better than "anyone" and implies using another person if necessary to check. The nutshell above the lead uses "readers", which I think covers it nicely too since it is plural. StarryGrandma (talk) 20:38, 11 February 2023 (UTC)Reply[reply]
The nutshell will be altered if we change any of this. We need to get away from focusing on readers, as editors also need to verify content. That's why "anyone" is chosen as it's all-inclusive. How that "anyone" does it is none of our business. They are welcome to get help from someone else. We can't address the full chain of investigation for every case. -- Valjean (talk) (PING me) 02:31, 12 February 2023 (UTC)Reply[reply]
My thoughts:
  • I share the concern raised by StarryGrandma. It is not necessary for "anyone" to be able to determine that the material matches what a reliable source said; it is only necessary for "someone (not counting the person who put that material in the article in the first place)" to be able to do this. I suggest considering the word others.
  • I prefer "article content" to "the information".
  • I'm not certain about "comes from" vs "based on". I don't think that either of them is perfect.
    • "Comes from" suggests that we are more interested in provenance than we actually are. (Example: A natural disaster occurs. As long as it is true and could be found in a reliable source, it doesn't actually matter – in practice – if I heard about it from an unreliable source. Another example: I read a lot of health-related articles in mass media. The information that I add might actually/originally "come from" an unreliable source, but you would never know that, because I found and cited an appropriate source when I added it.)
    • On the other hand, I don't want most content to merely be WP:Based upon a source; that might suggest that editors could play fast and loose with sources, as long as it's only a little fast and loose. But I also don't want to say that absolutely everything must be directly supported by the exact cited source at the end of the sentence, because it's complicated (e.g., the lead, which might summarize a section in a single sentence, and that section might be summaries of chapters or even whole books).
Perhaps the last phrase (comes from/based on) should wait for another day. WhatamIdoing (talk) 04:22, 15 February 2023 (UTC)
Perhaps the problem with comes from/based on is actually in the verb "check". Does this sidestep the problem?
  • In the English Wikipedia, verifiability means that others can establish that the information in a Wikipedia article matches the information published in one or more reliable sources.
WhatamIdoing (talk) 04:28, 15 February 2023 (UTC)

I think that "matches" could be interpreted more broadly than prescribed by this policy. North8000 (talk) 16:10, 15 February 2023 (UTC)

I suggest using "is supported by" rather than "matches" - we often talk about how content needs to be supported by reliable sources. StarryGrandma (talk) 06:54, 16 February 2023 (UTC)Reply[reply]
@StarryGrandma, part of me wants to agree with you, because I like the familiar words. But the point behind verifiability is that you could find a source for an uncited sentence if you tried to, and we use "supported by" to describe the relationship between a cited sentence and the source that's cited at the end of it. WhatamIdoing (talk) 03:04, 24 February 2023 (UTC)
@North8000, are you concerned that "matches" is too exact? That it would empower the "if it's not plagiarism, it's not verifiable" faction? WhatamIdoing (talk) 03:05, 24 February 2023 (UTC)Reply[reply]
@WhatamIdoing: A big topic brought on by a very broad word. At the simplest level, it's a very broad word that would constitute a significant change in policy which could be used or misused in hundreds of ways beyond those that I can imagine. Also, IMO it exacerbates a problem that we have now, which is de-legitimizing the process of editors making editorial decisions. North8000 (talk) 13:44, 24 February 2023 (UTC)

I am disturbed by the term "matches". That works if we are quoting, but that is far from always what we wish to do. Therefore, "supported by" is more accurate. We are not required to use it, but citations have a "quote=" function, and I sometimes use it to make it easier for anyone who checks to see the exact words from the source which support my wording in the article. So use the quote function when in doubt. -- Valjean (talk) (PING me) 03:48, 24 February 2023 (UTC)

  • The only problem with “supported by” is that some editors will argue that their original interpretation of the sources is “supported by” those sources. That conflicts with WP:No original research. We need something a bit stronger. Perhaps: “directly supported by”? Blueboar (talk) 14:23, 24 February 2023 (UTC)
    @Blueboar, what source? If I write "The capital of France is Paris", with no source or even anything else on the page whatsoever, would you say that this absolutely true and unquestionably verifiable sentence is "supported by" a source?
Yes… there are literally thousands of reliable sources that directly support “Paris is the capital of France”. And if some idiot challenges the statement and demands a citation, we can easily pick one of those thousands and add it. An annoyance, but NOT a problem. Blueboar (talk) 21:34, 24 February 2023 (UTC)
It feels like there are thousands of sources that "could" directly support that statement, but so long as it remains uncited, there aren't any that actually "do". We tend to talk about direct support in terms of the citation named at the end of the sentence, rather than sources that could hypothetically be added. I am concerned that "supported by" language will result in people claiming that all uncited sentences are unverifiable "by definition". Perhaps, though, if we followed it immediately with a sentence that says something like "Information is considered verifiable if it is possible for an editor to find a published reliable source, even if no source is cited. This policy does not require an inline citation for all information", the risk would be reduced. WhatamIdoing (talk) 00:04, 25 February 2023 (UTC)
Didn't we have this conversation once already or am I mixing things up? The usual practice is to apply a cn tag to uncited material and then, if no cite appears after some reasonable period of time, it can be removed. There should be no requirement, implied or otherwise, that the person placing the cn tag or the person removing material is in any way whatsoever responsible for locating a citation. Hypothetical sources are just that, hypothetical. Selfstudier (talk) 00:18, 25 February 2023 (UTC)
@Selfstudier, we're talking about changing the very first sentence of the policy from:
  • verifiability means other people using the encyclopedia can check that the information comes from a reliable source.
to something like
  • verifiability means other people would be able to determine that the information is supported by a reliable source.
Do you think that changing the wording from "comes from" to "is supported by" would be perceived (by some editors) as a change in meaning, from "uncited material can be verifiable, as long as it would be possible for someone to add a citation" to "everything uncited is automatically unverifiable, because if there isn't a little blue clicky number, then it's not technically 'supported' by anything"? WhatamIdoing (talk) 00:43, 25 February 2023 (UTC)
I responded to your previous comment, not this one. In my view, uncited material is in general essentially worthless (ie unverifiable, to both user and reader, blue sky aside) so I am in that camp that says uncited material may be removed (although for myself I would first tag for a cite).
Another way of saying the same thing is that material in the encyclopedia in general needs to be directly supported by a citation from a reliable source and if not it may be removed. Make it hard to add uncited material.
So I don't care about the wording per se, I care about the principle. Selfstudier (talk) 00:56, 25 February 2023 (UTC)
@Selfstudier, what makes you think that it is impossible to verify uncited material? WhatamIdoing (talk) 01:45, 26 February 2023 (UTC)
I can write a=B (just an example), place no citation and that cannot be verified. Sure, someone might write something correct but I still won't know, as a user or a reader, whether it is or not without trying to myself find a citation for it. Admittedly I might not know even if it is cited, I would have to check the cite to be sure it supported what was written. So no cite is no use. Selfstudier (talk) 09:47, 26 February 2023 (UTC)
@Selfstudier, I think the bit you wrote about "without trying to myself find a citation for it" is the key point. If you can find an appropriate reliable source, then it is possible to verify that "a=B", which means that it is verifiABLE. Content is not verifiable solely if the source is served up to you in the form of a little blue clicky number. It's verifiable if someone (anyone, which might not include you) is able to find a source, not solely if someone already not only found the source, but also documented that source in the article for your convenience. WhatamIdoing (talk) 01:21, 1 March 2023 (UTC)
That's what the cn tag is for, so that someone can find one (or not). Until it actually is cited it is unverified. If you want to be picky you can say that it "may" be verifiable (because it might be false). Selfstudier (talk) 10:18, 1 March 2023 (UTC)
I would say that until someone (other than the person originally adding the information) checks a source, it is unverified. "To verify a claim" is not the same as "to cite a claim". Citing the content makes it cited. Verifying the content against a source makes it verified.
Part of the wording in the definition may seem subtle, but it is intentional. To verify information is to check [determine, establish] that the information matches [is based on, corresponds to, comes from] a reliable source. It doesn't have to match "the" source, but it absolutely must match "a" reliable source [some reliable source, any reliable source, even if it's not the source at the end of that material].
Uncited content may (or may not) be verifiable, and it may (or may not) have been verified... just without any trace of that verification happening left on wiki. Imagine this scenario:
  • Alice adds [good] content to an article without a citation: It is verifiable, uncited, and unverified.
  • Bob sees the content and looks it up in Google Books, but doesn't edit the page: It is verifiable, uncited, and verified by a subsequent editor.
  • Carol, knowing nothing of Bob's actions, adds a source: It is now verifiable, cited, and (twice) verified by editors other than the one who added it originally.
(Yes, I want to be picky about this. Wikipedia:Policy writing is hard, and being picky about the wording means that people will be more likely to understand what we actually meant. We don't want editors to mistake our 100% able-to-find-a-source requirement with a 100% little-blue-clicky-number requirement.) WhatamIdoing (talk) 05:40, 4 March 2023 (UTC)
Honestly, given the number of times I see the qualifier "Paris, France" in the |location= field of cites, maybe referencing that isn't a bad idea. -- LCU ActivelyDisinterested transmissions °co-ords° 21:41, 24 February 2023 (UTC)
Have you seen any edit wars over that? Imagine that someone did an AWB run to replace all such infobox items with "Paris" with an edit summary saying "Everyone knows where Paris is". Would you expect that to be contested? If so, we might want to make a habit of specifying the country for every city. But that's a MOS question rather than a WP:V one. WhatamIdoing (talk) 00:39, 25 February 2023 (UTC)
Yes, that's literally my point: you wrote "If I write 'The capital of France is Paris', with no source", and my point was that it would likely be contested... -- LCU ActivelyDisinterested transmissions °co-ords° 13:20, 26 February 2023 (UTC)
Really? How often have you seen someone genuinely express concern over whether the capital of France is Paris? WhatamIdoing (talk) 01:23, 1 March 2023 (UTC)
"Have you seen any edit wars over that? Imagine that someone did an AWB run to replace all such infobox items with "Paris" with an edit summary saying "Everyone knows where Paris is". Would you expect that to be contested?" -- LCU ActivelyDisinterested transmissions °co-ords° 12:24, 1 March 2023 (UTC)
Extremely unlikely scenario - I have a hard time imagining something like that (except perhaps as intentional vandalism). There are literally thousands of sources that can be used to cite “Paris is the capital of France”… so many that it is highly unlikely that someone would challenge it. That said, on the very very rare occurrences that someone does, just slap in one of those sources to shut the idiot up. Not worth arguing about. “Let the Wookiee win.” Blueboar (talk) 13:32, 1 March 2023 (UTC)
Blueboar, I think we're talking about different things. I'm imagining infoboxes that say |location=Paris, France, and someone mass-changed them to |location=Paris on the ground that the ", France" part was unnecessary (not factually wrong, just unnecessary, the way most editors would say that "Paris, France, Europe" is unnecessary). WhatamIdoing (talk) 05:45, 4 March 2023 (UTC)
And also, as I have said in previous discussions, 99.9+% of our claims are not in any way obvious like that; formulating policy around this tiny minority of material is a bad idea. Crossroads -talk- 00:43, 25 February 2023 (UTC)
  • I'm currently thinking about a sentence like "In the English Wikipedia, verifiability means that others are able to establish that the information in a Wikipedia article corresponds to the information published in one or more reliable sources." WhatamIdoing (talk) 18:19, 24 February 2023 (UTC)
I prefer the wording involving "can check", since that leans more toward the importance of citing sources rather than the exceedingly rare case where it's okay to claim something without a source. Crossroads -talk- 00:44, 25 February 2023 (UTC)
Are you more interested in the "can" part, or in the "check" part? Would "In the English Wikipedia, verifiability means that others are able to check that the information in a Wikipedia article corresponds to the information published in one or more reliable sources" be good enough? I don't want to do anything that will increase the number of "But I can't check it, because it's PAYWALLed and NONENG" complaints. WhatamIdoing (talk) 02:13, 26 February 2023 (UTC)

Paywall references

There’s a big difference between borrowing from a library, which is normally free (UK), and paying $6/month for 12 months for a paywall. Riskit 4 a biskit (talk) 08:41, 17 February 2023 (UTC)

There are many different ways to get details of a source, including Wikipedia Library and the Resource Exchange. It's also possible that Google will allow you to find a way to access the source. Either way such sources are accepted as per policy, see WP:PAYWALL. -- LCU ActivelyDisinterested transmissions °co-ords° 12:36, 17 February 2023 (UTC)
  • Meh… sure, if you live near a free library it is cheaper to use that library… but consider places where you have to travel miles to get to a library? With the cost of transportation it might actually be cheaper and more convenient to pay for the Paywall.
Of course the cheapest option is to find someone who does have easy access to the source (either because they live near a free library or because they have paid the paywall price) to volunteer to check the source on your behalf. Nothing says you have to be the one to check the source. Blueboar (talk) 16:02, 17 February 2023 (UTC)
Where I live, we all have a local library based on our place of residence, but it might cost money or be impossible to obtain a library card for another local library with a larger collection or a more convenient location. So even physical libraries may have a proxy paywall attached. Imzadi 1979  17:24, 25 February 2023 (UTC)
Yes, the library is free, but many of the resources in my local library that are useful for WP are reference materials, and cannot be checked out. So, I sit in the library taking notes, and then find after I return home that I have left out something (sometimes a page number). On-line, I can find many scholarly sources that are now free. I also use the Wikipedia Library, and have a JSTOR account. More and more, I am finding sources (such as journal articles that are still behind a paywall) and chapters or entire books that have also been posted to a free site by the author(s) of the sources. Donald Albury 16:06, 17 February 2023 (UTC)

Self-published sources

I began editing the article on The Washington Post and found that correcting and improving the sources, alone, has turned out to be an overwhelming undertaking. Please forgive me if I have come to the wrong place and direct me to the proper place. My concern with The Washington Post article is that The Washington Post, itself, is used as a source no fewer than 68 times! I understand the difficulty and temptation to use the Post as its own source, but I think that when we are creating articles about mainstream media we must resist the temptation to use that media as the main source. Even if the facts are accurate, our quest for neutrality and our duty to avoid conflicts of interest compel us to find other sources for the information or, in the absence of outside sources, to remove the information unless it is critical to the reader's understanding of the entity, in this case, The Washington Post. What is the Wikipedia policy on such a thing? I have searched and searched and cannot find the policy that applies here. I covet your opinions and advice on how to address the ridiculous 68 instances of self-sourcing. I also invite you all to jump into that article to reduce its length and improve it overall. All the best. MarydaleEd (talk) 02:34, 20 February 2023 (UTC)

The policy on this is WP:ABOUTSELF; in the article about the Washington Post it's acceptable to use the Washington Post as long as the details being referenced are uncontroversial. It's always better to have reliable secondary sources, but not strictly necessary. -- LCU ActivelyDisinterested transmissions °co-ords° 09:11, 20 February 2023 (UTC)
@MarydaleEd, an article should be WP:Based upon sources that are Independent of the subject. It's not always a problem for a third of the sources cited in an article to be associated with the subject, but in the case of a famous subject like The Washington Post, I'm sure you could do better than the bare minimum. I appreciate your efforts to improve the quality of sources in that article. WhatamIdoing (talk) 03:13, 24 February 2023 (UTC)
  • ABOUTSELF does say that it's important that the article is not based primarily on such sources, which is at least a reason to try and add secondary sources when available. My experience is that even for uncontroversial stuff, when an article has a huge amount of things cited to primary sources for which no secondary source exists, it's often trivia or even promotional in nature. Glancing at the Post's article, the big problem that leaps out at me is the "political endorsements" section, which is cited almost solely to the Post as a primary source - I don't think entire sections should rely almost entirely on one source in general, especially not a primary / WP:ABOUTSELF one. (It's also worth pointing out, though, that the raw number of citations may not tell us anything, because people often cite a primary source + secondary sources discussing it, with the primary cite being more of a convenience link rather than what satisfies WP:V.) --Aquillion (talk) 09:02, 24 February 2023 (UTC)
    Also, people often cite a good source (e.g., a history book) many times, and a news article once, with the result that 10 uses of a book + 10 newspaper articles actually is "half" the article, but it looks like it's 91% news articles. WhatamIdoing (talk) 18:21, 24 February 2023 (UTC)
    Looking at that section, most of the refs are for the actual endorsement, which doesn't seem problematic (there's nothing inherently better about a third-party source saying the Post endorsed Obama vs. linking to the endorsement). That said, Editor & Publisher usually tracks newspaper endorsements (at least for presidential races), so something like that could be substituted. The one non-endorsement citation there, used multiple times, is an analysis of the Post's endorsement history by the paper's ombudsman. It could be argued that despite being published in the Post, the ombudsman is actually independent of the paper. It would be good if there were third-party sources included in that section (perhaps this study would be relevant). While that section could be improved, it doesn't strike me as particularly problematic in its current form, at least from a sourcing perspective. —Carter (Tcr25) (talk) 19:26, 24 February 2023 (UTC)

For stuff without a specific challenge (e.g., regarding veracity or POV), I think that we should not even discourage such self sources. For example, secondary sources usually don't cover the "boring" encyclopedic information about an organization, so self sources are needed to have an informative encyclopedic article. Also, when the organization topic gets larger and more diverse / less centralized, the concerns behind "self" mean less. An extreme case can illustrate: 100% of the sources for the Human article are humans, so it is 100% self sourced. North8000 (talk) 13:49, 1 March 2023 (UTC)


@WhatamIdoing: Was this glossary discussed somewhere? Levivich (talk) 00:31, 1 March 2023 (UTC)

I believe that the most recent discussion on this page was with you: Wikipedia talk:Verifiability/Archive 76#RFC to change "verifiable" to "verified". WhatamIdoing (talk) 01:12, 1 March 2023 (UTC)
I don't think the glossary has any kind of consensus behind it. Some of the definitions are different from WP:GLOSSARY. One of the words is being discussed now at #Change is hard. I still don't care to discuss the definitions of these words, but I don't think it's kosher to add a glossary to the FAQ at a core policy without so much as mentioning that you'd done so on the talk page of that policy. I only stumbled upon it because the subheadings broke the TOC; it took me like a half hour BTW to figure that out, I would have figured it out sooner had you posted a notification on this talk page about the addition of a glossary to the FAQ. I'm not going to revert it but I will (once again) protest undiscussed major changes to policies (or in this case, their FAQs). Levivich (talk) 01:27, 1 March 2023 (UTC)
That discussion, much like the other recent discussion, didn't come to any firm conclusions. I would be wary of making any changes based on them. -- LCU ActivelyDisinterested transmissions °co-ords° 15:35, 1 March 2023 (UTC)

I really don't like FAQs, and definitely not for something that has been challenged. They are in effect part of the page and are given the same imprimatur, so they need the same scrutiny and requirement for consensus, which they don't get. North8000 (talk) 14:45, 2 March 2023 (UTC)

Is a final court judgment considered verifiable?

Is a court decision considered "verifiable" under this policy? In the US, court decisions are open to the public, who have an interest in the proper functioning of their state dispute resolution mechanism. They are also not "original research" by the person writing the article, as they are recognized acts of state, and are the report of research done by the court through an adversarial process. (They also must be redacted to prevent disclosure of personal information the public does not have an interest in.)

(I should note that trial courts are considered the "finders of fact" in the legal system. This means they have a legal duty to rigorously find the factuality of all cases brought before them.) — Preceding unsigned comment added by (talk) 07:19, 9 March 2023 (UTC)

I am asking particularly because of the case of Emily St. John Mandel, who got divorced in 2022, but couldn't have her article updated "unless someone wrote an article". (See citations 40 and 41.) It seems to me that public records should be citable and verifiable, as they meet all apparent requirements: archived, individually addressable, and publicly available; however, there is no actual policy statement in place, and I would like to understand if such an issue has been considered before.

Thank you for your time! (talk) 07:12, 9 March 2023 (UTC)

Sure, a primary source is ok for such a simple uncontroversial fact - unless someone comes along with a credible argument that the subject is not actually divorced. Roger (Dodger67) (talk) 07:18, 9 March 2023 (UTC)
The problem, though, in that context is Wikipedia:No original research, which means that Wikipedia cannot be the first publisher of new information. Information must be cited to reliable, published sources. Appellate opinions are published in the sense that either they are formally published in reporters or they are widely available from legal databases. However, trial court judgments at the state court level are rarely published in either sense. --Coolcaesar (talk) 09:57, 9 March 2023 (UTC)

Could I add my own essay to this page?

I would like to add my own essay on the topic, Wikipedia:Unsourced information is not valuable, to the list at Wikipedia:Verifiability#Essays.

I am not aware of any procedure for such a request, and I am not sure if it can be seen as boastful of me to propose this.

So, what do you think? Veverve (talk) 22:52, 10 March 2023 (UTC)

Thanks for your effort. My 2 cents: There are probably hundreds of essays related to wp:ver and only a few linked, and so it's a question of which ones are in the 1% that is more perfect and useful. There are good thoughts in there but also things (including IMHO a conflict with policy) which put it into the other 99%. Thanks again. Sincerely, North8000 (talk) 00:57, 11 March 2023 (UTC)

RFC on whether citing maps and graphs is original research

Please see Wikipedia:Village pump (policy)#RFC on using maps and charts in Wikipedia articles. Rschen7754 15:12, 19 March 2023 (UTC)