

Help:How to mine a source


It is very common for Wikipedia editors to add a citation, such as to a newspaper or magazine article, a book chapter, or other hopefully reliable publication, to source the verifiability of a single fact in an article. Most often the editor has found this source via a search engine, or perhaps even a library visit, seeking a source for a detail in an article, some pesky tidbit without a citation. This common approach tends to miss many opportunities to improve both the content and the sourcing of articles; it's akin to stopping at a grocery store for bread and nothing else, rather than "working" the store for an hour with a long shopping list and an eye for bargains.

This tutorial offers a very short but real-world example of how to "mine" a source: to work it like a seam of ore for every last bit of verifiability gold. Besides noticing facts in your source that are missing from the article, and noticing that your source can also cite more of the facts already in the article than the one(s) you were most concerned about, you can often double up citations on a fact that already has one source cited. While the average fact in an article does not need seven citations, having two rarely hurts. A second citation provides a cushion if the other source is found faulty and deemed unreliable, or if its link goes dead, and it provides backup sourcing if a third, questionable source challenges the first.

Example article

The article Manx cat, on the domestic cat breed, like most cat (and dog, and horse, and orchid, etc.) variety articles, needed a lot of work as of late 2011. In particular, even though it linked to many current breed standards, it was missing information on the early history of the variety. Google Books turns out to be very useful for old "natural or traditional breeds" like the Manx, because it often has the full text of sources that are no longer covered by copyright. One such source was Charles Henry Lane's Rabbits, Cats and Cavies: Descriptive Sketches of All Recognized Exhibition Varieties (1903), which has a detailed, if short, chapter on the Manx. This chapter was "mined" first, and the Wikipedia article was vastly improved with it, but it was too rich and complex an example to make a good case study.

Example source

A more appropriate example for this page's purpose was found a bit later. It is a much shorter chapter, from The Cat: Its Points and Management in Health and Disease by Frank Townend Barton (1908). Since it is out of copyright, and quite short, we can just quote the full text of his "The Manx Cat" here:

The Manx cat—the origin of which is involved in obscurity—chiefly exists in the Isle of Man, and has been found also in the Crimea and Cornwall. Few specimens are now found.

The suppression of the tail constitutes one of the characteristic features of the breed. Manx cats by no means breed true to type, any more than the bob-tailed sheep-dog or schipperke does, and if the aborted caudal appendage is removed, it makes the cat quite as good as though it had been born with a total absence of tail. It is the absence of tail that gives the peculiar appearance to the Manx Cat, being akin to that of the rabbit in the hinder part, owing to the length of the limbs.

With reference to colour of coat, the Manx may be of any colour, but probably black is most frequently met with.

There is nothing whatever to recommend the breed, whilst the loss of the tail in no way enhances its beauty.

If a short tail is present, it should be removed whilst the kitten is a few days old, and there is no doubt that many spurious Manx cats exist, as the result of this simple operation, practised for deception.[1]

Yep, that's the entire chapter. At first glance, it hardly seems worth bothering with.

Attention was first drawn to this chapter because of its mention of similar cats in Cornwall and Crimea, details that no other source found so far had discussed. But there are actually quite a large number of facts (i.e., in Wikipedia terms, nontrivial statements of fact from an independent, non-fringe, apparently reliable, professionally published work) to be dug like gems from this source.

Mining this source for all it's worth

It is tempting to simply skim this source, edit the article for a point or two, and move on, but it's quite easy to miss something (indeed, the fact that Barton thought Manx cats scarce and possibly even declining was missed until this essay was being prepared). It is best to make a list of facts (e.g. in a sandbox page or a text editor), written in wiki markup and in full sentences, or at least in easily reusable sentence fragments, and carefully paraphrased from the start to avoid plagiarism. Start with the first sentence and work your way down. The list might look something like this, including square-bracketed notes based on sources already cited in the article:

  • The Manx's ultimate origin is unknown. [It was as of 1908, and still is now according to other sources, but genetic study could change that at any time.]
  • Most specimens were then found on the Isle of Man. [This was long before the world-wide explosion of cat breeding.]
  • Similar cats were also found in Cornwall and Crimea. [That they are exactly the same as Manx cats as Barton seems to suggest is not credible from a modern, post-genetics perspective; i.e. on that point of heredity, Barton cannot be a reliable source.]
    • [We know from the Japanese Bobtail and Kuril Islands Bobtail that stunted-tail cats are a common type of mutation in insular, isolated populations but not necessarily the same mutation.]
    • [But we also know from other sources that Manx cats were popular as ship's cats, so they could have simply spread to Crimea by ship. Needs more sources. We can't draw any conclusions yet; that would be original research. Other sources also mention them in Denmark, etc. This is all interesting enough to mention, without advancing a theory.]
    • Cornwall is not very far from the Isle of Man. [Again, we can't put words in the source's mouth, but simply noting this is enough to let the reader think about it; one of them might even find some evidence we're lacking that Manx cats originally came from Cornwall, or Cornish tailless cats originally came from the IoM.]
  • As of 1908, the breed was uncommon. Barton clearly implies that it was declining. [It's tempting to say "even on the IoM", but honestly the original passage is a bit vague, and an inference that specific would be another form of OR.]
  • One of the defining characteristics of the breed is "suppression" of the tail. [That's a good way to encapsulate "taillessness to near-taillessness to short-tailedness"! Use that term.]
  • It is not the only defining characteristic of the breed. [Barton does not elaborate much, but Lane did; we now have two sources making it clear, very early in the days of the "cat fancy", that Manx are distinctive in more than one way, and where Barton does get specific, he is consistent with Lane. I.e. this is a really good fact to double up citations on.]
  • Manx do not breed true; i.e. not every pure-bred individual exhibits all defining traits of the breed, like taillessness.
    • This is also true of various, though not all, other pure-bred varieties of domestic animal. [Some outside reading informs us that this is true of two canonically tail-suppressed dog breeds, the Bobtailed Sheepdog and the Schipperke, both of which are frequently born partially or fully tailed and are frequently tail-docked. This is an interesting point, and even the fact that it's not all about cats is likely to interest the reader by broadening the perspective.]
    • Barton actually twice recommended docking of partially-tailed Manx, though he later also specifically states that this is sometimes done for fraudulent purposes. [And he even thinks the breed is ugly; so he at least thinks of the breed as intrinsically a breed, albeit one he disfavors, rather than as defective cats that don't constitute a breed; this puts him in agreement with other late-19th-century sources that already consider this a legitimate breed.]
  • Tail suppression is the most visually obvious of the breed's defining characteristics.
  • Manx also have long back legs. [Other sources say this, but it's nice to have another period source indicate it was an early, natural trait, not the result of later, e.g. American, breeding.]
  • With short or no tail and long legs they thus have a rabbit-like rear half. [Lane and others said this too, but it's nice to have another early source indicating this was always the case, and always the perception.]
  • Manx are of any coat color. [In the context, this can only mean any coat color normal for a European cat; the cat fancy at that time did not extend further, and it obviously cannot include point coloration and other Asian cat traits. We know from Lane and, well, all other early cat fancy literature that in this era, Siamese and other "exotic" breeds were very rare curiosities in the West, and their genes were not being spread around yet.]
  • Black was the most common color of the original, native Manx breed being written about at the turn of the last century, before controlled breeding of cats became a big deal. [Lane corroborates. We also have tentative info from another source, not yet in the article, that this may actually no longer be true even on the IoM, but once was.]
  • Barton is actually quite hostile to the breed, and his derogatory remarks are worth quoting directly in full. [They're a sharp counterpoint to Lane's enthusiasm (he owned one of the earliest championship Manx show cats), and are the earliest recorded hostility toward the breed from a cat expert. It's good to have this viewpoint as a balance, to counter possible WP:Undue weight resulting from Lane's favoritism. This is a theme that actually carries through to the current day, and will soon be its own "Controversy" section in the article. This short little Barton piece is even more important than it seemed!]
  • Docking of non-rumpy specimens was performed not long after birth. [This is no longer common practice today, and illegal in many places, including most of Europe.]
  • Docking was sometimes performed for fraudulent purposes, to pass off regular cats as Manx by cutting their tails off. [We knew this already from cat Web forums, but actually needed a reliable source for it to add it to the article.]

A quick scan shows that what we can glean from this source and cite it for (what we can determinedly mine from it), in combination with other facts that have to be connected to it (without novel synthesis) and weighed against its details, actually amounts to more material than the entire full text of the source! And that's before we've written it out in reader-friendly, explanatory prose.

After all of this is worked into the article, it's good to re-read the source; often a salient point will have been missed the first time around.


As this simple test case demonstrates, even sources that appear to be near-trivial in their brevity can often, if they are reliable, be used to source far more material than they seem capable of at first glance, especially if they relate (negatively or positively) to material in other sources (so long as WP:SYNTH is followed carefully). This remains true long after they are cited, since a newly "discovered" source may re-open a dynamic between the earlier, already-mined sources and the article as it evolves.

A caution on misapplication

Care must be taken not to apply this approach to works that are not actually reliable sources for the material in question. A source is usually mainly about one or two things, but it may contain other points that can be used to expand an article. This must, however, be done within the limits of the Wikipedia core content policies. One must be aware in particular of the distinctions between primary, secondary, and tertiary sources, because misuse of material can arise in several ways:

  • The work is a magazine article or other piece of lower-end journalism, mentioning something in passing or as a side comment, without any indication what the ultimate source is. Many "factoid" sidebars and tables in regular news articles are also in the "low-end journalism" category, as they frequently misinterpret and misrepresent the data on which they are based. Look for the real sources of the data.
  • The work is a specialist piece by an expert on a particular topic, but the detail you wish to use is from a completely different field, and the author, with no credentials in that field, doesn't provide a source. This arises frequently in non-fiction books. Look for corroborating material from actual experts in that other discipline.
  • The claim you want to cite is a novel conclusion reached by the author of the piece; this makes it a primary source for that claim. In peer-reviewed journals, such material mostly takes the form of the newly-collected data and results/conclusions material in the article or paper (and the summary of this material in the abstract); there may be many pages of secondary-source material leading up to and supporting it. Primary research is often provisionally cited in Wikipedia, with attribution (e.g. to the author, the research team, or to the paper); a secondary source should also be provided when available, as primary claims are always suspect – current research is constantly being overturned by newer research. For science material, the usual secondary source is a literature review. We like to have both, because secondary sources indicate acceptance by other experts and are more understandable by more readers, while primary ones provide details and are especially useful to university students and experts using Wikipedia.
  • The item you want to use is a subjective opinion. You may still be able to use it, as a primary source, if you attribute the claim directly, either to the author(s) of the piece you are citing (if notable, e.g., "According to Jane Q. McPublic ..."), or to its publisher (e.g., "According to a 2017 New York Times article ..."). If neither is notable, are you sure the source is actually reliable at all? Primary-source opinion pieces take many forms, including editorials and op-eds, advice columns, book and film reviews, press releases, position statements, speeches, autobiographical content, interviews, legal testimony, marketing or activism materials, and overly personalized instances of investigative journalism. Such content often appears in publications that otherwise provide the kinds of secondary-source material on which Wikipedia mostly relies, such as newspapers.
  • The work is outdated and does not reflect current expert consensus about the matter at hand. In such a case, the newer sourcing should be used. Include the contrary viewpoint, attributed to its author, only if it seems pertinent to continue including it (e.g. to highlight a controversy, or to cover changing views of the topic over time). A general rule of thumb in research is that very old sources, and sources written close in time to an event (which count as "old" once a few months have passed and other writers have done more analysis), should be treated as if they were primary sources, like eye-witness accounts and opinion pieces.
  • The work is a tertiary source, like a topical encyclopedia, coffee-table book, or other conglomeration and summarization of material from numerous other sources. Such works are often not written by experts, contain material that is already obsolete by the time the work is published, gloss over important distinctions and limitations in previously published research conclusions, and may reflect a strong editorial bias. Tertiary sources are better than no sources, but they do not stand up to challenge from secondary ones.
  • You are "cherry-picking" by only citing sources (or parts of sources) that agree with the claim you want to include. This is a fraudulent approach, a fallacious form of original research in which the editor is deciding what is and isn't true and warping Wikipedia content and citations to fit this personal pre-conceived notion.


  1. Barton, Frank Townend (1908). "The Siamese—Abyssinian—Manx". The Cat: Its Points and Management in Health and Disease. London, England: Everett & Co. p. 31. Retrieved 2011-11-18.

See also