Wikipedia:Key information

Key information is the content of Wikipedia infoboxes, together with the common knowledge, simple claims, and general reference information necessary for a fundamental understanding of a topic.

Key information is special in Wikipedia because articles include it under different guidelines than those that govern article prose. Many WP:Reliable sources that cover a topic omit key information on the assumption that their anticipated audience already knows it or has trivial access to it. Because Wikipedia is a general reference source, it serves readers who may start with no information whatsoever and who therefore need access to even basic information. The usual Wikipedia quality control methods of WP:Verification and of checking for independent sources (Wikipedia:Identifying and using independent sources) may not apply to key information, because key information may exist only as the product of original research or in self-published sources.

A common use of key information in Wikipedia is a statement of a person's nationality and at least an approximate year of birth. A biography of a person born in India in the 1800s may assume that readers already know when and where that person lived. Wikipedia's global audience, however, may be totally unfamiliar with the topic and have no idea of the country or period in which the person lived. When a biography omits these basic details but goes on to present a person's life, Wikipedia editors may do a minimal amount of WP:original research or cite WP:primary sources to establish the key information of the topic.

Most of the content inside Wikipedia infoboxes is key information. Key information is the subset of a data model which is required for humans to have basic thoughts about a concept.

Examples

As an example, consider the term "Rani of Jhansi". Someone unfamiliar with these non-English words might wonder whether it names a type of coffee, an animal, a city, an event, or a concept. The first piece of "key information" is that this is an "instance of a human". A book or biography is unlikely to start by giving this information, because most publishing assumes that readers already have it. Wikipedia cannot assume this about its readers, and has to present key information even when source texts do not explicitly state it. Other key information might be a translation of the name and details such as "occupation is queen", "nationality is India", and "lifetime is during the 1800s". Once some basic information is available, a reader can start learning other details and general learning can begin.
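The "Rani of Jhansi" example can be sketched as a small set of claims plus a check for which basic claims are still missing. This is only an illustration: the field names and the list of required fields are assumptions made for this sketch, not an actual infobox template or Wikidata schema.

```python
# Hypothetical sketch: key information for "Rani of Jhansi" as simple claims.
# Field names are illustrative and do not follow any real infobox or schema.
key_information = {
    "instance of": "human",
    "occupation": "queen",
    "nationality": "India",
    "lifetime": "1800s",
}

# Fields this sketch assumes a reader needs before a biography makes sense.
REQUIRED_FOR_BIOGRAPHY = ("instance of", "nationality", "lifetime")


def missing_key_information(claims, required=REQUIRED_FOR_BIOGRAPHY):
    """Return the required fields that have no value in `claims`."""
    return [field for field in required if not claims.get(field)]


def describe(name, claims):
    """Build a one-line orientation for a reader who knows nothing about `name`."""
    parts = [f"{field}: {value}" for field, value in claims.items()]
    return f"{name} ({'; '.join(parts)})"


print(describe("Rani of Jhansi", key_information))
print(missing_key_information({"occupation": "queen"}))
```

A source that gives only "occupation: queen" would leave the other three fields unreported here, which is the gap that key information is meant to fill.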

  • To understand an organization, it is necessary to know its approximate budget and number of employees. Without this information the other information in a Wikipedia article about a company may not even indicate whether it is a small organization with a few employees or a large organization with hundreds.
  • To understand a biography, it is necessary to know approximately when and where a person lived. Typical reliable sources may not explicitly indicate a person's residence or the decades in which they were active. Without key information other information loses cultural context.
  • To understand any topic, it is helpful to connect that topic to a matching identifier or authority control. A Wikipedia article's subject may not precisely match a subject by the same name in another library or cataloging system, so the Wikipedia editorial process has to make a subjective judgement about the correctness of a match.

Including key information

Wikipedia guidelines for the prose of articles, including WP:NOT, WP:RS, WP:WEIGHT, WP:V, and WP:LEAD, seem to prohibit the inclusion of key information when it lacks a citation to a reliable source. Including key information may amount to original research, both in the statement of fact itself and in the weight given to it.

There is currently no objective way to determine what sorts of facts are key information. A theoretical objective method would be to determine if common practice is to include certain facts in similar articles, but right now, Wikipedia has trouble detecting this at scale.
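The "common practice" test described above could in principle be approximated by counting how often each field is filled in across similar articles. The following is a minimal sketch under stated assumptions: the function name `common_fields`, the 0.8 default threshold, and the sample data are all hypothetical, and nothing here reflects an existing Wikipedia tool.

```python
from collections import Counter


def common_fields(articles, threshold=0.8):
    """Return fields filled in at least `threshold` of the given articles.

    `articles` is a list of dicts mapping field name -> value, one per article.
    The threshold is an arbitrary assumption, not a Wikipedia rule.
    """
    if not articles:
        return []
    counts = Counter()
    for fields in articles:
        # Count each field once per article, and only if it has a value.
        counts.update(name for name, value in fields.items() if value)
    cutoff = threshold * len(articles)
    return sorted(name for name, n in counts.items() if n >= cutoff)


# Made-up sample data for illustration only; "x" stands for any filled value.
sample_biographies = [
    {"birth_date": "x", "nationality": "x", "occupation": "x"},
    {"birth_date": "x", "nationality": "x", "spouse": "x"},
    {"birth_date": "x", "nationality": "x", "occupation": ""},
    {"birth_date": "x", "occupation": "x"},
]

print(common_fields(sample_biographies, threshold=0.75))
```

Fields that clear the threshold would be candidates for key information in that class of article. The choice of threshold and of which articles count as "similar" would remain an editorial judgement.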

The current best practice for determining whether something is key information is whether a field is already in an infobox. Not all key information is in infoboxes, but much is. Anyone can make an argument that some additional information is key by making the case that readers will not understand a topic without having access to some simple fact about it.

Applications

Infoboxes

The editorial process of Wikipedia decides what "key information" is when the Wikipedia community publishes an infobox. The content of any Wikipedia infobox is the set of "key information" for that concept. The set of possible fields within a class of Wikipedia infoboxes is the key information for that class.

Prose of article text

Key information is most visible as the contents of an infobox. Often the prose of Wikipedia articles repeats information from the infobox as part of the narrative of an encyclopedic presentation. There is a precedent in the editorial process of Wikipedia to permit key information in an infobox, while excluding it from the prose of the Wikipedia article. As of 2019 there might not be documentation in the manual of style advising when to repeat text from the infobox and when to exclude it.

Wikipedia + Wikidata integration

Wikipedia originated as a prose encyclopedia. It contained "unstructured data", meaning that humans could write freely in whatever human language they liked. The advantage is that, with currently available technology, human writing has been more enjoyable for humans to read than machine-written information. The disadvantages of relying on human-written information are that robots can write and sort information much more quickly than humans, and that robots can support human collaboration in many ways if text is machine readable.

An application of Wikidata in Wikipedia that has been widely desired since early development is "query at the intersection of categories". Suppose that someone wanted to see a list of all the Wikipedia articles which are "biographies" of "rulers" who are "female" and "lived in the 1800s". A plain text search of Wikipedia for these terms may not immediately list "Jhansi ki Rani". Wikidata, however, can return this information instantly. This kind of data discovery depends not only on the entirety of data about a concept, but also on the human curation of the smaller set of data which is key.
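The idea of a "query at the intersection of categories" can be illustrated with a small in-memory filter over structured records. This is a sketch only: real Wikidata queries are typically written in SPARQL against the Wikidata Query Service, and the property names, helper functions, and placeholder records below are assumptions invented for this example.

```python
def matches(item, **wanted):
    """True if `item` has every wanted property with the wanted value."""
    return all(item.get(prop) == value for prop, value in wanted.items())


def query(items, **wanted):
    """Return the labels of items that satisfy all conditions at once."""
    return [item["label"] for item in items if matches(item, **wanted)]


# Hypothetical records. Only the first uses claims taken from this page;
# the placeholders are invented so that each one fails a different condition.
items = [
    {"label": "Rani of Jhansi", "instance_of": "human",
     "occupation": "ruler", "gender": "female", "century": "1800s"},
    {"label": "Placeholder ruler A", "instance_of": "human",
     "occupation": "ruler", "gender": "male", "century": "1800s"},
    {"label": "Placeholder ruler B", "instance_of": "human",
     "occupation": "ruler", "gender": "female", "century": "1500s"},
    {"label": "Placeholder coffee", "instance_of": "beverage"},
]

print(query(items, instance_of="human", occupation="ruler",
            gender="female", century="1800s"))
```

The query succeeds only because each record already carries curated key information. A record without a "century" or "occupation" value would be silently excluded, however much prose existed about it.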

Wikidata has innumerable functions. The best way to describe it is as a generational change in technology. Just as most of the world did not understand the World Wide Web in the 1990s, had trouble understanding social media in the 2000s, and came to understand smartphones only in the 2010s, Wikidata and its related open data applications are challenging to explain as of 2019. The Wikimedia Movement is making a major financial and labor investment in integrating Wikipedia with Wikidata and structured data in general, on the expectation of many fundamental changes and many more small changes in every part of the project's operations.

d:Wikidata:WikiProject Schemas in Wikidata and the 2019 establishment of the Schema namespace in Wikidata are expected to lead to greater insight into the content gaps in Wikipedias of all languages, particularly around key information.

See also