Wikipedia:Wikipedia Signpost/2022-06-26/Recent research

Wikipedia versus academia (again), tables' "immortality" probed: Tables "like to socialize" and "share genes": ooh la la!


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"The Secret Life of Wikipedia Tables"

This paper[1] presents an analysis of "the entire history of all 3.5 M tables on the English Wikipedia for a total of 53.8 M table versions." In an accompanying conference poster, the researchers summarize their findings as follows:

Wikipedia tables

... like to socialize! 🥳
... share genes! 🧬
... live a fast-paced life! 🏎️
... tend to be immortal! 💓

The paper itself presents various interesting results in slightly more scholarly detail.

"Number of tables and pages created per month" (from the paper)

The authors note Wikipedia contained "almost no tables" in its first three years, after which:

"Using tables in Wikipedia became more popular only around 2004 and tables were fully adopted by end of 2006. Since then, every month around 20,000 new tables are created (about one every two minutes). The hypothesis that insertion frequency would decrease once tables are inserted at all relevant locations seems false: While the number of new pages created per month drops since 2007, the insertion-rate of new tables remains constant. This relative increase in tables per page shows that more and more data is stored in a structured fashion, raising the relevance of methods to extract knowledge from said tables."
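As a rough check of the quoted rate, assuming a 30-day month (our simplification, not a figure from the paper), 20,000 new tables per month works out to:

```latex
\frac{30 \times 24 \times 60\ \text{min}}{20\,000\ \text{tables}}
  = \frac{43\,200\ \text{min}}{20\,000\ \text{tables}}
  \approx 2.2\ \text{min per new table}
```

That is roughly one new table every two minutes, consistent with the paper's figure.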

As an aside, the paper does not mention Wikidata (a sister project of Wikipedia launched in 2012 to provide structured, machine-readable data), nor the more recent efforts to store tabular data on Wikimedia Commons for use on Wikipedia and elsewhere. While there are tools to generate Wikipedia tables automatically from Wikidata's structured data, they are not yet widely used.


"Histogram of the maximum table count per page" (from the paper, omitting pages without any tables)

A "histogram of the maximum number of tables that ever existed simultaneously on a Wikipedia article" demonstrates that

"The vast majority of Wikipedia articles contain only a few tables [...]. On the other hand, most tables appear on pages together with other tables. Only 19.1% of all tables appear alone on a Wikipedia article."

These results appear to provide the empirical foundation for the party emoji in the conference poster (above).
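The paper's exact counting procedure is not described in this review. As a minimal sketch, assuming each table's presence on an article is recorded as half-open time intervals (a hypothetical input format, not the authors' data model), the "maximum number of tables that ever existed simultaneously" on a page can be computed with a standard sweep line over insertion and deletion events:

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical input: one (start, end) interval per stretch of time a table was
# present on a page; end=None means the table still exists at the end of the data.
Interval = Tuple[int, Optional[int]]


def max_simultaneous_tables(intervals: Iterable[Interval]) -> int:
    """Sweep-line count of the largest number of tables present at the same time."""
    events: List[Tuple[int, int]] = []
    for start, end in intervals:
        events.append((start, +1))  # table appears on the page
        if end is not None:
            events.append((end, -1))  # table disappears (interval is half-open)
    # Sort by time; at equal times, removals (-1) sort before additions (+1),
    # so a table deleted at t and another inserted at t are not counted together.
    events.sort()
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best


# Usage with made-up revision indices: three tables overlap between revisions 5 and 8.
print(max_simultaneous_tables([(0, 10), (2, None), (5, 8), (10, 12)]))
```

A table that is deleted and later restored simply contributes several intervals, which the sweep handles without special cases.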

The racecar emoji refers to various results on how often tables are changed. Approaching the data from the perspective of reusing table contents outside Wikipedia, the authors stress that "in a one-month-old snapshot, already 4.4% of tables are outdated."

"Table freshness over time" (violin plot from the paper)

A violin plot of table "freshness" (i.e. time since the table's last update) over table age (i.e. time since the table's creation) shows that

"The median rises until a certain point, after which it stays constant or slightly decreases again. However, the distribution is skewed towards the two ends of the spectrum: tables either are very frequently updated or are hardly ever changed."
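To make the two axes of the violin plot concrete, here is a minimal sketch, assuming a table's history is available as a list of update timestamps (a hypothetical input; the paper may define "update" differently). Age is measured from the first version and freshness from the last version at or before the snapshot date:

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def age_and_freshness(version_times: List[datetime], snapshot: datetime) -> Tuple[timedelta, timedelta]:
    """Return (age, freshness) of one table as seen at `snapshot`.

    age:       time since the table's creation (its first version)
    freshness: time since the table's last update at or before the snapshot
    """
    seen = sorted(t for t in version_times if t <= snapshot)
    if not seen:
        raise ValueError("table did not exist yet at the snapshot time")
    created, last_update = seen[0], seen[-1]
    return snapshot - created, snapshot - last_update


# Usage with made-up timestamps.
versions = [datetime(2010, 1, 1), datetime(2015, 6, 1), datetime(2020, 3, 1)]
age, freshness = age_and_freshness(versions, datetime(2021, 1, 1))
print(age.days, freshness.days)
```

Filtering to versions at or before the snapshot lets the same function be evaluated at many snapshot dates, which is what a plot of freshness over age requires.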

The authors note that the distribution of the number of updates per table has "a large skew", with one outlier being "a table on social networking websites that was updated more than 10,000 times during its lifetime. At least 1,310 tables were each updated more than 1,000 times during their lifetimes."

The paper also examines schema changes of existing tables (e.g. the addition, removal or renaming of columns). It finds, for example, that "about half of all tables never change their schema", and that schemata can evolve into various specializations, as in this example visualizing "genes" shared by around 500 football-related tables:

"Example of schemata evolving over time" (from the paper): "This particular plot shows a cluster of schemata that all contain information about league results of football teams. There are almost 500 tables for which at least one of the snapshots had one of the Schemata 2–7."
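As a simplified illustration (not the authors' method, which also tracks how schemata evolve and cluster), assume a schema is represented as the ordered tuple of a table's column headers and that each table's versions are available as a list of such tuples; both are hypothetical representations. Counting schema changes, and the share of tables that never change their schema, might then look like this:

```python
from typing import Dict, List, Sequence, Tuple

Schema = Tuple[str, ...]  # hypothetical: a table's schema as its ordered column headers


def schema_changes(versions: Sequence[Schema]) -> int:
    """Count how often the schema differs from the immediately preceding version."""
    return sum(1 for prev, cur in zip(versions, versions[1:]) if prev != cur)


def share_never_changed(tables: Dict[str, List[Schema]]) -> float:
    """Fraction of tables whose schema stayed identical across all their versions."""
    if not tables:
        return 0.0
    unchanged = sum(1 for versions in tables.values() if schema_changes(versions) == 0)
    return unchanged / len(tables)


# Usage with made-up football tables: "b" gains a column, "c" renames one.
tables = {
    "a": [("Team", "Pts")] * 3,
    "b": [("Team", "Pts"), ("Team", "Pts", "GD"), ("Team", "Pts", "GD")],
    "c": [("Team", "Pts"), ("Team", "Points")],
    "d": [("Team", "Pts", "GD")],
}
print(share_never_changed(tables))
```

Comparing whole tuples treats additions, removals, renamings and reorderings alike as a single change; distinguishing those cases would need a column-level diff.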

Lastly, the conference poster's "immortality" claim is quantified as follows:

"69.5% of all tables ever created have survived until the end-date of our dataset. If a table is deleted, then this usually happens at the beginning of its lifetime. [...] While the vast majority of tables is never deleted (57.2%) or deleted only once (29.9%), there is a larger skew in the distribution of deletes. One table that explains the Wiki syntax was deleted 620 times during its lifetime, mostly from vandalism."

See also our earlier coverage of related research: "Neural Relation Extraction on Wikipedia Tables for Augmenting Knowledge Graphs", "TableNet: An Approach for Determining Fine-grained Relations for Wikipedia Tables", "Methods for Exploring and Mining Tables on Wikipedia"

Papers further explore dynamic between Wikipedia and academia

The June 2021 issue of "She Ji: The Journal of Design, Economics, and Innovation" featured several articles examining Wikipedia's relation to academia, including contributions by longtime Wikipedians Piotr Konieczny (User:Piotrus) and Dariusz Jemielniak (User:Pundit).

Konieczny's first contribution, titled "From Adversaries to Allies? The Uneasy Relationship between Experts and the Wikipedia Community"[2], provides a historical overview and literature review, concluding that "Collaborating with Wikipedia is increasingly common in academia, though barriers remain" and that "Wikipedia’s anti-elitist culture and academia’s anti-amateur culture are still at odds." Konieczny commiserates with his "fellow experts" who try to contribute to Wikipedia, but holds up a mirror:

"Undeniably, we receive unfair treatment on Wikipedia. At the same time, the proverbial shoe may be on the other foot. Many experts view Wikipedia as a still-recent startup that should recognize how badly it needs experts and give them special privileges—but without acknowledging that Wikipedia’s model of knowledge creation requires everyone to earn those privileges on the site."

Furthermore, Konieczny reminds academics who complain about hostile Wikipedians of their own power structures:

"Are some Wikipedians impolite? Certainly. So are some journal reviewers. Was your Wikipedia edit removed or article deleted? How different is it from having a journal or conference submission rejected? Is the power of experienced Wikipedia volunteers or administrators superior to that of a newbie editor? The answer is yes—in the same way that a journal editor or grant reviewer has leverage over one’s submission."

In a short commentary,[3] Jemielniak agrees with Konieczny's analysis of these two polarized stances as "the underlying cultural problem", and calls for "institutional support [for Wikipedia] from beyond the Wikimedia Foundation or Wiki Education Foundation", e.g. by "counting [Wikipedia editing] towards tenure reviews at universities."

In another response, titled "Wikipedians among Us: From Allies to Reformers"[4], Kara Kennedy also largely agrees with Konieczny's observations, but "sheds light on some of [his] oversights, including the still-present issues of bias and gaps in content and quality due to a lack of diversity in editorship".

In a third response,[5] the journal's editor-in-chief Ken Friedman (User:Kenfriedman0) argues that Wikipedia "suffers from the internally-focused cultural patterns among Wikipedians that prevent the improvements needed for a high quality reference work". Among other observations, he focuses on the Wikimedia Foundation's statement (in its fundraising messages) that 98% of Wikipedia readers do not donate, claiming that "This admission contains a message that the Wikimedia Foundation doesn’t seem to understand. When only 2% of the audience for a widely used not-for-profit project is willing to support the project they use, this suggests that the project might not survive as a commercial venture."

In the concluding piece, Konieczny responds to the three comments, joining Jemielniak and Kennedy in making "The Case for Institutional Support: It’s High Time for Governments and University Administration to Actively Support Wikipedia".[6] He devotes some space to Friedman's recollections of his own negative experiences of trying to contribute to Wikipedia. Examining the on-wiki record, Konieczny notes that the only dispute appears to have been about "whether to insert several names on the list of Fluxus members—an art movement Friedman was involved in both as artist and later, scholar—or not," whereas Friedman's larger contributions all appear to have been accepted. Konieczny argues that "[t]his illustrates the classic notion of negativity bias: we are much more likely to remember the bad experiences than the good ones, even if the latter are more common".


Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"The Wikipedia Global Consciousness Index: A Measurement of the Awareness and Meaning of the World-as-a-Whole"

From the abstract:[7]

"To supplement current globalization indexes, I propose a new index, the Wikipedia Global Consciousness Index (WikiGCI). [...] The first research objective is to construct the new index as an empirical assessment of global consciousness by applying the top 100 global articles as the empirical units. Global articles are the Wikipedia articles edited in the most countries, identified by geolocating the IP address edits. Furthermore, I discursively analyze how these Wikipedia articles express global consciousness by statements of global wholeness in their narratives. [...] The second research objective is to discursively analyze regional patterns in Wikipedia’s global and local articles. I performed a mixed method, multilingual discursive analysis to examine how four globalizing discourses (references to the countries in the world’s economic core, the use of English in citations, references to international media institutions, and the monetization of commodities) can distinguish place representations between two groups of articles. [...] This discourse analysis reveals that the representation of the world is not strictly determined by the core. While the socio-economic power in the core creates the globalizing discourses, non-core editors engage with the discourses to depict the world based on the socio-historic conditions of their countries."


"Wikipedia in the anti-SOPA protests as a case study of direct, deliberative democracy in cyberspace"

From the abstract:[8]

"On 18th January 2012 in the ‘first Internet strike’ against the American ‘Stop Online Piracy Act' legislation, over two thousand Wikipedians took part in the vote concerning whether their site should undertake a protest action, with vast majority expressing support for this action. However, the vote participants formed only a tiny fraction of the total number of Wikipedians who number in millions. [...] This paper discusses the intricate dynamics between Wikipedia egalitarian ethos and the creed to discuss project matters deliberately on one hand and the conspicuous lack of promotion and advertisement stemming from a rule against ‘canvassing’ and an overall skepticism regarding the status of majority votes. While voters' passivity and lack of interest play a major role, as expected, another factor emerges as a significant factor responsible for the low levels of participation: an inefficient information distribution system, as the vast majority of Wikipedians were not aware of the ongoing discussions and the vote itself until after their conclusion."

See also our review of an earlier paper by the same author: "Wikipedia’s SOPA Strike considered as international political movement", and his own review of a 2012 paper: "SOPA blackout decision analyzed".

References

  1. Bleifuß, Tobias; Bornemann, Leon; Kalashnikov, Dmitri V.; Naumann, Felix; Srivastava, Divesh (2021). "The Secret Life of Wikipedia Tables". Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16–20, 2021, Copenhagen, Denmark). Wikidata: Q108215401. Datasets available.
  2. Konieczny, Piotr (2021-06-01). "From Adversaries to Allies? The Uneasy Relationship between Experts and the Wikipedia Community". She Ji: The Journal of Design, Economics, and Innovation. 7 (2): 151–170. doi:10.1016/j.sheji.2020.12.003. ISSN 2405-8726.
  3. Jemielniak, Dariusz (2021-06-01). "Collaborative Society Needs Institutional Support". She Ji: The Journal of Design, Economics, and Innovation. 7 (2): 171–172. doi:10.1016/j.sheji.2021.05.003. ISSN 2405-8726.
  4. Kennedy, Kara (2021-06-01). "Wikipedians among Us: From Allies to Reformers". She Ji: The Journal of Design, Economics, and Innovation. 7 (2): 172–177. doi:10.1016/j.sheji.2021.05.004. ISSN 2405-8726.
  5. Friedman, Ken (2021-06-01). "Wikipedia Is a Magnificent, Flawed Gem. Can It Be Polished?". She Ji: The Journal of Design, Economics, and Innovation. 7 (2): 177–187. doi:10.1016/j.sheji.2021.05.005. ISSN 2405-8726.
  6. Konieczny, Piotr (2021-06-01). "The Case for Institutional Support: It's High Time for Governments and University Administration to Actively Support Wikipedia". She Ji: The Journal of Design, Economics, and Innovation. 7 (2): 187–196. doi:10.1016/j.sheji.2021.05.002. ISSN 2405-8726.
  7. Stieve, Thomas (2021). The Wikipedia Global Consciousness Index: A Measurement of the Awareness and Meaning of the World-as-a-Whole (Ph.D.). University of Arizona.
  8. Konieczny, Piotr (2016-03-16). "Wikipedia in the anti-SOPA protests as a case study of direct, deliberative democracy in cyberspace". Information, Communication & Society. 0 (0): 1–18. doi:10.1080/1369118X.2016.1157620. ISSN 1369-118X. Closed access; post-print freely available.