Wikipedia:Wikipedia Signpost/2021-01-31/Recent research

Recent research

Students still have a better opinion of Wikipedia than teachers do


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


Wikipedia as OER: the “Learning with Wikipedia” project

Reviewed by Matthew Sumpter

An article [1] from the Journal of e-Learning and Knowledge Society reports on the qualitative results of the "Learning with Wikipedia" project, which involved 1200 students and 30 faculty members from the University of Padua. The project was designed in response to the UNESCO policy agenda to promote Open Education (OE). The goal was to determine the effectiveness of learning subject-specific content with communal forms of assessment and to stimulate digital competencies involved in information literacy, while also creating open content and encouraging the spirit of OE.

The project was divided into planning and teaching phases. In the planning phase, instructors were trained on Wikipedia's philosophies and rules, activities were designed for each course, and test environments were set up with which students could familiarize themselves. Several potential student activities were designed, including translating an English Wikipedia article into Italian, elaborating on course content, validating Wikipedia content against other sources, and creating new articles about course content. Before beginning, students attended a workshop that introduced them to the goals and expectations of the project, and they were then assisted continuously throughout, with a focus on developing skills related to OE. Specifically, these included skills in "finding, selecting and evaluating information, digital citizenship actions, applying guidelines for online etiquette, creating and developing digital content, becoming familiar with copyright issues and the use of Creative Commons licenses, [and] protecting personal data". In this way, the project served the joint tasks of teaching course content and digital OE skills.

The article analyzes the outcomes of this project using a questionnaire provided to participants to answer two research questions: 1) How do teachers and students perceive Wikipedia? and 2) What digital competences do teachers and students believe have been improved by creating a Wikipedia article in the project? The authors found that teachers and students have contrasting opinions regarding Wikipedia: teachers exhibited a poor opinion, while students on average had a good opinion. The authors point to this finding as an important indication that teachers must be involved in the planning and implementation of educational Wikipedia projects, in order to prevent negative bias towards the medium. With regard to digital competences, students reported that they had learned Wikipedia's rules and how to browse, search, create, and manipulate digital content. They also indicated that "evaluating data, information and digital content", an essential goal, was not sufficiently covered. This again contrasted with the teachers' answers, 100% of which indicated that this skill was well stimulated. The authors suspect this is because many bibliographic sources used for writing the articles were provided to the students by the instructors. This leads to one of the authors' main conclusions for future work: students should be given more freedom in finding primary sources. Additionally, they cite the need for carefully managed learning strategies to ensure the work is educational while following Wikipedia's rules, so that student articles do not risk deletion.


Briefly


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer


Wikipedia is "largely successful" at removing harmful content

From the abstract:[2]

"... we aim to assess the degree to which English-language Wikipedia is successful in addressing harmful speech with a particular focus on the removal of deleterious content. We have conducted qualitative interviews with Wikipedians and carried out a text analysis using machine learning classifiers trained to identify several variations of problematic speech. Overall, we conclude that Wikipedia is largely successful at identifying and quickly removing a vast majority of harmful content despite the large scale of the project. The evidence suggests that efforts to remove malicious content are faster and more effective on Wikipedia articles compared to removal efforts on article talk and user talk pages."

See also earlier coverage of the researchers' presentation at Wikimania 2019, and the research project's page on Meta-wiki.



Recruiting previous participants of the "Wiki Loves Africa" photo contest

From the abstract of a preprint titled "Broadening African Self-Representation on Wikipedia: A Field Experiment":[3]

"Wiki Loves Africa (WLA) is a Wikipedia community project that hosts an annual photo competition focused on increasing African contributions to imagery that represents African people, places, and culture. In a field experiment with [5,905 previous] participants, we randomly assigned past contributors to receive a recruitment message and observed their contributions to the 2020 competition. On average, receiving a message caused a 1.2 percentage point increase in competition entrants (p=0.002). Among those who contributed to the contest in 2019, there was an increase of 2.7 percentage points compared to the control group (p=0.029)."
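For readers unfamiliar with effects reported in percentage points, the following minimal Python sketch shows how a difference between a messaged group and a control group can be expressed that way, together with a two-sided p-value. It assumes a simple pooled two-proportion z-test; the abstract does not state which estimator the authors used, so this is not their method. The function name and all counts are hypothetical and are not taken from the preprint.

```python
import math


def two_proportion_test(success_a, n_a, success_b, n_b):
    """Return (difference in percentage points, two-sided p-value).

    Uses a pooled two-proportion z-test with a normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled participation rate under the null hypothesis of no effect.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (p_a - p_b) * 100, p_value


# Hypothetical counts for illustration only (NOT from the preprint):
# 60 of 3000 messaged users entered, versus 30 of 3000 in the control group.
diff_pp, p_value = two_proportion_test(60, 3000, 30, 3000)
print(f"effect: {diff_pp:.1f} percentage points, p = {p_value:.4f}")
```

In this made-up example the participation rate rises from 1% to 2%, which is a 1 percentage point increase but a 100% relative increase; the two ways of stating an effect should not be confused.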


"WiTPy: A Toolkit to Parse and Analyse Wikipedia Talk Pages"

From the abstract:[4]

"In this article, we propose an opensource toolkit to extract, parse, and analyze the Wikipedia talk pages. [...] User-friendly and high-level analysis methods are created on the top of NoSQL database, which can be used to understand the collaboration dynamics on article talk pages."


"Modelling User Behavior Dynamics with Embeddings"

From the abstract:[5]

"... we present a user behavior model built using behavior embeddings to compare behaviors and their change over time. To this end, we first define the formal model and train the model using both action (e.g., copy/paste) embeddings and user interaction feature (e.g., length of the copied text) embeddings. Having obtained vector representations of user behaviors, we then define three measurements to model behavior dynamics over time, namely: behavior position, displacement, and velocity. To evaluate the proposed methodology, we use three real world datasets [ ... including] (iii) thousands of editors completing unstructured editing tasks on Wikidata. Through these datasets, we show that the proposed methodology can: (i) surface behavioral differences among users; (ii) recognize relative behavioral changes; and (iii) discover directional deviations of user behaviors."
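As a rough illustration of the three measurements named in the abstract, the sketch below treats a behavior embedding at a given time as a position, the vector difference between two positions as a displacement, and displacement divided by elapsed time as a velocity. These are the ordinary kinematic readings of the terms and are an assumption here; the paper's formal definitions may differ. The function names and embedding values are hypothetical.

```python
import math


def displacement(start, end):
    """Vector from the start embedding to the end embedding."""
    return [e - s for s, e in zip(start, end)]


def magnitude(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def velocity(start, end, elapsed):
    """Displacement per unit of elapsed time."""
    return [d / elapsed for d in displacement(start, end)]


# Hypothetical 2-dimensional behavior embeddings of one user,
# keyed by time step (real embeddings would have many more dimensions).
positions = {0: [0.0, 0.0], 2: [3.0, 4.0]}

shift = displacement(positions[0], positions[2])
speed_vector = velocity(positions[0], positions[2], elapsed=2)
print(shift, magnitude(shift), speed_vector)
```

The direction of the displacement vector is what would allow "directional deviations" between users to be compared, while its magnitude gives the size of a behavioral change.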


"Deriving Geolocations in Wikipedia"

From the abstract:[6]

"We study the problem of deriving geolocations for Wikipedia pages. To this end, we introduce a general four-step process to location derivation, and consider different instantiations of this process, leveraging both textual and categorical data. [...] our system can be used to augment the geographic information of Wikipedia, and to enable more effective geographic information retrieval."


"Testing the validity of Wikipedia categories for subject matter labelling of open-domain corpus data"

From the abstract:[7]

"In this article, a hierarchical taxonomy of three-level depth is extracted from the Wikipedia category system. The resulting taxonomy is explored as a lightweight alternative to expert-created knowledge organisation systems (e.g. library classification systems) for the manual labelling of open-domain text corpora. Combining quantitative and qualitative data from a crowd-based text labelling study, the validity of the taxonomy is tested and the results quantified in terms of interrater agreement. While the usefulness of the Wikipedia category system for automatic document indexing is documented in the pertinent literature, our results suggest that at least the taxonomy we derived from it is not a valid instrument for manual subject matter labelling of open-domain text corpora."
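To make "interrater agreement" concrete, here is a minimal Python sketch of Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. The abstract does not say which agreement statistic the authors computed, and a crowd-based study with many raters would likely need a multi-rater measure instead, so this is only one common choice shown for illustration. The function name and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Share of items on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Agreement expected if each rater labelled at random
    # according to their own category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical category labels assigned to four texts by two raters.
rater_1 = ["Science", "Arts", "Science", "History"]
rater_2 = ["Science", "Arts", "History", "History"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; low values for a taxonomy suggest that human labellers cannot apply its categories consistently.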


"Wikipedia, The Free Online Medical Encyclopedia Anyone Can Plagiarize: Time to Address Wiki-Plagiarism"

From the abstract and paper:[8]

"... plagiarism of Wikipedia in peer-reviewed publications has received little attention. Here, I present five cases of PubMed-indexed articles containing Wiki-plagiarism, i.e. copying of Wikipedia content into medical publications without proper citation of the source. [...] ... I subsequently contacted the authors of the three other Wiki-plagiarizing papers, as well as the publishers and Editors of the journals involved, to ask for an explanation, correction or retraction. None of them replied, despite the fact that these journals are members of the Committee on Publication Ethics (COPE). Of note, the article on exome sequencing was edited by the same author as the 2010 published paper. ... "


"Hierarchical Trivia Fact Extraction from Wikipedia Articles"

From the abstract:[9]

"In this paper, we propose a new unsupervised algorithm that automatically mines trivia facts for a given entity. Unlike previous studies, the proposed algorithm targets at a single Wikipedia article and leverages its hierarchical structure via top-down processing. [...] Experimental results demonstrate that the proposed algorithm is over 100 times faster than the existing method which considers Wikipedia categories. Human evaluation demonstrates that the proposed algorithm can mine better trivia facts regardless of the target entity domain and outperforms the existing methods."


"Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia"

From the abstract:[10]

"Prior work has examined descriptions of people in English using contextual affective analysis, a natural language processing (NLP) technique that seeks to analyze how people are portrayed along dimensions of power, agency, and sentiment. Our work presents an extension of this methodology to multilingual settings [...] We then demonstrate the usefulness of our method by analyzing Wikipedia biography pages of members of the LGBT community across three languages: English, Russian, and Spanish. Our results show systematic differences in how the LGBT community is portrayed across languages, surfacing cultural differences in narratives and signs of social biases. Practically, this model can be used to surface Wikipedia articles for further manual analysis---articles that might contain content gaps or an imbalanced representation of particular social groups."

(The underlying article dataset has also been used by another researcher to create this interactive visualization.)


References

  1. ^ Petrucco, Corrado; Ferranti, Cinzia (2020-12-21). "Wikipedia as OER: the "Learning with Wikipedia" project". Journal of e-Learning and Knowledge Society. 16 (4): 38–45. doi:10.20368/1971-8829/1135322. ISSN 1971-8829.
  2. ^ Clark, Justin; Faris, Robert; Gasser, Urs; Holland, Adam; Ross, Hilary; Tilton, Casey (2019-11-01). "Content and Conduct: How English Wikipedia Moderates Harmful Speech". SSRN 3489176.
  3. ^ Matias, J. Nathan; Devouard, Florence; Kamin, Julia; Klein, Max; Pennington, Eric (2020-08-06). "Broadening African Self-Representation on Wikipedia: A Field Experiment". OSF.io.
  4. ^ Verma, Amit Arjun; Iyengar, S.R.S.; Gandhi, Nitin (2020-08-01). "WiTPy: A Toolkit to Parse and Analyse Wikipedia Talk Pages". Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. JCDL '20. New York, NY, USA: Association for Computing Machinery. pp. 535–536. doi:10.1145/3383583.3398629. ISBN 9781450375856.
  5. ^ Han, Lei; Checco, Alessandro; Difallah, Djellel; Demartini, Gianluca; Sadiq, Shazia (2020-10-19). "Modelling User Behavior Dynamics with Embeddings". Proceedings of the 29th ACM International Conference on Information & Knowledge Management. CIKM '20. New York, NY, USA: Association for Computing Machinery. pp. 445–454. doi:10.1145/3340531.3411985. ISBN 9781450368599.
  6. ^ Krause, Amir; Cohen, Sara (2020-10-19). "Deriving Geolocations in Wikipedia". Proceedings of the 29th ACM International Conference on Information & Knowledge Management. CIKM '20. New York, NY, USA: Association for Computing Machinery. pp. 3293–3296. doi:10.1145/3340531.3417459. ISBN 9781450368599.
  7. ^ Aghaebrahimian, Ahmad; Stauder, Andy; Ustaszewski, Michael (2020-12-03). "Testing the validity of Wikipedia categories for subject matter labelling of open-domain corpus data". Journal of Information Science: 0165551520977438. doi:10.1177/0165551520977438. ISSN 0165-5515. S2CID 229423946.
  8. ^ Laurent, Michaël R. (2020-07-21). "Wikipedia, The Free Online Medical Encyclopedia Anyone Can Plagiarize: Time to Address Wiki-Plagiarism". Publishing Research Quarterly. 36 (3): 399–402. doi:10.1007/s12109-020-09750-0. ISSN 1936-4792. S2CID 225570855.
  9. ^ Kwon, Jingun; Kamigaito, Hidetaka; Song, Young-In; Okumura, Manabu (December 2020). "Hierarchical Trivia Fact Extraction from Wikipedia Articles". Proceedings of the 28th International Conference on Computational Linguistics. COLING 2020. Barcelona, Spain (Online): International Committee on Computational Linguistics. pp. 4825–4834.
  10. ^ Park, Chan Young; Yan, Xinru; Field, Anjalie; Tsvetkov, Yulia (2020-10-21). "Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia". arXiv:2010.10820 [cs.CL].