Wikipedia:Wikipedia Signpost/2023-11-20/Recent research

Recent research

Canceling disputes as the real function of ArbCom

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Canceling Disputes: How Social Capital Affects the Arbitration of Disputes on Wikipedia"

Reviewed by Bri

This provocative paper in Law & Social Inquiry^[1] by a socio-legal scholar shows, through research mostly based on interviews with Wikipedia insiders, that the Arbitration Committee functions to cancel disputes, not to arbitrate to a compromise position, nor to reach a negotiated settlement, nor to actively promote truthful content (which one might naïvely have inferred from the name of the Committee).

Some of the arguments used in the paper are both arresting and concerning. This reviewer found the interpretive language, and the often verbatim quotes of people involved in the arbitration process — often deeply involved, including at least one described as a member of the Committee — more compelling than the light data analysis included in the paper. The author interviewed 28 editors: current and former members of the Committee, those who have been involved parties, those who have commented on cases, and those "who have knowledge of the dispute resolution process due to their long-standing involvement with Wikipedia" (not further defined).

"Social Capital and the Arbitration Committee's Remedies" (figure 2 from the paper)

The data analysis consisted of a breakdown of sanction severities against edit count (as a proxy for social capital). It found a negative correlation between social capital and severity, by examining edit count against light severity outcomes (admonishment) and heavy severity (up to and including site bans); see figure 2 above. The author presented two potential interpretations: one, the conventional one, that more mature and upstanding editors with deep social capital were more likely to obey norms; the other, that those editors with the social capital were free to disobey norms without severe consequences because of the wiki's empowerment of bad behavior through various means. In essence, this would validate the idea of a "cabal", or that a "too essential to be lost" mentality endows a "wiki aristocracy" capable of creating either true consensus or promoting their "version of the truth", to quote the paper (p. 15). It was the paper's non-data-driven, interview-based approach that attempted to determine which of the competing theories was correct.

The key idea in the paper is that social capital — largely built up and represented by an editor's edit count regardless of their ability to peacefully coexist with other editors — is the most important factor when it comes to arbitration. The committee's purpose is to quash disputes in order for editing to continue, not to reach a "just" outcome in some broader sense. One way the social capital is expressed and brought to bear is essentially in the opening phases of an arbitration case, called preliminary statements. If one reads between the lines of the paper, the outcome is frequently predetermined by these opening phases and all that the committee can do is go along with the crowd. In fact, it is explicitly stated — again based on evidence gathered from insiders — that cases are frequently orchestrated off-wiki precisely in order to stack the deck against the other side.

[A] Wikipedia insider told me how a disputant prepared her "faction" for months before bringing a case before the Arbitration Committee (which she ended up winning). These efforts are usually made covertly, as Wikipedia norms prohibit what is called "canvassing"...for instance ... on a secret mailing list ... A long-standing editor who was described as a member of Wikipedia's "aristocracy" told me: "we are a tight clique of very long-standing editors and none of our words find their way onto the site"...
— p. 12

"There's no cabal" (a classic community cartoon, first posted on the French Wikipedia in 2006)

Sadly for Wikipedians, the author concludes that it is the Machiavellian use of power that holds true on Wikipedia, or in other words, that there is a cabal. One passage that comes across as especially skeptical of this structure is found on p. 17: "an editor compared the Arbitration Committee to 'riot cops' ... [who] can be compared to the 'repressive peacemakers' ... guaranteeing the level of social peace that is necessary for the Wikipedia project to unfold, even to the detriment of fairness." Then the author appears to equate the arbitration process to a trial by ordeal, a feudal concept eschewed by the West in favor of due process based legal proceedings, further saying that

My empirical findings are consistent with the argument that, despite its rhetoric of inclusiveness ("anyone can edit"), Wikipedia is an "unwelcoming and exclusive environment" for newcomers, which tends to reinforce the "hegemony" of a consensus that is mostly shaped and controlled by white Western men.
— p. 19

Summing up on the next page:

[W]hat emerges from the evidence I have collected, and is perhaps more conclusive, is that experienced editors with dense networks are well positioned to avoid the consequences of their own breaches and to use their power to prevail in disputes against weaker parties.
— p. 20

In other words, a system that puts the powerful above the law.

15% of datasets for fine-tuning language models use Wikipedia

Reviewed by Tilman Bayer

A new preprint titled "The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI"^[2] presents results from "a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace [...] the data lineage of 44 of the most widely used and adopted text data collections, spanning 1800+ finetuning datasets" that have been published on platforms such as Hugging Face or GitHub. The authors make the resulting annotated collection of datasets available online, searchable via a "Data Provenance Explorer".

The paper presents various quantitative results based on this dataset. wikipedia.org was found to be the most widely used source domain, occurring in 14.9% (p. 14) or 14.6% (Table 4, p. 13) of the 1800+ datasets. This result illustrates the value Wikipedia provides for AI (although it also means, conversely, that over 85% of those datasets made no use of Wikipedia).

The paper highlights the following example of such a dataset that used Wikipedia:

Supervised Dataset Example: SQuAD
Rajpurkar et al. (2016) present a prototypical supervised dataset on reading comprehension. To create the dataset, the authors take paragraph-long excerpts from 539 popular Wikipedia articles and hire crowd-source workers to generate over 100,000 questions whose answers are contained in the excerpt. For example:
Wikipedia Excerpt: In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity.
Worker-generated question: What causes precipitation to fall? Answer: Gravity
Here the authors use Wikipedia text as a basis for their data and their dataset contains 100,000 new question-answer pairs based on these texts.
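
To make the structure of such a supervised example concrete, here is a minimal Python sketch of how one record of this kind could be represented, using the excerpt and question quoted above. The class name `QAExample` and its field names are illustrative assumptions, not the actual SQuAD schema; the check only reflects the property stated in the quote, namely that the answer is contained in the excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAExample:
    """Hypothetical container for one reading-comprehension example."""
    context: str   # paragraph-long Wikipedia excerpt
    question: str  # worker-generated question
    answer: str    # answer text, expected to occur in the context

    def answer_in_context(self) -> bool:
        # Case-insensitive containment check (an illustrative simplification).
        return self.answer.lower() in self.context.lower()


example = QAExample(
    context=(
        "In meteorology, precipitation is any product of the condensation "
        "of atmospheric water vapor that falls under gravity."
    ),
    question="What causes precipitation to fall?",
    answer="Gravity",
)
```

Calling `example.answer_in_context()` confirms that the answer string can be located in the excerpt, which is what makes this kind of dataset usable for extractive question answering.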

The bulk of the paper is of less interest to Wikimedians specifically, focusing instead on general questions about the sourcing information about these datasets ("we are in the midst of a crisis in dataset provenance") and their licenses (observing e.g. "sharp divides in composition and focus of commercially open vs. closed datasets, with closed datasets monopolizing important categories: lower resource languages, more creative tasks, richer topic variety, newer and more synthetic training data"). An extensive "Legal Discussion" section acknowledges that the paper leaves out "several important related questions on the use of copyrighted works to create supervised datasets and on the copyrightability of training datasets." In particular, it does not examine whether the Wikipedia-based datasets satisfy the requirements of Wikipedia's CC BY-SA license. Regarding the use of CC-licensed datasets in AI in general, the authors note: "One of the challenges is that licenses like the Apache and the Creative Commons outline restrictions related to 'derivative' or 'adapted works' but it remains unclear if a trained model should be classified as a derivative work." They also remind readers that "In the U.S., the fair use exception may allow models to be trained on protected works," although "the application of fair use in the context is still evolving and several of these issues are currently being litigated".

(The datasets examined in the paper are to be distinguished from the much larger unlabeled text corpuses used for the initial unsupervised training of large language models (LLMs). There, Wikipedia is also known to have been used, alongside other sources such as Common Crawl, e.g. for the GPT-3 family that formed the basis of ChatGPT.)

Wikipedia biggest "loser" in recent Google Search update

A blog post^[3] by Search Engine Optimization firm Amsive (recommended as "extensive (and fascinating) research" in a recent The Verge feature about the SEO industry) analyzes the impact of an August 2023 "core update" by Google Search. The post explains that

Google [...] announced a new signal in its December updates to the Search Quality Rater guidelines: “E” for experience. The “E” is a new member of the E-A-T family, now called E-E-A-T, and stands for experience, expertise, authoritativeness, and trustworthiness. According to Google, the amount of E-E-A-T required for a page or site to be considered high-quality depends on the nature of the content and the extent to which it can cause harm to users. [...] Search Quality Raters have been working off this new version of the Quality Guidelines to review the quality of Google’s results and evaluate E-E-A-T for 9 months now, giving Google plenty of time to update its algorithms with the feedback provided by quality raters.

The analysis of Google's August update focuses on "the list of the top 1,000 winners and losers in both absolute and percentage terms, using Sistrix Visibility Index scores using the Google.com U.S. index." (Sistrix's index, which is generally not freely available, is calculated based on search results for one million keywords, weighted by search volume and estimated click probability, and aggregated by domain.)
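
As a rough illustration of what such a weighted, domain-aggregated index involves, the following Python sketch sums, per domain, the search volume times an estimated click probability over the keywords for which the domain ranks. Sistrix's actual formula, keyword set and click model are not described here in detail, so the function name, the click-probability table and the sample data below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical click probabilities by ranking position (not Sistrix's values).
CLICK_PROBABILITY = {1: 0.30, 2: 0.15, 3: 0.10}


def visibility_by_domain(rankings, search_volume):
    """Aggregate volume-weighted click estimates per domain.

    rankings: iterable of (keyword, position, domain) tuples.
    search_volume: dict mapping keyword -> number of searches.
    """
    scores = defaultdict(float)
    for keyword, position, domain in rankings:
        # Positions missing from the table contribute nothing.
        weight = CLICK_PROBABILITY.get(position, 0.0)
        scores[domain] += search_volume[keyword] * weight
    return dict(scores)


# Made-up toy data with placeholder domains, for illustration only.
toy_rankings = [
    ("keyword one", 1, "a.example"),
    ("keyword one", 2, "b.example"),
    ("keyword two", 1, "b.example"),
    ("keyword two", 3, "a.example"),
]
toy_volume = {"keyword one": 1000, "keyword two": 500}
toy_scores = visibility_by_domain(toy_rankings, toy_volume)
```

In this toy setup a domain gains score both by ranking higher and by ranking for more heavily searched keywords, which is why a change in rankings for popular queries can move a large site's score substantially.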

wikipedia.org tops the "Absolute Losers" list for Google's August 2023 update, with a larger score decrease than youtube.com (#2) and amazon.com (#3). Still, in relative terms, Wikipedia's score decline of 6.75% doesn't even make the "Percent Losers" list of the 250 sites with the biggest percentage declines. And in better news for Wikimedians, wiktionary.org ranked #3 on the "Absolute Winners" list (right before britannica.com at #4). wikivoyage.org also gained, reaching #38 on the same list (with an index increase of 37.38% in relative terms). What's more, in Amsive's similar analysis of Google's preceding March 2023 core update (which had been "highly anticipated given the significant changes affecting organic search" in the preceding months, of which the E-E-A-T announcement was just one), wikipedia.org had conversely topped the "Absolute Winners" list, with a 10.16% relative increase. Then again, wiktionary.org topped the March 2023 update's "Absolute Losers" list ahead of urbandictionary.com (#2) and thefreedictionary.com (#3), although both had a larger relative decrease than Wiktionary's -22.66%. This may indicate that such changes are merely transient snapshots of the long timeline of Google Search. (And indeed Google has since conducted two further "core updates" for October and November 2023, which Amsive does not appear to have analyzed yet.) Still, these results illustrate that Wikipedia's prominence in search engine results is by no means ubiquitous or static.
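
The difference between the "absolute" and "percent" rankings can be illustrated with a small Python sketch. The scores below are made up and the function name is hypothetical; the point is only that a site with a very large index score can lose the most points while its relative decline stays modest.

```python
def absolute_and_relative_change(before, after):
    """Return (absolute change, relative change in percent) of an index score."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


# Made-up scores for a large site and a small site, for illustration only.
large_abs, large_rel = absolute_and_relative_change(before=1000.0, after=930.0)
small_abs, small_rel = absolute_and_relative_change(before=10.0, after=5.0)
```

Here the large site loses more points in absolute terms, while the small site loses a far bigger share of its score, so the two sites would sit at opposite ends of the two kinds of list.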

Briefly

See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
Until December 15, the Wikimedia Foundation is inviting applications for its Research Fund grants of up to $50k, "particularly encourag[ing] research studies on medium to small size languages and communities, as well as in low resourced languages and projects." See also our coverage of previous rounds.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Ca and Tilman Bayer

"Evaluation of Accuracy and Adequacy of Kimchi Information in Major Foreign Online Encyclopedias"

From the abstract:^[4]

In this study, we analyzed the content and quality of kimchi information in major foreign online encyclopedias, such as Baidu Baike, Encyclopædia Britannica, Citizendium, and Wikipedia. Our results revealed that the kimchi information provided by these encyclopedias was often inaccurate or inadequate, despite kimchi being a fundamental part of Korean cuisine. The most common inaccuracies were related to the definition and origins of kimchi and its ingredients and preparation methods.

"Speech Wikimedia: A 77 Language Multilingual Speech Dataset"

Abstract:^[5]

"The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA licensed transcribed speech from a diverse set of scenarios and speakers, in 77 different languages. Each audio file has one or more transcriptions in different languages, making this dataset suitable for training speech recognition, speech translation, and machine translation models."

"WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections"

From the "Conclusion" section:^[6]

"We created WIKITABLET, a dataset that contains Wikipedia article sections and their corresponding tabular data and various metadata. WIKITABLET contains millions of instances covering a broad range of topics and kinds of generation tasks. Our manual evaluation showed that humans are unable to differentiate the [original Wikipedia text] and model generations [by transformer models that the authors trained specifically for this task]. However, qualitative analysis showed that our models sometimes struggle with coherence and factuality, suggesting several directions for future work."

The authors of this 2021 paper note that they "did not experiment with pretrained models [such as the GPT series] because they typically use the entirety of Wikipedia, which would presumably overlap with our test set."

"Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective"

From the abstract:^[7]

"Recent advances in machine learning [this sentence appears to have been written in 2020] have made it possible to train NLG [natural language generation] systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles."

The paper, published in 2022, does not yet mention the related Abstract Wikipedia project.

"XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages"

From the abstract:^[8]

"Lack of encyclopedic text contributors, especially on Wikipedia, makes automated text generation for low resource (LR) languages a critical problem. Existing work on Wikipedia text generation has focused on English only where English reference articles are summarized to generate English Wikipedia pages. But, for low-resource languages, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem. Hence, in this work, we propose XWikiGen, which is the task of cross-lingual multi-document summarization of text from multiple reference articles, written in various languages, to generate Wikipedia-style text. Accordingly, we contribute a benchmark dataset, XWikiRef, spanning ~69K Wikipedia articles covering five domains and eight languages. We harness this dataset to train a two-stage system where the input is a set of citations and a section title and the output is a section-specific LR summary."

The paper's "Related work" section provides a useful literature overview, noting e.g. that

"Automated generation of Wikipedia text has been a problem of interest for the past 5–6 years. Initial efforts in the fact-to-text (F2T) line of work focused on generating short text, typically the first sentence of Wikipedia pages using structured fact tuples. [...] Seq-2-seq neural methods [including various LSTM architectures and efforts based on pretrained transformers] have been popularly used for F2T. [...]
Besides generating short Wikipedia text, there have also been efforts to generate Wikipedia articles by summarizing long sequences. [...] For all of these datasets, the generated text is either the full Wikipedia article or text for a specific section."

The authors note that most of these efforts have been English-only.

See also our 2018(!) coverage of various fact-to-text efforts, going back to 2016: "Readers prefer summaries written by a neural network over those by Wikipedians 40% of the time — but it still suffers from hallucinations"

References

^ Grisel, Florian (2023-05-04). "Canceling Disputes: How Social Capital Affects the Arbitration of Disputes on Wikipedia". Law & Social Inquiry: 1–22. doi:10.1017/lsi.2023.15. ISSN 0897-6546. S2CID 258521021.
^ Longpre, Shayne; Mahari, Robert; Chen, Anthony; Obeng-Marnu, Naana; Sileo, Damien; Brannon, William; Muennighoff, Niklas; Khazam, Nathan; Kabbara, Jad; Perisetla, Kartik; Wu, Xinyi; Shippole, Enrico; Bollacker, Kurt; Wu, Tongshuang; Villa, Luis; Pentland, Sandy; Roy, Deb; Hooker, Sara (2023-11-04). "The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI". arXiv:2310.16787 [cs.CL].
^ Ray, Lily (2023-09-12). "Google August 2023 Core Update: Winners, Losers & Analysis". Amsive blog.
^ Park, Sung Hoon; Lee, Chang Hyeon (2023). "Evaluation of Accuracy and Adequacy of Kimchi Information in Major Foreign Online Encyclopedias". Journal of the Korean Society of Food Culture. 38 (4): 203–216. doi:10.7318/KJFC/2023.38.4.203. ISSN 1225-7060. (in Korean, with English abstract)
^ Gómez, Rafael Mosquera; Eusse, Julián; Ciro, Juan; Galvez, Daniel; Hileman, Ryan; Bollacker, Kurt; Kanter, David (2023-08-29). "Speech Wikimedia: A 77 Language Multilingual Speech Dataset". arXiv:2308.15710 [cs.AI].
^ Chen, Mingda; Wiseman, Sam; Gimpel, Kevin (August 2021). "WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections". Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Findings 2021. Online: Association for Computational Linguistics. pp. 193–209. doi:10.18653/v1/2021.findings-acl.17. Code, data and models.
^ Kaffee, Lucie-Aimée; Vougiouklis, Pavlos; Simperl, Elena (2022-01-01). "Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective". Semantic Web. 13 (2): 163–194. doi:10.3233/SW-210431. ISSN 1570-0844.
^ Taunk, Dhaval; Sagare, Shivprasad; Patil, Anupam; Subramanian, Shivansh; Gupta, Manish; Varma, Vasudeva (2023-04-30). "XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages". Proceedings of the ACM Web Conference 2023. WWW '23. New York, NY, USA: Association for Computing Machinery. pp. 1703–1713. arXiv:2303.12308. doi:10.1145/3543507.3583405. ISBN 9781450394161. Code and dataset.


Discuss this story


Machine learning research

Some interesting machine learning research utilising Wikipedia as a dataset there! --Vulcan❯❯❯Sphere! 10:12, 20 November 2023 (UTC)

"Canceling Disputes: How Social Capital Affects the Arbitration of Disputes on Wikipedia"

I find it puzzling that Grisel's ArbCom paper cites my old, not particularly relevant research from 2010 but omits a much more relevant 2017 paper ([1]). I do not want to toot my own horn, but I think my 2017 paper was the first and until now only serious piece dedicated to researching ArbCom. It's great to see a follow up - but it's weird how that follow up totally ignores the work that went before it. It does not inspire a ton of confidence in that paper, I fear. Well, either that, or my 2017 piece is just so bad it is not worth citing :> --Piotr Konieczny aka Prokonsul Piotrus | reply here 12:14, 20 November 2023 (UTC)
I read a preprint of the Cancelling Disputes paper (and haven't read the final so if there were large changes this comment won't reflect them) and found it reasonably true. ArbCom has an obligation to solve disputes the community can't and figuring out how to do that would include considering the social aspects of the community. But I would also note that its data set ends in 2020 and I think FRAM marked a major turning point in the committee and since then social capital seems to have meant less for better and worse (and in a more private setting I'd give examples of both). Barkeep49 (talk) 18:33, 20 November 2023 (UTC)
- @Bri, Barkeep49, and Piotrus: I too have noticed some changes in ArbCom since about 2020, or a bit before. In general, I think the changes are positive, but the jury is still out. But first I have to say that I consider the situation described by the author absolutely horrendous. The ideal is - correct me if I'm wrong - "equality for all under the law (rules)" and "just follow what the policies and guidelines say, nothing else is needed". What the author describes seems to be the exact opposite of that. I do have some sympathy for the idea that there's "obligation to solve disputes the community can't and figuring out how to do that would include considering the social aspects of the community." But what I see described is a free-for-all for finding favor (ffafff). What I'm starting to worry about is that ffafff is inherently the only thing that works. Much like the US legal system, where nobody anymore really believes that a poor black person will get the same result in a criminal trial as a rich white person (all else equal). We've got to try to be better than that. Smallbones_(smalltalk) 19:34, 20 November 2023 (UTC)
  I should have made clear in my initial comments that I think the "problem solving" focus of arbcom carries with it benefits and drawbacks and we should decide how much we're willing to live with those drawbacks or if we want to trade them for other ones. Best, Barkeep49 (talk) 20:01, 20 November 2023 (UTC)
  - I understand that ArbCom does not have an easy job and that there are some tradeoffs involved. I hope that if we diverge from our stated policies (and ideals) then this would be discussed in an RfC. But please do consider what would happen if we had to honestly tell new- and non-Wikipedians that they don't have to worry about all our detailed rules, because ... it's all just window dressing, or (maybe) it's all decided by who is in favor this month. Smallbones_(smalltalk) 21:17, 20 November 2023 (UTC)
  @Smallbones One of my to-do projects is to survey what parties of ArbCom think. It should shed some light. In either case, we all know that comparisons are not ideal. The equivalent of "poor blacks in US", i.e. new editors or IPs, don't even make it to ArbCom, they can be blocked early on, and the odds are their appeals will be simply ignored with no recourse. But that's not an ArbCom failing, just an admin-level or community-level one. That said, this drives the point of "editors are not equal" even further. Prior contributions and position in the community, which are related, matter. There is some relation to real world mitigating factor or aggravation (law) (NOTHERE, for example). It's complicated. Frankly, I don't think Arbitrators or AE admins are doing a great job, but at the same time, I think they are doing the best job possible (while I have my views on how to reform the system, who knows if this would actually help...). The Worst Form of Government adage comes to mind (and I guess another topic goes on my to do list for article creation :P...). Piotr Konieczny aka Prokonsul Piotrus | reply here 23:55, 20 November 2023 (UTC)
- I've always had the impression that arbitration on Wikipedia has nothing to do with fairness. It's just about reaching some stable state where people waste a bit less of their own and everyone else's time quarrelling and return to hopefully more productive activities like fixing typos. The articles will always be wrong anyway, it doesn't matter who actually held the WP:TRUTH in their pockets. An ArbCom reaches stability by beating people into submission, while the alternative is that people fight each other until exhaustion. Nemo 20:04, 20 November 2023 (UTC)
  - I hope nobody agrees with you that ArbCom is about "beating people into submission," or even that this might be better than the alternative of letting "people fight each other until exhaustion." Smallbones_(smalltalk) 21:17, 20 November 2023 (UTC)
    - What Smallbones said. It's too bad that one of our major institutions has an aura of combat/violence attached to it, even if jokingly. Really, when I think about institutions deliberately built around peace and nonviolence, I can only think of one: the Teahouse. I think, when writing this article, it dawned on me that I've gradually accepted a general background that is more oriented around the former and less the latter, and both resent that that was imposed on me, and am upset that I didn't see my own involvement in it sooner. I used to be quite active at WP:COIN which is probably one of our most combative fora, if not the most. But back to Arbcom, there are a lot of models for conflict resolution, and I wonder now if they would have been a better starting point than something aligned with something explicitly set up to denote a winner and a loser: the U.S. legal system. ☆ Bri (talk) 23:28, 20 November 2023 (UTC)
    Unfortunately, I do agree with Nemo that ArbCom is about "beating people into submission" until they don't care anymore or are banned. I've never been there myself, but I've read through some cases (I've not kept track of any, so no examples), and that is the exact impression I got. No one bothers de-escalating, or actually listening to each "party". Ciridae (talk) 08:08, 21 November 2023 (UTC)
    You may want to consider the source here... The Blade of the Northern Lights (話して下さい) 17:39, 22 November 2023 (UTC)
    What do you mean? Ciridae (talk) 10:12, 24 November 2023 (UTC)
    There's a fairly long history of people who hang around meta having serious personality clashes with en.wiki editors, what touched it off in this case was a long time ago but is clearly still lingering. The user rights logs at meta should point you in the right direction. The Blade of the Northern Lights (話して下さい) 22:28, 26 November 2023 (UTC)
    I assume you're talking about the incident which involved some en.wiki admin. Honestly I had forgotten it had to do with ArbCom, I won't check now. I think en.wiki has its own peculiarities and has developed its own ways to handle them, which are imperfect but may be the best we can come up with. It's ok as long as such practices aren't forcefully or mindlessly exported where they make less sense. Nemo 07:35, 28 November 2023 (UTC)
    
    If anything, that’s an understatement. Volunteer Marek 01:45, 23 November 2023 (UTC)
    
    Back in the pre-ArbCom Stone Age of Wikipedia, the preferred approach was to use de-escalation & moderation; very few, if any, were outright banned from editing back then. Unfortunately, this was abused by (if I may call them) troublemakers. (I can think of 2 or 3 notorious troublemakers from that period who could only be handled with a ban.) Jimmy Wales tried to step in to deal with these troublemakers, but "Jimbo doesn't scale" led to the creation of the ArbCom & the Mediation Committee. The latter never proved useful for one reason or another; an attempted revival was the "Mediation Cabal" of a few years ago, which also failed. I believe that those cases where some form of de-escalation is possible are handled long before they come to the ArbCom -- just my opinion, not based on any cases -- & the ArbCom is left with the cases where there are grounds to use the banhammer. Firmly. (Some people are simply not here to create an encyclopedia, & need to be encouraged to find another hobby.) -- llywrch (talk) 17:05, 27 November 2023 (UTC)
    That timeline seems a bit off. WP:MEDCOM was 2003–2015ish (officially closed in 2018 after being largely inactive for a while), while WP:MEDCAB was active from 2005–2012 rather than following MedCom. Anomie ⚔ 23:13, 27 November 2023 (UTC)
    I'm relying on my memory here, as a former member of MedCom. I don't remember MedCom being active after 2005; if it was, it was in a very minimal way, I regret to say. -- llywrch (talk) 23:51, 1 December 2023 (UTC)
Although "social capital" is linked to our article Social capital, I suspect that this research's use of the term corresponds instead to our Political capital. --2d37 (talk) 02:38, 1 December 2023 (UTC)

Wugapodes' comments on the Grisel paper

Like Piotrus and Barkeep49 I also read a preprint of the Grisel paper and provided a substantial review. I skimmed through the published version, and to the author's credit a number of issues were resolved with only a month lag between my comments and publication. However I still find the article lacking, and would say that the omission of Piotrus' work isn't the only gap in the bibliography. I went back to my notes on reading the preprint and referenced them with the present version. Below are some of the issues which I think remain.

I'm confused by the appeal to social capital, and I think the article would be improved by one of two interventions. The authors should either eliminate that discussion (in favor of an analysis of discourse and power a la Foucault which I think they do a good job of elsewhere in the paper) or engage more deeply with their analysis of social capital. On the assumption they do the latter, the authors would benefit from a deeper engagement with Bourdieu's social theory. The authors do not adequately define "social capital", and they seem to use it as a theory of everything even where Bourdieu's Forms of Capital points to better analytical descriptions for the phenomenon they describe. For example:

"the 'administrator rights removed' remedy is applied to parties who by definition possess high levels of social capital." Administrator rights are obviously a form of institutionalized cultural capital; as social capital is used to mobilize resources for particular actions, we actually would expect to see administrators with low social capital be more susceptible to desysopping. That this was not seen points to a methodological flaw, namely...
"I focus here on specific edit counts called 'Wikipedia' and 'Wikipedia Talk,' which provide an estimate for the number of edits made by a given editor on topics associated with the Wikipedia project itself, specifically its norms, policies, and governance" What is being operationalized is better understood in terms of Bourdieu's habitus and field w.r.t. embodied cultural capital. WP and WT namespaces constitute a particular field in which cultural capital is contested, and the number of edits in those namespaces more clearly maps onto how effectively an actor has embodied the intellectual dispositions of those within that particular field. Viewed from this lens, we can see that editors who make a lot of project-space edits are more likely to align with the intellectual dispositions of those in power eventually developing a habitus which embodies the cultural capital required for advanced participation in the field. Those in power, recognizing the embodied cultural capital, move to institutionalize that cultural capital through sysop rights. Social capital may mediate how quickly these things occur and how effectively one can mobilize support to ward off challenges, but it is an orthogonal process, which is why the operationalization produces the strange result in the previous bullet point.
"Indeed, this remedy exclusively targets 'administrators' who, due to their very functions, have high 'Wikipedia' and 'Wikipedia Talk' counts." This is an empirical claim that, from the above perspective of fields, needs more motivation. Project governance is only one field in which administrators work, and while there are many administrators who participate in the field defined by 'Wikipedia (Talk)', there are fields such as anti-vandalism and technical infrastructure that would actually privilege very different namespace distributions. For example, anti-vandalism would privilege mainspace (reverts) and user talk (warnings) while technical infrastructure would privilege Template and MediaWiki namespaces.
"According to the same arbitrator, one 'accumulates social capital by being a good contributive editor.'[...]One editor who was banned by the Arbitration Committee put it bluntly: 'Social capital is the basic currency for getting things done on Wikipedia.' When using these words, this editor signals the fact that a 'positive' contribution on Wikipedia crucially depends on one’s ability to gather troops in support of a version of the truth[...]." This section points to how the authors seem to be playing fast-and-loose with "social capital". The authors already point out that Bourdieu and Coleman have different definitions of social capital---which one we're using in the article isn't clear to me---and the use of the term by interviewees is accepted uncritically without establishing that they are using a term of art rather than a colloquialism. Take the first quote for example. Under both Bourdieu's and Coleman's definitions of social capital, that description of how social capital accumulates doesn't make much sense. Social capital is the network of relations one establishes and calls upon in order to achieve particular goals; the critical aspect being a network of social relationships. We have many editors who make positive contributions but who do not engage in the more social aspects of the website---the accumulation of social capital is definitionally through socializing. The quintessential counterexample is IP editors, who make many helpful edits but, because their transience leaves them lacking a "durable network of [...] mutual acquaintance and recognition", do not accumulate social capital despite their edits. What the quoted editor means when they say "social capital" is different from "social capital" as a term of art, and the author does not disentangle this linguistic confusion.
Even if we were to accept the first quote uncritically, the second quote (and the authors' interpretation) set up a paradox: if you get social capital by making good edits, but you need social capital to make good edits, how do any edits get made at all? There are various ways out of this paradox, but they require we assume that these editors are talking about different things despite both using the words "social capital". The paradox arises here because the authors seem to have forgotten that interviews are data and can't be treated as a coherent theory out-of-the-box.
The authors' selection of papers from the linguistic anthropology of verbal disputes is puzzling. It's not clear what the discussion of Berber or Eskimo song duels has to do with social capital. [this seems to have been clarified but the following point remains] The authors also seem to take a very narrow view of that wider literature. The selection seems to be in support of a somewhat rosy view of verbal duels, when the wider literature points to this genre of verbal duel being quite dangerous. In Irvine 1993: "Insult and responsibility: verbal abuse in a Wolof village", the ritualized insults achieve their effect of social cohesion through marginalization of those who transgress norms. The capital of the performers is mobilized to punish those who transgress social norms, and the fear of being insulted in these practices leads to compliance with social norms. Irvine points out that her participants recount one particular member who was insulted so frequently and so harshly that he died by suicide, and participants in the ritual blamed him for (1) breaking the social norms and (2) being unable to take the insults. Further, a major purpose of these performances is not simply to adjudicate disputes but to serve as a model and warning for the audience. The participants in these verbal duels are, to some degree, unimportant beyond how their treatment sets an example for the rest of the community. All that said, the review on pages 28-9 seems to treat these rituals as more benign than we know them to be cross-culturally, and focuses too closely on the individuals, neglecting why and how these performances operate in the wider context of a social structure. I'd strongly recommend browsing the 2010 special issue of J. Linguistic Anthropology which has a collection of articles dealing with cross-cultural methods of dispute and the contemporary literature around "verbal duels"; the introduction to the special issue---Pagliai 2010: "Introduction: Performing Disputes"---may be particularly helpful. Also consider Irvine 1974: "Strategies of Status Manipulation in Wolof Greetings" which may help connect this literature to theories of capital.
The paper engages with the role of bans and other sanctions but does not engage with the wider literature on the effects of bans on Wikipedia. For example, the authors discuss how the Committee might modify its behavior when restricting individuals, but do not engage with Ciampaglia 2011: "A bounded confidence approach to understanding user participation in peer production systems" or Rudas and Török 2018: "Modeling the Wikipedia to Understand the Dynamics of Long Disputes and Biased Articles" which develop agent-based computational models of Wikipedia to investigate and predict how individual interventions affect the wider system of production. Rudas and Török (2018) specifically investigate how particular methods of banning affect content bias. Engaging with this literature (especially since agent-based computational modeling is well suited to a Bourdieusian analysis) might provide empirical support for claims about the consequences of particular actions or inactions related to the role of capital at the Arbitration Committee.

— Wug·a·po·des 01:03, 21 November 2023 (UTC)[reply]

@Wugapodes A very solid review & critique, if I say so myself. It's a shame public reviews/discussions and the like are not clearly linked to the paper in most models of publishing. Oh well. _{Piotr Konieczny aka Prokonsul Piotrus| reply here} 01:59, 21 November 2023 (UTC)[reply]

I'm afraid that the pernicious element of habitus in making class-based hierarchies of privilege perennial has been underestimated in the defanged version of Bourdieu's notion:

we can see that editors who make a lot of project-space edits are more likely to align with the intellectual dispositions of those in power eventually developing a habitus which embodies the cultural capital required for advanced participation in the field.

translation into everyday English: contributors see that if they want to get anything done in the politics channel (namespaces #4 and #5) they have to be sure not to be seen as posing a threat to the long-standing "pecking" order. Those who seek power, one cannot help but observe, are quick to pose behind the quirky linguistic conventions of, for example, calling copyists "editors", or of calling challenges to their way of seeing "unhelpful", "disruptive", or even (in some cases) "batshit insane" (though such utterances were only permitted the reportedly-now-neutered god kings of yore).

In short, if noticeboards seem to have always been "thataway", perhaps it is because class-habitus reproduces "clueful" speech even unto Dr. Tarr, in the war-room, with a feather. ♫ -- SashiRolls ^{🌿 · 🍥} 13:32, 21 November 2023 (UTC)[reply]

I assume you are comparing Wikipedia's system of governance to The System of Doctor Tarr and Professor Fether. Smallbones_(smalltalk) 01:40, 22 November 2023 (UTC)[reply]

I'm impressed that you picked up the allusion, but be careful not to conflate creative misuse of names with comparison (which would be far more pointful or playless than what I intend). Had there been a cool song like "The System of Mr. Cirt and WifiOner" I probably would have referenced that instead. For an easy to understand example of Bourdieusian class-habitus, Google offers up the Dictionary of Sports Studies (§) as its first link. As for the Wobblies, not only does Jah have a fun new CD out, but there's also a fun related piece about Wikipedia and the problem of hysterical memory in this month's Harper's (§). Be careful not to approach it as if it were prose. -- SashiRolls ^{🌿 · 🍥} 03:47, 22 November 2023 (UTC)[reply]

I don't know just how much lasting value Ben Lerner's essay about Wikipedia has. My impression, having (admittedly hastily) read it, is that if one sucks up to the right people and gets admitted to the right colleges & other prestigious institutions, then one can spend one's days sharing one's "profound" thoughts, instead of spending the majority of one's time as Yet Another technology drudge to pay the bills & having only the outlet of editing Wikipedia in one's scattered spare time to express any intellectual ability. In other words, "social capital" (as seems to be the designated label for this immeasurable parameter) functions in the academic/intellectual world as it does on Wikipedia. Nonetheless, almost any Wikipedian could write an essay like Lerner's, & probably with more insight. (COI admission: many years before Wikipedia started I submitted some poetry to Harper's for publication, & those poems were rejected. Yes, they were bad poems, but some animus has likely remained.) -- llywrch (talk) 08:14, 23 November 2023 (UTC)[reply]

Fun find... I've uploaded Lerner's Harper's page to my Apple Books where it is now recursive. — Neonorange (talk to Phil) (he, they) 17:56, 25 November 2023 (UTC)[reply]

convenient break

@SashiRolls, Bri, Wugapodes, Piotrus, and JPxG: - Happy Thanksgiving to all. I doubt that I'll be here tomorrow, so I better get my ideas out now.

A. I don't really think of the Grisel paper as an academic paper, at least not like anything I've read before - I guess I haven't read many sociology or law or history papers. The lack of any meaningful stats has a lot to do with this. Using participants' quotes to judge what's happening falls into "journalism" in my experience. So I think we need to concentrate on the quotes if anybody wants to take this further.

B. Sashi (immediately above) brings up the Harper's [(§) short story] - which is yet another way to view the world. You may want to download the available PDF which I found much easier to read. In any case it should be required reading for admins, arbs, etc. Like Sashi suggests it is much more poetic and symbolic than the usual prose short story, but strip this off and it becomes an extraordinary document that's 95% about Wikipedia (and its place in the world). And it's by an extraordinary author Ben Lerner, a genuine certified MacArthur grant "genius" with half-a-dozen award-winning books, a Distinguished Professorship at Brooklyn CUNY, poetry editor at Harper's, so I think we need to take it seriously, but not literally. Stripped to the basics, it's just an ordinary story of a master with hundreds of socks, a few meatpuppets, two adminships, with their 1st employer being a liberal thinktank and the second being a billionaire. Normal stuff like that. All I can say is that you should read it. JPxG - you should definitely read the very end about the end of Wikipedia and Chatbots, and I don't think the Signpost has ever done a short-story review before, but would you mind such a submission for the next issue? Smallbones_(smalltalk) 19:06, 22 November 2023 (UTC)[reply]

@Smallbones Playing devil's advocate - that's qualitative research for you. _{Piotr Konieczny aka Prokonsul Piotrus| reply here} 08:05, 23 November 2023 (UTC)[reply]

@Piotrus: Yes, of course. I've even heard of it before! I was just saying that this version of it is nothing like anything I've seen before. The closest form to this that I'm comfortable dealing with is journalism. BTW something else in this month's Recent research that caught my eye was "Kimchi information" a term which I totally misinterpreted! Happy Thanksgiving. Smallbones_(smalltalk) 16:21, 23 November 2023 (UTC)[reply]

I don't think it is particularly insightful to state that the purpose of the committee is to end disputes. That's not a secret. WP:ARBGUIDE is quite clear about this and is often quoted in decisions: "Arbitration aims to "break the back" of the dispute." It has never been actual arbitration, but cultural inertia, or ennui, or whatever, makes it near-impossible to rename an institution on this website, so "Dispute-Ending Committee" doesn't really have a chance. I'm pretty sure we're also the only WMF website that still uses the term "oversight" instead of "suppression". I get that suppression has some really negative connotations, but the word oversight does not describe what the team does at all. Just Step Sideways ^{from this world ..... today} 22:32, 29 November 2023 (UTC)[reply]

Wikipedia biggest "loser" in recent Google Search update

If Google Search changes were so huge for Wiktionary, shouldn't the impact on pageviews be rather obvious? I sure can't see any obvious pattern in the pageview (unique devices) statistics for the English Wiktionary. Nemo 19:59, 20 November 2023 (UTC)[reply]
