Wikipedia:Wikipedia Signpost/Single/2012-05-28

The Signpost
Single-page Edition
WP:POST/1
28 May 2012

 

2012-05-28

Wikimedia Foundation endorses open-access petition to the White House; pending changes RfC ends

Obama petitioned on open access

Access2Research founders Heather Joseph, John Wilbanks, Michael W. Carroll, and Mike Rossner after a meeting at the White House Office of Science and Technology Policy
On May 25, the Wikimedia Foundation moved to endorse a petition to the White House calling for public access to journal articles resulting from research funded by US public sources. The campaign has already commanded close to 20,000 signatures.

The petition was initiated by the group Access2Research, whose members include the executive director of the Scholarly Publishing and Academic Resources Coalition (SPARC), Heather Joseph, law professor Michael W. Carroll, and Dr John Wilbanks of the Consent to Research project, a major medical data-sharing endeavour. In backing the petition, the WMF has joined a wide range of educational and research institutions and communities, like the Association of Research Libraries, Creative Commons, Harvard's Open Access Project, and digital communities such as Academia.edu.

Kat Walsh, a prominent Wikimedian, has signed the petition. She co-authored the foundation's endorsement announcement with senior research analyst Dario Taraborelli and general counsel Geoff Brigham. Kat told the Signpost that "we spend public money on research because it's important to everyone—why isn't it beyond question that the public should have access to it?" The WMF announcement points out that Wikipedia and the other projects hosted by the foundation depend heavily on verifiable, reliable sources, and that, where publicly funded research is concerned, its volunteers should be "empowered to read it, report on it, and cite it."

Access2Research petition signatures, 29 May. The green line marks the signature threshold above which the administration reviews the petition.
The key case study deployed by Access2Research in the petition is the Public Access Policy of the US National Institutes of Health, one of the world's major funding agencies. Heather Joseph told the Signpost that the current White House has had open access on its radar from its first month in office and has engaged with issues that research and open-access communities care about (see WMF response). She is confident that the administration will take action in response to a successful petition, either by means of executive action or by a positive response to legislative proposals by Congress.

Joseph pointed out that the petition demonstrates major public support, which is likely not only to lead to improvements in open-access policy but also, critically, to exert a positive influence on consideration of the proposed Federal Research Public Access Act (FRPAA), previously put to Congress in 2006, 2010, and again this year. The FRPAA would require 11 US federal science agencies to deposit articles on research they have funded into publicly accessible archives; the articles must be maintained and preserved by that agency or by another repository that permits public access. Articles must be made available "gratis" to users within six months. The legislation commands bipartisan support in both houses of Congress, and would complement executive action with a legislative framework that could not easily be rolled back by a later administration.

Beyond the US, the open-access debate is moving in a similar direction. Prominent mathematicians are calling for a boycott of the Dutch publisher Elsevier, the biggest player in the medical and scientific literature market, and Jimmy Wales has been appointed to advise the UK government on open access (Signpost coverage). Joseph pointed out that the Foundation's endorsement matters not just because the Foundation is a major player, but also because elected representatives remember the Wikipedia community's response to the proposed SOPA (Signpost coverage), and because of the public attention Wales attracts as a public figure.

To have an impact, the petition needs at least 25,000 signatures by June 19. Anyone who is at least 13 years old, US citizen or not, can sign it.

Pending changes RfC finally over

RfC outcomes: oppose (blue); support (orange); accept tool but reject draft policy (yellow)

After a protracted 60 days, the request for comment on the pending changes feature (Signpost coverage) ended May 22 without a clear preliminary result. While the final administrative evaluation of the process is still under consideration, the sheer numbers indicate higher participation than in the last RfC on the issue in 2011.

This time, there were three options: to oppose (1) or support (2) the feature as such, or (3) to accept the tool but reject the current draft policy. According to figures published by a community member, the numbers are as follows:

While 3% of the RfC participants (17 users) supported option 3, the opposing camp rallied 35% (178) and the support camp 61% (308). The support option also received the highest relative level of support among reviewer user right holders within its own voting bloc (63%, 193 users), but the lowest proportion among editors with fewer than 1,000 edits (15%). All options attracted a large number of comments and justifications.
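The published shares can be re-derived from the raw counts. A minimal Python sketch, assuming the 503 votes cast for the three options (17 + 178 + 308) form the denominator, which matches the percentages quoted above:

```python
def shares(counts):
    """Each option's share of all votes, rounded to whole percent."""
    total = sum(counts.values())
    return {option: round(100 * n / total) for option, n in counts.items()}


# Counts as published by a community member.
votes = {"support": 308, "oppose": 178, "accept tool, reject policy": 17}
print(sum(votes.values()))     # 503
print(shares(votes))           # {'support': 61, 'oppose': 35, 'accept tool, reject policy': 3}
print(round(100 * 193 / 308))  # 63: reviewer right holders within the support bloc
```

The rounded shares sum to 99% rather than 100%; that is a rounding artifact, not missing votes.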

Last year the third phase of the pending changes trial ended with a closure that delivered just over 66% support for the proposal (127 ayes to 65 nays), as well as concluding that no consensus to keep the feature had been established. Additionally, two caveats, each related to a set of BLP-related issues and articles, received some support.

This year's RfC aimed to follow up on the 2011 results and to reassess the tool that was temporarily taken out of service in response to those results. However, in terms of participation, both proceedings fall significantly below the level of activity generated by the 2008 vote on the German Wikipedia, the largest project using the more restrictive flagged revisions. The German community, which regularly decides project governance issues by vote-only procedures rather than deliberative RfCs, voted in favor of their version of the tool by 53.7% (638 of 1189 votes) and has abided by this decision to this day.

A co-ordinating administrator of the English Wikipedia RfC, Fluffernutter, stated that no fixed target date has been set for the administrative closure of the RfC.

In brief

Editor satisfaction by project (Q1b, base: 5,911). Note that the y-axis does not begin at zero.
  • Terms of use update implemented: The new terms of use (Signpost coverage) have been implemented on schedule, May 25.
  • Editor satisfaction (WESI): The fourth release of findings from last year's editor survey (Signpost coverage), published on the foundation's blog, highlighted that the majority of responding editors were satisfied with the environment provided by Wikipedia. However, there appear to be differences between language versions, with German- and Japanese-speaking editors at the lower end of the field.
  • Cairo education pilot reaches endgame: The WMF's Cairo education program pilot, which pioneers Arabic-language outreach in higher education, comes to an end in June. The foundation's blog states that the initiative has been a "huge success" so far. The Signpost will publish a special report on the pilot in July, once all results have been published.
  • FDC update: Discussions on how to design the Funds Dissemination Committee (Signpost coverage) reached a new stage on May 25, when a process update was published on Meta. Among other things, it states that under the FDC draft, five of the nine voting FDC members will be elected by the community and the remaining four appointed by the WMF's Board of Trustees.
  • German Wikipedian defies criminal charge on controversial content: A prominent German Wikipedian, Achim Raschka, reports in the Signpost's German counterpart Kurier that he faced criminal charges under the German law that covers individual liability in relation to the use of pornographic content, for his use of a Commons video in the history section (Geschichte) of the pornography article on the German Wikipedia. The video, a historical document shot in c. 1925, features nuns and monks engaging in sexual acts. The prosecutor decided not to proceed on the grounds that the infraction was minor, but did not address the wider issues raised by jurisdictional inconsistency on the internet.
  • Czech Ambassador Program results: The educational cooperation projects between universities and Wikipedia in the Czech Republic have produced more than 100 articles over the last year. A member of the Czech Wikimedia chapter reports on the WMF's blog that the program focuses largely on the hard sciences and aims for 150 to 200 articles this term.
  • GLAM conference in Barcelona: An international delegation of Wikipedians took part in the MuseumNext 2012 conference on social media in museums in Barcelona. The GLAM newsletter points out that they could rely on a theme-specific lounge organized by Amical Viquipèdia to represent the movement's activities.
  • New administrators: The Signpost welcomes Jenks24 as a new administrator.


2012-05-28

Supporting interlanguage collaboration; detecting reverts; Wikipedia's discourse, semantic and leadership networks, and Google's Knowledge Graph

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

Discourse on Wikipedia sometimes irrational and manipulative, but still emancipating, democratic and productive

An article[1] in sociology journal The Information Society looks at interactions between Wikipedia editors and the project's governance, visible in the articles on stem cells and transhumanism, and in the analysis of Wikipedia's discussion of userboxes, all through the prism of Jürgen Habermas' universal pragmatics and Mikhail Bakhtin's dialogism theories.

The authors focus on a qualitative analysis of the language used by editors to argue that Wikipedia has elements of a democracy and is an example of an empowering Web 2.0 discourse tool. They stress that some forms of online discourse (including on Wikipedia) may be highly irrational, something often ignored by previous arguments that Web 2.0 is a democratic space, but they argue that this is in fact less of a hindrance than previously expected. Cimini and Burr remark that discourse can develop between Wikipedians of widely differing points of view, and that some editors will engage in "repeated, strategic, and often highly manipulative attempts" to assert personal authority. Such discussions may be very lively, involving "personal, emotional, or humour-based arguments", yet the authors argue that such comments may not be a hindrance; instead, "on many occasions, there is thus a clearer exposition of views that is achieved, in spite of, or perhaps because of, these personal [and] sometimes vulgar methods of argumentation."

In the end, the authors are positive about the success of Wikipedia's deliberation in reaching consensus, although they say that it can be "fleeting and transitory" on occasion. Unfortunately, the paper does not touch on Wikipedia policies such as Wikipedia:Civility and Wikipedia:No personal attacks, which would certainly have added to their analysis.

Although the paper states that the research was approved by a university research ethics committee, it critically discusses the postings of specifically named editors ("[Editor A's] claim to authority and ad hominem attacks were met with derision by [Editor B]", with names replaced by the Signpost), which may raise eyebrows. Not all editors are 100% anonymous, which raises the question of whether the researchers did enough to protect the identity and reputation of the editors they cite. At the very least, why weren't the editors' usernames changed in the quotes? Their direct identification adds nothing to the article and may expose the users to attack. (Similar questions have been discussed in the past by members of the Wikimedia Foundation Research Committee.)

Different language Wikipedias: automatic detection of inconsistencies

In a paper presented at the 4th International Conference on Intercultural Collaboration (ICIC),[2] Kulkarni et al. offer a simple approach to support the work of Wikipedia editors who maintain articles concerning the same topic in multiple language versions. The long-term goal is to implement a bot that supports these specialized users by highlighting missing attributes and content inconsistencies.

The analysis focused on a pairwise comparison of infoboxes in different languages. First, the attribute-value pairs were extracted from the infoboxes and translated into English via Google Translate. Matching attribute names were identified through direct text comparison, supplemented with a set of synonyms obtained from WordNet (this step was included to handle mismatches caused by translation errors and variations). In a second step (the matching of attribute values), the authors again used direct text comparison and also checked whether the values could be identified as homophones, to exclude mismatches caused by spelling mistakes in the text.
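The two matching steps can be sketched in Python. This is not the authors' code: the hand-written synonym table is a hypothetical stand-in for WordNet, both infoboxes are assumed to be already translated into English, and Soundex is used here as one common "sounds-alike" key because the paper's exact homophone test is not described.

```python
# Hypothetical stand-in for WordNet: canonical attribute name -> synonyms.
SYNONYMS = {
    "population": {"inhabitants", "residents"},
    "founded": {"established", "formed"},
}

SOUNDEX_CODES = {ch: digit
                 for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                        "4": "l", "5": "mn", "6": "r"}.items()
                 for ch in letters}


def soundex(text: str) -> str:
    """American Soundex key: words that sound alike tend to share a key."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return ""  # purely numeric values get no phonetic key
    key, prev = letters[0].upper(), SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = code
        if len(key) == 4:
            break
    return key.ljust(4, "0")


def canonical(attribute: str) -> str:
    """Map an attribute name onto the head of its synonym set, if any."""
    name = attribute.strip().lower()
    for head, synonyms in SYNONYMS.items():
        if name == head or name in synonyms:
            return head
    return name


def values_match(a: str, b: str) -> bool:
    """Direct text comparison first, then a homophone check for spelling variants."""
    a, b = a.strip().lower(), b.strip().lower()
    key = soundex(a)
    return a == b or (key != "" and key == soundex(b))


def compare_infoboxes(reference: dict, other: dict) -> tuple[list, list]:
    """Return attributes missing from `other` and attributes whose values disagree."""
    other_by_name = {canonical(k): v for k, v in other.items()}
    missing, inconsistent = [], []
    for name, value in reference.items():
        match = other_by_name.get(canonical(name))
        if match is None:
            missing.append(name)
        elif not values_match(value, match):
            inconsistent.append(name)
    return missing, inconsistent


# Hypothetical infoboxes for an invented city.
english = {"Population": "1,250,000", "Founded": "1850",
           "Mayor": "Jon Smith", "Area": "120 km2"}
translated = {"inhabitants": "1,250,000", "established": "1805", "mayor": "John Smyth"}
print(compare_infoboxes(english, translated))  # (['Area'], ['Founded'])
```

The misspelled mayor's name is accepted as a homophone, while the differing founding year is flagged. Applying Soundex to whole multi-word values is a crude simplification; numbers, units and abbreviations need their own handling, which matches the limitations the authors report.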

The data set used to evaluate the whole pipeline comprised articles from the English, German, Chinese and Hindi Wikipedias in two restricted domains: Indian cities and US-based companies. The evaluation revealed "a significant increase in recall after the concepts of homophones and synonyms were applied in addition to the direct text comparison." However, the overall results were very weak, mainly due to translation errors. The authors noticed syntactic and semantic differences between the infoboxes, such as paraphrasing or different representations of the same fact. "Also, abbreviations, unit conversion and geographic location matching [were not handled by their system]." The researchers plan to improve the system by addressing each of these issues in turn.

Finding deeper meanings from the words used in Wikipedia articles

An undergraduate computer science honors thesis at Trinity University (Texas) constructs a semantic graph from 451 articles linked from the World War II article.[3] Ryan Tanner's goal is to produce a visualization "which allows one to quickly find and examine connections between the people, places and things described in Wikipedia". The process is as follows:

  1. Import SQL dump from the Wikimedia Foundation into a local database
  2. Strip wiki markup from the articles using Bliki
  3. Parse articles with the Stanford NLP library, using dependency grammars to extract facts and simplify sentences
  4. Parse the output from the Stanford library using Scala
    1. Read a Stanford XML file into a collection of models.
    2. Produce abstractions for named entities and locations.
    3. Input models into the algorithm developed for this thesis (see Chapter 7)
  5. Store results in a database.
  6. Traverse the resulting graph and produce user-presentable output.
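Step 6, traversing the graph to show how entities are connected, can be illustrated with a breadth-first search. This is a hypothetical Python sketch, not code from the thesis: it assumes the earlier steps have already produced (subject, relation, object) facts, and the sample facts are hand-written for illustration.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]  # (subject, relation, object)


def build_graph(facts: list[Triple]) -> dict[str, list[Triple]]:
    """Index every fact under both entities it mentions."""
    graph = defaultdict(list)
    for fact in facts:
        subject, _, obj = fact
        graph[subject].append(fact)
        graph[obj].append(fact)
    return graph


def find_connection(graph, start: str, goal: str):
    """Breadth-first search for the shortest chain of facts linking two entities."""
    reached_via = {start: None}  # entity -> fact through which it was first reached
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if entity == goal:
            path = []
            while reached_via[entity] is not None:
                fact = reached_via[entity]
                path.append(fact)
                subject, _, obj = fact
                entity = subject if obj == entity else obj  # step back to the other end
            return path[::-1]
        for fact in graph.get(entity, []):
            subject, _, obj = fact
            neighbour = obj if subject == entity else subject
            if neighbour not in reached_via:
                reached_via[neighbour] = fact
                queue.append(neighbour)
    return None  # no chain of facts connects the two entities


facts = [
    ("Winston Churchill", "prime minister of", "United Kingdom"),
    ("Franklin D. Roosevelt", "president of", "United States"),
    ("Winston Churchill", "attended", "Yalta Conference"),
    ("Franklin D. Roosevelt", "attended", "Yalta Conference"),
]
for subject, relation, obj in find_connection(build_graph(facts), "United Kingdom", "United States"):
    print(subject, relation, obj)
```

Breadth-first search returns a shortest chain, which keeps the explanation a reader sees as brief as the graph allows; false positives in the extracted facts would, of course, show up directly in such chains.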

Originally the goal was to visualize the whole of Wikipedia; however, due to problems with the dump, only 250,000 articles out of about 1.5 million were imported. An even smaller subset was ultimately usable, since the Stanford NLP library crashed on many of the remaining articles due to markup issues and the need for manual cleanup. To ensure a dense graph, tests were focused on the network of the World War II article. Some brief examples of the resulting graph are given in Chapter 10, which notes false positives as one problem requiring further investigation. The author makes suggestions for future research, such as using the Simple English Wikipedia or more complex relations.

How leaders emerge in the Wikipedia community

A paper titled "Leading the Collective: Social Capital and the Development of Leaders in Core-Periphery Organizations"[4] looks at how leaders emerge in Wikipedia and similar crowd-based organizations. Although such projects are often seen as egalitarian, with little hierarchy, they always have a group of leaders who have emerged from the community (the "crowd") and are involved in planning, mediation, and policy development. The authors treat Wikipedia and similar organizations as core–periphery networks, following a model developed by Steve Borgatti: a system with a densely interconnected center and a poorly connected periphery. In Wikipedia, the leaders (the "core") comprise the most active contributors, and the authors assume they produce the most social capital. Using social network analysis, the paper looks at the interpersonal ties between editors, focusing on the ties between leaders and the periphery. The hypothesis is that specific types of ties have a greater influence on advancement to leadership.
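The core–periphery idea can be made concrete by comparing tie densities within the core, between core and periphery, and within the periphery. The Python sketch below is a simplified illustration of that pattern only, not Borgatti's model-fitting procedure and not the authors' method; the core membership and the small tie list are hypothetical.

```python
from itertools import combinations

BLOCKS = {2: "core-core", 1: "core-periphery", 0: "periphery-periphery"}


def block_densities(nodes, ties, core):
    """Share of possible ties actually present in each block of a core/periphery split."""
    present = {frozenset(tie) for tie in ties}  # ties are treated as undirected
    observed = dict.fromkeys(BLOCKS.values(), 0)
    possible = dict.fromkeys(BLOCKS.values(), 0)
    for a, b in combinations(sorted(nodes), 2):
        block = BLOCKS[(a in core) + (b in core)]  # number of endpoints in the core
        possible[block] += 1
        observed[block] += int(frozenset((a, b)) in present)
    return {block: observed[block] / possible[block] if possible[block] else 0.0
            for block in observed}


# Hypothetical network: three mutually connected "leaders", three loosely attached editors.
nodes = ["A", "B", "C", "D", "E", "F"]
core = {"A", "B", "C"}
ties = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "E"), ("C", "F")]
print(block_densities(nodes, ties, core))
```

A fully connected core, sparse core-to-periphery ties and no ties within the periphery is the idealized shape the model describes; real talk-page networks would be far noisier.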

The authors collected data from RfA pages and measured ties through user-talk-page interactions. Leaders were defined as admins, and periphery editors as non-administrators; this operationalization may raise doubts about its validity, since some very active and prominent members of the community are not admins, something the authors do not address. The authors find that the most important ties are, early on, those to the periphery, and later those to the leaders. Overall, strong ties are not as important as weak ties, although Simmelian ties (between pairs of leader groups) are among the most important.

Collier and Kraut conclude that leaders in projects such as Wikipedia do not suddenly appear; instead, they evolve over time through their immersion in the project's social network. Early in their experience, those leaders gain a deeper understanding of the community, developing a network of contacts through their weak ties to the periphery; later, their most important ties are to the leaders, particularly in the form of strong connections to a leader group.

Identifying software needs from Wikipedia translation discussions

A paper[5] presented at an international conference on intercultural collaboration aims "to identify the type of community interaction needed for successfully creating or amending an article via Wikipedia translation activities", and proposes new software tools to facilitate these interactions. To this end, the researchers from Kyoto University analyzed 1694 talk-page comments from three Wikipedias, belonging to articles in categories marking (partial or complete) translations (e.g. fr:Catégorie:Projet:Traduction/Articles_liés): 228 articles from the Finnish, 93 from the French, and 94 from the Japanese Wikipedia. They attempted to categorize (code) each comment according to which "activity" it referred to (either editing the article or translating it), about which "context" it was referring to (using the categories "content", "layout", "sources", "naming", "significance" and "wording"), and which action was intended (requesting or providing help, requesting an edit, announcing an edit that the user had made, criticizing the article without a direct request for action, coordinating actions between users, or referring to an established Wikipedia policy).
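The coding scheme amounts to tagging each comment along three dimensions and then counting tag frequencies per Wikipedia. A minimal Python sketch, with hypothetical comments and a data layout of our own choosing (the category names follow the paper; the short action labels are our own shorthand):

```python
from collections import Counter
from dataclasses import dataclass

ACTIVITIES = {"editing", "translating"}
CONTEXTS = {"content", "layout", "sources", "naming", "significance", "wording"}
ACTIONS = {"request help", "provide help", "request edit", "announce edit",
           "criticize", "coordinate", "refer to policy"}


@dataclass(frozen=True)
class CodedComment:
    wiki: str      # language edition, e.g. "fi", "fr" or "ja"
    activity: str  # what the comment is about doing
    context: str   # which aspect of the article it concerns
    action: str    # what the commenter intends

    def __post_init__(self):
        # Reject typos in hand-coded data instead of silently counting them.
        if (self.activity not in ACTIVITIES or self.context not in CONTEXTS
                or self.action not in ACTIONS):
            raise ValueError(f"unknown code in {self!r}")


def context_shares(comments, wiki, activity):
    """Percentage of each context among one wiki's comments on one activity."""
    counts = Counter(c.context for c in comments
                     if c.wiki == wiki and c.activity == activity)
    total = sum(counts.values())
    return {context: 100 * n / total for context, n in counts.items()}


# Hypothetical hand-coded comments, for illustration only.
sample = [
    CodedComment("ja", "editing", "layout", "announce edit"),
    CodedComment("ja", "editing", "layout", "request edit"),
    CodedComment("ja", "editing", "content", "criticize"),
    CodedComment("ja", "translating", "naming", "request help"),
]
print(context_shares(sample, "ja", "editing"))
```

Percentages such as those quoted below are shares of this kind, computed per Wikipedia and per activity.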

Regarding comments focused on the activity of editing, the "results were consistent with previous research, with a high frequency of discussion contributions about content and layout". The authors found that "the Japanese Wikipedia was the only one with more discussion contributions about layout than content when the discussion was about editing activities (40.18%)" and speculate that this is because "in the older, or larger, Wikipedias, practices and policies are likely to be better established than in the younger, or smaller, Wikipedias leading to a lower frequency of discussions about layout." (However, they later point out that the Finnish Wikipedia, rather than the Japanese, is the smallest and youngest among the three examined ones, noting that it shows a much higher frequency of discussion about policy—15.0%, versus 6.0% on the French and 3.3% on the Japanese Wikipedia.) In this class of comments, "discussions about citing sources were relatively common in the Finnish and French Wikipedias (18.8% and 12.4%, respectively). In the Japanese Wikipedia, sources were less common with 7.1% of all discussion contributions regarding editing activities."

Most discussions about translation activities were about naming, that is, "resolving the proper form for the title of the article, section or sub-section, names or proper nouns, and transliteration in the corresponding article", contrary to the researchers' initial hypothesis that such discussion would "have a high frequency of contributions regarding translation of specific words and expressions" (their "naming" category "does not include phrasing or resolving proper translation of individual words or expressions"). As one reason, they identify "the diversity in naming practices of events between different language sources, such as mass media. Especially in the Finnish Wikipedia, discussion about sources was common (16.15%). These two topics are loosely related, as direct translations of the names of well-known events are often not acceptable in the target language Wikipedia."

Having identified naming issues and the search for suitable sources in the target language as "key problems" emerging in the translation discussions, the authors conclude that "the current approaches for supporting Wikipedia translation are not necessarily solving the main problems in Wikipedia translation" and proceed to suggest two "directions for designing supporting tools for Wikipedia translation, especially through open source development of MediaWiki extensions":

  • "Support for consistent translation of names and proper nouns", e.g. by making a "user editable multilingual dictionary resource" directly accessible in the design, and enabling editors to "coordinate through discussion pages directly related to a specific dictionary or dictionary entry in order to resolve inconsistencies in a centralized repository"
  • "Support for citing sources in translated articles", by offering an automatic search for sources that have themselves been translated into the target language and/or the development of a supporting "crowdsourcing translation tool for open content sources not available in the target language using machine translation"

The paper makes references to previous work on Wikipedia translation (including the authors' own), but does not mention the EU-supported CoSyne project, which aims to integrate tools with MediaWiki that "automate the dynamic multilingual synchronization process of Wikis" and would seem to have a lot of overlap with the kind of tools discussed in the paper.

New algorithm provides better revert detection

A paper[6] by three researchers affiliated with the EU-supported RENDER project (to be presented at next month's "Hypertext 2012" conference) promises "accurate revert detection in Wikipedia". The article starts by describing the detection of reverts as "a foundational step for many (more elaborated) research ideas, [whose] purposeful handling leads to a superior understanding of wiki-like systems of collaboration in general", and gives an overview of such research. (Revert detection has also been used in tools for the editing community, such as this one, which identifies articles on the German Wikipedia that are currently controversial.)

Surveying the "state-of-the-art in revert detection", the authors criticize the prevalent "identity revert detection method" (SIRD), which relies on finding identical revisions using MD5 hashes, arguing that it does not fully match the definition of a revert in the (English) Wikipedia's policies at Wikipedia:Reverting: the SIRD method "does not require the reverting edit to actually undo the actions of an edit identified as reverted ... [Furthermore, it] is not possible to indicate if the reverting edit fully, partly or not at all undid the actions of the reverted edit ... It also does not require the intention of the reverting edit to revert any other edit." (Still, mainly due to requests by researchers, SHA-1 hashes of each revision's text have recently been integrated directly into the revision table stored by MediaWiki, necessitating considerable technical effort when updating the existing databases for Wikimedia projects.)
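For comparison, the identity revert method the authors criticize can be sketched in a few lines of Python: hash every revision's full text and, whenever a revision's hash matches that of an earlier revision, treat the revisions in between as reverted. This is our own simplified illustration of the technique (using MD5 from hashlib), not code from the paper:

```python
import hashlib


def identity_reverts(revisions):
    """Identity revert detection: list of (reverting index, reverted indices).

    A revision counts as a revert when its text hashes identically to an
    earlier revision; every revision in between counts as reverted.
    """
    last_seen = {}  # MD5 digest -> index of the latest revision with that text
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:  # skip null edits (identical consecutive revisions)
            reverts.append((i, list(range(j + 1, i))))
        last_seen[digest] = i
    return reverts


history = ["Paris is the capital of France",
           "Paris is the capital of France and its largest city",
           "Paris is the capital of Germany and its largest city",  # vandalism
           "Paris is the capital of France and its largest city"]   # exact restore
print(identity_reverts(history))  # [(3, [2])]
```

Revision 3 restores revision 1 exactly, so revision 2 is flagged. Nothing in the check asks whether revision 3 was meant to undo revision 2, and a fix that also changed any other text would not be detected at all; these are the gaps the authors point out.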

The paper then presents the authors' new method for revert detection, which still aims to detect full reverts and avoid false positives while coming closer to the Wikipedia community's definition. It is implemented as an algorithm that splits the revisions' wikitext into word tokens (and is available online as a Python script). MD5 hashes are still used at the paragraph level to detect unchanged paragraphs easily and speed up computation. A panel of Wikipedians recruited on the English Wikipedia then compared the algorithm's results with those of the existing SIRD method.
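The central question behind such a method, whether a later edit actually undoes an earlier one, can be illustrated with word tokens and Python's standard difflib. This is a deliberately simplified sketch under our own assumptions (whitespace tokenization, bag-of-words comparison, no paragraph-level hashing), not the authors' published script:

```python
from collections import Counter
from difflib import SequenceMatcher


def token_changes(old, new):
    """Word tokens added and removed when going from `old` to `new`."""
    a, b = old.split(), new.split()
    added, removed = Counter(), Counter()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.update(a[i1:i2])
        if tag in ("insert", "replace"):
            added.update(b[j1:j2])
    return added, removed


def fully_reverts(edit, later_edit):
    """True if `later_edit` removes every token `edit` added and restores every token it removed.

    Each edit is an (old_text, new_text) pair.
    """
    added, removed = token_changes(*edit)
    later_added, later_removed = token_changes(*later_edit)
    # Counter subtraction keeps only positive counts, so an empty result means "fully covered".
    return not (added - later_removed) and not (removed - later_added)


r0 = "Paris is the capital of France"
r1 = "Paris is the capital of Germany"                      # vandalism
r2 = "Paris is the capital of France and its largest city"  # fix plus new text
print(fully_reverts((r0, r1), (r1, r2)))  # True, although r2 differs from r0
```

Unlike the identity method, this catches a revert that also adds new text. difflib documents SequenceMatcher as quadratic in the worst case, so token-level comparison of this kind becomes expensive on long diffs, much like the computational drawback the authors report.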

As summarized by the authors, this user study found the new method to be "more accurate in identifying full reverts as understood by Wikipedia editors. More importantly, our method detects significantly fewer false positives than the SIRD method [27% in the sample, which however was somewhat small]". As a drawback, the authors note "the increased computational cost. As [the new algorithm] is quadratic over the number of words in the DIFFs [the changed text between subsequent revisions], in its current implementation it might not be the tool of choice if larger amounts of articles are to be analyzed; especially in the case of complete history dumps of the large Wikipedias, e.g., English, German or Spanish."

Briefly

The history of art mapped using Wikipedia (visualization of wikilinks between "art-historical actors" spanning at most 75 years, from Goldfarb et al.)
  • The history of art mapped using Wikipedia: A paper by four researchers from Vienna, to be presented at next month's WebSci 2012 conference,[7] examines the wikilinks between 18,002 Wikipedia articles about artists (or more precisely "art-historical actors", derived from the English Wikipedia via DBpedia), from the present day back to ancient Greece. A first result appears to confirm the assumption that artists are more likely to be influenced by or related to their contemporaries: "the number of short links covering 0–37.5 years clearly outnumbers the sum of all the other .... This can be interpreted as such that contemporaries are much more likely to be interlinked than persons who are generations apart". They present a visualization of the link graph colored by nationality of the person, which "reveals interesting patterns of cultural interaction within the network, as they are perceived by the English speaking Wikipedia community: The left side ... is dominated by Italians (green). This cluster spans Renaissance and Baroque times, fading out by the end of the 17th century. A small cluster on the lower left represents German Renaissance around Albrecht Duerer (black) ... The rightmost part represents Post-Modernist Americans, with a nationality-independent cluster of Architects beneath."
  • The use of references in Wikipedia coverage of current events: On the blog of Ushahidi,[8] Wikimedia researcher Heather Ford described preliminary "key findings" from an ongoing project examining the use of sources in Wikipedians' work on current events such as the 2011 Egyptian Revolution: "1. The source [original version of the article and its author] of the page can play a significant role"; "2. Primary sources are gradually replaced by secondary sources"; "3. The cite is not always the same as the source" ("the citation that editors use to back up a particular phrase are not always the same as the source from which they receive their information"); and "4. The blurring of boundaries along traditional 'reliable sources' lines." Her "design recommendations include the design of source management systems around the kind of collaboration that is already working on Wikipedia: where editors collaborate around specific news stories, checking to see whether the source actually reflects the information in the article, whether the source is accurately contextualized, whether other media verify the facts in the article and whether there is any accompanying multimedia."
  • Distribution of article title lengths: A statistical analysis of the length of the more than 40 million article titles on all Wikipedias (including redirects) found that 90% are shorter than 32 characters and 98% are shorter than 53 characters. The blog post[9] by Denny Vrandečić, head of the Wikidata development team—who generated those stats to inform some design decisions for this project—provides charts of the length distribution for each language, exhibiting some interesting differences and similarities (e.g. the distributions for the English, German, French, Polish and Russian Wikipedias, as well as the overall one, peak around 13 characters).
  • To understand a Wikipedia article, which others does one need to read first?: A paper titled "Crowdsourced Comprehension: Predicting Prerequisite Structure in Wikipedia"[10] starts from the assumption that "the primary reason that technical documents are difficult to understand is lack of modularity: unlike a self-contained document written for a general reader, technical documents require certain background knowledge to comprehend—while that background knowledge may also be available in other on-line documents, determining the proper sequence of documents that a particular reader should study is difficult". Developing a method to address this problem, using five Wikipedia articles as examples (global warming, meiosis, Newton's laws of motion, parallel postulate and public-key cryptography), the researchers analyzed the structure of wikilinks, whether pages had been edited by the same users, and the page text itself, and had Mechanical Turk workers decide in advance, for many pairs of linked articles (within a subject domain), whether one was a prerequisite for understanding the other. They conclude that "while it is not immediately obvious that this task is feasible, our experiments suggest that relatively reliable features to predict prerequisite structure exist, and can be successfully combined using standard machine learning methods".
  • High-conflict areas may deter uninvolved users: A student thesis from Macalester College, titled "Characterizing Conflict in Wikipedia" [11] examines editing disputes between Wikipedians that concern several articles, pointing out that much of the previous research has only looked at such conflicts one article at a time. The analysis involved clustering 1.4 million articles. Among the conclusions is that "The vast majority of conflicts are very small, but there are still thousands of conflicts involving at least one hundred users. Conflicts between small numbers of users, or with small numbers of reverts, tend to span only one article, whereas larger conflicts tend to span more than one article." Also, within a conflict cluster, "contributions from users uninvolved in conflicts are even lower than those involved in conflicts. This indicates that users may be deterred from contributing to areas with high concentrations of conflict."
  • The vandalism revert and other temporal motifs, and their change from 2001 to 2011: A paper presented at ICWSM '12[12] looks—like several other recent papers—at the bipartite graph of editors and the articles they have edited, but enriches it "with temporal information of both who edited the article [discerning bots, IP editors, and admins], and how the article was changed [. This] enables discovering meaningful editing behavior in the form of network motifs. These temporal motifs are repeated subgraphs of the editing graph which correspond to significant patterns of collaborative interactions." (The concept of network motifs is popular in bioinformatics, where it is applied to gene regulatory networks. See also the review of an earlier paper applying a simpler kind of motif to analyze the editors-articles graph: "Collaboration pattern analysis: Editor experience more important than 'many eyes'".) Motifs involving just a single author were the most frequent. As an example of the patterns that become visible by including temporal information, among the multi-author motifs those involving a revert "occur much faster, with 6,558 of all 13,961 such motifs having a median time under 5 minutes ... The strong correspondence between reverting an edit and combating vandalism suggests that such short durations are due to active participation by Wikipedia community members, such as the Counter Vandalism Unit, which actively monitors recent revisions for potential vandalism". The authors then look at how the frequency of their motifs has changed over the history of Wikipedia from 2001 to 2011, and find that "the trends suggest that the early growth was fueled by content addition from single authors or collaborating between two authors (B) and contributions from administrators. These early behaviors have given way to increases in behaviors associated with editing (A) and maintaining quality or vandalism detection (D)."
  • The Wikipedia research behind Google's new Knowledge Graph?: On May 16, Google introduced its Knowledge Graph, a semantic network drawing information from many different sources including Wikipedia, which Google uses to enhance its search engine results with semantic information, often appearing to include excerpts from the infobox and lead section of a particular Wikipedia article in the top right corner of the results page. Two days later, Google Research announced a paper by two Google employees titled "A Cross-Lingual Dictionary for English Wikipedia Concepts"[13] describing the construction of "a resource for automatically associating strings of text [such as search terms] with English Wikipedia concepts", considering "each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL". The resulting dataset is available for download and described as having been "designed for recall [rather than precision]. It is large and noisy, incorporating 297,073,139 distinct string-concept pairs, aggregated over 3,152,091,432 individual links".
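To make the idea of "string-concept pairs aggregated over links" a little more concrete, here is a minimal Python sketch under a simple assumption: count how often each piece of link text points to each article URL, then turn those counts into the probability of a concept given a string. The function names, the lower-casing normalization and the sample links are hypothetical illustrations, not the authors' actual pipeline or the format of the released dataset.

```python
from collections import Counter, defaultdict


def normalize(text):
    # Hypothetical normalization: case-fold and collapse runs of whitespace.
    return " ".join(text.lower().split())


def build_dictionary(links):
    """Aggregate (anchor_text, article_url) pairs into string -> concept counts.

    `links` is a hypothetical iterable of link records; each Wikipedia article
    URL stands for one concept, following the paper's framing.
    """
    counts = defaultdict(Counter)
    for text, url in links:
        counts[normalize(text)][url] += 1
    return counts


def concept_probabilities(counts, text):
    """Return [(url, P(url | text)), ...], most likely concept first."""
    # dict.get does not trigger the defaultdict factory, so lookups of
    # unseen strings leave the dictionary unchanged.
    concepts = counts.get(normalize(text), Counter())
    total = sum(concepts.values())
    if total == 0:
        return []  # string never observed as link text
    return [(url, n / total) for url, n in concepts.most_common()]


if __name__ == "__main__":
    sample_links = [  # made-up example links
        ("Jaguar", "https://en.wikipedia.org/wiki/Jaguar"),
        ("jaguar", "https://en.wikipedia.org/wiki/Jaguar"),
        ("Jaguar", "https://en.wikipedia.org/wiki/Jaguar_Cars"),
    ]
    dictionary = build_dictionary(sample_links)
    for url, p in concept_probabilities(dictionary, "JAGUAR"):
        print(f"{p:.2f}  {url}")
```

With the made-up links above, "JAGUAR" maps to the animal article with probability 0.67 and to the car maker with 0.33. Keeping every observed pair, however rare or noisy, rather than pruning low counts is roughly what "designed for recall" suggests, though the paper's exact scoring may differ.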

References

  1. Cimini, N., & Burr, J. (2012). An Aesthetic for Deliberating Online: Thinking Through “Universal Pragmatics” and “Dialogism” with Reference to Wikipedia. The Information Society, 28(3), 151–160. Routledge. doi:10.1080/01972243.2012.669448 (closed access)
  2. Gurunath Kulkarni, R., Trivedi, G., Suresh, T., Wen, M., Zheng, Z., & Rose, C. (2012). Supporting collaboration in Wikipedia between language communities. Proceedings of the 4th international conference on Intercultural Collaboration – ICIC ’12 (p. 47). New York, New York, USA: ACM Press. doi:10.1145/2160881.2160890 (closed access)
  3. Tanner, R. (2012). Creating a Semantic Graph from Wikipedia. Computer Science Honors Theses. Paper 29. http://digitalcommons.trinity.edu/compsci_honors/29/ (open access)
  4. Collier, B., & Kraut, R. (2012). Leading the Collective: Social Capital and the Development of Leaders in Core–Periphery Organizations. Physics and Society. http://arxiv.org/abs/1204.3682 (open access)
  5. Gurunath Kulkarni, R., Trivedi, G., Suresh, T., Wen, M., Zheng, Z., & Rose, C. (2012). Supporting collaboration in Wikipedia between language communities. Proceedings of the 4th international conference on Intercultural Collaboration – ICIC ’12 (p. 47). New York, New York, USA: ACM Press. doi:10.1145/2160881.2160890 (closed access)
  6. Flöck, F., Vrandečić, D., & Simperl, E. (2012). Reverts Revisited – Accurate Revert Detection in Wikipedia. HT’12, June 25–28, 2012, Milwaukee, Wisconsin, USA. (open access)
  7. Goldfarb, D., Arends, M., Froschauer, J., & Merkl, D. (2012). Art History on Wikipedia, a Macroscopic Observation. WebSci 2012, June 22–24, 2012, Evanston, Illinois, USA. (open access)
  8. Ford, H. (2012, May 17). Update on the Wikipedia sources project. Ushahidi.com. (open access)
  9. Vrandečić, D. (2012, May 10). Distribution of title lengths in Wikipedias. simia.net. (open access)
  10. Talukdar, P. P., & Cohen, W. W. (2012). Crowdsourced Comprehension: Predicting Prerequisite Structure in Wikipedia. 7th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL 2012. (open access)
  11. Miller, N. (2012). Characterizing Conflict in Wikipedia. Honors Projects. Paper 25. http://digitalcommons.macalester.edu/mathcs_honors/25 (open access)
  12. Jurgens, D., & Lu, T.-C. (2012). Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia. ICWSM '12. (open access)
  13. Spitkovsky, V. I., & Chang, A. X. (2012). A Cross-Lingual Dictionary for English Wikipedia Concepts. Eighth International Conference on Language Resources and Evaluation (LREC 2012). (open access)



2012-05-28

Experts and enthusiasts at WikiProject Geology

Baltic amber with an ant inclusion
Opal from Yowah, Queensland, Australia
Hoodoos in Bryce Canyon, Utah, United States
A geologist's hammer sitting on an outcrop of Ordovician oil shale in northern Estonia
The Black Stone is an Islamic relic that serves as a cornerstone of the Kaaba
The Mancos Shale in the western United States
A concrete water tank partly crushed by lava in Vestmannaeyjar, Iceland
A hot spring near Mount Meager in British Columbia, Canada
Hyperthermophiles produce some of the bright colors of Grand Prismatic Spring in Yellowstone National Park
Castle Geyser in Yellowstone National Park

This week we spent some time with WikiProject Geology, a project dedicated to the study of the solid Earth, the rocks of which it is composed, and the processes by which it evolves. The project dates back to April 2007 and has grown to include 33 Featured Articles, 1 Featured List, 29 Featured Pictures, 2 A-class Articles, and 45 Good Articles. Project members work on a variety of open tasks, participate in peer reviews, maintain a list of resources, and showcase their achievements. We interviewed RockMagnetist, Graeme Bartlett, Mav, MONGO, Chris.urs-o, Bejnar, Mikenorton and Awickert.

What motivated you to join WikiProject Geology? Do you have an educational or professional background in geology?

RockMagnetist: I am a geophysics professor. As my user name implies, my field is Rock magnetism. One day, I Googled the subject and found that the top hit was this Wikipedia article. Pretty sad! Then I discovered that when I searched any subject in geophysics, one of two things was likely to happen – either the top hit was a Wikipedia page, or there was no Wikipedia page on the subject. So I got a Wikipedia account and started editing. The most relevant wikiprojects for geophysics are Geology and Physics, so I joined both, boldly stating that I intended to create a Paleomagnetism WikiProject. I immediately got a nice welcome message from Mikenorton pointing out that there wasn't even a Geophysics WikiProject yet, and very few geophysicists were actively editing. Indeed, I'm pretty sure there isn't another professional geophysicist actively editing.
Graeme Bartlett: I am an interested person with no professional experience. I only studied geology in year 12. However, my enthusiasm was encouraged by two geologists I knew as a teenager. I signed up to WikiProjects after writing some content related to them.
Mav: Historical geology fascinates me, especially the astounding immensity of deep time and the fact that over 99% of all species that ever lived on Earth are extinct and can only be known through a tragically incomplete fossil record. It has therefore been quipped that, to a first-order approximation, all life on Earth has been dead for millions of years. Learning about geology has been a life-long obsession that has involved a huge investment of time and money on personal and school field studies, book purchasing and reading. Yet, I didn't want to become a professional geologist for some reason I can't quite put my finger on (ah, I remember, it was all the calculus and quantitative analysis classes that were required). Instead, I pursued a degree in biology with a minor in geology and taught myself about the geology of places I love to visit. When I found Wikipedia in January 2002 I finally had an outlet to share some of my knowledge and at the same time improve my grasp of the subjects I care so much about; for there is no better way to learn something than to consult multiple sources and write about it in your own words.
MONGO: I founded WikiProject Glaciers in 2006, a sister project to WikiProject Geology. I have a minor in Geology but my main interest has always been geography since I find it easier to write about geolocations such as a specific mountain than the more complex task of writing about geology.
Chris.urs-o: I'm an engineer interested in the Yellowstone hotspot, the San Andreas Fault, the New Madrid Seismic Zone, etc. Geology is a broad subject. I have edited Large volume volcanic eruptions in the Basin and Range Province and Timeline of the development of tectonophysics, and I'm updating List of minerals (complete), for instance.
Bejnar: I have a minor in Geology and it has been a life-long interest. When I started editing on Wikipedia, there was a natural gravitation to working on articles where I knew something. Unfortunately, since so much on Wikipedia needs attention, I frequently find myself scattered.
Mikenorton: I'm a professional geologist; I got involved when the major oil company I was working for wanted to expand parts of their own internal wiki. I started editing Wikipedia geology articles for 'practice', but got 'sucked in' to becoming a regular contributor.
Awickert: I'm a graduate student in geology; a few years ago, I decided that I should help with some Wikipedia articles, and that's what I did!

The project is home to 34 pieces of Featured content and 47 Good and A-class articles. Have you contributed to any of these? What are the biggest challenges to improving geology-related articles to FA or GA status?

RockMagnetist: Not yet. I've been mostly building articles from scratch. However, recently I selected a couple of articles that seemed to be nearly GA - good content, well written, plenty of citations - and worked to get them ready for a nomination. Much to my surprise, there was a lot of work to do. The big job was checking the citations and making sure they really support the content. Surprisingly often, they didn't, and I had to hunt down sources or change the content. Of course, none of that is unique to Geology.
Graeme Bartlett: I have worked on very few of these high-standard articles, as the quality requirements, especially for FA, are too demanding. I have assessed one of the GA-standard ones; my input may be in some of the GA or FA pages, but only to a limited extent. Getting to GA is more difficult for me since I do not have access to appropriate references. There is so much other work to do getting material in for missing topics that getting to GA is something for later.
Mav: Depending on how one measures it, I am largely responsible for getting 10 or 11 WikiProject Geology articles featured. My first was Yellowstone National Park but that article has been expanded and improved so much, especially by MONGO, that it bears little resemblance to the paltry version I helped get featured in March 2004. My second, geology of the Bryce Canyon area, started a series of articles I wrote on the geology of several protected parklands in the United States. Each of those is still largely written and maintained by me. The biggest challenge to improving any article to FA status is now the amount of time it takes to do so; several hours of good writing using a few sources was all that was needed back in 2004 but now it seems to take about 10 times that effort once all the changes suggested or demanded in Peer Review and FAC are satisfied. In fact, most of my FAs wouldn't be good enough for GA or even B-class now if they weren't improved over the years. I guess this is a good problem to have since it means our standards have improved. But it sure does make things a bit less fun.
MONGO: I would concur with Mav's assessment about FA-level work...the standards have gotten stricter, especially regarding prose and Manual of Style idiosyncrasies. Geology is a complex discipline that isn't always easy to write about since one must have a broad knowledge base in so many other sciences. Putting the information together is an exceptionally time-consuming venture. My expertise has been primarily in geography-related articles...and the vast majority of my work has been stubs which delineate specific mountains, lakes and geographical areas such as parks. However, geology articles rarely need significant updating since the science doesn't change much unless there is, as in the case of an article on a volcano, an eruption. The main issue is creation of new articles which better document this complex arena....and further expansion of this material can be done as time permits and further reference material becomes available.
Chris.urs-o: No, I didn't. I did a clean-up on plate tectonics, though. And the lists of eruptions (see list of volcanoes) provided one of the backbones for list of largest volcanic eruptions. Encyclopædia Britannica (11th ed., 1911) was written before the plate tectonics model was accepted, and computer models (plate reconstruction) have since become much better. So the content of some geology articles has had to be updated. You need a solid background to understand the scientific papers dealing with geology.
Bejnar: I didn't do any significant work on any of the Geology FA or GA articles. Just an occasional minor edit. In part, that is because my own feeling is that it is better to spend time improving stubs, starts and Cs to B status, than to FA or GA status, because one gets more bang for the buck.
Mikenorton: I've contributed to a few GA and FA articles in a minor way, but I mainly aim at filling gaps and expanding stubs (including the ones that I've created).
Awickert: I contributed to a few volcanism-related ones thanks to the contagious enthusiasm of User:Ceranthor and some significant help from User:Carcharoth; the most memorable was the preparation of David A. Johnston for the 30th anniversary of the eruption of Mount St. Helens back in 2010 (wow - 2 years already?).

A lengthy category tree is maintained by the project. Has this been helpful in building and organizing Wikipedia's collection of geology articles? Does Wikipedia have any glaring holes in its coverage of geology?

RockMagnetist: Answering the question backwards: Geophysics was a hole in coverage. The main article was not much more than a list, and some major divisions of geophysics (for example, Mineral physics and Geophysical fluid dynamics) had no article. (It wasn't all bad, for example Earth's magnetic field and Geomagnetic reversals were pretty good.) I found category trees very useful in writing Geophysics because I wanted to write it in such a way that it linked to as many existing articles as possible.
Graeme Bartlett: Coverage of regional geology is quite poor, and I would like to see articles for each country. This is an area I will contribute more content to in future. Many geological terms should have short articles to explain and illustrate them. But there seem to be many people who would prefer to merge these into bigger articles.
Mav: As is common in Wikipedia, coverage is uneven and tends to be better on focused articles versus articles about more general subjects. Category trees help to expose the systemic bias that naturally arises in any project that grows as a result of volunteer effort to write about subjects the author finds interesting. Adding content to content-poor areas, especially general topics, is the glaring hole that needs to be filled.
Chris.urs-o: Voluntary work is a rare thing, university level voluntary work is very rare, indeed.
Bejnar: Yes, the tree helps. The area is full of gaps, although mostly at this point because the value articles are less than Bs. Biography is weak, probably because it is held as less important in hard science areas. I agree that the more general articles are harder to research (that's library research) and write than the specific ones.
Awickert: I find it very helpful to mentally organize the work to be done, and (echoing above comments) to find holes in which some work is needed.


The project has an active peer review department. How useful has this been in generating feedback for improving articles? With so many abandoned peer review departments at other WikiProjects, what has been the secret to WikiProject Geology's peer review process?

RockMagnetist: The secret is that I revived it a few weeks ago! Someone asked for a peer review on the talk page, and I realized that the project was not making a clear distinction between a request for feedback and a peer review. So I rewrote the peer review page after looking at how some other projects did it. I don't know if it made any difference.
Graeme Bartlett: I admit I have never used it, but there are several committed people here who step in for all the new articles to improve things.
Mav: Purely WikiProject-specific PRs tend to limit the discussion to the WP members and they die quickly as a result. While that makes sense for an A-class review or nomination, which is supposed to be content focused, it doesn't really make sense for a PR, which is a request for suggestions for improvement. Content-related suggestions can just as easily be made during a general PR as in a WP PR but with the added benefit of having non-specialist (and more) eyes look over the work. I'm glad to see that the re-organized PR dept simply transcludes the standard general PR onto a WikiProject subpage. This strikes the right balance for me even though a link to the general PR via an announcement (à la WP:ELEMENTS/A) may be just as effective but with less overhead.
MONGO: The best way to get adequate PR for any article is to find neutral persons that either have interest in the article content and/or have knowledge of MOS issues, prose and fact checking. A number of bots are available to do reference and citation checks to help speed things up and anyone can run these. I have generally found WikiProject PR set-ups to be less than helpful.
Bejnar: I've looked at the PR requests, but none of them were in my areas of interest. However, I did make a couple of edits in passing to improve them.


WikiProject Geology has several daughter projects and task forces. Are you involved in any of them? Do they regularly communicate and collaborate with WikiProject Geology?

RockMagnetist: I wasn't aware of the daughter projects until I got inspired to reorganize the WikiProject pages. Both are pretty much dead now, but in the past they were responsible for a lot of Geology's best articles and a nice set of timelines for organizing geological time units. Daughter projects, particularly Earthquakes and Paleontology, deserve much of the credit for many of our GA and FA articles.
Graeme Bartlett: I questioned the existence of some of these, but if there is an active group of people working on them then why not?
Chris.urs-o: I'm involved with WikiProject Geology through WikiProject Volcanoes and WikiProject Rocks and minerals, I use their talk pages. We're just a few, we talk with each other.
Mikenorton: I'm the coordinator for WikiProject Earthquakes and I spend a lot of my time on earthquake related articles.
Awickert: The daughter projects help to parse out organization and expertise, and there is good communication. The whole community at geology is very friendly and good at communicating and collaborating.

What are the project's most pressing needs? How can a new member contribute today?

RockMagnetist: I think we need more people to contribute significant amounts of content. The great majority of edits seem to be tweaks, and a number of top importance articles have languished in Start-class for years. There are lots of ways people can contribute: we have suggestions on the main page, the open tasks page and the top of the talk page. If I were to single out one article for improvement, I would pick Rock (geology). This article was in Version 0.7 and is a supplemental core article, yet much of its content was taken from a 1911 edition of Encyclopedia Britannica!
Graeme Bartlett: Another area we are missing is pictures of equipment, or rocks viewed through microscopes with different optical tools. I should be encouraging those who have access to labs to take photographs to put on Commons!
Mav: All period and larger divisions of geologic time need to be A-class or better and we need good articles for each geologic division down to the epoch level wherever there is good source info. All those articles need to explain in general detail the distribution of continents and biomes that prevailed during those times. Further, the migration through time of each craton (the more-or-less stable cores of continents) needs to be documented in an easy-to-understand manner within the limitations of the source data. Finally, detailed geologies of each continent need to be fleshed out. That will give us a good backbone to build on when writing about how specific geologic features fit into the larger historical geology context.
Chris.urs-o: WikiProject Rocks and minerals has one-line stubs (Category:Rocks and minerals articles needing attention).
Mikenorton: There are many, many gaps in regional coverage and a lot of the basic articles need plenty of work. We need more pictures, particularly in my areas of interest of seismic reflection data, but there are very few public domain images available. It's difficult to produce a meaningful article on contourites, for example, without relevant images.
Awickert: Inside the project, we need significant content work and figures in core articles. On the periphery, we need to make sure that articles that incorporate geology have information that is correct. To the latter point, I would encourage anyone who is concerned about the factual accuracy of material that they are adding that is related to geology (or Earth science in general) to drop a message on the project talk page or on the user talk of a particular member.


Anything else you'd like to add?

Graeme Bartlett: Last year I assisted a plate tectonics class writing Wikipedia articles. Hopefully there will be a new group of students this year.
RockMagnetist: I have been involved in a number of wikiprojects, and I like Geology the best. There are frequent discussions and the contributors get along well. I have not encountered edit wars, mass AfDs, significant incivility or any of the other problems that often beset projects.
I would like to see the Recent changes page extended to allow searches by wikiproject. Tim1357's Wikiproject Watchlist is good, but a uniform interface would be better, and I would like to see how large each change is.
Mav: Wikipedia is no longer a village where people with completely different interests can feel as if they are in a single community with known members; it became a bustling and nearly faceless metropolis several years ago. That really did hurt the general feeling of community and shared purpose. Yet WikiProjects preserve many aspects of that lost general feeling. They create forums where known participants can interact, agree on standards and use those standards to build something they are all proud of. WikiProject Geology is such a forum and I'm glad it has been maintained as such over all these years.
Chris.urs-o: Articles on science & technology need editors with an academic degree. These people have studied for around 15 years, and one hour of their time is costly. An anonymous IP address that keeps changing can do a lot of damage. I think that the voluntary work of these people needs more protection. Professors (Zyzzy2, Rasteraster, MaxWyss, for instance) have had their images (some of them fair-use images) deleted. It isn't Wikipedia's brightest hour, I'm not amused ;) Amazon has a verified identity option, something to think about ...
Mikenorton: I attended a workshop organised by the Geological Society of London and Wikimedia UK a few months ago. I hope that some of the attendees will be inspired to start contributing.
Awickert: I have found that as time has passed, more professional academics seem comfortable with Wikipedia and the general content that it offers in geology; this is encouraging!


Next week's project will showcase Wikipedia's artistic side. Until then, explore our gallery of previous Reports in the archive.


2012-05-28

Featured content cuts the cheese

This week's edition covers content promoted between 19 and 26 May.
The French cheese coulommiers, the lesser-known cousin of Brie, has a similar buttery colour and supple texture with perhaps a nuttier flavour. It is made from unpasteurised cow's milk, usually in the shape of a disc with a white, bloomy edible Penicillium candidum rind. A newly featured picture.
Alec Douglas-Home, seen here in 1986, was Prime Minister of Britain from 1963 to 1964, when he resigned because of illness. His was the second briefest premiership of the twentieth century, lasting two days short of a year. From the newly featured article on his life.
German singer Oceana performing at the Radio Hamburg Top 820. In 2010 she finished sixth in the Polish version of Dancing with the Stars and hosted the TV show ARTE Lounge on the European channel Arte. A new featured picture.

Four featured articles were promoted this week:

  • Alec Douglas-Home (nom), by Tim riley. Home (1903–1995) was a British Conservative politician of noble birth who served as Prime Minister from October 1963 to October 1964. He is best known for his two spells as the UK's foreign minister, in which position he worked on relations with the Soviet Union and issues with the former colony Rhodesia. He retired in 1974.
  • Henry Wrigley (nom), by Ian Rose. Wrigley (1892–1987) was a flight pioneer who made the first trans-Australia flight in 1919. A member of the Australian Flying Corps since World War I, in 1921 Wrigley helped form the Royal Australian Air Force. Holding increasingly senior positions, by World War II he was an air commodore and in charge of organising the newly established Women's Auxiliary Australian Air Force. He retired in 1946.
  • Romances (nom), by Magiciandude. Romances is a 1997 studio album by Mexican singer Luis Miguel. The third in a series of studio albums, Romances was recorded in early 1997 at the Ocean Way recording studio in Los Angeles, California. It was a critical and commercial success, selling over 4.5 million copies, and Miguel received praise for his vocals and the song selection.
  • Edmund Sharpe (nom), by Peter I. Vardy. After studies throughout Europe, Sharpe (1809–1877) became an architect in Lancaster, working mainly on churches. Afterwards, he was elected to the town council and spearheaded an effort to improve the quality of the water supply. He also achieved national recognition with his histories of architecture, including detailed architectural drawings.

One featured article was delisted:

Six featured pictures were promoted this week:

This newly featured picture, a chromolithograph by the Swedish-born illustrator Thure de Thulstrup, depicts the Battle of Spotsylvania, the second major battle in Lt. Gen. Ulysses S. Grant's 1864 Overland Campaign of the American Civil War.



2012-05-28

Fæ case and GoodDay request for arbitration, changes to evidence word limits

The Arbitration Committee closed no cases last week and opened one case; another is pending review. (This was the first time in 22 months that there were no pending cases.)

Open cases

(Week 1)

A case has now been opened concerning alleged misconduct by Fæ. This follows a submission for a case by MBisanz two weeks ago that was rejected on the basis that other dispute resolution forums had not been explored. In his statement, MBisanz claims that "Fæ has rendered himself unquestionable and unaccountable regarding his conduct because he responds in an extremely rude manner that personally attacks those who question him." He alleges that Fæ mischaracterises commentary about his on-wiki conduct as harassment, further stating that while "Fæ has been treated poorly by some users off-wiki (and possibly on)", his violent responses to commentary about him on-wiki "has become the issue itself."

In Fæ's response, posted on his behalf by clerk Guerillero, he noted that MBisanz's writings on Wikipediocracy during May about a planned private meeting with Gregory Kohs "should be of interest to many and appears to directly relate to the nature of his complaints about matters off Wikipedia." In his statement, Themfromspace states that "views at the RFC were divided over the legitimacy of Fae's adminship when it was alleged that heleft [sic] his previous account "under a cloud". Questions were raised about the scope of ArbCom's involvement in the RFA (Fae stated that it was sanctioned by ArbCom; John Vandenberg stated that Fae was mistaken and that only he endorsed the RFA)."

Moreschi advised Fæ to step away from external websites, adding that "50 percent of what people say about you at WR et al is simply driven by hurt vanity: 40 percent is based on misinformation provided by those of the hurt vanity, and 10 percent (at best) might be fair criticism of some validity." He argued that if Fæ "can't filter out the white noise", he should not read the threads at all and should continue "working quietly here without starting vast drama-filled BADSITES AN threads in which you then go make yourself look awful."

Anthonyhcole asked the committee to accept the case, provided they manage its pages "for relevance and civility." He notes that Fæ abandoned his earlier account, claiming to be leaving the project during an RfC/U where the likely outcome would have been "to sanction him in the area of BLPs." The committee, however, agreed to a clean start, and in his RfA Fæ stated that he had changed his name after an RfC/U[1] and that he had never been blocked or sanctioned under the earlier name.[2] Cole continued, saying that "this implied, to the !voters at his [Fæ's] RfA, that the RfC/U had found nothing sanctionable", adding that Fæ probably would not have passed if !voters had been aware of the circumstances in which he left the RfC/U.

Cole asked the committee to "address Fæ's fitness to edit BLPs", which he said "is still an open question." He argues that the committee should have stipulated that "he return and complete the RfC/U before agreeing to a clean start." Cole added that, given the misleading evidence Fæ supplied in his RfA, the right decision would be for him "to ask the community to reconfirm his adminship. It is argued that the value he adds to the project as an admin is too great to jeopardise with a reconfirmation RfA."

Pending cases

Steven Zhang has submitted a case for review into the disruptive editing of his mentoree GoodDay in the use of diacritics. GoodDay, who is topic-banned from articles pertaining to the UK and Ireland, broadly construed, believes that diacritics should not be used in articles as they are not part of the English language. In his statement, Zhang says that "at times he is rather uncivil when discussing his objections with other editors. When questioned on his edits, he will often remove the comments from his talk page, citing harassment."

In response, GoodDay remarks that "there's nothing for me to add here, except that folks should take a look at the English alphabet." In their statement, Resolute says s/he and GoodDay have both agreed and disagreed on certain points over the years in the ice hockey project—in particular, on the use of diacritics: "we used to agree but now disagree. I don't know much about his conflicts in the realm of the British Isles, but his attitude around diacritics has become increasingly combative as of late in my view."

In brief

  • The committee has resolved by motion that users who are named parties and submitting evidence must limit their submissions to 1000 words; all others will have a 500 word limit. Clerks may refactor submissions significantly over the limit at their and the committee's discretion.
  • The committee has also resolved by motion to amend finding of fact 2.5 of the Race and Intelligence Review to read that "Mathsci has engaged in borderline personal attacks and frequent battleground conduct."
  • A request for comment into the expansion of the Ban Appeals Subcommittee is now underway in an attempt to address concerns that have "been raised regarding the feasibility of electing additional community members, given the traditionally low number of viable candidates in prior elections."
  • A request for comment into the effect of arbitration processes on editor retention is now underway. As the Signpost reported two weeks ago, the complexity of rules and processes and inadequacy of mechanisms dealing with problematic editors may be factors leading to decreased editor activity; because the arbitration process impacts both of the concerned areas, improving it to reduce the negative impact on editor retention is a vital step towards meeting the strategic goals of the editor retention effort.


2012-05-28

Developer divide wrangles; plus Wikimedia Zero, MediaWiki 1.20wmf4, and IPv6

English Wikipedians discuss editor–developer divide

A minor change—tweaks to the default heading used at the top of diff pages—provoked a long debate on the English Wikipedia when it went live this week. The discussion focussed on an issue that has bubbled to the surface intermittently for the past few years: as the MediaWiki developer base professionalises, are developers becoming less responsive to English Wikipedian demands?

Most developers would agree that editors of the English Wikipedia are given less priority than they used to be. There are simply more projects than there used to be, and more languages to support on each of them. Staff development projects are far more likely to target "newbie" editors than existing stalwart editors (a decision that seems to have significant support, given this week's poll results, below); design choices are increasingly being made in the name of helping the former, potentially at the expense of upsetting the latter. Needless to say, decisions that fit such a paradigm (including the recent diff colours switchover) have not proved universally popular.

The horrific technological inertia that is developing within the community is only going to lead to two possible outcomes ... Either the developers abandon any hope of satisfying the community and stop bothering to even try to engage with it, or they stop trying to develop beneficial features at all.

—Happy-melon

Ultimately, a number of viewpoints emerged from the resulting discussion. They centre on two questions: firstly, whether developers are targeting the "wrong" things, and secondly, whether they should be expected to communicate their changes better. Both have proved to be contentious issues. Equazcion, in proposing the former, talked of developers implementing their "own whims regarding what is best for the community"; but such a critique relies on a view of the community as a superior judge of what is best for itself and its future members, rather than as an insider group keen to resist any kind of novelty. Moreover, volunteer developers, much like Wikimedians who work in an idiosyncratically narrow area, are likely to resist any attempt to tell them which new features they can and cannot work on, especially since virtually all such features will have been proposed by some community or other at some point.

The issue on which a consensus is more likely to form is the need for better communication between developers (who frequent the wikitech-l mailing list, MediaWiki.org, and Bugzilla, which, in unrelated news, was down for considerable periods this week) and editors who frequent their home wikis. When pushed for comment on the thread, WMF developer Ryan Kaldari was the first to admit that, despite the amount of time WMF developers were putting into communicating with communities, more could still be done. "Right now", he wrote, "there are so many different venues for discussion it's rather unmanagable [and] we have a very hard time getting people to beta test things for us. ... It seems no matter where we advertise it, we generally only get significant community feedback after the features are deployed".

The issue is not restricted to the English Wikipedia, although that is certainly where it is invoked most frequently. By contrast, members of smaller wikis are more likely to complain not that too many changes are being forced on them, but rather that too few are made: that their many feature requests are simply never acted on because they are neither WMF strategic priorities nor aligned with the personal interests of volunteer developers. The difficulty for WMF development coordinators undoubtedly lies in addressing all of these multifarious complaints simultaneously and without trade-off.

In brief

Signpost poll
Long-term threats
You can now give your opinion on next week's poll: What's your take on developer–user misunderstandings?

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.

  • MediaWiki 1.20wmf4 begins deployment: The fourth Wikimedia deployment from the MediaWiki 1.20 branch began today, with an additional two weeks' worth of bugfixes and other incremental improvements going live to MediaWiki.org and two test wikis. Among the 230 changes included in the deployment, two are of the small, visible kind that has caused so much controversy in the past couple of weeks: the first excludes immovable namespaces from Special:MovePage's namespace selector; the second broadens Special:Shortpages to include pages from all content (non-talk) namespaces, rather than just article space. All non-Wikipedia sites are next in line to receive the changes, on or around May 30; the English Wikipedia will receive them on June 4 and other Wikipedias on June 6, barring major problems.
  • UploadWizard updated: A series of updates went live to Wikimedia Commons' Upload Wizard this week, including "disabling multiple-file selection in browsers where it didn't work", tweaking the configuration such that the "'skip tutorial' step ... won't come back every time you switch computers any more", and ensuring that "files now start uploading immediately when selected" (wikitech-l mailing list). The changes, which include background support for so-called "upload campaigns" such as Wiki Loves Monuments, have largely been welcomed by the Commons community. It was noted that very large uploads (those greater than the previous limit of 100 MB) continue to fail on a regular basis, a major disappointment for those seeking to take advantage of the new, much greater upload limit of 500 MB.
  • Character support interview: Localisation team member Gerard Meijssen has used a post on his personal blog to publish an interview with Unicode specialist Michael Everson. In it, Meijssen and Everson discuss the prospects for improved web-font support (particularly the eradication of the ☐☐☐☐☐☐ ☐☐☐☐☐-phenomenon) on Wikimedia wikis. Everson noted, however, that it could be necessary to establish, either unilaterally or multilaterally, a body concerned with the creation of freely licensed, web-friendly typefaces. "Right now", Everson wrote, "anyone viewing any Wikipedia in any language may encounter text in Ol Chiki, or in Runic, or in the simple International Phonetic Alphabet, and pages have to apologize to the reader because their computer may not display the material correctly".
  • Wikipedia Zero launches in Malaysia: Malaysia has become the first Asian country, and the third country in the world, to have a national mobile provider offer Wikipedia free of charge. According to a post on the Wikimedia blog, "Digi’s 10 million customers can read as many Wikipedia articles as they like (provided they have an internet-capable phone), in any language, through the Opera Mini browser [...]; free access applies to the lightweight, text-only mobile version of Wikipedia, which Digi customers can now access by going to zero.wikipedia.org". Service in Tunisia and Uganda was launched earlier this year, and further networks will be added later in the year, should negotiations go well. At the very least, Digi is part of the already signed-up Telenor group, suggesting that Telenor's other 115 million customers may soon be enjoying the same service.
  • Wikimedia wikis to take part in World IPv6 Day: The foundation will attempt to take part in this year's World IPv6 Day, it was revealed this week. Deputy Director Erik Möller, answering a question from the community about the event (scheduled for June 6), outlined how the WMF was "planning to do some limited production testing at the Berlin Hackathon (June 1–3) and on IPv6 Day, but we'll only keep it enabled if the issues are manageable". Last year the foundation was forced to back out at the last minute due to problems with the database schemas of several extensions (see previous Signpost coverage). These have now been updated in readiness for the day, which last year was supported by a long list of major websites including Facebook and Google. It has been pointed out, however, that full IPv6 support on Wikimedia wikis would require large-scale "reeducation" of admins and the rewriting of a considerable number of user scripts, though such a move is regarded as inevitable in the longer term.
  • One bot approved: One BRFA was recently approved for use on the English Wikipedia:
  1. Joe's Null Bot: once-daily null edits (see WP:NULLEDIT), in the form of special purges with the "forcelinkupdate" option set, applied to each of the articles in Category:BLP articles proposed for deletion by days left.
    At the time of writing, 17 BRFAs are active. As usual, community input is encouraged.
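For readers curious about what a "purge with forcelinkupdate" looks like in practice, here is a rough sketch. It is hypothetical and not Joe's Null Bot's actual code: it assumes the bot uses the MediaWiki API's action=purge module with the forcelinkupdate flag, combined with a categorymembers generator pointed at the category above. The function name build_purge_request and the batch size of 50 are illustrative choices. The sketch only builds the POST request; a real bot would also need to send it, follow API continuation for larger categories and respect rate limits.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint for the English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_purge_request(category: str, limit: int = 50) -> tuple[str, bytes]:
    """Hypothetical helper: build a POST body that purges every page in
    `category` and forces a links-table update for each one, which is the
    API equivalent of a null edit."""
    params = {
        "action": "purge",               # purge the pages' cached renderings
        "forcelinkupdate": "1",          # also refresh link tables (category membership etc.)
        "generator": "categorymembers",  # take the target pages from a category listing
        "gcmtitle": category,            # the category to walk
        "gcmlimit": str(limit),          # pages per batch (illustrative value)
        "format": "json",
    }
    # action=purge must be sent as a POST, so return an encoded request body.
    return API_URL, urlencode(params).encode("utf-8")


url, body = build_purge_request("Category:BLP articles proposed for deletion by days left")
```

Running a job like this once a day keeps the "days left" subcategories in step with the passage of time without anyone having to edit the articles by hand.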
