Wikipedia talk:Labels/Edit types



Updates Edit Type Taxonomy

Since the syntactic operations (syntactic objects and actions) can be determined by heuristic rules, we decided not to include them in the annotation task. That is, in the current annotation project, we only need people's help labeling the semantic intentions: the reason a user made an edit. The semantic intentions can be chosen from the following table, and one revision can have multiple semantic intention labels. The new labeling form has also been deployed, so you'll see a new labeling interface.

The removal of syntactic operations will greatly reduce the annotation burden; however, we might not be able to get an accurate mapping between syntactic operations and semantic intentions. Please leave your comments here about whether the removal is a good idea or needs further discussion.

Semantic Intentions
Copy Editing
Clarification
Simplification
Point of View
Refactoring
Fact Update
Elaboration
Verifiability
Disambiguation
Wikification
Vandalism
Counter-vandalism
Process
Other Intentions

Several remarks about the annotation:

  1. Each revision could have multiple semantic intentions. (An editor could do multiple things at the same time)
  2. If you think none of the semantic intentions fit a revision, please label it as ‘Other’ and leave comments in the Notes field.
  3. Adding categories usually belongs to Wikification.
  4. Adding images or files belongs to Elaboration.
  5. Difference between Copy Editing and Wikification: the former aims at fixing grammar or spelling errors; the latter focuses on formatting the text to follow the Wikipedia Manual of Style, adding links, etc.
  6. In terms of semantic intentions, if an edit only changes the syntax of a reference or citation, it does not belong to Verifiability; it might be Wikification.
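As a rough illustration of remarks 1 and 2, here is a minimal Python sketch of how one labeled revision could be represented and sanity-checked. `RevisionLabel` and `problems` are hypothetical names made up for this example, not part of the Wiki Labels schema, and the rule that every revision needs at least one intention is an assumption inferred from remark 2.

```python
from dataclasses import dataclass, field

# Intention names copied from the Semantic Intentions table.
INTENTIONS = frozenset({
    "Copy Editing", "Clarification", "Simplification", "Point of View",
    "Refactoring", "Fact Update", "Elaboration", "Verifiability",
    "Disambiguation", "Wikification", "Vandalism", "Counter-vandalism",
    "Process", "Other Intentions",
})


@dataclass
class RevisionLabel:  # hypothetical record type, not the real labeling form
    rev_id: int
    intentions: set = field(default_factory=set)  # remark 1: several allowed
    notes: str = ""

    def problems(self):
        """Return a list of problems; an empty list means the label looks fine."""
        found = []
        if not self.intentions:
            # Assumption: "none fit" should be recorded as 'Other Intentions'.
            found.append("at least one intention is required")
        unknown = self.intentions - INTENTIONS
        if unknown:
            found.append("unknown intentions: " + ", ".join(sorted(unknown)))
        if "Other Intentions" in self.intentions and not self.notes.strip():
            # Remark 2: 'Other' should be explained in the Notes field.
            found.append("'Other Intentions' should come with a note")
        return found
```

For instance, `RevisionLabel(1, {"Copy Editing", "Wikification"}).problems()` returns an empty list, while a label with only `"Other Intentions"` and no note is flagged.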

--Diyiy (talk) 16:27, 23 March 2016 (UTC)

Annotation Discussion

We summarized several frequent questions about edit type annotation. Please leave your comments here, thanks! @EpochFail, DarTar --Diyiy (talk) 04:56, 29 March 2016 (UTC)

  1. Should 'adding/changing text color' belong to Wikification?
  2. What about adding a row to a table? Is that Elaboration?
  3. Does formatting a table or text belong to Wikification?
  4. For "migrating Persondata to Wikidata, please help, see challenges for this article" (see the example), I labeled it as "Process". Do you agree with this?

The current annotation also requires us to indicate whether an edit changes the meaning. This can be recorded in the 'information-added', 'information-modified', or 'information-removed' fields.

  1. Does semantic intention 'Process' involve meaning modification (add/modify/remove)?
  2. For the Clarification intention, does it add information to the article, or does it modify the information?
  3. For adding a figure, does it add new information?
  4. Does adding a category add new information? I labeled information-added for adding categories.
  5. Also, I think Wikification (adding a new wikilink) does add information, but some formatting operations do not. -- Preceding unsigned comment added by Diyiy (talk · contribs) 05:03, 29 March 2016 (UTC)

Pilot Study Annotation

Hi DGG (talk · contribs), EpochFail (talk · contribs), DarTar (talk · contribs), ONUnicorn (talk · contribs), Mdann52, He7d3r, とある白い猫 (talk · contribs), Ladsgroup, Noyster, EoRdE6, SchreiberBike, JoeSperrazza, Epicgenius, Stuartyeates, MrX, Jay8g, Blackmane, Coretheapple, Pishcal, TheMagikCow, Esquivalience, Kharkiv07, Philippe (WMF), Sarr Cat, Odeesi, Masssly,

Sorry for the mass ping, but I'd like to invite you to participate in a new labeling campaign. In this one, we'll ask you to help us evaluate the intention of edits. Our goal is to build an automated system that can label edits. EpochFail (talk · contribs), DarTar (talk · contribs) and I are currently training undergraduate students to work on the annotation. To begin with, we provided them with a sample of revisions.

The labeling campaign can be accessed here: https://en.wikipedia.org/wiki/Wikipedia:Labels.

The description can be found here: https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_types/Taxonomy

It contains 70 representative revisions. Five undergraduate students have already provided annotations for them, but we want to hear from Wikipedia experts like you so that we can calculate the agreement between their annotations and yours. This will allow us to work with them on constructing a larger corpus.

Each workset takes around 3-5 mins, and we have 7 worksets in total. Please help us on this project! --Diyiy (talk) 02:29, 6 April 2016 (UTC)
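The post above does not say which agreement statistic is meant. As one possibility, here is a minimal sketch using Cohen's kappa for a single intention, treated as a yes/no label per revision. The function name and the sample data are made up for illustration; the campaign's actual analysis may differ.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: did each of 10 revisions get the "Clarification" label?
student = [True, False, False, True, False, False, True, False, False, False]
expert = [True, False, True, True, False, False, False, False, False, False]
kappa = cohens_kappa(student, expert)
```

Because a revision can carry several intentions, one simple option is to compute this once per intention and compare the per-intention values.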

New Annotation Campaign

Hey, DGG (talk · contribs), EpochFail (talk · contribs), DarTar (talk · contribs), ONUnicorn (talk · contribs), Mdann52, He7d3r, とある白い猫 (talk · contribs),

Thanks for your discussion and annotation for our last edit type campaign!

For the last edit type campaign, we also asked 4 graduate students to do the labeling. However, the students are not Wikipedia experts like you, so we ended up with low agreement for semantic intentions such as clarification, simplification and disambiguation. Because of these low-quality annotations, it is hard for our machine learning classifiers to predict such edit intentions automatically.

Thus, we want to set up another campaign to annotate revisions that might belong to specific semantic intentions. We'd like to hear your valuable opinions about this!

This new campaign contains around 1400 revisions that might belong to POV, clarification, simplification, refactoring, fact update, etc. (These are the classes that did not have enough instances in our last annotation campaign.) We randomly select 200 revisions for it based on information contained in the comment.

For example, to acquire revisions of POV (point of view), we collect revisions that mention 'pov' in their comments. Similarly, we collect revisions that mention 'clarify', 'simplify', 'refactor', 'update', 'links' in their comments, in order to better predict 'clarification', 'simplification', 'refactoring', 'fact update', 'wikification'.

Do you have any suggestions or comments about the current methods of collecting revisions? --Diyiy (talk) 14:58, 12 October 2016 (UTC)
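The keyword-based collection described above can be sketched roughly as follows. This is an illustration under assumptions, not the campaign's actual code: the function names are made up, 'pov' is used as the Point of View keyword (the spelling in the query posted in this thread), and drawing 200 revisions per intention is one reading of the sampling step.

```python
import random

# Keyword -> intention pairs, following the comment above.
KEYWORD_TO_INTENTION = {
    "pov": "Point of View",
    "clarify": "Clarification",
    "simplify": "Simplification",
    "refactor": "Refactoring",
    "update": "Fact Update",
    "links": "Wikification",
}


def candidates_by_intention(revisions):
    """Group (rev_id, comment) pairs by the intention their comment hints at."""
    groups = {intention: [] for intention in KEYWORD_TO_INTENTION.values()}
    for rev_id, comment in revisions:
        lowered = comment.lower()
        for keyword, intention in KEYWORD_TO_INTENTION.items():
            # Plain substring match, made case-insensitive here by choice.
            if keyword in lowered:
                groups[intention].append(rev_id)
    return groups


def sample_per_intention(groups, k=200, seed=0):
    """Randomly pick up to k candidate revisions for each intention."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return {
        intention: rng.sample(rev_ids, min(k, len(rev_ids)))
        for intention, rev_ids in groups.items()
    }
```

Note that a plain substring match also accepts comments where the keyword is part of a longer word, so the candidates still need spot-checking.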

Hi Diyiy, I'd like to review how you selected these revisions. Can you post the exact queries that you ran in order to gather these samples? It seems like a little bit of spot-checking would be in order to make sure that the selection is working acceptably. --EpochFail (talk · contribs) 15:02, 12 October 2016 (UTC)
Hi User:EpochFail, please see the query below for NPOV. For the others, I just changed the keywords. --Diyiy (talk) 15:13, 12 October 2016 (UTC)
SELECT rc_this_oldid AS rev_id, rc_comment FROM recentchanges WHERE rc_comment LIKE "%pov%" AND rc_type IN (0, 1) AND rc_namespace = 0;
It looks like this picks up a lot of reverting edits and other terms like "poverty". Here's a query that seems to do a bit better. https://quarry.wmflabs.org/query/13086 Note the use of word boundaries in the regex along with a few related terms (npov, pov, pushing, neutral) and the exclusion of summaries containing "reverted" and "undid". --EpochFail (talk · contribs) 23:30, 12 October 2016 (UTC)
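The linked Quarry query is not reproduced here. The Python sketch below only illustrates the idea described in the comment above: match the related terms as whole words and skip summaries that look like reverts. The two patterns and the function name are assumptions reconstructed from that description, not copies of the actual query.

```python
import re

# Assumed patterns, based on the description above rather than the real query.
POV_TERMS = re.compile(r"\b(npov|pov|pushing|neutral)\b", re.IGNORECASE)
REVERT_TERMS = re.compile(r"reverted|undid", re.IGNORECASE)


def looks_like_pov_edit(comment):
    """True if the summary has a POV term as a whole word and is not a revert."""
    return bool(POV_TERMS.search(comment)) and not REVERT_TERMS.search(comment)
```

With these patterns, "removed POV wording" is accepted, while "added poverty statistics" is rejected because the word boundary keeps "pov" from matching inside "poverty", and "Reverted edits by X: pov pushing" is rejected by the revert filter.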