File:Number of claim revision pairs in each debate category of Kialo.png

Original file (1,212 × 1,442 pixels, file size: 207 KB, MIME type: image/png)

Summary

Description
English: "The number of claim pairs of the 20 most frequent categories in both corpus versions are presented in Figure 2"

"Here, we present our corpus created based on claim revision histories collected from kialo.com.

3.1 A New Corpus based on Kialo

Kialo is a typical example of an online debate portal for collaborative argumentative discussions, where participants jointly develop complex pro/con debates on a variety of topics. The scope ranges from general topics (religion, fair trade, etc.) to very specific ones, for instance, on particular policy-making (e.g., whether wealthy countries should provide citizens with a universal basic income). Each debate consists of a set of claims and is associated with a list of related pre-defined generic categories, such as politics, ethics, education, and entertainment.

What differentiates Kialo from other portals is that it allows editing claims and tracking changes made in a discussion. All users can help improve existing claims by suggesting edits, which are then accepted or rejected by the moderator team of the debate. As every suggested change is discussed by the community, this collaborative process should lead to a continuous improvement of claim quality and a diverse set of claims for each topic. As a result of the editing process, claims in a debate have a version history in the format of claim pairs, forming a chain where one claim is the successor of another and is considered to be of higher quality (examples found in Table 1). In addition, claim pairs may have a revision type label assigned to them via a non-mandatory free form text field, where moderators explain the reason of revision.

Base Corpus To compile the corpus, we scraped all 1628 debates found on Kialo until June 26th, 2020, related to over 1120 categories. They contain 124,312 unique claims along with their revision histories, which comprise of 210,222 pairwise relations. The average number of revisions per claim is 1.7 and the maximum length of a revision chain is 36. 74% of all pairs have a revision type. Overall, there are 8105 unique revision type labels in the corpus. 92% of labeled claim pairs refer to three types only: Claim Clarification, Typo/Grammar Correction, and Corrected/Added Links. An overview of the distribution of revision labels is given in Table 2. We refer to the resulting corpus as ClaimRevBASE [...]

Extended Corpus To increase the diversity of data available for training models, without actually collecting new data, we applied data augmentation. ClaimRevBASE consists of consecutive claim version pairs, i.e., if a claim v has four versions, it will be represented by three pairs: (v1, v2), (v2, v3), and (v3, v4), where v1 is the original claim and v4 is the latest version. We extend this data by adding all pairs between non-consecutive versions that are inferrable transitively. Considering the previous example, this means we add (v1, v3), (v1, v4), and (v2, v4). This is based on our hypothesis that every argument version is of higher quality than its predecessors, which we come back to below. Figure 1 illustrates the data augmentation. We call the augmented corpus ClaimRevEXT.

For this corpus, we introduce the concept of revision distance, by which we mean the number of revisions between two versions. For example, the distance between v1 and v2 would be 1, whereas the distance between v1 and v3 would be 2. The distribution of the revision distances across ClaimRevEXT is summarized in Table 2. The number of claim pairs of the 20 most frequent categories in both corpus versions are presented in Figure 2. We will restrict our view to the topics in these categories in our experiments."
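
Illustrative note (not part of the quoted study): the pair construction described in the quotation can be sketched in a few lines of Python. This is a minimal sketch assuming a claim's revision chain is an ordered list of version strings; the function names base_pairs and extended_pairs are hypothetical and do not come from the authors' code. For a chain of n versions it yields n - 1 consecutive pairs and n(n - 1)/2 pairs after augmentation, each with its revision distance.

```python
from itertools import combinations


def base_pairs(versions):
    """Consecutive version pairs, as described for ClaimRevBASE.

    `versions` is an ordered revision chain, oldest first (assumed input format).
    """
    return [(versions[i], versions[i + 1]) for i in range(len(versions) - 1)]


def extended_pairs(versions):
    """All ordered pairs (earlier, later, distance), as described for ClaimRevEXT.

    The distance is the number of revisions separating the two versions,
    i.e. the difference between their positions in the chain.
    """
    return [
        (versions[i], versions[j], j - i)
        for i, j in combinations(range(len(versions)), 2)
    ]


if __name__ == "__main__":
    chain = ["v1", "v2", "v3", "v4"]
    print(base_pairs(chain))
    print(extended_pairs(chain))
```

For the four-version example in the quotation, the sketch returns the three consecutive pairs plus (v1, v3), (v1, v4), and (v2, v4), with distances 2, 3, and 2 respectively.
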
Date
Source https://arxiv.org/abs/2101.10250
Author Authors of the study: Gabriella Skitalinskaya, Jonas Klaff, Henning Wachsmuth

Licensing

This file is licensed under the Creative Commons Attribution 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Captions

"...of the two provided versions of our corpus (ClaimRevBASE, ClaimRevEXT)"; From the study "Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale"

Items portrayed in this file

depicts

25 January 2021

File history


Date/Time | Dimensions | User | Comment
current: 16:00, 1 June 2023 | 1,212 × 1,442 (207 KB) | Prototyperspective | Uploaded a work by Authors of the study: Gabriella Skitalinskaya, Jonas Klaff, Henning Wachsmuth from https://arxiv.org/abs/2101.10250 with UploadWizard
The following pages on the English Wikipedia use this file (pages on other projects are not listed):