User:EpochFail/Quality assessment

As part of the first phase of testing Article Feedback Tool v5 (AFT), the feedback produced by each of the proposed interface widgets will need to be judged for usefulness. This page describes a proposal to have Wikipedia editors (who are likely to make use of feedback produced by the tool) aid in its evaluation.

Why editors?

It is intended that the AFT becomes a productive resource for constructing and improving the articles of Wikipedia. Since the editors of Wikipedia will be the best judges of what feedback will be useful to them, their participation is sought in evaluating the quality of the feedback that is generated by the new interface.

The experiment

Version 5 of the AFT is scheduled to be released in three phases. The first phase will test the type of widget that the final interface will employ: share your feedback, make a suggestion, or review this page. The second phase will test the location of the invitation for feedback. Finally, the third phase will evaluate the AFT's effect on the quantity and quality of work performed by editors (both registered and anonymous). The quality assessment proposed by this page is intended to take place between the first and second phases. The results of the assessment will be used to determine which interface widget to proceed with for the second phase.

Categorizing feedback

As part of the quality assessment process, editors will be requested to view a randomized list of feedback generated by the three types of widgets and assess the quality and usefulness of this feedback. In order to more deeply understand the type of feedback that can be elicited by the article feedback tool, editors will also be asked to categorize the feedback independently of its usefulness.

FES: Feedback evaluation interface

[Figure: Annotated version of the FES interface.]

The interface mockup represents a proposed hand-coding system that an editor could use to evaluate the qualities of feedback.

A hand-coding system will be developed and hosted on the Toolserver that will allow users to log in with their Wikipedia username and password. Editors will then be able to request a small block of feedback items to evaluate (~50 items) using an interface like the one mocked up above. Upon completing a block, an editor will be able to request an additional block of feedback items to evaluate. Each feedback item will be evaluated by a minimum of three editors so that the consistency of the ratings can be measured.
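The block-request workflow described above could be sketched as follows. This is a minimal illustration, not the actual Toolserver implementation; the block size, function names, and data structures are assumptions.

```python
import random
from collections import defaultdict

BLOCK_SIZE = 50          # approximate items per block (assumed from the proposal)
MIN_EVALUATIONS = 3      # minimum number of editors per feedback item

def request_block(feedback_ids, evaluations, editor, size=BLOCK_SIZE):
    """Return a block of feedback items for `editor` to evaluate.

    `evaluations` maps feedback id -> set of editors who have already
    rated it. Items the editor has already seen are excluded, and the
    least-evaluated items are served first so that every item reaches
    the three-editor minimum evenly.
    """
    candidates = [fid for fid in feedback_ids
                  if editor not in evaluations[fid]]
    # Fewest-rated items first.
    candidates.sort(key=lambda fid: len(evaluations[fid]))
    block = candidates[:size]
    random.shuffle(block)  # randomize presentation order within the block
    return block

# Example: five fresh items, no prior evaluations.
evals = defaultdict(set)
block = request_block([1, 2, 3, 4, 5], evals, editor="Alice", size=3)
```

Serving the least-evaluated items first is one simple way to make sure coverage reaches three raters per item before any item collects a fourth.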

Every effort will be made to obscure which AFT widget was used to generate each feedback item, so that this information does not bias the results.
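One way such blinding could work, sketched with hypothetical field names, is to strip the widget-identifying fields from each feedback record before it is shown to an evaluator:

```python
def blind_item(item):
    """Return a copy of a feedback record with widget-identifying
    fields removed before display (field names are hypothetical)."""
    hidden_fields = {"widget_type", "widget_version"}
    return {k: v for k, v in item.items() if k not in hidden_fields}

# Example feedback record (illustrative values only).
item = {
    "id": 7,
    "text": "Please add sources.",
    "widget_type": "make a suggestion",
}
blinded = blind_item(item)
```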

Support for volunteers

  • Documentation will be provided describing the meaning of each FES category, along with examples.
    • This documentation can be accessed from within FES by hovering over the icons.
  • An IRC channel will be created and monitored by the WMF staff in charge of overseeing the experiment.
  • The talk page will be watched by the WMF staff in charge of overseeing the experiment.

Analysis & public release of data

After the minimum number of feedback items (TBD) have been evaluated, a dataset will be released publicly, and an analysis of the results will be completed and posted along with the developers' decision about which interface widget to test in phase 2.

The dataset will include:

  • The article for which the feedback was submitted
  • The current revision of the article at the time the feedback was submitted
  • The interface widget used to produce the feedback
  • The text of the feedback
  • The author of the feedback
  • The editor who performed the evaluation
  • All details of the evaluation
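The fields listed above could be published as a simple tab-separated file. The column names below are hypothetical placeholders for illustration; the actual released schema may differ.

```python
import csv
import io

# Hypothetical column names matching the fields listed above.
COLUMNS = [
    "page_title",      # article the feedback was submitted for
    "revision_id",     # article revision at the time of submission
    "widget",          # interface widget that produced the feedback
    "feedback_text",   # the text of the feedback
    "feedback_author", # author of the feedback
    "evaluator",       # editor who performed the evaluation
    "evaluation",      # all details of the evaluation (e.g. a JSON blob)
]

def write_dataset(rows, out):
    """Write evaluation rows as a tab-separated file with a header."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# Example with one illustrative row.
buf = io.StringIO()
write_dataset([{
    "page_title": "Example",
    "revision_id": 123456,
    "widget": "share your feedback",
    "feedback_text": "Needs more citations.",
    "feedback_author": "Anon",
    "evaluator": "EditorA",
    "evaluation": '{"useful": true}',
}], buf)
```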