User:Andrewa/Wikipedia approval mechanism

This is an annotated copy of a snapshot of Wikipedia approval mechanism. My comments are so extensive I thought they'd better go in a user page. I'm new to User subpages, so I hope this is an appropriate use for them.

It needs to be read in conjunction with the current version of that page. There are subsequent edits to that page which I have no intention of bringing here. See also Wikipedia talk:Wikipedia approval mechanism, which contains a lot of comments too, some chronologically before mine, some after.

This page is a work in progress, like anything on a Wiki, and my comments will probably be expanded even more in time. And if the proposal goes forward in any form, they will eventually get refactored into the software spec and other more appropriate places.

Please feel free to add your own comments, preferably signed and at least triple-indented. Then we can assume that unsigned double-indented comments are mine unless otherwise indicated. I went to double because in Larry Sanger's and other proposals below, they use single indents already.

The primary purpose here is to relate this existing discussion to the m:Referees proposal, henceforth just referred to as the proposal. Andrewa

Wikipedia approval mechanism, defined

"Wikipedia approval mechanism" means any sort of mechanism whereby Wikipedia articles are individually marked and displayed, somehow, as "approved."

Satisfied by the proposal.

The purpose of an approval mechanism

The purpose of an approval mechanism is, essentially, quality assurance. By presenting particular articles as approved, we (Wikipedians) would be representing those articles as reliable sources of information.

Satisfied by the proposal.

Some basic requirements of an adequate approval mechanism

Among the basic requirements that an approval mechanism would have to fulfill in order to be adequate are:

  • The approval must be done by experts about the material approved. A Wikipedia:trust model?
  • There must be clear and reasonably stringent standards that the experts are expected to apply. See central issues.
  • The mechanism itself must be genuinely easy for the experts to use or follow. Nupedia's experience seems to show that a convoluted approval procedure, while it might be rigorous, is too slow to be of practical use.
  • The approval mechanism must not impede the progress of Wikipedia in any way. It must not change the Wikipedia process; it should be an "add-on."
  • Must not be a bear to program, and it shouldn't require extra software or rely on browser-specific stuff like Java (or Javascript) that some users won't have. A common browser platform standard must be specified for them, preferably low level but not so low level that the machines don't have CDs.
  • Must provide some way of verifying the expert's credentials - and optionally a way to verify that he or she approved the article, not an imposter.
Mostly satisfied. The verification of qualifications is only relevant at referee level. There is one part of the proposal that would use Javascript, but the system would work for users and reviewers without it; they'd just lack one convenience tool. Apart from that, browser requirements are minimal.

Some "desirements":

  • Makes it possible to broaden or narrow the selection of approvers (e.g., one person might only wish authors who have phd's, another would allow for anyone who has made an effort to approve any articles.)
  • Allows for extracting topic-oriented sets (e.g., in order to produce an "Encyclopedia of Music"). (The idea is that article approval could contain more information than just the binary "high-quality" bit, e.g. topic area, level of detail, and so forth. Such "approved metadata" would allow easy extraction of user-defined subsets of the full approved article set.)
All satisfied by the full proposal, introducing baselining. Any sub-Encyclopedia would require its own Editorial Board.

The advantages of an approval mechanism

The advantages of an approval mechanism of the sort described are clear and numerous:

  • We will encourage the creation of really good content.
  • Large, reputable websites and the web in general are more likely to use and/or link to our content if it has been approved by experts. And, especially, if the current version of an article has a persistent URL that they can link to. At present only an article which has been changed actually has such a URL!
  • The addition of an approval mechanism will be attractive to academics who might not participate without it--particularly the academics who might want to be reviewers. All that matters is a mass of GNU FDL text on important topics that is good enough for serious scholars to find worth correcting. EofT
  • It makes it easier to collect the best articles on Wikipedia and create completed "snapshots" of them that could be printed and distributed, for example. This issue is central to Pushing To 1.0.

Generally, Wikipedia will become comparable to nearly any encyclopedia, once enough articles are approved. It need not be perfect. Just better than Britannica and Encarta and ODP and other CD or Web resources.

I am not sure there are any significant disadvantages of an approval mechanism, but idly, I think there might be one. I think that it's possible that Wikipedia might become more of an "exclusive club" than it is, if people start comparing nascent articles contributed by new contributors to the finished projects. I might not want to contribute two sentences about widgets if I think ten neat paragraphs, with references, is what is expected. Again, I don't know if this is really apt to be a problem.

Another general argument against is that this really doesn't seem necessary. An approval mechanism has been suggested since Day One of Wikipedia and, despite the evidence that Wikipedia is working just fine, will probably continue to be suggested 'til kingdom come.

All answered and/or satisfied by the proposal.

Proposals

Below, we can develop some specific proposals for approval mechanisms.

Sanger's proposal

When I say the approval mechanism must be really easy for people to use, I mean it. I mean it should be extremely easy to use. So what's the easiest-to-use mechanism that we can devise that nevertheless meets the criteria?
The following: on every page on the wiki, create a simple popup approval form that anyone may use. ("If you are a genuine expert on this subject, you can approve this article.") On this form, the would-be article approver (whom I'll call a "reviewer") indicates name, affiliation, relevant degrees, web page (that we can use to check bona fides), and a text statement to the effect of what qualifications the person has to approve of an article. The person fills this out (with the information saved into their preferences) and hits the "approve" button.
Satisfied by the proposal, which is a variation on this but simpler.
When two different reviewers have approved an article, if they are not already official reviewers, the approval goes into moderation.
This could be added to the proposal but I think it would be premature. See how it goes. If we need such a mechanism, add it later.
The approval goes into a moderation queue for the "approved articles" part of Wikipedia. From there, moderators can check over recently-approved articles. They can check that the reviewers actually are qualified (according to some pre-set criteria of qualification) and that they are who they say they are. (Perhaps moderator-viewable e-mail addresses will be used to check that a reviewer isn't impersonating someone.) A moderator can then "approve the approver."
Needlessly complicated.
The role of the moderators is not to approve the article, but to make sure that the system isn't being abused by underqualified reviewers. A certain reviewer might be marked as not in need of moderation; if two such reviewers were to approve of an article, the approval would not need to be moderated.
New addition I think it might be a very good idea to list, on an approved article, who the reviewers are who have approved the article.
--Larry Sanger
The creation of 'Moderators' may ultimately be necessary, but it's a departure from the existing consensus model. The proposal is to use the existing model. Let's try it.

Bryce's Proposal

From my experience with the Wikipedia_NEWS, it seems that there's a lot that can be done with the wiki software as it exists. The revision control system and its tracking of IP addresses is ok as a simple screen against vandalism. The editing system seems fairly natural and is worth using for managing this; certainly we can expect anyone wishing to be a reviewer ought to have a fair degree of competence with it already.
Second, take note at how people have been making use of the user pages. People write information about themselves, the articles they've created, and even whole essays about opinions or ideas.
Agree. The proposal supports this approach.
What I'd propose is that we encourage people who wish to be reviewers to set up a subpage under their userpage called '/Approved'. Any page that they add to this page is considered to be acceptable by them. (It is recommended they list the particular revision # they're approving too, but it's up to them whether to include the number or not.) The reviewer is encouraged to provide as much background and contact information about themselves on their main page (or on a subpage such as /Credentials) as they wish. It is *completely* an opt-in system, and does not impact wikipedia as a whole, nor any of its articles.
The proposal is also completely opt-in.
Okay, so far it probably sounds pretty useless because it *seems* like it gives zero _control_ over the editors. But if we've learned nothing else from our use of Wiki here, it's that sometimes there is significant power in anarchy. Consider that whoever is going to be putting together the set of approved articles (let's call her the Publisher) is going to be selecting the editors based on some criteria (only those with PhDs, or whatever). The publisher has (and should have) the control over which reviewers they accept, and can grab their /Approved lists at the time they wish to publish. Using the contact info provided by the reviewer, they can do as much verification as they wish; those who provide insufficient contact info to do so can be ignored (or asked politely on their userpage). But the publisher does *not* have the power to control whether or not you or I are *able* to approve articles. Maybe for the "PhD Reviewers Only" encyclopedia I'd get ruled out, but perhaps someone else decides to do a "master's degree or better" one, and I would fit fine there. Or maybe someone asks only that reviewers provide a telephone number they can call to verify the approved list.
Consider a further twist on this scheme: In addition to /Approved, people could set up other specific kinds of approval. For instance, some could create /Factchecked pages where they've only verified any factual statements in the article against some other source; or a /Proofed page that just lists pages that have been through the spellchecker and grammar proofer; or a /Nonplagiarized page that lists articles that the reviewer can vouch for as being original content and not merely copied from another encyclopedia. The reason I mention this approach is that I imagine there will be reviewers who specialize in checking certain aspects of articles, but not everything (a Russian professor of mathematics might vouch for everything except spelling and grammar, if he felt uncomfortable with his grasp of the English language). Other reviewers can fill in the gaps (the aforementioned professor could ask another to review those articles for spelling and grammar, and they could list them on their own area).
I think this system is very in keeping with wiki philosophy. It is anti-elitist, in the sense that no one can be told, "No, you're not good enough to review articles," yet still allows the publisher to discriminate what to accept based on the reviewer's credentials. It leverages existing wiki functionality and Wikipedia traditions rather than requiring new code and new skills. And it lends itself to programmatic extraction of content. It also puts a check/balance situation between publisher and reviewer: If the publisher is selecting reviewers to include unfairly, someone else can always set up a fairer approach. There is also a check against reviewer bias, because once discovered, ALL of their reviewed articles would be dropped by perhaps all publishers, which gives a strong incentive to the reviewer to demonstrate the quality of their reviewing process and policies.
-- BryceHarrington
Most of this is covered by the proposal and supports it. The fine detail of approval for specific things such as NPOV, non-plagiarised etc could be implemented for a particular Editorial Board with extensions to the proposed software. Much of the function is already there in the proposal and the design is compatible with these concepts.
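The Publisher step in Bryce's scheme can be sketched in a few lines. This is only an illustration under stated assumptions: the `Reviewer` record, the `credentials` labels and the `publish` function are all hypothetical names, and a real publisher would read the lists from the reviewers' /Approved subpages rather than from in-memory objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass
class Reviewer:
    """Hypothetical record built from a user page and its /Approved subpage."""
    name: str
    credentials: Set[str] = field(default_factory=set)  # e.g. {"phd"}
    # title -> approved revision number, or None if the reviewer gave none
    approved: Dict[str, Optional[int]] = field(default_factory=dict)


def publish(reviewers: Iterable[Reviewer],
            accepts: Callable[[Reviewer], bool]) -> Dict[str, List[str]]:
    """Return {title: names of accepted reviewers who approved it}.

    `accepts` is the publisher's own criterion; nobody is prevented from
    reviewing, the publisher just ignores reviewers it does not accept.
    """
    selected: Dict[str, List[str]] = {}
    for reviewer in reviewers:
        if not accepts(reviewer):
            continue
        for title in reviewer.approved:
            selected.setdefault(title, []).append(reviewer.name)
    return {title: sorted(names) for title, names in selected.items()}


reviewers = [
    Reviewer("A", {"phd"}, {"Music": 12}),
    Reviewer("B", {"masters"}, {"Music": None, "Widget": 3}),
]
phd_only = publish(reviewers, lambda r: "phd" in r.credentials)
masters_or_better = publish(reviewers, lambda r: bool(r.credentials & {"phd", "masters"}))
```

Two publishers with different criteria get different article sets from the same reviewer pages, which is the check and balance the proposal describes.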

Magnus Manske's proposal:

I'll try to approach the whole approval mechanism from a more practical perspective, based on some things that I use in the Wikipedia PHP script. So, to set up an approval mechanism, we need:
    • Namespaces to separate different stages of articles
That is one way, true, but there are others. Why not just use the versioning we already have, and flag each version according to who has approved it, and to which level?
    • User rights management to prevent trolls from editing approved articles
Using versioning avoids this need, which is what makes this approach elegant. The approval is to a particular version. Any version can still be edited, but the resulting version will not be part of the approved version until it is approved.
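The versioning idea in the comment above can be sketched as follows. This is a minimal illustration, not the m:Referees design itself: the class names, the `level` strings and the method names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Version:
    number: int
    text: str
    # approval level (hypothetical labels, e.g. "reviewed") -> who approved
    approvals: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass
class Article:
    title: str
    versions: List[Version] = field(default_factory=list)

    def edit(self, text: str) -> Version:
        """Anyone can edit; an edit always creates a new, unapproved version."""
        version = Version(number=len(self.versions) + 1, text=text)
        self.versions.append(version)
        return version

    def approve(self, number: int, approver: str, level: str) -> None:
        """Flag one specific version; no other version is affected."""
        self.versions[number - 1].approvals.setdefault(level, set()).add(approver)

    def latest_approved(self, level: str) -> Optional[Version]:
        """Newest version carrying at least one approval at `level`."""
        for version in reversed(self.versions):
            if version.approvals.get(level):
                return version
        return None


article = Article("Widget")
article.edit("A widget is a small device.")
article.approve(1, "Alice", "reviewed")
article.edit("A widget is a small device. Tom Brokaw is cool.")
```

After the second edit the live article is version 2, but `latest_approved("reviewed")` still returns version 1, so no page locking or user rights management is needed to protect the approved text.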
From the Sanger proposal, the user hierarchy would have to be:
    1. Sysops, just a handful to ensure things are running smoothly. They can do everything, grant and reject user rights, move and delete articles etc.
    2. Moderators who can move approved articles to the "stable" namespace
    3. Reviewers who can approve articles in the standard namespace (the one we're using right now)
    4. Users who do the actual work ;)
This is also more complicated than the versioning approach I have suggested.
For Wikipedia 1.0, I'd just add an editorial board.
Stages 1-3 should have all rights of the "lower levels", and should be able to "raise" other users to their level. For the namespaces, I was thinking of the following:
    • The blank namespace, of course, which is the one all current wikipedia articles are in; the normal wikipedia
    • An approval namespace. When an article from "blank" gets approved by the first reviewer, a copy goes to the "approval" namespace.
    • A moderated namespace. Within the "approval" namespace, no one can edit articles, but reviewers can either hit a "reject" or "approve" button. "Reject" deletes the article from the "approval" namespace, "approve" moves it to the "moderated" namespace.
    • A stable namespace. Same as for "approval", but only moderators can "reject" or "approve" an article in "moderated" namespace. If approved, it is moved to the "stable" namespace. End of story.
And again, quite a complicated setup. Is it needed?
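For comparison, the namespace workflow described above amounts to a small state machine. The sketch below is one reading of it, with hypothetical names (`Role`, `decide`); it assumes, as the text says, that higher levels hold all the rights of lower ones, and that a rejected copy is simply deleted.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    """User hierarchy; a higher value includes all rights of lower ones."""
    USER = 0
    REVIEWER = 1
    MODERATOR = 2
    SYSOP = 3


# Where an approved copy goes next, and the minimum role needed to decide.
NEXT_STAGE = {"blank": "approval", "approval": "moderated", "moderated": "stable"}
REQUIRED_ROLE = {
    "blank": Role.REVIEWER,
    "approval": Role.REVIEWER,
    "moderated": Role.MODERATOR,
}


def decide(stage: str, role: Role, approve: bool) -> Optional[str]:
    """Return the namespace the copy ends up in, or None if it is deleted."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"no decision is possible in namespace {stage!r}")
    if role < REQUIRED_ROLE[stage]:
        raise PermissionError(f"{role.name} may not decide in {stage!r}")
    if approve:
        return NEXT_STAGE[stage]
    # Rejecting never touches the live article in the blank namespace;
    # it only removes the locked copy from "approval" or "moderated".
    return "blank" if stage == "blank" else None
```

For example, `decide("approval", Role.REVIEWER, True)` gives `"moderated"`, while a reviewer calling `decide("moderated", Role.REVIEWER, True)` raises `PermissionError`.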
This system has several advantages:
    • By having reviewers and moderators not chosen for a single category (e.g., biology), but by someone on a "higher level" trusting the individual not to make strange decisions, we can avoid problems such as having to choose a category for each article and each person prior to approval, checking reviewers for special references etc.
Yes. Common to my proposal.
    • Reviewers and moderators can have special pages that show just the articles currently in "their" namespace, making it easy to look for topics they are qualified to approve/reject
    • Easy handling. No pop-up forms, just two buttons, "approve" and "reject", throughout all levels.
The interface for my proposal is even simpler.
    • No version confusion. The initial approval automatically locks that article in the "approval" namespace, and all decisions later on are on this version alone.
My proposal doesn't need locks or extra article namespaces at all.
    • No bother of the normal wikipedia. "Approval" and "moderated" can be blanked out in every-day work, "stable" can be blanked out as an option.
Similarly, my proposal has minimal impact on the base Wikipedia.
    • Easy to code. Basically, I have all parts needed ready, a demo version could be up next week.
Did this happen? Can I see it?
This is reminiscent of the sort of software development hierarchy I was using professionally in the 1980s, provided by IBM's SCLM product. My proposal uses a simpler and more general model, which can serve the needs of QA on the base Wikipedia, Wikipedia 1.0, and a G-rated Wikipedia, with no more complication than this proposal introduces just to deal with Wikipedia 1.0. Andrewa 06:55, 5 Nov 2003 (UTC)

Ehrenberg addition

This would be added on to any of the above approval processes. After an article is approved, it would go into the database of approved articles. People would be able to access this from the web. After reading an article, the reader would be able to click on a link to disapprove of the article. After 5 (more, less?) people have disapproved of an article, the article goes through a reapproval process, in which only one expert must approve it, and then the necessary applicable administrators.

Let's try the simpler way first. But I like the idea of asking for readers' evaluations. It's a separate issue IMO.
Perhaps put a link at the top of every article, "Did this article answer your questions?", leading to a review page: "Please take a few seconds to fill in this readers' assessment form. It will help us to improve Wikipedia." Information gathered could be a great help IMO. The only minus I can see is that people who would otherwise have become Wikipedians might become reviewers instead, contributing less to the project than they would have otherwise. Hard to guess that one.

DWheeler's Proposal: Automated Heuristics

It might also be possible to use some automated heuristics to identify "good" articles. This could be especially useful if the Wikipedia is being extracted to some static storage (e.g., a CD-ROM or PDA memory stick). Some users might want this view as well. The heuristics may throw away some of the latest "good" changes, as long as they also throw away most of the likely "bad" changes.

Here are a few possible automated heuristics:

  • Ignore all anonymous changes; if someone isn't willing to have their name included, then it may not be a good change. This can be "fixed" simply by some non-anonymous person editing the article (even trivially).
  • Ignore changes from users who have only submitted a few changes (e.g., fewer than 50). If a user has submitted a number of changes, and is still accepted (not banned), then the odds are higher that the user's changes are worthwhile.
  • Ignore pages unless at least some number of other non-anonymous readers have read the article and/or viewed its diffs (e.g., at least 2 other readers). The notion here is that, if someone else read it, then at least some minimal level of peer review has occurred. The reader may not be able to identify subtle falsehoods, but at least "Tom Brokaw is cool" might get noticed. This approach can be foiled (e.g., by creating "bogus readers"), but many trolls won't bother to do that.

These heuristics can be combined with the expert rating systems discussed elsewhere here. An advantage of these automated approaches is that they can be applied immediately.
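The three heuristics listed above could be combined into a single filter over an article's history, roughly as follows. This is a sketch only: the `Revision` record and `pick_revision` are hypothetical, the thresholds are the example values from the list, and applying the "other readers" test per revision rather than per page is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Revision:
    number: int
    author: Optional[str]  # None means an anonymous edit
    # logged-in users who read this revision or viewed its diff
    readers: FrozenSet[str] = frozenset()


def pick_revision(history: List[Revision],
                  edit_counts: Dict[str, int],
                  min_edits: int = 50,
                  min_readers: int = 2) -> Optional[Revision]:
    """Return the newest revision that passes all three heuristics."""
    for rev in reversed(history):
        if rev.author is None:
            continue  # heuristic 1: ignore anonymous changes
        if edit_counts.get(rev.author, 0) < min_edits:
            continue  # heuristic 2: ignore users with only a few changes
        if len(rev.readers - {rev.author}) < min_readers:
            continue  # heuristic 3: require some other non-anonymous readers
        return rev
    return None


history = [
    Revision(1, "alice", frozenset({"bob", "carol"})),
    Revision(2, None, frozenset({"bob", "carol"})),
    Revision(3, "newbie", frozenset({"bob", "carol"})),
    Revision(4, "alice", frozenset({"alice", "bob"})),
]
edit_counts = {"alice": 120, "newbie": 3}
chosen = pick_revision(history, edit_counts)
```

Here the extraction falls back to revision 1: the later revisions are anonymous, by a newcomer, or not yet read by two other users. As the proposal says, some good recent changes are thrown away along with the likely bad ones.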

Other automated heuristics can be developed by developing "trust metrics" for people. Instead of trying to rank every article (or as a supplement to doing so), rank the people. After all, someone who does good work on one article is more likely to do good work on another article. You could use a scheme like Advogato's, where people identify how much they respect (trust) someone else. You then flow down the graph to find out how much each person should be trusted. For more information, see Advogato's trust metric information. Even if the Advogato metric isn't perfect, it does show how a few individuals could list other people they trust, and over time use that to derive global information. The Advogato code is available - it's GPLed.

Another related issue might be automated heuristics that try to identify likely trouble spots (new articles or likely troublesome diffs). A trivial approach might be to have a not-publicly-known list of words that, if they're present in the new article or diffs, suggest that the change is probably a bad one. Examples include swear words, and words that indicate POV (e.g., "Jew" may suggest anti-semitism). The change might be fine, but such a flag would at least alert someone else to especially take a look there.

A more sophisticated approach to automatically identify trouble spots might be to use learning techniques to identify what's probably garbage, using typical text filtering and anti-spam techniques such as naive Bayesian filtering (see Paul Graham's "A Plan for Spam"). To do this, the Wikipedia would need to store deleted articles and have a way to mark changes that were removed for cause (e.g., were egregiously POV) - presumably this would be a sysop privilege. Then the Wikipedia could train on "known bad" and "known good" (perhaps assuming that all Wikipedia articles before some date, or meeting some criteria listed above, are "good"). Then it could look for bad changes (either in the future, or simply examining the entire Wikipedia offline).
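A toy version of the learning approach might look like the following. It is a plain multinomial naive Bayes classifier with add-one smoothing, not the exact scoring scheme from "A Plan for Spam", and the class and method names are made up for this sketch. It assumes at least one "good" and one "bad" example have been trained before anything is classified.

```python
import math
from collections import Counter
from typing import List


class ChangeFilter:
    """Tiny naive Bayes filter trained on known-good and known-bad text."""

    def __init__(self) -> None:
        self.word_counts = {"good": Counter(), "bad": Counter()}
        self.doc_counts: Counter = Counter()

    @staticmethod
    def tokens(text: str) -> List[str]:
        return text.lower().split()

    def train(self, text: str, label: str) -> None:
        """label is "good" or "bad" (e.g. a change removed for cause)."""
        self.word_counts[label].update(self.tokens(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        counts = self.word_counts[label]
        vocab = len(set(self.word_counts["good"]) | set(self.word_counts["bad"]))
        total = sum(counts.values())
        # log prior + sum of smoothed log likelihoods
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in self.tokens(text):
            score += math.log((counts[word] + 1) / (total + vocab))
        return score

    def looks_bad(self, text: str) -> bool:
        return self.log_score(text, "bad") > self.log_score(text, "good")


change_filter = ChangeFilter()
change_filter.train("the turing test was proposed by alan turing", "good")
change_filter.train("the moon orbits the earth", "good")
change_filter.train("tom brokaw is cool cool cool", "bad")
change_filter.train("bob is cool lol", "bad")
```

With this tiny training set, `change_filter.looks_bad("alice is cool")` is true and `change_filter.looks_bad("the earth orbits the sun")` is false. A flagged change would only be queued for a human to look at, as with the word-list approach.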

All good suggestions for tools that an Editorial Board might use.

Why wikipedia doesn't need an additional approval mechanism

These are arguments presented for why an additional approval mechanism is unnecessary for wikipedia:

  • Wikipedia already has an approval mechanism! Anyone can edit any page. It means that experts of all sorts can be bold and contribute to articles; peer review is an approval mechanism.
  • An expert-centered approval mechanism is considered a cathedral-type methodology, in contrast with bazaar-type open-source projects like wikipedia, which are known to achieve good results (e.g. Linux) through aggressive peer review and openness ("With enough eyeballs, all errors are shallow"). It can be argued that the very reason Linux has become so reliable is the radical acceptance of, and to some degree respect for, amateurs' and enthusiasts' work of all sorts.
  • Experts themselves have controversies between themselves, for example, many subjects in medicine and psychology are highly debated. By giving a professor the free hand in deciding whether an article is "approved" or "non-approved" there is a risk of compromising the NPOV standards by experts' over-emphasizing their specific opinions and area of research.
  • Low quality articles can be easily recognized by a reader with little or no experience of reading wikipedia, by applying some basic critical thinking:
    • Style may sound biased, emotional, poorly written, or just unintelligible.
    • Blanket statements, no citing, speculative assertions: any critical person will be careful in giving too much credit for such article.
    • History of an article shows much of the effort and review that has been brought into writing it, who and how qualified are the writers (Users seem to put some biographical information about themselves on their pages)
  • Cross-checking with other sources is an extremely important principle for good information gathering on the internet! No source should be taken as 100% reliable.
  • Some "authoritative" and "approved" encyclopedias don't seem to live up to their own claims of credibility. See, for example, Columbia Encyclopedia's article about Turing test, compare with Wikipedia's Turing test. Any amateur computer science hobbyist knows that a Turing test does not necessarily test whether a computer is capable of "human-like thought". See also m:Making fun of Britannica.
  • Finding an expert who corresponds to a certain article can sometimes be troublesome. Can a Ph.D. in applied mathematics "approve" articles on pure mathematics? Or, more strictly, will someone be accepted as an approver only if he/she has done research on the specific subject he/she is approving? Who will decide whether a person is qualified for approval?
  • Some obscure or day-to-day topics don't have any immediate "expert" attached to them. Who will approve articles on hobbies, games, local cultures etc.?
  • The very idea of an article being "approved" is debatable, especially on controversial topics, and can be seen as an unreachable ideal by some.
  • The immediateness and easiness of publishing on Wikipedia is seen by some as one of the main incentives for working on the project. Creating a moderation hierarchy can become cumbersome as a whole (e.g. Nupedia) and discouraging for these contributors.
I think the proposal addresses all of these. I'll develop some more detailed responses perhaps.

PeterK's Proposal: Scoring

This idea has some of the same principles as the Automated Heuristic suggested above. I agree that an automated method for determining "good" articles for offline readers is absolutely crucial. I have a different idea on how to go about it. I think the principles of easy editing and how wikipedia works now is what makes it great. I think we need to take those principles along with some search engine ideas to give a confidence level for documents. So people extracting the data for offline purposes can decide the confidence level they want and only extract articles that meet that confidence level.

I think the exact equation for the final scoring needs to be discussed. I don't think I could come up with a final version by myself, but I'll give an example of what would give good points and bad points.

Final Score:
  • a: The first thing we need is a quality/scoring value for editors. Anonymous editors would be given a value of 1, and a logged-in user may get 1 point added to their value for each article he/she edits, up to a value of 100.
  • b: 0.25 points for each time a user reads the article.
  • c: 0.25 points for each day the article has existed in wikipedia.
  • d: Each time the article is edited it gets 1+(a/10)*2 points; an anonymous user would give it 1.2 points and a fully qualified user (a = 100) would give it 21 points.
  • e: Next, if an anonymous user makes a large change then the article gets a -20 point deduction. Even though this is harsh, if it goes untouched for 80 days it will gain all those points back. It will gain the points back faster if a lot of people have read the article.

This is the best I can think of right now, if I come up with a better scoring system I'll make some changes. Anyone feel free to test score a couple of articles to see how this algorithm holds up. We can even get a way of turning the score to a percentage, so that people can extract 90% qualified articles.
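To make test scoring easier, the rules a to e can be written down as a short function. This is a sketch of one reading of the rules, not an agreed formula: `editor_value`, `Edit` and `article_score` are hypothetical names, what counts as a "large" change is left to the caller, and it is assumed that an anonymous large edit earns its normal edit points as well as the 20-point deduction.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def editor_value(articles_edited: Optional[int]) -> int:
    """Rule a: 1 for anonymous (None), else 1 plus 1 per article, capped at 100."""
    if articles_edited is None:
        return 1
    return min(100, 1 + articles_edited)


@dataclass
class Edit:
    editor_articles: Optional[int]  # None means an anonymous editor
    large: bool = False             # "large change" flag, set by the caller


def article_score(reads: int, days: int, edits: Iterable[Edit]) -> float:
    score = 0.25 * reads + 0.25 * days          # rules b and c
    for edit in edits:
        a = editor_value(edit.editor_articles)
        score += 1 + (a / 10) * 2               # rule d
        if edit.editor_articles is None and edit.large:
            score -= 20                         # rule e
    return score
```

With these rules an anonymous edit is worth 1.2 points, and for an editor at the cap (a = 100) the formula as written evaluates to 21. An article whose only edit is a large anonymous one starts at -18.8 and is back to 1.2 after 80 untouched days, or sooner if it is read.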

Another tool that could be incorporated, but I'm not too enthusiastic.

Trolls are not here to approve, and usually reject views of experts who must be certified by someone trolls grumble about. So one would expect them to be disgruntled by definition about such a mechanism. However, paradoxically, almost all trolls think they apply clear and reasonably stringent standards. The problem is that each troll has his own standards, unlike those of others!

That said, there is much to agree on: the mechanism itself must be genuinely easy to use, nothing slow and rigorous is of any value, the progress of Wikipedia and its proven process should not be impeded, and the results of the approval can be ignored. Where trolls would disagree is that verifying the expert's credentials is of any value. Any such mechanism can be exploited, as trolls know full well, often being experts at forging new identities and the deliberate disruption of any credentialing mechanism.

One might ignore this, and the trolls, but, it remains that what goes on at Wikipedia is largely a process not of approval but of impulse and then disapproval. As with morality and diplomacy, we move from systems of informal to formal disapproval. Today, even our reality game shows demonstrate the broad utility of this approach, with disapproval voting of uninteresting or unwanted or undesired candidates a well-understood paradigm.

So, imagine an entirely different way to achieve the "desirements", one that is a natural extension of Wikipedia's present process of attempt (stubs, slanted first passes, public domain documents, broad rewrites of external texts) and disapproval (reverts, neutralizing, link adds, rewrites, NPOV dispute and deletions). Rather than something new (trolls hate what is new) and unproven that will simply repeat all the mistakes of academia. Imagine a mechanism that

  • Begins with all approved, and makes it possible to broaden or narrow the selection of approvers (e.g., one person might only wish authors who have phd's, another would allow for anyone who has made an effort to approve any articles) for each reader, or supported class of reader, simply by disapproving editors.
  • Allows for extracting topic-oriented sets (e.g., in order to produce an "Encyclopedia of Music") relying on metadata that is specific to each such supported class of reader, not part of the Wikipedia as a whole
  • Exploits ongoing feedback ("I don't care about this" or "I don't understand this") to adjust the list of articles of interest. Each user can begin from some class (like Simple-English-only readers), and adjust if they like.
  • Potentially, exploits more feedback on authors ("I can't believe this" or "I find this irrelevant") to adjust also the list of disapproved authors/editors.
  • Credits each troll who has driven off a disapproved author or editor. OK, that's a joke, but what do you expect, I'm a troll neh neh neh...

By embracing and extending and formalizing the disapproval, boredom and disdain that all naturally feel as a part of misanthropy, we can arrive at a pure and effective knowledge resource. One that rarely tells us what we don't care about. And, potentially, one that can let us avoid those who we find untruthful.

Some good points which are addressed in the proposal. Others will need a far bigger database monster than we can currently afford.

Propose and veto

Include articles that have been proposed by at least one person, and vetoed by none.

Where two versions of an article are so approved, pick the later one. Where no versions of an article are so approved, have no article.

That's it.

Consistent with the proposal. Exactly this could be provided as a user option.
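The propose-and-veto rule is small enough to state as code. The sketch below assumes that proposals and vetoes are recorded against (title, version number) pairs and that a higher version number means a later version; `select_versions` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

VersionRef = Tuple[str, int]  # (article title, version number)


def select_versions(proposals: Iterable[VersionRef],
                    vetoes: Iterable[VersionRef]) -> Dict[str, int]:
    """Map each title to its latest version that was proposed and never vetoed.

    Titles with no such version are left out entirely ("have no article").
    """
    vetoed = set(vetoes)
    chosen: Dict[str, int] = {}
    for title, version in proposals:
        if (title, version) in vetoed:
            continue
        if title not in chosen or version > chosen[title]:
            chosen[title] = version
    return chosen


proposals = [("Music", 3), ("Music", 5), ("Widget", 2), ("Jazz", 1), ("Music", 7)]
vetoes = [("Music", 7), ("Widget", 2)]
selection = select_versions(proposals, vetoes)
```

Here "Music" falls back from the vetoed version 7 to version 5, "Jazz" is included at version 1, and "Widget" is dropped because its only proposed version was vetoed.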

Andrew A's proposal

See m:Referees. This proposal is consistent with much of the above.

Agree. (;-> Andrewa