Wikipedia:Bot requests/AutoAssessBot

Proposal: Bot to automatically add project assessment templates to article talk pages

Preamble

Assessments of article quality and importance are useful to wikiproject members, who use them to prioritize work, for example improving Stub quality articles that are High importance. There is an element of subjectivity, but assessments are most useful if they comply with the assessment criteria. Articles are assigned to wikiprojects, and their quality and importance recorded, using wikiproject talk page templates based on {{WPBannerMeta}}, e.g.

WikiProject France (NA-class, Low-importance)

This article is within the scope of WikiProject France, a collaborative effort to improve the coverage of France on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
NA: This article has been rated as NA-class on Wikipedia's content assessment scale.
Low: This article has been rated as Low-importance on the project's importance scale.

An article may be assigned to several wikiprojects. The quality rating should be the same for all of them, but the importance rating may differ from one project to another. There is a steadily growing backlog of articles to be assessed, about 570,000 in April 2018. Many articles have been significantly enhanced since they were last assessed, so their assessment may be out of date.

This proposal is to create a bot that would automatically assess and reassess articles, using the InceptionBot output to decide on relevant projects, and using ORES to assess quality for each project. The initial version of the bot would not assess importance, but that may be added as an enhancement. The proposal has been discussed at Wikipedia:Village pump (proposals)#Automated article assessment, where the response was generally positive.

Outline

AutoAssessBot checks articles in three different modes:
1. Soon after an article has been created, when InceptionBot has determined potentially relevant projects
2. Soon after significant changes have been made.
3. Scan through a defined set of articles, such as all that have never been assessed
For each new article it determines relevant projects using the InceptionBot output
If a new article does not have a talk page it creates one, with templates for the relevant projects
If the article has a talk page, it adds templates for relevant projects that are not yet on the talk page, and checks existing project templates to see if they should be updated
The {{WPBannerMeta}} parameter |auto= is used to control bot assessment
- When the bot adds a project template, it sets |auto=auto, meaning the assessment has not yet been checked.
- A reviewer may approve the assessment by changing the parameter to |auto=ok.
- A reviewer may change the assessment and change the parameter to |auto=no, meaning the bot has been overridden.
- A reviewer may delete a project template for an article that is irrelevant to this project
- The talk page is placed in a category for the |auto= value for each project. Thus |auto=ok for WikiProject Biography would put it in Category:Autoassess ok Biography articles.
- |auto=auto or |auto=ok show as a note on the template saying, e.g., "This assessment was done mechanically by AutoAssessBot"
When the bot checks an existing project template it skips it if |auto=no, otherwise it refreshes the assessment if changed
The bot creates or refreshes the quality rating on a project template using the result supplied by ORES
At this point, the bot defaults the importance rating to "low", since 80% of articles are low importance. More work is needed to find an algorithm that can accurately identify higher importance

There is an element of guesswork, so the bot will sometimes assign an article to an irrelevant project (e.g. General Hospital to WikiProject Medicine) or assess quality higher or lower than the criteria suggest. These errors should not cause serious problems, and can easily be corrected manually. Some experimentation with the bot's parameters will be needed to ensure that the net benefits are clearly positive. A basic principle is that AutoAssessBot always backs off if it hits any complication, including a manual assessment different from the ORES assessment, an ORES assessment over C, project templates with conflicting assessments, unrecognized |auto= values and so on.

Benefits

The bot will reduce workload. Assessors will usually just have to change the parameter to |auto=ok to confirm the bot result
Newbies will not be upset when they see a bot has assessed quality based on an algorithm, where they might be upset if a human had decided their work was junk
The bot will reduce the backlog of unassessed articles
The bot will run periodically until a human overrides it, so will update its assessments. Human assessors typically do not review and adjust their initial assessment
The bot's algorithm can be steadily refined through review of rejected assessments to reduce the number of human interventions needed.

Related software

ORES

ORES is a web service, run by the Wikimedia Foundation, that uses machine learning to predict article quality based on structural characteristics of the article. Although ORES cannot evaluate the quality of the writing, neutrality, completeness etc., its predictions in practice correlate well with the quality ratings given by human reviewers.

ORES will give the full range of quality ratings, from lowest to highest: List, Stub, Start, C, B, GA, A, FL, FA. However anything above C requires a human review. Therefore, when ORES returns FL, that will be changed to L, and when it returns B, GA, A or FA those will be changed to C before creating or updating the project templates. See #High quality articles (below) for more on this.
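The downgrade rule in the paragraph above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name `cap_rating` is hypothetical, the rating labels are the ones listed in this section, and "List" stands for the rating written as L above.

```python
from typing import Tuple

# Ratings the bot may not assign on its own, mapped to the capped value.
# "List" is the rating abbreviated as L in the text above (assumption).
CAPPED = {"FL": "List", "B": "C", "GA": "C", "A": "C", "FA": "C"}


def cap_rating(ores_rating: str) -> Tuple[str, bool]:
    """Return (rating to record, True if a human review should be requested)."""
    if ores_rating in CAPPED:
        return CAPPED[ores_rating], True
    # Stub, Start, C and List pass through unchanged.
    return ores_rating, False
```

For example, `cap_rating("GA")` yields `("C", True)`, while `cap_rating("Start")` yields `("Start", False)`.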

It should be possible to query ORES with two parallel threads, each requesting scores for 50 revisions (the current versions of pages) at a time. At that rate it should take less than 24 hours to go through 1 million revisions. An ORES client is available that makes it easier to query a very large number of scores efficiently; see https://github.com/wiki-ai/ores/blob/master/ores/api.py. We have a volunteer to help with usage, since the docs are somewhat lacking, who can be found as "halfak" in #wikimedia-ai (connect).
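The batching arithmetic above can be checked with a short sketch. It makes no network calls. The endpoint URL and the model name `articlequality` are assumptions about the ORES scoring API and should be verified before use; the helper names are hypothetical.

```python
from typing import Dict, Iterator, List

BATCH_SIZE = 50  # revisions per request, as suggested above

# Assumed shape of the ORES scoring endpoint; verify before relying on it.
ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"


def batches(rev_ids: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Split a list of revision IDs into consecutive batches."""
    for start in range(0, len(rev_ids), size):
        yield rev_ids[start:start + size]


def request_params(batch: List[int]) -> Dict[str, str]:
    """Query parameters for one request (model name is an assumption)."""
    return {"models": "articlequality", "revids": "|".join(map(str, batch))}


def seconds_per_request(total_revisions: int, threads: int = 2,
                        hours: float = 24.0) -> float:
    """Time budget per request if the whole run must fit in `hours`."""
    requests_per_thread = total_revisions / BATCH_SIZE / threads
    return hours * 3600 / requests_per_thread
```

With 1 million revisions, two threads and batches of 50, each thread makes 10,000 requests, so the 24-hour target holds as long as a request completes in about 8.6 seconds on average.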

InceptionBot

InceptionBot, formerly AlexNewArtBot, checks new articles some time after they are started and adds them to wikiproject lists like User:AlexNewArtBot/FranceSearchResult, with a score indicating how likely the article is to be relevant to the wikiproject. Project members then check the lists and assess relevant articles.

The score is based on rules and a threshold provided by each wikiproject that subscribes to InceptionBot.
The rules define scores (positive or negative) to be given when patterns are found in the article.
The threshold defines the total score that must be reached for InceptionBot to put the new article into the wikiproject list.
Results will include many false positives since the goal is to alert a wikiproject that a new article may be relevant to the project, but the threshold is set high enough to avoid flooding the wikiproject list with irrelevant results.
The InceptionBot developer has offered to write the new article information to a public database in the Toolforge toolsdb. Each record would contain the date, article name, WikiProject, threshold and article score.
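The rule-and-threshold scoring described above can be pictured roughly as follows. This is a deliberately simplified sketch, not InceptionBot's actual rule syntax or code: the `Rule` class, the example patterns and the scores are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    pattern: str  # regular expression searched for in the article text
    score: int    # points added (or subtracted, if negative) on a match


def score_article(text: str, rules: List[Rule]) -> int:
    """Sum the scores of all rules whose pattern occurs in the text."""
    return sum(r.score for r in rules
               if re.search(r.pattern, text, re.IGNORECASE))


def is_listed(text: str, rules: List[Rule], threshold: int) -> bool:
    """True if the total score reaches the project's threshold."""
    return score_article(text, rules) >= threshold


# Hypothetical rules for a France-related project.
france_rules = [Rule(r"\bParis\b", 10), Rule(r"\bFrench\b", 10),
                Rule(r"\bTexas\b", -20)]
```

An article mentioning both "Paris" and "French" would score 20 under these example rules, and one that also mentions "Texas" would drop back to 0.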

The InceptionBot results will be the basis for AutoAssessBot to add project templates to articles, but AutoAssessBot will apply higher thresholds. The goal is to get relatively few false positives so as to reduce the effort needed to delete irrelevant project templates, without being so rigorous as to get no matches at all. Some experimentation may be needed to determine the optimum score needed to select a project for an article.

Possibly a multiple of the InceptionBot threshold could be used for the AutoAssessBot threshold.
Perhaps a default multiplier of, e.g., 4 could be used for all projects, so if an InceptionBot threshold for a given project is 20, the AutoAssessBot threshold would be 80. There would then be a way to define custom AutoAssessBot thresholds for specific projects.
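The multiplier idea can be written down in a few lines. This is a sketch only: the function name and the `custom` override table are hypothetical, and the multiplier of 4 is just the example value from the text.

```python
from typing import Dict, Optional

DEFAULT_MULTIPLIER = 4  # example value from the text; to be tuned by experiment


def autoassess_threshold(project: str, inception_threshold: int,
                         custom: Optional[Dict[str, int]] = None,
                         multiplier: int = DEFAULT_MULTIPLIER) -> int:
    """Score an article must reach before the bot adds the project template."""
    if custom and project in custom:
        return custom[project]  # project-specific override
    return inception_threshold * multiplier
```

With an InceptionBot threshold of 20 this gives 80, and a project that supplies its own value simply bypasses the multiplier.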

WPBannerMeta

The {{WPBannerMeta}} template is used by all wikiproject templates on an article talk page. These display a banner that shows the article has been assigned to the project, shows the quality rating, in most cases shows the importance rating, and sometimes captures other information such as (for a biography), |living=no|listas=Last, First|politician-work-group=y. There may be several wikiproject templates on an article talk page, since an article may be relevant to several projects. The {{WPBannerMeta}} template will be changed to respond to the new |auto= parameter values.

It may normalise some variants in parameter values, e.g. OK = ok, No = no
Unrecognized parameter values will result in a message like "Unrecognized autoassess value: [val] – ignored"
For |auto=auto and |auto=ok it will add a message like "This assessment was done mechanically by AutoAssessBot", linking to the bot documentation
For all recognized |auto= values it will add the talk page to Category: Autoassess [val] [project] articles, e.g. Category: Autoassess auto Aviation articles
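The normalisation and category naming above would live in template code, but the intended behaviour can be sketched in Python for clarity. The function names are hypothetical, and only the three new values (auto, ok, no) are treated as recognized here.

```python
from typing import Optional

NEW_VALUES = {"auto", "ok", "no"}  # the new |auto= values proposed above


def normalise_auto(value: str) -> Optional[str]:
    """Fold case and whitespace variants (OK -> ok); None if unrecognized."""
    cleaned = value.strip().lower()
    return cleaned if cleaned in NEW_VALUES else None


def tracking_category(value: str, project: str) -> Optional[str]:
    """Category for the talk page, or None when the value is ignored."""
    cleaned = normalise_auto(value)
    if cleaned is None:
        return None
    return f"Category:Autoassess {cleaned} {project} articles"
```

For instance, `tracking_category("OK", "Aviation")` gives `"Category:Autoassess ok Aviation articles"`, while an unrecognized value gives `None`.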

A one-off task will be to create the three Category: Autoassess [val] [project] articles categories for each of the 2,000-odd categories directly or indirectly in Category:WikiProjects by status. After this, it will be up to new WikiProject owners to create these categories.

Each of the new categories will be in a general category such as Category:Autoassess ok articles and in the main WikiProject category, e.g. Category:WikiProject Aviation articles.
The main WikiProject category is normally formed from "WikiProject" + project name + "articles".
However, sometimes the project chooses a different category, which is given in the |MAIN_CAT parameter passed by the project template to {{WPBannerMeta}}.
Thus, for WikiProject Medicine, {{WikiProject Medicine}} includes the statement |MAIN_CAT = All WikiProject Medicine articles.

The new Category:Autoassess auto Medicine articles would therefore be in Category:All WikiProject Medicine articles.
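The rule for finding the parent category can be sketched as below. The function name is hypothetical, and the sketch assumes the |MAIN_CAT value is passed without the "Category:" prefix, as in the WikiProject Medicine example.

```python
from typing import Optional


def main_category(project: str, main_cat: Optional[str] = None) -> str:
    """Parent category for a project's Autoassess categories."""
    # Use |MAIN_CAT when the project template supplies it, else the default.
    name = main_cat if main_cat else f"WikiProject {project} articles"
    return f"Category:{name}"
```

So `main_category("Aviation")` gives `"Category:WikiProject Aviation articles"`, and `main_category("Medicine", "All WikiProject Medicine articles")` gives `"Category:All WikiProject Medicine articles"`.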

Running the bot

The bot will run in three modes:

New article assessment, driven by the User:InceptionBot log
- AutoAssessBot may periodically check User:InceptionBot/Status, which InceptionBot updates when it finishes a run, to see if a new batch is ready.
- It will then start working through the InceptionBot log database starting from the date/time it left off on the previous run.
- The selected log entries should be sorted so that each new article is processed as a whole.
- There are about 800 new articles per day. If throttled to one article per four seconds, that would take about an hour.
Changed article assessment (tentative spec)
- Once a week the bot checks for all articles that have been changed in the 7-day period up to 48 hours ago
- For each selected article, it compares the length 9 days ago to the length 2 days ago
- Where the length difference is over 200 characters, it reviews and if needed updates the article talk page
Older article review, working through a set number of articles that meet specified criteria, such as articles that have no assessment
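The throttling estimate and the tentative "significant change" test above can be sketched as follows. The function names are hypothetical, and treating a shrinkage of more than 200 characters as significant (by taking the absolute difference) is an assumption; the tentative spec only says "length difference".

```python
def run_minutes(articles: int, seconds_per_article: float) -> float:
    """Duration of a throttled run, in minutes."""
    return articles * seconds_per_article / 60


def changed_significantly(length_9_days_ago: int, length_2_days_ago: int,
                          min_delta: int = 200) -> bool:
    """True if the article length moved by more than min_delta characters."""
    # Assumption: growth and shrinkage both count as a change.
    return abs(length_2_days_ago - length_9_days_ago) > min_delta
```

At 800 articles and four seconds each, a run takes a little over 53 minutes, which matches the "about an hour" estimate.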

General considerations:

Overhead is a major concern. Code should be designed for maximum efficiency.
Possibly the changed article assessment should operate on a copy of the database to make a list of articles that need updates, then use that list to drive the live updates. The assumption is that most changes will not result in changes to ORES assessments
It is probably best to use a message queue such as ZeroMQ or RabbitMQ with no more than 4 worker threads (see here for possible rate limits).
The bot should respect the {{bots}} template, and skip pages that block AutoAssessBot.
Each time the bot runs it should generate a page for that run holding statistics such as number of articles checked, project templates created, etc.
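Respecting the {{bots}} template could look roughly like the check below. It is a simplified sketch, assuming only the basic forms {{nobots}}, {{bots|allow=...}} and {{bots|deny=...}}; other options of the exclusion templates are not handled, and the function name is hypothetical.

```python
import re

BOT_NAME = "AutoAssessBot"

NOBOTS = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)
BOTS = re.compile(r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}|]*)",
                  re.IGNORECASE)


def allowed_to_edit(wikitext: str, bot: str = BOT_NAME) -> bool:
    """Return False if the page asks this bot (or all bots) to stay away."""
    if NOBOTS.search(wikitext):
        return False
    match = BOTS.search(wikitext)
    if not match:
        return True  # no exclusion template, or a bare {{bots}}
    kind = match.group(1).lower()
    names = {name.strip().lower() for name in match.group(2).split(",")}
    listed = bot.lower() in names or "all" in names
    # deny= bans the listed bots; allow= bans every bot that is not listed.
    return not listed if kind == "deny" else listed
```

A page containing `{{bots|deny=AutoAssessBot}}` or `{{nobots}}` would be skipped, while a page that only denies some other bot would still be processed.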

Bot rules

Standard rules for an article

For new articles, when the bot is processing InceptionBot results, it will add project templates to the talk page for projects that pass the AutoAssessBot threshold and where the talk page does not yet have a template for that project. The new project templates will be passed:

|auto=auto
|importance=low (See #Importance default below)
|class= quality rating obtained from ORES
In later implementations, other information may be added, such as "listas" for a biography, but this is out of scope at present
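As an illustration, the wikitext for a newly added project template might be built as below. This assumes the template is named "WikiProject" followed by the project name, which will not hold for every project, and the function name is hypothetical.

```python
def new_banner(project: str, ores_class: str) -> str:
    """Wikitext for a project template added by the bot (sketch only)."""
    # Assumption: the banner template is called "WikiProject <project>".
    return (f"{{{{WikiProject {project}|class={ores_class}"
            f"|importance=low|auto=auto}}}}")
```

For a France article that ORES rates as Start, this produces `{{WikiProject France|class=Start|importance=low|auto=auto}}`.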

For articles checked in any run mode (new, changed, older article review), for each existing project template in the talk page:

Where |auto=auto or |auto=ok the bot will
- Obtain the ORES quality assessment
- Update the project template's quality assessment to the ORES assessment if different, and set |auto=auto to show the update has not yet been reviewed
Where |auto= is missing the bot will:
- Obtain the ORES quality assessment
- If the project template does not include a quality rating, add the ORES rating with |auto=auto
- If ORES gives the same quality rating as is present in the project template, add |auto=ok
- If ORES gives a different quality rating, add |auto=no
For project templates with other |auto= values, the bot will do nothing.
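The rules above for an existing project template amount to a small decision table, sketched here. The `Banner` class and `apply_rules` function are hypothetical, and the sketch assumes the ORES rating has already been capped at C and that rating strings are compared in a normalised form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Banner:
    quality: Optional[str]  # current |class= value, None if blank or missing
    auto: Optional[str]     # current |auto= value, None if missing


def apply_rules(banner: Banner, ores: str) -> Banner:
    """Return the banner as it should look after the bot's visit."""
    if banner.auto in ("auto", "ok"):
        if banner.quality != ores:
            return Banner(quality=ores, auto="auto")  # refreshed, unreviewed
        return banner                                 # nothing to change
    if banner.auto is None:
        if banner.quality is None:
            return Banner(quality=ores, auto="auto")  # first assessment
        if banner.quality == ores:
            return Banner(quality=ores, auto="ok")    # bot agrees with human
        return Banner(quality=banner.quality, auto="no")  # manual rating kept
    return banner  # any other |auto= value: do nothing
```

Note that a manual rating that differs from ORES is never overwritten in this sketch; only the |auto= flag is set to "no".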

Importance default

When AutoAssessBot creates or updates a project template's quality rating, if the |importance= parameter is blank or missing the bot will set it to "low". Note that the default "low" importance will only be added to templates that show a message like "This assessment was done mechanically by AutoAssessBot".

Wikipedia:Version 1.0 Editorial Team/Statistics shows that four out of five articles have their importance assessed as "low". For manual reviewers it is less effort to change the importance of one article to "mid" than to add "low" or "mid" to the importance rating of five articles (we can assume that almost all articles with "top" or "high" importance have already been identified as such.) See #Importance (below). A future enhancement may add the ability to make a more sophisticated estimate of importance.

High quality articles

A quality rating above C must be reviewed and confirmed by a human. If ORES returns a rating above C, and based on the #Standard rules for an article AutoAssessBot would normally create or update the rating, AutoAssessBot:

Pushes the rating from FL down to L, or from B, GA, A or FA down to C
Creates or updates project templates with this rating as appropriate, setting |auto=no
Adds a template like {{High quality review |autoprediction=FA}} to the top of the article talk page to display a message like:

This article has been assessed by AutoAssessBot, which predicted that it may qualify as FA. You may want to consider formally reviewing the article and updating the quality rating in the project templates below. After reviewing the article, and updating the quality rating if appropriate or confirming the present C rating, please remove this template.

The {{High quality review}} template adds the article to, e.g., Category:Articles auto-assessed as FA that require review.
Since all the project templates for the article now have |auto=no, AutoAssessBot will no longer update quality ratings for this article.
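The steps above can be sketched as one function that returns what the bot would write. The function names are hypothetical, "List" stands for the rating written as L above, and the template and category names are the examples given in this section.

```python
from typing import Optional, Tuple

ABOVE_C = {"B": "C", "GA": "C", "A": "C", "FA": "C", "FL": "List"}


def handle_high_quality(ores: str) -> Optional[Tuple[str, str, str]]:
    """Return (rating, |auto= value, review template), or None if not above C."""
    if ores not in ABOVE_C:
        return None
    review = f"{{{{High quality review |autoprediction={ores}}}}}"
    return ABOVE_C[ores], "no", review


def review_category(ores: str) -> str:
    """Tracking category populated by the review template."""
    return f"Category:Articles auto-assessed as {ores} that require review"
```

For an ORES prediction of FA this yields the rating C, |auto=no, and the template `{{High quality review |autoprediction=FA}}`.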

One article, many quality ratings

Although the quality rating is meant to be project-independent, measuring how close the article comes to ideal coverage of the subject, editors may give different quality ratings in different wikiproject templates. AutoAssessBot should flag these by adding the talk page to, e.g., Category:Articles with multiple quality assessments. In these cases, AutoAssessBot will not add or update project templates: the situation is confused and needs human attention.
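Detecting this situation is simple once the |class= values have been read from each project template. A minimal sketch, with a hypothetical function name, and assuming that blank or missing ratings do not count as a conflict:

```python
from typing import List, Optional


def has_conflicting_ratings(ratings: List[Optional[str]]) -> bool:
    """True if the project templates carry more than one distinct rating."""
    # Ignore blank or missing ratings and compare case-insensitively.
    distinct = {r.strip().lower() for r in ratings if r and r.strip()}
    return len(distinct) > 1
```

A talk page with ratings Stub and Start would be flagged and skipped, while one with C, c and a blank rating would not.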

Destubbing

If AutoAssessBot reassesses an article from Stub class to a higher class, it will also check for and remove any {{stub}} templates from the article itself. The reverse does not apply. If AutoAssessBot rates an article as Stub it will not add a {{stub}} template to the article. A {{stub}} template is any template that is in Category:Stub message templates.
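A rough sketch of the destubbing step follows. The real definition of a stub template is membership of Category:Stub message templates, which would need a category lookup; as a stand-in, this sketch assumes stub templates are named "stub" or end in "-stub" and take no parameters. The function names are hypothetical.

```python
import re

# Stand-in for "any template in Category:Stub message templates":
# matches {{stub}}, {{France-stub}}, {{Asia-bio-stub}} and similar names.
STUB_TEMPLATE = re.compile(r"\{\{\s*(?:[^{}|]*-)?stub\s*\}\}[ \t]*\n?",
                           re.IGNORECASE)


def remove_stub_templates(wikitext: str) -> str:
    """Strip stub templates from the article text."""
    return STUB_TEMPLATE.sub("", wikitext)


def destub_if_upgraded(article_text: str, old_class: str,
                       new_class: str) -> str:
    """Remove stub templates only when the rating moves up from Stub."""
    if old_class == "Stub" and new_class in ("Start", "C"):
        return remove_stub_templates(article_text)
    return article_text  # never adds a stub template
```

The one-way behaviour described above is preserved: an article rated Stub is returned unchanged, with or without a stub template.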

Out of scope

Importance

Wikipedia:Assessing articles#Importance ratings: a variety of definitions discusses project importance ratings. Some projects refer to the scale documented at {{Importance scheme}}, while others refer to the definitions in the Wikipedia:Version 1.0 Editorial Team/Release Version Criteria. Other projects have customized scales which may consider factors such as notability of article topics, relationship to a "main" article for the project, centrality to understanding the project's subject area, reader interest and expectations and so on. The number of inbound links and number of page views would appear to indicate importance, but meta:Research:Automated classification of article importance shows that they are not strong predictors. It may be possible to develop project-specific algorithms for importance. They do not have to be perfect, just not wildly inaccurate. But for now, importance assessment is out of scope.

Project-specific information

Project templates sometimes capture other information such as (for a biography), |living=no|listas=Last, First|politician-work-group=y. It may be possible to derive this information from the article body, particularly when it is given as fields in templates or as categories. However, this will not be included in the first implementation.

Old articles with no project

The bot will not create project templates when reviewing changes or checking batches of old articles. If an old article has no project template, the bot will have no place to record a quality assessment, and will just skip the article. As a later enhancement we may consider adding logic to assign old articles to projects in some common situations, such as assigning an article to WikiProject Biography if the article is in a subcategory of Category:People by nationality. This is out of scope in the first version.

Comments

If the bot assesses an article from stub class to a higher class, will it also check for and remove any {{stub}} templates from the article itself? I haven't seen it mentioned and certainly hope that it will. Thank you.--John Cline (talk) 18:33, 11 April 2018 (UTC)

Good point. I have added that in "Destubbing" above. Aymatth2 (talk) 18:38, 11 April 2018 (UTC)

From a developer's standpoint, re-running the article "soon after significant changes have been made"... What specifically would constitute soon after? What about significant changes? It might be better (less overhead - which is a real concern thinking about the volume of pages) to re-visit pages that have changed in the last X days/months/etc instead. SQL (Query me!) 03:09, 12 April 2018 (UTC)

See above #Running the bot. I am also concerned about overhead. If we could piggyback on an existing bot, that would be ideal. I assume, for example, there is something that checks large additions for copyvios. Maybe that could also call on AutoAssessBot to reassess the article.

Tagging new articles. I think it would be better to leave this off as a task. There are many other approved bots that do this IIRC. Overhead is a huge concern at this scale, and keeping the bot as simple as possible would be ideal. SQL (Query me!) 03:09, 12 April 2018 (UTC)

See above #Running the bot. @Bamyers99: Would it be practical to make AutoAssessBot a sub-task, whatever, of ~~AlexNewArtBot~~ InceptionBot? Aymatth2 (talk) 12:31, 12 April 2018 (UTC)

I don't think that using AlexNewArtBot for WikiProject assignment would produce high quality results. There are too many false positives (articles listed that are unrelated to a particular WikiProject). This would be hard to overcome given the complexity of the regular expression matching that is custom to each WikiProject. Here are two examples with a lot of false positives: EducationSearchResult and MedicineSearchResult. --Bamyers99 (talk) 15:41, 12 April 2018 (UTC)

@Bamyers99: Presumably the scores range from "maybe" through "quite likely" to "almost certainly". If we set the threshold very low, the time spent flagging the many false positives would outweigh the saving from auto-assess on the valid matches. If we set the threshold very high, the saving from auto-assess on the many correct guesses would greatly outweigh the cost of manually flagging the few false positives. There is some optimal point. If the threshold was set at Score:150, would that eliminate most false positives? The manual review could then delete the incorrect guesses. Aymatth2 (talk) 21:32, 12 April 2018 (UTC)

The threshold is set by each WikiProject. Some percentage over the threshold could be used to increase the likelihood of a correct match. --Bamyers99 (talk) 23:04, 12 April 2018 (UTC)

We can work with that. A bit of experimentation may be needed. Aymatth2 (talk) 11:23, 13 April 2018 (UTC)

A separate question is whether AutoAssessBot could piggyback on AlexNewArtBot, with its logic invoked from AlexNewArtBot for each new article. On each call, if the article creator had added project templates without assessments, or if AlexNewArtBot had scored the project at 150+, AutoAssessBot could add the ORES quality assessment. I do not think ORES uses a lot of resources, since it mostly just looks at the article text. Aymatth2 (talk) 21:32, 12 April 2018 (UTC)

I am not interested in adding AutoAssessBot functionality to InceptionBot (AlexNewArtBot's replacement). I would be willing to write the new article information to a public database in the Toolforge toolsdb. Each record would contain the date, article name, WikiProject, threshold and article score. --Bamyers99 (talk) 23:04, 12 April 2018 (UTC)

@Bamyers99: That would work. I get the sense that InceptionBot would take some time (maybe hours) between writing the first record for an article and the last. Is there some way it could signal when it has finished processing the article, so that AutoAssessBot could start on it? Aymatth2 (talk) 11:23, 13 April 2018 (UTC)

AutoAssessBot could periodically check InceptionBot's status page that gets updated when it finishes: User:InceptionBot/Status. The records would get inserted into the database all at once right before InceptionBot finishes. --Bamyers99 (talk) 17:18, 13 April 2018 (UTC)

I'm struggling with the difference between autoassess=no and autoassess=skip. At least from the bot's standpoint - I get how they are intended to work. What's the reason for the differing parameters? In this vein, should the bot respect the {{bots}} template? SQL (Query me!) 03:09, 12 April 2018 (UTC)

~~I have tried to clarify that in #WPBannerMeta. autoassess=no means the bot should not reassess the article. autoassess=skip means the bot should not assign the article to this project.~~ I have also added a note on respecting the {{bots}} template. Aymatth2 (talk) 12:31, 12 April 2018 (UTC)

Come to think of it, there is no need for a "skip" value, since the bot will only add project templates the first time it looks at an article, and irrelevant templates can just be deleted. I have fixed the write-up to take out that unnecessary complication. Aymatth2 (talk) 14:08, 14 April 2018 (UTC)

Implementation wise, it's probably best to use a message-queue such as 0MQ or RabbitMQ at no more than 4 worker threads (See here for possible rate limits? @MMiller (WMF): / @Halfak (WMF): - These are the rate limits for ORES, right? This'll be pretty important. Templated articles alone that are unassessed number nearly 1 million according to my query. At one thread hitting those ALONE is nearly a year assuming no overhead.) SQL (Query me!) 03:50, 12 April 2018 (UTC)

I have added your point on message queues to #Running the bot. Again, I would prefer to piggyback on another bot. Aymatth2 (talk) 12:31, 12 April 2018 (UTC)

You should be able to query ORES with two parallel threads for 50 revisions (current version of pages) at a time. This should take less than 24 hours to go through 1 million revisions. We provide an ORES client to make querying for a very large number of scores efficiently easier. See https://github.com/wiki-ai/ores/blob/master/ores/api.py I'd be happy to help with usage since our docs are somewhat lacking. You can find me as "halfak" in #wikimedia-ai (connect). --Halfak (WMF) (talk) 20:50, 12 April 2018 (UTC)

I hadn't thought about passing multiple revisions in a single query, thanks! SQL (Query me!) 21:19, 12 April 2018 (UTC)

I will put a note on that under #Running the bot. It seems part of a general need to find ways to minimise resource usage. Aymatth2 (talk) 14:12, 13 April 2018 (UTC)

WP:RATER includes an ORES evaluation and when I used it to tag articles, I sometimes got predictions like "GA" and "FA" for C-class articles. So there probably should be a limit as to how good the article can be automatically rated, with the bot being prevented from adding more than C-class (because starting with B-class, manual review is required). This also means there has to be a parameter added to WPBannerMeta like manual-assessment-needed=yes which the bot could set to alert human editors to articles it believes to be of better class but is prevented from assigning because manual review is required. Regards So Why 11:45, 13 April 2018 (UTC)

@SoWhy: Good points. I will start a section above on ORES and make that point. Any prediction from ORES above C goes down to C. Rather than make any more changes than needed to WPBannerMeta, maybe the way to flag them would be to add a new template with the actual ORES rating, putting it in a category for manual assessment? I am sort of drifting towards the idea that since quality is meant to be independent of projects, there should be one quality template followed by one or more project templates – but that is probably too radical at this stage. Aymatth2 (talk) 14:12, 13 April 2018 (UTC)

See above #ORES and #High quality articles. I think this handles it. Basically the bot flags the need for manual review then gives up on the article, which is too high quality for the bot to deal with. Aymatth2 (talk) 10:57, 14 April 2018 (UTC)

Random break

I am still reading and thinking about the details but I can post a few initial thoughts. It sounds like a useful proposal but I think the details need more thought.

Will the bot ever override assessments that have been made manually? I see the |autoassess=no option but very few editors will know about that, and may get irate if their manual ratings are changed by the bot (especially if the bot's rating is inaccurate, which may happen). It may be safer only to autoassess articles which are currently unassessed.
I can't see the purpose of |autoassess=auto. If it hasn't yet been automatically assessed, then just leave this parameter off?
Template:WPBannerMeta already has |auto= parameter with various options. Perhaps it is sensible to expand this function rather than introducing another parameter related to autoassessment?
The advantage of using syntax like |auto=C is that it would retain any manual assessment e.g. |class=B. In the event that the two assessments differ, the template can alert editors or populate certain tracking categories. In that case the concerns voiced by SoWhy above would not occur but projects could be notified if they potentially had a GA which was only rated as C-class.

More thoughts to follow ... --Martin (MSGJ · talk) 10:42, 19 April 2018 (UTC)

@MSGJ: My comment on your comments: Aymatth2 (talk) 13:09, 19 April 2018 (UTC)

The bot will not directly override a manual assessment. See #Standard rules for an article (Where |autoassess= is missing...). If ORES gives a different quality rating from the manual rating, the bot will leave the manual rating and set |auto=no. But if ORES gives the same quality rating, it will set |autoassess=ok, and in that case, after a significant change it might adjust the rating up or down. I think that should be acceptable. Marcel Lucet could be taken as an illustration, an article I started the other day.

Soon after the first version was saved, it was manually assessed "stub", quite reasonably
AutoAssessBot would have found the article in the new article list, presumably would have agreed with the assessment, and set |autoassess=ok
The article was then expanded considerably
AutoAssessBot would have spotted the change, reassessed the article as Start or C, and set |autoassess=auto to show the upgrade has not been reviewed

I don't think the original reviewer would have any problem with that. Note that if AutoAssessBot had not got to the article until after it had been expanded, its rating would have been different from the manual "Stub" and it would therefore have left the manual Stub rating and set |autoassess=no. I would hope that over time the bulk reviewers will learn to give a new article a few days to settle down before rating it, good practice anyway.

|autoassess=auto is the value when the article has been automatically assessed, but has not yet been reviewed. It puts the talk page into, e.g. Category: Autoassess auto Aviation articles, a list of automatically assessed articles for Wikiproject Aviation. A reviewer working through that category would then set it to |autoassess=ok or |autoassess=no, moving it into Category: Autoassess ok Aviation articles or Category: Autoassess no Aviation articles.
I can't see any reason why we should not use the |auto= parameter, adding the new auto / ok / no values. It seems a bit confused, but it would work, I think. I have made the change above:

|auto=stub, the article includes a stub template and therefore has automatically been rated Stub-class;
|auto=inherit, the class has automatically been inherited from other WikiProject's assessments on the same page;
|auto=length, the class has automatically been deduced from the length of the article.
|auto=auto, a bot has assessed the article using ORES, and the assessment has not yet been reviewed
|auto=ok, a bot has assessed the article using ORES, and the assessment has been reviewed and approved
|auto=no, the article should not be assessed or re-assessed by a bot

See #Standard rules for an article "For project templates with other |auto= values, the bot will do nothing". AutoAssessBot would skip articles with |auto= stub, inherit or length, which is what it should do.

See #High quality articles. If ORES rates the article above C the bot rates it C and adds a template that requests manual review. If a human has rated the article above C, the bot backs off.

A basic principle which I have now spelled out is that the bot always backs off if it hits any complication, including a manual assessment different from the ORES assessment, an ORES assessment over C, project templates with conflicting assessments, unrecognized |auto= values and so on. Aymatth2 (talk) 13:09, 19 April 2018 (UTC)

Hey Aymatth2, thanks much for putting this together, excellent work that I'm definitely in support of! Using the article predictions to help Wikipedia contributors update ratings is an idea that's been floating around in my mind since I first started working on the prediction models back in 2012 or so, so it's great to see this happening! Thought I'd leave a few comments that I think are relevant:

I would not use ORES to assess list-style articles as it's not trained on that type of data. The data gathering process for the dataset that ORES is trained on does its best to remove lists. You might be able to use it to assess a list, but I don't know how that would work since it cannot output an "L" or "FL" rating.
ORES doesn't make predictions for A-class articles. There are not a lot of A-class articles in Wikipedia (my Quarry query finds 909 at the moment) and adding them results in lower classifier performance. In my experience there are only a few WikiProjects that use A-class articles, and those that do tend to then push them to FA (arguably, an A-class article is an FA that hasn't yet been edited to conform to the Manual of Style).
I'm not sure how to solve the problem of an article having different ratings by different projects. What we do for the data that ORES is trained on, and how I generally approach this, is to use the highest class. Another approach would be to use the majority vote, something which I've found rarely contradicts the other approach. While each WikiProject has the ability to rate articles individually, I do not know of examples of projects radically rewriting the assessment criteria. In other words, ORES lives in a world where an article has one assessment rating.

Hope this is useful, and again: great project! Cheers, Nettrom (talk) 23:29, 24 April 2018 (UTC)
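
The two ways of collapsing several project ratings into one, mentioned above (highest class versus majority vote), can be compared with a small sketch. The function names are hypothetical, ratings outside the Stub to FA scale are simply ignored, and breaking majority ties toward the higher class is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional

QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "FA"]
RANK = {name: i for i, name in enumerate(QUALITY_ORDER)}


def highest_class(ratings: Iterable[str]) -> Optional[str]:
    """Pick the highest rating any project has given the article."""
    known = [r for r in ratings if r in RANK]
    return max(known, key=RANK.__getitem__) if known else None


def majority_class(ratings: Iterable[str]) -> Optional[str]:
    """Pick the most common rating; ties go to the higher class (assumption)."""
    known = [r for r in ratings if r in RANK]
    if not known:
        return None
    counts = Counter(known)
    return max(counts, key=lambda c: (counts[c], RANK[c]))
```

With ratings of Start, C and Start, the first function gives C and the second gives Start, which is the kind of disagreement described above as rare.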

@Nettrom: Thank you for your input. To your points:

Does ORES never return an "L" rating? If so, AutoAssessBot could use a separate way to identify lists. Initial rules could be that a page is a list if its name starts with "List of " or most of its lines start with *. That would give some false positives like Peristylus and would miss table-type lists with odd names, but would probably be accurate enough. Aymatth2 (talk) 11:37, 25 April 2018 (UTC)
See #High quality articles. AutoAssessBot will not assign a rating above C, but will flag an article that ORES rates above C for human attention. I suspect that it is very rare for an article to be GA, A or FA quality but to have not yet been assessed as higher than C. Aymatth2 (talk) 11:37, 25 April 2018 (UTC)
I think it would be better if article quality were shown on a different template from the project templates. I do not know of any projects that have non-standard criteria, and it would certainly make it simpler for AutoAssessBot, which could record quality ratings even when no projects had been identified.

Start

This article has been rated as Start-Class on Wikipedia's standard quality scale.

Detailed criteria: The article has a usable amount of good content but is weak in many areas. Quality of the prose may be distinctly unencyclopedic, and MoS compliance non-existent. The article should satisfy fundamental content policies, such as BLP. Frequently, the referencing is inadequate, although enough sources are usually provided to establish verifiability. No Start-Class article should be in any danger of being speedily deleted.

Reader's experience: Provides some meaningful content, but most readers will need more.

Editing suggestions: Providing references to reliable sources should come first; the article also needs substantial improvement in content and organisation. Also improve the grammar, spelling, writing style and improve the jargon use.

The assessment was made by AutoAssessBot and has not yet been reviewed.

	This page is supported by WikiProject France, a collaborative effort to help develop and improve the coverage of France on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-importance on the project's importance scale.

	This page is supported by WikiProject Ecoregions, a collaborative effort to help develop and improve Wikipedia's coverage of ecoregions. The aim is to write neutral and well-referenced articles on these topics. See WikiProject Ecoregions and Wikipedia:FAQ/Contributing.
Low	This article has been rated as Low-importance on the project's importance scale.

But that would be a huge shake-up. I would like to see AutoAssessBot up and running first, then perhaps add a new {{quality}} template for use when no projects have been identified, then nudge the projects into using |auto=inherit to somehow pick up the quality rating from the {{quality}} template. For now, though, the approach at #One article, many quality ratings seems to be all that is practical. Aymatth2 (talk) 11:37, 25 April 2018 (UTC)
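
The list-detection rule suggested earlier in this reply (the name starts with "List of ", or most of the lines start with *) might look something like this. The function name `looks_like_list` is hypothetical, and reading "most" as more than half of the non-blank lines is an assumption.

```python
def looks_like_list(title: str, wikitext: str) -> bool:
    """Rough heuristic for list-style pages, with known false positives."""
    if title.startswith("List of "):
        return True
    # Ignore blank lines so spacing does not dilute the ratio.
    lines = [ln.strip() for ln in wikitext.splitlines() if ln.strip()]
    if not lines:
        return False
    bulleted = sum(1 for ln in lines if ln.startswith("*"))
    return bulleted > len(lines) / 2
```

As noted, a page that is mostly bulleted entries would be treated as a list even when it is not one, and table-based lists with unusual names would be missed.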