User talk:WP 1.0 bot/Second generation/Archive1

Question

Hey CBM, where are you going to get three-four folks to maintain the new bot? :) Oleg Alexandrov (talk) 14:52, 13 July 2008 (UTC)Reply

I'm not sure. I don't have any candidates in mind, so at this point it may be just wishful thinking. Especially since the people with experience are usually busy enough already. Time will tell if anyone steps forward over the next couple months.
I think breaking the code into pieces may help. A new person could just write one component instead of working with the entire codebase. I'm thinking in particular about CGI scripts to run queries against the ratings data - a lot of people have the ability to write that sort of thing. — Carl (CBM · talk) 01:14, 14 July 2008 (UTC)Reply
I'll be happy to help with maintenance, of course, subject to available time and as long as the logic of the code is kept reasonably separate from the database queries, which I guess will be many. I worked with PHP before a bit. Wish you luck, it will be a lot of work I think. Oleg Alexandrov (talk) 03:49, 14 July 2008 (UTC)Reply
I could probably do a little bit of PHP, if the code does not have the complexity of MediaWiki. My programming background is essentially QBasic (no kidding), C++ and Java, so I can probably play around with the code a bit. Titoxd(?!? - cool stuff) 04:09, 14 July 2008 (UTC)Reply
  • We will most definitely need somebody with knowledge of SQL if we want to add a database to the backend (which appears to be the case) and also a place to host the database. Would the toolserver accept it? Titoxd(?!? - cool stuff) 00:14, 16 July 2008 (UTC)Reply
    • I know enough SQL and database stuff to get by; I've already been implementing some things to see what issues come up. Also, I have a toolserver account, which comes with personal database privileges. The toolserver has a "stable server" that we could probably use as well. — Carl (CBM · talk) 01:37, 16 July 2008 (UTC)Reply

Feature requests

Titoxd started a list of feature requests, which is a good idea. The only one I don't understand is #2, "Wikiproject preferences".

Request #1 will require some discussion, but it's one of the goals for the second generation code. #3, #4, and #5 are also goals to be implemented.

I think that #3 and #6 are both about the same issue, which is that the current bot code doesn't generate updated statistics until the end of the run, and that one long project can block other projects from running. My idea for the new bot is that the script to generate overall statistics would be unrelated to the script that gathers data; the statistics generated would just be the statistics present in the database at that time. That would mean that we could update the "small" projects more often than the "large" projects.

The issue of task forces and subprojects will need to be discussed more widely. It seems like a good idea, but there are some technical issues that need to be talked through. — Carl (CBM · talk) 13:56, 14 July 2008 (UTC)

#2 is essentially #3 in #Motivation for an update, and I'll clarify it. Also, #3 and #6 are indeed about having smaller / new projects updated more frequently, while larger projects are split off to a separate instance of the bot, maybe running at the same frequency as is used now. Titoxd(?!? - cool stuff) 18:55, 14 July 2008 (UTC)

Category intersection

I believe that a rigorous category naming convention and a category scheme should be set up right now. We should make room for a new input (-Type), as it is possible that it would be implemented later:

I suggest a categorization scheme that follows this pattern, with this exact capitalization. For a fictitious WikiProject Whatever:

First level:

[[Category:Whatever page of INPUT-Class]]
[[Category:Whatever page of INPUT-Importance]]

And if -Type is accepted:

[[Category:Whatever page of INPUT-Type]]

Second level:

[[Category:Whatever page of INPUT-Class of INPUT-Importance]]

And if -Type is accepted:

[[Category:Whatever page of INPUT-Type of INPUT-Class]]
[[Category:Whatever page of INPUT-Type of INPUT-Importance]]

Third Level (if type is accepted):

[[Category:Whatever page of INPUT-Type of INPUT-Class of INPUT-Importance]]

Inputs:

INPUT (-Type)= Article, List, Portal, WikiProject, Disambiguation, Redirect, Image, Category, Template, Needed, NA, Unspecified
INPUT (-Class, For now) = FA, FL, GA, A, B, C, Start, Stub, List, Template, Category, Image, Needed, Disambig, NA, Unassessed
INPUT (-Class, Future?) = Featured, Good, A, B, C, Start, Stub, NA, Unassessed
INPUT (-Importance) = Top, High, Mid, Low, NA, Unknown

Headbomb {ταλκWP Physics: PotW} 06:32, 15 July 2008 (UTC)

I think that changing the categorization system is beyond the scope of this page. If the system is changed, the bot will change to work with it. But the categories should be set up by the people who contribute to the WP 1.0 group, rather than by the bot operators.
Apart from the Type addition, the main change that I think you are proposing is to add "second-level" category/importance categories. That is one issue that definitely needs to be resolved in the next six months. — Carl (CBM · talk) 13:31, 15 July 2008 (UTC)Reply
Aside from that, none of those changes are actually needed to make the feature you suggest work. If the bot can pick up X-Class, Y-Importance and maybe Z-Type information independently, it can do the back-end work of putting the three together in an "Article" object. The results of the category intersection can be displayed by linking to a CGI script on the backend (toolserver?) , instead of on the wiki, as creating and updating those pages on the wiki would cause a lot of unnecessary writes to the en.wikipedia database. Titoxd(?!? - cool stuff) 19:17, 15 July 2008 (UTC)Reply
Actually, thinking about this, I flat-out can't support implementing this on the wiki front end. WP:1.0/I says that right now, there are 1376 participating projects in bot assessments; assuming that the projects use standard 8-point quality scales (FA, A, GA, B, C, Start, Stub, Unassessed) and 5-point importance scales (Top, High, Mid, Low, Unassessed), the bot would need to create an additional 55,040 pages. Since the bot is coded to stop for five seconds after page writes, it would mean that for a single bot run, 275,200 seconds of idle time would be added to the bot's processing time. This is 3.18 days of just waiting, without counting the processing time to generate those pages; a conservative estimate of the added processing time would be about half a day for the whole index. So, we're talking about almost doubling the time a bot run would take, making bot runs occur only once a week (or twice a month if a run fails). I'd think it would be much more efficient to just make the 1.0 bot create links to a CGI script in the toolserver that returns the processed request from the bot's assessment database. Titoxd(?!? - cool stuff) 07:15, 23 July 2008 (UTC)Reply
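The arithmetic in the estimate above can be checked with a short sketch. The figures are the ones quoted in the comment; the function name is only illustrative.

```python
# Figures quoted in the comment above.
PROJECTS = 1376
QUALITY_LEVELS = 8      # FA, A, GA, B, C, Start, Stub, Unassessed
IMPORTANCE_LEVELS = 5   # Top, High, Mid, Low, Unassessed
PAUSE_SECONDS = 5       # pause after each page write


def intersection_cost(projects, quality, importance, pause):
    """Return (extra pages, idle seconds, idle days) for one bot run."""
    pages = projects * quality * importance
    idle_seconds = pages * pause
    idle_days = idle_seconds / 86400  # seconds per day
    return pages, idle_seconds, idle_days


pages, idle_seconds, idle_days = intersection_cost(
    PROJECTS, QUALITY_LEVELS, IMPORTANCE_LEVELS, PAUSE_SECONDS
)
print(pages, idle_seconds, f"{idle_days:.3f} days")
```

This reproduces the 55,040 pages and 275,200 seconds quoted above, which is a little under 3.19 days of waiting.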

Well low-use sections wouldn't need to be processed as often. Template, Disambig, Categories, Image, etc... would not have a lot of change made to them. In reality, only Article and List type would need to be considered for regular (see more than weekly). WP 1.0 really doesn't need to keep track of "A-Class" and "B-Class" disambiguation pages every day, or for "high" and "low" importance categories etc... Run the basic bot for regular updates (Full parameters for Articles, and Lists, tally up templates, categories, disambiguation, images, Projects, Portals, etc... (about ), and run the full thing (Full parameters on Articles, Lists, WikiProjects, Images, Portals, Categories, Templates... and tally up the NA, Disambig, Needs types).

Other things could be considered, such as placing pages in individual type, class, and importance categories, then letting WP 1.0 build a database and do the intersections rather than querying Wikipedia for each subcategory. Headbomb {ταλκWP Physics: PotW} 07:53, 23 July 2008 (UTC)
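A minimal sketch of doing the intersections on the backend, assuming the bot has already read the members of each class category and each importance category independently. The page titles and the data layout are hypothetical, not taken from the bot's code.

```python
# Hypothetical category members, as the bot might collect them from the wiki.
class_members = {
    "A": {"Hurricane Katrina", "Cyclone Tracy"},
    "B": {"Typhoon Tip"},
}
importance_members = {
    "Top": {"Hurricane Katrina", "Typhoon Tip"},
    "High": {"Cyclone Tracy"},
}


def intersect(class_members, importance_members):
    """Map (class, importance) to the pages found in both categories."""
    result = {}
    for cls, pages_in_class in class_members.items():
        for imp, pages_in_importance in importance_members.items():
            common = pages_in_class & pages_in_importance
            if common:  # skip empty cells
                result[(cls, imp)] = common
    return result


table = intersect(class_members, importance_members)
print(sorted(table))
```

No second-level categories are needed on the wiki for this: the two first-level category sets are enough to compute every cell.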

Actually, I didn't consider any of those intersections in my calculations above. Adding those, even though not used on all projects, would probably add another day or two to the bot run, which is what I'm trying to avoid. And having the individual type/class/importance categories is actually what I'm suggesting, with the only thing being that instead of accessing a page on Wikipedia such as Wikipedia:Version 1.0 Editorial Team/Tropical meteorology A-Class Top-Importance articles, we link to tools:~titoxd/wp10cat.php?project=Tropical meteorology&class=A&importance=Top (dead link) or something similar. Titoxd(?!? - cool stuff) 08:36, 23 July 2008 (UTC)Reply
There are three types of output that the current bot generates: tables, logs, and lists. My idea for the second generation is that the tables would still be uploaded to the wiki, but the lists would be generated on demand by a web-based program. That would reduce the huge number of edits that have to be made to keep the lists up to date, which would in turn allow the bot to update the tables more often. Once the lists are dynamically generated, category intersections can be made with very little extra work. — Carl (CBM · talk) 14:19, 23 July 2008 (UTC)Reply
Indeed, and the logs could be generated dynamically as well, showing a page's prior history. I'd be interested in helping code that. Titoxd(?!? - cool stuff) 21:24, 23 July 2008 (UTC)
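As a rough illustration of linking to a dynamically generated list instead of a wiki page, the sketch below builds a query URL with the parameter names from the example link above (project, class, importance). The host is a placeholder and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host; the real script location is not specified here.
BASE_URL = "https://example.org/wp10cat.php"


def list_url(project, quality=None, importance=None):
    """Build a link to an on-demand article list for one table cell."""
    params = {"project": project}
    if quality is not None:
        params["class"] = quality
    if importance is not None:
        params["importance"] = importance
    # urlencode escapes spaces and other special characters in the values.
    return f"{BASE_URL}?{urlencode(params)}"


url = list_url("Tropical meteorology", quality="A", importance="Top")
print(url)
```

Because the list is produced per request, no edit to the wiki is needed when a rating changes; only the database behind the script has to be current.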

I've seen Headbomb's 'type' idea floating around in a number of places, and have never been convinced of its usefulness or practicality. What is its purpose, and why are the non-article -Class extensions (Category-Class, Template-Class, etc) inadequate? If this is not a good place to discuss this issue, where should we go to do so? Happymelon 18:57, 7 August 2008 (UTC)Reply

Well, my argument is that the class should reflect the quality of what is tagged. For example, there are list-class pages, but if you use the list-class, you can't give a quality rating to your list. I mean, you could add GL, AL, BL, CL, StartL, StubL classes, but that's kinda weird/ugly/unnatural. Same goes for images. If you have an image-class page, you can't rate the quality of the image. This would not be extremely useful, but it could have uses. For example, some images are featured. Right now they are not tagged as such, so we could make a FI class, or simply make an image type with a featured class. So basically, if you have article, list, category, template, image, portal, wikiproject, redirect and disambig types, you could tag and assess things not normally assessed, such as portals, images and wikiprojects. Some things might not be used a lot; I doubt many projects would like to assess their templates, but some might wish to do so. But assessing lists, portals, and images would definitely be useful. It might also lead to "criteria lists" that will inspire people to make their project/portal/template of higher quality, etc. I already wrote a wikiproject template that uses all of these and it's near completion (see {{WkP X}}, made for Wikipedia:WikiProject X). Headbomb {ταλκWP Physics: PotW} 16:58, 20 August 2008 (UTC)

WP News

What I mean by this is that it would be really nice for WikiProjects to have news about the status of their articles. I don't know if this is the right place to suggest this, but it's at least something to think about. Specifically, I mean the ability for WikiProjects to get bot-delivered news about which articles under their wing are up for Peer Review, are nominated for deletion, are nominated for FA, passed FA, failed FA, got demoted from FA, ... It's unrelated to WP 1.0 bot's tasks per se, but perhaps WP 1.0 bot could build the database in a way that helps a "WikiProject News Bot (WPNBot?)"? Or perhaps WPNBot wouldn't need this database at all, and my request is unnecessary. Headbomb {ταλκWP Physics: PotW} 06:03, 15 July 2008 (UTC)

It would certainly be possible to permit other bots to download a list of all articles from a particular project. Once that feature is available, it would be easy for someone to make a bot that watches peer review, FA, etc. and notifies wikiprojects when their pages are active. It would probably need to be an opt-out or opt-in system. — Carl (CBM · talk) 13:18, 15 July 2008 (UTC)Reply
Yeah it probably would. The bot might not even need this list, but it probably wouldn't hurt to structure it in a bot-accessible way. Headbomb {ταλκWP Physics: PotW} 13:26, 15 July 2008 (UTC)Reply

Current tasks

A list of current tasks done by the bot would be useful. Titoxd(?!? - cool stuff) 00:13, 16 July 2008 (UTC)Reply

These are the tasks I know of:
  • Download data, generate tables and logs, and upload them
  • Fix the categorization of assessment categories themselves
  • There is another CGI script to create an assessment category tree for a new project.
— Carl (CBM · talk) 13:12, 7 August 2008 (UTC)

Feature requests 10 & 11

I added a couple of feature requests. I suspect that neither of them will make it into the second generation bot, but I thought it was worth mentioning them. Maybe for the third generation?

    1. 10 may be better done by the SelectionBot, but I list it here in case. It's not critical; also, we may decide that we don't want to promote petty competition, "my article is more important than yours". I'm only proposing it be done for articles with a WP1.0 template on the talk page; currently importance is (in effect) not used in the 1.0 template. The purpose: to show why certain articles have been chosen. I thought it would be a nice idea to put on the table, at least.
    2. 11 would allow projects to tag specific versions of articles for the release. This could be very valuable to the 1.0 project itself, but it would require projects to be more actively involved in designing releases. This is unlikely for a general release like Version 0.7, but is VERY likely when we start having the WikiProjects designing their own mini-releases or WikiReaders, such as "Atlantic Hurricanes since 1950." I'm guessing that the code for this could be nasty, though!

Thanks, Walkerma (talk) 02:04, 7 August 2008 (UTC)

Item #10 is very hard for selectionbot to do, because it would mean editing the individual talk pages of articles. It's also hard to program into the template code because it requires access to data like interwiki count and hitcount that isn't available to wiki code. The best solution might be to add these scores to the output lists that the bot generates. — Carl (CBM · talk) 02:47, 7 August 2008 (UTC)Reply
And if it is really necessary to place in {{WP1.0}}, we can always add text that links to a report in the toolserver. I don't think editing a ton of pages because of changes in Henrik's stats (to pick an example) would be feasible or prudent. 11 is something that we should definitely do, regardless of what happens with flagged revisions; it will also be made easier by the tracking of assessment logs that will be made with the logging database table in 1.0 2g. Titoxd(?!? - cool stuff) 06:07, 7 August 2008 (UTC)Reply
Once the CGI query script is done, we can easily link from the WP 1.0 template to a page that will give a complete assessment history of the article along with the current ratings. — Carl (CBM · talk) 15:31, 7 August 2008 (UTC)Reply
Sounds great! Would such pages be temporary (generated on the fly) or would they be like a log- a permanent page with regular updates? Would each article have its own page, or would these be collated into groups (perhaps in a table)? Walkerma (talk) 15:54, 7 August 2008 (UTC)Reply
They would be generated on the fly, for the most part. We could probably have something like history.pl?title=2005_Atlantic_hurricane_season&project=Tropical_cyclones to show only the changes that have occurred in 2005 Atlantic hurricane season's assessment history, or we could have history.pl?project=Tropical_cyclones to show all of a project's articles and links to each individual article's history. Titoxd(?!? - cool stuff) 16:02, 7 August 2008 (UTC)
Just like Titoxd says, the pages would be dynamic. The web interface has (at least) three functions, which together make a unified interface. The functions are:
  1. Summary tables. Each cell in the table should link to a dynamically generated list of the articles inside that cell. The proof-of-concept for this is table.pl
  2. Display lists of articles. Once a list has been generated, it should be possible to make a summary table of the list. The proof-of-concept code is in list.pl and list2.pl
  3. Display historical logs. There is no public demo of this yet, but the logging information is being stored in the database, and I can extract the old log information already made by WP 1.0 bot. There are at least two types of logs:
    1. Log of all recent changes in a given project. This could include the ability to filter out just a subset of the project.
    2. Log of all recent changes to a given article
— Carl (CBM · talk) 16:31, 7 August 2008 (UTC)
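The relationship between the summary table and the dynamically generated lists can be sketched as follows. This is not the code in table.pl or list.pl; the row layout and article names are hypothetical and stand in for whatever the ratings database returns.

```python
from collections import Counter

# Hypothetical rows as they might come out of the ratings database:
# (article, quality, importance)
RATINGS = [
    ("Article 1", "FA", "Top"),
    ("Article 2", "B", "High"),
    ("Article 3", "B", "High"),
    ("Article 4", "Stub", "Low"),
]


def summary_table(rows):
    """Count articles per (quality, importance) cell."""
    return Counter((quality, importance) for _, quality, importance in rows)


def cell_articles(rows, quality, importance):
    """List the articles behind one cell of the summary table."""
    return sorted(a for a, q, i in rows if (q, i) == (quality, importance))


table = summary_table(RATINGS)
print(table[("B", "High")])                 # 2
print(cell_articles(RATINGS, "B", "High"))  # ['Article 2', 'Article 3']
```

Both functions read the same rows, so a table can be built from any list and a list can be built for any table cell.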

Proposition concerning #11

Concerning the selection of articles, #11 looks to me like the last big challenge. I have been thinking about it for more than a year, and I now have a software proposal.

The basic feature of my tool is to give anyone with a registered account on WP the ability to tag any version of an article and, for any article, to see the diff between a tagged version and the current one.

The system runs on an independent server and is connected to Wikipedia through a web API and a custom JavaScript page (like a lot of tools on WP). In other words, the system lets a user register (with the same username as on WP) and save four-tuples (user, article, version, tag).

Concretely, every user who wants to can tag an article as "reviewed" using a small box in the left sidebar (like "main", "search" or "toolbox") of every article page. Afterwards, they can display in one click the difference between this (old) tagged version and the current one; that way they get a better overview of the improvements (or vandalism) and, if they want, can tag the new version, again in one click.
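A minimal sketch of the four-tuple store described above, using an in-memory SQLite database. The table layout, function names and revision numbers are assumptions made for illustration; the proposal does not specify them. The returned keys (title, oldid, diff) are the query parameters MediaWiki's index.php uses for a diff view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (user, article, tag); re-tagging moves the tag to a new revision.
conn.execute(
    """
    CREATE TABLE tags (
        user    TEXT NOT NULL,
        article TEXT NOT NULL,
        revid   INTEGER NOT NULL,
        tag     TEXT NOT NULL,
        PRIMARY KEY (user, article, tag)
    )
    """
)


def tag_revision(user, article, revid, tag="reviewed"):
    """Record (or move) a user's tag on the given revision."""
    conn.execute(
        "INSERT OR REPLACE INTO tags (user, article, revid, tag) VALUES (?, ?, ?, ?)",
        (user, article, revid, tag),
    )


def diff_params(user, article, tag, current_revid):
    """Return diff-view parameters from the tagged revision to the current one."""
    row = conn.execute(
        "SELECT revid FROM tags WHERE user = ? AND article = ? AND tag = ?",
        (user, article, tag),
    ).fetchone()
    if row is None:
        return None  # this user has not set this tag on the article
    return {"title": article, "oldid": row[0], "diff": current_revid}


tag_revision("ExampleUser", "Chicago", 1000, tag="0.7")
print(diff_params("ExampleUser", "Chicago", "0.7", current_revid=1050))
```

Keeping the tuples in a relational table makes group queries (all tags set by members of one project, say) a single SELECT rather than a scan of many talk pages.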

I think this could be a really easy and useful tool that would help a lot of people: something mandatory if we want good, releasable versions of more than 30,000 selected articles.

I think that someone motivated could do this in a few days of work; but that's only the first step.

The second step would be to introduce:

  • groups of users/articles (a wikiproject, for example),
  • new tags ("reviewed" is the default one); you can imagine, for example, a "0.7" tag with special requirements. I'm sure that people have a lot of special wishes and will set up their own tags,
  • permissions to allow certain tags for certain users and/or groups and/or articles.

ASCII-art screenshot of the sidebar box:

tag article "Chicago"
-----------------------
| [ ] reviewed (diff) |
| [X] 0.7 (diff)      |
|  [Tag now (button)] |
-----------------------

You can see now how we could get the best version of each selected article :)

Regards Kelson (talk) 09:24, 10 August 2008 (UTC)

Erm, you might want to have a look at WP:FLV and mw:Extension:FlaggedRevs, because I think you've been beaten to the post! Happymelon 16:14, 10 August 2008 (UTC)
Flagged revisions:
* is not installed on en: nor fr:
* does not allow custom tags to be specified
* does not allow specific users to tag specific articles; e.g. I'm interested in the tags set by members of the Mathematics wikiproject on mathematics articles. On WP, I only trust a few people (not always the same ones; it depends on the article) to correctly validate a specific article.
* does not allow everyone to build their own selection. Kelson (talk) 16:22, 10 August 2008 (UTC)
Well, none of that is strictly true, although I admit it would be more difficult for the system to be maintained if its configuration was left up to the Wikimedia developers. FlaggedRevs is installed on all wikimedia wikis, although disabled on most until there is a consensus to enable it, and allows you to create as many flags as you want, and call them whatever you want. De.wiki only uses one, calling it 'quality', but we could quite easily implement a "1.0" flag. Each flag can have as many levels as desired, and they can be called whatever you like, so we could give that "1.0" flag levels called "0.5", "0.7", etc. The right to set the flag levels on an article is restricted by usergroup, so we could quite easily create a 'wp1.0assessor' usergroup and only allow them to set the 1.0 flag. So as you can see, FlaggedRevs theoretically can do everything your system suggests. I'm not suggesting, however, that it would be anything like as convenient to organise - certainly the developers will not want to create a separate flag for each individual wikiproject that wants it. But then again, I'm not sure whether such a flag field is a good idea anyway. I also predict considerable opposition to integrating an external system so prominently into the core interface - there is considerable merit to keeping all software and tools 'in the family', as it were, using just the WMF servers and the toolserver. All in all, I think that this system is essentially just moving the assessments from the talk page (in wikiproject banners) to the article, and adding a few bells and whistles. Those bells and whistles would be useful, no doubt, but I think the majority of what you propose could be accomplished by storing the oldid of the assessed version of the article in the banner; that data could be piped back onto the article using javascript if desired, it would make a useful gadget. Happymelon 16:52, 10 August 2008 (UTC)Reply
Thank you for your well-argued answer. I agree with you about FlaggedRevs, i.e. it is more or less possible technically, but not possible in practice to get the flexibility I think we need. At the least, it will be complicated and slow to obtain. You write, "I also predict considerable opposition to integrating an external system": this would be a fully opt-in system; nobody would have this tool by default. About the technical possibility of storing the information in Wikipedia itself: that is how we currently do it on the French Wikipedia. So I agree with you again when you write "the majority of what you propose could be accomplished by storing the oldid of the assessed version of the article in the banner", but (1) that will be slow (especially if you have groups...) compared to a relational DB, and (2) it can quickly generate hundreds of thousands of edits, which is not a good idea IMHO. So I believe that there are many ways to allow people to tag articles, but the devil is in the details, and the differences in detail between these systems can change the picture completely. Regards Kelson (talk) 17:26, 10 August 2008 (UTC)

Recording GA or FA

Another thing I'd like to discuss is something that regularly comes up in 1.0 discussions. Could we use the bot to record if an article is a GA or an FA, using the GA or FA talk page template (or the article milestones template)? Perhaps the bot could add it into the log and the tables? The bot already does something similar, for recording an article's inclusion in Version 0.5.

Many (including myself) would like to see GA and FA removed from the assessment scale, because they are not WikiProject-based. Why? A huge number of people misunderstand the current system - some start tagging articles as GA-Class when they aren't GAs, and there is a CONTINUAL and POINTLESS discussion about "Shouldn't A be higher than GA?" or "What's the point of A-Class?" and related things. (To me, this is like asking, shouldn't cats be higher than dogs? They are different animals, you can't reduce them to a hierarchy.) The idea would be that projects would simply assess Stub-Start-C-B-A, and let the bot tag things as GA or FA.

We don't NEED to do this right away; but if I'm to moderate a serious discussion on this topic, I need to know if it CAN be done fairly easily. Currently I believe opinion is fairly evenly split, but most of the opposition centers around the (very reasonable) idea that projects want to keep track of GAs and FAs too. If we knew unequivocally that the bot could do this, it might finally resolve this thorny issue. Only if there was a clear vote in favor would we need the code to be written.

Noting one of the second generation proposals, that WikiProjects be able to have the bot more tailored to their specific needs, it may be that this could be one such feature that could be turned on or off on a per-project basis. Walkerma (talk) 02:15, 7 August 2008 (UTC)Reply

This can certainly be achieved if there is desire for it. Currently, there is a consistency problem where an article can be rated GA by projects without actually being a GA, or (less likely) could be a GA but not rated GA by projects. This means that the WP 1.0 bot's count of good articles often differs from the GA project's count of good articles. Using the "real" GA/FA categories to get the lists of good/featured articles would remove that problem. — Carl (CBM · talk) 02:40, 7 August 2008 (UTC)Reply
But cats are more certainly better than dogs, duh. That said, we could probably handle this using a "external validation" column in the ratings table, with r_external and r_external_timestamp, if necessary. Titoxd(?!? - cool stuff) 06:03, 7 August 2008 (UTC)Reply
Yes, it would be possible to add it to the database. The more difficult part is (1) the user interface to choose the revision and especially (2) authenticating the person who does the choosing. I think the "flagged revisions" system, that is already being tested by German Wikipedia, already has the ability for users to choose particular versions of articles as special. It might be easier to use that system to mark specific revisions. — Carl (CBM · talk) 12:42, 7 August 2008 (UTC)Reply
Good point, although we don't know if FlaggedRevs will ever be enabled here. Titoxd(?!? - cool stuff) 16:20, 7 August 2008 (UTC)Reply
I think there's always a possibility of enabling it only for the purposes of WP 1.0, so that it has no visible effect whatsoever, except to permit projects to select the right version of each article. This isn't the use that it was originally designed for, but it might be able to be customized to fit the job. — Carl (CBM · talk) 16:44, 7 August 2008 (UTC)Reply
I think FlaggedRevs was intended to help quality control generally, and this would be a great way to use it! Walkerma (talk) 16:56, 7 August 2008 (UTC)Reply
  • Related to revision markers: looking at the wp10bot database schema, I see that we're missing markers for pages that are selected in v0.5, v0.7, etc. We will probably need to add r_version, r_version_oldid and r_version_timestamp fields to replicate the current version's functionality. Titoxd(?!? - cool stuff) 16:20, 7 August 2008 (UTC)Reply

We definitely need to add additional tables for "articles that are GA or FA" and "articles that are in WP 0.5". These would be updated separately from the ratings table. The query interface would use these extra tables when it makes lists, etc. I have this on my todo list, after some caching that needs to be implemented.

The lack of revision_id is intentional. My experience with the current WP 1.0 bot is that fetching the revisionids is a key bottleneck. On the other hand, I found that if I store just the timestamp when the article was rated, it's easy to dynamically fetch the revisionid of the article at that time. So it's better to only store the timestamps, as the alpha code does, and only fetch the corresponding revisionid of the article when it is needed. For example, we can add a link to the query output that says "display rated version". This link will run a script that first fetches the revisionid, then uses that to display the version of the page when the rating was assigned. — Carl (CBM · talk) 16:40, 7 August 2008 (UTC)Reply
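The timestamp-only approach can be illustrated with a small lookup: given a page's revision history, find the revision that was current when the rating was assigned. This is a sketch, not the alpha code; the history data and revision ids are hypothetical, and in practice the history would be fetched from the wiki only when the "display rated version" link is followed.

```python
from bisect import bisect_right

# Hypothetical revision history, sorted by timestamp: (timestamp, revid).
# Timestamps use the YYYYMMDDHHMMSS form, which sorts correctly as text.
HISTORY = [
    ("20080701120000", 101),
    ("20080715093000", 102),
    ("20080801180000", 103),
]


def revision_at(history, rated_at):
    """Return the revid that was current at `rated_at`, or None if none existed."""
    timestamps = [ts for ts, _ in history]
    pos = bisect_right(timestamps, rated_at)  # number of revisions at or before rated_at
    return history[pos - 1][1] if pos else None


print(revision_at(HISTORY, "20080720000000"))
```

Only the rating timestamp has to be stored per article; the lookup cost is paid once per request instead of once per article on every bot run.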

If you look at this example, you'll see that there are 0.5s in the version column. I was planning on asking 0.7 to be added into the system once we had got our selection made. Could GA/FA simply be noted in that same column (which could perhaps have a different column heading)? Walkerma (talk) 16:56, 7 August 2008 (UTC)Reply
I would prefer that to be in a different column, as they're different things (community validation and publication selection). As for the lack of rev_ids: that works. Those revisions are used only a few times, when selecting articles for publication, so if we can generate them at that time, instead of continuously, that should be a nice boost. Titoxd(?!? - cool stuff) 16:58, 7 August 2008 (UTC)Reply
Oh, I just noticed - only the OLD template ({{V0.5}}) works for that; the newer template ({{WP1.0}}) fails. Separate columns would be fine with me. I think the Ver column should really start getting busy once SelectionBot is humming along nicely - because I'd like to see folks like WP:CYCLONE using that to produce their own specialised releases. Walkerma (talk) 17:00, 7 August 2008 (UTC)
We probably want to fix that. Titoxd(?!? - cool stuff) 18:26, 7 August 2008 (UTC)Reply

WikiProject preferences aka custom ratings by project

One commonly requested feature is the ability for a project to add its own ratings to the bot's tables. After thinking about it, and trying a different approach, I think the best way to do this is to add a template to the project's category (like Category:Foo articles by quality) that has the information in it. The template would look something like this:

{{ReleaseVersionParams
 | homepage=Wikipedia:WikiProject Foo
 | extra1-name=Bplus
 | extra1-type=quality
 | extra1-category=Bplus mathematics articles
 | extra1-ranking=400
}}

That would tell the bot where the project's home page is, and also tell the bot to add the quality rating "Bplus" to the projects table. The "ranking" value is needed to sort the table correctly. — Carl (CBM · talk) 02:33, 7 August 2008 (UTC)Reply
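A rough sketch of how such a template call might be read into a structure the bot can use. The parser is deliberately simple and is an assumption, not the bot's actual code: it does not handle nested templates or piped links inside parameter values.

```python
import re

WIKITEXT = """{{ReleaseVersionParams
 | homepage=Wikipedia:WikiProject Foo
 | extra1-name=Bplus
 | extra1-type=quality
 | extra1-category=Bplus mathematics articles
 | extra1-ranking=400
}}"""


def parse_params(wikitext, template="ReleaseVersionParams"):
    """Return the named parameters of the first call to `template`."""
    pattern = r"\{\{\s*" + re.escape(template) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, wikitext, re.DOTALL)
    if match is None:
        return {}
    params = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments without "key=value"
            params[key.strip()] = value.strip()
    return params


def extra_ratings(params):
    """Group the extraN-* parameters into one dict per extra rating."""
    extras = {}
    for key, value in params.items():
        m = re.fullmatch(r"extra(\d+)-(\w+)", key)
        if m:
            extras.setdefault(int(m.group(1)), {})[m.group(2)] = value
    return [extras[n] for n in sorted(extras)]


params = parse_params(WIKITEXT)
print(params["homepage"])
print(extra_ratings(params))
```

Numbering the parameters (extra1, extra2, ...) lets a project declare several custom ratings in one template call.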

This would be a very nice feature, I think. I presume that SelectionBot could still read things like Bplus OK? Would it just presume it was B, or would it read in a ranking of 400 from the template? Cheers, Walkerma (talk) 03:00, 7 August 2008 (UTC)Reply
Yes, Selectionbot could handle this. Selectionbot can share the same database as WP 1.0 bot, actually. — Carl (CBM · talk) 03:06, 7 August 2008 (UTC)Reply

wikiproject-specific ratings and global tables

How would we handle this for the global tables (e.g. WP:1.0/S), though? Titoxd(?!? - cool stuff) 05:53, 7 August 2008 (UTC)Reply
That's one of the things that needs to be hashed out in the planning process. There are a few options I can brainstorm:
  • Use the next lowest standard rating
  • Make the project choose which standard rating should be used, via the template above
  • Make a different line in the overall table, "Other", that collects all the nonstandard articles.
— Carl (CBM · talk) 12:38, 7 August 2008 (UTC)

Let's think of the possible uses here:

  1. Many projects want to add things such as {{Cat-Class}}, {{Template-Class}}, {{Current-Class}}, {{Future-Class}}, and such to the scheme. These grades, even though they're non-standard, are still fairly common, so we probably could declare them as a "hidden standard" class or something like that. We could define a numerical value on wp10routines.pl above 1200, and just keep these grades disabled unless explicitly enabled in wp10prefs.
  2. Some projects will use classes unique to them: {{B+-Class}}, {{Merge-Class}}, {{AfD-Class}}, etc. These should be completely defined with wp10params. Although in a way, we need to figure out what are we going to do if several projects define {{Merge-Class}} with different numerical values...
  3. What do we do with custom tables for cumulative tables? If there is, let's say, B+-Class defined as a 350 quality assessment, do we round down to GA (400) or up to A (300)? Do we include every custom class in the cumulative tables, unless more than χ projects use it? Do we flag things like {{Image-Class}} as things that we want to ignore?
  4. Other projects may want to disable standard classes—do we want to allow this or not?
  5. Several others have indicated that they would prefer to have GA over A, or otherwise rearrange the order of classes. This would cause a rather confusing situation (both from a technical and end user perspective) when it is time to calculate cumulative tables—do we want to allow it or not?

We do need to discuss all of these things, as they will affect the way the bot is coded. Titoxd(?!? - cool stuff) 16:35, 7 August 2008 (UTC)Reply

To reduce confusion, I'm going to change wp10_routines to use the quality numbers used by SelectionBot. [1] In that system, higher numbers correspond to higher rankings. — Carl (CBM · talk) 18:48, 7 August 2008 (UTC)Reply
Thinking about it, there is no reason why we should allow #4. If a project doesn't want to use a particular rating, then they can just not use that rating in talk-page assessments. As for common "type" non-standard classes (List, Category, Template, Image, Portal), let's define them inside the bot as hidden by default for global tables (although maybe with the exception of List), but not hidden in individual tables. Disambiguation, Redirect, Current and Future are common "article" non-standard classes, so we can, again, define them globally and show them in individual tables. Other classes will only be shown in their projects. So, in summary, here's what I think we could do:
                   Common type    Common article       Custom
Score              global         global               custom
Individual table   visible        visible              visible
Global table       invisible      visible*             invisible; quality score rounded down
Examples           Category       List                 B
                   Template       {{Current-Class}}    {{Merge-Class}}
                   File           Future               {{AfD-Class}}
                   Portal         Disambig             etc
                                  Redirect
* = maybe

Comments? Questions? Flames? Titoxd(?!? - cool stuff) 01:58, 17 August 2008 (UTC)Reply

(I regard List-Class as pretty standard, now that 1.0 has been tracking it for some time.) Can you clarify global table vs. individual table? (There are many levels in 1.0!) BTW, I'm away so only checking WP occasionally. Walkerma (talk) 21:26, 17 August 2008 (UTC)Reply

I would prefer something like this:

Shown in all tables: FA, FL, GA, A, B, C, Start, Stub, {{Current-Class}}, Future, List, Unassessed
Shown in project-level tables, completely ignored in other tables: Template, {{AfD-Class}}, {{Merge-Class}}, Portal, Disambig, Needed, Redirect, Category, NA, etc.
Shown in project-level tables, merged with other ratings for global tables: B

So the global tables would still be oriented towards article assessment, but the per-project tables would permit projects to track images, disambigs, etc. if they want. I think there is a downside to putting things like Image-Class in the global tables - it encourages all other projects to go and start tagging images and redirects, because the global counts will seem low. But (personally) I would rather not encourage that. For example, I see very little benefit in tagging millions of redirects with wikiproject tags. — Carl (CBM · talk) 22:34, 17 August 2008 (UTC)Reply
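The three tiers proposed here could be sketched in Python roughly as follows. The set contents follow the grouping in this comment; the `MERGED_INTO` mapping (B+ counted as B) is a hypothetical example of a merged rating, and none of these names come from the bot's code.

```python
from collections import Counter

# Tier 1: ratings shown in every table (taken from the proposal above).
SHOWN_EVERYWHERE = {
    "FA", "FL", "GA", "A", "B", "C", "Start", "Stub",
    "Current", "Future", "List", "Unassessed",
}

# Tier 3: project-specific ratings merged into a standard rating for
# global tables. Hypothetical example: B+ is counted as B.
MERGED_INTO = {"B+": "B"}

def project_counts(ratings):
    """Per-project table: every rating the project uses is counted as-is."""
    return Counter(ratings)

def global_counts(ratings):
    """Global table: merge tier-3 ratings, then ignore anything not in tier 1."""
    counts = Counter()
    for rating in ratings:
        rating = MERGED_INTO.get(rating, rating)
        if rating in SHOWN_EVERYWHERE:
            counts[rating] += 1
        # Tier 2 (Template, Redirect, Category, ...) is silently skipped.
    return counts
```

Under this sketch a project tagging redirects sees them in its own table, but the global totals are unaffected, which matches the stated goal of not encouraging mass tagging of non-articles.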

I like this idea. Titoxd(?!? - cool stuff) 23:58, 17 August 2008 (UTC)Reply
Looks good to me. Walkerma (talk) 02:34, 18 August 2008 (UTC)Reply
Well, tagging redirects has a certain utility, in that projects can periodically check if the articles are still redirects, and if not, give them proper assessments. Headbomb {ταλκκοντριβςWP Physics} 01:02, 13 November 2008 (UTC)Reply

Very minimal demo


I have put a very minimal demo online.

These are very ugly and incomplete, and are just intended to spark discussion. — Carl (CBM · talk) 02:35, 7 August 2008 (UTC)Reply

Hey, these look great! I've paid money for software that didn't work as well! This is really good, thanks for putting these together. Walkerma (talk) 02:58, 7 August 2008 (UTC)Reply
rofl Titoxd(?!? - cool stuff) 06:09, 7 August 2008 (UTC)Reply
Is there a way to list all articles in a project's purview using list.pl, or does it cover only one quality/importance intersection at a time? Titoxd(?!? - cool stuff) 06:09, 7 August 2008 (UTC)Reply
If you leave the fields blank, they won't be considered. So this query lists the first 50 articles assessed by the Amiga project. Most projects assess too many articles to make it practical to look at them all, which is why I added the ability to filter the list. I expect that the final version will have a more sophisticated query interface. — Carl (CBM · talk) 12:34, 7 August 2008 (UTC)Reply
We'll probably need to eventually increase the limit from 50 to the 400 we currently use in the tables, though. Titoxd(?!? - cool stuff) 16:59, 7 August 2008 (UTC)Reply
No problem - I just picked 50 out of the air. 500 would be fine. — Carl (CBM · talk) 18:13, 7 August 2008 (UTC)Reply
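The "blank fields are ignored" behaviour and the row limit could look something like the Python sketch below. It uses an in-memory SQLite database as a stand-in; the `ratings` table, its columns, and the sample rows are assumptions made for illustration, not the bot's actual schema.

```python
import sqlite3

def list_articles(conn, project, quality="", importance="", limit=50):
    """List a project's articles; blank filters are simply not applied."""
    sql = "SELECT article FROM ratings WHERE project = ?"
    params = [project]
    if quality:
        sql += " AND quality = ?"
        params.append(quality)
    if importance:
        sql += " AND importance = ?"
        params.append(importance)
    sql += " ORDER BY article LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (project TEXT, article TEXT, quality TEXT, importance TEXT)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?)",
    [
        ("Amiga", "AmigaOS", "B-Class", "Top-Class"),
        ("Amiga", "Workbench", "Stub-Class", "Mid-Class"),
        ("Amiga", "Amiga 500", "Stub-Class", "High-Class"),
        ("Chemistry", "Benzene", "GA-Class", "Top-Class"),
    ],
)
```

Passing the limit as a bound parameter means raising it from 50 to 500 is a one-argument change, and the user-supplied filter values never get spliced into the SQL text.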
  • For some reason the quality and importance query don't seem to work for me.
  • Tables could have links to articles (very useful)
  • A new feature for this, namely "quality mismatch", would be useful for bot updates of article assessments. If an article is rated B-Class for Math and C-Class for Computer science, it could dump the article in a "quality mismatch" list or something, and a bot could tag the talk page with something like {{quality mismatch}}. Also if an article is rated B-Class in one project and nothing in the other, then a bot could pick this up and set the unassessed rating to B-class (or whichever is the lowest). Headbomb {ταλκWP Physics: PotW} 20:46, 7 August 2008 (UTC)Reply
There are legitimate reasons for assessments of different projects to differ, so I feel rather uncomfortable implementing something like that. Titoxd(?!? - cool stuff) 21:48, 7 August 2008 (UTC)Reply
The point is to list the articles with a mismatch in quality, and place a notice that this could be problematic, not to have a bot change them. Different projects may assess things differently and might decide that their rating only applies to the coverage of the section relevant to their wikiprojects (for example, Wikiproject Physics might consider the physics coverage of the electron to be of A quality, but history of science project might consider the history section to be of C quality). But by and large, discrepancies are due to someone who gave a rating and didn't bother to update all the rest of the templates. This tagging would be very useful in articles that use collapsed templates where the ratings are hidden. It would also be useful for wikiprojects, serving as a notification that the article might have improved or got worse and might need reassessment. I don't see the harm of having a list of these articles as well as a "mismatch" message saying this is possibly problematic on their talk page.

The auto-assessing would be restricted to having all class= replaced by class=CLASS when there's already a rating on the page. I think there's already a bot doing this, but I'm not sure. Either way it'd be a good idea for a bot if there isn't already one around. It couldn't do any damage and would certainly facilitate the assessment of the zillions of tagged but unassessed articles out there. Headbomb {ταλκWP Physics: PotW} 06:30, 10 August 2008 (UTC)Reply
Headbomb, could you be more precise about which queries don't work? If you can give a link, that will help me find and fix any errors. The tables do link to lists (see the section below) - is that what you mean in the second bullet? I did fix one bug just now, which might have been causing your problems. — Carl (CBM · talk) 21:55, 7 August 2008 (UTC)Reply
Any quality or importance queries. Take the Amiga selections. It has at least some Stub-class articles. Yet if I place "Stub", "stub", "Stub-Class", "Stub-class", or "stub-class", in the "quality" box, I get no result. Headbomb {ταλκWP Physics: PotW} 06:30, 10 August 2008 (UTC)Reply
I think I see where the confusion came from. I don't mean the quality vs. importance table, I mean in the list of articles generated in http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/list.pl and http://toolserver.org/~cbm//cgi-bin/wp10.2g/alpha/cgi-bin/list2.pl. Headbomb {ταλκWP Physics: PotW} 06:57, 10 August 2008 (UTC)Reply
The interface isn't very good right now. You have to capitalize things in a particular way - "Stub-Class" and "Top-Class". See [2] versus Category:Stub-Class Amiga articles. — Carl (CBM · talk) 17:19, 10 August 2008 (UTC)Reply
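One possible way to make the query form tolerant of input like "stub" or "Stub-class" is to map it onto the stored spelling before querying. The Python sketch below assumes the stored values look like "Stub-Class" and "Top-Class", as described above; the `CANONICAL` list is a hypothetical subset, and the function name is invented for illustration.

```python
# Hypothetical subset of the stored rating names.
CANONICAL = [
    "FA-Class", "A-Class", "GA-Class", "B-Class", "C-Class",
    "Start-Class", "Stub-Class",
    "Top-Class", "High-Class", "Mid-Class", "Low-Class",
]
LOOKUP = {name.lower(): name for name in CANONICAL}

def normalize_rating(text):
    """Map user input onto the stored spelling.

    Returns "" for blank input (meaning: no filter) and None when the
    input does not match any known rating.
    """
    key = text.strip().lower()
    if not key:
        return ""
    if not key.endswith("-class"):
        key += "-class"
    return LOOKUP.get(key)
```

A lookup table is used instead of simple capitalization because names such as "FA-Class" and "GA-Class" would be mangled by title-casing.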

Implemented: linking from tables to lists


This is one of the most common requests I have seen - that the individual table cells should link to the list of that type of article. I have now added that to the alpha code (in the same place). So if you look at a table like this, the individual cells link to the right place. — Carl (CBM · talk) 18:13, 7 August 2008 (UTC)Reply

That's awesome. One question, though: why does the table show 19 unassessed articles, and when you click on the intersections, there's none displayed? Titoxd(?!? - cool stuff) 18:25, 7 August 2008 (UTC)Reply
You found a bug. The issue is that the database isn't quite coherent. For Unassessed-Class quality, sometimes it stores NULL because the article isn't in the "Unassessed foo articles" category. An example is Talk:2008 Atlantic hurricane season, which isn't in Category:Unassessed Tropical cyclone articles because it's marked "Current-Class". The code that updates the database is not (yet) robust enough to handle this sort of thing. I edited list.pl to temporarily hide the issue, so now the links should work more correctly. — Carl (CBM · talk) 18:37, 7 August 2008 (UTC)Reply
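For illustration only, one way a listing query can tolerate a NULL quality is to treat it as Unassessed-Class at query time with SQL's COALESCE. This is a sketch against an in-memory SQLite database with a hypothetical `ratings` table and made-up article names; it is not a description of what list.pl actually does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (article TEXT, quality TEXT)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [
        ("Storm A", "Unassessed-Class"),
        ("Storm B", None),  # stored as NULL: not in the Unassessed category
        ("Storm C", "B-Class"),
    ],
)

def articles_with_quality(conn, quality):
    """List articles by quality, counting a NULL quality as Unassessed-Class."""
    sql = (
        "SELECT article FROM ratings "
        "WHERE COALESCE(quality, 'Unassessed-Class') = ? "
        "ORDER BY article"
    )
    return [row[0] for row in conn.execute(sql, (quality,))]
```

The longer-term fix described above (making the update code store a consistent value) would make the COALESCE unnecessary.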
I think it's one thing to LIST differences, and another to automatically CHANGE assessments because of them. I think listing differences is a very nice idea, but as Tito points out, autocorrecting is bad. However, we should perhaps ask if the projects want such a feature - we want to make sure that the bot isn't generating hundreds of pages of output that are never read! Walkerma (talk) 22:14, 7 August 2008 (UTC)Reply
In this case, the pages would be dynamically generated only when they are requested, so there's no danger of the bot doing unneeded work. I implemented this in list2.pl now: compare [3] and [4]. Since this is easy to implement and (I think) has a good chance of being useful for manual comparison, it seems like a good thing to include in the final version. — Carl (CBM · talk) 12:47, 8 August 2008 (UTC)Reply
That page is the perfect example of why it would be inadvisable to modify a project's ratings: Severe weather uses List-Class, while Tropical cyclones assesses all of its lists. Changing either assessment might go against that project's assessment procedures (it would definitely go against WP:WPTC's criteria; not sure about Severe weather's), so it's better to let people do that by hand, and discourage blind bot runs to make changes of this nature. Titoxd(?!? - cool stuff) 22:51, 13 August 2008 (UTC)Reply
And no one wants an auto-assessment of already assessed articles. The only thing that would be nice is the auto-assessment of unassessed articles. Sure, it's possible that an article that would normally get a list-class rating would get a C-class one instead, but considering the number of unassessed articles out there, having a bot do a rough and dirty job would help a lot more than it would cause harm of the "giving a C rating to what would normally be a list-class article" kind. Individual project subscription to the bot could be added if people are terrified of having some C-class/List-class contamination, but I don't think it's necessary. Headbomb {ταλκWP Physics: PotW} 03:29, 14 August 2008 (UTC)Reply
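The listing-only version of the "quality mismatch" idea discussed in this thread can be sketched in a few lines of Python. The input format (article, project, quality triples) and the function name are assumptions for illustration; the sketch only reports differences and changes nothing.

```python
from collections import defaultdict

def quality_mismatches(assessments):
    """Return {article: {project: quality}} for articles whose projects disagree.

    `assessments` is an iterable of (article, project, quality) triples.
    """
    by_article = defaultdict(dict)
    for article, project, quality in assessments:
        by_article[article][project] = quality
    return {
        article: ratings
        for article, ratings in by_article.items()
        if len(set(ratings.values())) > 1
    }

# Hypothetical sample data.
sample = [
    ("Electron", "Physics", "A-Class"),
    ("Electron", "History of science", "C-Class"),
    ("Benzene", "Chemistry", "GA-Class"),
    ("Benzene", "Pharmacology", "GA-Class"),
]
mismatched = quality_mismatches(sample)
```

Because the result is computed on request from data already held, it adds no stored output, in line with the point that such pages would be generated dynamically.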

Should the bot track non-articles?


One reason that the current bot doesn't track Category-Class, Image-Class, and Template-Class is that the current bot only considers pages in the main namespace (that is, it only pays attention to WP 1.0 banners on pages in the Talk: namespace). Pages in other namespaces are completely ignored. The motivation for this behavior, I think, is that the goal was only to track article assessments for WP 1.0.

For the second-generation code, we need to decide whether to keep this behavior. It would be possible for the bot to track all pages that are assessed, or only track articles. — Carl (CBM · talk) 12:52, 8 August 2008 (UTC)Reply

I have always wanted to know what fraction of the other namespaces was covered by the extended class structure. I don't think this information should be provided in the standard tables, but it would be nice to have it available. Happymelon 14:12, 8 August 2008 (UTC)Reply
I agree with Happy-melon on that. Walkerma (talk) 14:44, 8 August 2008 (UTC)Reply
So it sounds like the bot should plan to store this information then, and we can work out later what to do with it. I'll add this to the list of things to do. — Carl (CBM · talk) 16:31, 8 August 2008 (UTC)Reply

Feature request - Good Topics and Wikipedia:Featured topics


Hi, I hope you could get this feature in the first version of the bot, but please see (and comment) here - rst20xx (talk) 01:14, 30 August 2008 (UTC)Reply

A comment about log size


While I realize this is an extreme edge case (me cleaning out a thousand or so redirects between updates), if the log for one day exceeds the allowed length of the page, the bot should post the whole day's log anyway instead of losing part of the log to the aether. Nifboy (talk) 03:31, 30 August 2008 (UTC)Reply
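One conceivable way to avoid losing entries is to split an oversized day's log across several pages instead of truncating it. The Python sketch below is only an illustration of that idea: the function name, the byte limit, and the choice of continuation pages are all assumptions, not the bot's behaviour.

```python
def split_log(entries, max_bytes):
    """Split log entries into pages of at most `max_bytes` each.

    Entries are never cut in half; an entry larger than the limit is
    placed on a page of its own rather than being dropped.
    """
    pages, current, size = [], [], 0
    for entry in entries:
        entry_size = len(entry.encode("utf-8")) + 1  # +1 for the newline
        if current and size + entry_size > max_bytes:
            pages.append("\n".join(current))
            current, size = [], 0
        current.append(entry)
        size += entry_size
    if current:
        pages.append("\n".join(current))
    return pages
```

Joining the returned pages back together reproduces every entry in its original order, so nothing is lost regardless of how large the day's log is.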

Note to selves


Add principal article field to the projects table, to make Martin's life easier. Titoxd(?!? - cool stuff) 19:55, 17 September 2008 (UTC)Reply

Thanks for thinking of me! We need to remember that for some projects, there are 2, 3 or maybe even 4 principal articles used. Walkerma (talk) 21:12, 17 September 2008 (UTC)Reply
Ugh, that's going to make life harder... but I guess we can still make it work. Titoxd(?!? - cool stuff) 21:44, 17 September 2008 (UTC)Reply
My first thought here is to add a table "principalarticles" to store the data. It would not have a unique key, so it could have multiple rows per project. — Carl (CBM · talk) 13:52, 29 September 2008 (UTC)Reply
pa_project pa_article
Archaeology Archaeology
Archaeology Great_Pyramid_of_Giza
Austria Austria
... ...
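A minimal sketch of such a table, using Python's sqlite3 module as a stand-in for the real database. The table and column names follow the example rows above; the index name and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No unique key on pa_project, so a project may have several rows.
conn.execute(
    "CREATE TABLE principalarticles ("
    "pa_project TEXT NOT NULL, pa_article TEXT NOT NULL)"
)
# A plain (non-unique) index keeps per-project lookups fast.
conn.execute("CREATE INDEX pa_project_idx ON principalarticles (pa_project)")
conn.executemany(
    "INSERT INTO principalarticles VALUES (?, ?)",
    [
        ("Archaeology", "Archaeology"),
        ("Archaeology", "Great_Pyramid_of_Giza"),
        ("Austria", "Austria"),
    ],
)

def principal_articles(conn, project):
    """Return all principal articles recorded for a project."""
    rows = conn.execute(
        "SELECT pa_article FROM principalarticles "
        "WHERE pa_project = ? ORDER BY pa_article",
        (project,),
    )
    return [row[0] for row in rows]
```

Keeping the principal articles in their own table, rather than as a single column on the projects table, is what allows 2, 3 or 4 articles per project without schema changes.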

review comments

    • I'm not sure about this. My experience with the math project is that the "comments" can range from a couple sentences to an entire peer review. I think it's better to leave the comments to people who are actively editing the page, rather than reproducing them in the summary data. — Carl (CBM · talk) 13:52, 29 September 2008 (UTC)Reply
      • Could we perhaps just include the first two sentences of comments? Walkerma (talk) 15:21, 29 September 2008 (UTC)Reply
        • I think it would make sense to just link to the comments page if it exists. It's very inefficient to obtain the page text of all the comment pages. The current 1.0 bot and the new one being developed both function without downloading any page text, and that's an important design goal. — Carl (CBM · talk) 15:53, 29 September 2008 (UTC)Reply
        • It might be possible, though, to show the comment text when the logs of a specific article are shown. Then the text would not be stored in the database, it would just be loaded on demand when a particular log is viewed, which is acceptable. What we have to avoid is anything that requires downloading the text of many pages at once. There's a lot of work left to be done on the log viewing interface in the alpha code. — Carl (CBM · talk) 15:56, 29 September 2008 (UTC)Reply

Log parsing


Status update?


Just wondering, what's the current status of the second gen bot? Any problems? Headbomb {ταλκWP Physics: PotW} 04:13, 20 October 2008 (UTC)Reply

Essentially, the alpha version is feature complete, but further development is on hold until after v0.7 is released, as that is where WP:1.0's efforts are being focused. Titoxd(?!? - cool stuff) 07:15, 14 November 2008 (UTC)Reply
There are some issues with page moves that still need to be resolved in the program that lists article history logs. This turns out to be less trivial than it sounds. The main delay is the 0.7 release, as Titoxd says. That is being finalized this month, after which I should be able to work on the new WP bot again. — Carl (CBM · talk) 13:03, 14 November 2008 (UTC)Reply

Status update, March 2009


After a long delay, I think I have some time to work on the updated bot. I expect to have a public demo of the alpha version soon. This will not have a perfectly polished user interface, but it will demonstrate the general concept and give people a chance to give feedback before the UI is frozen. — Carl (CBM · talk) 02:18, 18 March 2009 (UTC)Reply

Thanks! Before things get completely locked in, I'd like to mention some features that (I think) shouldn't be too much work. If they are a lot of work, maybe they could be considered for the NEXT major update to the bot.
  1. For Version 0.5, Oleg added some code that tells you if an article is included in Version 0.5 or not, by looking at the "by quality" list. You can see it in this example, the "Ver" column. It was always intended that we would keep this updated for new releases as they came out. Could we put this in for Version 0.7? I think it would be OK if the 0.5 has to be deleted, for space reasons - what do you think?
  2. On a related note, it would also be really good if we could track GA/FA status from the "article milestones" template. This could be used to check that something tagged GA-Class really has passed at GA (a small but annoying problem, e.g., when people unilaterally decide that "their" article is "good" in their own opinion). It could also allow us to separate Wikipedia-wide assessments from WikiProject-based assessments. This may seem almost like splitting hairs, but it has been the source of so much debate and acrimony over the years I would really like to find a way to resolve the issue. All of the questions like "Why do we need A-Class (or GA-Class)" or "Shouldn't GA be above A?" are old chestnuts that we have to answer over and over. There are also issues that often cause confusion like - what if a project doesn't use list classes, and they tag a Featured List as FA-Class; what if they prefer to tag a GA as A-Class? This issue has been very prominent recently because of the A-Class discussions, and it resulted in this proposal; although that proposal looks set to fail, adding a GA/FA feature to the bot would allow much more flexibility in discussions. It might allow us to develop a technical solution to what is (at present) a logistical and perception problem.
  3. Ideally, both of the above should report the information not just to the "by quality" tables (which aren't so widely read), but also to the statistics tables (which people LOVE to read). It would be great if people could see that WikiProject:Foo has 87 articles in Version 1.1, and they could click on the link in the table to see that list. It would also be great if they could see that they have 13 FAs (click to see them) and 27 GAs (click), and such links could be separated from the WikiProject assessment tags. Would this be possible? I think such things would greatly enhance the statistics table.
Sorry to dump this on you at quite a late stage, but I've been mulling things over a lot since last summer. I understand if I've missed the bo(a)t. Thanks, Walkerma (talk) 03:26, 18 March 2009 (UTC)Reply
Tracking the 0.7 release is already accounted for, as is tracking FA/GA/FL separately from the wikiproject assessments. Generating a list of all articles that are in a particular release is also going to be possible. The data for that is already collected, it's just that the search interface is not complete. — Carl (CBM · talk) 04:01, 18 March 2009 (UTC)Reply

Testing the new bot


Will we be looking for a handful of WikiProjects to be used for testing of the new bot, before it is rolled out for all? I'm sure I could get WP:Chem to participate (we helped test the original version of the bot), and I can also ask a few other projects like WP:CYCLONE (home of Titoxd et al) for help. Walkerma (talk) 03:44, 18 March 2009 (UTC)Reply

I am just going to run the new bot in parallel with the old bot during testing, so everyone can use both. There is no need to change the talk page tags in any way (that's a requirement). The old bot will keep (slowly) uploading during that time. When the old bot is retired, the new bot will start uploading the summary tables to the wiki on a regular basis. It will not upload logs or lists - those will just be dynamic. I think that having the summary tables on the wiki is important because people like to transclude them all over the place. — Carl (CBM · talk) 04:03, 18 March 2009 (UTC)Reply
Agree entirely. WP:MEASURE would doubtless be happy to help out in any way desirable! Can I echo Martin in asking for the table of articles which are included in 0.7 to be included whenever it becomes available: at present, simply having the articles which are in 0.5 is a little confusing to non-initiates. Physchim62 (talk) 14:25, 18 March 2009 (UTC)Reply
Something that WikiProjects might want to do is to employ {{ReleaseVersionParameters}} (documentation non-existent), similar to what is done at Category:Tropical cyclone articles by quality. We use that information to parse WikiProject data for the main WikiProject index, and also to define custom assessment categories. Titoxd(?!? - cool stuff) 07:04, 20 March 2009 (UTC)Reply
I was planning to use SelectionBot or WP 1.0 bot to add that template to the appropriate places, with some info already filled in. Then people would be able to edit their info or fill in new info as desired. We do need documentation first, for sure. I was planning to work on this during the beta phase. — Carl (CBM · talk) 11:55, 20 March 2009 (UTC)Reply
I've added it to Category:Measurement articles by quality in any case – it can always be removed again if it causes problems! Documentation would be a nice feature to add ;) Would we also need guidelines as to what extra categories the projects include in the templates, or will it be OK to leave that up to individual projects? Physchim62 (talk) 12:17, 20 March 2009 (UTC)Reply
You can sign up WP:PHYS and its three taskforces for this if you want to test the taskforce-related stuff. Headbomb {ταλκκοντριβς – WP Physics} 05:55, 21 March 2009 (UTC)Reply
That will be helpful. I have not implemented the task force stuff yet; it's on the list. This will be another thing that is set up via the ReleaseVersionParameters template. — Carl (CBM · talk) 17:45, 21 March 2009 (UTC)Reply

Non-standard importance-types


I understand that the new bot will support non-standard class types ({{Future-Class}} etc.), but will it also support non-standard importance types such as {{Bottom-importance}} and {{No-importance}}? PC78 (talk) 13:17, 31 March 2009 (UTC)Reply

Yes, it will support both non-standard quality and importance ratings. I will document how to set this up at some point. — Carl (CBM · talk) 18:35, 31 March 2009 (UTC)Reply
On a related note, will the bot be able to distinguish between an article with no importance rating and a project which simply doesn't support an importance rating? PC78 (talk) 10:08, 1 April 2009 (UTC)Reply
Yes, it can distinguish between those. For the moment, I use the rule of thumb that if a project has not assigned importance to any articles, then that project does not use importance ratings. — Carl (CBM · talk) 12:14, 1 April 2009 (UTC)Reply
Cheers! PC78 (talk) 12:31, 1 April 2009 (UTC)Reply
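The rule of thumb described in this thread is simple enough to sketch in Python. The function names and the use of `None` for "no importance assigned" are assumptions for illustration only.

```python
def uses_importance(project_ratings):
    """A project uses importance if at least one article has a value set.

    `project_ratings` holds one importance value per article in the
    project, with None meaning that no importance was assigned.
    """
    return any(rating is not None for rating in project_ratings)

def describe_importance(rating, project_ratings):
    """Distinguish an unrated article from a project without importance ratings."""
    if not uses_importance(project_ratings):
        return "project does not use importance"
    return rating if rating is not None else "no importance rating"
```

One consequence of this heuristic is that a project which has enabled importance but not yet rated anything is treated the same as one that does not use it, until the first rating appears.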

Status update?


Any new news/progress to mention? This seems like too good a proposal/item to just let it slide into obscurity (my personal interest would be the ability to click on both a specific class and importance rating, such as all C class mid importance etc.), so yeah, any info would be great! I am guessing a lot of behind-the-scenes stuff is going on, so it won't be something I can notice. Cheers! Calaka (talk) 05:37, 13 June 2009 (UTC)Reply

Selection of a particular class and a particular importance level is already implemented. We're working on some problems with the logs, but the interface is usable now. Titoxd(?!? - cool stuff) 15:02, 14 June 2009 (UTC)Reply
Great news! Will every wikiproject need to enable the feature individually or will there be a mass update? Calaka (talk) 08:31, 15 June 2009 (UTC)Reply

Is this still in development?—NMajdantalk 00:21, 29 November 2009 (UTC)Reply

Yes. Titoxd(?!? - cool stuff) 17:31, 4 December 2009 (UTC)Reply
I am waiting on a response from the toolserver admins; once that is done we should be able to begin beta testing. — Carl (CBM · talk) 17:59, 4 December 2009 (UTC)Reply