User talk:Mike Peel/Archive 51


Wikidata weekly summary #397

This Month in GLAM: December 2019

 




Headlines
Read this edition in full · Single-page

To assist with preparing the newsletter, please visit the newsroom. Past editions may be viewed here.

Expanding the use of wikidata in Commons interwiki link templates?

Hi, Mike. When an editor uses {{commons}} without any arguments, the template falls back to performing a search for {{PAGENAME}} on Commons. The problem is that the behavior of the Commons search engine has changed. It used to be that Commons:Special:Search/Dog would take a reader directly to Commons:Dog. Now it provides a ranked list with lots of noisy distraction. This isn't a good user experience (in my opinion).

I was thinking that {{commons}} (and related templates, e.g., {{commons-inline}}, {{commons and category}}, {{commons and category-inline}}) could use wikidata to find related Commons galleries and categories, before falling back to an unsatisfying search. I know you proposed the 2018 RfC for using wikidata in commons links, and have done a bunch of work on {{commons category}}. I thought you would be a good source of feedback before we surface anything to the broader community.

As an experiment, I made versions of these templates that do wikidata lookup before falling back to search: {{commons/sandbox}}, {{commons-inline/sandbox}}, {{commons and category/sandbox}}, and {{commons and category-inline/sandbox}}. I expanded the corresponding test pages to test behavior for different pages.

A couple of notes:

  • These don't change the current behavior of the templates when given positional arguments: for the large majority of usage, there is no change.
  • After some experimentation, I decided to use P935 (Commons gallery) and P373 (Commons category), rather than commons sitelink or WikidataIB.getCommonsLink(). After poking around, it didn't seem that the commons sitelink was consistently pointing to galleries or categories. For {{commons}}, I wanted to restore the old behavior of going to a gallery if it exists, otherwise going to a category. For {{commons and category}}, I needed to have separate links for the gallery and category.
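A minimal sketch of the fallback order described above (gallery, then category, then search). It is written in Python rather than the Lua the templates actually run on, and it models the Wikidata item as a plain dictionary keyed by property ID; `commons_target` and the `Special:Search/` return format are illustrative assumptions, not the sandbox templates' code.

```python
from typing import Dict, Optional


def commons_target(page_name: str, claims: Dict[str, str]) -> str:
    """Pick a Commons link for a bare {{commons}} (hypothetical helper).

    `claims` maps Wikidata property IDs to string values, e.g.
    {"P935": "Dog", "P373": "Dogs"}. P373 values are stored without the
    "Category:" prefix, so it is added here.
    """
    gallery: Optional[str] = claims.get("P935")  # Commons gallery
    if gallery:
        return gallery
    category: Optional[str] = claims.get("P373")  # Commons category
    if category:
        return "Category:" + category
    # Nothing on Wikidata: fall back to the old behaviour, a Commons search.
    return "Special:Search/" + page_name


print(commons_target("Dog", {"P935": "Dog", "P373": "Dogs"}))  # Dog
print(commons_target("Kīlauea", {"P373": "Kīlauea"}))  # Category:Kīlauea
print(commons_target("Dog", {}))  # Special:Search/Dog
```

Positional arguments would be handled before this lookup is reached, which is why the large majority of existing uses are unaffected.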

What do you think of this? Any advice to give, or changes to make? I think your feedback would be super helpful. — hike395 (talk) 19:07, 4 January 2020 (UTC)

@Hike395: Thanks for looking at this! I have this on my radar to look at eventually, but I've been busy cleaning up the commons category links at Category:Commons category Wikidata tracking categories. A few notes:
  • {{Commonscat}} actually uses {{Commons}} in the backend - I want to change that at some point so we can deal with galleries separately from categories. So you don't actually have as many to deal with as you might think.
  • Related to that, I think there are a number of cases of {{Commons}} being used to point to categories, which need migrating over to {{Commonscat}} at some point.
  • If you use the sitelinks rather than the properties, then they will auto-update over time. I'm hoping that we can get rid of Commons category (P373) at some point. Commons gallery (P935) is more work, as not all of the galleries are actually linked up on Wikidata (most of them are actually terrible anyway and should probably just be deleted...) I can probably adapt some of my bot tasks to update this property and the gallery sitelinks as well, I just haven't been motivated to do that yet.
  • getCommonsLink has a parameter that you set to only get categories - without that being set, it gets the gallery first, and the category second. It doesn't have an option for just getting the gallery link, or linking to the search page, though - perhaps @RexxS: could add that option too? It might also be worth adding a Commons gallery (P935) fallback option like there is for Commons category (P373)
  • You will want a set of tracking categories like those at Category:Commons category Wikidata tracking categories - these are very useful to find cases that need updating.
  • I was wondering about proposing getting rid of the 'commons and category' templates, instead using the gallery/category ones separately, what would you think to this?
  • The next-gen thought I had for commons cat was to use properties like category for the interior of the item (P7561) to auto-provide extra links, I'm not sure if there's something similar that might work for galleries.
  • There is also {{Sister project links}} to consider at some point.
Hope that helps - and I'm happy to chat further about this if that will help! Thanks. Mike Peel (talk) 19:30, 4 January 2020 (UTC)
@Mike Peel: actually I don't really know how to return just galleries as I don't understand well enough what the conventions are on Commons. Is there something about a gallery page that I can look up on Wikidata that identifies it as a gallery? I can spot categories because the first nine characters of the sitelink are "Category:". I'll have to do some re-writing of the function to achieve what you want. Presumably it needs to be backwards-compatible, so perhaps it should still have |onlycat= as a boolean with true/yes as significant. That means I need another parameter to return only galleries, but what behaviour do you want if both are set to true? Optionally, should I allow |onlycat= to be a string with values like "true", "false" or "galleries"? or something else? --RexxS (talk) 20:54, 4 January 2020 (UTC)
@RexxS: I think the best approach would be to have "onlygal" or similar as an additional parameter that then only returns sitelinks without 'Category:' in them. A nuance on that might be to avoid 'Creator:' and other namespaces, which I've seen a few times, but I don't think they are too common. Another approach might be to have something like "only=cat" vs. "only=gal", if we can have a short transition period. Thanks. Mike Peel (talk) 20:58, 4 January 2020 (UTC)
Thanks for all of the helpful tips! A few questions and comments:
  1. Is the auto-updating of fields in wikidata documented anywhere? What happens when there is both a Commons gallery and a Commons category that correspond to the wikidata item? How will the commons sitelink get chosen? Or does the choice happen once (when an editor links an item to Commons), and then it updates when the gallery or category moves? Why don't P373 or P935 auto-update?
  2. Setting up tracking categories is a good idea, I'll work on that.
  3. Good point about {{Sister project links}}: that's another one that preferentially links to galleries, instead of categories.
  4. I was thinking of perhaps starting with {{commons and category}} (or inline), because that will affect many fewer articles. For that template, I would need two links, not one. One of them is going to have to be P373 or P935 (right?) So one of the links will not auto-update?
  5. The way I read RexxS' code is that WikidataIB.getCommonsLink() first looks for a sitelink for an item, and then if that doesn't work, it does some extra logic looking for other sitelinks. Is there some specification that requires sitelinks to point to galleries before categories? I could not find such a thing. Or am I misunderstanding RexxS' code?
  6. I think that Commons categorization is getting bad. There are a number of editors who make incredibly detailed categories (e.g., Commons:Category:Iron oxidations at Kilauea). Because of OVERCAT, most images live down in the leaf nodes. Browsing broad-concept categories (like Commons:Category:Kīlauea) is almost impossible. That's why we need well-curated galleries like Commons:Kīlauea. This is why I think that {{commons}} should point to Commons galleries preferentially. (For more examples of well-curated galleries, see Commons:Mountains, Commons:Glacier, or Commons:Yosemite National Park).
  7. The old behavior of {{commons}} (inherited from Commons search) was to provide a "best landing page", e.g., a gallery if it exists, or a category otherwise. I think having {{commons}} continue that behavior would be good. That means editors could place {{commons}} with no arguments, and always have it go to a sensible place.
  8. Given the gallery vs. category controversy at Commons, having {{commons and category}} is important. There are several editors who prefer categories, and will go through en WP and replace {{commons}} with {{commonscat}}. Having a compromise {{commons and category}} makes both the pro-gallery and the pro-category editors happy. Taking it away will lead to editing strife.
Thanks for all of the thoughts! — hike395 (talk) 06:19, 5 January 2020 (UTC)
@Hike395: I'll reply properly soon, but for now I've started a brain-dump at wikidata:User:Mike Peel/Commons linking that you might find useful. Thanks. Mike Peel (talk) 07:35, 6 January 2020 (UTC)
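The "onlygal" / "only=gal" idea from this thread can be sketched as a small filter on the Commons sitelink. This is Python for illustration only: `filter_commons_sitelink` and its `only` values are hypothetical and are not WikidataIB's interface, and the list of non-gallery prefixes is an assumption covering only a few Commons namespaces such as `Creator:`.

```python
from typing import Optional

# Assumed, incomplete list of namespace prefixes that are not galleries.
NON_GALLERY_PREFIXES = ("Category:", "Creator:", "Institution:", "Template:")


def filter_commons_sitelink(sitelink: Optional[str], only: Optional[str] = None) -> Optional[str]:
    """Return the sitelink if it is of the requested kind, else None.

    only=None  -> any sitelink (current gallery-or-category behaviour)
    only="cat" -> categories only
    only="gal" -> galleries only (no known non-gallery prefix)
    """
    if not sitelink:
        return None
    # Categories are spotted by their nine-character "Category:" prefix.
    is_category = sitelink[:9] == "Category:"
    is_gallery = not sitelink.startswith(NON_GALLERY_PREFIXES)
    if only == "cat":
        return sitelink if is_category else None
    if only == "gal":
        return sitelink if is_gallery else None
    return sitelink


print(filter_commons_sitelink("Category:Dogs", only="cat"))  # Category:Dogs
print(filter_commons_sitelink("Category:Dogs", only="gal"))  # None
print(filter_commons_sitelink("Dog", only="gal"))  # Dog
```

Keeping `only=None` as the default preserves the existing behaviour, in line with the backwards-compatibility concern raised for |onlycat=.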

Commons link statistics

@Mike Peel and RexxS: I did a data-gathering exercise yesterday that I think you'd find interesting. I instrumented {{commons and category-inline}} to track which articles were defaulting to Commons search, and then did a Wikidata dump for the relevant fields. Out of the 129 transclusions of {{commons and category-inline}}, 84 of them were missing one or two arguments, and 57 were missing both. The relevant fields for the 84 articles are saved here (the Lua code to generate it is here). This means that 65% of the usage of {{commons and category-inline}} can benefit from Wikidata.

Some statistics you may find interesting:

  • 76/84 (90%) of the articles have a commons sitelink defined
  • 65/76 (86%) of the sitelinks point to galleries, while 11/76 (14%) point to categories
  • 67/84 (80%) of the articles have P935 (Commons gallery) defined.
  • One article (Rhodes Street Historic District) has a gallery sitelink defined, but is missing P935.
  • For 66/67 of the articles with P935, the Commons gallery field matches the Commons sitelink
    • One article (Galicia (Spain)) has a Commons sitelink gallery that is different. It looks like the sitelink is better than P935.
  • 78/84 (93%) of the articles have P373 (Commons category) defined.
  • The category sitelink always matches P373 when both P373 and the sitelink are present.
  • RexxS's code that looks at P910 finds Commons categories for 8/9 (89%) of the missing Commons sitelinks

Some conclusions:

  • The commons sitelink is not consistently either galleries or categories
  • Using the commons sitelink to find galleries fixes 2/65 (3%) mistakes in P935
  • Using the commons sitelink, and then falling back to P910 sitelinks fixes 2/11 (18%) mistakes in P373.
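The percentages in the two lists above can be rechecked from the raw counts. The snippet assumes the figures were rounded half-up to whole percent; on that basis the gallery and category shares of the 76 sitelinks come out at 86% and 14%, which sum to 100%.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded half-up to the nearest whole number (integer arithmetic)."""
    return (200 * numerator + denominator) // (2 * denominator)


counts = {
    "transclusions missing at least one argument": (84, 129),
    "articles with a Commons sitelink": (76, 84),
    "sitelinks pointing to galleries": (65, 76),
    "sitelinks pointing to categories": (11, 76),
    "articles with P935": (67, 84),
    "articles with P373": (78, 84),
    "P910 fallback finds a category": (8, 9),
    "P935 mistakes fixed by the sitelink": (2, 65),
    "P373 mistakes fixed by sitelink + P910": (2, 11),
}

for label, (num, den) in counts.items():
    print(f"{label}: {num}/{den} = {pct(num, den)}%")
```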

The question on my mind when I gathered this data was whether the extra logic in getCommonsLink() is worth its cost in user transparency. If getCommonsLink() makes a mistake, a typical editor would not know how to fix the incorrect data in Wikidata. Of course, editors can always override default Wikidata results, so maybe it's not a big deal.

Seeing that all 4 errors out of the 84 articles were fixed by getCommonsLink(), my tentative conclusion is that the extra logic is worth it. I'm tempted to instrument {{Commons and category}} to get ~1000 samples. If the wikidata self-contradiction rate is only ~5%, then I'm happy to manually inspect ~50 of those and report back. — hike395 (talk) 16:54, 6 January 2020 (UTC)

@Hike395: Interesting - but with such a small number of uses I still think it's worth using {{Commons}} or {{Commonscat}} (or both) instead! For comparison, there are around 700,000 {{commons category}} links. Thanks. Mike Peel (talk) 17:57, 6 January 2020 (UTC)
I don't know how to automatically iterate through the articles in a category from Lua. I generated the calls to the Lua function via cut-and-paste. I don't think I want to cut and paste 700,000 article names :-). Do you know how to automatically iterate through a category in Lua? — hike395 (talk) 06:18, 7 January 2020 (UTC)
@Hike395: The problem in getting the contents of a category into a Lua module to scan through is that there isn't a call in the Scribunto library to get the rendered text of a page (and that's pretty much impossible because the server runs the Lua module before it renders the page). You can get the wikitext using mw.title.new(categoryname, 14):getContent(), but that's no use for a category because the wikitext doesn't contain the content you're interested in. If you want to work within a wikipage, you need JavaScript which can access the Wikipedia API. Otherwise the simplest way is to use Pywikibot in Python locally. --RexxS (talk) 18:02, 7 January 2020 (UTC)
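Besides Pywikibot, the MediaWiki Action API's `list=categorymembers` query can be paged through directly, following the `continue` values returned with each batch. The sketch below keeps the HTTP request behind a `fetch` callable (a hypothetical parameter, so the paging logic can be tried without network access); `category_members` is likewise an illustrative name.

```python
from typing import Callable, Dict, Iterator

Params = Dict[str, str]
Fetch = Callable[[Params], dict]


def category_members(category: str, fetch: Fetch, limit: int = 500) -> Iterator[str]:
    """Yield the titles of articles in `category`, following API continuation.

    `fetch` takes the query parameters and returns the decoded JSON
    response from api.php.
    """
    params: Params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",  # main (article) namespace only
        "cmlimit": str(limit),
    }
    while True:
        data = fetch(params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        if "continue" not in data:
            break
        # The API returns a "continue" object (e.g. cmcontinue plus continue);
        # its values are sent back unchanged with the next request.
        params = {**params, **data["continue"]}
```

In practice `fetch` could wrap an HTTP GET to https://en.wikipedia.org/w/api.php with a descriptive User-Agent header, as Wikimedia asks of API clients; `cmlimit` is capped at 500 for ordinary accounts. For counting template uses rather than category members, `list=embeddedin` (with `eititle`) is the analogous module.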

More statistics

@Mike Peel and RexxS: I instrumented {{Commons and category}} and now have 924 examples where one or more parameters are empty. I'm starting to run into Lua and Wikidata limits, so I had to split the query into two. The results are shown in User:Hike395/Commons link stats2 and User:Hike395/Commons link stats3, with "boring" articles filtered out. A "boring" article has statements in Wikidata for a Commons gallery, a Commons category, and a Commons sitelink, and they all agree. Interesting data to note:

Given that the sitelink or P910 sitelink is not always better than the P373 statement, I have a suggestion for the philosophy of the lookup. Any of these commons templates have an ultimate fallback of using Commons search. Instead of running the risk of generating a less-good link, how about if we use multiple statements to check for data quality? That is, if there is a conflict between the sitelink and P935, or between P373 and the sitelink (or fallback sitelink), then return nil, rather than trying to guess. This will slightly reduce the recall of the query (from 88% to 86% for categories), but reduce the error rate to essentially zero. I suspect this is worth it.
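A minimal sketch of the proposed "return nil on conflict" rule, in Python for illustration. The four arguments are hypothetical stand-ins for the item's Commons sitelink, the Commons sitelink of the category item reached through P910, and the P935 and P373 values; an absent value is not treated as a conflict, only two present values that disagree. Any `None` result would leave the template on its Commons search fallback.

```python
from typing import Optional, Tuple


def agree(*values: Optional[str]) -> Optional[str]:
    """Return the shared value if all present values agree, else None."""
    present = {v for v in values if v}  # ignore missing/empty values
    return present.pop() if len(present) == 1 else None


def conservative_links(
    sitelink: Optional[str],
    p910_sitelink: Optional[str],
    p935: Optional[str],
    p373: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (gallery, category); either is None on conflict or absence."""
    # Treat the item's own sitelink as a category or a gallery by its prefix.
    site_cat = sitelink if sitelink and sitelink.startswith("Category:") else None
    site_gal = sitelink if sitelink and not site_cat else None
    # Category candidate: the sitelink, else the P910 item's sitelink.
    cat_link = site_cat or p910_sitelink
    gallery = agree(site_gal, p935)
    category = agree(cat_link, "Category:" + p373 if p373 else None)
    return gallery, category


print(conservative_links("Dog", "Category:Dogs", "Dog", "Dogs"))  # ('Dog', 'Category:Dogs')
print(conservative_links("Foo", None, "Bar", None))  # (None, None)
```

Requiring agreement trades a little recall for precision, which is the trade-off described above.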

I was going to code this up in Lua for {{commons and category}}, but curious what you both think. — hike395 (talk) 07:01, 13 January 2020 (UTC)

@Hike395: From my point of view, it looks like it would be a net improvement, albeit a slight one. When you have four possible choices for a link, the logic to decide which one to return is unlikely to please everyone, but it may be worth trying out your algorithm – as long as you're willing to roll it back if the editors curating articles that are then affected kick up a fuss. I know that Mike is hoping to deprecate at least one of Commons category (P373) and topic's main category (P910) in favour of the Commons sitelink, but he'll be able to express himself better than I can. --RexxS (talk) 15:33, 13 January 2020 (UTC)

Wikidata weekly summary #398

Category:Air Force templates has been nominated for discussion

 

Category:Air Force templates, which you created, has been nominated for possible deletion, merging, or renaming. A discussion is taking place to decide whether this proposal complies with the categorization guidelines. If you would like to participate in the discussion, you are invited to add your comments at the category's entry on the categories for discussion page. Thank you. DexDor (talk) 20:51, 13 January 2020 (UTC)

Links to sister projects

Hey Mike, I saw you removing dead links to Commons and with your Commons experience I thought you might have insight on Wikipedia:Bot requests#Remove sister project templates with no target. Thanks and keep up the good work, SchreiberBike | ⌨  21:29, 17 January 2020 (UTC)

@SchreiberBike: Thanks! Pi bot will take care of dead links to commons categories, but not galleries or wikispecies at the moment. I'll look into this more tomorrow. Thanks. Mike Peel (talk) 21:32, 17 January 2020 (UTC)

Cochrane Bot is working really well!

Hi Mike, I hope that you are well and the start of 2020 has gone smoothly. I just wanted to thank you again for all your help with the bot. It is working really well!

JenOttawa (talk) 03:12, 18 January 2020 (UTC)

@JenOttawa: Thanks for continuing to update Wikipedia! Let me know if there's anything else I can do to help. Thanks. Mike Peel (talk) 09:19, 18 January 2020 (UTC)

image enquiry

G'day. I noticed your name cropping up changing cats on motorcycle makes. I saw you were an admin on en-wiki and Commons. I noticed an image had been uploaded with peculiar licensing and wondered if it was possible to crop out a section showing a building as a derivative work? It would be for a list article displayed normally at 150px. c:File:Bray Hill (1904).jpg

It may not be of adequate resolution as it's been scanned large, but I can try if there are no objections. Thanks.--Rocknrollmancer (talk) 02:10, 18 January 2020 (UTC)

@Rocknrollmancer: I'm not a copyright expert, but given the age of the image I think it should be OK. If you want to be sure, I suggest checking at commons:Commons:Village pump/Copyright. Thanks. Mike Peel (talk) 09:18, 18 January 2020 (UTC)
Thx. Normally I would've approached Ronhjones, who is an admin at WP and Commons but abruptly stopped editing last April, so I hope everything is OK there.--Rocknrollmancer (talk) 13:02, 18 January 2020 (UTC)

Wikidata weekly summary #399

Rigel

It seemed a bit of a shame that Betelgeuse was a FA and Rigel wasn't...so a few of us have been buffing it. Would be insanely grateful for more eyes on it re cohesiveness, readability etc. Cas Liber (talk · contribs) 23:56, 24 January 2020 (UTC)

The Signpost: 27 January 2020

Wikidata weekly summary #400

Administrators' newsletter – February 2020

News and updates for administrators from the past month (January 2020).

  Guideline and policy news

  • Following a request for comment, partial blocks are now enabled on the English Wikipedia. This functionality allows administrators to block users from editing specific pages or namespaces rather than the entire site. A draft policy is being workshopped at Wikipedia:Partial blocks.
  • The request for comment seeking the community's sentiment for a binding desysop procedure closed with widespread support for an alternative desysoping procedure based on community input. No proposed process received consensus.

  Technical news

  • Twinkle now supports partial blocking. There is a small checkbox that toggles the "partial" status for both blocks and templating. There is currently one template: {{uw-pblock}}.
  • When trying to move a page, if the target title already exists then a warning message is shown. The warning message will now include a link to the target title. [2]

  Arbitration

  • Following a recent arbitration case, the Arbitration Committee reminded administrators that checkuser and oversight blocks must not be reversed or modified without prior consultation with the checkuser or oversighter who placed the block, the respective functionary team, or the Arbitration Committee.

  Miscellaneous



Sent by MediaWiki message delivery (talk) 15:06, 1 February 2020 (UTC)

Wikidata weekly summary #401

Use of IMO categories for ships with only one identity

Hi Mike, As you've probably noticed, I've undone a few of your changes where you'd replaced links to Commons categories for ships which have only ever operated under one identity with the IMO cat (for instance, HMAS Canberra (L02)). As the IMO cat in each case only contained the ship name category, I don't think that this change is helpful for readers. I'd suggest that this only be done for instances where the ship has had multiple identities and the article covers all of them. Nick-D (talk) 22:52, 9 February 2020 (UTC)

@Nick-D: Commons seems to use the IMO categories for cases where the ships have used multiple names over the years, I'm just synchronising things with that system. If you think these IMO categories aren't useful, could you nominate them for deletion/discussion on Commons? Thanks. Mike Peel (talk) 22:54, 9 February 2020 (UTC)
The Commons conventions for ship names are a massive problem which I don't want to take on to be honest. Having IMO numbers in the background there is probably an OK idea as ships often will change names. I found this useful for setting up a category for a Japanese coast guard ship a few months ago. This is also much less odd than the decision made years ago to use things like Category:Dunkerque (ship, 1935) instead of the conventions used in Wikipedia articles such as French battleship Dunkerque: from memory, that naming convention was adopted when Commons' governance problems were at their worst. Regards, Nick-D (talk) 23:02, 9 February 2020 (UTC)
@Nick-D: OK, I've started commons:Commons:Categories for discussion/2020/02/Category:IMO 9608960, let's see how that one goes. Personally I'm less fussed about naming conventions than I am about category maintenance, and I know that there are photos of ships that we're not linking to since they're in a different name category. Thanks. Mike Peel (talk) 23:09, 9 February 2020 (UTC)

Wikidata weekly summary #402

This Month in GLAM: January 2020

 




Headlines
Read this edition in full · Single-page

To assist with preparing the newsletter, please visit the newsroom. Past editions may be viewed here.

DYK for Tranvía Villasegura

On 14 February 2020, Did you know was updated with a fact from the article Tranvía Villasegura, which you recently created, substantially expanded, or brought to good article status. The fact was ... that after the closure of Tranvía Villasegura, Tenerife's first tram system, in the 1950s, one of the trams was reused as a bar? The nomination discussion and review may be seen at Template:Did you know nominations/Tranvía Villasegura. You are welcome to check how many page hits the article got while on the front page (here's how, Tranvía Villasegura), and it may be added to the statistics page if the total is over 5,000. Finally, if you know of an interesting fact from another recently created article, then please feel free to suggest it on the Did you know talk page.

--valereee (talk) 00:03, 14 February 2020 (UTC)