Geographic neighbors of different kind

A very common kind of category cycle is the following:

  1. <Foo> (place on land)
  2. Bodies of water of <Foo>
  3. <Bar> (a body of water)
  4. Places on <Bar>
  5. Back to <Foo>
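The pattern above can be reproduced on a toy graph. The following is a minimal sketch, not the bot's actual code: it assumes the category graph is available as a plain dict mapping each category to its subcategories, and `find_cycle` is a hypothetical helper name. It uses a recursive depth-first search, so a real run over the full category graph would need an iterative version or a raised recursion limit.

```python
def find_cycle(subcats):
    """Return one cycle as a list of category names, or None if there is none.

    `subcats` maps a category name to a list of its subcategories
    (assumed input format, for illustration only).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(cat):
        colour[cat] = GREY
        path.append(cat)
        for child in subcats.get(cat, ()):
            state = colour.get(child, WHITE)
            if state == GREY:
                # child is already on the current path: we closed a loop
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        colour[cat] = BLACK
        return None

    for cat in list(subcats):
        if colour.get(cat, WHITE) == WHITE:
            found = visit(cat)
            if found:
                return found
    return None


# The five-step pattern, with placeholder names Foo and Bar.
graph = {
    "Foo": ["Bodies of water of Foo"],
    "Bodies of water of Foo": ["Bar"],
    "Bar": ["Places on Bar"],
    "Places on Bar": ["Foo"],
}
cycle = find_cycle(graph)
print(" -> ".join(cycle))
```

Removing any single edge of the loop (for instance, taking Foo out of "Places on Bar") makes `find_cycle` return `None` for this graph.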

Examples:

When the category "Places on <Bar>" doesn't exist and thus is not part of the cycle, the solution is straightforward—just remove the category for the place Foo from category Bar (e.g. as I did with cycles involving Category:Strait of Malacca, see Special:Diff/981392261, Special:Diff/981392401, Special:Diff/981392616, Special:Diff/981392692, and Special:Diff/981392760).

How do we resolve such categorization cycles? —⁠andrybak (talk) 18:00, 2 October 2020 (UTC)

Categories are to help readers, not to facilitate bots. The only question is whether each individual connection makes sense to guide readers to related topics or subtopics. Whether that ultimately causes loops that trip up nonhumans is irrelevant. postdlf (talk) 19:38, 3 October 2020 (UTC)
Makes no sense. Which reader goes up and down the category hierarchy looking for pages to read? The only way the deep category hierarchies actually get used is in tools like search and petscan, which after all exist for the readers/editors only. Existence of bad links (of which cycles are just an example) causes the tools to give wrong results. – SD0001 (talk) 16:47, 5 October 2020 (UTC)

Country → Culture of country → Language → Language-speaking countries

Another common kind of category cycle:

How do we resolve such categorization cycles? —⁠andrybak (talk) 18:00, 2 October 2020 (UTC)

@Andrybak: this was raised and discussed at WT:WikiProject Categories/Archive 6 § Category loops for countries by language spoken. The widely agreed solution is that articles about a country (in your examples: East Timor, Iran and Libya) should be in a category of territories by language spoken, but not the eponymous category (Category:East Timor etc.). Indeed, only the country itself is defined as a Foo-speaking territory, not the rest of the content of the eponymous category. The relevant guideline here is WP:EPONCAT. While there seemed to be an agreement on the solution, it was not widely implemented (my bad, I guess I had other things on my mind). Place Clichy (talk) 18:01, 10 February 2021 (UTC)
Place Clichy, thanks for the pointer. I've cleaned up all subcategories of Category:Administrative territorial entities by language and made sure that the articles are in corresponding language categories. —⁠andrybak (talk) 11:49, 13 February 2021 (UTC)
SD0001, it would be nice to regenerate the list again, because a big portion of the currently listed cycles involve a "X-speaking countries and territories" category. —⁠andrybak (talk) 11:54, 13 February 2021 (UTC)
@Andrybak: Done! – SD0001 (talk) 15:42, 13 February 2021 (UTC)

It's hopeless

Sorry to be pessimistic, but poking at specific examples like this is hopeless. The wiki category system is fundamentally broken, since there's no information about what kind of relationship each subcategory link represents. I know I've ranted about this elsewhere, but it's worth repeating. For example, what's the relationship between Category:Gulf of Aden and Category:Horn of Africa? borders-on, maybe? Let's assume it is. Then, how would you resolve the (hypothetical) cycle Indian Ocean borders-on Pacific Ocean borders-on Southern Ocean borders-on Indian Ocean? You can't. Until you know what the relationships are, this is truly an intractable problem. Not only can't you understand how to break the cycle, you can't even understand if the cycle is inherently unbreakable. -- RoySmith (talk) 18:58, 2 October 2020 (UTC)

Per Wikipedia:Categorization#Category_tree_organization, categories can either be a WP:TOPICCAT (is-related relationship) or WP:SETCAT (is-a relationship). I think for SETCATs it's pretty clear, but the is-related relationship of TOPICCATs can vary by editor. Borders-on doesn't sound like a good relationship. I would just remove or change such categorisations to the way I see fit, because after all these categorizations must have been done by a single editor without any discussion. I don't think many people monitor changes to categories, which would mean bad categorisations are going to exist in plenty -- if they result in a cycle, the bot would highlight them but not otherwise. – SD0001 (talk) 18:29, 3 October 2020 (UTC)
SD0001, Is there any way to tell from looking at the category graph which type a particular edge represents? -- RoySmith (talk) 19:50, 4 October 2020 (UTC)
RoySmith, You mean for a bot? Not really. {{Set category}} exists, but it has only 32.5k transclusions, so I suspect not all set categories have been tagged with the template. – SD0001 (talk) 14:50, 5 October 2020 (UTC)
SD0001, Well, it sounds like a good place to start this quest would be to get all the set categories tagged with {{set category}}. That at least would give us a line in the sand. We could then start to build tools which enforced that there were no cycles composed entirely of set categories. And tools which could traverse the set category subgraph with the knowledge that it was a tree (or at least a DAG). -- RoySmith (talk) 16:01, 5 October 2020 (UTC)
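As an illustration of the "no cycles composed entirely of set categories" check, here is a rough sketch. It assumes two inputs that are not described anywhere in this thread: a dict of subcategory links and a Python set of the categories tagged with {{set category}}. The function name `set_subgraph_is_dag` and the category names "Set A", "Set B" and "Topic T" are made up for the example. It applies Kahn's topological sort to the subgraph induced by the tagged categories.

```python
from collections import deque


def set_subgraph_is_dag(subcats, set_cats):
    """True if the links between tagged set categories contain no cycle.

    `subcats`: dict of category -> list of subcategories (assumed format).
    `set_cats`: set of categories tagged as set categories (assumed input).
    """
    # Keep only edges whose two endpoints are both set categories.
    edges = {c: [k for k in subcats.get(c, ()) if k in set_cats]
             for c in set_cats}

    indegree = {c: 0 for c in set_cats}
    for kids in edges.values():
        for k in kids:
            indegree[k] += 1

    # Kahn's algorithm: repeatedly remove categories with no remaining parent.
    queue = deque(c for c, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        cat = queue.popleft()
        removed += 1
        for k in edges[cat]:
            indegree[k] -= 1
            if indegree[k] == 0:
                queue.append(k)

    # Anything left over sits on a cycle made only of set categories.
    return removed == len(set_cats)


# Hypothetical data: the loop Set A -> Topic T -> Set A passes through a
# topic category, so the set-only subgraph is still acyclic.
subcats = {
    "Set A": ["Set B", "Topic T"],
    "Set B": [],
    "Topic T": ["Set A"],
}
set_cats = {"Set A", "Set B"}
before = set_subgraph_is_dag(subcats, set_cats)

# Adding Set B -> Set A creates a loop made only of set categories.
subcats["Set B"].append("Set A")
after = set_subgraph_is_dag(subcats, set_cats)
print(before, after)
```

The check deliberately ignores loops that pass through an untagged category, which matches the weaker constraint proposed here rather than a ban on all cycles.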
RoySmith, unfortunately there are more problems: set cats (like Category:Office buildings in Manhattan) often contain topic cats (like Category:Empire State Building), causing a leak. I'm surprised the guideline doesn't explicitly say that set cats should only contain set cats, though both WP:SUBCAT (When making one category a subcategory of another, ensure that the members of the subcategory really can be expected (with possibly a few exceptions) to belong to the parent also.) and the wording of {{set category}} template imply it. – SD0001 (talk) 16:42, 5 October 2020 (UTC)
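A tool could list such leaks directly. The sketch below is only an illustration under assumed inputs: `subcats` is a dict of subcategory links, `set_cats` is the set of categories tagged as set categories, and `find_leaks` and "Example set subcategory" are invented names. Only the Manhattan and Empire State Building pair comes from the comment above.

```python
def find_leaks(subcats, set_cats):
    """List (parent, child) links where a set category contains a
    category that is not tagged as a set category."""
    return [
        (parent, child)
        for parent in sorted(subcats)
        if parent in set_cats
        for child in subcats[parent]
        if child not in set_cats
    ]


# Assumed tagging, for illustration: the parent is a set category,
# the eponymous Empire State Building category is a topic category.
subcats = {
    "Office buildings in Manhattan": [
        "Empire State Building",
        "Example set subcategory",   # hypothetical, tagged as a set category
    ],
}
set_cats = {"Office buildings in Manhattan", "Example set subcategory"}

leaks = find_leaks(subcats, set_cats)
for parent, child in leaks:
    print(f"{parent} contains non-set category {child}")
```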
SD0001, That's why I was explicit about "no cycles composed entirely of set categories". Adding an additional constraint that "set cats can only contain other set cats" would be a stronger constraint, and perhaps something we could make a long-term goal, but at least getting the set cats identified as such is a step in the right direction, and useful in its own right. -- RoySmith (talk) 16:48, 5 October 2020 (UTC)
I would be delighted if the logically obvious rule "set cats can only contain other set cats" could be explicitly added somewhere. At present Wikipedia:Categorization#Eponymous_categories explicitly advocates Category:New York City as a subcat of Category:Cities in New York (state), thus adding a plethora of non-cities to a set category of cities (and often compounding the error by removing the article New York City from Category:Cities in New York (state) on the grounds that categories are hierarchical). Oculi (talk) 20:44, 14 October 2020 (UTC)

Regenerated

Following a request by Andrybak, I did a re-run of the bot. Regards, – SD0001 (talk) 18:17, 3 October 2020 (UTC)