This is my page for musings about categories on Wikipedia, which are an endless source of pleasure and pain for me. Pleasure because I enjoy organizing articles and having nice, clear, usable hierarchies to browse through; pain because our category system is so bad, and so terribly implemented, that I despair at the amount of work left to fix it.

On ghettoization, non-diffusing categories, and LGBT heads of government

I will start by walking you through something I do from time to time, which is de-ghettoizing an area of the category tree.

Let's start with a definition of ghettoization[1], which I developed in a previous essay, and then I'll give you a new example I found today which has confounded me somewhat...


What is ghettoization?

To start, we will consider that ghettoization only applies to the categorization of human biographies on Wikipedia.

A biography is ghettoized if the following are true:

  1. The bio is a member of a gendered, ethnic, sexuality, or religion-based category X, and
  2. The biography is not in an ancestor or "blood relative" category of X (e.g. sibling, cousin, parent, grandparent, etc.) that is neutral (i.e. non-gendered, non-ethnic, non-sexuality-based, and non-religious) and that retains an equivalent descriptive specificity. Equivalent descriptive specificity means you can't say "Well, Y is in Category:Women novelists, and in Category:American people, which is gender neutral, so she's not ghettoized." An essential aspect of ghettoization is that the biography is in a ghetto, but not in a neutral category that is an analogue to the ghetto.
  3. If multiple facets are intersected on the bio (e.g. gender + ethnicity + sexuality + religion + ...), as you go up the tree, the bio is ghettoized if it is not a member of each extant iteration that removes a facet while retaining the same noun. For example, to avoid ghettoization, Category:African-American women in politics members should also be, at the very least, in Category:American women in politics (removal of "African-American"), Category:American politicians (removal of "Women" and "African-American")[2], and Category:African-American politicians (removal of "Women").
  4. The above rules do not apply to any characteristic that has been fully diffused, i.e. if men and women are both fully diffused into gendered sub-categories, there is no ghettoization concern. See for example Category:Actors_from_Adelaide, which is empty save its sub-categories. In these cases there is no need for a neutral category; each person can be an actor or an actress, and that is not considered "ghettoization".
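As a rough illustration of rules 1 to 3, here is a minimal Python sketch. It assumes a toy model that is not part of Wikipedia itself: each category is reduced to a neutral "noun" plus the set of identity facets it intersects with that noun, and "blood relatives" are approximated as categories sharing the same noun. The names `CATEGORIES`, `missing_analogues` and `is_ghettoized` are hypothetical, and the sketch ignores rule 4 (full diffusion) and the "diffusing sub-category of same" allowance from footnote 2.

```python
# Hypothetical toy model: category name -> (neutral noun, identity facets).
# Real categories carry no such metadata; this is an assumption for illustration.
CATEGORIES = {
    "American politicians": ("American politicians", frozenset()),
    "American women in politics": ("American politicians", frozenset({"gender"})),
    "African-American politicians": ("American politicians", frozenset({"ethnicity"})),
    "African-American women in politics": (
        "American politicians",
        frozenset({"gender", "ethnicity"}),
    ),
}


def missing_analogues(bio_categories, categories=CATEGORIES):
    """Return the less-faceted analogue categories the bio should also be in."""
    held = set(bio_categories)
    missing = set()
    for name in held:
        if name not in categories:
            continue  # categories outside the toy model are ignored
        noun, facets = categories[name]
        for other, (other_noun, other_facets) in categories.items():
            # Same noun, strictly fewer facets: an extant "facet removed" iteration.
            if other_noun == noun and other_facets < facets and other not in held:
                missing.add(other)
    return sorted(missing)


def is_ghettoized(bio_categories, categories=CATEGORIES):
    """A bio is ghettoized (in this simplified sense) if any analogue is missing."""
    return bool(missing_analogues(bio_categories, categories))
```

For a bio placed only in "African-American women in politics", `missing_analogues` lists the three less-faceted categories named in rule 3; once the bio is in all four, the list is empty.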

What are non-diffusing categories?

Another definition related to ghettoization is needed: that of a non-diffusing category. Normally, when you place something in a sub-category, you remove it from the parent. A non-diffusing category behaves differently: if you place something in a non-diffusing category, you do not immediately remove it from the parent.

Wrinkle #1: whether a category is non-diffusing or not depends on its parent. A category can be non-diffusing for one parent, and diffusing for another. Here's an example:

So a single category can be diffusing and non-diffusing at the same time.

Wrinkle #2: a category can be non-diffusing, but members of a non-diffusing category won't always remain in the parent. How does that work? This can happen in cases where the parent category has diffusing categories underneath it, especially if those categories diffuse fully (i.e. if everyone in the parent can be placed in at least one diffusing child cat).

Optional side bar on why you don't always bubble up non-diffusing categories

This particular wrinkle was a bone of contention during the Category-gate discussions, with many arguing "If the category is non-diffusing, it means we must bubble them up!" To show why this is not workable, consider the following simple category structure:

  • Novelists
    • (Bob)
    • (Mary)
    • Women novelists (non-diffusing)
    • Novelists by country (diffusing)
      • American novelists (diffusing)
      • Scottish novelists (diffusing)

We start by placing Bob and Mary in the Novelists category. Now, someone says "Mary is a woman", so she gets added to the Women novelists category as well. Someone else says "Bob is Scottish", so he gets moved to the Scottish novelists category and is removed from the parent, as is normal for diffusing categories - we regularly diffuse based on nationality. Finally, someone comes along and says "Well, Mary is American, so I'm going to move her to the American novelists category and remove her from the parent (in other words, treating her the same as Bob)" - but an editor opposes: "You can't do that - she's in Women novelists which is non-diffusing, so she has to stay in the parent otherwise she will be ghettoized!" - so she gets placed back in the parent.

So now our situation looks like this:

  • Novelists
    • (Mary)
    • Women novelists (non-diffusing)
      • (Mary)
    • Novelists by country (diffusing)
      • American novelists (diffusing)
        • (Mary)
      • Scottish novelists (diffusing)
        • (Bob)

Do you notice anything weird? Mary is the only one in the parent "Novelists" category - this is a rich irony indeed, as she now gloats over her spot at the top of the food chain, while Bob languishes down in the Scottish novelists dregs.

There are two ways to fix this problem:

  1. The first solution is: Allow people to be bubbled up, and then diffused down, as long as they are diffused to neutral categories. This was ultimately the solution decided by consensus at Category:American novelists. To eliminate a step, you can "bubble up and diffuse down" in one fell swoop, placing the biography in a neutral sibling or cousin, and skipping the parent entirely.
  2. A second solution, which was proposed but rejected at the time, would be to consider that as soon as you place a non-diffusing category, all of the siblings (and all of their sub-categories) become instantly non-diffusing as well, so everyone bubbles up to the parent. To consider the mess this would cause, look at Category:Mathematicians; if we bubbled up all the women in Category:Women mathematicians to the parent, in order to be fair we'd have to bubble up the men as well, so now our nicely diffused structure would be overwhelmed by thousands upon thousands of mathematicians. In addition to their "Mathematicians by country", "Mathematicians by century" and "Mathematicians by field" categories, they would each need to add a redundant one to their list: "Mathematicians". This solution means a perfectly stable, reasonable, diffused tree can be upended by adding a single non-diffusing category to the parent, causing every other category to un-diffuse itself, spreading virally. It's a bad idea.
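The first solution ("bubble up and diffuse down") can be sketched in Python using the Novelists tree from above. This is only an illustration under stated assumptions: the `TREE` structure and the function names are hypothetical, and the diffusing flag is stored on the parent-to-child link because, per Wrinkle #1, whether a category diffuses depends on its parent.

```python
# Hypothetical toy tree: parent -> list of (child, diffuses_from_this_parent).
TREE = {
    "Novelists": [
        ("Women novelists", False),       # non-diffusing
        ("Novelists by country", True),   # diffusing
    ],
    "Novelists by country": [
        ("American novelists", True),
        ("Scottish novelists", True),
    ],
}


def diffusing_descendants(tree, parent):
    """Collect descendants reachable from parent through diffusing links only."""
    found = set()
    for child, diffusing in tree.get(parent, []):
        if diffusing:
            found.add(child)
            found |= diffusing_descendants(tree, child)
    return found


def should_stay_in_parent(bio_categories, tree, parent):
    """True if the bio is in no diffusing descendant, so it must remain in parent."""
    return not (set(bio_categories) & diffusing_descendants(tree, parent))
```

Under this rule Mary stays in Novelists while she is only in Women novelists, and leaves it, just like Bob, once she is also in American novelists.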

Real-life example: LGBT heads of government

So, now that we know what ghettoization and non-diffusing categories are, let's do a real-life example, with a puzzle/quiz at the end.

Today I picked Category:Heads_of_government, which has a subcat Category:LGBT heads of government. I think we all agree that being LGBT doesn't make you any less a head of government, so we want to make sure all of those fellows in Category:LGBT_heads_of_government are also in a diffusing, neutral subcat (and ideally, several) of the parent. How do we find them? It's rather tricky. We will be using the Category intersection tool to help us. We want to find out who is in LGBT heads of government but not in any other neutral category under Category:Heads of government. But there are dozens of articles, and dozens of nested categories - how do we sort this out? Here are the steps I took:

  1. Get a list of all diffusing, neutral sibling categories of Category:LGBT heads of government. You can do so using the Category intersection tool like this. This gives us a list we can copy and paste into the Negative categories box. The list looks like this:
List of sibling categories
   14th-century_heads_of_government
   15th-century_heads_of_government
   16th-century_heads_of_government
   17th-century_heads_of_government
   18th-century_heads_of_government
   19th-century_heads_of_government
   1st-century_heads_of_government
   2nd-century_heads_of_government
   3rd-century_heads_of_government
   4th-century_heads_of_government
   6th-century_heads_of_government
   7th-century_heads_of_government
   9th-century_heads_of_government
   Assassinated_heads_of_government - should be non-diffusing
   Children_of_national_leaders - even if they're in this cat, we don't care
   Collective_heads_of_government
   Diplomatic_visits_by_heads_of_government
   Female_heads_of_government - should be non-diffusing
   Heads_of_government|0 - this is the parent cat - should not recurse, so we override the recursion
   Heads_of_government_by_country
   Heads_of_government_in_Africa
   Heads_of_government_in_Asia
   Heads_of_government_in_Europe
   Heads_of_government_in_North_America
   Heads_of_government_in_Oceania
   Heads_of_government_in_South_America
   Heads_of_government_of_non-sovereign_entities
   Kuhina_Nui
   LGBT_heads_of_government - remove our target category
   Lists_of_heads_of_government - this is only a category for lists
   Premiers
   Prime_ministers
   Rulers
   Spouses_of_politicians - not interested in this
   Wikipedia_categories_named_after_heads_of_government - more of a maintenance cat, we don't care if they're in it or not
  2. Now we remove some of the categories from the list, such as any non-diffusing categories (e.g. Female heads of government: if our bios are in that one too, that's great, but that doesn't de-ghettoize them). The sibling cats I removed are struck in the list above.
  3. Now we place our target cat, Category:LGBT heads of government, in the Categories box, select a depth of 8 or so (or deeper, depending on how far down your categories go), select 'Subset', and then click "do it". This link shows you a filled out form. What this search does is say "Show me all LGBT heads of government that aren't in this whole other list of categories, recursively."[3] Using this technique, you can search through trees with hundreds or thousands of bios, and quickly find the ones that are ghettoized.
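The logic of the search in the steps above can be sketched in Python against an in-memory toy graph. This is not the Category intersection tool's actual implementation or API: `SUBCATS`, `MEMBERS`, the placeholder bios and the category "Prime ministers of Examplestan" are all invented for illustration, and the `Name|depth` parsing is only an assumption modelled on the `Heads_of_government|0` entry in the list.

```python
# Hypothetical toy data; none of these memberships reflect real articles.
SUBCATS = {
    "Heads of government": ["LGBT heads of government", "Prime ministers"],
    "Prime ministers": ["Prime ministers of Examplestan"],
}
MEMBERS = {
    "LGBT heads of government": {"Bio A", "Bio B"},
    "Prime ministers of Examplestan": {"Bio A"},
    "Heads of government": {"Bio C"},
}


def members_recursive(cat, depth, subcats=SUBCATS, members=MEMBERS):
    """All bios in cat and in its sub-categories, down to the given depth."""
    found = set(members.get(cat, ()))
    if depth > 0:
        for child in subcats.get(cat, ()):
            found |= members_recursive(child, depth - 1, subcats, members)
    return found


def parse_entry(entry, default_depth):
    """Split an optional 'Name|depth' override; underscores become spaces."""
    name, sep, depth = entry.partition("|")
    return name.replace("_", " "), (int(depth) if sep else default_depth)


def find_ghettoized(target, negatives, depth=8):
    """Bios under target that appear in none of the negative categories."""
    excluded = set()
    for entry in negatives:
        name, entry_depth = parse_entry(entry, depth)
        excluded |= members_recursive(name, entry_depth)
    return sorted(members_recursive(target, depth) - excluded)
```

In this toy graph, `find_ghettoized("LGBT heads of government", ["Prime_ministers", "Heads_of_government|0"])` leaves only "Bio B". Dropping the `|0` override would make the parent recurse into the target category itself and exclude everyone, which is why the parent's recursion is overridden in the list.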

Ok, that was fun, but now to our puzzling result. We found three fellows who are ghettoized - they are considered "LGBT heads of government", but they are nowhere to be found in the Heads of government tree. Here are the questionable characters:

Now, how could a great king like Tiglath-Pileser III be ghettoized? He's in Category:Assyrian kings, Category:Babylonian kings, and even Category:Monarchs of the Hebrew Bible! But the category intersection tool tells us he's ghettoized?? What's going on?

Well, here's where you need to start exploring your tree. If you do so, you will find that it goes something like this: Assyrian kings -> Kings -> Monarchs -> Heads of state. Aha! Category:Heads of state is a sibling of Category:Heads of government, so the algorithm was correct: these bios *are* ghettoized, in a manner of speaking. You may notice that there is no Category:LGBT heads of state, only Category:LGBT heads of government, so we have a bit of a contradiction: our LGBT category says they are heads of government, but the parenting of the other categories suggests, not really.

This is a great example because it illustrates something you will see all over the category tree: inconsistency. Sometimes you will have a female category and an LGBT category and an African-American category, and sometimes even a combination of same, and then for a slightly different job title you will have none of that. You will also find the gender/ethnic/religious/sexuality categories placed at all points in the category tree: up high, in the middle, and down low. If you've got a woman and you want to make sure she is in a gendered category for something close to her job, you may have to click up 2 or 3 levels before you find the appropriate "Women X" category; in the case of old Tiglath-Pileser III, someone went to a distant uncle to find the LGBT category, placing him, somewhat incorrectly, as a "head of government". Sorting this out I leave as an exercise for the reader, as I frankly don't know what the best path is, but here are a couple of options to consider:

  • We could create Category:LGBT heads of state, since all kings are heads of state, but not all kings are heads of government. This is probably the most "correct" solution.
  • We could just say "Forget it, it's close enough", but it technically violates our rules against ghettoization, so perhaps the rules need changing? Think about it this way - if you classify a woman as a "Woman novelist" and an "American poet", is she ghettoized or not? I think, yes.
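The tree exploration that exposed the problem (Assyrian kings -> Kings -> Monarchs -> Heads of state) can be sketched as a walk up the parent links. This is a minimal Python sketch under assumptions: `PARENTS` holds only the links mentioned in this essay, and since the common parent of Heads of state and Heads of government is not named here, it appears as the placeholder "(shared parent)".

```python
from collections import deque

# Toy parent links taken from the chain described in the text.
# "(shared parent)" is a placeholder, not a real category name.
PARENTS = {
    "Assyrian kings": ["Kings"],
    "Kings": ["Monarchs"],
    "Monarchs": ["Heads of state"],
    "Heads of state": ["(shared parent)"],
    "Heads of government": ["(shared parent)"],
    "LGBT heads of government": ["Heads of government"],
}


def ancestors(category, parents=PARENTS):
    """Breadth-first walk up the tree, returning every ancestor category."""
    seen = set()
    queue = deque(parents.get(category, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guards against loops in the category graph
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen
```

Here "Heads of state" is among the ancestors of "Assyrian kings" but "Heads of government" is not, while the reverse holds for "LGBT heads of government". That mismatch is exactly the inconsistency the two options above try to resolve.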

Anyway, please share your comments on the talk page. I welcome your feedback on this meandering...

Footnotes

  1. ^ Ghettoization has been called out in the popular press as a form of sexism. My essay attempts to explain why this is a bad word choice: setting up a proper category structure, and then properly categorizing biographies without ghettoizing, is so complex that to call it sexism is sort of like asking someone to solve 20 differential equations about African-American population growth in their head and then calling them racist if they get the wrong answer.
  2. ^ Important: or a diffusing sub-category of same
  3. ^ Please note, this is a simple version, and as the category tree gets more complex it becomes harder and harder to do this, but this search will give you a lower bound on the number of ghettoized people.