This is an essay on Wikipedia categorization. It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.
Every Wikipedia page (e.g. an article, a talk page or even a redirect) is in a single namespace. Many, perhaps most, Wikipedia pages are also in one or more categories. This essay presents the results of an analysis of how these two schemes interact, i.e. how pages in each namespace fit into the category structure. In particular, it identifies combinations of namespace and category that are not valid for any page: for example, there should be no user talk pages below Category:Articles and there should be no articles below Category:Wikipedians.
This analysis only considers the combination of namespace and some of the highest level Wikipedia categories - e.g. Category:Wikipedia books and Category:Disambiguation pages. A diagram showing the relevant part of the category structure can be found below the table.
Namespace-category matrix
Note: The information in this matrix should not be used directly to support an argument about whether or not a particular page should be in a parent category. However, this matrix may indicate where the applicable policy/guideline can be found.
The analysis was carried out in 2014-2016 using category intersection tools. Some aspects of the analysis are currently incomplete and may not incorporate later changes to the categorization structure.
Explanation of matrix
The matrix is designed so that each page in the English Wikipedia satisfies the criteria for one (and only one) of the rows.[1] Which row a page matches is determined primarily by which namespace the page is in; for some namespaces other criteria are also considered:
- Some rows are only applicable to disambiguation pages (i.e. pages that are under Category:Disambiguation pages) or non-disambiguation pages. The column headed "D?" indicates whether each row includes disambiguation pages - "Y" means only dab pages, "N" means excluding dab pages and "-" means either.
- Some rows are only applicable to hard redirects or to pages that are not hard redirects. The column headed "R?" indicates whether each row includes hard redirects. "Y" means only hard redirects, "N" means excluding hard redirects and "-" means either.
- Some rows are only applicable to subpages. The column headed "S?" indicates whether each row includes subpages. "Y" means only subpages, "N" means excluding subpages and "-" means either. Subpages are not allowed in some namespaces.
- Some rows are only applicable to pages that are, or are not, in certain categories.
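The flag columns described above can be read as a small matching rule. The following is a minimal Python sketch, not the logic of the template that generates the matrix: the `Page` and `Row` types and the `matching_rows` helper are hypothetical names introduced for illustration, the four Portal rows are copied from the matrix, and the category-based criteria mentioned in the last bullet are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical description of a page; not a MediaWiki data structure."""
    namespace: int
    is_dab: bool = False
    is_hard_redirect: bool = False
    is_subpage: bool = False


@dataclass(frozen=True)
class Row:
    """One matrix row: a namespace plus its D?, R? and S? flags."""
    namespace: int
    label: str
    d: str = "-"
    r: str = "-"
    s: str = "-"


def flag_matches(flag, value):
    """"Y" requires the property, "N" excludes it, "-" accepts either."""
    if flag == "Y":
        return value
    if flag == "N":
        return not value
    return True


def matching_rows(page, rows):
    """Return the labels of all rows whose namespace and flags fit the page."""
    return [
        row.label
        for row in rows
        if row.namespace == page.namespace
        and flag_matches(row.d, page.is_dab)
        and flag_matches(row.r, page.is_hard_redirect)
        and flag_matches(row.s, page.is_subpage)
    ]


# The four rows for namespace 100 (Portal), with flags as given in the matrix.
PORTAL_ROWS = [
    Row(100, "Portal (dab)", d="Y"),
    Row(100, "Portal (h/redir)", d="N", r="Y"),
    Row(100, "Portal (not d/sp)", d="N", r="N", s="N"),
    Row(100, "Portal s/page", d="N", r="N", s="Y"),
]
```

If the rows are well designed, `matching_rows` returns exactly one label for any page, which is the "one (and only one)" property the matrix aims for.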
Having identified which row of the matrix a page belongs to, the coloured cells on that row indicate which high-level categories the page should or may be in (green cells) and which it should not be in (pink cells). The matrix can also be used in the opposite way: for a particular high-level category, it is possible to go down the corresponding column to see what types of pages are expected to be in that category. Amber cells indicate where there is currently uncertainty about whether or not that is a valid combination. A more detailed key to the colours is provided below the matrix.
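As an illustration of reading a row, the sketch below splits a cell's text into its parts and picks out the columns a row allows. It is a rough Python sketch under stated assumptions: the grouping of cell words into "allowed", "disallowed" and "undecided" is a guess at how the words relate to the green, pink and amber colours (the essay's own colour key is authoritative), and `parse_cell` and `HELP_ROW` are hypothetical names. The sample values are copied from the Help (namespace 12) row.

```python
import re

# Assumed grouping of cell words; the essay's colour key is the real authority.
ALLOWED = {"yes", "all", "some"}
DISALLOWED = {"no", "never", "none"}


def parse_cell(text):
    """Split a cell such as "no?(NA)" into value, status, doubt flag and notes."""
    text = text.strip()
    word = re.match(r"[a-z]+", text)
    value = word.group(0) if word else ""
    if value in ALLOWED:
        status = "allowed"
    elif value in DISALLOWED:
        status = "disallowed"
    else:
        status = "undecided"  # e.g. "tbd", "tbc" or an empty cell
    return {
        "value": value,
        "status": status,
        "uncertain": "?" in text,
        "notes": re.findall(r"\(([A-Z0-9]+)\)", text),
    }


# A few cells from the Help (namespace 12) row of the matrix.
HELP_ROW = {
    "None": "no?(NA)",
    "Articles": "none",
    "Dab pages": "some",
    "Help": "all",
    "Redirects": "some",
}

allowed_columns = [
    column for column, cell in HELP_ROW.items()
    if parse_cell(cell)["status"] == "allowed"
]
```

Reading down a column works the same way: apply `parse_cell` to that column's cell in every row and keep the rows whose status is "allowed".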
Note: Some pages belong in several columns and a small number of pages don't belong in any of the (current) columns.[a]
Matrix
- Note: watchlisting this page will not show changes to the matrix - for that it's necessary to watchlist a separate page.
Child page | Parent category | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Name- space | Type | D? | R? | S? | None | Articles | Books | Dab pages | Essays | Files | Help | Inact. | Portals | Redirects | Temp- lates | Wikipedians | WikiProjects |
0 | Main Page | N | N | (N) | yes | no | none | no | none | no? | no | no? | no(NP) | no | no | none | no |
Article[b] | N | N | (N) | never | all | never | tbd[2] | some | tbd(TS) | ||||||||
Other | N | N | (N) | no(NC) | never | tbd | some[3] | ||||||||||
Dab page | Y | N | (N) | never | tbd(NY) | all | no? | no | tbd(TN) | ||||||||
Redirect (hard) | - | Y | (N) | no?(NA) | some | none | none | some? | all(AR) | some[4] | |||||||
2 | User (excl. t.) | - | - | - | some | no(NU) | no(NB) | no | some | tbd | some? | tbd(TI) | no? | some | never | some | some |
User (template) | - | - | - | never | none | no? | tbd | no? | none | all | tbd | some | |||||
4 | Wp dab page | Y | - | - | never | no | none | all | none | none | none | none | none | none | none | none | none |
Essay (not dab) | N | N | - | never | never | all | no | some? | some | none | some? | none | no? | some | |||
Wp redir (hard) | - | Y | - | no(NA) | none | some? | none | some | some | all(AR) | some | none | some? | ||||
WikiProject | N | N | - | never | never | never | no? | tbd(TH) | no? | no | some(ST) | some? | all | ||||
Wikipedia (other) | N | N | - | some(SW) | never | never | tbd | some | tbd | some | tbc | never | |||||
6 | File | - | - | (N) | no(NA) | some? | none | none | none | all | some | some | some? | tbd | tbd | no? | tbd |
8 | MediaWiki | - | - | (N) | all | none | none | none | none | no | no? | none | none | some | none | no? | none |
10 | Template | - | N | - | no | no?(NX) | no? | no... | none? | some? | tbd | some | tbd | tbd | all | some(SV) | some |
Template redir | - | Y | - | none | tbd | none | none | some | tbd | none | tbd | ||||||
12 | Help | - | - | - | no?(NA) | none | none | some | tbd | no | all | some | none | some | some? | none | no?. |
None | CA | CB | CD | CE | CF | CH | CI | CP | CR | CT | CU | CW | |||||
14 | Category:Contents | - | - | (N) | yes | no | no | no | no | no | no | no | no | no | no | no | no |
Category (other) | - | - | (N) | no | some | some | some | some | some | some | some | some | some | some | some | some | |
None | CA | CB | CD | CE | CF | CH | CI | CP | CR | CT | CU | CW | |||||
100 | Portal (dab) | Y | - | - | never | none | none | all | none | no | none | none | none | none | none | none | none |
Portal (h/redir) | N | Y | - | some(SR) | none | never | none | no? | some | some(SR) | none | tbd | |||||
Portal (not d/sp) | N | N | N | no(NA) | all? | no | tbd | all(AP) | some | no | no | ||||||
Portal s/page | N | N | Y | some(SP) | no | some | some(SP) | tbd | tbd | ||||||||
108 | Book (dab) | Y | - | - | never | no | none | all | none | none | no | none | none | none | none | none | no |
Book (hard redir) | N | Y | - | some(SB) | none | none | never | none | some(SB) | none | |||||||
Book (encyc'c) | N | N | - | no(NA) | all?(AB) | all | no | none | no? | ||||||||
Book (Wp) | N | N | - | no? | no | all? | some? | ||||||||||
118 | Draft | - | - | - | some | no(ND) | no(ND) | no(ND) | none | tbd | no?(NS) | none | no(ND) | some | no?[5] | no | tbd |
446 | Ed. Program | - | - | (N) | some? | none | none | none | none | none | none | none | none | none | none | none | none |
710 | TimedText | - | - | - | some? | none | none | none | none | none | none | none | none | some | none | none | none |
828 | Module | - | - | - | some? | no? | none | none | none | no? | tbd | none | none | some | tbd(TM) | none | tbd(TW) |
None | CA | CB | CD | CE | CF | CH | CI | CP | CR | CT | CU | CW | |||||
1 | Talk | - | - | - | some | no(NT) | no | none | none | tbd | tbd | no? | some? | some | tbd | no | some |
3 | User talk | - | - | - | some | no(NT) | no | no | tbd | tbd | no? | none | some | no? | some | some | |
5 | Wikipedia talk | - | - | - | some(SG) | no(NT) | none | none | tbd | some? | no? | none | some | tbd | no? | some | |
7 | File talk | - | - | - | some | tbd(T7)(NT) | none | none | no | no | no | some? | some | none | none | some | |
9 | MediaWiki talk | - | - | - | some(SG) | no(NT) | none | none | none | some? | no | none | some | none | none | some | |
11 | Template talk | - | - | - | some(SG) | no(NT) | none | no? | none | tbd | tbd(TI) | no? | some | tbd | no | some | |
13 | Help talk | - | - | - | some(SG) | no(NT) | none | none | none | some? | no? | none | some | none | none | some | |
15 | Category talk | - | - | - | some(SG) | no(NT) | none | no | none | some? | no? | none | some | none | no | some | |
101 | Portal talk | - | - | - | some(SG) | no(NT) | none | none | none | none | no? | no? | some | none | none | some | |
109 | Book talk | - | - | - | some(SG) | no(NT) | no | none | none | some? | no | no | some | none | none | some | |
119 | Draft talk | - | - | - | some(SG) | no(NT) | none | none | no? | none | no | none | some | none | none | some | |
447 | Ed. Prog. talk | - | - | - | some(SG) | no(NT) | none | none | none | tbd(TE) | no | none | none | none | none | some | |
711 | TimedText talk | - | - | - | some(SG) | no(NT) | none | none | none | none | no | none | some | none | none | some | |
829 | Module talk | - | - | - | some(SG) | no(NT) | none | none | none | none | no | none | some | tbd. | no? | some | |
2600 | Topic | - | - | - | all | none | none | none | none | no | none | none | none | none | none | none | none |
None | CA | CB | CD | CE | CF | CH | CI | CP | CR | CT | CU | CW | |||||
Note: The following namespaces are not shown in the table above: 2300 and 2301 (Gadget), and 2302 and 2303 (Gadget definition).
Legend:
Notes about why a particular namespace-category combination isn't valid:
Notes clarifying the definition of a row:
Notes about why there's a "some" in the table:
Notes about why there's a "TBD" in the table:
Other notes:
Note: After the template has been changed it may be necessary to purge this page.
Top-level category structure
The diagram below shows some (probably the most important) categories at the top levels of the category structure. The two-letter codes are those used in the matrix above.
Other categories directly below Category:Contents (as of July 2016) are Category:Wikipedia categories, Category:Featured content, Category:Glossaries, Category:Image galleries, Category:Indexes of topics, Category:Lists, Category:Outlines, Category:Timelines.
Finding and fixing anomalies
Category intersection tools can be used to detect pages that are at an anomalous combination of namespace and high-level categories; the appropriate changes to the categorization can then be made. Links to some category intersection queries and advice on fixing the mis-categorizations discovered can be found at User:DexDor/FHL.
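The check that such a tool performs can be sketched offline. The Python below is only an illustration and does not query Wikipedia: it assumes that each page's namespace number and the set of high-level categories it sits below have already been obtained by some other means. `FORBIDDEN`, `find_anomalies` and the sample page titles are hypothetical; the two forbidden combinations are the examples given in the introduction of this essay.

```python
# (namespace number, high-level category) pairs that should never occur.
# Namespace 3 is User talk; namespace 0 is the main (article) namespace.
FORBIDDEN = {
    (3, "Category:Articles"),
    (0, "Category:Wikipedians"),
}


def find_anomalies(pages):
    """Return (title, category) pairs for pages at a forbidden combination.

    `pages` is an iterable of (title, namespace, categories) tuples, where
    `categories` is the set of high-level categories the page is below.
    """
    anomalies = []
    for title, namespace, categories in pages:
        for category in sorted(categories):
            if (namespace, category) in FORBIDDEN:
                anomalies.append((title, category))
    return anomalies


# Made-up sample data for illustration only.
SAMPLE_PAGES = [
    ("User talk:Example", 3, {"Category:Articles"}),
    ("Example article", 0, {"Category:Articles"}),
    ("Another article", 0, {"Category:Articles", "Category:Wikipedians"}),
]
```

Each pair returned points to a page whose categorization (or the categorization of one of its parent categories) probably needs fixing.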
The matrix above shows that there are some pairs of columns for which there are no types of pages, apart from category pages, that are valid in both of those columns. For example, there are no rows for which there is a green cell in both the CA and CE columns - i.e. there should be no pages in both Category:Articles and Category:Wikipedia essays. Thus, there should be no categories at that intersection. This can be checked using a category intersection query on the Category namespace. However, it's rare to find categories mis-categorized in this way so in general it's best just to look for (non-category) pages that are at an anomalous namespace/category combination as that will also uncover most incorrectly categorized categories (assuming that the category contains at least one page).
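Finding such column pairs is a simple set computation. The sketch below is a minimal Python illustration; `GREEN_BY_ROW` is a simplified, hand-copied excerpt (a few rows, green cells only, "None" column omitted) that may not match the current matrix, and `disjoint_column_pairs` is a hypothetical helper name.

```python
from itertools import combinations

# Simplified excerpt: for each row, the column codes assumed to be green.
GREEN_BY_ROW = {
    "Article": {"CA"},
    "Essay (not dab)": {"CE"},
    "Wp dab page": {"CD"},
    "Help": {"CD", "CH", "CI", "CR", "CT"},
    "MediaWiki": {"CR"},
}


def disjoint_column_pairs(green_by_row, columns):
    """Return column pairs for which no row is green in both columns."""
    pairs = []
    for first, second in combinations(sorted(columns), 2):
        shared = any(
            first in green and second in green
            for green in green_by_row.values()
        )
        if not shared:
            pairs.append((first, second))
    return pairs


PAIRS = disjoint_column_pairs(GREEN_BY_ROW, {"CA", "CD", "CE", "CH"})
```

With this excerpt, CA and CE come out as a disjoint pair, whereas CD and CH do not because the Help row is green in both. Each disjoint pair is a candidate for a category intersection query on the Category namespace.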
Maintenance of the matrix
The namespace-category matrix shown above is generated using a template (User:DexDor/Cmtp). The advantage of using a template rather than placing the details directly in this page is that parameters can be used to control how the template is displayed - thus, the template can generate both the compact format shown above and a longer more detailed format used during development/maintenance of the matrix. There is also a similar template which expands the CA column into lower level article categories.
To be done:
- Remove all unnecessary detail (including references) from the compact format of the matrix.
- Fix all "TBD"s (if possible).
- Have the number of months before a cell is flagged as "OLD" depend on the importance (e.g. whether it affects CA and how often pages get put in it).
- Ensure every "no" cell has an "N" note.
- Ensure every "tbd" cell has a "T" note.
- Ensure every "some"/"all" cell is linked to at least one example.
- Refresh all "OLD" cells.
- Seek suggestions for improvements from other editors.
- Consider whether checking of some cells could be improved by AWB, bots or database reports.
- Move from User namespace to Wikipedia namespace (but each edit would then have to be explained, and the page length might increase unless short names are used for the templates).
- Make other language Wikipedias aware of this (sharing ideas).
- Improve the documentation of the templates used in this.
- Should any more rows be split?
- Reduce the number of pages that don't match any row.
- Complete the diagram showing the top of the category hierarchy (and simplify that where possible).
Comments (e.g. about the status of a particular namespace-category combination or about the formatting of this page) are welcome on the talk page.
See also
Notes
- ^ E.g. many of the pages that are directly in Category:Wikipedia disambiguation.
- ^ This "article" row is for any page in namespace 0 that is in Category:Articles, is not a hard redirect and is not categorized as a disambiguation page. I.e. it includes some pages that fall outside some definitions of "article" such as lists, outlines, soft redirects and stubs.
References
- ^ Note: Currently there may be a small number of pages that don't fit any row of this matrix.
- ^ E.g. there are articles below Category:Wikipedia articles incorporating a Leigh Rayment's Peerage Pages template that is below Category:Wikipedia sources.
- ^ Pages that are soft redirects (e.g. see Category:Redirects to Wiktionary) are at this intersection. Also (temporarily) hard redirects that are at RFD. Pages that have been (incorrectly) placed under a redirected category are at this intersection - see Category:Wikipedia non-empty soft redirected categories.
- ^ As of March 2015 there are also many pages at this intersection because of redirects in Category:WikiProject Artemis Fowl and Category:Redirects from books (which is, possibly incorrectly, categorized under a wikiproject).
- ^ Pages get here for a variety of reasons - (1) because a template (example) is created in Draft namespace and has been placed in a templates category, (2) a page in Draft namespace (example) uses a template in such a way that it puts the page in a tracking category (e.g. Category:WikiProject banners with formatting errors or Category:Geobox usage tracking for region type) that is under Category:Templates (which itself is dubious), (3) a page in Draft namespace (example) is in Category:Template test cases that is under Category:Templates.
- ^ As of November 2015: 108RinCR indicates that there are 255 hard redirects that are in Book namespace in Category:Wikipedia redirects. Wikipedia:Database reports/Page count by namespace shows there are 764 (presumably hard) redirects in Book namespace.
- ^ As of November 2015: 100RinCR indicates that there are 1587 hard redirects that are in Portal namespace in Category:Wikipedia redirects. Wikipedia:Database reports/Page count by namespace shows there are 11227 (presumably hard) redirects in Portal namespace.
- ^ E.g. "Please ... add [redirect] templates ... when you create a redirect".