Wikipedia talk:Categorizing articles about people/Archive 5

Ordering and sort-keys

I would like to add several new statements to the guidelines, but thought I should lay them open to debate first:

  1. Capital letters which are not at the beginning of a word should be converted to lower case, so that James BeauSeigneur should sort as "Beauseigneur, James", not "BeauSeigneur, James".
  2. Punctuation (including hyphens and apostrophes but not spaces) should be stripped out of names for sorting purposes, such that Maurie Fa'asavalu sorts as "Faasavalu, Maurie", not "Fa'asavalu, Maurie", and D:Fuse sorts as "Dfuse".
  3. Accented characters should be replaced with their unaccented counterparts: Uldis Bērziņš sorts as "Berzins, Uldis", not "Bērziņš, Uldis".
  4. Disambiguating terms should be omitted: Will Smith (comedian) sorts as "Smith, Will", not "Smith, Will (comedian)" or "Smith (comedian), Will".
  5. All parts of the title should be included except disambiguating terms; no names should be abbreviated or omitted, including middle names, where they form part of the title.
  6. Similarly, the sort key should not include any term which does not appear in the title. For instance, where an article's title is a nickname or stage name, the article should not be sorted by the person's real name. Also, if the title is an abbreviated name such as "Don" or "Chris", the article should be sorted by that name, and not the long version ("Donald", "Christopher").
  7. Suffixes should be placed at the end of the sort key, such that Robert J. Smith II sorts as "Smith, Robert J., II", not "Smith II, Robert J."
  8. Second and subsequent capital letters in a series of capital letters should be converted to lower case, such that RJD2 sorts as "Rjd2".
  9. The prefix "Mc" is generally sorted as "Mac", such that Anna McCurley sorts as "Maccurley, Anna", not "Mccurley, Anna" or "McCurley, Anna".
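As a rough illustration of how proposals 1 to 4 and 8 combine, here is a minimal Python sketch. The function names are hypothetical, and the sketch assumes that the surname is simply the last word of the title. It does not attempt suffixes (7) or the Mc/Mac rule (9), and it strips full stops as well, which the example in 7 does not. It is not a description of how MediaWiki or any template builds sort keys.

```python
import re
import unicodedata


def strip_accents(text):
    # Rule 3: decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalise_word(word):
    # Rule 2: drop punctuation. Rules 1 and 8: only the first letter stays capital.
    letters = "".join(ch for ch in word if ch.isalnum())
    return letters[:1].upper() + letters[1:].lower()


def proposed_sort_key(title):
    # Rule 4: omit a trailing disambiguating term such as "(comedian)".
    title = re.sub(r"\s*\([^)]*\)\s*$", "", title)
    words = [normalise_word(w) for w in strip_accents(title).split()]
    words = [w for w in words if w]
    if len(words) == 1:
        return words[0]
    # Assumption: the last word is the surname, everything before it is the forename.
    return f"{words[-1]}, {' '.join(words[:-1])}"
```

Under those assumptions, `proposed_sort_key("Will Smith (comedian)")` gives "Smith, Will" and `proposed_sort_key("RJD2")` gives "Rjd2", matching the examples in the list.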

Any comments? --Stemonitis 13:19, 6 January 2007 (UTC)

Re. #1: Not needed imho. Would this make an essential difference? We try to avoid complexity in guidelines where there's no real issue to address.

Re. #2: Seems useful, apart from the "hyphens" which I wouldn't change (in other words: keep them in the sort key as they are in the page name). Note also:

  • This is not a "people names" exclusivity, so should go in Wikipedia:Categorization#Category sorting;
  • It should be noted, however, that the current "people" sorting rule usually introduces punctuation (a comma) where there wasn't one in the original article name, and also that the comma in, for example, Charles de Secondat, baron de Montesquieu is retained. With the added comma, that example uses two commas in the category sort key ([[Category:Enlightenment philosophers|Montesquieu, Charles de Secondat, Baron de]]). Note that this is an example currently in the categorization of people guideline. I also don't think that geographic names using a comma in the page name (e.g. London, Ontario) omit that comma when sorting. Maybe the simplest would be to make an exception for commas in the "no punctuation" rule. Anyway, if "commas" are a general exception, this rule should move to Wikipedia:Categorization#Category sorting entirely.
  • Also, I see that you kept the period in e.g. "Smith, Robert J., II". This needs some refining if we want to use "remove punctuation" as a more or less general rule.

Re. #3: In fact already included. Since it's not only "people" that may have accents in their name, the issue is discussed at Wikipedia:Categorization#Category sorting. These general rules are linked from Wikipedia:Categorization of people#Ordering names in a category.

Re. #4: Not needed imho. "Smith, Will" and "Smith, Will (comedian)" would generally lead to the same sorting result. Warning against "Smith (comedian), Will" seems a bit like treating our editors in a childish way. WP:BEANS.

Re. #5: Not needed: redundant redundancy.

Re. #6: Not needed imho, self-evident.

Re. #7: Useful. Since you used the generic term "suffix" it also encompasses the "disambiguating terms" of #5 (so that the more complex and in fact contradictory formulation of #5 is certainly not needed separately).

Re. #8: No. This would be an issue for Wikipedia:Categorization#Category sorting anyhow (I mean, consecutive capitals and letters in a name is not a "people" exclusivity). And then I'd oppose it. There's no common sense in it, imho. This would, to name only one of the multiple problems this would create, make the current {{3CC}} template impossible to use, and then after all the instances of where this template is used are manually converted, the sorting order in the Category:Lists of three-character combinations would be exactly the same as it is now. Solution in search of a problem.

Note that for stage names etc using non-standard letters in their name (like for instance DJs but the same could probably be said about wrestlers etc) I do believe sometimes a bit of creativity is needed, but the kind of creativity that is hard to catch in rules:

  • Use of {{DEFAULTSORT:Rjd2}} in the RJD2 article seems perfectly OK to me (didn't even know about the existence of {{DEFAULTSORT:...}}...);
  • ?uestlove: [[Category:American hip hop musicians|Questlove]] and the first entry of List of hip hop DJs and producers#0–9 show a different sorting order... I wouldn't know which is more appropriate...
  • [[Category:Hip hop DJs|DJ Quik]] and [[Category:Hip hop DJs|Cam, DJ]] reflect different approaches too (note that both are under "D" in List of hip hop DJs and producers). Also here I wouldn't know what works best in general. Couldn't the DJ-interested people agree on this? Or try to find out which way this is done most often in English? For example, check a few record stores, and see how the CD's are usually sorted?

Re. #9: Don't know about that one for sure: what is most habitual in English? Unless it can be clearly demonstrated that in general in English that is the way names are sorted, I'd reject it as redundant complexity. --Francis Schonken 14:15, 6 January 2007 (UTC)

These examples are all based on real situations that I've come across, so it's too late for WP:BEANS to apply. Of all of them, the internal capitals (1 and 8) is the most important problem, in my opinion. The MediaWiki software sorts all capitals before all lower-case letters, with the result that names like "DeWoskin" would be sorted before "Deac", which is clearly at odds with alphabetical order. The ideal situation would be for sort keys to be in all-capitals, but we've gone too far with the current system to do that. The best approximation is to consistently capitalise the first letter of a word only. Similarly, punctuation tends to come before A-Z/a-z, which also screws up the sorting, though I would perhaps be prepared to relent on the hyphens.
Perhaps there is some redundancy here, but I wanted to explain my ideas fully. I'm open to suggestions for better wordings. If you think it would be clear that "suffix" also includes "disambiguating term" (I wouldn't consider the disambig. a part of the name, but would consider the suffix as such, for instance, so would interpret the two differently), then that's fine. I see, however, no harm in reiterating rules held elsewhere. If it's important to strip out accents (and for sorting in categories, it obviously is), it would be appropriate to have a statement (like my 3), and link it to the appropriate guideline (to prevent real or apparent conflict between the two guidelines).
In response to your comments to 4, 5 and 6, I can only re-iterate that all these mistakes have been made in the past (and in good faith) and will almost certainly be made again. If that could be prevented by the inclusion of a short sentence here, then I see no reason not to. Providing too much information is usually better than providing too little. For the "Mac" question, we probably need to check how other encyclopædias do it; I haven't any to hand. --Stemonitis 14:56, 6 January 2007 (UTC)
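The capitals problem described in this comment can be reproduced with any comparison that works on raw character codes. The sketch below uses Python's built-in string comparison as a stand-in, on the assumption that it orders characters the same way as the software behaviour described above. The helper name `capitalise_first_only` is hypothetical, and "Dexter" is a made-up third name added only to show where "DeWoskin" ought to fall.

```python
def capitalise_first_only(key):
    # Keep the first letter of each word as a capital and lower-case the rest.
    return " ".join(word[:1].upper() + word[1:].lower() for word in key.split(" "))


names = ["Deac", "DeWoskin", "Dexter"]

# Raw comparison: "W" (code 87) comes before "a" (code 97), so DeWoskin jumps ahead.
raw_order = sorted(names)

# After normalising capitals, the names fall into ordinary alphabetical order.
fixed_order = sorted(capitalise_first_only(name) for name in names)

print(raw_order)    # ['DeWoskin', 'Deac', 'Dexter']
print(fixed_order)  # ['Deac', 'Dewoskin', 'Dexter']
```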
Re "real situations", yes, I kinda noticed. For that reason I elaborated my answer to #8 a bit, but ended up in an edit conflict. Anyway, you can see my elaborations above, resuming (partly) to: it's not possible to catch everything in rules. Rules don't outdo using your common sense.
Also, if "mistakes" have been made, some of these are so evident, that it would be unnecessary to write guidelines about them. Typos are corrected by the dozen every second, yet there is no guideline listing all possible typos, warning against each of them separately. Indeed summing up "most popular" typos in a guideline would be somewhat of a WP:BEANS approach, while even for popular typos most people don't make them. --Francis Schonken 15:22, 6 January 2007 (UTC)
English usage on the alphabetization of Mc varies extravagantly; I have even seen it alphabetized as a separate letter, after M. It is probably simplest to let them fall between Mb and Md - because that is what editors who haven't looked at this guideline are most likely to do; and for some names the distinction between McX and MacX represents a difference of families. Septentrionalis PMAnderson 18:40, 10 January 2007 (UTC)
Yes, this is what I tend to do at the moment, and I have left that recommendation out from my new draft (see below). --Stemonitis 18:42, 10 January 2007 (UTC)
On the same grounds, I'm not sure about the decapitalization of internal capitals, which will at least sort as letters. Internal š, say, is different; it will sort in a place that is definitely wrong; but there's nothing really wrong with sorting Macbean separately from MacBean; and, again, trying to enforce this will mean an awful lot of corrections of what editors will naturally do. (And for not much profit; most categories won't have more than one Scot, so that, although we will have to correct all the MacBeans, for most of them it will have no effect on the category.) Septentrionalis PMAnderson 19:28, 10 January 2007 (UTC)
Most people will not realise that the software effectively uses two non-overlapping alphabets. B comes between A and C in alphabetical order, regardless of whether it's uppercase or lowercase, so there is definitely something wrong with sorting Macbean after MacWilliam or MacBean before Macalistair. I don't believe that any reputable reference work puts all the people with a capital letter after the "Mac" before all the people with a lower-case letter after the "Mac" (the dictionary I have to hand certainly doesn't), and I can't really see any argument in favour of it. (In fact, the two alphabets do not even abut; the characters "[", "^", "]" and "_" appear between them, although those characters will rarely appear in article titles, and especially rarely straight after "Mac").
The argument that many categories will contain at most one Scot does not hold water. There are plenty of categories with many people whose names begin with "Mc" or "Mac", and it would be helpful if they all sorted in the same way. Any kind of consistency will require a great number of changes, but that doesn't mean that we shouldn't make them, and it certainly doesn't mean that we shouldn't work out what they should be. It is to be hoped that editors who are unsure about how to categorise biographical articles will come to this page to find clarification, and find it, and so the consistency will trickle through over time even if no direct action is taken. And for those cases where it doesn't make any difference, no-one is obliged to make any change, so there's no harm done. --Stemonitis 22:54, 10 January 2007 (UTC)
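For anyone who wants to check the "two alphabets" point, the following Python sketch lists the characters that sit between "Z" and "a" in ASCII and sorts the Mac names mentioned above. It assumes that plain code-point comparison is a fair model of the sorting being discussed; the case-blind ordering is shown only for contrast and is not something the software is claimed to offer.

```python
# Characters whose codes fall strictly between "Z" (90) and "a" (97).
between = [chr(code) for code in range(ord("Z") + 1, ord("a"))]

macs = ["Macalistair", "MacBean", "Macbean", "MacWilliam"]

# Code-point order: every "Mac" + capital sorts before every "Mac" + lower case.
code_point_order = sorted(macs)

# Case-blind order, as a printed reference work would arrange the same names.
case_blind_order = sorted(macs, key=str.lower)

print(between)
print(code_point_order)
print(case_blind_order)
```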
OK, I've tried to cover all of the things I was trying to say into as concise a form as possible. My sandbox contains a draft of the relevant section, with my additions mostly confined to the last three bullet points. I think the extra explanation is helpful, because a lot of people seem to be confused as to why these changes are necessary. The short sentence "The sort key should mirror the article's title as closely as possible" covers several of my points above; I wish I'd thought of that wording before. I have also tidied up the remaining text a bit, and tried to make the layout more readily comprehensible. Any comments would be greatly appreciated. --Stemonitis 17:31, 10 January 2007 (UTC)
No-one has made any complaints about the draft, so I'll copy it across (but removing the "In categories dealing with British peers" qualifier which was included there). --Stemonitis 12:32, 12 January 2007 (UTC)
No approval either! I thought it too sketchy to give it a serious thought - didn't you see the evident errors? No, Louis IX is not a British peer, etc, etc. --Francis Schonken 13:02, 12 January 2007 (UTC)
It would probably have been better if you'd mentioned this before, but at least I'm getting the feedback now. Other than the Louis issue, are there any other changes you'd like to see made? Little things like that can easily be changed (for instance, by generalising the sub-heading to "Nobility"), and at any time, including after the new text is included. --Stemonitis 13:31, 12 January 2007 (UTC)
Sorry, no, I don't think the rough draft at User:Stemonitis/Testing ground worth any further consideration in this stage. If you'd be able to get the *obvious* (as in: should be obvious to anyone) errors, typos and internal contradictions out of it, that might be a step forward, and I might give it a second look. But then please also don't run ahead of other discussions regarding the changes you'd like, but which are currently far from approved (see the comments by myself and by others on this page). --Francis Schonken 15:47, 12 January 2007 (UTC)
For goodness' sake, if they're that obvious, then please tell me what they are. I have read through it, and didn't spot any typos, for instance (which doesn't necessarily mean that there aren't any). And I have heeded the comments on this page. It was too wordy and included redundancies; now it is shorter and better. It was thought that capital and lower-case letters both sorted as one alphabet; this is not the case. It was unclear whether there was consensus for forcing "Mc" to sort as "Mac"; that was removed. In the absence of any further or more detailed comments, it is difficult to know where the problems lie. And may I add that taking the attitude that improving it is beneath you is really quite galling. If you don't want to help, then that's fine, but please don't hinder. I can re-order the bullet points quite easily (perhaps you think that the suffixes issue belongs under "Sort by surname" or that "Augustine of Hippo" belongs under "Other exceptions"), so I don't see that as a major stumbling-block, either, even if the order is currently imperfect. --Stemonitis 16:06, 12 January 2007 (UTC)
Since no concrete suggestions have been proffered, I have reinstated my improved text. I hope that any problems that are found with it can be worked out without recourse to large-scale reversion. --Stemonitis 11:55, 16 January 2007 (UTC)
Your changes were an improvement over the current state, especially in organization. The textual changes are minor, so I'm not sure what Schonken's problems are. The existing text is flimsy, but that is another point. Your two additions, eliminating punctuation and lower-casing internal capitals, lead to an alphabetization that, as far as I can tell, conforms with English-language reference works and the MARC standards, so they are recommendable. --Afasmit 13:24, 17 January 2007 (UTC)

@Stemonitis:

  1. Since you didn't stop pressing me, I took a closer look at User:Stemonitis/Testing ground, and improved it to the best of my abilities. You will see that a lot of the improvements you have proposed are no longer there, for the reasons I explained above, mainly that they are not specific to names of people, so they should be discussed at Wikipedia talk:Categorization, and in the case a consensus for them would emerge, they should be implemented at Wikipedia:Categorization#Category sorting.
  2. Note that even those of the new principles proposed by you that I'm mildly positive about *need work*: among others, "remove punctuation" is vague, and the way you described it is contradictory (it doesn't mention the commas, etc.); the "Decapitalisation after first capital" rule makes no sense in more than 90% of the cases; we *have* categories where everything is sorted in capitals only; etc, etc (all these are repeats of what I already said, and which were not implemented yet in any version of your proposals)...
  3. Again, take these new proposals to Wikipedia talk:Categorization, they are not specific for articles on *people*, so they don't belong in a guideline that is exclusively on the categorisation of people: first you need consensus on general collation rules for categories, before any of this would be implemented in the guideline on categorisation of people. --Francis Schonken 01:36, 18 January 2007 (UTC)


Francis, forgive me if I seem underwhelmed. A lesser person might have been insulted by your edit to my draft (which replaced the entire text and further improvements with the existing text, without any attempt to incorporate any of my hard work). I will nonetheless try to assume that you were acting in good faith. The issue about removing punctuation is not important enough to warrant full-scale reversion; perhaps a clarification along the lines of "and then" would be enough to indicate the (already implicit) order of events. (Furthermore, the examples would probably clear up any potential confusion.) Your assertion that internal decapitalisation "makes no sense in more than 90% of the cases" seems not to be backed up with any evidence. My own (extensive) experience suggests that it nearly always causes a problem. Names such as LeXxx, DeXxxx, and so on are routinely mis-sorted. I cannot remember a name with internal capitals that did not need to be manually resorted. But this all leaves the biggest point, namely that the rules could be applied more widely. Well, yes, perhaps they could, but I don't see that that stops them being implemented here, and, more importantly, I'm not sure that they do apply so generally that they would make sense at WP:CAT. Rules that apply to human names need not apply to microchips or automobiles or other types of article. I have seen no rule stating that all guidelines must apply at the highest level possible, and I can't imagine that such a rule would be helpful. If this is the limit of your criticism, then I really can't understand why you insist on reverting. Nobody owns the text, which can be improved as long as the changes represent consensus. I believe that my draft does represent consensus here, since you are the only person opposing it, and you don't seem to want to help improve the text. --Stemonitis 02:28, 18 January 2007 (UTC)
Sorry, no, that is not how it works. You can't force the rules in Wikipedia:Categorization via the back door of Wikipedia:Categorization of people. If you continue to try that, I think it would be better to move the content of this section to Wikipedia talk:Categorization.
Note that many categories contain both people's names and entries for articles that are not about persons. It made me think about this old joke of a dictatorial president-for-life who allegedly visited the UK, and was impressed by the cars driving at the left side of the road. So he came home and ordered the cars to drive left in his country, adding: "if it works well, we'll order in a few weeks the same for trucks". A bit exaggerated, but I hope this made clear what I think.
Note also that Wikipedia talk:Categorization is a much more active talk page (I mean, in terms of number of people participating), with a high average experience in categorisation issues. Two people against one, plus some additional criticism by PManderson (like it is on this page) is not going to establish "consensus" anywhere near soon on this. --Francis Schonken 09:34, 18 January 2007 (UTC)

Ordering of surnames in specific cases: Mc, O' and St.

I think specific mention should be made of surnames where the alphabetical indexing is different to the strict spelling of the surname. The three specific cases are mentioned in the section heading. I have in front of me a late 19th century Whitaker's Almanack listing the members of the House of Commons which includes:

Lyell, Leonard
M'Arthur, Wm. A.
Macartney, Wm. G. E.
M'Calmont, James M.
M'Cartan, Michael
M'Carthy, Justin
McDermott, Patrick
Macdona, John C.
Macdonald, J. A. Murray
MacDonnell, Dr. Mark A.
M'Ewan, William
Macfarlane, Donald H.
McGittigan, Patrick

.. and so on. Were this expressed in our category indexing, anyone with a surname beginning in M', Mc, or Mac, would be indexed with their surname beginning "Mac". Another case is surnames beginning with O', and here the list goes:

O'Driscoll, Florence
O'Keefe, Francis A.
Oldroyd, Mark
O'Neill, Hn. Robt. T.
Owen, Thomas

This means that surnames beginning with "O'" should be indexed without the apostrophe. Likewise surnames beginning with "St." as an abbreviation for "Saint" should have "Saint" spelled out in the category indexing. I propose to make this addition to the policy, but would be interested to know if anyone wishes to object. As I see it, this is not a controversial issue. Sam Blacketer 09:35, 24 February 2007 (UTC)
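The indexing convention shown in the Whitaker's extract can be sketched in a few lines of Python. The key function below is a hypothetical illustration: it treats a leading "M'" or "Mc" as "Mac", ignores apostrophes and capitalisation, and looks only at the surname before the comma. It would also turn a surname such as "Mchedlishvili" into "Machedlishvili", so it is not offered as a finished rule.

```python
import re


def merged_index_key(entry):
    # Only the surname (the part before the first comma) is used for ordering.
    surname = entry.split(",")[0]
    # Treat the prefixes M' and Mc as if they were spelled Mac.
    surname = re.sub(r"^(M'|Mc)", "Mac", surname)
    # Ignore apostrophes, spaces and capitalisation.
    return "".join(ch for ch in surname if ch.isalpha()).lower()


whitaker_m = [
    "Lyell, Leonard", "M'Arthur, Wm. A.", "Macartney, Wm. G. E.",
    "M'Calmont, James M.", "M'Cartan, Michael", "M'Carthy, Justin",
    "McDermott, Patrick", "Macdona, John C.", "Macdonald, J. A. Murray",
    "MacDonnell, Dr. Mark A.", "M'Ewan, William", "Macfarlane, Donald H.",
    "McGittigan, Patrick",
]
whitaker_o = [
    "O'Driscoll, Florence", "O'Keefe, Francis A.", "Oldroyd, Mark",
    "O'Neill, Hn. Robt. T.", "Owen, Thomas",
]

# Sorting with this key reproduces the order printed in the almanack.
print(sorted(whitaker_m, key=merged_index_key) == whitaker_m)
print(sorted(whitaker_o, key=merged_index_key) == whitaker_o)
```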

I'm with you on the O'Xxx issue, but the Mac variations may be more debatable. Sorting all of Macxxx, MacXxx, McXxx, Mcxxx, M'Xxx and M'xxx as Macxxx is bewildering for people unfamiliar with the practice. It would be more transparent, albeit perhaps less traditional, simply to sort by the letters included in the name, without punctuation, and without any internal capitals. Abbreviations like St. are somewhere in between "O'" and "Mc" in clarity, but are probably sufficiently widely understood to be sorted as "Saint Xxx" even when titled "St. Xxx". However, this does bring up the overlooked problem of spaces in surnames: "Saint John, Forename" sorts before "Saint, Surname", because space sorts before comma in ASCII, but I doubt that any reputable reference work (except Wikipedia) follows such a scheme. --Stemonitis 17:19, 25 February 2007 (UTC)
On "St.", I have certainly seen alphabetical lists produced recently where "St. " was indexed at the beginning of ST, but presumed it was always an error caused by computer sorting not recognising the abbreviation. Certainly, a surname of "Saint" should be before "St. John". On the "Mc" problem, I wonder if this is a case of national variations? I have not seen any alphabetical list which has separated them which could not be attributed to a computer's misrecognition. Sam Blacketer 11:28, 26 February 2007 (UTC)
I wonder if it's possible to distinguish between the naïveté of a computer and a deliberate, conscious simplification. Sorting "McX" after "Mbxxx" could easily be someone's deliberate choice. I guess we can only mirror the practice used by Britannica and other encyclopædias, whatever that might be. The only solution I can see to the multi-word surname issue is to replace the spaces (ASCII 32) in the surname with a character that would sort after the comma (ASCII 44) but before any (lower-case) letters, and my recommendation would be either to use the underscore ("_", ASCII 95), so that Ian St. John would be sorted as "Saint_John, Ian", for example, or to simply run the words together so that Ian St. John would sort as "Saintjohn, Ian". I am not sure which approach I prefer. Again, it probably depends on the usage preferred elsewhere. --Stemonitis 17:36, 26 February 2007 (UTC)
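The space-versus-comma problem and the two workarounds suggested above can be tried out with a short Python sketch. The function `surname_first_key` and its `joiner` parameter are hypothetical names used only for this illustration, and Python's code-point comparison is assumed to stand in for the ASCII ordering under discussion.

```python
def surname_first_key(surname, forename, joiner=" "):
    # Spell out "St." as "Saint", then join the words of the surname with `joiner`.
    words = ["Saint" if word == "St." else word for word in surname.split()]
    joined = joiner.join(words)
    if joiner == "":
        # Running the words together leaves an internal capital; lower-case it.
        joined = joined.capitalize()
    return f"{joined}, {forename}"


single = surname_first_key("Saint", "Surname")

# Space (32) sorts before comma (44), so the two-word surname comes first.
with_space = sorted([single, surname_first_key("St. John", "Forename")])

# Underscore (95) sorts after the comma but before lower-case letters.
with_underscore = sorted([single, surname_first_key("St. John", "Ian", joiner="_")])

# Running the words together also puts "Saint" first.
run_together = sorted([single, surname_first_key("St. John", "Ian", joiner="")])

print(with_space)       # ['Saint John, Forename', 'Saint, Surname']
print(with_underscore)  # ['Saint, Surname', 'Saint_John, Ian']
print(run_together)     # ['Saint, Surname', 'Saintjohn, Ian']
```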

Ordering of Mac, Mc and M'

This came up in one of my early edits, when I knew less than I do now about Wikipedia. For ease of finding a name, even when the correct spelling is not certain, it is common practice to sort them all as if they were Mac. As described above this can be before the other Ms or after Mab... The Telephone Directory places them all (including names such as Mace) after Mabbott and sorts all as Mac, but that is hardly a definitive source! I am told that this is documented in a "standard text" for librarians, but have not sourced that. All I have found on-line is Everything - third section; second bullet. Collation indicates that this practice may have fallen out of favour since computerisation. I agree that much of the time it is an unnecessary complication; even in List of Scots there do not appear to be a vast number of Macs and Mcs (M' being less common nowadays). However I believe we should have an agreed style. Comments? Finavon 22:16, 11 June 2007 (UTC)

I would also advocate having all McX names sorted as Macx, but it's a lot of work, and I think we would need to establish a wider agreement (perhaps including groups like Wikipedia:WikiProject Scotland) before insisting upon it. There are a lot of McX articles still sorted as McX, and it might be suitable as a bot task once a general agreement had been reached. There are also good reasons for not sorting McX as MacX, so I don't think we should be too rash. --Stemonitis 07:07, 19 June 2007 (UTC)
I've just recently been applying the current guideline, WP:Categorization_of_people#Ordering_names_in_a_category, specifically the bit that reads: The first letter of each word should be in upper case, and all subsequent letters should be in lower case, regardless of the correct spelling of the name to a number of "Mac"/"Mc" articles when I noticed that there were too many "Mc" articles sorted as "Mac" to be coincidence. Assuming I'd missed a policy somewhere, I went looking, but I haven't seen it anywhere. This talk page seems to have the most serious discussion on the issue of whether "Mac" and "Mc" should be split or merged.
It's my impression that the merge approach, of having them all sort together regardless of actual spelling, is an older style that's fallen out of fashion, probably a victim of the shift from hand collation to machine collation. I used to see address books with a separate "Mc" tab, the 15th edition of the Encyclopedia Britannica (circa 1984) used merge (and had a one line assertion that it was the right thing to do in a well-constructed index, in the Mac article). I also used to see small-town phone books that had a separate "Mc" section.
However, most recent things that I can find have gone with split:
I do find some holdouts for merge:
Personally, I favor split because:
a) it seems to be the current trend, see above.
b) it's a straightforward mechanical rule that doesn't require any judgment calls. While Wikipedia guidelines shouldn't favor simplicity to the significant detriment of the encyclopedia, when it's a relatively free matter of style, the simpler guideline should be chosen. There's already a significant learning curve to becoming a good editor; I don't think adding more special cases helps.
c) Adopting the merge approach will lead to secondary rules that have to be made. Consider Dick and Mac McDonald; under a merge approach, they sort as "Mac"; what then do we do with the giant corporation named for them, McDonald's, sort it the same or different? And articles derivative of the company name, such as McJob and McMansion? We could have one rule for people and another for non-people, but that produces absurdity, with McMurdo Sound and Archibald McMurdo being separated.
d) It strikes me as being a slippery slope or Camel's nose. There are, for example, Chinese names that have several alternate romanizations, such as 王, which has been romanized as both Wang (surname) and Wong (surname). (And there's a different Chinese name that commonly romanizes as both "Wong" and "Huang".) It could easily be argued that Chinese historic names ought to be sorted under a canonical name that represents some chosen standard romanization, rather than whatever happens to be the historical accepted romanization, but then we'd be arguing about which is best, for dozens of names, and new editors would have even more rules to learn. (We already have folks arguing that the historical romanization for Mehmed II ought to be replaced with the contemporary Turkish romanization "Mehmet"). I think accepting a split approach would be to throw an Apple of Discord, giving everybody with a socio-linguistic axe to grind that little bit of inspiration and ammunition to keep flogging their cause (but maybe I'm just paranoid). And no one seriously considers sorting "Derby" and "Darby" together, even though they evolved from the same name (and are pronounced the same in some parts of the world, though not mine).
e) It could possibly be argued that using the phonological argument for grouping "Mac" and "Mc", when we don't use phonological sorting anywhere else, is quietly promoting Scottish nationalism. While I'm proud of my one distant ancestress from Scotland (a McDiarmid), I don't think the Scots need special treatment.
Since it's a reasonable argument to have, given the historic precedents of sorting both ways, whatever we come up with as a consensus should be memorialized in the guideline. Studerby 21:11, 8 July 2007 (UTC)
Of these five reasons, only a) and b) are really relevant. The case under c) that it would introduce inconsistencies is not especially significant since it is rare that any such pairs of articles would be categorised together, most categories being either biographical or non-biographical, and relatively few being mixed (I can't think of any, although I'd be prepared to bet that there are some). Sorting rules already differ between categories, and that's not a problem. Transliteration of Chinese (d) is utterly irrelevant, because we would sort on whatever romanisation is used in the article title. Finally, e) doesn't apply, because this is not really a phonological argument; it's entirely to do with traditions of collation, and Mc and Mac have traditionally been grouped together, whereas others such as Darby and Derby have not. Scottish nationalism has no bearing on the issue, for several reasons, including that Mc- / Mac- surnames are also Irish. I think it is still unclear whether the general trend among relevant works is to lump or to split (my dictionary has McX explicitly under MacX), and that is more or less the only criterion we should be using to judge. Simplicity is vaguely desirable, but not at the cost of authority. Surnames and their collation are surprisingly complicated, and attempts to over-simplify are likely to be fruitless. Please, let us stick to the relevant factors and not get carried away with speculation and invention. I might also note that whatever is decided upon (if anything at all), most of the McX articles will be ill-sorted, because a large majority have internal capitals in the sort key. If there weren't so few other surnames beginning "Mc" (cf. Leri Mchedlishvili and Guram Mchedlidze), this would be quite urgent. --Stemonitis 21:35, 8 July 2007 (UTC)

I agree with Stemonitis, and I always index both MacX and McX as Macx. This isn't just a convention, it's a convention with a reason, because the spellings are not always handled consistently across a family or even for the same person, and there is no guarantee that an entry found somewhere for "John MacCarthy" will not be listed elsewhere as "John McCarthy", or (without capitalisation) as "John Mccarthy" or "John Maccarthy". The consistent approach makes it easier to find articles, and isn't that the whole point of indexing? --BrownHairedGirl (talk) • (contribs) 22:23, 7 November 2007 (UTC)

There's been a recent discussion of this and other alphabetisation issues at Wikipedia_talk:Manual_of_Style#Alphabetization, which may be of interest. PamD (talk) 08:22, 10 February 2008 (UTC)

  • Oppose It's always best to spell the person's name correctly. As a "Mc" myself, I find it nearly offensive when someone misspells my name wrong the first time. When I correct the spelling and they continue to do so, it just makes me think they don't care about accuracy. Some will find it offensive. This is a guideline--one I will forever choose to ignore and I beg all other Wikipedians to follow my lead.--Paul McDonald (talk) 01:54, 15 July 2010 (UTC)
    • details I've put together the first draft of an opposition essay here.--Paul McDonald (talk) 15:34, 15 July 2010 (UTC)
    • It's not a question of spelling the name correctly. The sort key never displays anywhere; it just controls sorting. --Auntof6 (talk) 23:13, 15 July 2010 (UTC)
      • response I get that. Why in the world would Wikipedia ever choose to sort articles incorrectly?--Paul McDonald (talk) 02:54, 16 July 2010 (UTC)
"Incorrectly" is a matter of definition, not of Universal and Absolute Truth. It's long been a common sorting practice to separate out all the Mc/Mac names from the other M names; we're simply continuing that practice, one that most people are familiar and comfortable with. -- Jack of Oz ... speak! ... 13:20, 16 July 2010 (UTC)
"Incorrectly" is a matter of definition, and sorting "Mc" the same as "Mac" is incorrect. I know of no sorting algorithm that calls for equalization of different logical values. The other Mc/Mac names can be sorted differently from the other M names automatically because they are "Mc" and "Mac", which would be different from "Morris" -- "simply continuing" a practice because "most people are familiar and comfortable with" is another way of saying "we've always done it that way" which, of course, never makes it right.--Paul McDonald (talk) 18:07, 16 July 2010 (UTC)
When it comes to spelling, "we've always done it that way" is exactly what does make it right! This is why it's as incorrect to spell your surname MacDonald, as it would be to spell your neighbour MacDonald as McDonald; in both cases, you have family tradition to support your particular variants. Some Mc/Mac surnames have a space after the Mc/Mac, and some protocols would regard these surnames as simply "Mc" or "Mac"; thus "Mac Clellan" would sort before "MacAlister". Is that what you want? I also note your own username is Paulmcdonald, a name you yourself chose, yet you get nearly offended when someone misspells your name wrong the first time. What's the message there? -- Jack of Oz ... speak! ... 22:42, 16 July 2010 (UTC)
But the point that "we've always done it that way" loses its force when you realize that WE don't do it this way anymore...even the US Library of Congress stopped doing it this way 20+ years ago - look below in the other thread (search this page for the phrase "a couple of decades ago" to jump to it) for the email response I got from the Library of Congress when I asked them about it. Such a system might have made sense in former centuries when different people spelled the same names differently (even from page to page), but given that we Scots have been literate for a while (end sarcasm tag) and we really know how we want our names spelled, I cannot see any justification for continuing this archaic practice.
William J. 'Bill' McCalpin (talk) 18:51, 19 July 2010 (UTC)
Just a note: it isn't only entries for today's literate people that we're talking about, it's entries for people from all different times whose names might have been spelled differently in different places. --Auntof6 (talk) 20:00, 19 July 2010 (UTC)
Comment I'm reasonably sure that no one from the 1700's is going to look up anything on Wikipedia.--Paul McDonald (talk) 20:35, 19 July 2010 (UTC)
Of course not, but someone today might look up someone who lived then whose name might have been spelled in different ways. Not a big deal, just a thought I had. --Auntof6 (talk) 22:28, 19 July 2010 (UTC)
  • Point of order: This thread had a gap from Feb 2008 to July 2010. The topic was discussed again in 2009/2010 at #Mc_vs._Mac below. I'm not sure what the protocol is for merging the various threads on the same topic, but thought I'd alert you to it! It doesn't seem helpful to start again on a thread which is not the most recent on the topic. PamD (talk) 13:49, 16 July 2010 (UTC)
And see Wikipedia_talk:Manual_of_Style/Archive_116#.22Mac_vs._Mc.22_Discussion_again for another archived discussion. PamD (talk) 13:52, 16 July 2010 (UTC)
    • Probably should have been archived. Ideas?--Paul McDonald (talk) 18:07, 16 July 2010 (UTC)

Does anyone else find it interesting that articles must have reliable sources but Policies and Guidelines are promulgated based solely on the opinions of the editors who voice their opinions, often in the face of reliable sources? It does no good to cite reliable sources, such as the Library of Congress, in these instances. Consensus, defined as those who can defend an opinion longer than those on the other side, will always take precedence over rigor and reliable sources. JimCubb (talk) 21:35, 16 July 2010 (UTC)

Mc vs. Mac

This seems to be discussed above, but is "For a surname which begins with Mc or Mac, the category sort key should always be typed as Mac with the remainder of the name in lowercase" necessary? I mean, I get McD should be Mcd, but why do we have to auto-change from Mc to Mac? Is there something I'm not getting? Wizardman 04:42, 1 March 2009 (UTC)

The reason is that we want McDonald/MacDonald or McAdams/MacAdams to be sorted as the same name. Headbomb {ταλκκοντριβς – WP Physics} 04:52, 1 March 2009 (UTC)
Hm. to me they're still different, but to each his own, it's not something i particularly care about, just thought i'd ask. Wizardman 05:10, 1 March 2009 (UTC)
They are different names (Macdonald also exists), but if you only have the spoken version, this sorting lets you look in just one place. It doesn't matter where there are only a few Mc/Macs, but makes a difference to manual searching when there are lots. Please don't change the visible name, just the sort name (so no auto-change!). Finavon (talk) 14:00, 1 March 2009 (UTC)
This is something I've always hated, actually. There's lots and lots of names that have a most-common spelling and less common variants, e.g. Thompson, Thomson, but we don't "normalize" spelling for any of them, and would properly be reviled if we did. Mc/Mac was often treated differently from other spelling variations in the past, but it seems to be a usage that's dropped out of most common references, Wikipedia being the only current exception I happen to know. The Encyclopedia Britannica used to consolidate Mac/Mc, but no longer does so. Studerby (talk) 22:03, 14 April 2009 (UTC)
I see your point about Thomson/Thompson etc. The Mc/Macs are in a slightly different category, though. Each well known Mc/Mac surname can have up to 4 variants: McDonald, MacDonald, Mcdonald, Macdonald (and possibly others such as M' Donald and those that have a space after the Mc/Mac). Britannica etc can deal with these effectively as they're written by a relatively small coterie of experts. WP is written by, potentially, everyone in the world. Native English-speakers have a hard enough time remembering which notable people are Macs, which are Mcs, which capitalise the first letter of the rest of the surname, and which don't - let alone people for whom English is a second or later language. It makes very great sense to me to look at a category and see all the Mc/MacDonalds etc grouped together. They still come out with the correct spellings. Brian McDonald appears before Charles MacDonald, who appears before Egbert Mcdonald, who appears before Simon Macdonald. If we sorted them under their exact spellings, the number of duplicate articles would rise significantly; the number of merges necessary to fix them would be too great; and it would be virtually unmanageable. I've identified various duplicate entries by simply resorting any Mcs or Macs I come across as "Mac", and decapitalising the first letter of the rest of the name. It's simple, effective, and once done, it stays done. -- JackofOz (talk) 22:18, 14 April 2009 (UTC)
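The merged ordering described in the comment above can be sketched roughly as follows. This is only an illustration of the guideline as quoted in this thread ("typed as Mac with the remainder of the name in lowercase"); the function name `merged_sort_key` and the sample data layout are assumptions made for the example, not anything MediaWiki provides.

```python
def merged_sort_key(given: str, surname: str) -> str:
    """Hypothetical helper: build a 'Surname, Given' sort key with Mc/Mac merged."""
    s = surname.strip()
    lower = s.lower()
    if lower.startswith("mac"):
        rest = s[3:]
    elif lower.startswith("mc"):
        # Caveat: this also rewrites non-Gaelic names such as "Mchedlishvili",
        # which a human editor would presumably leave alone.
        rest = s[2:]
    else:
        # Ordinary surname: capital first letter, the rest in lower case.
        return f"{s[:1].upper()}{s[1:].lower()}, {given}"
    return f"Mac{rest.lower()}, {given}"


# The four spellings mentioned in the comment above, in scrambled order.
people = [
    ("Simon", "Macdonald"),
    ("Egbert", "Mcdonald"),
    ("Charles", "MacDonald"),
    ("Brian", "McDonald"),
]

# All four surnames collapse to "Macdonald", so the given name decides the order.
ordered = sorted(people, key=lambda p: merged_sort_key(*p))
for given, surname in ordered:
    print(f"{given} {surname}")  # the displayed spelling is left untouched
```

Only the key is normalised; the names themselves are printed exactly as spelled, which is the distinction between the sort key and the visible title drawn earlier in the thread.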
We conventionally de-capitalize ALL non-initial letters (we have to, to get correct sorting for things like "duBois"|"DuBois"|"Dubois"), leaving us with "Mac"|"Mc"; I don't think it's too much to ask for people to look in 2 places in very large categories. The issue is irrelevant in most small categories, as they don't have enough entries under "M" for the searched-for name to be missed. That said, consensus seems to be against me on this (although I don't think a formal effort to determine the consensus has been done), and I have no problem with following consensus or feel like agitating for a review - it came up, I commented.... A lot of lesser issues in sorting are matters of choice, and this is one. I don't think the current way is wrong, per se, I just really really don't like it and disagree with the common rationale for it and wish we were more like other major references in this regard. Part of the issue is really that Wikipedia needs to improve its finding aids; categories are a weak but necessary aid for finding people when you know a little about them but not exactly how the name is spelled. That is the only scenario (I can think of) in which the effort to sort all Mc/Macs together makes particular sense, and it does. Better finding aids would let us dispense with what I see as an oddity... but we're not there yet. Studerby (talk) 19:47, 15 April 2009 (UTC)
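The point about de-capitalising non-initial letters can be illustrated with a small sketch. It assumes that sort keys are compared by plain code-point order, in which every upper-case letter sorts before every lower-case one; that is an assumption about the comparison, not a statement about how any particular version of the software collates. The helper name `decapitalised_key` is made up for the example.

```python
def decapitalised_key(name: str) -> str:
    """Hypothetical helper: capital first letter, everything else lower case."""
    return name[:1].upper() + name[1:].lower()


names = ["Dubois", "DuBois", "duBois", "Duarte", "Duke"]

# Raw code-point order: "DuBois" jumps ahead of "Duarte" because "B" < "a",
# and "duBois" falls to the very end because "d" > "D".
raw_order = sorted(names)
print(raw_order)

# With the internal capitals folded away, the three spellings share one key
# and therefore sit together between "Duarte" and "Duke".
folded_order = sorted(names, key=decapitalised_key)
print(folded_order)
```

Python's sort is stable, so the three Dubois variants keep their input order relative to each other once their keys are equal.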
Because users have trouble distinguishing Mc/Mac is no reason for Wikipedia to start altering the way these names are sorted. Thompson/Thomson, Chris/Kris, Tom/Thom, and Evenson/Evanson are other examples where we would not change the sort order. Dictionaries do not purposefully mis-order the word "sophomore" simply because English speakers drop the middle syllable ("o"). This is senseless. It does not matter whether a non-English speaker can remember the difference between Mc/Mac or McDonald/Mcdonald/MacDonald/Macdonald. The information needs to be correct. Most English speakers do not pronounce the "g" in "Nguyen"; does this mean all Nguyens are now to be sorted as "Nuyen"? No, but we'll mis-sort Mc/Mac names. Senseless. - Tim1965 (talk) 00:22, 17 May 2009 (UTC)
Well, it depends what you mean by "correct". Firstly, all the individual names appear correctly spelled, wherever they appear in a list. Secondly, it's very common in the real world for long lists of names sorted by surname to start the M section with a sub-section for the Mcs and Macs, and the remaining M surnames follow. That's a useful system out there, and it's just as useful here. It's a convention to do it that way; it's neither more nor less "correct" than having a list sorted strictly alphabetically. -- JackofOz (talk) 03:46, 17 May 2009 (UTC)
I too question the rationale behind this seemingly silly convention. What of Smith/Smythe, Nguyen/Winn, or any other hard to spell last names? The fact that one may not know which spelling is correct for a given individual is adequately solved by disambiguation. I don't see any reason that this would lead to duplicate articles, and I believe it's more correct to alphabetize people correctly than to worry about finding entries in large categories. I don't see any compelling reason that a non-paper encyclopedia would categorize names phonetically, and it's even more confusing when it's only done for one class of surname. Is there really consensus for this? Oren0 (talk) 03:41, 2 June 2009 (UTC)
I had trouble believing that this merging of Mc and Mac is a good idea (perhaps because my last name is 'McCalpin', I am entitled to have an opinion ;-) ). I noticed in the discussion at the top of the page that one part of the US Library of Congress uses the "split" method (i.e., not treating Mc and Mac as the same spelling), so I asked their reference section what the policy was for the entire Library. This was the answer I got:
Question History:
Patron: General Inquiry:
Wikipedia sorts last names that begin with Mc- 'merged' with last names that begin with Mac-, as if the Mc- names were actually written as Mac-. This leads to a different sort sequence in their lists than any computer would generate and different than most people would expect.
Their argument is that this is the way it's been done for a long time, because of the variations and unpredictability in the spellings of Scottish names. However, others point out that fewer and fewer institutions are building their sorted lists this way, choosing to go for the more straightforward "sort on the actual spelling".
Which does the Library of Congress do and why? Do you know if there have been international standards on this subject by those in library sciences?
Thanks!
Bill McCalpin
(here follows verbatim reply from the Library of Congress)
Librarian 1: Thank you for consulting the Library of Congress's Digital Reference Section.
It is my understanding that the Library of Congress rules for filing catalog cards changed a couple of decades ago [emphasis mine] from filing all the "M' " "Mc" and "Mac" together ("as they sound" it was explained to me) and the Library now files them exactly as they are spelled.
The official filing rules may be found in the title "Library of Congress Filing Rules" described here: < http://www.loc.gov/cds/catman.html#locfm > and available in print via the Cataloging Distribution Service at the Library of Congress for $10. The title is also available as part of the "Cataloger's Desktop," an online subscription resource for catalogers.
The text confirms that the current practice at the Library is to alphabetize exactly as words are spelled.
Section 1., "Basic Filing Order" says the following:
(begin quote)
Fields in a filing entry are arranged word by word, and words are arranged character by character. This procedure is continued until one of the following occurs:
a. A prescribed filing position is reached.
b. The field comes to an end (in which case placement is determined by another field of the entry or by applying one of the rules given hereafter).
c. A mark of punctuation indicating a subarrangement is encountered.
1.1. Order of Letters
Letters are arranged according to the order of the English alphabet (A-Z). Upper and lower case letters have equal filing value.
(end quote)
After I have sent you my response, I plan to refer your inquiry to my colleagues in the Library's Acquisitions and Bibliographic Access Directorate (the most authoritative source of this information at LC), so they can confirm this information, and provide you with any additional information they may have on any incipient standards of which they are aware.
Another useful book that includes rules on alphabetizing, if you wish to explore this further is:
LC Control No.: 2005004214
Personal Name: Mulvany, Nancy C.
Main Title: Indexing books / Nancy C. Mulvany.
Edition Information: 2nd ed.
Published/Created: Chicago : University of Chicago Press, 2005.
Description: xiv, 315 p. ; 24 cm.
ISBN: 0226552764 (alk. paper)
You may want to check for this book at your local library.
We hope you find this information helpful. Good luck with your research!
Digital Reference Section
Ask A Librarian Service
The Library of Congress/lsg
So, could anyone give me a reason that actually makes sense to people who carry this type of last name why Wikipedia wants to sort differently than Library Science professionals, other than "well, that's how people used to do it"? Also, who makes this decision and how is it changed?
William J. 'Bill' McCalpin (talk) 17:36, 5 February 2010 (UTC)
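For comparison, here is a minimal sketch of the "file exactly as spelled" approach, based only on the two points in the quoted extract: entries are arranged word by word and then character by character, and upper and lower case letters have equal filing value. It is not an implementation of the full Library of Congress Filing Rules, and the name `as_spelled_key` is an assumption for the example.

```python
def as_spelled_key(name: str) -> tuple[str, ...]:
    """Hypothetical helper: compare word by word, ignoring letter case."""
    return tuple(word.lower() for word in name.split())


surnames = [
    "McDonald", "Mace", "MacDonald", "Mead",
    "Mac Clellan", "MacAlister", "Mabry",
]

# Tuples compare element by element, which gives word-by-word filing:
# the single word "Mac" files before the longer word "MacAlister".
as_spelled = sorted(surnames, key=as_spelled_key)
print(as_spelled)
# ['Mabry', 'Mac Clellan', 'MacAlister', 'MacDonald', 'Mace', 'McDonald', 'Mead']
```

Under this ordering MacDonald and McDonald are separated by Mace, which is the practical difference between the "split" method and the merged "Mac" sort key discussed in this thread.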

Mc... vs Mac...

Why do we treat these two names as the same when the legal names have diverged and are no longer synonyms? As of now we force the sort key to always be "Mac", but why? --Richard Arthur Norton (1958- ) (talk) 02:35, 15 April 2009 (UTC)

See "Mc vs. Mac" above. -- JackofOz (talk) 06:36, 16 April 2009 (UTC)