See also the current guideline Wikipedia:Naming conventions (use English) and the proposed guideline Wikipedia:Naming conventions (standard letters with diacritics), which was rejected on 21 April 2007.

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section.

Motivation

A clear policy statement on this issue seems to be necessary to avoid continual repetition of the kind of discussion exemplified at WP:Requested moves/Tennis. This is only a stub of a proposal so far; I hope others will help develop it (or I will, sometime when I'm wider awake). I believe the proposed statement actually reflects current practice, and it certainly should continue to be the practice, since a serious reference work exists to provide serious information. —Preceding unsigned comment added by Kotniski (talk • contribs) 05:12, 8 June 2008

Discussion (no votes yet please!!)
  • This proposal doesn't really change much; see Wikipedia:Naming conventions (use English) for the use of diacritics or special characters. Ironholds 05:15, 8 June 2008 (UTC)
    • Though people seem to quote WP:UE as an argument against diacritics. Whatever the present situation is, it seems that a more explicit statement is required (see WP:Requested moves/Tennis for an example of the confusion as to what current policy is).--Kotniski (talk) 05:20, 8 June 2008 (UTC)
      • Your policy doesn't seem to be really specific, though. It comes out as: you either can or can't use diacritics, although we prefer you do. There's no hard yes/no; it's optional, and in a world where typing diacritics on a QWERTY keyboard requires fiddling with the Alt key and numeric keypad for hours on end, I can't see it being taken up on a large scale. That being said, I think it's a nice idea, although it could be quite difficult, e.g. Irish names with four or five diacritics in the full name. You'd also (most likely) need redirects for most of the articles with this policy in place so users don't have to type it all out when searching, although I'm sure those already exist for many articles. Ironholds 05:25, 8 June 2008 (UTC)
        • It's easy enough to enter diacritics using the insertion fields beneath the edit box. Also, on non-dodgy operating systems it can be a lot easier; in fact, using my nice OS I can get all of the common Western European ones [é, ü, ß, å, ñ, ç, etc.] using only two keys. --Neil (talk) 16:12, 8 June 2008 (UTC)
  • I think instead that the current version of conventions, Wikipedia:Naming conflicts (see the section on proper nouns, where it is written that "If a native name has a common English-language equivalent, the English version takes precedence"), should be adopted as policy. Following this, it should then be clear when to use diacritics or not: they are used in article names if the subject is commonly described with diacritics in English, and not if it is not (and, of course, the original spelling is provided in the article lead). This appears suitable for this place, the English Wikipedia. --HJensen, talk 06:38, 8 June 2008 (UTC)
    • The point is that pretty much all diacriticed words can appear without diacritics in English. There are some sources (including many good ones) that simply don't use diacritics, or use only a small subset of them. There will also always be plenty of good sources that do use diacritics. So to say that one or other form is correct and the other incorrect (or "not English") is generally going to be unsupportable. We are perfectly entitled to decide which of these styles is appropriate to WP. (Of course, we don't want to punish anyone for using either form, but in order to avoid pointless edit wars and things like the tennis-player debate, we should have it set down explicitly which style is preferred.)--Kotniski (talk) 08:02, 8 June 2008 (UTC)
      • I do not claim that one or other form is "correct." The keyword in the quoted text is "common", which is completely different. And a common English-language equivalent can then be determined case by case. --HJensen, talk 13:14, 8 June 2008 (UTC)
        • But the "equivalents" are just the names without the diacritics, not actual English names. And shouldn't Wikipedia be consistent with/without diacritic use? BalkanFever 13:33, 8 June 2008 (UTC)
          • They aren't "without the diacritics", they are "as used in reliable sources". By policy, we are mindless sheep following English usage. If the sources consistently omit the diacritics, so do we. Somedumbyankee (talk) 18:37, 8 June 2008 (UTC)
        • This is the crucial point: "common English-language equivalent" means Venice, not Venezia; Bucharest, not Bucureşti; it shouldn't mean "if diacritics are commonly stripped, keep them that way". Wikipedia is not a newspaper; there is a reason why Britannica and National Geographic retain diacritic characters in Latin-based alphabets. GregorB (talk) 16:53, 8 June 2008 (UTC)

Use them when the person is not known/is known with diacritics in the English world. Don't use them when the person is widely known in English without them. It doesn't seem that complex to me when you remove the WP:IDHT from it. Narson (talk) 13:39, 8 June 2008 (UTC)

(Dedent) This page/policy needs to have two sections, Article name and Name usage. For name usage, both the diacriticed word and the English equivalent need to be listed in bold in the article introduction. The rest of the article will use the page name. For the page name, either the diacriticed word or the English equivalent should be the page name, and the other should redirect to the proper page. --Lemmey talk 13:44, 8 June 2008 (UTC)

  • In the interests of searching, redirects, linking and so on: the rule could be that page titles would be determined on a case-by-case basis, according to whichever form is best known. The page itself would then contain the phrase with diacritics. Ironholds 14:01, 8 June 2008 (UTC)

There is no need for this guideline. The current WP:UE guideline is neither overtly hostile nor friendly to accent marks. The WP:UE guideline follows Wikipedia policies (WP:V, WP:NOR and WP:Naming conventions): "Use the most commonly used English version of the name of the subject as the title of the article, as you would find it in verifiable reliable sources" and "Wikipedia does not decide what characters are to be used in the name of an article's subject; English usage does. Wikipedia has no rule that titles must be written in certain characters, or that certain characters may not be used. Follow the general usage in English verifiable reliable sources in each case, whatever characters may or may not be used in them." As for names within a page, there is a section in the MOS that covers it; see Wikipedia:MOS#Foreign terms.

For the majority of foreign names the current guidelines are consistent with Wikipedia policy, but there are two special categories in WP:UE where reliable third-party English-language sources may not be enough to determine what to use in English. The first is Divided usage: in these cases, if it cannot be agreed what is best, then it is a good idea to put the page up for WP:RM to decide the issue (as the use of accent marks is a contentious issue). The second is No established usage in English: here the suggestion is to use the name in the local language. --Philip Baird Shearer (talk) 19:25, 8 June 2008 (UTC)

The "need" for this guideline (or one like it) is amply illustrated by the length of the tennis-player discussion (and other similar discussions which break out from time to time). If our current guidelines are leading to massive confusion then they are not doing their job. Imagine how much better WP would be if all the effort spent arguing were being put into editing articles.--Kotniski (talk) 08:46, 9 June 2008 (UTC)
Perhaps sources that systematically strip diacritics aren't reliable. Perhaps those who don't (Britannica and National Geographic - I've bored even myself mentioning them) are. GregorB (talk) 19:52, 8 June 2008 (UTC)
If sources are used in an article then they are reliable (otherwise they should not be used). What is or is not a reliable source is decided at policy level, such as WP:SOURCES, not at guideline level. --Philip Baird Shearer (talk) 20:08, 8 June 2008 (UTC)
OK, I'll be more direct now: why not adopt what Britannica and National Geographic are doing? They don't "source" spelling (it makes no sense, IMO); they say this: for Latin-based alphabets, use anglicized form if it exists (as in Warsaw rather than Warszawa); if not, use the original spelling. Apparently, they don't consider stripping of diacritics as "Anglicization" and don't "source" it; it makes no sense to source sloppiness. GregorB (talk) 20:17, 8 June 2008 (UTC)
We are doing exactly that, it's mostly that people are responding with WP:IDHT when someone suggests that there is an existing anglicized form. Somedumbyankee (talk) 20:20, 8 June 2008 (UTC)
GregorB, Wikipedia is not Britannica or National Geographic; we have our own policies and guidelines which cover this issue. Another international publication that has a policy on this is the Economist. Presumably the editor of the Economist, not unreasonably, considers that educated English people (at least the target readers of the Economist) should be aware of the usage in the other major world languages that are written in a modern Latin alphabet (but not all accent marks in all languages): "Put the accents and cedillas on French names and words, umlauts on German ones, accents and tildes on Spanish ones, and accents, cedillas and tildes on Portuguese ones: Françoise de Panafieu, Wolfgang Schäuble, Federico Peña. Leave the accents off other foreign names. Any foreign word in italics should, however, be given its proper accents."[1]. Is the Economist any less of a reliable source than National Geographic? Now, rather than follow any one style guide, it has been agreed in Wikipedia policy that "Generally, article naming should prefer what the greatest number of English speakers would most easily recognize, with a reasonable minimum of ambiguity, while at the same time making linking to those articles easy and second nature." (WP:NC) and that we determine what that name is by using WP:SOURCES. It is really that simple. If we were to go with this proposed guideline, then we would be going against Wikipedia policies (because they say use common English names and use reliable sources to determine what that is). The current WP:UE rule of using whatever reliable sources use is a simple rule and a reasonable compromise, given the difference of opinion among Wikipedia editors on the use of accent marks in article titles. --Philip Baird Shearer (talk) 21:03, 8 June 2008 (UTC)

Could it be that the "generosity" of The Economist towards French, German, Spanish and Portuguese spellings while ignoring others is simply based on their strong focus on economy? Does that override general-purpose encyclopedias' usage? Squash Racket (talk) 05:35, 12 June 2008 (UTC)

I am sure we can make up lots of reasons why the Economist style guide is as it is, but that is not relevant; what is relevant is that when Economist correspondents follow the style guide, they are not being lazy when they do not use East European accent marks. Yes, we should follow usage in encyclopaedias and other reliable sources; that is what Wikipedia policies and guidelines state we should do. For each page name, if reliable sources for that page use accent marks so should we, and if they do not then neither should we. --Philip Baird Shearer (talk) 09:57, 12 June 2008 (UTC)
And if some do and some don't? How do we count them? Weight them? Is there really any point in doing such research for each individual name, if what is going to come out (even assuming agreement can be reached in each case) is going to be inconsistency and confusion within Wikipedia? Remember that making a good encyclopedia is more important than following guidelines, and in any case we are discussing a proposed change to the guidelines, so stating that the current guidelines say something different is a bit of a non-argument.--Kotniski (talk) 10:14, 12 June 2008 (UTC)
In that case we should be going against current Wikipedia policies. This all started with the names of tennis players. I'll be blunt: I don't care whether atptennis.com spells it "Đoković" or "Djokovic", as they are not arbiters on matters of spelling. Britannica would choose "Đoković" per policy; I haven't yet heard a principled argument as to why Britannica's policy is not good. GregorB (talk) 21:19, 8 June 2008 (UTC)
A most unlikely, and unsourced, claim. The Britannica uses Tudjman; I don't see why they would not spell Djokovic similarly. Septentrionalis PMAnderson 21:42, 8 June 2008 (UTC)
GregorB, I notice that you have now dropped National Geographic. Was that because of the Economist? Are you sure about Britannica? What happens with your proposal if Britannica does not have an article on a person, or if they do not use accent marks, as with Lech Walesa and Gdansk? It does not seem that they always use accent marks where the local spelling does, which is what you have suggested. BTW, does this mean you would support a move of Lech Wałęsa to Lech Walesa and Gdańsk to Gdansk? --Philip Baird Shearer (talk) 21:44, 8 June 2008 (UTC)
I've dropped NG because strictly speaking it's not a reference work, and neither is the Economist. The fact that NG is "stringent" about spelling despite being "just a magazine" and not a reference work only strengthens my argument. As for Britannica, I see now that what they do is slightly odd: Tudjman and Walesa, yet Priština (exactly opposite to what Wikipedia does at the moment). Different criteria for place names and personal names? (I wouldn't support that, whatever the criteria might be.) Still: if they sourced their spelling, it would certainly be Pristina then. GregorB (talk) 22:51, 8 June 2008 (UTC)
The Britannica Book of the Year uses Djokovic. Septentrionalis PMAnderson 22:57, 8 June 2008 (UTC)
Obviously, I have misconstrued Britannica's style guide, because it appears to be inconsistent, as per my comment above. E.g. Britannica uses Pavel Josef Šafarík. It appears that Czech and Slovak are on the with-diacritics list in their style guide, but Croatian and Serbian aren't. (Same with NG, incidentally.) This is again odd, as it would imply that "Š" can appear in a Czech name but not in a Croatian one, although it is the same character pronounced in the same way. GregorB (talk) 23:23, 8 June 2008 (UTC)

One argument put forward for using accent marks is that it aids pronunciation. I think this is not as strong an argument as some proponents of accent marks like to suggest. Most accent marks are of little use to the average English reader: even if someone knows how to read one set of accent marks (because they learnt them when they learnt that language at school), most native English speakers are unlikely to have learnt more than one foreign language at school, so they are unlikely to be able to read other accent marks. Pronunciation is a common problem in English too (for example, how many English speakers know how to pronounce Mousehole or Southwark?), and we solve that problem by including IPA in articles where a pronunciation guide is useful (/ˈmaʊzəl/, and /ˈsʌðək/, locally also /sʌvək/). This seems a better way to go, and less of a problem than lots of accent marks that are meaningless to most native English speakers and are not pronounced consistently between languages. I notice that this approach is used in the Lech Wałęsa article despite the accent marks in the article name. --Philip Baird Shearer (talk) 21:44, 8 June 2008 (UTC)

As a proponent of diacritics (more general than accent marks, I believe), I'd agree fully that "aiding pronunciation" is a weak argument. GregorB (talk) 22:51, 8 June 2008 (UTC)
I would guess that at least as many of our readers understand European diacritics as understand IPA. Also the IPA representation generally appears only in the home article - you don't get IPA representations of a name in every article in which it appears. And the fact that English pronunciation is itself frequently difficult is no reason to make pronunciation of foreign names harder than it needs to be.--Kotniski (talk) 08:29, 9 June 2008 (UTC)
Understand the diacritics of any one European language than understand IPA? Possibly, although it will depend on the language. Understand all of them, as is being called for? Most unlikely. IPA has its weaknesses too, but the best solution is to include the unEnglish form and IPA once, in the lead. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
In Hungarian diacritics help pronunciation. For example Zsuzsa Körmöczy becomes Zsuzsa Kormoczy when Anglicised. Squash Racket (talk) 05:59, 12 June 2008 (UTC)

I also don't believe that inconsistencies in e.g. Britannica's style are any excuse for our not trying to be consistent ourselves. The point is that both styles are acceptable in good English (as is shown by the many good sources which use one or the other, or a mix of both), it can be clearly seen that the with-diacritic form is more useful to many readers and no less useful to the rest (at least, I haven't noticed any attempt to refute that claim yet - apologies if I've missed something), so we will be improving the quality of our encyclopedia by adopting the with-diacritic style as our standard. In fact perhaps just maintaining the quality of the encyclopedia and avoiding long arguments, since in my experience the with-diacritic style is indeed the one which we currently prefer, and attempts to remove diacritics (like the tennis example) tend to fail. --Kotniski (talk) 08:39, 9 June 2008 (UTC)

Also, using diacritics (for a person's name, at least) is correct. We are an encyclopaedia (better than Britannica et al.), so just because one convention is more common in reliable sources, it doesn't mean we have to blindly use it. Especially considering the sources' reliability has nothing to do with diacritic use - they are reliable for their information. As I stated above, a name without diacritics is not an English language equivalent. Geoffrey Keating is the English equivalent of Seathrún Céitinn, whereas Seathrun Ceitinn is not. There is no German or Slovak equivalent, so the interwikis use his Irish name. (Portuguese interwiki seems to be an anomaly). BalkanFever 09:13, 9 June 2008 (UTC)
Here we come to the fundamental falsehood. Using diacritics is correct if and only if the diacritics are used in English. In many cases, this would impose diacritics where nobody uses them, including the person concerned and his wife. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
Another point - while using diacritics is not necessarily helpful, what does omitting them do? Using diacritics can be helpful to some people, but omitting diacritics isn't helpful to anyone. There is absolutely no benefit in not using diacritics. It causes more confusion than having them. For example: Đoković shows how to pronounce the name if you know the orthography: /dʑokovi/. Djokovic makes people who don't know the orthography think it's /dʒokovik/, and it makes those who do know the orthography think it's /dʑokovits/. (Don't worry about /dʑ/ and /dʒ/ or đ and dj, focus on c and ć) BalkanFever 09:13, 9 June 2008 (UTC)
That's nice, but what would people think if we wrote Djokovich? How could they possibly confuse that? /dʑokovi/ seems like the only way to read it. As for blindly using what is used in reliable sources, let me remind you that you're the one who renames articles based on what reliable sources (youtube etc.) use. Some time ago the article Samuil of Bulgaria was renamed to Samuel of Bulgaria despite the fact that the vote was against such a move and most sources call him Samuil. So I don't think Wikipedia blindly follows sources, but rather what is probably common sense. I mean, the wiki in English is the wiki written in English. We have to make sure English speakers know what they're reading. So a compromise in favour of the slim minority that knows what Đ or ć means (even though it's different in some languages and you'd have to see where the person comes from before trying to read his name) would be pretty unfair to the majority, who would prefer a more readable (whatever that means) version. --Laveol T 11:47, 9 June 2008 (UTC)
Tell me, should I ignore the fact that you seem to be blindly reverting most of my contributions, and now have "coincidentally" disagreed with me in a page you've never seen before? BalkanFever 11:55, 9 June 2008 (UTC)
Nope, actually I've been following the page for some time ;) As you see, I didn't say anything to your previous comment as I did not disagree with it. OK, we'll try to keep things not that personal here (as I don't know who went through all the Slavic mythology articles and reverted me - see, I didn't even say anything about the towns in Greece :)) --Laveol T 12:20, 9 June 2008 (UTC)
Lol, actually I went through my watchlist backlog (I had one mythology article there) and then I saw that the name sections were screwed up in general, not just the ones you added to :). But yes, you did kick it all off. BalkanFever 12:38, 9 June 2008 (UTC)
I'll answer you anyway. Neither Samuil nor Samuel have diacritics, so it's not relevant. And Samuel is the English language equivalent of Samuil anyway (right now in my spellchecker Samuil is being underlined as incorrect) so it's following WP:UE, and it's perfectly fine. Omitting diacritics from a name, however, is not an English equivalent. It doesn't help English speakers at all - if you think about it, it can only misinform. "More readable"? If you don't know what it means, how will anyone else? And I'm dying to know, which articles did I rename according to youtube? BalkanFever 12:07, 9 June 2008 (UTC)
The Samuil part is only an example showing that Wikipedia does not blindly follow anything, as is perfectly clear from the context that you chose to ignore. My spellchecker underlines the word Samuil, too, but it underlines the word spellchecker as well, and words like neighbour because of the extra u. "More readable" had the qualifier in brackets because some names simply don't have a more readable version. How would you read a Hungarian name, for example? My point is that diacritics can always be represented with letters that everyone understands and is able to read. You didn't say anything about the Djokovich case; don't you think a native speaker of English would find it more readable than Đoković? How's he supposed to know what these strange letters mean? --Laveol T 12:20, 9 June 2008 (UTC)
Simply because we don't use phonetic English spelling (IPA is much better for that). The argument for omitting diacritics has always been that the sources use it - and the sources don't use Djokovich. But my point is that it doesn't matter what a source chooses for the name of a person when it comes to use of diacritics: it doesn't change their name. A native speaker of English (like me, perhaps?) might not understand what the diacritics in Đoković mean, but how is he meant to know what the letters in Djokovic (specifically the c) stand for? Is it a /k/ like in English? Is it a /ts/ like in Serbian? Or is it č without the háček? Is it a ç without the cedilla? Actually, it's a ć without the acute accent. The diacritics can be represented by the basic letters, yes, but they shouldn't. Especially since the purpose of the diacritic is to differentiate from the basic letter. BalkanFever 12:38, 9 June 2008 (UTC)
English orthography only recognizes the basic letter. Insisting on the inclusion of Ð or ç is like insisting on the inclusion of Ж or Σ or の. None of these characters exist in the standard English alphabet. Somedumbyankee (talk) 12:56, 9 June 2008 (UTC)
I hear we use the Extended Latin alphabet here, which is why a while ago the software was changed in order to include diacritics in the titles (previously it couldn't). BalkanFever 13:00, 9 June 2008 (UTC)
It's supported because the scope of the project now includes topics that don't have common English names and using the diacritics makes the most sense. You can use Cyrillic and Greek and kana, but they clearly aren't English usage (try searching for π or BORДT), and the guideline says to use English (generally if there isn't any English usage, it's not notable, but there are exceptions). Somedumbyankee (talk) 13:13, 9 June 2008 (UTC)
Kotniski, you wrote: "I also don't believe that inconsistencies in e.g. Britannica's style are any excuse for our not trying to be consistent ourselves." We have a simple, consistent rule: do what the reliable sources do. This is in line with all the policies and most guidelines (KISS).
But reliable sources do different things. Indeed it is far from clear to what extent any particular source is reliable. If your "simple consistent rule" actually worked, we wouldn't have these endless discussions. This isn't an issue that needs to be complicated; if we work on it we can surely reach consensus for a sensible and unambiguous guideline which would be easy to apply in 99% of cases.--Kotniski (talk) 13:56, 10 June 2008 (UTC)
It does work. Disruptive nationalists ignore it. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
BalkanFever, you wrote: "Also, using diacritics (for a person's name, at least) is correct." Where did you get that idea from? AFAICT there is no such thing as correct usage in English, only usage (as is shown by the compilation of the Oxford English Dictionary). If we stick to using reliable English sources for the spelling of names (and presumably in most cases the sources used in an article are reliable sources), then we keep to Wikipedia policies and guidelines. If reliable English sources use accent marks so should we, and if they do not we should not. For example, the article Zürich is under a "ü" even though Zurich is not pronounced that way in English. The Zürich page has been there for a number of years, after starting out as Zurich, and it ought to be moved back, as in English it is commonly spelt with a "u" and not a "ü". The use of accent marks should not be a consideration for pronunciation (if it were, then we could go with the Economist guidelines [2]); what matters is that we follow naming conventions and other policies. --Philip Baird Shearer (talk) 13:40, 9 June 2008 (UTC)
I don't doubt that an average native speaker would find "Djokovic" more readable than "Đoković"; an average native speaker would also find Sports Illustrated more readable than Wikipedia. There isn't a single standard on matters of English style and usage (with respect to diacritics or otherwise): there is a continuum (and a trade-off) between maximum readability/familiarity/convenience, and maximum correctness. Everything between the extremes in that continuum is at least permissible. (And the argument "But this is not English!" does not hold.) Still, it is important to note that, as the standards go up (hopefully, in this example case, from popular weekly magazines to Wikipedia and similar reference works), the balance invariably shifts towards correctness - the question is only how far. For readability, familiarity and convenience, we might as well consult Sports Illustrated, but for correctness... And I don't think I'm being elitist here: this is some kind of encyclopedia after all. GregorB (talk) 13:19, 9 June 2008 (UTC)
Therefore the native spelling (and possibly a pronunciation guide) is required to be included in the article's lead. --HJensen, talk 18:31, 9 June 2008 (UTC)
GregorB: "Wikipedia is not a crystal ball. It is not our business to predict what term will be in use; but to observe what is and has been in use, and will therefore be familiar to our readers. If Torino ousts Turin, we should follow; but we should not leap to any conclusion until it does." (WP:UE, based on the WP:NOT policy). Use the common name (WP:NC) and keep it simple. --Philip Baird Shearer (talk) 13:55, 9 June 2008 (UTC)
I'm not saying that Wikipedia should "predict" anything, this is a distortion of my argument. That Turin should take precedence over Torino (or, say, Joan of Arc over Jeanne d'Arc) is also not disputed. Anglicized names such as these should be somehow sourced; obviously they didn't appear from thin air. My contention is this: "simply stripping diacritics" ≠ "Anglicization", ergo sourcing should not apply, etc. - my previous comment. GregorB (talk) 16:06, 9 June 2008 (UTC)
But quite frequently stripping diacritics is anglicization. There are three major entries under Dvorak, which are all originally the same Czech name: one, the composer Antonin, has come to be spelled with diacritics within the last fifty years (as has Šafarík); the others, the actress Ann Dvorak and the inventor of the Dvorak keyboard, were Americans and did not use them; nor are they now so spelt. The diacritics went the same way as the feminine Dvorakova. Septentrionalis PMAnderson 20:07, 9 June 2008 (UTC)
They may have legally changed their names, which is reason enough. There is a tennis example of this: Monica Seles. No contest there. GregorB (talk) 20:20, 9 June 2008 (UTC)
Which would require us to do original research to see if they have. Much simpler to observe that the sources for Ann Dvorak all spell her without diacritics. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
To PMAnderson: you mentioned one source using "Tudjman". The fact is that the "Dj" combination may still be used as an alternative to "đ" even when the rest of the letters are marked with diacritics. To everyone else: diacritics were originally dropped either because the original printing machines were not designed to recreate them, or because the original editor was too lazy, sloppy or simply ignorant to take any notice. The fact is plain and simple: diacritics are additions, not letter replacements; they complement the grapheme, and as such, they cause no difficulty when reading. Ţō ṭáķè àñ éχãṃρłẽ, ωĥö ṝẽâłłγ ṣṭřúģģļèš ťó ŕéáđ ţħïš ??? Every character in that last sentence is alien to its plain counterpart among the 26 basic letters of English. The human brain copes with diacritics by ignoring them when it is unsure how the letter is supposed to be pronounced in the source language. As some have already pointed out, names printed without diacritics are not transcribed into English, because if they were, whoever transcribed them would have to do an awful lot more to match the shape with the expected pronunciation. For example, the letter c appears three times in Croatian/Bosnian etc., twice with a diacritic and once without. Without the diacritic, it represents the sound of the "zz" in pizza, and the closest you can get to it in English is "ts". The other two characters (č, ć) are post-alveolars, so a plain English c can never render the sounds of their Croatian counterparts. I won't go any deeper into this, but I will say one thing. Here on the free encyclopaedia, we can all write names as we choose. If someone comes along and amends a name by adding a diacritic, or moves a page to the relevant name with the diacritic, it is primitive to revert it: it takes us backwards when our purpose is to be knowledgeable and somewhat advanced.
I accept that no tennis lover can be familiar with every language of the world. So if he/she wishes to use "Ivo Karlovic", then that is fine; nobody need take exception. If then a reader with a knowledge of the South Slavic written languages reads it and changes it to "Karlović", let us be grateful that he/she is aiming to improve the article quality by adding accuracy. There is one more thing still unmentioned regarding names of foreign origin, and I now wish to raise it: we've discussed diacritics, and many feel that they are un-English. Then how does one react on learning that certain features of people's names are in fact digraphs? Two letters side by side, devised to represent one single sound. "Dj" is an example of a digraph, as are Croatian/Serbian/Macedonian Lj and Nj, Spanish LL and Polish Cz. Hungarian even has trigraphs, as in Dzs. If diacritics are alien, then how bad are multigraphs? Just because the average person is not aware of this does not change things, because he probably won't know how to pronounce it in the first place. Take a common Hungarian surname, such as "Kovács"; take off the diacritic and you get "Kovacs". I challenge anybody with no knowledge of Hungarian to pronounce the consonants of that word as they would be in the source language. If the clever commentators (as is so often the case) try to make spectacles of themselves by showing off and giving a tennis player his/her "native sounding" name, then people will be thrown by the overall spelling (why does Kovacs have a "C+S" if it is pronounced such and such?). I ask that opponents of diacritics consider these things. Evlekis (talk) 13:28, 9 June 2008 (UTC)
Evlekis, why not simply stick to the WP:NC policy and use the name as it is spelt in reliable sources? Then for most names we do not have to concern ourselves with whether to use accent marks or not, and the policy is neither pro-diacritics nor anti-diacritics; instead, "Wikipedia does not decide what characters are to be used in the name of an article's subject; English usage does. Wikipedia has no rule that titles must be written in certain characters, or that certain characters may not be used. Follow the general usage in English verifiable reliable sources in each case, whatever characters may or may not be used in them." (WP:UE) --Philip Baird Shearer (talk) 13:46, 9 June 2008 (UTC)
In nearly all cases, though, people can find reliable sources which use diacritics and other reliable sources that don't. Then there are endless disputes about which sources are reliable, whether their decision to (not) use diacritics was made for reasons which are relevant to us, etc. etc. Why can't we just agree to settle the issue once and for all? Either style is perfectly good English; there are good practical reasons why the encyclopedia will be that much better with the proposed style (which in practice is usually followed already); so let's just adopt that style as we have adopted many other conventions for the good of the project. (But for my view on the Dj thing, see below.)--Kotniski (talk) 16:39, 9 June 2008 (UTC)
Some guy wants to look up this awesome tennis player he saw on TV and he goes to Wikipedia, but he can't find it because no one has put up the redirect yet. The overall sense of WP:Naming Conventions is that recognizability is more important than absolute accuracy. With redirects in place this may be a strange choice, since people looking for Đoković under "D" will still find it (not true for a paper encyclopedia). Dead-tree versions of this project have been considered, though, and where would this article go in an English paper version? I just don't see how it matters a whole lot either way for the online version, so I prefer to stand by what has already been decided. The alternative is to restart every single edit war that has happened over the previous incarnation of the guideline. Somedumbyankee (talk) 13:52, 9 June 2008 (UTC)
(ec) In my view Đ shouldn't be treated as just another diacritic, precisely because it is transcribed Dj instead of D, and thus hinders recognition of familiar names (and I presume that Dj is unambiguous for those who know the relevant languages anyway - correct me if I'm wrong). But this rather tricky point (trickiest when there are other diacritics like ć in the same word) shouldn't distract attention from the main issue - in the vast majority of cases, adding diacritics makes a familiar word no less recognisable even for people who are used to seeing it without them.--Kotniski (talk) 16:39, 9 June 2008 (UTC)
Searching Wikipedia through Google always works; Google apparently maps the diacritics to English letters and vice versa. Problem (or non-problem) of recognizability has been well illustrated by Evlekis in his comment above. Đoković will appear where DEFAULTSORT says: the practice thus far was to omit diacritics from sort keys (and rightly so), so no problem there, you'll find him under "D". Stare decisis? I'd say dura lex, sed lex. :-) And I foresee a lot of edit warring about it in the future. GregorB (talk) 16:23, 9 June 2008 (UTC)
The clearer we state it now, the easier it will be to deal with edit wars in the future (one side will have a clear guideline on their side).--Kotniski (talk) 16:39, 9 June 2008 (UTC)
There is a clear guideline which conforms with WP:NPOV and WP:NOR. Since there isn't much of a consensus to change it other than this WP:POVFORK of a guideline page, it remains. The guideline as is doesn't favor or oppose use of diacritics, it just says "Wikipedia is a follower and not a leader in all things, including English language usage." Somedumbyankee (talk) 23:44, 9 June 2008 (UTC)
Which is not a clear guideline at all. It remains deliberately unclear, I suspect, because of failure to obtain consensus in the past; and the fact of its lack of clarity leads to time being wasted on debates like the tennis player one. And while WP does not invent its own version of English, it does have the ability to make a reasoned choice between equally correct styles of English, and to strive for consistency within the project. (The POVFORK charge is nonsense, of course, since this is a proposal to change the wording of current guidelines - though not current practice in the vast majority of cases.)--Kotniski (talk) 07:18, 10 June 2008 (UTC)
If this is a proposal to change current guidelines, it should be on the page with the current guidelines instead of having its own page. The current guideline is plenty clear in that it states exactly what kind of evidence should be used to make these decisions. If the evidence is unclear, it falls back on native usage and the "correct" spelling. The result is not what some people want, so the response ends up WP:IDHT. Somedumbyankee (talk) 13:04, 10 June 2008 (UTC)
Well, the proposal is a bit vague as it stands, and anyway would affect several guideline pages, so I thought it would be better to start it off on a separate page and try and develop it into something concrete. Unfortunately people seem to be clinging rather unconvincingly to the status quo, claiming it is "clear" without addressing the fact that attempts to apply it often lead to massive and inconclusive discussions.--Kotniski (talk) 14:07, 10 June 2008 (UTC)

Arbitrary break

As I've stated in similar discussions in the past years, whenever the only purported difference between a "foreign" word and the "English" word lies in omitting diacritics, Wikipedia should not omit the diacritics,

  • as it does not really enhance readability to omit them,
  • as it is more correct to include diacritics,
  • as Wikipedia in my opinion should not strive to simplify its content, in general,
  • and so on and so forth.

I know that there's a faction of editors which agrees with me and an about equally large faction which vehemently disagrees, so I really don't see what we'll get out of this umpteenth repetition of this discussion...? —Nightstallion 22:39, 9 June 2008 (UTC)

Because the anti-diacritic guys say "use English" when most of the time it's not actually English. The anti-diacritic guys say do what the reliable sources do: why are the sources reliable? They are judged on their reliability by the accuracy of their information, by their neutrality (in some cases), not by their diacritic use. If there is such a thing as an authority on diacritics, it's not Britannica. It's not the ATP. It's not a peer-reviewed paper on molecular biology. They don't seem to have any arguments. I'm basically summing up things that have been said by a number of guys here: purposefully omitting diacritics is going backwards. Why should the encyclopaedia stop educating? Because some guy doesn't like seeing ŵøřđş łīķè ←those? How about we just ignore all rules on this one and simply adopt and implement the guideline? BalkanFever 09:27, 10 June 2008 (UTC)
Well, it's pretty much implemented already (as I've already said, I believe it reflects widespread current practice). I don't think you can use WP:IAR to adopt new rules though (sounds a bit paradoxical, and anyway won't work).--Kotniski (talk) 14:07, 10 June 2008 (UTC)

To Philip Baird Shearer, in response to me yesterday: you mentioned "reliable sources." Can I point out that this is a presentation debate rather than a content dispute, whereby Source A is seen to be more reliable than Source B for such and such a reason when they give diametrically opposed accounts of a given scenario (such as a political incident)? Perhaps a more appropriate term would be "reputable": you can argue that the Mid-Afternoon Echo (fictional) is more reputable than Socialist Millionaire Weekly (also fictional), but again, it is one's own subjective verdict. I think what you will find is that a source will either use diacritics, or it will not. You won't, for instance, find the forms Slobodan Živojinović and Novak Djokovic on the same line. Such a finding would certainly move you closer to establishing that the latter really is the conventional English form, given its appearance alongside another name which includes diacritics; though I am sure that if diacritics are omitted in one place, they will be omitted everywhere. And if diacritics are not used for Croatian/Serbian names, then they won't be used for any language. But going back to "reputable", yes, I accept that it is a subjective term. But to ask your view, PBS: do you not consider an article which does contain diacritics as having the qualities of what you would call reputable? Do you not feel that omitting them is the hallmark of a lazy, arrogant, ignorant, "couldn't care less" attitude? In any case, I still do not see a problem with everyday editors reintroducing the diacritics, as they may do in good faith. Evlekis (talk) 09:52, 10 June 2008 (UTC)

"Which they may do in good faith." Yes, they usually do, but only in the peculiar Wikipedian sense of good faith, roughly synonymous with "good intentions", which is compatible with ignorance, illiteracy, and collective self-pity. Septentrionalis PMAnderson 15:40, 10 June 2008 (UTC)
Evlekis, what is or is not a reliable source is described in WP:SOURCES, and to a large extent what is reliable depends on the subject of an article. If one is looking at a historical figure, then the reliable sources tend to be journals and books. But if one is looking at reliable sources for popular sportsmen and women, it tends to be "magazines ... and books published by respected publishing houses; and mainstream newspapers". For example, the article on the footballer Nikola Zigic should not have diacritics, because the vast majority of reliable English-language sources do not use them. However, the name for the article on the war criminal Zoran Žigić is more difficult to determine, because the reliable sources are split (and include the ICTY trial transcripts), although the majority seem to favour "Zoran Zigic". "do you not consider an article which does contain diacritics as having the qualities of what you would call reputable?" The judgement on reliability has little to do with whether a source does or does not use diacritics. For example, a contributor to an internet forum may or may not use diacritics, and their use or non-use will not make the forum any more of a reliable source. "Do you not feel that omitting them is the hallmark of a lazy, arrogant, ignorant, "couldn't care less" attitude?" No; for example, I see nothing wrong with The Economist's approach [3], and I don't consider the correspondents of The Economist to be "lazy, arrogant, ignorant" or to have a "couldn't care less" attitude. --Philip Baird Shearer (talk) 13:03, 11 June 2008 (UTC)
Exactly, reliability has nothing to do with diacritic use, so it is not an argument that "most reliable sources don't use diacritics". BalkanFever 08:04, 12 June 2008 (UTC)
BalkanFever, I have difficulty understanding the point you are trying to make. When looking at reliable English-language sources for different articles, in most cases there will be a clear indication in the sources of whether accent marks are used or not, in which case Wikipedia policies and guidelines indicate that we should follow the lead given by reliable sources (WP:V, WP:NOR, WP:NC and WP:UE). In some exceptional cases the person or place may be notable but there will be no English-language reliable sources, in which case we use the local spelling if it is in a Latin script (WP:UE#No established usage). Finally, there will be some cases where the name appears in several different spellings and/or with or without accent marks, and there is no clear common usage. In these cases, if there is no consensus on the correct spelling of the name to use, it may be necessary to use the WP:RM procedure to decide the issue.
BalkanFever, let me give you a non-accent-mark example which will help to clarify the issue for you. Should we name the article about the Prussian General Hans Joachim von Zieten, Hans Joachim von Ziethen or Johann Joachim von Ziethen? In such cases the name used is determined by looking at reliable English-language sources and determining which is the most common. If not, how do you think we should determine the spelling of people's names? Exactly the same procedure is used for accent marks; why do you think that accent marks should be an exception to the rule? --Philip Baird Shearer (talk) 09:32, 12 June 2008 (UTC)
If I can answer instead, accent marks should be treated differently because (a) the decision whether or not to use them in a particular source is likely to have been an editorial or stylistic one (possibly affected by technical restrictions which don't apply to us) rather than one of factual accuracy; (b) including accents makes a name no less recognisable to those who are used to seeing it without them, while dropping them reduces the encyclopedia's information content (see many arguments to that effect in this discussion and elsewhere).--Kotniski (talk) 09:42, 12 June 2008 (UTC)
Kotniski, to address your points: the first one is a guess, and if it were true, would we have to write général and hôtel, as they too are borrowed words? Or do you have some criterion other than the content of reliable English sources to decide this issue? The second one is misleading, as we would not need national varieties of English unless people found spelling mistakes and some grammatical constructions grating. I suspect that for many English-speaking readers, seeing Funny Foreign Squiggles on names that do not usually have them is as annoying as the word color spelt colour. And I suspect that, as you are in favour of diacritics, you find it grating when they are not present. The current guideline, of following reliable sources, is consistent with Wikipedia's three major content policies and with the naming conventions (also a policy). It is also a reasonable compromise between the two poles of all or nothing when it comes to accent marks. --Philip Baird Shearer (talk) 10:49, 12 June 2008 (UTC)
I don't intend this proposal to apply to common English words like general and hotel, only to foreign names (people, places) - if that isn't clear from the wording as it stands then it certainly should be in any final version. Well, funny squiggles may annoy some and their absence may annoy others, true, but the difference between them and the "u" in colo(u)r is that they do actually add information, which is what we do. And having a situation where they are sometimes used and sometimes not, for reasons which will not be clear to the reader, is particularly likely to lead to misunderstandings. --Kotniski (talk) 11:00, 12 June 2008 (UTC)
We are talking about names of people here, not borrowed words. BalkanFever 10:58, 12 June 2008 (UTC)
Kotniski has pretty much summed it up. If you are going to base diacritics vs no diacritics in the title on the majority of reliable sources, then those sources used as evidence have to show that they use diacritics at some point. Otherwise, it can only be assumed that it is a technical restriction or stylistic decision. I already knew how the non-diacritic procedure works, and I support using English in a case such as the Prussian guy (everyone here does). But to repeat myself for the umpteenth time, dropping diacritics is not using English. BalkanFever 10:54, 12 June 2008 (UTC)
BalkanFever, you write "But to repeat myself for the umpteenth time, dropping diacritics is not using English." Is this a personal opinion, or do you have an authoritative source that backs up the statement? If what you say is true, then why is it that many reliable English-language sources drop accent marks on many names? And please do not put it down to laziness or ignorance, as we have discussed reliable sources that drop accent marks in some cases, either as a known editorial policy (such as The Economist) or, like Britannica, using some editorial criteria that remain opaque to us. The result is that both publications use Lech Walesa; are you really suggesting that they are not using English when they do so? Current Wikipedia content policies on this issue follow a policy of "English usage", a policy that is exemplified by the compilation of the Oxford English Dictionary. --Philip Baird Shearer (talk) 13:32, 12 June 2008 (UTC)
If they called him "Lewis Wales" or something that would be an English name. "Lech Walesa" is a Polish name without the diacritics, therefore an incorrectly spelt Polish name. He doesn't have an English name. Names of foreigners hardly translate. Sure, names of historical figures, but not names of contemporary sportspeople or politicians. BTW, can you tell me of a benefit of omitting the diacritics? Not "it follows the guidelines" but an actual benefit to the reader. BalkanFever 13:58, 12 June 2008 (UTC)
The benefit of using the common English spelling for "Lech Walesa" is that "Generally, article naming should prefer what the greatest number of English speakers would most easily recognize, with a reasonable minimum of ambiguity, while at the same time making linking to those articles easy and second nature." (WP:NC), and as I mentioned above the current policy is also a good compromise. BalkanFever, your edit to this proposed guideline is already moving in the direction of the current WP:UE guideline. How is "Where the person or place has a common English name:Bucharest over Bucureşti; Geoffrey Keating over Seathrún Céitinn" different from WP:UE "Use the most commonly used English version of the name of the subject as the title of the article, as you would find it in verifiable reliable sources (for example other encyclopedias and reference works)"? --Philip Baird Shearer (talk) 18:51, 12 June 2008 (UTC)
This seems to be exactly the same question which I already answered above with (a)(b)(c) points. Basically, if you recognise Walesa you will also recognise Wałęsa, but if you recognise Bucharest you won't necessarily recognise Bucureşti.--Kotniski (talk) 21:08, 12 June 2008 (UTC)
The wording of the amendment says "Where the person or place has a common English name" tack on "in verifiable reliable sources" and that is what WP:UE says. WP:KISS. --Philip Baird Shearer (talk) 22:51, 12 June 2008 (UTC)

Oppose. I agree with the arguments put forth by others in opposition to changing the policy; so, I won't repeat those arguments now (please ... hold your applause). But I do have one thing to add. Maybe I missed the argument somewhere, but using diacritics while editing Wikipedia is a PITA (pain-in-the-ass) if the only way you can use them is to hunt-and-pick the character in the space below the editing screen. Many editors are not going to do that because: (a) they don't know that the purpose of the box below the editing screen is to make the insertion of "weird" characters easier; (b) the characters are in such a relatively small font and packed so closely together that picking the right one is problematic for those with eyesight problems or for the plain lazy editor; and (c) for many English speakers, the only time they've ever seen diacritics is when something goes horribly wrong in their word processing software or they accidentally pick the wrong font, which means that they will assume the diacritics are erroneous and should be "corrected." So, a "use diacritics" policy throughout English Wikipedia would result in articles with diacritics in their titles becoming: (a) less likely to be edited, especially by the inexperienced; and (b) a hopelessly inconsistent hodgepodge of words-with-diacritics, words-without-diacritics, and words-with-the-wrong-diacritics, leading to confusion, frustration, and wasted editing time. The fact that anonymous IP address editing is allowed and even encouraged on Wikipedia is clear evidence of our policy to encourage "anyone" to edit. Therefore, placing obstacles in the way of editing on English Wikipedia is not the way to go. Tennis expert (talk) 03:53, 14 June 2008 (UTC)

Umm, the diacritics debate is far more relevant to the article location (the title) than to the text. Obviously there is no way to get any form of consistency (regarding any issue) across the board in 2 million articles; but generally if I see names spelt incorrectly I correct them. And nobody should revert me if I do. The titles should contain diacritics, and that is what is being argued for. What we don't want is mass moves (or mass requests) that throw the encyclopaedia backwards because people like you don't like them. There's a benefit in using diacritics; there's no benefit in omitting them. BalkanFever 05:12, 14 June 2008 (UTC)
I don't think the "you don't like them" statement above is fair. It would be like if I said that there SHOULD be no diacritics on the English wiki just because YOU like them. That doesn't sound very good, does it? So please stick to arguments.--HJensen, talk 20:07, 14 June 2008 (UTC)

Tudjman

To Evlekis: every English source I have ever seen (and I followed the Balkan Wars throughout his presidency) uses Tudjman, except Wikipedia; I mentioned the Britannica because, and only because, it was the source under discussion. The suggestion that we should use a form used only by an extreme minority is contrary to the clear purpose of our naming conventions: to be intelligible to English speakers. Septentrionalis PMAnderson 20:00, 9 June 2008 (UTC)

As a personal matter, even though I recognize Tuđman as the Croatian form, it is much harder to read and to recognize than the English version; this is why English sources don't use it. Septentrionalis PMAnderson 20:00, 9 June 2008 (UTC)
I did say, dj causes no problems, it is acceptable in Croatian writing alongside the other characters with their diacritics. Evlekis (talk) 09:29, 10 June 2008 (UTC)
Exactly my point - I'm from the Balkans, but if you show me Tuđman I'll just say "What the hell is that? Who's that guy and how am I supposed to read that?" But if you show me Tudjman I'll have no problem with it. --Laveol T 20:34, 9 June 2008 (UTC)
Dj is actually used in Croatian and Serbian though, as the less correct form, if that makes sense. What I mean is in Serbian Djoković is possible, yet still correct to an extent. But going back to the English: after you're told Tuđman is Tudjman, (you will be in the article lead) you will have learned something, no? Or will you forget what the đ is each time? Using the diacritics is educational, to a degree. I'll stop using đ/dj examples now, and move on to the more clear cut. Are you going to say that Ivanišević confuses you but Ivanisevic doesn't? If you have a problem, you automatically ignore the diacritics and focus on the letters, as shown by Evlekis' sentence. Anyone can see that š is plain s with a caron (háček). If they don't know what the caron represents, they can find out, or they ignore it. But at least they know that there is a caron there. There is no benefit to removing them. BalkanFever 08:53, 10 June 2008 (UTC)
I'm saying I have a problem with Ivanišević, but no such problems with Ivanishevich. It reads plain and simple. I told you the same about Djokovich, but you simply ignored it. And again - we have to make sure that the potential reader will actually be able to read the person's name. As I said there are tons of such diacritics that are used in different context in different languages. Am I supposed to be able to read in 30 languages only to understand how the hell I should read a name since it should be in English as this is (for the third time) the English-language Wikipedia? --Laveol T 20:05, 10 June 2008 (UTC)
We should mention Tuđman once, saying that it is the Croatian spelling, as a potentially useful fact; we may even point out that the Croatian alphabet used to spell the sound dj, before Tudjman's birth, but this is probably supererogation. We should not confuse our readers in the hope of educating them; we also betray our mission by suggesting to speakers of third languages that Tuđman is common English usage, or will be intelligible to English speakers. Septentrionalis PMAnderson 15:35, 10 June 2008 (UTC)
As for Ivanišević, we should do what English does, whatever it has come to be. "Be not the first by which the new is tried, nor yet the last by which it is laid aside." Septentrionalis PMAnderson 15:35, 10 June 2008 (UTC)
I understand the issue here, and I must concede this is a good example. It is a bit of a corner case, here "Đ" is problematic, while, e.g. "Š" might not be (as in spelling of Šafárik, which can be sourced, as already noted). But I'm against "hybrid" solutions ("Djoković"), and I'd rather go without diacritics altogether than have some per case hodge-podge solution, whereby Đoković-the-famous-tennis-player is "Djokovic", while Đoković-the-hypothetical-not-so-famous-singer is "Đoković". Let's return to the National Geographic Society's style guide for a second. Taken literally, and applied to personal names, it in fact prescribes Tudjman and Djokovic and Šafárik: Slovak language is on the keep-the-diacritics list, while Croatian and Serbian aren't. I'd prefer this (mutatis mutandis, of course) to the current interpretation of the policy. GregorB (talk) 19:41, 10 June 2008 (UTC)
Hybrid solutions are (almost entirely) against present guidance. Very little English usage accepts only some diacritics and refuses others; there may be a German example. Septentrionalis PMAnderson 20:37, 10 June 2008 (UTC)
Exactly. Still, I thought: "Hey, Đ obviously freaks people out, just like ß does!". More about Gauss follows in the pop quiz section... GregorB (talk) 21:06, 10 June 2008 (UTC)


@Laveol: But this is where you yourself are applying what you are used to (Romanisation of Bulgarian) to a Slavic name. Actually, many people born in America swap the č for ch and š for sh, and at that point that is their name. Charles Buchinsky/Buchinski is English for Karolis Bučinskis, and of course everyone here supports that. Karolis Bucinskis simply isn't. Everyone also supports his common English name Charles Bronson as the location of the article. That is what the guideline is meant for: cases where they have a common name in English. "Ivanisevic" is not a name; it is a construct born out of the laziness of the first guy to write it down and the limitations of the computer of the first guy to type it. "Ivanishevich" is definitely not a name. The argument about diacritics for pronunciation isn't the strongest anyway; IPA is for that. The Bulgarian and Russian Romanisation systems equate the Cyrillic letters with the Latin (specifically English) letters that represent (almost) the same sounds. Czech orthography, and later Gaj's Latin alphabet (Croatian, and Romanisation of Serbian), went the other way, using diacritics. Djokovich means Ђоковицх. Ivanishevich means / ivanishevitsh /. So unless the variants you mention actually exist (with Bulgarian and Russian they exist inherently, but not for others), then no. Otherwise I will let you decide what the ř in Antonín Dvořák should be rendered as all on your own. BalkanFever 08:01, 12 June 2008 (UTC)


Charles Bronson changed his name to that form himself, legally, in his legal documents, and he himself used that name. He decided himself not to be Karolis Bučinskis. So that comparison with Tuđman makes no sense: Tuđman never signed himself as Tudjman! So that version of the name has no legal validity. --Anto (talk) 19:49, 12 June 2008 (UTC)

Clarity

I don't understand how the present guidance is unclear: use what your sources use, unless there is a clear demonstration that English usage as a whole differs. It may be useful to adopt that phrasing somewhere; but the intent is plain. Septentrionalis PMAnderson 20:37, 10 June 2008 (UTC)

It's not unclear as much as it is obscure. I was completely unaware that the titles like "Goran Ivanišević" go against the guidelines until the latest tennis renaming affair - and I have 54,000 edits, so I guess I would have noticed it by now. GregorB (talk) 23:50, 10 June 2008 (UTC)
It depends. If the sources say Ivanisevic and this appears to be English usage, then WP:UE supports that; on the other hand, if they say Ivanišević, WP:UE supports that. It's a question of fact, like his height and his scoring statistics. Septentrionalis PMAnderson 00:58, 11 June 2008 (UTC)
Yes, but off the top of my head I can't remember a single Croatian name with diacritics that would be supported by current WP:UE - yet they were all there with diacritics until recently. (Same with other languages/alphabets.) The guideline was apparently there all the time, but it wasn't enforced at all. GregorB (talk) 11:51, 11 June 2008 (UTC)
Are there any that made it through a formal (i.e. GA or FA) review process with article names with diacritics contrary to sources? Articles on wikipedia not following all applicable guidelines isn't surprising. Somedumbyankee (talk) 12:55, 11 June 2008 (UTC)
Almost certainly. Some articles get through FA and GA with no review of content at all. Septentrionalis PMAnderson 17:19, 11 June 2008 (UTC)
Diacritics are an issue of presentation, not content. I don't know, the FA process is rather pedantic, and it's odd how no one (to my knowledge) raised the issue. If the guideline itself is clear (and I'm not saying it isn't), what caused this situation then? GregorB (talk) 18:07, 11 June 2008 (UTC)
There have always been the two schools: "All diacritics must be used because they're correct" and "All diacritics must go because none of them are English"; WP:UE contains an effort to decide between these on a case-by-case basis. Both schools ignore English usage in the effort to get their way; but the guideline usually prevails on the balance of power, and of arguments.
The tennis articles were proposed for moving by an editor who is personally of the diacritics-are-scum school, which is not yet represented in this discussion; but he's right insofar as he was moved to act by a bunch of names that he, an experienced tennis fan, had never seen before. He went too far in including Björn Borg, I think; but that is the sort of thing evidence should decide. (Not Swedish usage; English is not Croatian, and always has adapted proper names when it feels like it.) Septentrionalis PMAnderson 18:58, 11 June 2008 (UTC)

You are wrong on both counts. I've never said that "diacritics-are-scum." And I've never said that I've never seen tennis players names with diacritics. Thanks for completely misrepresenting my opinions without any basis in fact. Do you do this often? Tennis expert (talk) 04:09, 12 June 2008 (UTC)

All I know is what you have said. If you have said anything inconsistent with my interpretation, I have not seen it. Feel free to explain at length. Septentrionalis PMAnderson 18:28, 12 June 2008 (UTC)
How disingenuous of you. Either cite where I said the things you are accusing me of or strike them. Your imagination is running wild. Tennis expert (talk) 02:04, 13 June 2008 (UTC)

Pop quiz

So which of the following would you spell with a diacritic:

Replies? Septentrionalis PMAnderson 15:35, 10 June 2008 (UTC)
Are 1) and 2) trick questions? Because there are no diacritics in the titles of these two articles at German Wikipedia, and I have no reason to doubt Germans got it right.
(1) is; (2) is not: there is (minority) German usage for Göthe. But we have already had, even without this page, a good soul trying to clean up Emmy Noether on the assumption that the diacritic must be right. Talk:Emmy Noether contains much more. Septentrionalis PMAnderson 21:34, 10 June 2008 (UTC)
Gauss/Gauß is very tricky. I can observe three things: 1) speaking from experience, Croatian usage respects original spelling from all Latin-based alphabets (within technical limitations, and not too stringently applied in less formal writing), so umlauts are fine but "ß" is next to impossible to encounter in Croatian, which is rather illustrative, 2) interwiki links at Carl Friedrich Gauss are interesting - I don't see a particular logic there, and 3) even in German mainstream usage "ss" is acceptable, and there is a recent general trend of moving from "ß" to "ss" (I could be wrong here - correct me). I'd go with "Gauss". Rationale to follow, first I'd like to see other replies. GregorB (talk) 21:22, 10 June 2008 (UTC)
You are incorrect only in that the trend is not recent. English mathematical usage is invariable: Gauss, and we should go with it, rather than import other conventions. Septentrionalis PMAnderson 21:34, 10 June 2008 (UTC)

Gauß is indeed tricky insofar as he very often used his latinised name, which was also Gauss. —Nightstallion 08:33, 11 June 2008 (UTC)

It's not tricky at all... Looking at the sources in the article, it's very clear that articles in English would never use the ß. Articles in German might have a problem, but here it is unambiguous. Somedumbyankee (talk) 13:40, 11 June 2008 (UTC)
Of course it's not tricky with a guideline like this, a guideline that says "do what others do". Imagine the following Wikipedia guidelines. How does one write dates and date ranges? "Do what others are doing." Should italics be used for book titles? "Do what others are doing." You'll notice nothing is tricky, everything is clear, all answers are there. This is because this is not a guideline at all. Only when you want to have an actual guideline does it become tricky, because you have to decide what you want and devise the rules that will achieve that goal, not just copy and paste the end result. See also the example of Đoković-the-tennis-player and Đoković-the-singer. GregorB (talk) 18:24, 11 June 2008 (UTC)
Very well; let's consider that example: does English italicize book titles? Yes; look at the bibliography of any half-dozen well-printed English books: nine times out of ten all six of them will do so, and the remaining time there will be one exception. This is not hard; but it does require the ability and willingness to read English, and the patience to do what English does. We don't need to redesign the English language, and we don't need this kind of guideline. Septentrionalis PMAnderson 19:07, 11 June 2008 (UTC)
I don't understand your comment. Wikipedia guideline on the question "should italics be used for book titles?" isn't "do what others are doing", it is yes. Is this a kind of guideline we "don't need"? GregorB (talk) 20:19, 11 June 2008 (UTC)
Going back a bit, "Of course it's not tricky with a guideline like this, a guideline that says "do what others do"." That is exactly the point. It usually provides a straight answer. It defers the decision to a reliable external source (same thing we do with all other facts). "Use italics for book titles" more or less defers to every English style guide I've ever seen (except when writing longhand, where underlining is used instead). If English grammar changed on how books were cited, we would probably change too. Somedumbyankee (talk) 13:06, 12 June 2008 (UTC)

Added exception

I have amended the proposal with an exception to address the Dj/ss issue (basically I'm saying we should use dj and ss rather than Đ and ß). The wording may obviously still need work. As to the general principle of the whole proposal, I still haven't seen any convincing reason given against it. The basic argument against it seems to be that we should "follow English usage"; but English usage is divided on this matter, and I see no reason not to adopt a consistent style which will help users of the encyclopedia (this has been argued many times without apparent refutation) and is pretty much in line with well-established practice on WP.--Kotniski (talk) 08:22, 12 June 2008 (UTC)

Is ß actually a diacritic? I think I read somewhere (probably WP:UE) that it's a foreign letter rather than a modification. BalkanFever 08:44, 12 June 2008 (UTC)
I think you're right, but it should still probably be mentioned in any guideline like the one being proposed, just to make things clear to people.--Kotniski (talk) 08:59, 12 June 2008 (UTC)

I vehemently disagree with that. If the common English usage is to change ß to ss, we should do it -- however, for names which are not commonly seen in English (a street name in a German-speaking city which is not regularly the subject of English-speaking people's attention, for instance), we should keep it to be "-straße", not "-strasse". It's a question of endonym vs exonym, and simply changing ß to ss does not spontaneously create an English exonym if there wasn't one before. —Nightstallion 10:29, 12 June 2008 (UTC)

But it could also be written with an -ss- in German, right? Which means it wouldn't be a totally original invention.--Kotniski (talk) 10:44, 12 June 2008 (UTC)
It could be, but only in the case of technical restrictions. Which don't apply to Wikipedia. —Nightstallion 12:05, 12 June 2008 (UTC)
If there is no English coverage of a German street, should we have an article on it? If there is, why should we not follow the usage of our sources? Septentrionalis PMAnderson 18:26, 12 June 2008 (UTC)
Per my recent comments, I'd generally support a consistent solution over individually sourced deviations from the original spelling. So, for ß: "it is always 'ss', unless there is a particular reason to keep it". I could support the same for Đ (always "dj"), but in this case, as previously noted, other diacritics should be stripped too (e.g. "Djoković" is out of the question - again, without a good reason to the contrary, and there isn't one). "ß" is otherwise a corner case for reasons already outlined in the pop quiz section, so even the per case decisions (as suggested by Nightstallion) could be OK. As I said, ß is tricky... GregorB (talk) 13:19, 12 June 2008 (UTC)
I oppose, in general, systematic solutions. The relatively narrow deviations from usage involved in WP:NCNT, which are in general justifiable as disambiguations, are hard enough to defend.
And how would this differ from what we now do? What "particular reason" would there be to keep ß except that English sources do? (In practice, however, we will always see the argument that we have a particular reason: it's right in German.) Septentrionalis PMAnderson 18:26, 12 June 2008 (UTC)
I don't know of any "particular reason" why ß should be kept; I can't think of a single such case, but it doesn't mean it's impossible. As for systematic solutions: for the third time I must bring up Đoković-the-tennis-player and the hypothetical Đoković-the-not-so-famous-singer (no English sources on him); they should either both be listed as "Đoković" or both be listed as "Djokovic" (of course, the guideline has to choose the former or the latter; this is another issue). This produces consistent results. If the "obscure" guy isn't mentioned in English sources, you are not going against usage by spelling his name in either way, so there's no harm done. GregorB (talk) 21:04, 12 June 2008 (UTC)
If Đoković-the-tennis-player becomes well known and reliable English language sources spell his name Dokovic-the-tennis-player, then following the current guidelines (WP:NC and WP:UE) we should use Dokovic-the-tennis-player; at the same time, if another person, Đoković-the-not-so-famous-singer, has no English sources, his name should remain Đoković-the-not-so-famous-singer (WP:UE#No established usage). In reality it is much more likely to be a geographic location that is notable but with few or no reliable English sources, hence we have Guantanamo Bay Naval Base and Guantánamo Province. -- WP:KISS -- Philip Baird Shearer (talk) 23:18, 12 June 2008 (UTC)

Attempting to summarize the arguments

This argument is rapidly turning into soup, so I'm attempting to understand the whole of it. I'm leaving out arguments that make little sense to me (e.g. that National Geographic is an unreliable source). I'd also like to point out that the current guideline is very rarely followed (e.g. Slobodan Milošević).

No one has suggested that National Geographic is an unreliable source. If it is used as a reference in an article there is no reason why it along with other reliable sources can not be consulted about the spelling of a name. --Philip Baird Shearer (talk) 14:19, 13 June 2008 (UTC)
Maybe it was over at the tennis discussion, but there were some comments that essentially said "if they don't use diacritics they aren't a reliable source on spelling." Somedumbyankee (talk) 14:35, 13 June 2008 (UTC)
It is alleged that National Geographic always uses diacritics if the native spelling does. As for "if they don't use diacritics they aren't a reliable source on spelling.", one would have to explain why the Britannica Encyclopaedia, which does not use diacritics in its articles on Lech Walesa and Gdansk, is only considered unreliable for certain names, and explain away style guides like the Economist's style guide that deliberately drop diacritics on some words. The assumption that it is just laziness does not stand up to scrutiny. --Philip Baird Shearer (talk) 15:30, 13 June 2008 (UTC)
Ah, OK. I think I kinda skimmed over the argument because it made no sense to me. At any rate, I omitted it from the summary because I couldn't hope to represent it properly. Somedumbyankee (talk) 15:35, 13 June 2008 (UTC)
Poor development choices going back many years, such as using legacy ISO-8859-1, have limited their character inventory. This has had a huge effect on their general editorial guidelines. In general I've found Encarta, particularly in the last few years, to be much more reliable and consistent with academic resources: Lech Wałęsa, Gdańsk. Of course Wikipedia is no longer limited to minimal legacy character sets. Of course it is more difficult to type in diacritics. And that is why it will always be less common. However, an encyclopedia should prioritize knowledge and accuracy first and foremost. Unfortunately that does not happen so often on Wikipedia. Redirects can take care of the rest. 123.224.232.195 (talk) 15:52, 13 June 2008 (UTC)
User:123.224.232.195, you wrote "I've found Encarta, particularly in the last few years, to be much more reliable and consistent with academic resources". On what do you base your analysis of usage by academic resources? Google Scholar returns:
  • about 392 for -Lech-Walesa Lech-Wałęsa
  • about 3,710 for Lech-Walesa -Lech-Wałęsa
--Philip Baird Shearer (talk) 08:08, 14 June 2008 (UTC)

Arguments for this proposed:

  • Using diacritics reflects the native spelling and aids pronunciation for those who know the underlying language.
  • Including diacritics does not change whether the name can be understood by those who are not familiar with the underlying language.
Demonstrably false, as the discussions on various move proposals will show. Septentrionalis PMAnderson 22:05, 13 June 2008 (UTC)
Only false for cases like Đ and ß, as far as I can tell, which are dealt with specifically in the current version of the proposal.--Kotniski (talk) 11:01, 14 June 2008 (UTC)
Also made about Lüneburg, for example; I'm not sure I agree, but then I speak German. We should be edited to be accessible to those who don't. In a different sense, it has been made about the Polish slashed ł; the slash can be unrecognizable in some displays. Septentrionalis PMAnderson 21:58, 14 June 2008 (UTC)
  • The current guidelines are unclear and a clear guidance will reduce disagreements.
The current guidelines are not unclear. --Philip Baird Shearer (talk) 14:19, 13 June 2008 (UTC)
It is the way in which they are to be applied which is unclear.--Kotniski (talk) 11:01, 14 June 2008 (UTC)
No one, despite repeated requests, has come up with a source that states that "Including the diacritics is the correct spelling". However, the Oxford English Dictionary includes entries on usage, and the Wikipedia policies and guidelines use the same criterion (common usage in reliable sources). --Philip Baird Shearer (talk)
Agree that "is the correct spelling" is wrong (should be "is one possible correct spelling"). "Making Wikipedia better" is for me the overriding argument, and has not yet been properly addressed as far as I've seen.--Kotniski (talk) 11:01, 14 June 2008 (UTC)
It would make much of Wikipedia worse. It would inform readers, falsely, that English normally uses a slashed l in Stanislaw Ulam. I'm not sure that literate mathematicians and physicists ever do; I am sure his autobiography does not. Septentrionalis PMAnderson 23:00, 14 June 2008 (UTC)
Cases like Ulam (a naturalized American) would obviously need to be excepted from the proposed guideline. I agree though that sometimes people might draw wrong conclusions of the type you allude to (WP uses this therefore English most commonly uses this). I don't think this is such a great danger, however, because: a) people know that both forms are always correct and the approximate distribution of such forms in English as a general rule; b) people know that encyclopedias are more precise than average English sources. A far more significant potential mis-deduction, resulting from failure to apply the proposed rule, is Wikipedia (a source which generally uses diacritics) has no diacritic in this word therefore there isn't one.
Anybody who thinks WP is more precise than other sources has only himself to blame; we can avoid any misconception by including the original form in parentheses when it differs, as we already recommend. Please consider also Roger Joseph Boscovich, which is the conventional, and almost invariable, English form; Boscovitch also occurs, and should redirect. Septentrionalis PMAnderson 20:52, 15 June 2008 (UTC)
Not sure what you mean by the first sentence (no worries if we convey misinformation, just blame the reader for believing us?) I say something about the parenthesis idea somewhere below (basically it's needless clutter when the only difference is a diacritic). And I agree that Boscovich should be so written - I think that would be covered by the first exception currently in the proposal.--Kotniski (talk) 11:09, 16 June 2008 (UTC)
  • We are no more accurate than other sources; we are less so. Any conclusion based on a demonstrably silly premise is the fault of those who draw it; the position here is much the same as those who believe (because we said so for five minutes) that Alexander Hamilton was a go-go dancer, or (because we said so for a month) that he was educated in New England.
  • What's the difference between Boscovich and Djokovic? Boscovich is common, English, and spelt differently than the Croatian name; so is Djokovic (Djoker, by contrast, is English, spelt differently, and uncommon). Septentrionalis PMAnderson 20:41, 17 June 2008 (UTC)

Arguments against this proposed policy:

  • Existing guidelines are not biased for or against use of diacritics and better reflect WP:NPOV policies.
Against that, use of diacritics is more a style issue where "bias" does not arise; consistency is more important.--Kotniski (talk) 11:01, 14 June 2008 (UTC)
  • The existing guidelines rely on sources rather than the opinion of editors and potential WP:OR.
    • The proposed guidelines are actually more objective and less subject to opinion (about, for example, which sources should be followed in the event of a conflict).--Kotniski (talk) 11:01, 14 June 2008 (UTC)
  • Use of diacritics can cause problems depending on operating system and browser settings and they should only be used if necessary. (I, for one, am seeing a few squares where there should be characters in some of these discussions).
AFAICT no-one on this page has suggested that is a problem. --Philip Baird Shearer (talk) 14:19, 13 June 2008 (UTC)
I'm doing so, see my comments below. Somedumbyankee (talk) 14:35, 13 June 2008 (UTC)
  • Following common usage makes the article more accessible since English writing does not use any of these characters.
The current policies and guidelines do not make this argument; they say to use what reliable English language sources use. If the sources use diacritics, so should Wikipedia; if they do not, then Wikipedia should not. --Philip Baird Shearer (talk) 14:19, 13 June 2008 (UTC)

Meta-arguments:

  • The relationship between this and existing policy is unclear. Is it designed to replace existing guidance?
Yes it is clearly intended to ignore several policies and the WP:UE guideline. --Philip Baird Shearer (talk) 14:19, 13 June 2008 (UTC)
Ignore? Modify would be a better word.--Kotniski (talk) 11:01, 14 June 2008 (UTC)

Any significant arguments that I have misrepresented or outright missed? Somedumbyankee (talk) 02:55, 13 June 2008 (UTC)

I think that sums it up pretty well, nice work! I would rephrase "including the diacritics is the correct spelling", since it is hardly tenable that it is the only correct spelling, and I would also leave out (or strongly disagree with) the last of the arguments against, since "accessible" is unsupported and English writing often does use these characters. I would also add something about current practice, since diacritics do seem to be in very general use on Wikipedia, indicating that the current guidelines are probably not being applied in this matter (practice is one of the determinants of policy).--Kotniski (talk) 07:53, 13 June 2008 (UTC)
Aye, one should probably rephrase that pro point as "can be considered to be the most correct form of spelling". —Nightstallion 09:27, 13 June 2008 (UTC)
There are very few diacritics used in written English, though they show up in a few words sometimes (e.g. café, naïve). Use in café is mostly a foreign branding issue rather than an actual native spelling, and diaeresis use in naive is definitely optional. The New Yorker magazine is infamous for using the diaeresis in spellings that otherwise don't use it, such as "coöperation." It is extremely unusual, but they've been doing it so long that it's just sort of a quirk that everyone tolerates (see The New Yorker#Style). The accessibility issue isn't a straw man: on my current settings, "Ţō ṭáķè àñ éχãṃρłẽ, ωĥö ṝẽâłłγ ṣṭřúģģļèš ťó ŕéáđ ţħïš" leads to 5 characters that display as boxes rather than intelligible writing. Somedumbyankee (talk) 14:35, 13 June 2008 (UTC)
We haven't discussed this much, but characters that display as boxes on a significant proportion of today's browser/OS platforms absolutely are a problem, and the guideline should take this into account. GregorB (talk) 15:30, 13 June 2008 (UTC)
I've tried IE6, IE7, Firefox 2 & 3 - IE6 displays boxes, the other three display it correctly. GregorB (talk) 09:53, 14 June 2008 (UTC)
This is something to be investigated, but since WP uses at least the European diacritics extensively, we assume that it's no longer intended to support legacy browsers which don't display them. And if it is, then we should simply abandon (the relevant) diacritics completely without arguing about sources.--Kotniski (talk) 11:01, 14 June 2008 (UTC)
I have always opposed the position that we "are the encyclopedia anyone can edit if they have good enough software". If English actually predominantly uses the diacritics, we are faced with the choice between two kinds of unintelligibility and must face up to that; but it should be avoided where English doesn't. Septentrionalis PMAnderson 21:18, 15 June 2008 (UTC)
An excellent and much needed summary... One remark, though: I'd leave out WP:NPOV and WP:OR, as these guidelines don't apply here. NPOV and OR are problems with article content, not with Wikipedia guidelines. In a general sense, guidelines should be "neutral" and "not original", it's just that this is not the same neutrality and non-originality that applies to articles as prescribed by WP:NPOV and WP:OR. So the argument summary is essentially correct, only needs rephrasing. GregorB (talk) 08:30, 13 June 2008 (UTC)
I disagree with your analysis as the article content policies do apply because you can not read Wikipedia policies in isolation. For example the name of the article (unless it is merely descriptive), ought to appear on the first line of the article, that makes it content, and content is covered by the content policies. Further most editors would agree that if the article is named Lech Wałęsa the usage after the first sentence should be "Wałęsa" and not "Walesa" and vice versa. This is something specified in the MOS Foreign terms (Spelling and transliteration) "Within an article, spell a name that appears in the article title as in that title (covered in naming conventions) rather than an alternative spelling, unless there is a good reason to do so, such as may be given in Naming conventions (use English).". GregorB a lot of people have spent a lot of time trying to make the policies and guidelines coherent and WP:UE is not a whim of a few editors but an attempt to fit foreign names within the content policies and other policies (such as WP:NOT) of Wikipedia. This proposed policy has already been modified from its original position and the exceptions are already making it more like WP:UE. If one modifies this a little more to comply with the WP:NC policy of "Common English name" then it will have a crude reiteration of WP:UE. --Philip Baird Shearer (talk) 10:51, 14 June 2008 (UTC)
On the other hand, if your analysis is correct, then the entire WP:MOS is, by virtue of being unsourced, one huge WP:OR. That doesn't make any sense to me: guidelines don't have to be sourced, therefore WP:OR does not apply to them. I'm just saying. GregorB (talk) 22:25, 14 June 2008 (UTC)
The MOS is not an article any more than this talk page is. The content guidelines such as the MOS and content policies refer to the content of Wikipedia articles (and portals). For example in this case it does not matter that the proposed guideline uses OR; what matters is that its usage would breach Wikipedia policies if applied to articles. There is a parallel example which is Wikipedia:Naming conventions (names and titles), but there is a consensus that it should be an exception to the common names chiefly because there is no third party common consensus on these names. In this case there is a deep split and no consensus in the community over the general use of accent marks, so the best thing to do is to defer to third party sources, as in the majority of cases usage by reliable third party English sources is clear and is what the content policies and naming conventions dictate, and for those cases where it is not clear what common English usage is (but not no English usage, which defaults to the local spelling in the case of extended Latin alphabet names), it is simpler to agree them on a case by case basis using a WP:RM if a local consensus does not exist on the article's talk page. An alternative to reduce these conflicts would be to do as we do for WP:MOS#National varieties of English and "Retaining the existing variety". --Philip Baird Shearer (talk) 10:36, 15 June 2008 (UTC)
It is simpler to admit that MOS is often original research which a couple of cranks have managed to revert war into guidelines. WP:NCNT is a perfect example of letting convenience prevail when there is no common name, or none we can use: Henry II is the common name for many of the entries on that dab page, but we can't use it for any of them. Septentrionalis PMAnderson 18:24, 15 June 2008 (UTC)

Break 2

Argument for: Using diacritics is in accordance with WP:UE. Guido den Broeder (talk) 09:29, 13 June 2008 (UTC)

This is covered in the meta-arguments section, since both sides are making the exact same claim. Somedumbyankee (talk) 14:40, 13 June 2008 (UTC)
The policy seems clear to me so no, this is not covered. Guido den Broeder (talk) 17:05, 13 June 2008 (UTC)
Using diacritics is sometimes in accordance with WP:UE. This is perhaps not crystal clear, but I don't think it's disputed here. GregorB (talk) 17:41, 13 June 2008 (UTC)
Well, it is now. My position is that is always in accordance with WP:UE. Guido den Broeder (talk) 18:02, 13 June 2008 (UTC)
What part of WP:UE is so badly phrased that always using diacritics, even when, for example, nobody has ever used them in English, is in accordance with it? Septentrionalis PMAnderson
We are talking about a proposed amendment to guidelines; WP:UE would probably be one of the guidelines rephrased as a result of acceptance of the broad proposal. --Kotniski (talk) 11:01, 14 June 2008 (UTC)
The guideline says: "Use the most commonly used English version of the name." Leaving out diacritics typically doesn't yield any version of the name whatsoever, but a different word that is not the name, let alone that it would yield an English version. It is therefore perfectly phrased, IMHO. Guido den Broeder (talk) 11:27, 14 June 2008 (UTC)
On the contrary, this problem chiefly arises when diacritics are customarily dropped in our English sources, and some good soul insists on reinserting them, or as in the case of Noether, inserting them when they never existed. Version was chosen for WP:UE in large part to include this case, and your reading is arbitrary. Septentrionalis PMAnderson 21:31, 14 June 2008 (UTC)
I stand by what I said. Guido den Broeder (talk) 22:22, 14 June 2008 (UTC)
Fine, but your statement "Leaving out diacritics typically doesn't yield any version of the name whatsoever, but a different word that is not the name, let alone that it would yield an English version." is indeed an opinion. But it made me think that maybe the phrasing "English version" in the guideline is misleading? Maybe it should be changed to "English representation". Would that be less offensive? I mean, it is a fact of life that names get different treatments in different languages. The word "representation" is perhaps more "neutral"?--HJensen, talk 22:37, 14 June 2008 (UTC)
(Left) If the diacriticless version is not used in English, it is not an English version; if it is uncommon, WP:UE opposes it. But "Follow the general usage in English verifiable reliable sources in each case, whatever characters may or may not be used in them." was intended to cover this point; usage does include typographical usage, although we can add the adjective if necessary. Septentrionalis PMAnderson 22:44, 14 June 2008 (UTC)
Whatever the original intention, this still seems to be bad policy (for all the reasons set out above) and ought to be reworded in such a way that diacritic/no-diacritic questions are not settled by this rule. Whether something is/is not used (in the sources we know about) does not decide whether it is English. English usage follows certain rules; there are infinitely many good English sentences, for example, most of which will never be uttered. English permits Roman-alphabet-based foreign names to be written either with or without diacritics (evidence for this is overwhelming); you don't need to find an example in a particular case to know that the general rule applies there too. And it's better for an encyclopedia to adopt a consistent and informative style than try to obey an often undecidable criterion.--Kotniski (talk) 11:37, 15 June 2008 (UTC)
Aye. The most correct / original form of the name should be used IMO, but I know that not everyone values WP:PRECISION more highly than WP:COMMONNAMES... —Nightstallion 11:44, 15 June 2008 (UTC)
"And it's better for an encyclopedia to adopt a consistent and informative style than try to obey an often undecidable criterion" I disagree, (I think we should follow reliable third party English language sources), but does this mean Kotniski you would lend equal support to this proposed guideline if it were reworded to say "The preferred style on Wikipedia is not to use diacritics, as the English alphabet only has 26 letters", or are you pushing a specific agenda that tolerates no dissent? --Philip Baird Shearer (talk) 13:02, 15 June 2008 (UTC)
Well noo, I wouldn't support such a rewording because a) it isn't true (diacritics are widely used on WP, and they are certainly admissible in English), and b) such a policy would make WP (somewhat) less informative and therefore a (somewhat) worse encyclopedia. And the problem with the third-party sources of which you speak is that they are not consistent among or even within themselves. --Kotniski (talk) 16:04, 15 June 2008 (UTC)
So you are biased on this issue. I don't think your arguments are coherent. The first one is equally true the other way around. Surely one can put up a contrary argument for the second one: if we follow other reliable sources, the reader of our encyclopaedia can then be confident that what he or she reads here is common practice in reliable English language sources. That is not to say that one ought not write "Lech Walesa (Polish: Lech Wałęsa)", and with such a construction more information is provided (common English usage and the Polish spelling) than simply by placing the article at Lech Wałęsa and not noting common English usage. --Philip Baird Shearer (talk) 19:12, 15 June 2008 (UTC)
I am more persuaded by one set of arguments than the other, if that counts as being biased. Are you suggesting we should write people's names twice every time we mention them, just to tell people what they almost certainly already know, namely that names with diacritics are often written in English without them? There used to be a template for doing this (at the top of Łódź it used to say "this name uses the letters ł, ó and ź, if these are not desired it can be written Lodz" or something like that), but this was deleted a while ago, presumably because it's patronising and places undue emphasis to keep repeating a general rule for every case. You might as well write "colo(u)r" every time to tell people that both spellings are used in English sources.--Kotniski (talk) 08:17, 16 June 2008 (UTC)
Mention them twice in the opening sentence and then use the page name in the rest of the article (as suggested by the WP:MOS). To mention them twice is briefer than the template you are suggesting and more informative, as it tells the person from what language the alternative spelling is derived. It also has the advantage that the article will also be found when searching with search engines which do not combine diacritics with non-diacritics searches. It also helps with names where there is more than one way of spelling it, as with Gdansk or Kiev. As to the comment "just to tell [readers] what they almost certainly already know", in some cases (as with Göttingen) it helps people to know that the reliable English language sources do not drop the diacritics. --Philip Baird Shearer (talk) 17:17, 17 June 2008 (UTC)

The original form of the name is the most correct?

Let's have a source for that opinion. It may well be true of Croatian (although the Croatian Wikipedia shows exceptions), but it is not true in English (it never has been, or we anglophones wouldn't use Rome, Nuremberg, Ovid). If it were true for Wikipedia, we wouldn't transliterate Greek, Russian, or Chinese names at all.

We should not use Croatian customs here. This is the English Wikipedia. The claim above is a fallacy, and if it remains unevidenced, I suggest it be admitted that this proposed amendment has no hope of consensus. Septentrionalis PMAnderson 18:20, 15 June 2008 (UTC)

"Original form is the most correct" would not make much sense anyway until we agreed on what exactly "correct" means. English is indeed not Croatian. Anyway: "Wien" instead of Vienna is confusing in English as much as it is in Croatian. GregorB (talk) 18:39, 15 June 2008 (UTC)
Also, I believe we'd agree that "correctness is good (other things being equal)" and "consistency is good (other things being equal)", even without sourcing. The trouble here is, as I said, not only the exact definition of "correctness", but the fact that, unfortunately, as one goes for correctness and consistency, other things are not being equal, and that needs to be taken into consideration too. GregorB (talk) 19:02, 15 June 2008 (UTC)


Here we go again, PMAnderson. You and your mythology... :(

  • What makes something "regular"? Rules? Which rule of English grammar or orthography says "use no diacritics"? None! Which rule says "transliterate Đ, Ć, Š as dj, ch, sh"? None. So stop repeating that nonsense about the "regular spelling of foreign names":
  • Your comparisons with Greek, Russian, or Chinese names are totally meaningless, because those names are written in a different script.

We don't transliterate names from the same script. --Anto (talk) 19:51, 15 June 2008 (UTC)

    • Since I have never said "use no diacritics", this comment is misdirected. In fact, I support several, both here and elsewhere.
    • This proposal does say to transcribe Đ as dj. Please read it before commenting further; it's not long.
    • Really, those who cannot spell English themselves should show more moderation in their opinions on English spelling. Septentrionalis PMAnderson 20:14, 15 June 2008 (UTC)
The world did not start to function when Columbus reached America. We are familiar with certain rhetoric, as old as the world itself.
Here we see:
http://en.wikipedia.org/wiki/Wikipedia:Naming_conflict#Proper_nouns
  • Is the name in common usage in English? (check Google, other reference works, websites of media, government and international organisations; focus on reliable sources)
  • Is it the official current name of the subject? (check if the name is used in a legal context, e.g. a constitution)
  • Is it the name used by the subject to describe itself or themselves? (check if it is a self-identifying term)
Does the use of diacritics contradict these rules?
  • Sometimes
  • NO
  • NO
English spelling is about spelling words. There are no rules for spelling personal names, not even those from the anglophone world (just an illustration: Sean, Shawn or Shaun, which is the proper spelling? Linda or Lynda?)
I would react to your comments about my spelling but I am afraid that a monoglot like you is totally unable to understand it.
Go ahead, be uncivil; the fact that WP as a whole is written for anglophone monoglots doesn't mean all of us are, nor that we lack access to dictionaries. Septentrionalis PMAnderson 02:02, 17 June 2008 (UTC)
I bet you would like me to, so that you can get me eliminated from the discussion. But I won't give you that pleasure. Sweet dreams! --Anto (talk) 15:09, 17 June 2008 (UTC)
--Anto (talk) 15:25, 16 June 2008 (UTC)
No amount of bolding can change your last claim from being a falsehood; as Ulam (and many of the tennis figures who provoked this discussion) will show. Many people do drop diacritics in languages which do not normally use them. Septentrionalis PMAnderson 02:02, 17 June 2008 (UTC)
If you don't want to use diacritics, I can't help you, but that is just your opinion. By the way, I am still waiting for your sources on "obliged transliteration for Latin script names", e.g. a grammar tutorial from some university. Since you are such an "expert" in English, I guess that will not be hard for you (irony). --Anto (talk) 15:09, 17 June 2008 (UTC)
What is Anto talking about? I have never typed "obliged transliteration for Latin script names" in my life. Septentrionalis PMAnderson 19:26, 17 June 2008 (UTC)
You have written that Đ is transliterated as DJ and that this is "regular English spelling". See here:
"Tudjman is correct spelling in English!"
So I challenge you: show me English grammar or spelling tutorials from some university (not newspapers, tabloids, etc., which are not in charge of regulating the language) that say foreign characters must be transliterated, or, more briefly, that Đ has to be transliterated as DJ. As I have written, that would not be a big problem for you, since you are a top-gun expert--Anto (talk) 20:03, 17 June 2008 (UTC)
I have never said that foreign characters "must be transliterated" or even transliterated. I don't believe it. Tudjman is the correct spelling in English, not because it is transliterated, but because the overwhelming majority of English speakers spell it that way. That is what correct spelling in English means; at root, English has no other way to determine correct spelling. Septentrionalis PMAnderson 20:28, 17 June 2008 (UTC)


All we need to do is use common English spelling; then it does not matter what the rules are in foreign languages. For example, some names like "Hermann Göring" are rendered according to German spelling rules as "Hermann Goering" (and were spelt that way in the war crimes transcripts); others like "Zürich" are not, and are rendered as "Zurich". --Philip Baird Shearer (talk) 20:02, 15 June 2008 (UTC)

And sometimes, as with Göttingen, common usage retains them. When this does occur, it is not difficult to document. Septentrionalis PMAnderson 20:14, 15 June 2008 (UTC)

Going back to the title of this sub-section, the proposal is not saying that one form is more "correct" than the other. However it asserts that the diacritic form is more informative (and therefore more appropriate for an encyclopedia). Of course we can work on defining exceptions to that very general rule.--Kotniski (talk) 08:22, 16 June 2008 (UTC)

I've added a few more exceptions to the draft proposal (Goering is covered by one of them).--Kotniski (talk) 10:55, 16 June 2008 (UTC)
In fact I've just realised that Goering is currently at Göring (and the other spelling wasn't even mentioned in the lead until I added it). Not sure what that proves, but it at least supports the claim that diacritics are already in extremely wide use in Wikipedia, so to whatever extent policy is supposed to be descriptive as opposed to prescriptive, the proposal is sound.--Kotniski (talk) 11:03, 16 June 2008 (UTC)
The article started life as Goering and was renamed to Göring in July 2002. There was a requested move to move the article back to Goering in January 2005, but there was no consensus for the rename. For most of the history of the article the alternative spelling Goering has been included in the article. But one of the problems is that when a name is moved to what some perceive to be the "correct" name, other "incorrect" versions of the name are often removed (a form of linguistic intolerance one rarely sees when a page is under an anglicisation). Also, the formulation that has now been put in place, "Hermann Wilhelm Göring (or commonly Goering)", which is common for this type of problem, would I think read much better as "Hermann Wilhelm Goering (German: Göring)". --Philip Baird Shearer (talk) 13:59, 16 June 2008 (UTC)
"reads much better as ..." -- And that's precisely where we disagree, and will continue to do so for years, I expect. —Nightstallion 18:27, 16 June 2008 (UTC)
Which is an admission that there is no consensus for this change (and since this proposal supports Djokovic, I doubt Anto supports it either). Septentrionalis PMAnderson 02:02, 17 June 2008 (UTC)
Looks like you're right about the lack of consensus, but observation of what's happening in practice implies that there isn't consensus for, or any will to apply, the literal wording of the existing policy either. Maybe this whole set of guidelines needs a comprehensive review.--Kotniski (talk) 07:48, 17 June 2008 (UTC)
See Talk:Engelbert Dollfuss for an example of current practice. This admittedly concerns the eszett, but single letters also have diacritics removed when keeping them is not English practice (note also the anon comment on the page cited for an instance of diacritics-are-scum). On the other hand, diacritics tend to remain without discussion when they are English practice. Septentrionalis PMAnderson 19:43, 17 June 2008 (UTC)

Close this policy proposal

I think this policy (UD) is doomed. The current approach we have at UE is viable and a good compromise (use what RS use). Let's leave it at that. Note that I would support using diacritics all over the place as a policy, but it has a snowball's chance in hell of being adopted. Jasy jatere (talk) 07:43, 17 June 2008 (UTC)

I wouldn't be so pessimistic - the idea has its vociferous opponents, but it seems to represent widespread practice (even too widespread, as the Goering and Tudjman examples show), so if the exceptions can be well defined, I think it has a good chance of gaining wider community consensus (not unanimity). The problem with the current guidelines is that they are simply not followed, and would in some cases lead to ugly inconsistency and instability (and constant argument) if they were.--Kotniski (talk) 07:55, 17 June 2008 (UTC)
I think what happens is that, just as articles about British or American topics attract British and American editors respectively (who like to use British and American spellings), so articles about subjects in the orbit of other national interests attract editors from that nation state or adjacent nation states. So although a majority of editors would favour the policies and guidelines as expressed in WP:UE, there is often a local consensus to place the article under the spelling of the name most familiar to the editors who edit it; as many of them can read and write the local national language (often as a mother tongue), they prefer the "correct" spelling and ignore the spelling used in the majority of reliable English-language sources. Just as British and American editors are willing to alter spellings to their preferred national spellings, so too are foreign editors when the topic is one they are familiar with and they are used to seeing the topic name with diacritics. This is not to say that all editors whose mother tongue is not English support the use of diacritics and all whose mother tongue is English support the removal of "Funny Foreign Squiggles", but there are enough for me to see it as a stereotypical pattern. However, there is wide support for the three core content policies and for WP:NC, and if enough editors are involved in the consensus-building exercise, most would support WP:UE as it builds on those policies. --Philip Baird Shearer (talk) 17:00, 17 June 2008 (UTC)

Although I see it is hard to reach consensus here, I think this kind of policy is a must-have for Wikipedia. There are so many discussions like this. Many articles get moved back and forth between the stripped and the diacritic versions. Wikipedia is not consistent on this issue. As a kind of compromise, maybe the name of the WP:UD guideline should be Usage of diacritics, and then we can reach consensus on whether to use them or not. But I strongly support a guideline that would settle this issue.--Irić Igor -- Ирић Игор -- K♥S (talk) 17:53, 17 June 2008 (UTC)

Đ in Vietnamese...

...is transcribed as "D", not "Dj"; this needs to be taken into account if Đ were to be made an exception.

BTW, for an example of inconsistency and hodge-podge regarding diacritics use, take a look at Category:Vietnamese politicians. Former president is Ngô Đình Diệm, but his brothers are Ngo Dinh Khoi and Ngo Dinh Nhu. This isn't intolerable, but isn't good either. GregorB (talk) 08:08, 17 June 2008 (UTC)

A search of Google Books and Google Scholar seems to show that Ngo Dinh Diem is overwhelmingly more common than Ngô Đình Diệm (Google Books: 1,259 results for Ngo-Dinh-Diem -Ngô-Đình-Diệm versus 70 for Ngô-Đình-Diệm -Ngo-Dinh-Diem; Google Scholar: 3,170 for Ngo-Dinh-Diem -Ngô-Đình-Diệm versus 3 for -Ngo-Dinh-Diem Ngô-Đình-Diệm). On that evidence, unless someone can show that it is faulty, the article Ngô Đình Diệm should be moved to Ngo Dinh Diem. --Philip Baird Shearer (talk) 08:30, 17 June 2008 (UTC)

Proposal featuring the differentiation of diacritics and extensions

Summary: A proposal that advocates usage of diacritics in many cases but that differentiates letters with diacritics from "extensions". It calls for relatively wide usage of diacritics but allows for exceptions that cover a number of cases where their use is controversial.

Editorial comments: I am a strict proponent of adhering to WP:UE at Wikipedia, as my record can attest. However, unless a placename is a true exonym or a person's name is truly "English" (e.g., that person is notable primarily in an English-speaking country or is naturalised in such a country), most place or personal names cannot be said to have an English form at all. Many "English verifiable reliable sources", especially web sources but also many older print sources, are or were prevented by technological limitations from using diacritics, but these limitations do not exist at Wikipedia. (In some cases, the comparative method can be used to determine whether diacritics are dropped for technical or convenience reasons, but even this is not foolproof. For example, the Economist style guidelines mentioned above are the worst kind of geo-bias: they deem Western European languages worthy of carrying diacritics but not others [e.g., Gerhard Schröder but Abdullah Gul]).

Therefore, since an encyclopedia is a reference work of higher calibre than a wire service news story and should aim to be marginally more "highbrow" (for lack of a better word), Wikipedia should, to a degree, reflect the underlying native names of persons and places when these are variations of the Latin alphabet.

The current situation works surprisingly well, but there are cases such as Vietnam's Bac Kan Province (with no diacritics, like the articles on many other Vietnamese places), where there are few English speakers, versus Pūpūkea, Hawai'i, which, like many placenames in Hawaii (where WP:UE has been vetoed), carries diacritics even though they are rarely if ever used in English.

The proposal below attempts to standardise diacritic usage on Wikipedia while acknowledging some of the problems that can occur. It assumes that regular diacritics do no "harm" to the unfamiliar reader (the name can be read by ignoring them), whereas "extensions" render a term unpronounceable to the unfamiliar reader. — AjaxSmack 00:21, 18 June 2008 (UTC)

Proposal

For the placename or person that is well known in the English-speaking world, i.e. is widely mentioned in English-language sources:

  1. When a person or place has a name in the Latin alphabet including letters with diacritics (or some ligatures), e.g., Å, Œ, Ř, Ŵ, the name should be spelt with them (e.g., Ngô Đình Diệm).
  2. When a name includes Latin "extensions" (and other more obscure ligatures), e.g., Ŋ, ß, Ʌ, Þ, the name should be spelt with the normal Latin substitute for these extensions (e.g., Abülfaz Elçibay, not Əbülfəz Elçibəy).
  3. Certain letters with unusual circumstances follow national conventions. For example, Đđ (D with stroke) is rendered "Dj" in South Slavic contexts, following usual English conventions, but is rendered "Đ" in Vietnamese contexts. Ðð (eth) is rendered as "Dh" in Icelandic and Faroese contexts due to the complication of the lowercase form, "ð".
(This results in Meissen but Göttingen, and Tudjman but Dvořák.)

For the placename or person that is not well known in the English-speaking world, i.e. is not widely mentioned in English-language sources, the preferred style on Wikipedia is to use diacritics, as this provides maximum information to the reader. This includes article titles; alternatives without diacritics should be set up as redirects.

AjaxSmack 00:21, 18 June 2008 (UTC)

Discussion

I would be interested in comments on the above proposal and in hearing from others on how one would differentiate between "well known in the English-speaking world" and not well known. I don't consider Google hits from computer-generated weather sites and such to be evidence of usage in English. Even mere mentions in traditional printed texts don't cut it. I prefer "critical commentary" (to borrow a phrase from WP:FAIR) on a subject before it can be considered to be well known.— AjaxSmack 00:21, 18 June 2008 (UTC)

Your proposal is exceedingly complex, so complex as to be unworkable. Aside from that insurmountable problem, I think you need a new project page for this proposal. This project apparently has been closed because no consensus in favor of it emerged. Tennis expert (talk) 02:51, 18 June 2008 (UTC)
Compared with most Wikipedia policy, this proposal doesn't seem particularly complex at all. Moreover it seems to be a) quite easy to follow in practice, and b) quite in line with what already happens. I suggest what we do is (for some reason I've started saying everything in terms of a)b)c) recently): a) close this discussion and archive the proposal; b) put Ajax's proposal on a new page called WP:Usage of diacritics (as someone suggested); c) discuss from there.--Kotniski (talk) 06:34, 18 June 2008 (UTC)
Why wouldn't that be just a reopening of this closed discussion, where it was apparent that the suggested policy was nowhere near gaining consensus? Haven't we spent enough time on all the various permutations of the suggested policy already? Tennis expert (talk) 06:37, 18 June 2008 (UTC)
Obviously we haven't, since consensus has not been reached either way. --Kotniski (talk) 06:40, 18 June 2008 (UTC)
The proposal was to change existing Wikipedia policy. Where there is no consensus to change a policy, the policy stays as it is. You (or others) can repeatedly open new project pages every day to achieve what could not be achieved the previous day. But what's to be gained by that? Aren't you just trying the community's patience with these repetitious debates? "Give it a rest" is my recommendation. Tennis expert (talk) 06:48, 18 June 2008 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.