Wikipedia talk:Naming conventions (Chinese)

Latest comment: 6 months ago by Folly Mox in topic Language nomenclature

Standardised Cantonese/Hakka romanisation

The article focuses heavily on the use of Mandarin to romanise Chinese characters and phrases, but there is no real guideline (aside from "follow what the sources say") on the romanisation of other Sinitic languages. While there has been ample and lively discussion on this talk page and others about the scope of names that should be romanised with each language (which is still a constantly ongoing tug of war because of the inherent hyperpoliticisation), I don't see any real discussion about the standards for doing so, in the way Mandarin transliteration is elaborated on. Even aside from the raw pronunciation, there are some differences in transliteration conventions between Cantonese and Mandarin (which is the example I'll stick to on the basis of personal knowledge), e.g.:

  • Cantonese names tend to use a hyphen (Kwok Fu-shing) vs Mandarin names that tend to concatenate given name (Xi Jinping)
  • Cantonese transliterations lean towards spacing by character (Sai Yeung Choi South Street) vs Mandarin spacing by phrase/word (Zhongshan Road)
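The two formatting differences listed above can be sketched in code. This is only a minimal illustration, not an established tool or a proposed standard: the helper names `format_personal_name` and `space_syllables`, their parameters, and the way syllables are passed in are all assumptions made for the example, and the syllable spellings are simply those of the names quoted in the list.

```python
def format_personal_name(surname, given_syllables, convention):
    """Join romanised syllables of a personal name (hypothetical helper).

    convention="hyphenated"   -> given-name syllables joined by a hyphen
    convention="concatenated" -> given-name syllables run together
    """
    separator = {"hyphenated": "-", "concatenated": ""}[convention]
    # Only the first letter of the given name is capitalised in both styles.
    given = separator.join(s.lower() for s in given_syllables).capitalize()
    return f"{surname} {given}"


def space_syllables(words, by):
    """Space a romanised name by character or by word (hypothetical helper).

    `words` is a list of words, each a list of syllables (one per character).
    by="character" -> every syllable is written as a separate word
    by="word"      -> syllables belonging to one word are written together
    """
    if by == "character":
        parts = [s.capitalize() for word in words for s in word]
    elif by == "word":
        parts = ["".join(word).capitalize() for word in words]
    else:
        raise ValueError(f"unknown spacing convention: {by}")
    return " ".join(parts)


print(format_personal_name("Kwok", ["fu", "shing"], "hyphenated"))
print(format_personal_name("Xi", ["jin", "ping"], "concatenated"))
print(space_syllables([["sai", "yeung", "choi"]], "character") + " South Street")
print(space_syllables([["zhong", "shan"]], "word") + " Road")
```

The sketch only covers the formatting step. It takes the syllable spellings as given and says nothing about which romanisation system produces them, which is the part that would still have to follow the sources.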

Therefore, I feel like an alternative description of a Cantonese or Hakka transliteration scheme also deserves a place in the guide. And if someone knows sufficient Taishanese, Shanghainese, etc. to make a separate transliteration guide for those, that would be very welcome too. Alternatively, I feel like a better option might be to split this article into a disambig that redirects to three or four different transliteration guides for the various Sinitic languages that commonly need to be transliterated. Fermiboson (talk) 11:11, 25 April 2023 (UTC)

Are there any standards out there to recommend? Otherwise we should continue to follow the sources, as in the above examples. Kanguole 07:37, 26 April 2023 (UTC)
There are definitely standards out there, although none has overwhelming majority usage, and one of the bigger problems is that the majority of speakers don’t know them. But it’s not exactly like the majority of English speakers follow MOS either, so I feel like that could be worked around. At the very least we could add something on identifying whether a source is in Cantonese, as opposed to an archaic name that might need to be changed to a pinyin transliteration. Fermiboson (talk) 15:25, 26 April 2023 (UTC)
Re Cantonese, as much as I don’t like it personally, the tendency is to move towards Jyutping, judging from its usage in dictionaries and language teaching materials. Pentaxem (talk) 22:04, 3 September 2023 (UTC)
The problem with moving to Jyutping is that it would match virtually none of the already commonly used names. To take the above example: Kwok Fu-shing becomes Gwok Fu-sing, which is still recognisable to an HKer who knows who he is; Sai Yeung Choi South Street would become Sai Joeng Coi South Street, which honestly sounds more Korean than Cantonese. Given that even the pronunciations of certain words within the language aren't really standardised, I think the follow-the-sources approach for the actual pronunciations themselves is the best we can do; this discussion would be more about things like formatting as mentioned above, or when to translate instead of transliterate, etc. (e.g. in HK "University Road" vs in the Mainland "Daxue Road"; I haven't checked that the latter exists, but you get my point). Moreover, it would be good to acknowledge that, in principle, the Sinitic languages are not just Mandarin even in written form. Fermiboson (talk) 06:45, 6 September 2023 (UTC)
I strongly support this suggestion. Even if there isn't a single majority usage atm, for the sake of readability Wikipedia should pick a standard and stick to it.
I'm not very familiar with non-Mandarin romanizations and would really appreciate a guide. SilverStar54 (talk) 12:12, 4 September 2023 (UTC)
I'm fairly surprised there haven't been more RfCs over this and related topics, given the ire it usually attracts. As I said above, I don't think a full standard like there is for Mandarin is achievable or within our scope, but for one, the claim in the project page that all written Chinese is Mandarin Chinese is patently false. Admittedly, I do not have any examples of it causing problems, so I could possibly be guilty of having a solution in search of a problem here. I would appreciate it if anyone had any examples to make a case, however. Fermiboson (talk) 06:49, 6 September 2023 (UTC)
I think "Chinese names should be written in Hanyu Pinyin unless there is a more common romanization used in English" covers pretty well what we do and what we should do. Yau Ma Tei is the standard placename in English for 油麻地 although it is not any of the commonly used romanizations of Cantonese nowadays. There seems to be even less of a standard for non-Mandarin than for Mandarin (where Taiwan and Singapore commonly use other systems than Hanyu Pinyin, with different systems used for different people: our coverage of Taiwan and people connected to it uses at least (simplified) Wade-Giles, Gwoyeu Romatzyh, Hanyu pinyin and Tongyong pinyin). Wikipedia should not invent standards that are not used by the majority of sources. Better to stick to the sources than to surprise people by "standardized" article titles that are different from everywhere else. —Kusma (talk) 09:50, 6 September 2023 (UTC)
I largely agree with what you said. As noted above, however, this is not really about the standard of romanisation, which as you say is close to nonexistent. Other aspects covered in this MOS, such as word ordering and grouping, hyphenation, when to translate and when to transliterate, etc. can often be different; I'm not sure how many situations there are where these differences could not be supported with sources, but the differences certainly exist (e.g. according to the guide, "Tuen Ma Line" should be written "Tuen Mun-Ma On Shan Line" if no sources proving common usage otherwise existed). These formatting rules are currently largely centered around Mandarin Chinese, and it would perhaps be prudent to at least note that different standards exist for sources in different Sinitic languages and, where possible, also list any such formatting standards in other languages. (As a hopefully uncontroversial example, I have added the name hyphenation rule to the relevant section.) Fermiboson (talk) 10:09, 6 September 2023 (UTC)
Right. Spaces need to be addressed properly, and what you wrote makes sense. Hong Kong placenames usually have single syllable words (Chek Lap Kok), Singapore has everything (Ang Mo Kio, Yishun). Macao placenames are a wonderful mess (using various degrees of Portuguese-ness). Basically I would not bother trying to write a convention covering all of Greater China; Hanyu pinyin can be standardized, but anything else has local rules that aren't easily generalized. —Kusma (talk) 10:59, 6 September 2023 (UTC)
Sources might not agree on which romanization system to use, but almost all are internally consistent. Why shouldn't Wikipedia be the same? Not choosing a romanization system is itself a choice, and it's a bad one. It's really hard to read and understand a work that mixes different systems willy-nilly.
Again, I suggest we pick one system and stick to it. The only exception would be if a clear majority (i.e., not just a plurality) of sources differ from whatever standard we pick. SilverStar54 (talk) 15:18, 7 September 2023 (UTC)
Isn't that what we do? We prefer Hanyu pinyin (our "one system") unless sources do something else, which is a fairly common occurrence when talking about people or places not in Mainland China. Sources about Taiwanese politics do not use consistent romanization systems, but they typically all use the same romanization for the same person. The standard romanization for Ma Ying-jeou is Gwoyeu Romatzyh, for Lee Teng-hui it is Wade-Giles, and Tsai Ing-wen isn't in any system I know. Others like James Soong or Audrey Tang are generally known by adopted English names. Recognizable names (people won't recognize the names Ma Yingjiu, Li Denghui, Cai Yingwen, Song Chuyu, or Tang Feng) are generally better than internal consistency that won't be visible to the casual reader anyway. —Kusma (talk) 15:51, 7 September 2023 (UTC)
Pinyin is our standard for Mandarin, but I think what Fermiboson is suggesting is to decide on a standard for each of the other Chinese languages (Cantonese, Hakka, etc.). These languages don't follow the same rules of pronunciation, so it doesn't make sense to use Hanyu pinyin for them. SilverStar54 (talk) 16:39, 7 September 2023 (UTC)
I am not convinced that a single standard for Cantonese makes sense. The "obvious" choice of Jyutping doesn't work for Hong Kong place names (a rather large set of examples) as demonstrated above. —Kusma (talk) 17:12, 7 September 2023 (UTC)
That's a good point. Do other Sinitic languages have similar large bodies of exceptions, or just Cantonese?
I still think it would be a good idea to define a default system for each Chinese language, even if there are major exceptions. Maybe we should focus on in-article standards rather than titles? Perhaps we should move this discussion to WP:MOSCHINA? SilverStar54 (talk) 19:43, 7 September 2023 (UTC)
That's a good point. In retrospect this thread would perhaps be better suited over there. If one of you wants to do a move/copy to the talk page over there, feel free to do so. Fermiboson (talk) 03:52, 8 September 2023 (UTC)

Spacing/determining word boundaries

I propose that after the sentence "Pinyin is spaced according to words, not characters", we should add a link to the section of the pinyin article explaining how to determine word boundaries. The rules on when to combine words/when not to combine them are a bit complicated. SilverStar54 (talk) 03:53, 24 July 2023 (UTC)

Language nomenclature

Mirroring my post on Talk:Chinese characters:

This applies to various related articles in Category:Chinese characters, in clarifying the often-conflated categories. Here's how I see it:

  • Chinese characters are the logographs originally used to constitute the morphemes in the Old Chinese writing system, which is, perhaps unhelpfully, also called 'Chinese characters'.
  • A writing system includes orthographic rules and conventions, in this case relating to semantics and phonology in various spoken languages. A set of characters with expected meanings and pronunciations is one such array of conventions.
  • However, I don't think Traditional and Simplified characters constitute separate writing systems per se, even though there are mergers in character variants from the former to the latter.
  • Kanji, for example, is a set of characters within the greater Japanese writing system, using Chinese characters.
  • I wish we could call these character sets 'scripts', but that word is largely claimed by the systematic graphical (ish) styles such as clerical script and regular script.

Remsense (talk) 04:54, 2 October 2023 (UTC)

The native word used to differentiate Traditional and Simplified character forms is usually 字體 (character form), although typically encountered in the opposite order, as in 繁體字. The word differentiating clerical script and regular script, in contrast, is 書. I'm not sure if any of this is helpful, or really what the question here is supposed to be, but I'd be glad to type more Chinese if the question can be clarified. Folly Mox (talk) 07:07, 2 October 2023 (UTC)
Actually maybe 字形 (character form) is more common than 字體 (character form). I'm probably a little mixed up in my modern technical vocabulary. Folly Mox (talk) 07:11, 2 October 2023 (UTC)
Oh my gosh I thought this was WP:RD/L; this doesn't have to take the form of a question at all. My goodness it might be bedtime. Folly Mox (talk) 07:13, 2 October 2023 (UTC)
Very helpful! I guess the default term for 'simplified' and 'traditional' might be 'character form', as opposed to 'writing system' or 'script' or 'character set'? Remsense (talk) 18:05, 2 October 2023 (UTC)
"Character set" makes sense when you're talking about computing, and maybe statistical analysis. "Character form" ("Simplified form", "Japanese form", etc) is probably the closest term for general linguistic description. Folly Mox (talk) 20:08, 2 October 2023 (UTC)