Module talk:Lang-zh/Archive 4

Latest comment: 6 years ago by Szqecs in topic Hakka

Template:Zh-full

I've been working on merging {{zh-full}} into this one. The motivation is that, where possible, it makes sense to replace instances of {{zh-full}} with {{zh}}, as the recent work on this template has improved its output significantly. Where that is not possible, i.e. where {{zh-full}} was used because of features it provides that this one lacks, it should be possible to add those features here. In particular, the ability to list things in an arbitrary order was pretty much impossible before but can easily be done in Lua.

As a first step I've been going through articles using {{zh-full}} and replacing it with {{zh}} where possible. I've been doing this because there are good editorial reasons for it: this template provides better output (proper language tagging, consistent italicisation), handles special cases better, such as t and s being the same, handles empty fields properly, avoids redirects in its links, and is much shorter and easier to type. I've done other cleanup as I went, in particular of Chinese-language text.

Cantonese first issues

It has been possible to replace them in almost all cases. The vast majority just had Chinese (simplified and traditional), pinyin and Wade-Giles in some combination. One had IPA, which I changed back to pinyin as it is more common, more useful and more easily understood. Otherwise they were straightforward replacements, with a few requiring 'first=t' to put traditional Chinese first.

The only two left are Hong Kong topics, Bat Seui Yiu... Yun Mei Dak Ho Pa and A.S. Watson Group, which have Jyutping and pinyin Romanisations and put the Jyutping first. This is easy to fix, though: have 'first=t' also put Cantonese Romanisations before pinyin and Wade-Giles. The logic is that when first=t is used and Cantonese is supplied, it's a Hong Kong topic. It is not a Taiwanese topic, as Cantonese is not used there, and not a mainland one, as simplified Chinese should come first there. And in a Hong Kong topic, if both Cantonese and Mandarin Romanisations are given, the Cantonese should go first. This is a very simple fix: it only requires changing one line, this one

		orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "poj", "zhu", "l"}

to something like:

		orderlist = {"c", "s", "t", "j", "cy", "p", "tp", "w", "poj", "zhu", "l"}
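For illustration, here is a minimal Lua sketch of how a fixed orderlist drives the output. This is not the module's actual code. Fields are emitted in list order, and missing or empty ones are skipped. The `labels` table and the `render` function are hypothetical, and the simplified/traditional swap for first=t is assumed to be handled elsewhere.

```lua
-- Sketch only: emits fields in a fixed order and skips empty ones.
-- The label strings are illustrative, not necessarily the module's own.
local labels = {
  c = "Chinese", s = "simplified Chinese", t = "traditional Chinese",
  p = "pinyin", tp = "Tongyong Pinyin", w = "Wade–Giles",
  j = "Jyutping", cy = "Cantonese Yale", poj = "Pe̍h-ōe-jī",
  zhu = "Zhuyin Fuhao", l = "literally",
}

-- Current default order
local default_order = {"c", "s", "t", "p", "tp", "w", "j", "cy", "poj", "zhu", "l"}
-- Proposed order: Cantonese romanisations (j, cy) before Mandarin ones (p, tp, w)
local cantonese_first_order = {"c", "s", "t", "j", "cy", "p", "tp", "w", "poj", "zhu", "l"}

local function render(args, order)
  local parts = {}
  for _, key in ipairs(order) do
    local value = args[key]
    if value and value ~= "" then -- missing or empty fields produce no output
      parts[#parts + 1] = labels[key] .. ": " .. value
    end
  end
  return table.concat(parts, "; ")
end
```

Since only the list changes, the rendering loop stays the same whichever order is chosen.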

The other options are

  • add a separate option for 'Cantonese first'. Easy to do but another option that hardly anyone will use seems unnecessary
  • Add the code I envisaged at the start to output fields in the order they're given. I now see no need for this at all, given that no instance of {{zh-full}} used its ability to reorder fields; with the two exceptions noted above, everything was in the same order, or with t and s swapped, which {{zh}} already supports.

Any thoughts?

I can see Cantonese first being used for some mainland/cross-border articles such as Cantonese people. Also, if the template is ever expanded to include Hakka or Tibetan, there would be cases where those should go first, so the system of prioritisation needs to be flexible enough to allow future expansion.

Perhaps change first=t into a region specifier for TW, CN, HK, SG. So, for example, specifying region=HK would select the order {"c", "s", "t", "j", "cy", "p", "tp", "w", "poj", "zhu", "l"}.

Another way would be to allow the editor to supply an ordered list to override the default. This could be made backwards compatible with first=t. For example, if the editor used first=t,s,j, the template would output t first, then s, then j, followed by any other supplied fields in the default order. If the editor used first=t,poj,tp, we would have the Taiwan-related fields first and the others following. Most editors would not use this, leaving the default, but it would be there for those who wanted it. It also wouldn't break any existing pages, as those currently with first=t would still get t output first, followed by the others in default order, the same as today.
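A minimal Lua sketch of this priority-list idea, offered as a hypothetical illustration rather than anything in the module. Keys named in first= come first, in the order given, and every other key follows in the default order. The function name `resolve_order` is invented here.

```lua
-- Hypothetical sketch of the "editor-supplied priority list" idea:
-- keys named in first= come first, in the order given; every other
-- key follows in the default order. Unknown or repeated keys are ignored.
local default_order = {"c", "s", "t", "p", "tp", "w", "j", "cy", "poj", "zhu", "l"}

local function resolve_order(first, default)
  local known, seen, order = {}, {}, {}
  for _, key in ipairs(default) do
    known[key] = true
  end
  -- Split on any run of non-letters, so "t,s,j" and "t, s, j" both work
  for key in (first or ""):gmatch("%a+") do
    if known[key] and not seen[key] then
      order[#order + 1] = key
      seen[key] = true
    end
  end
  for _, key in ipairs(default) do
    if not seen[key] then
      order[#order + 1] = key
    end
  end
  return order
end
```

With first=t alone, t moves to the front and everything else keeps its default relative order. That is the backwards compatibility described above.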

Another feature of {{zh-full}} is that it allows you to rename the labels. Has anyone used that feature in any article? Would anyone want to use it, or could it be deprecated without anyone noticing? Rincewind42 (talk) 15:29, 24 May 2014 (UTC)

I had thought of that before: doing it based on region rather than a single switch, first=t. The problem is it's quite disruptive – all existing instances of first=t would have to be found and updated, changed to region=tw or region=hk based on the article, and I don't know how you'd find them. Or you leave both first=t and region=xx in the template which introduces redundancy as with Cantonese=first, and seems overkill for something that there's no obvious need for - the only instances of {{zh-full}} with ordering unsupported by {{zh}} were with Jyutping first.
The editor specifying the order is another way of doing it. The way it would work is with an extra option, ordered=no. If the module detects this it doesn't use a fixed order but the order is the same as the parameters passed to the template. Essentially the same as how {{zh-full}} works but done in code not templates within templates as for that. It also introduces redundancy but can be thought of as two levels: a switch for simple cases, an option to use any order for more specialised cases.
I don't think replacing labels is a good idea. It would make the template much more complex and would be little used (it wasn't used at all within {{zh-full}}). If editors need that degree of control over labels, links and formatting, they need not use the template, or they can use it for some languages and use {{lang}} with their own labels and formatting for those they want customised.--JohnBlackburnewordsdeeds 16:22, 25 May 2014 (UTC)
I've changed the sandbox to put Cantonese first when first=t is specified; as noted above it's a very simple change. The results look like this
{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu}}
gives
simplified Chinese: 中国; traditional Chinese: 中國; pinyin: zhōngguó; Jyutping: Gwong²zau¹; Cantonese Yale: Gwóngjàu
while
{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=t}}
gives
traditional Chinese: 中國; simplified Chinese: 中国; pinyin: zhōngguó; Jyutping: Gwong²zau¹; Cantonese Yale: Gwóngjàu
--JohnBlackburnewordsdeeds 21:31, 26 May 2014 (UTC)
|first=t might also be used for Taiwan topics, or for ancient topics. In both cases one would still want pinyin before the Cantonese romanizations. Kanguole 22:53, 26 May 2014 (UTC)
Taiwan topics should not have or need Cantonese Romanisations; Languages of Taiwan does not even mention Cantonese. As for ancient topics, it's usual to put simplified first and just give pinyin. I wish there were some easy way of checking this, but I strongly suspect first=t is only used in Hong Kong and Taiwan topics, and that of those only Hong Kong topics include Cantonese [Romanisations].--JohnBlackburnewordsdeeds 23:55, 26 May 2014 (UTC)
I've found a way of checking uses of first=t. I went to Special:Export and exported all the pages in Category:Articles containing traditional Chinese-language text, which gave me a 69MB XML file. I searched through that for all instances of {{zh}} with first=t in them, writing them out to a file, which I've copied here: User:JohnBlackburne/zhdump. It's possible I missed some: the way I searched would have missed templates split over multiple lines (I saw one) and also templates containing other templates such as {{linktext}}, but it should have found the majority. I can see six Jyutping and three Cantonese Yale, out of a little over 200, so it's almost all Taiwanese.--JohnBlackburnewordsdeeds 23:57, 27 May 2014 (UTC)
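The search described above can be approximated with a short Lua sketch. This is hypothetical, since the actual search tool and pattern were not given. It scans wikitext line by line for single-line {{zh|...}} calls containing first=t, so it has the same blind spots: calls split over several lines, or calls containing nested templates such as {{linktext}}, are not found.

```lua
-- Hypothetical sketch of a line-based scan for {{zh ...first=t...}}.
-- Like the search described above, it only sees calls that sit on one line
-- and contain no nested {{ }}, so split or nested calls are missed.
local function find_first_t(text)
  local hits = {}
  for line in (text .. "\n"):gmatch("([^\n]*)\n") do
    -- [^{}]* stops at any brace, so nested templates never match
    for call in line:gmatch("{{[Zz]h|[^{}]*}}") do
      if call:find("|%s*first%s*=%s*t") then
        hits[#hits + 1] = call
      end
    end
  end
  return hits
end
```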
Looking at the uses at User:JohnBlackburne/zhdump, which I've added article links to, there's at least one problem article, Jao Tsung-I, which has first=t, pinyin and Cantonese Romanisations. His bio is varied and interesting, with stints in Hong Kong and Singapore, so there's no obvious right way round; it may be based on his personal or professional preferences. It's not obvious that the Cantonese Romanisations should go first in this case just because of first=t.
So I've thought of a way of handling traditional-first and Cantonese-first separately, without introducing a new parameter or redoing everything with regions (which would not help much with Jao Tsung-I anyway): overload the existing parameter so you can supply other values, i.e. first=t, first=j (for Jyutping) and first=tj or first=jt are all supported. It's easiest just to demonstrate it:
  • {{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=t}}:
  • {{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=tj}}:
  • {{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=t|labels=no}}: 中國; 中国; zhōngguó; Gwong²zau¹; Gwóngjàu
  • {{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=jt|labels=no}}: 中国; 中國; zhōngguó; Gwong²zau¹; Gwóngjàu
So Hong Kong articles might have first=tj/first=jt, Taiwan ones probably first=t. I don't know if first=j would ever be used but it's there if anyone needs it. Obviously it still puts Chinese characters first and everything else after, but I don't see any need for this to change, especially after looking at all the uses of {{zh-full}}, all of which put Chinese first. This can easily be extended to handle more cases.
The third way I suggested, ordering the fields based on the order of the parameters, is now, I think, not possible. It would be easy enough to code, but it is incompatible with the visual editor, which sorts parameters alphabetically and so would not let you put them in a particular order.--JohnBlackburnewordsdeeds 15:48, 29 May 2014 (UTC)

other fields

The other thing I was looking at was other fields: what fields were being used that {{zh}} doesn't support? The answer is one, {{zh-IPA}}, in one article, which I replaced with pinyin. None of the others were used, so no {{zh-xiao}}, no {{zh-hkgov}}, no {{zh-viet}}. So there is no need to add any of them to this module, at least not based on their usage in {{zh-full}}. This ties into #Other Chinese scripts above; I was hoping it would give some indication of which, if any, other languages of China were already being used in the larger template, but it seems not.

Of them all, the only one being used at all outside of {{zh-full}} is {{zh-IPA}}; it's a useful template, providing links to three relevant articles. It's the only one I think it would make sense to add to this template, as an extra field, which would make it easier for editors to find if they want to add IPA. Alternatively, it could be tidied up, properly documented, and linked from here. --JohnBlackburnewordsdeeds 19:44, 23 May 2014 (UTC)

There is a complication with how zh-IPA works, in that it accepts a second field to switch between different types of IPA, such as Mandarin and Cantonese IPA or others. I notice that {{Chinese}} has two fields, mi and ci, but no others. Are there any other IPA types that could be used? In {{Nihongo}} there is a blank extra field which, if added to this module, could work something like Chinese: 北京. However, I don't see any advantage in this when you could just write Chinese: 北京; [/pɐk˥kɪŋ˥/]; [/beɪˈdʒɪŋ/]. It would probably be easier just to add mi and ci fields, the same as {{Chinese}} has done. Rincewind42 (talk) 15:29, 24 May 2014 (UTC)
Having looked at it a bit more the IPA situation's a bit of a mess. There's a template for general IPA, {{IPA-all}}; {{IPA-wuu}} for Shanghainese/Wu is a redirect to it; there's a separate template for Cantonese/Yue, {{IPA-yue}}; there is none for Mandarin that I can see. There is though {{IPAc-cmn}} which converts pinyin to IPA ('cmn' is the IANA code for Chinese Mandarin; we use 'zh' for legacy reasons).
I'd be happiest leaving {{zh-IPA}} as a separate template; editors can use that or one of the other templates as appropriate. IPA's generally not needed for Chinese as pinyin is all you need for pronunciation and easier to learn than IPA; it's not like English where spelling and pronunciation are very irregular. As it's rarely used and there are different templates it's not obvious what should be added to this template. It was used only once within {{zh-full}} so there's no suggestion there that it needs to be added from that either.
What to do with {{zh-IPA}} is a separate issue but it should probably be either redirected to {{IPA-all}} like {{IPA-wuu}} if it's essentially identical, or moved to {{IPA-cmn}} like {{IPA-yue}}; properly named and added to the list of templates at Template:IPA-all/doc it might be used a bit more.--JohnBlackburnewordsdeeds 15:55, 24 May 2014 (UTC)
I generally agree. I think the best next step from here, after the ordering issue above is finalised, is to look at related templates such as {{CJKV}}, which could be changed to run off almost the same code that this template currently uses and thus gain extra features like no links, no labels and ordering. Rincewind42 (talk) 15:04, 25 May 2014 (UTC)

CJKV

I hadn't looked at it before, or at least not recently, but looking at it now, {{CJKV}} works almost identically to this one; the latest italic changes here bring it closer. The only differences I can see are the lack of language tags there for romanisations (which should be added there) and the lack of support here for Japanese, Korean and Vietnamese, which would be easy to add (including combining the Chinese and Japanese when they are the same, which is horrible to do in parser code but easy in Lua). So it would be straightforward to merge them.

And there are good reasons for doing so. There's not quite the same need as there is for {{zh-full}}. But apart from the reasons {{zh}} was converted to Lua, {{CJKV}} is in some cases used where {{zh}} would do, e.g. in Chery A15, and most of its uses are very similar. As with {{zh-full}}, it makes no sense to have two templates being used for mostly the same thing if there's no technical reason to keep them separate.--JohnBlackburnewordsdeeds 15:40, 25 May 2014 (UTC)

Although there is no technical reason, there is a cultural reason why we have {{zh}} and {{nihongo}} as separate templates. Korean doesn't seem to have its own template. {{CJKV}} joins these together, but it doesn't distinguish between Kanji and Kana or between Hangul and Hanja. It also doesn't include Japanese or Korean romanisations. If you combined identical Hanja/Hanzi characters, how would you label them? And which language comes first? There needs to be a better way to order these. The parameter bloat might also become significant. Look at {{Chinese}}, for example: how often is each of those options actually used? In the end, though CJKV could be merged with zh, there will need to be two or three separate instances of near-identical code, partly to keep the parameters simple so that people can understand the template, and partly so the various interested groups don't conflict. Rincewind42 (talk) 05:20, 26 May 2014 (UTC)
Just for the record, I've come across numerous cases in the past where people have been upset or offended simply because of the template name. As in the former Yugoslavia and the rest of Eastern Europe, nationalism is something of a thing in East Asia, and I've seen people getting upset over templates such as {{Japanese particle}} and {{Chinese}} simply because the template name contains "Chinese" or "Japanese" rather than the language (e.g. "Korean") that matches the topic of the article. Upset editors often blank or delete templates, or revert template additions, simply because they don't like the name of a template, and that's it; I remember having to make the {{Language particle}} redirect because one Korean editor got oh so offended by the word "Japanese". It's not as simple as things should be.

If we ever decide to use {{zh}} or anything else to replace CJKV templates after merging parameters, I think it would probably be a good idea to create template redirects as well; it's difficult to satisfy the needs of every single editor otherwise. Words such as "Chinese" and "Japanese" are a sensitive political issue in some areas, and edit wars often start over trivial matters such as these. I think the mindset is that if you put the {{Chinese}} template on a Korean topic article, it's claiming that it "belongs to China" or something (though logically speaking, it really shouldn't, it's just a template used to contain multilingual names). --benlisquareTCE 05:45, 26 May 2014 (UTC)

As well as template redirects, using Lua offers another possibility: two templates with the same Lua implementation. That's how most of the citation templates work; they call Module:Citation/CS1. You can then supply extra parameters that tell it to do slightly different things (or very different things) depending on which template is invoking it, though in this case they work so similarly already that it should be possible to treat them the same way.
I'm not too worried about this causing problems, such as cultural ones. It's not that {{zh}} is for Chinese and {{CJKV}} is for other languages; {{CJKV}} is mostly used for Chinese too. In fact some, if not most, of its uses are just Chinese, where the current {{zh}} is a drop-in replacement, the only differences being things in {{CJKV}} now fixed in {{zh}}. Further, it's used very few times: 329 transclusions, or around 1% of the transclusion count of {{zh}}, so it should be far less disruptive than recent changes to this template.
{{Nihongo}} is quite separate; not only does it share no fields with this one (except I think 'literally') but it works very differently, with a number of minor and major differences in output. If I were converting that to Lua I'd do it separately, i.e. as its own module.--JohnBlackburnewordsdeeds 13:18, 26 May 2014 (UTC)
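To make the shared-implementation idea concrete, here is a minimal Lua sketch of one module that exposes two entry points. Both share a single renderer and differ only in a mode flag. It is hypothetical and is not how {{zh}}, {{CJKV}} or Module:Citation/CS1 are actually written. The `ja`, `ko` and `vi` parameter names are invented for illustration.

```lua
-- Hypothetical sketch: two templates, one implementation.
local p = {}

-- Each mode lists the fields it accepts, in output order, with labels.
local FIELDS = {
  zh   = { {"s", "simplified Chinese"}, {"t", "traditional Chinese"}, {"p", "pinyin"} },
  cjkv = { {"s", "simplified Chinese"}, {"t", "traditional Chinese"}, {"p", "pinyin"},
           {"ja", "Japanese"}, {"ko", "Korean"}, {"vi", "Vietnamese"} },
}

-- Shared renderer: both entry points end up here.
local function render(args, mode)
  local parts = {}
  for _, field in ipairs(FIELDS[mode]) do
    local key, label = field[1], field[2]
    local value = args[key]
    if value and value ~= "" then
      parts[#parts + 1] = label .. ": " .. value
    end
  end
  return table.concat(parts, "; ")
end

-- One entry point per template; only the mode differs.
function p.zh(frame) return render(frame.args, "zh") end
function p.cjkv(frame) return render(frame.args, "cjkv") end
```

Each template would call its own entry point through #invoke. A real Scribunto module would normally read the calling template's parameters with frame:getParent().args and end with `return p`. Both are left out here so the snippet runs as plain Lua.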
I should add that having looked at it again there are at least two bugs in the output of {{CJKV}} that need fixing. See Template talk:CJKV#Problems. The problems are precisely the sorts of problems that arise in complex parser code (the extra semi-colon was happening here too, with s = t, before it was converted to Lua). That template's code is particularly tricky though, even worse than this one before it was converted to Lua, so by far the easiest fix would be to convert it to Lua.--JohnBlackburnewordsdeeds 13:41, 26 May 2014 (UTC)

First=tj

Breaking this out into a new section as I think this is ready for rolling out, but I'll summarise it again as it's embedded a few subsections back.

I've added support (to the sandbox) for putting Cantonese Romanisations 'first', that is, before Mandarin Romanisations. So Jyutping and/or Cantonese Yale (whichever is specified) go before pinyin, Tongyong pinyin and Wade–Giles (again, whichever is or are specified). The thinking is similar to putting traditional before simplified Chinese characters: it won't be used as much, but it's there for when editors need it. My survey of the uses of {{zh-full}} suggests it's the only ordering not supported by the current template that there's any need for.

The way it works is by overloading the first= parameter, so as well as specifying first=t an editor can supply first=j, or combine them in one as first=tj or first=jt. If 't' is supplied, traditional Chinese characters go first. If 'j' is supplied, Cantonese Romanisations go first. Anything else is ignored, which allows for future support for other reordering rules using other letters. Here it is in action; examples have also been added to the testcases:

  • {{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|first=t|labels=no}}: 中國; 中国; zhōngguó; Gwong²zau¹
  • {{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|first=j, t|labels=no}}: 中國; 中国; Gwong²zau¹; zhōngguó
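The ordering rule shown in these examples can be sketched in Lua as follows. This is a hypothetical illustration, not the module's code. It takes the set of recognised specifiers (parsing the first= string is a separate step) and builds the output order. Chinese characters always lead, 't' swaps simplified and traditional, and 'j' moves the Cantonese romanisations ahead of the Mandarin ones.

```lua
-- Hypothetical sketch of the first=t / first=j ordering rule.
local function append(list, items)
  for _, item in ipairs(items) do
    list[#list + 1] = item
  end
end

-- flags: a set such as { t = true, j = true }
local function build_order(flags)
  local order = {}
  -- Chinese characters always come first; 't' puts traditional before simplified
  append(order, flags.t and {"c", "t", "s"} or {"c", "s", "t"})
  local mandarin, cantonese = {"p", "tp", "w"}, {"j", "cy"}
  if flags.j then
    append(order, cantonese)
    append(order, mandarin)
  else
    append(order, mandarin)
    append(order, cantonese)
  end
  -- Remaining fields keep their default positions
  append(order, {"poj", "zhu", "l"})
  return order
end
```

With no flags, the result is the original default orderlist, so existing articles are unaffected.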

This is better than other options considered.

  • Having Cantonese first when traditional is first works with most but not with all articles, and would change existing articles perhaps incorrectly.
  • Using regions (HK, TW, CN, SG, etc.) might work better but would be very disruptive and not handle all cases, such as Jao Tsung-I.
  • Adding another parameter would be unnecessary clutter, especially for something so little used.
  • Doing it optionally based on the order in which parameters are listed, like {{zh-full}}, is incompatible with the Visual Editor.

It doesn't change any existing instances of the template, so is very safe. It's also interesting as perhaps the first change that would be almost impossible to do with parser functions, i.e. without Lua (sure there would be a way but it would be horribly complex).

So I think this is ready to roll out to the main template/module.--JohnBlackburnewordsdeeds 03:15, 30 May 2014 (UTC)

I would prefer that the 'tj' be delimited in some way such as a comma. e.g. 't,j'. This is because at some time in the future you may want to expand the list of accepted values for 'first' and some future attributes might not be single letters. This would future proof the template somewhat. Rincewind42 (talk) 14:43, 30 May 2014 (UTC)
Done. I might have quibbled over this if it were just for the reason given, as the 26 letters of the alphabet should be enough for as many options as we might need in future. But I think it also helps visually: it's easier to see that there are two things, not one, as "jt" could e.g. be an odd abbreviation for Jyutping. The delimiter can be anything non-alphabetic; comma, comma-space, slash and space should all work. The code supports multiple-character specifiers, though it only recognises 't' and 'j' at the moment. I've updated the testcases and the examples above.--JohnBlackburnewordsdeeds 15:28, 30 May 2014 (UTC)
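A minimal Lua sketch of the delimiter handling described here, as a hypothetical illustration rather than the module's actual parser. It splits first= on any run of non-letters, accepts multi-character tokens, and keeps only the specifiers it recognises.

```lua
-- Hypothetical parser for first=: split on any run of non-letters and keep
-- only recognised specifiers (currently 't' and 'j'); anything else is ignored.
local RECOGNISED = { t = true, j = true }

local function parse_first(first)
  local flags = {}
  -- %a+ yields whole alphabetic tokens, so multi-character specifiers
  -- could be added to RECOGNISED later without changing this loop
  for token in (first or ""):gmatch("%a+") do
    if RECOGNISED[token] then
      flags[token] = true
    end
  end
  return flags
end
```

Under this sketch, the older combined form first=tj is a single token that matches neither specifier, so it would simply be ignored.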

Again please update the main module from its sandbox, to effect the changes described immediately above and in more detail in #Cantonese first issues.--JohnBlackburnewordsdeeds 23:23, 31 May 2014 (UTC)

  Done Jackmcbarn (talk) 18:53, 1 June 2014 (UTC)

Template:Zh-full merged

I've finished merging {{Zh-full}} into this one. Adding |first=j support made this possible: it let me convert the two remaining articles using {{zh-full}} to use {{zh}}, preserving the parameter order. The motivation for this was that the Lua implementation of this template made {{zh-full}} redundant, as it was easy to add to this one the features that had made that template necessary. {{zh-full}} had many small problems, mostly ones fixed in {{zh}} both before and after its conversion to a module. It also had unwieldy syntax, was probably slower, and got on poorly with the visual editor. Finally, all but a handful of instances of it were unnecessary: almost all could be replaced with {{zh}}. The few extra Romanisations etc. that {{zh-full}} supported weren't used, and the only additional ordering was putting Cantonese Romanisations first, which {{zh}} now supports.

I also went through all the sub-templates, changing any of those that were being used to {{zh}}, then redirecting them also to this. They were {{zh-chinese}}, {{zh-simp}}, {{zh-trad}}, {{zh-pinyin}}, {{zh-tongyong}}, {{zh-wade}}, {{zh-xiao}}, {{zh-jyut}}, {{zh-yale}}, {{zh-hkgov}}, {{zh-poj}}, {{zh-zhuyin}}, {{zh-lit}}, {{zh-viet}}. They weren't meant to be used outside of {{zh-full}} but most seemed to have one or two uses.

The only one I've left is {{zh-IPA}}. This is used in a handful of articles; it's not been used much as it's hard to find but that it's been used at all suggests there's a need for it. There's already an {{IPA-cmn}} though which is better named and more widely used, so it could perhaps be merged into that.--JohnBlackburnewordsdeeds 21:53, 1 June 2014 (UTC)

Merging Template:zh-IPA to Template:IPA-cmn

I've been looking at this as the last of the sub-templates of {{zh-full}} to be merged, and have posted details on its talk page, Template_talk:Zh-IPA. Posting here as this is the much more watched page/the main discussion page for these templates.--JohnBlackburnewordsdeeds 01:08, 3 June 2014 (UTC)

Italics pinyin

Could this be reversed? Pinyin in italic text makes the page look too busy and uneasy.--Wester (talk) 16:59, 14 June 2014 (UTC)

The template has italicized pinyin since 2009. It's in line with common practice in academic writing and the usual treatment of foreign text in Wikipedia. Kanguole 19:46, 14 June 2014 (UTC)
It would be useful if you could link to the article or articles you think have this problem. In my experience, when articles look too busy with e.g. Chinese and Romanisations, it's often down to other causes. Too much Chinese is a common one, e.g. for linked items, which don't need Chinese and pinyin given (readers can just follow the link). Or the Chinese and Romanisations could be moved into an infobox and minimised in the text. Finally, the |labels=no option can be used to dramatically de-clutter second and subsequent uses of the template.--JohnBlackburnewordsdeeds 20:53, 14 June 2014 (UTC)

Capitalising the first letter

There are a few occasions where the zh template is actually used at the beginning of a sentence, such as inside a table or a list, and the first letter of the first word should be capitalised on these occasions. However, there is currently no way to do so. The result is that editors must either use the lang template and write the labels in full, or put up with the mess of incorrectly capitalised text. Could another parameter be added to change the formatting in such cases? Rincewind42 (talk) 13:39, 9 July 2014 (UTC)

@Rincewind42: I just got round to this. It's a simple enough addition, applied to whichever label comes first. It checks whether 'scase' (short for sentence case) is set to anything and capitalises the first label if it is. I've added it to the sandbox and a test to Template:Zh/testcases, copied below with another:
--JohnBlackburnewordsdeeds 20:21, 23 July 2014 (UTC)
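A minimal Lua sketch of the scase behaviour described above. It is hypothetical and not the sandbox code. Only the first label is touched, and only its leading lowercase letter is upper-cased. It is an assumption here that an empty |scase= counts as unset.

```lua
-- Hypothetical sketch of scase: capitalise only the first label.
-- Assumption: an empty value (|scase=) is treated the same as no value.
local function apply_scase(labels, scase)
  local out = {}
  for i, label in ipairs(labels) do
    if i == 1 and scase and scase ~= "" then
      -- gsub returns (string, count); assigning to one variable keeps the string
      label = label:gsub("^%l", string.upper)
    end
    out[i] = label
  end
  return out
end
```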
Looks OK. Rincewind42 (talk) 00:46, 24 July 2014 (UTC)

Please update the module from its sandbox with the above change, as detailed above.--JohnBlackburnewordsdeeds 20:33, 24 July 2014 (UTC)

  Done – Paine Ellsworth CLIMAX! 13:55, 27 July 2014 (UTC)

"See also" addition request

Can someone add the following to the "See also" section?

*{{tlx|Infobox Chinese}} - infobox template supports traditional and simplified Chinese as well as other common romanizations.

to read:

  • {{Infobox Chinese}} - infobox template supports traditional and simplified Chinese as well as other common romanizations.

Thanks. —  AjaxSmack  02:57, 27 February 2015 (UTC)

Done, with some small edits to it and the other entries. You could have added it yourself as it’s part of the documentation page which isn't protected.--JohnBlackburnewordsdeeds 09:53, 28 February 2015 (UTC)
Thanks. I didn't notice that the doc page wasn't locked.  AjaxSmack  22:57, 3 March 2015 (UTC)

Language tagging for pinyin yet again

I know this has come up before (here, for example) but I want to readdress the issue of the language tagging and that the chosen fonts render very poorly when using Firefox. Here's an example from the Xinpi article:

 

As you can see, the tones are barely legible even after increasing the font size; cf. the Pe̍h-ōe-jī text, which renders just fine.

From previous discussions, I understand that this is a Firefox bug but the problem has been festering for quite a while. Any chance anything can be done on the Wikipedia end? Firefox is a major browser and asking users to edit style sheets or change browsers is a bit excessive. —  AjaxSmack  23:27, 3 March 2015 (UTC)

Had a look myself with Firefox and it looks OK. It's not just a problem with the browser but with the browser and a certain intersection of user settings. I think you need to specify fonts other than the defaults for Chinese, or that's what I recall when it last came up. See Module talk:Zh/Archive 3#Latn problem. It's disappointing it's still not fixed. I submitted a patch for it to Firefox, and I know it's been looked at by other people since but it seems not a priority for Firefox's devs.
I'd be very reluctant to remove this from the template. Firefox users are only a minority of users (17% of Wikimedia), and I assume only a minority of them are experiencing problems. And it would not just be this template, as the same HTML/CSS is output by other templates, such as {{lang}}. The logical fix would be in the site CSS to catch all these instances, but then do we also do it for all the other languages that are rendered incorrectly by Firefox's horribly broken code? Better, I think, to recommend that users fix it themselves, by editing their CSS, changing settings, switching browsers or even patching Firefox. Some users might even find the incorrect version acceptable, depending on which fonts it uses (yours seems to be using a bitmap font, which is particularly illegible but is not a feature of most modern OSes).--JohnBlackburnewordsdeeds 00:31, 4 March 2015 (UTC)

Pe̍h-ōe-jī

POJ should be nan, not hak. Wiki Wikardo 04:12, 11 February 2016 (UTC)

You are right. It is not mentioned, but I think it should be, under zh-min-nan in this doc, which means 'nan' should be used. I have updated the sandbox and it seems to work in the testcases. Can the main module be updated with this change?--JohnBlackburnewordsdeeds 06:03, 25 February 2016 (UTC)
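As an illustration of the fix, here is a hypothetical Lua table that maps romanisation fields to primary language subtags, with POJ under Min Nan ('nan') rather than Hakka ('hak'). The module's real table and exact tags may differ. For example, it may add variant subtags.

```lua
-- Hypothetical field-to-language table for romanisation fields. The module's
-- real table and exact tags may differ (e.g. it may add variant subtags).
local ROMANISATION_LANG = {
  p = "zh", tp = "zh", w = "zh", -- Mandarin romanisations ("zh" kept for legacy reasons)
  j = "yue", cy = "yue",         -- Cantonese romanisations
  poj = "nan",                   -- Pe̍h-ōe-jī is Min Nan (Hokkien); "hak" is Hakka
}

local function lang_tag(field)
  local lang = ROMANISATION_LANG[field]
  if lang == nil then
    return nil -- not a romanisation field
  end
  return lang .. "-Latn" -- Latn: ISO 15924 code for Latin script
end

local function tag_span(field, text)
  local tag = lang_tag(field)
  if tag == nil then
    return text
  end
  return string.format('<span lang="%s">%s</span>', tag, text)
end
```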
  Done Bazj (talk) 07:38, 25 February 2016 (UTC)

Different traditional and simplified glyphs despite unified Unicode characters

I tried to use this template in the article Tsai Ing-wen to give the different traditional and simplified forms of this person’s Chinese name, as was done in the corresponding article on de.WP. However the result of writing {{zh|t=蔡英文|s=蔡英文|p=Cài Yīngwén}} is “Chinese: 蔡英文; pinyin: Cài Yīngwén” and the HTML includes <span xml:lang="zh" lang="zh">蔡英文</span> where the characters are given only once and lack any markup for script (zh-Hant vs. zh-Hans instead of just zh). Would it be possible to alter this template so that it is possible to give different traditional and simplified glyphs even when the Unicode characters for the two are unified? (In case you are not familiar with this aspect of Unicode read the article Han unification). LiliCharlie (talk) 12:54, 16 January 2016 (UTC)

The module recognises when the traditional and simplified characters are identical, and if they are, it combines them, as has happened here. This is normal practice in WP articles; it is only useful to give both when they are different, and the template helps with this by eliminating such redundancy. It only does this if they are identical (the bit of the script that does it is args["s"] == args["t"] on line 131). The German version of the template has much simpler (non-module) code, which does not do this.
What you may be seeing is some difference due to the different fonts your system is using for simplified and traditional. That is, I think, uncommon though. I do not see it here or on de.wp, and I suspect the same will be true for the vast majority of en.wp users. Only users with particular settings, e.g. for traditional and simplified characters, will notice any difference, and then the difference will only be in the rendering, not the underlying characters. You can change your settings, or use a style sheet to control the rendering of particular page elements here. See User:JohnBlackburne/common.css for some examples.--JohnBlackburnewordsdeeds 14:01, 16 January 2016 (UTC)
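A minimal Lua sketch of the identical-characters check described above (args["s"] == args["t"]). It is hypothetical, and only the comparison itself comes from the discussion. The comparison is on the stored strings, so two fields holding the same unified code points always collapse into one, whatever glyphs a reader's fonts would show for each locale.

```lua
-- Hypothetical sketch: combine simplified and traditional when identical.
-- Comparison is byte-for-byte on the strings, i.e. by code point, not glyph.
local function chinese_chars(args)
  local s, t = args.s, args.t
  if s and s ~= "" and s == t then
    -- identical: show once under the generic label
    return { {label = "Chinese", value = s} }
  end
  local out = {}
  if s and s ~= "" then out[#out + 1] = {label = "simplified Chinese", value = s} end
  if t and t ~= "" then out[#out + 1] = {label = "traditional Chinese", value = t} end
  return out
end
```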
Well, the difference I see is certainly not just due to different fonts; I actually use the Source Han Sans font family for all CJK locales, so I see exactly the same glyphs when there should be no difference, and different glyphs when there should be one. Nowadays all major browsers seem to define different locales for at least simplified and traditional Chinese characters (and usually also distinguish between traditional TW and HK, as occasionally even these very similar locales use different glyphs; see here). A lot of East Asian readers are extremely fussy about glyphic differences, and at times (though rarely) they even fail to recognize a Han character rendered in a shape that is uncommon to them. In my case, the traditional and simplified glyphs for the character / differ in each and all of its four components 艹⺼又示, esp. 艹⺼示. Why not give the readers the information that they are different? Besides, you can’t really rely on Unicode’s (or rather, the IRG’s) unification scheme, which often seems quite random. For example and are unified (=one Unicode character) while and 単, which show exactly the same glyphic difference, are not. Unicode’s Han unification is known to be a disputed matter, and even according to the Unicode Standard, locale markup is indispensable in cases like these. P.S.: I already use stylesheets to see whatever I want to see. What I would like to achieve, though, is that any user gets what they deserve: 100% reliable information. LiliCharlie (talk) 15:29, 16 January 2016 (UTC)
The problem is that you are not giving readers that information, that they are different. If it displays both simplified and traditional and they look the same, as they are the same characters, then probably most users will not notice the duplication (most do not read Chinese) but those who do will be confused over why the same characters appear twice although they are the same, unlike on other pages. I’ve looked at it with three browsers on two different OSes and the simplified and traditional characters look the same. wikt:蔡 says simplified and traditional are the same. That your browser displays them differently must be down to your browser and OS settings. I suspect very few readers of the English WP have similar settings, though it would be very hard to find out. It is not something that can really be addressed in the template/module as it would break how it appears on many other pages, though how many I do not know.--JohnBlackburnewordsdeeds 16:46, 16 January 2016 (UTC)
/ has clearly different glyphs for CN and TW/HK in Unicode’s CJK Unified Ideographs chart (look for U+8521). More examples: / (U+671B, p. 162) and / (U+9F9C, p. 545), which confuses even Chinese scholars. LiliCharlie (talk) 18:47, 16 January 2016 (UTC)
FWIW the glyphs displayed on my PC for the trad and simp are almost identical other than in font weight. As a Chinese speaker/reader, the nuance involved is not a big issue and I'm sure that equally applies to my 1.2 Billion compadres, since we care about where the strokes are, not their weight.  Philg88 talk 07:41, 17 January 2016 (UTC)
Font weight? There should be no difference in weight between the fonts used for traditional and simplified Chinese. No, what I’m talking about here are structural differences: selection, number and relative position of the strokes. As in vs. . Do you really expect en.WP users to be able to tell that these are “identical,” just variants of the same Unicode character while and are not? What you write makes me believe you are unfamiliar with the basic issues and the idiosyncrasies of IRG/Unicode Han unification. — Besides I think that Wikipedia is there for those who want to present or acquire (scientifically) accurate knowledge, not for those who think they don’t need accuracy and can ignore existing differences because as “practical” language users they know better than the experts. LiliCharlie (talk) 08:57, 17 January 2016 (UTC)
This is an issue that we've never really addressed because the difference is generally insignificant except in serious hanzi-related studies, but we probably should. On my (Windows) desktop I see no difference, but now on my MacBook Pro it is correctly displaying the variant forms. The font weight is not an issue here, Phil's machine is just rendering them as different fonts, but they are indeed variant forms.  White Whirlwind  咨  09:48, 17 January 2016 (UTC)

That is interesting. I too am on a MacBook but a fairly old model with an older version of OS X, and am seeing no difference in rendering. It would not surprise me though if Apple supported this with their frequent OS updates which generally keep current with Unicode changes. The latest iOS might be similar.

I still think there is no need to change the template for this, but it is something that is straightforward to do without actually modifying the template. Just change the content of one of the strings without changing the rendering, with e.g. a zero-width joiner:

But I would not recommend this as it will be confusing for other editors not familiar with this obscure piece of markup. It is better to avoid the template altogether, and supply the links and templates for markup yourself:

But because the difference between simplified and traditional characters will still not be apparent to most readers it is probably worth including a footnote in addition to the links.--JohnBlackburnewordsdeeds 15:35, 17 January 2016 (UTC)
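The zero-width joiner trick works because the merge is presumably decided by comparing the two parameter strings. A minimal Python sketch, assuming the module does a plain string-equality test (the function `should_merge` is hypothetical and not the module's actual Lua code):

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER, invisible when rendered


def should_merge(simplified: str, traditional: str) -> bool:
    # Hypothetical merge test: identical strings would be shown once, as "Chinese".
    return simplified == traditional


t = "蔡英文"
s = "蔡英文"
padded = t + ZWJ  # looks the same on screen, but is a different string

print(should_merge(s, t))       # True: the two forms would be collapsed
print(should_merge(s, padded))  # False: both forms would be displayed
```

The invisible character is exactly what makes this confusing for other editors: nothing in the rendered page or the wikitext view shows why the two fields are not merged.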

This is quite spontaneous and not well thought out:
Another solution would be to create a new template (probably based on this module) for these infrequent cases. The documentation might warn users not to use it unless the glyphs representing the two identical character strings bear a minimum amount of dissimilarity. (I don’t know if it is possible to check for dissimilarity using a whitelist or a blacklist.) And, as you say, the template also warns readers that what they see may not be what they are supposed to see, and possibly also provides a link to a help page. LiliCharlie (talk) 16:54, 17 January 2016 (UTC)

Definitely not a separate/new template. This template+module combines the functionality of a number of previous templates, as having the code all in one place makes it easier to maintain and ensures a consistent style and format for all uses. Since the template switched to using a Lua module there is no longer a technical need to have separate templates (previously the limitations of parser functions made it necessary). It is easy to add here if there is consensus to do so.

So I have added a “nomerge” option to the sandbox version in the same way as other options, and added some examples to the testcases: Template:Zh/testcases. This one from there demonstrates how it works:

  • {{Zh/sandbox |t=蔡英文|s=蔡英文|p=Cài Yīngwén|nomerge="y"}} gives:

Clearer than using obscure markup but with the same effect. In particular it does not change any existing uses. Have a look at the module sandbox Module:Zh/sandbox for the particular changes. If this seems OK to other editors we can go ahead and add it to the main template.--JohnBlackburnewordsdeeds 23:52, 17 January 2016 (UTC)
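A rough Python sketch of the decision the “nomerge” option controls. It assumes simplified is listed before traditional by default and that any non-empty nomerge value (such as "y") disables merging; `chinese_fields` is a hypothetical name, and the real logic lives in the Lua sandbox module:

```python
def chinese_fields(args):
    """Hypothetical sketch: return (label, text) pairs for the character fields.

    Assumes simplified is listed before traditional by default and that any
    non-empty nomerge value (e.g. "y") disables the merge.
    """
    s = args.get("s", "")
    t = args.get("t", "")
    nomerge = bool(args.get("nomerge"))

    # Identical strings are normally collapsed into a single "Chinese" field.
    if s and s == t and not nomerge:
        return [("Chinese", s)]

    fields = []
    if s:
        fields.append(("simplified Chinese", s))
    if t:
        fields.append(("traditional Chinese", t))
    return fields


print(chinese_fields({"t": "蔡英文", "s": "蔡英文"}))
# [('Chinese', '蔡英文')]
print(chinese_fields({"t": "蔡英文", "s": "蔡英文", "nomerge": "y"}))
# [('simplified Chinese', '蔡英文'), ('traditional Chinese', '蔡英文')]
```

Existing uses never pass nomerge, so in this sketch they keep taking the first branch and their output is unchanged.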

What is the purpose of this template? I think it is to convey the Chinese name of the person, book, etc that is the subject of article, for readers who understand characters. It's not to give lessons in typography to people who don't understand hanzi – we have specialist articles for that. In a proper setup, zh-Hant should yield traditional forms, zh-Hans simplified ones and zh the reader's preference between these. (Of course, both fonts will have to do something artificial if they cover all the non-unified variants.) So if the reader sees unified characters in their preferred form, they will know which characters are meant, and the template's job is done. In such cases (and Han unification is rather conservative) the other variant is unnecessary, and this template already produces distracting clutter. However, this doesn't apply to {{infobox Chinese}}, which has more room. Kanguole 01:18, 18 January 2016 (UTC)
I’m on a more up-to-date Mac now which does draw the character differently for simplified and traditional, but it is impossible to see the difference unless I make the text size about as large as the browser will let me, and look at the part of the character that is different. It really is a minor thing, not important for describing the subject of the article, about on par with the various Romanisations that appear in {{infobox Chinese}}. Accordingly I've added it to that template where it takes little space.--JohnBlackburnewordsdeeds 03:14, 18 January 2016 (UTC)

Why is the order of {{Zh/sandbox}} input (t < s ) and output (Hans < Hant) reversed? LiliCharlie (talk) 14:43, 18 January 2016 (UTC)

It ignores the order of the parameters that are passed. You can use |first= to override the default ordering. See the template documentation.--JohnBlackburnewordsdeeds 23:22, 18 January 2016 (UTC)
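One way to picture this: output order comes from a fixed list, not from the order in which parameters appear in the wikitext. A minimal Python sketch, where the default order and the single-key handling of `first=` are illustrative assumptions, not the module's actual table:

```python
# Illustrative default order only; the real module's order may differ.
DEFAULT_ORDER = ["s", "t", "p", "w", "j", "sl"]


def display_order(args, default=DEFAULT_ORDER):
    """Return the supplied field keys in display order, ignoring input order.

    Assumes |first= names a single key that should be moved to the front.
    """
    order = list(default)  # copy so the default list is never mutated
    first = args.get("first")
    if first in order:
        order.remove(first)
        order.insert(0, first)
    # Keep only the fields that were actually supplied (and non-empty).
    return [key for key in order if args.get(key)]


print(display_order({"t": "蔡英文", "s": "蔡英文"}))               # ['s', 't']
print(display_order({"t": "蔡英文", "s": "蔡英文", "first": "t"}))  # ['t', 's']
```

Writing t before s in the call changes nothing; only `first=` reorders the output.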
Why have this double reversed logic "nomerge=y". Can we please keep it as "merge=no" (not collapsed) and "merge=yes" (collapsed and default). That way the logic is the same way round as labels=no and links=no (we don't do nolinks=y and nolabels=y).
There seems to be some bug in the scripts implementation for args["s"] == args["t"]. If s=U+8521 and t=U+671B then s=/=t and should return false. So why does the script return a true? Maybe to your eye, s=U+8521 and t=U+671B look incredibly similar and maybe the font author for my font didn't bother drawing the minor difference so on my computer (Win 10 English with no extra packages for fonts) they really look identical, but the computer doesn't know all that. The computer only sees a number code for a character. So how can apple==orange return true? Rincewind42 (talk) 01:56, 25 February 2016 (UTC)
I am assuming we won't go ahead with the merge option as no-one else seems to want it and the problem in the article has been addressed another way. As for the other problem can you provide an example; my test with the characters with those unicode values works fine:
simplified Chinese: ; traditional Chinese:
--JohnBlackburnewordsdeeds 05:43, 25 February 2016 (UTC)
I misread the Unicode numbers that Philg88 had posted, so I've struck out my previous comment. I'm now quite sure that this is all just about fonts. If you look at the CJK Unified Ideographs chart (Large PDF) for U+8521 you'll see three Chinese characters marked with 蔡 G0-324C, 蔡 HB1-BDB2, 蔡 T1-6E5B. When I look at the Unicode PDF, I can see that there is a small difference in the direction of the stroke on the HB1-BDB2 and T1-6E5B versus the G0-324C. It is a very small difference but it is there. When I copy/paste those characters into a Word .doc, the differences remain until I change the font. If I set the font for all of the characters to Microsoft JhengHei then all three render in Word as the HB1 and T1 render on the PDF. If I change the font to Microsoft YaHei, NSimSun or SimSun, then all three render as per G0 on the PDF. I get the same results when testing with U+671B and U+9F9C. In particular I find SimSun's rendering of U+9F9C strikingly different from the rendering by Microsoft JhengHei. It's not just a slight change; there are several extra strokes added and removed. Now I don't have a huge number of Chinese fonts installed, but the only way I can get these characters to render as the Unicode PDF file renders them is to vary my font selection from character to character. Rincewind42 (talk) 16:24, 25 February 2016 (UTC)
Also compare the Unicode chart glyphs for U+57E9 / to U+4E89 and U+722D . Love —LiliCharlie (talk) 17:02, 25 February 2016 (UTC)
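To make Rincewind42's point concrete: the s= and t= strings for 蔡 are the same code point, so any string comparison sees them as equal. Which of the G0, HB1 or T1 shapes appears is decided later by the font and, as discussed above, by the zh-Hans/zh-Hant language markup. A minimal Python check:

```python
import unicodedata

simplified = "\u8521"   # 蔡 as entered in s=
traditional = "\u8521"  # 蔡 as entered in t=

# Equality compares code points only; glyph shapes are not part of the string.
print(simplified == traditional)     # True
print(f"U+{ord(simplified):04X}")    # U+8521
print(unicodedata.name(simplified))  # CJK UNIFIED IDEOGRAPH-8521

# The other characters mentioned in this thread are separate code points.
for ch in ("\u671B", "\u9F9C"):
    print(f"U+{ord(ch):04X}", ch, ch == simplified)
```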
Traditional Chinese: 蔡英文
Simplified Chinese: 蔡英文

This is the infobox copied from the article. How does that look? It shows different characters for me, though the change is a very small one which I can only see if I increase the font size by several steps. The fonts it uses are PingFang SC and PingFang TC, new in Mac OS X 10.11. We don’t have a policy on character variants that I am aware of, but my own view is that outside of articles on Chinese characters we should not bother with them. The vast majority of readers will not notice them. Even people who can read Chinese can surely read the character if it is the 'wrong' variant. Often variations in rendering due to different fonts can be more significant. Unless there is a particular reason for mentioning it, such as a logo which uses a particular non-standard variant, it does not need to be mentioned. {{Infobox Chinese}} is exceptional though, as it often contains obscure, little-used transcriptions which are of little interest, but hidden away in a collapsible box so they don't clutter or distract from the article content. Seems the best place for obscure character variants like this.--JohnBlackburnewordsdeeds 17:18, 25 February 2016 (UTC)

Even though the differences may seem ridiculously small to people who grew up in the West, it is probably true what hsknotes wrote in this comment on Language Log: "... And in Chinese, the font change and simplifications make an arguably far bigger difference than u's becoming w's or th's from þ or even colour being turned into color. Sometimes the medium is the message, or at least is part of it." And this usually applies even to English speakers from the Sinosphere, where a simple font change is often considered a means of conveying identity and political attitude. Love —LiliCharlie (talk) 17:50, 25 February 2016 (UTC)
That comment though is about the significant differences introduced by simplification. But that’s not what’s happening here. It’s not been simplified, as it is no simpler. There is, according to e.g. wikt:蔡, just one character for simplified and traditional. There is only one Unicode code point, U+8521. They are treated as the same; the differences are so small as to be invisible at normal font sizes, smaller than e.g. differences due to the font(s) or other factors.--JohnBlackburnewordsdeeds 18:53, 25 February 2016 (UTC)
The comment explicitly mentions font change though. I know it's hard for Westerners to understand that glyphic differences even of the same character are seen as political statements ("communist/Mao forms"). I also know that most browsers display CJK characters at too small sizes. (My eyesight has become bad, and on my system CJK characters are bigger if language markup is used, that's why I often make edits adding it, in order to be able to read it myself.) Love —LiliCharlie (talk) 19:13, 25 February 2016 (UTC)

Sidney Lau

Firstly, there is a perfectly fine Sidney Lau romanisation page detailing his system. Secondly, the two existing templates for Cantonese Romanisation include one, Jyutping, used nowhere but academic circles, and another, Yale, very popular outside of Hong Kong but of little relevance there. By way of contrast, the Sidney Lau system, for all its faults (which is not to say the other two don't have theirs), is infinitely closer to the reality on the ground, i.e. the proper names that one sees everywhere; it is certainly extremely popular as the tool for teaching Cantonese to foreigners (it being so much more natural to English speakers and presenting fewer initial barriers to learning) and, unlike the other two, it is very close to the official government "Standard Romanisation". Third, it is time-tested, being semicentennial this year. Thus I propose the parameter sl be introduced to the template for Sidney Lau romanisation so as to facilitate its inclusion in appropriate places in WP. If there is a consensus and this edit is made, I undertake to update the /docs page accordingly (as I attempted yesterday). sirlanz Sirlanz 13:41, 30 June 2016 (UTC)

I've added it to the sandbox and tested it on the testcases page, using 'sl' as you did when you edited the documentation and other obvious parameters. It seems to work fine:
If that looks OK then it’s a straightforward addition to the module that changes nothing else, and is OK to go.--JohnBlackburnewordsdeeds 15:00, 30 June 2016 (UTC)
Wouldn't it be more useful to add this to {{infobox Chinese}}? I think it would be better to pick a small number of romanizations for the inline template. Kanguole 15:32, 30 June 2016 (UTC)
I would add it there also, if it’s not already added. I see no problem adding it here too: there is already a lot of redundancy for most use cases, with e.g. Romanisations other than Hanyu Pinyin barely used. Sirlanz makes the case for it better than I could, except I would add it’s the Romanisation I’m most familiar with myself from my time in Hong Kong.--JohnBlackburnewordsdeeds 15:41, 30 June 2016 (UTC)
Code/cases looked good. Synced to sandbox. — Andy W. (talk ·ctb) 17:24, 30 June 2016 (UTC)

Template-protected edit request on 24 August 2016

Could someone please add {{subst:tfm|lang-zh}} to the top of this template as it has been nominated for merging. Pppery (talk) 12:55, 24 August 2016 (UTC)

  Done — JJMC89(T·C) 15:50, 24 August 2016 (UTC)

Template edit request

Please wrap the tfd notice in noinclude tags. This template is included in thousands of articles, often appearing in the first sentence of articles, so the notice is now polluting those articles in a highly visible way. E.g. Hong Kong now begins

  • Hong Kong (‹The template Zh is being considered for merging.› Chinese:...

--JohnBlackburnewordsdeeds 05:23, 25 August 2016 (UTC)

  Done — JJMC89(T·C) 05:49, 25 August 2016 (UTC)
Come on, another incorrect noincluding request. Tfd/Tfm notices are supposed to show up in articles, and noincluding is only mentioned in the instructions as a technical hack for substituted templates. Pppery (talk) 14:50, 4 September 2016 (UTC)
The point of the notice is to notify editors of the discussion and that clearly worked; it is one of the most active TfD discussions I have ever been involved in. Given that, and the disruption it was causing to thousands of articles, I think noincluding it was the obvious thing to do.--JohnBlackburnewordsdeeds 15:11, 4 September 2016 (UTC)
@JohnBlackburne: I agree that the tfd was quite active, however that does not justify noincluding. It is clearly stated in the listing instructions that noinclude tags should only be added for templates designed to be substituted (which this one isn't) Pppery (talk) 16:00, 4 September 2016 (UTC)

See also Template:Template for discussion#Which type should be used?:

"In rare cases, where the insertion of any template is deemed too detrimental to a large number of articles, or if it breaks markup, it might be advisable to disable the notifications completely."

I would say this is one of those cases. The many thousands of articles this is used in is a large number, and it was very detrimental to many of them; see the Hong Kong example above.--JohnBlackburnewordsdeeds 16:09, 4 September 2016 (UTC)

@JohnBlackburne: I wasn't originally aware of that section of the template doc, but I disagree that the transclusion of the notice is too detrimental to a large number of articles. Many articles transclude the template, yes, but the article you specified, Hong Kong, only transcludes the template four times, which is not enough to justify hiding a notice over. If you think that the notice takes up too much space, just make it smaller by using |type=tiny rather than hiding it. Pppery (talk) 16:37, 4 September 2016 (UTC)

Italic romanisation

Why is the romanisation suddenly in italics? It looks really ugly. Citobun (talk) 15:36, 2 November 2016 (UTC)

Suddenly? (There's no accounting for taste, but I think it looks fine. And it also conforms to English typographic tradition for foreign words.) Love —LiliCharlie (talk) 16:13, 2 November 2016 (UTC)
See the guideline MOS:FOREIGNITALIC for the reason. I think it looks better like that, certainly not ugly. You might try a different browser as some (e.g. Firefox) have problems displaying Non-English text.--JohnBlackburnewordsdeeds 21:51, 2 November 2016 (UTC)
Take Raymond Chan as an example. On my browser the pinyin looks OK. But the Jyutping is stretched and weird and ugly. Is it the same for others? Maybe it was always italics and I didn't notice, but I don't think the Jyutping looked this weird until recently. Citobun (talk) 13:41, 4 November 2016 (UTC)
Not on my browser, but is it perhaps an issue with the lang tags? The pinyin is tagged as "zh-Latn-pinyin", but the Jyutping is tagged as "yue-jyutping" – perhaps it's being rendered with a character font, and a tag of "yue-Latn-jyutping" might avoid that. Kanguole 14:27, 4 November 2016 (UTC)
Though according to the registry, the prefix for jyutping is just yue. Kanguole 14:33, 4 November 2016 (UTC)
Yes, according to the registry yue-jyutping is the correct way to tag it. It does seem odd, but I think that registry contains a lot of legacy standards, created at different times, which are not consistent between e.g. pinyin and jyutping. As it’s the standard, and has been part of it since 2010, all browsers should support it.--JohnBlackburnewordsdeeds 09:40, 5 November 2016 (UTC)
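The two tags differ in structure, which is what Kanguole's font hypothesis turns on. A minimal Python sketch that splits a tag into its BCP 47 (RFC 5646) parts, covering only language, script, region and variant subtags (no extlang, extensions or private use), shows that yue-jyutping carries no script subtag while zh-Latn-pinyin does. Whether a given browser picks a different font because of that is something this sketch cannot tell; `parse_tag` is a hypothetical helper:

```python
import re


def parse_tag(tag):
    """Split a simple BCP 47 tag into language, script, region and variants.

    Covers only the subset needed here: no extlang, extensions or private use.
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "variants": []}
    for sub in subtags[1:]:
        no_later = result["region"] is None and not result["variants"]
        if result["script"] is None and no_later and re.fullmatch(r"[A-Za-z]{4}", sub):
            result["script"] = sub.title()      # e.g. "Latn", "Hant"
        elif no_later and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub):
            result["region"] = sub.upper()      # e.g. "HK", "TW"
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            result["variants"].append(sub.lower())  # e.g. "pinyin", "jyutping"
        else:
            raise ValueError(f"unexpected subtag {sub!r} in {tag!r}")
    return result


print(parse_tag("zh-Latn-pinyin"))
# {'language': 'zh', 'script': 'Latn', 'region': None, 'variants': ['pinyin']}
print(parse_tag("yue-jyutping"))
# {'language': 'yue', 'script': None, 'region': None, 'variants': ['jyutping']}
```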

Bbánlám pìngyīm

@Citobun, Kanguole, and JohnBlackburne: Sunshine567 has asked me to add Bbánlám pìngyīm to the module. Is there consensus for doing this? Jc86035 (talk) Use {{re|Jc86035}} to reply to me 11:31, 9 August 2017 (UTC)

I’ve added it to the sandbox in a similar way to other pinyin, e.g.
Chinese: 閩拼方案
I don’t know if it’s a commonly used Romanisation, or used at all, but we don’t actually have any criteria for inclusion, and some of the ones in there are already pretty obscure. If no-one has any objections I’d say put it in.--JohnBlackburnewordsdeeds 12:02, 9 August 2017 (UTC)
If the first principle for inclusion be that the system of Romanisation has a degree of prevalence of use, it may be that Bbánlám pìngyīm fails the test. Only one source is cited for support of its WP entry and we already have the well-established Pe̍h-ōe-jī system for Amoy/Xiamen Hokkien in the template. So I would suggest caution about adding what may be an obscure method. sirlanz 12:09, 9 August 2017 (UTC)
As to the obscurity point, none of the systems now in the template can be described as "pretty obscure", or certainly not in the same league as the obscurity of the proposed new inclusion. Without more input, I would say it's a bad move to add it. sirlanz 13:25, 9 August 2017 (UTC)
It's included in {{Infobox Chinese/Chinese}}, but there are many other romanizations included there but not here, so if this is added then so should the rest of them, I guess. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 13:48, 9 August 2017 (UTC)
Infobox facilitates appearance of a Romanisation in a very limited way; the template here opens up use throughout article text, so there are good reasons to be very restrictive. Indeed, the real issue here is whether there ought to be a WP:en policy about Romanisation of Chinese or just open slather. Each dialect should be limited to one or, at most, two Romanisation schemes and the scheme(s) ought to be chosen based upon (1) prevalence of use by knowledgeable English readers and (2) ease of comprehension by uninformed readers. We should attempt not to admit Romanisation scattershot across the pages; adding this particular minnow of Romanisation will not assist here. sirlanz 14:31, 9 August 2017 (UTC)
Sunshine567, can you say where you think this will be used, that is, in what articles other than Bbánlám pìngyīm? --JohnBlackburnewordsdeeds 13:55, 9 August 2017 (UTC)

Hakka

Is there no Hakka for this template? Szqecs (talk) 15:29, 14 February 2018 (UTC)