Template talk:Lang-sh


This language is written in both Latin and Cyrillic, i.e. there's no specific reason to give prevalence to Cyrillic, and this Wiki is written in Latin. If we have to make a choice with regard to how to gear this template, imo we need to go with that. In fact, lead foreign language names in general should be written with English-speakers in mind: if they can be legitimately written in Latin - that ought to be preferred. -- Director (talk) 20:03, 7 January 2014 (UTC)

I'm going by precedent previously discussed at Template talk:Lang-sr. This is why {{lang-sh-Latn}} and {{lang-sh-Cyrl}} exist. --Joy [shallot] (talk) 20:10, 7 January 2014 (UTC)
Well, as you may know, today Serbian allows Latin - but officially prefers Cyrillic: lang-sr is not really a valid precedent. Even so, an argument could be made against the situation over there (but I'll be damned if I'll be the one making it :)). Namely that this Wiki is written for English speakers, who use Latin. Setting the previous arguments aside, one could look at it this way as well: this is the "hbs" template; "h" and "b" prefer Latin, only "s" prefers Cyrillic.
We ought to modify those two sub-templates as well. -- Director (talk) 20:12, 7 January 2014 (UTC)
Modify them how? — Lfdder (talk) 00:01, 8 January 2014 (UTC)
Scratch that, they're fine. -- Director (talk) 03:58, 8 January 2014 (UTC)

Show both Cyrillic and Latin

Hello! In the sandbox I've made a version of this template that automatically rewrites Cyrillic input as both Cyrillic and Latin. The testcases show the result. Any opinions on implementing this change in the production version of the template? — Dmitrij D. Czarkoff (talktrack) 23:57, 15 June 2014 (UTC)
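
For illustration only: the sandbox module itself is not reproduced on this page, so the following is a minimal Python sketch of the kind of one-way Cyrillic-to-Latin rewrite described above, not the actual implementation. The function name `cyrillic_to_latin` and the capitalization rule for digraphs are assumptions made for this example.

```python
# Serbo-Croatian Cyrillic -> Latin. Each of the 30 Cyrillic letters has
# exactly one Latin equivalent, so this direction is unambiguous.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Cyrillic letters; leave every other character untouched."""
    out = []
    for i, ch in enumerate(text):
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)  # spaces, punctuation, Latin letters, digits
        elif ch.isupper():
            # Assumed rule: a capital digraph is fully upper-cased (LJ) when a
            # neighbouring character is upper case, otherwise title-cased (Lj).
            prev = text[i - 1] if i > 0 else ""
            nxt = text[i + 1] if i + 1 < len(text) else ""
            all_caps = prev.isupper() or nxt.isupper()
            out.append(lat.upper() if all_caps else lat.capitalize())
        else:
            out.append(lat)
    return "".join(out)


print(cyrillic_to_latin("Савез комуниста Југославије"))
# Savez komunista Jugoslavije
```

Characters outside the table pass through unchanged, so input that already mixes both scripts is not corrupted, only left partly untransliterated.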

IMO the bigger problem with Cyrillic is that we're now implicitly italicizing it, which is contrary to MOS:Ety. Maybe the best solution is to implement handling for two parameters, one for Latin, one for Cyrillic. --Joy [shallot] (talk) 09:56, 16 June 2014 (UTC)
@Joy: I updated the module behind my proposal, so now Latin text is italicized while Cyrillic is not. — Dmitrij D. Czarkoff (talktrack) 11:01, 16 June 2014 (UTC)
I still see Cyrillic rendered in italics in the second test case. --Joy [shallot] (talk) 18:01, 16 June 2014 (UTC)
@Joy: Each testcase shows three strings: (1) the request, (2) the rendering with the production version and (3) the rendering with the sandbox version. I see Cyrillic rendered in italics with the current template (which does not add Latin text) and in regular type with the sandbox version (which adds Latin text in italics). Are your observations different? — Dmitrij D. Czarkoff (talktrack) 22:22, 16 June 2014 (UTC)
Ah, that's all right. But the thing you're missing is that we have combined input in the wild. I added a third test case, have a look. --Joy [shallot] (talk) 00:11, 17 June 2014 (UTC)
@Joy: Yes, I know. This needs to be cleaned up around the time the change is actually introduced. Alternatively, a transitional version could be implemented, with a parameter that switches character translation on and off (default: off), together with a backlog category, so that dual-alphabet input could be cleaned up over some extended period of time. Once the backlog category is empty, the parameter would default to on, and it would subsequently be removed after some grace period. — Dmitrij D. Czarkoff (talktrack) 00:27, 17 June 2014 (UTC)
  Solved in sandbox. I also added a custom CSS class, "lang-sh-old", to the old syntax, so that it can be found more easily on a page after modifying the user's custom CSS file (example). See the test cases. — Dmitrij D. Czarkoff (talktrack) 01:22, 17 June 2014 (UTC)
That makes it as broken as the current template. Not a terribly high standard, that. :) I do have another, separate concern - this seems to mean that the editor has to know the Cyrillic spelling in order for it to be transliterated automatically into Latin. But what about the converse situation? Ideally, do something like add a second parameter, {{lang-sh|Latin text|Cyrl}}, allowing the editor to choose when to output both scripts without having to actually spell it out in Cyrillic. --Joy [shallot] (talk) 18:01, 19 June 2014 (UTC)
@Joy: This is a temporary version. Once the backlog is cleared, the template would be changed to accept only Cyrillic and to print an error on "Latin,Cyrillic" input. Reverse translation is impossible due to the ambiguity of "lj", "nj" and "dž", which arises when editors don't use the Unicode digraph characters (e.g. "ǌ", "ǉ" and "ǆ"). That is: I can easily bring up a module that converts proper Latin Unicode Serbo-Croatian, but only a handful of editors could use it. — Dmitrij D. Czarkoff (talktrack) 18:17, 19 June 2014 (UTC)
Uh, no, don't do that. Enforcing Cyrillic as the sole input to this kind of a generic template would not only lead to a flurry of redundant edits, but it would present a false impression to editors that this language is primarily written in Cyrillic - which it most certainly is not. You could do that at e.g. {{lang-sh-Cyrl}}, but not here. --Joy [shallot] (talk) 19:18, 19 June 2014 (UTC)
@Joy: OK, then a question: if I made a module for Unicode Latin to Cyrillic transcription and turned this template into {{lang-hbs|c=Cyrillic text|l=Latin text}}, with either of |l= or |c= being sufficient, would you find that appropriate? — Dmitrij D. Czarkoff (talktrack) 19:48, 19 June 2014 (UTC)
Oh, I completely concur with you that nobody uses Unicode characters for the digraphs lj, nj and dž (nor should we try to make them change). I suppose that, by converting lj into Cyrillic l+j instead of Cyrillic lj etc, IOW introducing occasional errors in the Cyrillic, we wouldn't be doing anything horrible, because one could still fix it with the manual override. But at the end of the day, you'd still have to deal with not only this but all the other use cases of the template, such as when we used Cyrillic for the srpskohrvatski variant, and Latin for the hrvatskosrpski variant of a phrase (see the intro at Yugoslav People's Army for an example). So all in all it seems to be a lot of work for a questionable level of benefit. --Joy [shallot] (talk) 20:35, 19 June 2014 (UTC)
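
To make the digraph ambiguity discussed in this thread concrete, here is a hedged Python sketch of a naive Latin-to-Cyrillic converter. It is not code from any module mentioned here; the name `latin_to_cyrillic_naive` and the choice of example words are assumptions for illustration. The single-character Unicode digraphs (ǉ, ǌ, ǆ) convert unambiguously, while the two-letter sequences lj, nj and dž are always read greedily as digraphs, which is occasionally wrong.

```python
# Single Latin letters, plus the precomposed Unicode digraph characters.
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
    "ǉ": "љ", "ǌ": "њ", "ǆ": "џ",
}
# Two-letter spellings that editors actually type.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}


def latin_to_cyrillic_naive(text: str) -> str:
    """Greedy conversion: every 'lj', 'nj', 'dž' is treated as one letter."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2].lower()
        if pair in DIGRAPHS:
            cyr, step = DIGRAPHS[pair], 2
        else:
            cyr, step = SINGLE.get(text[i].lower(), text[i]), 1
        out.append(cyr.upper() if text[i].isupper() else cyr)
        i += step
    return "".join(out)


print(latin_to_cyrillic_naive("konj"))        # коњ (correct)
print(latin_to_cyrillic_naive("injekcija"))   # ињекција (should be инјекција)
print(latin_to_cyrillic_naive("nadživjeti"))  # наџивјети (should be надживјети)
```

In the last two words the n+j and d+ž sequences are separate letters that meet at a prefix boundary, so a converter would need a word list or a manual override to get them right, which is the trade-off weighed above.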


What was the reason to move this template from well known SH to HBS? FkpCascais (talk) 12:12, 16 October 2018 (UTC)

That 2013 move has since been reverted, in 2020. --Joy [shallot] (talk) 14:13, 8 August 2021 (UTC)

Template-protected edit request on 21 February 2019

Request: an option, or a variant of the template, with an "abbreviation" parameter for the longer language name, so that this template ("lang-hbs") and the other version ("lang-sh"), both of which have -Latin and -Cyrillic script variants, can output the abbreviated "SH:" and/or at least "S-H:" instead of only "Serbo-Croatian:" when used in specific circumstances and places, such as an infobox. ౪ Santa ౪99° 12:38, 21 February 2019 (UTC)

  Not done: please make your requested changes to the template's sandbox first; see WP:TESTCASES. -- /Alex/21 10:57, 22 February 2019 (UTC)

Pipe & article name

[[Serbo-Croatian language|Serbo-Croatian]] should be replaced with [[Serbo-Croatian]]. The article itself is called Serbo-Croatian. Surtsicna (talk) 18:40, 17 September 2020 (UTC)

Two parameters and "romanized"

This template seems to have a mode where you can have:

{{Lang-sh|Савез комуниста Југославије (CKJ)|Savez komunista Jugoslavije (SKJ)}}

rendered as:

Serbo-Croatian: Савез комуниста Југославије (CKJ), romanized: Savez komunista Jugoslavije (SKJ)

This is improper - both alphabets are native and equal in status, and indeed Latin has long been the more common one. How do we remove this implication that Cyrillic is the 'original' that is then 'romanized'? --Joy [shallot] (talk) 09:40, 31 July 2021 (UTC)

If the two texts are the same, use two templates
{{lang-sh-Cyrl|Савез комуниста Југославије (CKJ)}}, {{lang|sh-Latn|Savez komunista Jugoslavije (SKJ)}}
Serbo-Croatian Cyrillic: Савез комуниста Југославије (CKJ), Savez komunista Jugoslavije (SKJ)
{{lang-sh}} more-or-less expects that the Latin alphabet is more common because it renders its first argument in italics (has done since its creation as {{lang-hbs}}).
Trappist the monk (talk) 11:10, 31 July 2021 (UTC)
Trappist the monk ah, so you're saying this invocation is actually wrong - because MOS:BADITALICS? But how can we prevent people from using it? Even if we swapped the two parameters, the result is bad:
Serbo-Croatian: Savez komunista Jugoslavije (SKJ)
The 2nd parameter doesn't even get rendered. This leads me to believe there's code somewhere underneath {{lang}} that autodetects non-Latin characters in the 1st parameter, and only then prints the 'romanized: ...2...'? How do we then get this template to throw an error if the 2nd positional parameter is used at all? --Joy [shallot] (talk) 11:29, 31 July 2021 (UTC)
Yes, MOS:BADITALICS because using {{lang-sh}} for text written with Cyrillic script is not correct. Italic rendering can be disabled:
{{lang-sh|Савез комуниста Југославије (CKJ)|italic=no}}
Serbo-Croatian: Савез комуниста Југославије (CKJ)
Why should we prevent editors from using this template? It is perfectly valid for text written with Latin script which you have noted is the more commonly used form. Swapping the two parameter values is nonsensical because, while Cyrillic-script text may be a transliteration of Latin-script text, Cyrillic-script is not a romanization of Latin-script text.
Yes, at line 1162, Module:lang inspects the content of {{{1}}} (args.text); if it is wholly Latin script, then it does not render {{{2}}} (args.translit). This is because it doesn't make sense to romanize Latin-script text.
I'm not inclined to change anything. Module:Lang supports some 780-ish {{lang-??}} templates; adding special-case code for this one opens the door for special case code for that one and that one and ... No thank you.
Trappist the monk (talk) 12:11, 31 July 2021 (UTC)
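
The behaviour described in this comment can be sketched roughly as follows. This is an illustrative Python approximation, not the Lua code of Module:Lang; the helper names `is_wholly_latin` and `render` are hypothetical, and script detection via Unicode character names is an assumption chosen for simplicity.

```python
import unicodedata


def is_wholly_latin(text: str) -> bool:
    """True if every alphabetic character is a Latin-script letter.

    Spaces, digits and punctuation are ignored (assumption for this sketch).
    """
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def render(label: str, text: str, translit: str = "") -> str:
    """Append the 'romanized' part only when the main text is not Latin."""
    out = f"{label}: {text}"
    if translit and not is_wholly_latin(text):
        out += f", romanized: {translit}"
    return out


print(render("Serbo-Croatian", "Савез комуниста Југославије",
             "Savez komunista Jugoslavije"))
# Serbo-Croatian: Савез комуниста Југославије, romanized: Savez komunista Jugoslavije
print(render("Serbo-Croatian", "Savez komunista Jugoslavije",
             "Савез комуниста Југославије"))
# Serbo-Croatian: Savez komunista Jugoslavije
```

The second call reproduces the swapped-parameter case raised earlier in this thread: the second argument is silently dropped because the first is already Latin.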
One thing that you might do is create a template that takes two arguments, |cyrl= and |latn=, to render both scripts with {{lang-sh|<Latin-script text>}} and {{lang|sh-Cyrl|<Cyrillic-script text>}}. You can invoke the "is" function of Module:Unicode data to make sure that |latn= has only Latin script and, similarly, that |cyrl= is not Latin script:
{{#ifeq:true|{{#invoke:Unicode data|is|Latin|Savez komunista Jugoslavije (SKJ)}}|Latin|not Latin}} → Latin
{{#ifeq:true|{{#invoke:Unicode data|is|Latin|Савез комуниста Југославије (CKJ)}}|Latin|not Latin}} → not Latin
mixed scripts in either of |cyrl= and |latn= will be treated as non-Latin:
{{#ifeq:true|{{#invoke:Unicode data|is|Latin|Savez комуниста Jugoslavije (CKJ)}}|Latin|not Latin}} → not Latin
A third parameter |first= or some such that takes one of two keywords, cyrl or latn, could be used to select which of Cyrillic- or Latin-script text is rendered first.
Trappist the monk (talk) 15:01, 31 July 2021 (UTC)
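
As a rough model of the two-parameter template proposed in this comment, the following Python sketch validates the two inputs and applies the |first= keyword. It is hypothetical: `render_both` and its error messages are invented for illustration, and the script check is a simplification of whatever Module:Unicode data actually does.

```python
import unicodedata


def is_wholly_latin(text: str) -> bool:
    """True if every alphabetic character is a Latin-script letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def render_both(latn: str, cyrl: str, first: str = "latn",
                label: str = "Serbo-Croatian") -> str:
    """Render both scripts side by side, neither marked as 'romanized'."""
    if not is_wholly_latin(latn):
        raise ValueError("latn= must contain only Latin-script letters")
    if is_wholly_latin(cyrl):
        # Mixed-script text counts as non-Latin here, as in the comment above.
        raise ValueError("cyrl= must not be Latin-script text")
    if first not in ("latn", "cyrl"):
        raise ValueError("first= must be 'latn' or 'cyrl'")
    parts = [latn, cyrl] if first == "latn" else [cyrl, latn]
    return f"{label}: " + ", ".join(parts)


print(render_both("Savez komunista Jugoslavije", "Савез комуниста Југославије"))
# Serbo-Croatian: Savez komunista Jugoslavije, Савез комуниста Југославије
```

Passing first="cyrl" puts the Cyrillic text first without changing how either text is labeled.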
No, I don't mean prevent people from using the whole template, but only the bad invocation form with 2 parameters which doesn't get rendered well.
Can we use this invoke.Unicode data conditional in this template instead, to do something useful? Not as a special case in the lang template itself but simply here, before transcluding that one. --Joy [shallot] (talk) 15:54, 31 July 2021 (UTC)
I don't think that we should prevent the legitimate use of {{{2}}}. I don't know what that legitimate case might be, and perhaps you don't either, but if or when that legitimate case arises, {{lang-sh}} should support {{{2}}} so that the template remains consistent with all of the other {{lang-??}} templates.
The problem, as I understand it, is that editors commonly want to place Latin-script text adjacent to the equivalent Cyrillic-script text. To do that, they concoct a variety of schemes that include the misuse of {{lang-sh}} as you have described. Some of those schemes are legitimate, others are not. You can see some of them in these search results. This is why I suggested the new template to handle both Latin and Cyrillic when the Latin is not a romanization of the Cyrillic.
This may also be an issue with {{lang-sr}} and perhaps others. See Template talk:Lang-sr § Latin text not "romanisation". The new template might be made to support multiple languages where the Latin-script text is not a romanization. Are there others? Pinging LeoC12 for comment.
Another parameter to consider for a new template would be some sort of |separator= parameter to specify how the Latin-script text and the Cyrillic-script text are joined in the rendering.
Trappist the monk (talk) 17:19, 31 July 2021 (UTC)
What do you mean by preventing legitimate use of the 2nd positional parameter in this case? I thought we already concluded from the aforementioned examples that it does not apply here - as it is implemented now. Certainly if the implementation of {{lang}} could be amended to not imply 2nd parameter as transliteration (as in, something foreign), that would be good, but still this particular template would always have to set that option. --Joy [shallot] (talk) 21:16, 31 July 2021 (UTC)
You wrote: I don't mean prevent people from using the whole template, but only the bad invocation form with 2 parameters which doesn't get rendered well. To me that says that you want to prevent the use of {{{2}}} in {{lang-sh}}. I have never agreed that that is or should be an option. This template should operate in the same way that all other Module:Lang-based {{lang-??}} templates operate because there may be cases where {{lang-sh}} may legitimately use {{{2}}}. I said before that I don't know what that legitimate case might be but if that legitimate case exists, {{lang-sh}} should support it.
Trappist the monk (talk) 23:07, 31 July 2021 (UTC)
Well, that's the core issue, the way this is implemented, a legitimate use case for "romanized" doesn't exist. If for example we could find some Serbian term from e.g. 1300s that was always written natively in Cyrillic, and Latin was foreign to all of its users, that would be primarily written in Cyrillic and its Latin form would be a romanization that should be marked as such. But the premise precludes this already - no such thing has been the case by definition, because Serbo-Croatian first started to be standardized in the 19th century with an understanding that neither alphabet can be exclusively primary. There is simply no use case for this kind of thinking here. --Joy [shallot] (talk) 09:06, 2 August 2021 (UTC)
To do what I proposed earlier, in Module:Lang/utilities I have created the core of what might be a new template, perhaps {{lang-x2}}. In its basic form, it correctly renders both texts:
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)}}
Serbo-Croatian: Savez komunista Jugoslavije (SKJ), Савез комуниста Југославије (CKJ)
to swap the rendered order:
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)|swap=yes}}
Serbo-Croatian: Савез комуниста Југославије (CKJ), Savez komunista Jugoslavije (SKJ)
to change the separator (recognizes , (the default separator), ; and /):
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)|separator=/}}
Serbo-Croatian: Savez komunista Jugoslavije (SKJ)/Савез комуниста Југославије (CKJ)
or, insert a string of text as a separator – string must begin and end with matching single (') or double (") quote marks:
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)|separator=' {{red|or}} '}}
Serbo-Croatian: Savez komunista Jugoslavije (SKJ) or Савез комуниста Југославије (CKJ)
you can identify the non-Latin script:
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)|script2=Cyrl}}
Serbo-Croatian: Savez komunista Jugoslavije (SKJ), Савез комуниста Југославије (CKJ)
[[Serbo-Croatian language|Serbo-Croatian]]: <i lang="sh">Savez komunista Jugoslavije (SKJ)</i>, <span title="Serbo-Croatian-language text"><span lang="sh-Cyrl">Савез комуниста Југославије (CKJ)</span></span>
Trappist the monk (talk) 13:53, 3 August 2021 (UTC)
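
The swap and separator options shown in this comment can be modelled in a few lines. The Python sketch below is an approximation built only from the renderings quoted above, not from the Lua source of Module:Lang/utilities. The spacing used for the ";" separator and the fallback for unrecognized values are assumptions, since neither is shown in the examples.

```python
# Keyword separators and the text they render as. The ", " and "/" forms
# follow the renderings above; the "; " form is an assumption.
SEPARATORS = {",": ", ", ";": "; ", "/": "/"}


def parse_separator(raw=None) -> str:
    """Resolve a separator value to the string placed between the texts."""
    if raw is None:
        return SEPARATORS[","]
    if raw in SEPARATORS:
        return SEPARATORS[raw]
    # A custom string must begin and end with matching quote marks.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    return SEPARATORS[","]  # assumption: unrecognized values use the default


def lang_x2(label: str, text1: str, text2: str,
            swap: bool = False, separator=None) -> str:
    """Join two renderings of the same phrase under one language label."""
    if swap:
        text1, text2 = text2, text1
    return f"{label}: {text1}{parse_separator(separator)}{text2}"


print(lang_x2("Serbo-Croatian", "Savez", "Савез"))
# Serbo-Croatian: Savez, Савез
print(lang_x2("Serbo-Croatian", "Savez", "Савез", separator="' or '"))
# Serbo-Croatian: Savez or Савез
```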
The core code now supports the standard parameters supported by {{lang}} and the {{lang-??}} templates. Most require enumeration but some, like |label=, |link=, |cat=, |nocat= are not enumerated because they apply to both renderings or only to the first rendering.
The default label can be hidden:
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)|script2=Cyrl|label=none}}
Savez komunista Jugoslavije (SKJ), Савез комуниста Југославије (CKJ)
<span title="Serbo-Croatian-language text"><i lang="sh">Savez komunista Jugoslavije (SKJ)</i></span>, <span title="Serbo-Croatian-language text"><span lang="sh-Cyrl">Савез комуниста Југославије (CKJ)</span></span>
The default label can be changed to something else:
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)|script2=Cyrl|label=[[League of Communists of Yugoslavia]]}}
League of Communists of Yugoslavia: Savez komunista Jugoslavije (SKJ), Савез комуниста Југославије (CKJ)
[[League of Communists of Yugoslavia]]: <i lang="sh">Savez komunista Jugoslavije (SKJ)</i>, <span title="Serbo-Croatian-language text"><span lang="sh-Cyrl">Савез комуниста Југославије (CKJ)</span></span>
it is possible to independently fidget with italics:
{{#invoke:Lang/utilities|lang_x2|sh|Savez komunista Jugoslavije (SKJ)|Савез комуниста Југославије (CKJ)|italic1=no|italic2=yes}}
Serbo-Croatian: Savez komunista Jugoslavije (SKJ), Савез комуниста Југославије (CKJ)
[[Serbo-Croatian language|Serbo-Croatian]]: <span lang="sh" style="font-style: normal;">Savez komunista Jugoslavije (SKJ)</span>, <span title="Serbo-Croatian-language text"><i lang="sh">Савез комуниста Југославије (CKJ)</i></span>
I'll create the template soon, probably tomorrow.
Trappist the monk (talk) 21:58, 4 August 2021 (UTC)
Trappist the monk (talk) 17:15, 5 August 2021 (UTC)
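
The per-text italic control shown in the markup above can be approximated like this. It is a simplified Python sketch, not the module's code: the function `wrap` is hypothetical, and it omits the title-bearing wrapper spans and the font-style override that appear in the real output.

```python
import html
import unicodedata


def is_wholly_latin(text: str) -> bool:
    """True if every alphabetic character is a Latin-script letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def wrap(text: str, lang_tag: str, italic=None) -> str:
    """Wrap text in <i> or <span> with a lang attribute.

    When italic is None, Latin-script text is italicized and other scripts
    are left upright (assumed default, in line with MOS:BADITALICS).
    """
    if italic is None:
        italic = is_wholly_latin(text)
    tag = "i" if italic else "span"
    return f'<{tag} lang="{html.escape(lang_tag)}">{html.escape(text)}</{tag}>'


print(wrap("Savez komunista Jugoslavije", "sh"))
# <i lang="sh">Savez komunista Jugoslavije</i>
print(wrap("Савез комуниста Југославије", "sh-Cyrl"))
# <span lang="sh-Cyrl">Савез комуниста Југославије</span>
```

Passing italic=True or italic=False overrides the default for one text, which corresponds to the |italic1= and |italic2= parameters demonstrated above.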
So, is it ready to be applied now? --Joy [shallot] (talk) 14:09, 8 August 2021 (UTC)