Template talk:Engvar

This template is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This template does not require a rating on Wikipedia's content assessment scale.

Use standard language codes

This template and anything related to it need to use standard, predictable, memorable ISO language codes, like en-GB and en-AU, not made-up fake ones like en-UK and en-AUS. People should not have to memorize non-standard variants to get templates to work. I tried to just fix this, but someone reported that it broke something, so there appears to be a dependency somewhere. All that should be needed is to make en-gb and en-au (after they've been lower-cased by a parser function) the real parameter names, with en-uk and en-aus as aliases for them respectively (e.g. {{{en-gb|{{{en-uk}}}}}}). I thought I'd done that, but it was dark:30 and I must have erred.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  06:43, 7 November 2014 (UTC)

I believe the code in Template:Engvar/sandbox will resolve the matter, though it's unclear to me why it would need to retain "en-uk" and "en-aus" at all, except in the switch. I'm guessing there's some downstream dependency, doing something with the specific values "en-uk" and "en-aus". If so, these should be tracked down and corrected to handle en-gb and en-au. If the sandbox code works, the reverted changes I made to the /doc can probably simply be unreverted after the sandbox code is made live, rather than re-edited by hand.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  06:50, 7 November 2014 (UTC)
You changed the parameter names in the template. Why are you surprised that that breaks templates in articles? -DePiep (talk) 00:03, 8 November 2014 (UTC)
  • SMcCandlish, I'm thinking of redesigning the template as a whole. The logic is unnecessarily complicated. Of course that should include your proposal to use ISO only. (I made this one a while ago.) Are there any other templates for this need that you know of? I find it strange that it is rarely used. More later. -DePiep (talk) 14:20, 9 November 2014 (UTC)
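For illustration only, the aliasing idea from this thread (canonical lower-cased names, with the legacy names accepted as fallbacks, as in {{{en-gb|{{{en-uk}}}}}}) can be sketched in Python rather than in template syntax. The names LEGACY_ALIASES and pick_variant are hypothetical and are not part of the template; the sketch assumes only the two alias pairs mentioned above.

```python
# Hypothetical legacy-to-canonical alias table; only the two pairs named in
# the thread are listed.
LEGACY_ALIASES = {"en-uk": "en-gb", "en-aus": "en-au"}


def pick_variant(params, engvar):
    """Return the text supplied for the requested variety, or None.

    params: parameter names (any case) mapped to the text for that variety.
    engvar: the variety requested by the article, e.g. "en-GB" or "en-UK".
    """
    canonical = {}
    for name, text in params.items():
        key = name.strip().lower()
        if key in LEGACY_ALIASES:
            # A legacy name only fills the slot if the canonical name
            # has not been given explicitly.
            canonical.setdefault(LEGACY_ALIASES[key], text)
        else:
            canonical[key] = text
    wanted = engvar.strip().lower()
    wanted = LEGACY_ALIASES.get(wanted, wanted)
    return canonical.get(wanted)
```

With this shape, an article that still passes en-UK keeps working, while the documented name is en-gb and wins if both are given.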
Proposal

Template rebuilt from scratch. Proposed future documentation:

"Template {{engvar}} can be used within templates that can appear on pages in multiple ENGVAR languages (varieties). It adjusts language to the variety as is used in that article. (Template language follows article language).
E.g., Template:Example uses the word vapor/vapour (en-us/en-uk) encoded and is used in five en-us articles and in five en-uk articles. In one article the template must show "vapor", in the other "vapour". For this, the template Template:Example has parameter |engvar= that can be set |engvar=en-us or |engvar=en-uk in any article. The template has a default variety set.
In the template code: |label1={{engvar|vapor |en-uk=vapour}} (here, the en-us word is the default)
In an article: {{example|engvar=en-uk |otherdata=...}} → shows the word vapour."
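As a rough model of the behaviour this proposed documentation describes (a default spelling that is replaced only when the article sets a matching variety), here is a minimal Python sketch. It is not the template's code; engvar and example_template are hypothetical stand-ins for {{engvar}} and Template:Example, and the dict stands in for named parameters such as en-uk=vapour.

```python
def engvar(default, variants=None, article_engvar=None):
    """Return the spelling for the article's variety, else the default.

    default: spelling used when no variety is set or none matches.
    variants: dict of variety code -> spelling (hypothetical stand-in for
        the named template parameters such as en-uk=vapour).
    article_engvar: the value the article passed as |engvar=, if any.
    """
    variants = variants or {}
    if not article_engvar:
        return default
    return variants.get(article_engvar.strip().lower(), default)


def example_template(article_engvar=None):
    # Stand-in for Template:Example; the en-us spelling "vapor" is the default.
    return engvar("vapor", {"en-uk": "vapour"}, article_engvar)
```

Calling example_template("en-uk") gives the en-uk spelling, while an article that sets nothing, or sets a variety the template has no word for, gets the default.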
Notes:
1. We only use the correct identifiers. If necessary, articles will be edited upon introduction. (SMcCandlish, I ask you check me for any wrong language identifier I write here. I'm not that familiar with it).
2. Without prejudice or opinion on variety preference, that template should have a default variety spelling, so the |engvar= setting is only required to alter from that default.
3. Can/should the template also cover this situation, about grouping varieties: say it has en-us (default) and en-uk defined. Is there a general way to group say "en-sa" into en-uk? Can we hardcode this? (so the article can specify |engvar=en-sa and the template shows the en-uk word by rule). Or is this per word? In that case, the template:example code should be: {{engvar|engvar=en-uk}} (we need a better example for this, a three- or four-variety situation)
4. SMcCandlish, can we use the sandbox for this development? -DePiep (talk) 15:53, 9 November 2014 (UTC)
Fine by me; my sandbox stuff was just a test.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  00:32, 17 November 2014 (UTC)
I'm working on this. But I do think that the formal code for British English is en-UK, not en-GB (see ISO 3166-1 alpha-2). The template will recognise both though, because this helps the editor. -DePiep (talk) 14:13, 16 November 2014 (UTC)
I did see ISO 3166-1 alpha-2; the code is "en-GB" (see Language localisation, and various external sources). The 3166-1 alpha-2 code "uk"/"UK" was "reserved" as an alternative by request of the UK so that it doesn't confusingly get assigned to something else (Ukraine, or some future United Kingdom of Central Africa or whatever). The code "uk"/"UK" is used only a) for the ccTLD ".uk" and b) as the two-letter code the EU uses internally (it does the same with "EL" for Greece (Ellas) instead of the official ISO "GR", because the EU prefers to use member nations' names for themselves instead of exonyms like "Greece").  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  00:32, 17 November 2014 (UTC)
Aaaargh, ISO has a standard for every taste. OK, it will be "en-GB" for British English formally (and en-UK accepted). -DePiep (talk) 00:52, 17 November 2014 (UTC)

Maximum code flexibility

Even if the template accepts "en-UK", "en-AUS", "en-CAN" as legacy input, it should use as the actual parameter, output, and documented codes "en-GB", "en-AU", and "en-CA", as they're the proper ISO 3166-1 alpha-2 codes. Other valid values are en-US, en-NZ and (if we wanted to get that fine-grained) "en-ZA" for South Africa, "en-IE" for Ireland, "en-IN" for India, "en-HK" for Hong Kong, "en-SG" for Singapore, "en-BZ" for Belize, etc. I would definitely include at least US, GB, CA, AU, NZ, ZA, HK, and IE at a minimum. In theory, we should accept a value for any country where English is an official language. After handling ISO 3166-1's proper codes, it wouldn't hurt to also support formally ISO-reserved alternatives (I think "UK" is the only one for a country in which English is a major language), as well as three-letter ISO 3166-1 alpha-3 country codes like "USA", "GBR", "CAN", "AUS" (also common, but incorrect in this context), because of their familiarity and because few people understand the difference between the uses of these codes. Probably even add support for IOC and FIFA three-letter country codes in the rare cases they differ from the ISO ones, because they're actually more familiar to many people (due to sports broadcasts) than the ISO ones. Once in a while they conflict with each other (see Comparison of IOC, FIFA, and ISO 3166 country codes), but I'm not sure this happens in any relevant cases. We shouldn't allow "en-SA" as a mistake for South Africa, as that really means Saudi Arabia.

I would proceed along this flowchart of sorts:

  1. Is English an official language there?
  2. Add the ISO 3166-1 alpha-2 code as the "real" one; document that "SA" does not mean "South Africa", but "ZA" does.
  3. Add the ISO 3166-1 alpha-3 code as an alternative. Maybe no need to individually document it, but mention in the docs that these codes, as a class, will work.
  4. If there's a variant one, add the IOC code as an alternative. Probably no need to individually document it, but mention in the docs that IOC codes work. In the unlikely case of a conflict with ISO (i.e., ISO gives the same code to a different country), mention that this specific IOC code doesn't mean what people might think it means (ISO trumps IOC).
  5. If there's a variant one, add the FIFA code as an alternative. Probably no need to individually document it, but mention in the docs that FIFA codes work. In the unlikely case of a conflict with ISO or IOC, mention that this specific one doesn't mean what people might think it means (ISO trumps IOC, which in turn trumps FIFA).
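The precedence in the steps above (ISO alpha-2, then ISO alpha-3, then IOC, then FIFA, with the higher-priority scheme winning any conflict) could be modelled roughly as below. This is a Python sketch, not template code. The function names are hypothetical, and the three sample rows are illustrative data that should be checked against the comparison list before being relied on.

```python
# Illustrative rows only: (ISO alpha-2, ISO alpha-3, IOC, FIFA) per country.
COUNTRY_CODES = [
    ("ZA", "ZAF", "RSA", "RSA"),  # South Africa
    ("NG", "NGA", "NGR", "NGA"),  # Nigeria
    ("PH", "PHL", "PHI", "PHI"),  # Philippines
]


def build_alias_table(rows):
    """Map every accepted code to its en-[ISO alpha-2] identifier."""
    table = {}
    # Column 0 (alpha-2) is registered for all countries first, then
    # alpha-3, then IOC, then FIFA. setdefault never overwrites, so a code
    # already claimed by a higher-priority scheme keeps its meaning.
    for column in range(4):
        for row in rows:
            table.setdefault(row[column], "en-" + row[0])
    return table


ALIASES = build_alias_table(COUNTRY_CODES)


def resolve(code):
    """Return the internal identifier for an input code, or None."""
    code = code.strip().upper()
    if code.startswith("EN-"):
        code = code[3:]
    return ALIASES.get(code)
```

Because "SA" is never registered for South Africa, resolve("en-SA") finds nothing here, which matches the point that "SA" must not be read as South Africa.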

Then we should proceed to make sure that similar flexibility is applied to other templates that could use it: {{lang}}, the {{lang-xx}} templates, flag templates (they probably already have this covered, at least for three-letter codes), etc. The goal is to not make everyday editors memorize the difference between these coding systems; if they can figure out even one of them, it should be enough.

An argument could even be made to support the full country names as input.

I don't think we should go so far as to support sub-national codes (e.g. for Wales, Scotland, etc.), except as alternatives that output the main national code (e.g. if someone tries to use FIFA "en-WAL" or ISO 3166-2:GB's sub-national "en-WLS" for Wales, it should output "en-GB"). If we go that far, the codes are: England: ENG (ISO 3166-2:GB, FIFA); Northern Ireland: NIR (ISO 3166-2:GB, FIFA); Scotland: SCT (ISO 3166-2:GB) & SCO (FIFA); Wales: WLS (ISO 3166-2:GB) & WAL (FIFA); Isle of Man: IM (ISO 3166-1 alpha-2), IMN (ISO 3166-1 alpha-3), GBM (ISO 3166-1 alpha-3, reserved), IOM (postal abbreviation); Jersey: JE (ISO 3166-1 alpha-2), JEY (ISO 3166-1 alpha-3), & GBJ (ISO 3166-1 alpha-3, reserved); Guernsey: GG (ISO 3166-1 alpha-2), GGY (ISO 3166-1 alpha-3), GBG (ISO 3166-1 alpha-3, reserved); Alderney: GBA (ISO 3166-1 alpha-3, reserved). Some uncommon ISO 3166-2:GB codes are EAW (England and Wales), GBN (Great Britain) and UKM (United Kingdom), but we can probably ignore them.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  00:32, 17 November 2014 (UTC)

Most of this is already in there. Please take a look at the current documentation. There is a list of accepted user (article-side) input. Internally the key will be "en-US"-style codes. There is no output (the code will not appear in the article).
This template is not meant to organise the whole coding, as you describe. It only needs to cover the national varieties that are used in this enwiki. I have already found some fifteen, see the /doc list. Recognition by name is already live now. These are already known by various categories and templates (see HELP:ENGVAR) for this wiki.
Side solutions are needed for these: Oxford British, internally named "en-OED"; IUPAC definitions (key "en-IUPAC"); and what about Scottish English? Found no XX code, so I use "en-SCOTLAND" for now. -DePiep (talk) 01:01, 17 November 2014 (UTC)

List of options (ENGVARs)

Below is a list of Engvar IDs (identifiers of national variants of English), as a proposed base for what Engvar should recognize & use. (The list currently used live is: here).

The table will evolve (change) over time, following the points discussed on this page.
Internal key (en-[ISO, alpha-2]) | Language | en-[ISO, alpha-3] | Alt input* | Note
en-US | American English | en-USA | America*, United States*, US |
en-AU | Australian English | en-AUS | Australia* |
en-GB | British English | en-GBR | British*, UK, en-UK | British Oxford will yield en-OED, see below
en-CA | Canadian English | en-CAN | Canada* |
en-IE | Hiberno-English (Irish-English) | en-IRL | Hiberno*, Ireland*, Irish* |
en-HK | Hong Kong English | en-HKG | Hong Kong* |
en-IN | Indian English | en-IND | India* |
en-JM | Jamaican English | en-JAM | Jamaica* |
en-MW | Malawian English | en-MWI | Malawi* |
en-NZ | New Zealand English | en-NZL | New Zealand* |
en-NG | Nigerian English | en-NGA | Nigeria* |
en-PK | Pakistani English | en-PAK | Pakistan* |
en-PH | Philippine English | en-PHL | Philippine* |
(en-SCO) | Scottish English | | Scottish*, Scotland* | Scots (is not en-SCO), Scotch
en-SG | Singapore English | en-SGP | Singapore* |
en-ZA | South African English | en-ZAF | South Africa* | not en-SA (Saudi Arabia)
en-TT | Trinidadian English | en-TTO | Trinidad*, Tobago* |
(en-OED) | British English Oxford spelling; British (Oxford) English; Oxford English Dictionary (OED) | | Oxford*, OED* | So only "OED" or "Oxford" are identifying this one.
(en-IUPAC) | IUPAC spelling | | en-IUPAC; IUPAC spelling US | ? – seen used elsewhere; to be researched
not | | | English | Not identifying
not | | | en-SA | "SA" is Saudi Arabia, not South Africa
  • A code in brackets ( ) is not in the ISO definition but is adopted here as an ID.
  • Spaces, hyphens, brackets are ignored (are removed before comparing). No need to add "... English" for identification.
  • * Alt names with an asterisk (*) are a catch-all: all input "Nigeria", "Nigerian", "Nigerian English" will hit.
  • Sources and useful links:
  • The code en-[ISO 3166 alpha-2] is used internally as the identifying Engvar name. It may be used as input. The code en-[ISO 3166 alpha-3] and alt names may be used as input by article editors, and will be recognized & read as the identifier.
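To make the matching rules in these notes concrete, here is a small Python sketch under stated assumptions: input is lower-cased, spaces, hyphens and brackets are dropped, exact codes are tried first, and asterisked names are treated as prefixes. The names EXACT, PREFIXES, normalise and identify are hypothetical, only three of the table rows are included, and the special handling of "Oxford"/"OED" is left out.

```python
import re

# Hypothetical lookup data covering three rows of the table above.
# Keys are already normalised (lower case, no spaces/hyphens/brackets).
EXACT = {
    "enus": "en-US", "enusa": "en-US", "us": "en-US",
    "enca": "en-CA", "encan": "en-CA",
    "enng": "en-NG", "ennga": "en-NG",
}
# Asterisked alt names: any input starting with the name matches.
PREFIXES = [
    ("america", "en-US"),
    ("unitedstates", "en-US"),
    ("canada", "en-CA"),
    ("nigeria", "en-NG"),
]


def normalise(text):
    """Lower-case the input and drop spaces, hyphens and brackets."""
    return re.sub(r"[\s\-()\[\]]", "", text).lower()


def identify(text):
    """Return the internal en-XX identifier for an input, or None."""
    key = normalise(text)
    if key in EXACT:
        return EXACT[key]
    for prefix, ident in PREFIXES:
        if key.startswith(prefix):
            return ident
    return None
```

Under these assumptions "Nigeria", "Nigerian" and "Nigerian English" all resolve to en-NG, while bare "English" and "en-SA" identify nothing.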

Discussion

  • National variants not used in this enwiki are not listed.

-DePiep (talk) 10:20, 17 November 2014 (UTC)


Following en-, country identifier, first one that exists:
1. ISO 2-letter country code
2. ISO 3-letter code (solves Scotland)
3. IOC code
4. FIFA code
5. Wikipedia internal styles & standards, rules & needs (OED, IUPAC)
Recognise: all codes will be recognized, unless they are conflicting somehow (to be added & checked).
 SMcCandlish, what do you think? -DePiep (talk) 10:35, 17 November 2014 (UTC)
If I were a chimp, I'd give four thumbs up. Heh. Belize/BZ and Barbados/BB should be included, probably. There are probably a bunch of other Caribbean ones we're missing, and African (e.g. Nigeria/NG). Might be worth checking if there's a general Caribbean English code. ISO's approach to treating language as innately, intimately tied to nationality (and WP's consequent following in this direction) is actually linguistically unsound. (For example, British Columbian Canadian English, aside from high incidence of -our and -re spellings, has more in common with Pacific Northwest and Alaskan American English, than with Ontario Canadian English.)

"Scots" should be a not item like SA; Scots is not a variant of English, but a closely related language like Frisian, and has its own Wikipedia. While Irish is also a separate language, no one is ever confused that it's an English variant, but they often are with Scots. The use of "Scots" to mean "Scottish" is an archaism that only survives in a few phrases like "Scots-Irish ancestry". For the same reason, we don't need to support "Scotch" as input either (only used in some stock phrases like "Scotch whisky" and "Scotch doubles tournament".).

Actually, I'd propose eliminating the Scottish/Scotland option except as an alias for en-GB. We want to avoid the implication that any little sub-national variation should have its own entry, otherwise Cornish nationalists will want Cornish English, then the Americans will expect Southern English, New Englander, West Coast English, etc., etc. If we do include any at all, I wouldn't do more than support Scotland/Scottish/SCT/SCO, Wales/Welsh/WLS/WAL, Isle of Man/Manx/IM/IOM/GBM as aliases for en-GB, and Northern Ireland/Northern Irish/NIR as aliases for en-IE. Those places are countries/nations in some but not all legal senses. An argument can be made for en-IM as its own entry, because it has its own ISO 3166 country code, but linguistically that's iffy; for WP purposes there's no difference between Manx and mainland British English, because we wouldn't be using Manx dialect words like stubbin in WP articles except to explain what they mean in context. This argument could possibly be applied to all Caribbean dialects, honestly. Not even the article Jamaica is really written in Jamaican English (much less Jamaican Patois, which WP would treat as a separate language, as it has its own orthography).

We should probably also include NI as a not entry (it's Nicaragua, not Northern Ireland or Nigeria, and English is not an official or even large minority language there).  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  20:44, 18 November 2014 (UTC)

  • Thanx, I've put the thumbs up on my wall - all four
     
re Belize English/en-BZ, Barbados English/en-BB, (Caribbean English, en-??): only if it has articles in this enwiki. See: ENGVAR articles we have (below).
re Nigeria missing - ? en-NG is in the list.
re "Scots" out - done. Is not en-SCO (and we keep "Scotch" out).
re remove "Scottish English": I disagree, see: ENGVAR articles we have (below).
re NI not an entry - OK; en-GB-N-Irl is not asked for so far
  • About ENGVAR articles:
The list we are compiling is not for a Wikipedia article (though your texts could go in there). We only need those Xx English languages, national variants of English (=ENGVAR), that are used in real WP articles. So far, I have found templates and categories that list the ~20 listed here: this cat, and these templates. That is why I hesitate over Barbados and Belize: no articles seen so far.
  • New news: your ISO links &tc helped me a lot. Now SCO is not independent, so ISO 3166 has nothing for them (no en-XXX for us). But the IOC uses SCO. Good enough for us.
  • I have adjusted the working table. Internally, we'll use the en-XX codes (plus our own ones; I'll make those Qx-codes, yesss, that's how professional we are!). Editors' input can be broader, as said ("Canada" will do for Canadian English/en-CA &tc.). Later more. -DePiep (talk) 22:31, 18 November 2014 (UTC)
  • About our private solutions & codes. We need codes for "countries" that are not in this ISO. I propose, see ISO_3166-1_alpha-2#QM: use the series QM to QZ as private use. We can define & use them within our template. Proposed:
en-QS = Scottish English
en-QO = Oxford English (OED)
en-QP = IUPAC English (actual need to be researched)
We do? -DePiep (talk) 22:35, 18 November 2014 (UTC)
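As a quick sanity check on the proposal, the three suggested codes can be tested against the user-assigned part of ISO 3166-1 alpha-2 (AA, QM to QZ, XA to XZ, ZZ). The Python sketch below is only illustrative; is_user_assigned_alpha2 and PROPOSED are hypothetical names, and the mapping simply restates the three proposed codes.

```python
def is_user_assigned_alpha2(code):
    """True if code lies in ISO 3166-1 alpha-2's user-assigned space.

    The user-assigned elements are AA, QM to QZ, XA to XZ, and ZZ.
    """
    code = code.upper()
    if code in ("AA", "ZZ"):
        return True
    if len(code) != 2:
        return False
    first, second = code
    if first == "Q":
        return "M" <= second <= "Z"
    if first == "X":
        return "A" <= second <= "Z"
    return False


# The private identifiers proposed above (restated, not extended).
PROPOSED = {
    "en-QS": "Scottish English",
    "en-QO": "Oxford English (OED)",
    "en-QP": "IUPAC English",
}

# Country part of each proposed identifier, e.g. "QS" from "en-QS".
proposed_country_parts = [key.split("-", 1)[1] for key in PROPOSED]
```

All three country parts (QS, QO, QP) fall inside QM to QZ, so they should not collide with a code that ISO assigns to a real country.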
  • Implemented: refined list of en-variant codes: use 'en-GB' not 'en-UK'; use 'en-SCO'; more ISO alpha-3 options. [1]. (Not done (yet?): IOC or FIFA alpha-3 codes; IUPAC complicated language (like en-IUPAC-US); some discouraged codes removed; and other open discussion points here.) -DePiep (talk) 21:34, 16 December 2014 (UTC)
I'd forgotten about this, been busy off-wiki. Good job, I heartily approve. :-)  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:00, 21 March 2015 (UTC)