Template talk:Unichar

WikiProject Writing systems (Template-class)
This template falls within the scope of WikiProject Writing systems, a WikiProject interested in improving the encyclopaedic coverage and content of articles relating to writing systems on Wikipedia. If you would like to help out, you are welcome to drop by the project page and/or leave a query at the project’s talk page.
This template does not require a rating on Wikipedia's content assessment scale.

Off-topic request for information

No symbol#Unicode combining character has this interesting gem:

The Unicode code point for the prohibition sign is U+20E0 COMBINING ENCLOSING CIRCLE BACKSLASH. It is a combining character, which means that it appears on top of the character immediately before it. Example: Putting W⃠ will display the letter W inside the prohibition sign: W⃠ (if the user's system handles it correctly, which is not always the case).

On my system (ChromeOS), I see a lovely "circle with backslash" overlaid on the "0 C" as produced by {{unichar}}. Applause!!! But for W⃠ I get a W and (next to it) a square. BOOH!!! So how is {{unichar}} managing to produce the rabbit from the hat when plain HTML falls on its face? John Maynard Friedman (talk) 19:01, 13 September 2022 (UTC)

The current version requires the plain hex value for the character, in this case the circle: |1=20E0. Then, to effect the combining, there is |cwith=, which takes a literal character (usually ◌, the dotted circle). As I understand it, here that is |cwith=W. Together:
{{unichar|20E0|cwith=W|SOMENAME}} → U+20E0 W⃠ COMBINING ENCLOSING CIRCLE BACKSLASH
{{unichar|20E0|cwith={{dotted circle}}|SOMENAME}} → U+20E0 ◌⃠ COMBINING ENCLOSING CIRCLE BACKSLASH
{{unichar|20E0|SOMENAME}} → U+20E0 COMBINING ENCLOSING CIRCLE BACKSLASH
AFAIK, there is no solution for the bad overlap I see in the result. (One could try |size= or |image=.) Hope this is enough to stop you crying. -DePiep (talk) 19:19, 13 September 2022 (UTC)
I'm still not seeing a "No Ws Allowed!" sign, even with cwith=. No real harm done, I was just curious to know how it was done. OK, I accept: a good magician never reveals how his tricks really work! --John Maynard Friedman (talk) 19:36, 13 September 2022 (UTC)
 
In Firefox I see the circle half positioned over the W (over its right-hand half), and overlapping the regular text to its left and right (the hex value and the name). My sandbox unichar version also has this issue, hence the sizing issue I mentioned. In my (underdeveloped?) Chrome, I see W+a_square.
A solution would be to upload it as a single image. Here is an animal for comfort after this setback. DePiep (talk) 19:52, 13 September 2022 (UTC)
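
For anyone wanting to check what is actually in the string, here is a minimal Python sketch of the combining sequence discussed in this thread. It only lists the code points involved; it does not reproduce the template's rendering, and `combine` is a hypothetical helper name, not part of {{unichar}}.

```python
import unicodedata


def combine(base: str, combining_hex: str) -> str:
    """Return the base text followed by the combining character.

    combining_hex is a plain hex code point such as "20E0"
    (the same form the template takes as its first argument).
    """
    mark = chr(int(combining_hex, 16))
    # A combining character always FOLLOWS the character it modifies.
    return base + mark


sign = combine("W", "20E0")
for ch in sign:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Whether the result is drawn as a W inside the prohibition sign still depends on the font and renderer, as the replies above found.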

Proposal: use Template:Char

Would it be good to place the character itself in {{char}}? jlwoodwa (talk) 06:43, 9 July 2023 (UTC)

Although generally keen on char, I'd need to be convinced in this case. Char is used to "isolate" a glyph under discussion from the associated running text. In the output of unichar, that is usually clear.
The only argument in favour that I can see is that, at present, unichar identifies the glyph by increasing its size and maybe the faint box used by char would be better? But conversely magnification makes it easier to "read".
Did you have a particular case that provoked the proposal? 𝕁𝕄𝔽 (talk) 07:51, 9 July 2023 (UTC)
It's clear to anyone who's familiar with the format, but I'm not sure it's as clear to a general reader, especially one who doesn't know what the "U+ stuff" means. I haven't noticed any specific problems that this would solve, I just think it's good to have a consistent format for "inline character literals" on Wikipedia. jlwoodwa (talk) 08:19, 9 July 2023 (UTC)
So how would we handle this example: U+20E0 COMBINING ENCLOSING CIRCLE BACKSLASH (which is already not handled terribly well)? Likewise, Asiatic scripts present issues that don't occur to those of us only familiar with alphabetic scripts. A lot of development work has gone into this template to deal with these issues, so changing it would not be trivial, given the need to verify many, many test cases and rewrite to resolve anomalies. Annoyingly, one of the recent main developers, user:DePiep, is no longer available to advise. --𝕁𝕄𝔽 (talk) 10:20, 9 July 2023 (UTC)
seems to work just fine. I understand the difficulty of modifying such a convoluted and widely-used template, though. Since it sounds like it's not obviously a bad idea, I'll try the "obvious implementation" in the sandbox, and give an update here when it's working. jlwoodwa (talk) 10:35, 9 July 2023 (UTC)
On Chrome, the symbol overruns the box (or the box underruns)... 𝕁𝕄𝔽 (talk) 13:43, 9 July 2023 (UTC)
... but then again it overruns the last digit of the codepoint right now. --𝕁𝕄𝔽 (talk) 13:45, 9 July 2023 (UTC)

Combining diacritics are displaying as tofu on Android - fault may be in cwith= handling?

I don't know if this is new? The argument cwith=◌ is used heavily to display combining diacritics. I'm editing in Android right now and the symbol displays correctly. But in articles like diacritic, it has more tofu than a Japanese restaurant. Is there a style serif somewhere that is blocking the last resort substitution? --𝕁𝕄𝔽 (talk) 13:22, 21 September 2023 (UTC)

No, it is not unique to Unichar, that just happens to be where I first saw it. Diacritic doesn't even use unichar, it just uses a dotted circle and combining diacritic directly, thus ⟨◌́⟩. As it is a general problem, I will take it to Wikipedia:Village pump (technical). --𝕁𝕄𝔽 (talk) 13:37, 21 September 2023 (UTC)
No solution suggested, it is an implementation defect in Android. So unless someone has a back-channel to Google, we just have to grin and bear it. --𝕁𝕄𝔽 (talk) 16:33, 22 September 2023 (UTC)
Further discussion has revealed that the problem is due to a deficiency in the system default sans-serif font. The workaround is to use serif and I have started to do that with success on "freestanding" cases. But {{unichar}} is heavily used so we really need a fix to it, please? --𝕁𝕄𝔽 (talk) 16:32, 23 September 2023 (UTC)

Template enhancement needed, please

Requirement: when cwith=◌ is invoked, wrap the output in <span style="font-family: serif">{{1}}</span>, where {{1}} is the sequence dotted circle + combining diacritic. Is there a doctor in the house? --𝕁𝕄𝔽 (talk) 16:32, 23 September 2023 (UTC)

Will this work? {{unichar |0301 |combining acute accent |cwith=◌|use=script|use2=serif}} → U+0301 ◌́ COMBINING ACUTE ACCENT. I just extended the template to support "serif" as a use2 param if you set use to "script". This might also work: {{unichar |0301 |combining acute accent |cwith=◌|use=script|use2=noto}} → U+0301 ◌́ COMBINING ACUTE ACCENT Andre🚐 20:03, 23 September 2023 (UTC)
Yes, that would work. I hate to be ungrateful but to employ that solution would create a lot of work: many, many articles would have to be updated to use it – and, when Google discards Roboto as default sans font, would all have to be undone again. AFAIK, this is the only use-case for cwith=◌ so it would not have any deleterious effect elsewhere (and would be easy to back out). [BTW, we couldn't have use2=noto because it would break Bing and Safari.] --𝕁𝕄𝔽 (talk) 22:11, 23 September 2023 (UTC)
Ok, I made the change to Template:Unichar/glyph, but let me know if it doesn't look right and I'll revert it. Andre🚐 23:51, 23 September 2023 (UTC)
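
For illustration only, the requirement stated at the top of this section can be sketched in Python as follows. This is not the code in Template:Unichar/glyph; `render_glyph` and its parameters are hypothetical names, and the sketch assumes the serif wrapper is applied only when the base character is the dotted circle.

```python
import html

DOTTED_CIRCLE = "\u25cc"  # ◌


def render_glyph(code_hex: str, cwith: str = "") -> str:
    """Return the HTML for the glyph part of the output.

    code_hex -- hex code point, e.g. "0301"
    cwith    -- optional base character printed before the glyph
    """
    char = chr(int(code_hex, 16))
    text = html.escape(cwith + char)
    if cwith == DOTTED_CIRCLE:
        # Force a serif face so the default sans-serif font is bypassed.
        return f'<span style="font-family: serif">{text}</span>'
    return text


print(render_glyph("0301", DOTTED_CIRCLE))
```

Keeping the wrapper conditional on the dotted circle means that calls without cwith= are left untouched, which matches the point made above that the change should be easy to back out.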

Misaligned diacritics

Can anyone explain (better still fix) this phenomenon:

  • U+0360 ◌◌͠ COMBINING DOUBLE TILDE , a tilde diacritic that spans a pair of adjacent characters: ◌͠◌ no markup: ◌͠◌

Just using the characters directly puts the diacritic in the right place but unichar fails (placement is offset). (At least when using Chrome on Chromebook).

  • U+0301 ◌́ COMBINING ACUTE ACCENT is ok. ◌́

𝕁𝕄𝔽 (talk) 16:42, 22 September 2023 (UTC)

|cwith=◌◌ puts the dotted circles before the diacritic, but the diacritic is supposed to be between them. I don't know how it should be fixed though. Eru·tuon 19:21, 22 September 2023 (UTC)
Ah, of course. Obvious really. <blush> There are very few of these two-character diacritics so I don't really see it being worth anyone's while hacking the template to fix it. I'll just add a note to the documentation to say it doesn't work, handcrafting is required. --𝕁𝕄𝔽 (talk) 19:37, 22 September 2023 (UTC)
I have added this text. It is not quite right, the display of the U+0360 is not exactly as produced by the template but does it matter?
** Note that cwith=◌◌ does not provide the desired result if the intention is to display a diacritic that spans two characters (such as those in the range U+035C to U+0362): the diacritic will be offset. In such cases, editors must emulate the template output by hand, because the correct HTML sequence is "first-character + combining-diacritic + second-character". Thus, for example, to show the combining double tilde U+0360, write U+0360 &#x25cc;&#x0360;&#x25cc; then (in {{small}}), COMBINING DOUBLE TILDE. This produces U+0360 ◌͠◌ COMBINING DOUBLE TILDE.
Comments (better still, direct edits to improve) welcome. --𝕁𝕄𝔽 (talk) 20:24, 22 September 2023 (UTC)
Really this needs a "print this instead" for the character. All this size/font/cwith stuff could be put into that instead of trying to fool the automatic text generator into producing the desired result. Spitzak (talk) 21:50, 23 September 2023 (UTC)
Sorry, I don't follow. Rather than spend time explaining, would you write the alternative text please? Here or in the doc. --𝕁𝕄𝔽 (talk) 22:14, 23 September 2023 (UTC)
I meant that there could be a parameter, perhaps show, so that if invoked with show=foobar then instead of showing the character it shows "foobar". This could then contain any wiki or html markup desired and any trick needed to get the character to be correctly visible. In this example it would contain the two circles and the combining diacritic. Spitzak (talk) 00:08, 28 December 2023 (UTC)
I think it does have a param that does something similar, or it did 3 months ago. Andre🚐 00:15, 28 December 2023 (UTC)
Hmm, a double parameter could be introduced to change the order of the output. Andre🚐 19:50, 24 September 2023 (UTC)
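
The ordering described in the documentation note above (first character + combining diacritic + second character) can be sketched in Python like this. Both function names are hypothetical, and the sketch assumes the dotted circle is used on both sides of the diacritic.

```python
DOTTED_CIRCLE = "\u25cc"  # ◌


def show_double_diacritic(code_hex: str) -> str:
    """Place a two-character diacritic BETWEEN two dotted circles."""
    mark = chr(int(code_hex, 16))
    return DOTTED_CIRCLE + mark + DOTTED_CIRCLE


def as_html_entities(text: str) -> str:
    """Spell the sequence out as hex character references."""
    return "".join(f"&#x{ord(c):04x};" for c in text)


print(as_html_entities(show_double_diacritic("0360")))
```

The printed entity string is the same hand-written sequence quoted in the note, so a hypothetical "double" option in the template would only need to change where the diacritic is emitted.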

Question on Error on off-Wiki

I've copied all the related templates and modules to our wiki, and I've checked them a few times over, but it keeps giving me the following error:

I wrote:
└> "The character {{unichar|a9|COPYRIGHT SIGN}} is about intellectual property."
It should write:
└> "The character U+00A9 © COPYRIGHT SIGN is about intellectual property."
but gives me:
└> "The character Error using {{unichar}}: Input "a9" is not a Hexadecimal value. is about intellectual property."

I don't understand why it does this. Not sure if I should ask this here or somewhere else, but thought to try it here first. Kind regards,  Rodejong  💬 ✉️  23:15, 18 December 2023 (UTC)

That is probably a charset encoding issue. Or something to do with your wiki's installation of PHP. U+00A9 © COPYRIGHT SIGN works fine here, as you can see. Andre🚐 00:16, 28 December 2023 (UTC)
Thanks for answering. I'll ask the hosting guys to look into that then. Kind regards,  Rodejong  💬 ✉️  00:53, 28 December 2023 (UTC)
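
As a point of comparison, here is a minimal Python sketch of what a hexadecimal check on the first argument would normally accept. It is not the template's or module's actual validation, and `parse_code_point` and `format_code_point` are hypothetical names. It shows that lower-case "a9" is ordinary hexadecimal, which is consistent with the fault lying in the installation rather than in the input.

```python
import string


def parse_code_point(raw: str) -> int:
    """Parse a plain hex string such as "a9" or "00A9" into a code point."""
    text = raw.strip()
    if not text or any(c not in string.hexdigits for c in text):
        raise ValueError(f'Input "{raw}" is not a hexadecimal value')
    value = int(text, 16)
    if value > 0x10FFFF:
        raise ValueError(f'Input "{raw}" is outside the Unicode range')
    return value


def format_code_point(value: int) -> str:
    """Format a code point in the usual U+XXXX notation."""
    return f"U+{value:04X}"


print(format_code_point(parse_code_point("a9")), chr(parse_code_point("a9")))
```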

Enhancement request: sanity check or lazy invocation

At Copyright sign, a vandal changed {{unichar|25|Percent sign|html=}} to {{unichar|26|Percent sign|html=}}. No error was generated, though inspection shows that the name doesn't match the new, wrong, glyph. The template really should do a sanity check that the name actually matches the code-point and display an error status if not. For familiar glyphs like % and &, it is obvious but not if it is a j

Better still, don't ask for any text, indeed ignore any provided. A simple {{unichar|25}} should fetch the official name and not expect editors to do make-work.

Is there a template doctor in the house? 𝕁𝕄𝔽 (talk) 19:53, 2 April 2024 (UTC)

It seems this has fallen through the cracks. I'm going to see if I can wrangle a modification to this template that will simply allow one to print the canonical Unicode name for a given code point. I would prefer it being the default or only behavior, but I am curious if this would be a problem for anyone. Remsense 12:58, 5 April 2024 (UTC)
To my mind, anything but the canonical name is at best finger trouble. The family nlink= is there when the WP:common name and the canonical name don't match. As in U+005E ^ CIRCUMFLEX ACCENT ({{unichar|005E|circumflex accent|nlink=carat}}) 𝕁𝕄𝔽 (talk) 18:00, 5 April 2024 (UTC)
The issue being, it seems we need a data module of 150k entries that has to be searched every time (if we want to prevent vandalism, anyway), and that's about three orders of magnitude more entries than I've seen a module on here work with, so I am worried about the potential server load. Remsense 18:16, 5 April 2024 (UTC)
Maybe WP:village pump/technical could advise? But it is not really a search when you already have the index and just want to fetch the record that matches that index. 𝕁𝕄𝔽 (talk) 18:24, 5 April 2024 (UTC)
Doy, you're completely right on the latter point. Had the current flowing the wrong way in my brain there. I'll poke the pump. Remsense 18:27, 5 April 2024 (UTC)
Well, that was easy!!!!!!!!!!!!!! {{Unichar/sandbox}} seems to work perfectly well. Thank you so much @Cryptic for lending some lost, cold, and confused lexicographers a helping U+2F3F ⼿ KANGXI RADICAL HAND Remsense 21:03, 5 April 2024 (UTC)

The sooner we can put this live, the better. There's a lot of it about! (Kudos to Nickps for spotting this one in such a high-profile article but such basic stuff shouldn't depend on eagle eyes to keep clean.) --𝕁𝕄𝔽 (talk) 10:33, 7 April 2024 (UTC)

I am not sure of a particular reason why it can't, I just didn't want to be rash about doing so. It's not like it was a particularly technical change, if you'd like to do the honors? Remsense 10:38, 7 April 2024 (UTC)
I'm happy to be the one to do it but you'll have to tell me how. 𝕁𝕄𝔽 (talk) 12:42, 7 April 2024 (UTC)
Oh! Apologies for assuming everyone else is the one I should be asking how to do things. I've done it. Remsense 12:54, 7 April 2024 (UTC)
The template should certainly ignore the text given but maybe we should start with a green warning to say that the template has done so. One like the error message you get if you accidentally type firdt=John in a CS1/2 citation. We could do it silently and let those who have been taking advantage of the failure to check come and read the (to be revised) documentation which will tell them that the free text field is no more. 𝕁𝕄𝔽 (talk) 12:55, 7 April 2024 (UTC)
Yes I can do that also, great idea. Remsense 12:57, 7 April 2024 (UTC)
Revising the doc, I noticed that calling the template with no text generated just omitted it. I can't see why anyone would want to do that but we had best add a name=none option? 𝕁𝕄𝔽 (talk) 13:10, 7 April 2024 (UTC)
I think it's nice to have just because I often am too lazy to tab to a template's documentation so I try all the things (=none? could it be =false? how about =no? Surely it will no longer confound me if I try =""; there we go!) Remsense 13:13, 7 April 2024 (UTC)
Well we could just cheat and regard any input to name= as an instruction to omit. Who is ever going to use it to mean yes? --𝕁𝕄𝔽 (talk) 13:38, 7 April 2024 (UTC)
This is usually the pragmatist's move with a binary parameter. I swear there's a thing that lets you check all the ways a user wants to say no or yes to something. Remsense 14:09, 7 April 2024 (UTC)
I probably don't deserve praise for that one considering I'm the one who made the mistake in the first place [1] but thanks, I guess. Nickps (talk) 11:06, 7 April 2024 (UTC)
Of course you do! It's never too late to make things right. Remsense 11:08, 7 April 2024 (UTC)
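
The two ideas in this section (fetch the official name from the code point, and flag a supplied name that does not match) can be sketched in Python with the standard `unicodedata` module. This is an illustration, not the code behind {{Unichar/sandbox}}; `canonical_name` and `name_matches` are hypothetical names.

```python
import unicodedata


def canonical_name(code_hex: str) -> str:
    """Fetch the Unicode name for a hex code point.

    This is a direct lookup by code point, not a search.
    unicodedata.name raises ValueError for unnamed code points
    (for example control characters).
    """
    return unicodedata.name(chr(int(code_hex, 16)))


def name_matches(code_hex: str, supplied: str) -> bool:
    """Sanity check: does the editor-supplied name match the code point?"""
    return supplied.strip().upper() == canonical_name(code_hex)


print(canonical_name("25"), name_matches("25", "Percent sign"))
print(canonical_name("26"), name_matches("26", "Percent sign"))
```

With the vandalised value 26, the check fails because the canonical name is AMPERSAND, which is the mismatch that went unnoticed in the article.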

Override option needed

See

In Unicode, the majuscule Ƣ is encoded in the Latin Extended-B block at U+01A2 and the minuscule ƣ is encoded at U+01A3.[1] The assigned names, "LATIN CAPITAL LETTER OI" and "LATIN SMALL LETTER OI" respectively, are acknowledged by the Unicode Consortium to be mistakes, as gha is unrelated to the letters O and I.[2] The Unicode Consortium therefore has provided the character name aliases "LATIN CAPITAL LETTER GHA" and "LATIN SMALL LETTER GHA".[1]

Right now, we have

  • U+01A2 Ƣ LATIN CAPITAL LETTER OI

We need an alias= as in alias=LATIN CAPITAL LETTER GHA, as suggested by Chatul at the Village Pump. There are a very few such cases where an error was made in the original standard that will never be changed. --𝕁𝕄𝔽 (talk) 13:49, 7 April 2024 (UTC)

Will start this right now alongside the other thing. Remsense 14:10, 7 April 2024 (UTC)
I think it would be ok for arg 1 to continue to work. Instead find all the invocations of this template and remove arg 1 unless it is actually necessary. Spitzak (talk) 19:07, 8 April 2024 (UTC)
In principle, you are absolutely right – but in practice that would be a huge task, wildly out of proportion to the tiny number of cases where the Unicode Consortium admits it made an error. This is the most practicable solution to this specific problem. Meanwhile, ignoring the supplied 2= in favour of the canonical text immediately resolves the rather more numerous cases of spelling errors and vandalism. --𝕁𝕄𝔽 (talk) 20:25, 8 April 2024 (UTC)
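
A minimal Python sketch of the alias override requested here, assuming a small hand-maintained table of corrections. The table, `CORRECTION_ALIASES` and `display_name` are hypothetical; only the two GHA entries quoted above are included, and the real template may store its aliases differently.

```python
import unicodedata

# Hypothetical alias table: code point -> corrected name.
# Only the two cases quoted in this section are listed.
CORRECTION_ALIASES = {
    0x01A2: "LATIN CAPITAL LETTER GHA",
    0x01A3: "LATIN SMALL LETTER GHA",
}


def display_name(code_hex: str, alias: bool = False) -> str:
    """Return the canonical name, or the alias if one is requested and exists."""
    code_point = int(code_hex, 16)
    if alias and code_point in CORRECTION_ALIASES:
        return CORRECTION_ALIASES[code_point]
    return unicodedata.name(chr(code_point))


print(display_name("1A2"))
print(display_name("1A2", alias=True))
```

Falling back to the canonical name when no alias exists means alias=yes is harmless on the vast majority of code points.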

Temporary reversion needed

@Remsense: we forgot the many instances of uses like this: {{unichar|2120|Service mark|nlink=}}, which now fail (U+2120 SERVICE MARK) because there is no such article as SERVICE MARK. D'oh! --𝕁𝕄𝔽 (talk) 21:23, 8 April 2024 (UTC)

Revert done. I'm also working on the aliases as we speak. Remsense 21:26, 8 April 2024 (UTC)
Which is now working:
{{Unichar/sandbox|1A2}} → U+01A2 Ƣ LATIN CAPITAL LETTER OI
{{Unichar/sandbox|1A2|alias=yesgivemethealias}} → U+01A2 Ƣ LATIN CAPITAL LETTER GHA
What should we do about this? It does say such use of |nlink= is deprecated. Should we clean it all up somehow? Remsense 21:38, 8 April 2024 (UTC)
I have seen a lot of nlink=<blank>, indeed I confess to have been a major perpetrator – "monkey see monkey do". It works (worked) and there was (is?) no error message to say No data supplied with nlink=, ignored. So we need ...
first: a list of articles that use nlink= with no data, so that someone (aka me, since I know many of them are my fault) can go round and correct them. [I believe that the template already has such an exceptions report, though whether anyone has been checking since DePiep got canned must be doubtful.] Then we can reinstate the change.
second, add some code to say (for all the optional parameters), No data supplied with <param>=, ignored
PS sorry to have dropped the bombshell and not been around until now to help with the cleanup; officially I was otherwise engaged and shouldn't have been in a position to spot the error. <blush> --𝕁𝕄𝔽 (talk) 23:01, 8 April 2024 (UTC)
My "first" wouldn't be needed if the current interception of nlink=<blank> were changed so that it linked to the U+XXXX or the target character rather than some name? Which adds support to the question of "do we even need nlink= ?". --𝕁𝕄𝔽 (talk) 23:58, 8 April 2024 (UTC)
Don't apologize at all! Nothing about this is particularly burdensome. I am leaning towards linking to the character itself, are there cases where this is going to break? Remsense 00:03, 9 April 2024 (UTC)
So, do you think directly linking to the character itself is the best move? That's where I am presently unless there are edge cases (e.g. I can think of high-range code points and non-printable ones, and maybe we can define those manually). Remsense 02:26, 9 April 2024 (UTC)
yes, see below. 𝕁𝕄𝔽 (talk) 08:30, 9 April 2024 (UTC)
The |nlink= default is now also working:
{{Unichar/sandbox|1A2|alias=yes|nlink=}} → U+01A2 Ƣ LATIN CAPITAL LETTER GHA Remsense 13:47, 9 April 2024 (UTC)

Do we even need nlink=

Say: we have a lot of technical redirects, why can't we just add U+XXXX as redirect format to a given page? Remsense 21:43, 8 April 2024 (UTC)
As in, U+2120 now redirects to Service mark symbol, as already did . This seems like a pre-solved problem. Remsense 21:49, 8 April 2024 (UTC)
It looks to be a neat solution. The only catch that I can see is that these U+XXXX aren't well watched and may be subject to vandalism. It is not an obvious vector for a "bad actor" so I guess it is a reasonable risk. The problem is that the attack won't be obvious and someone following a link to a Gardiner's sign list entity will have no idea how it happened. --𝕁𝕄𝔽 (talk) 23:01, 8 April 2024 (UTC)
Are there any cases of nlink=target-name#section-name? I can't think why there would but if it is possible (as it is), someone somewhere will have done it. <sigh> --𝕁𝕄𝔽 (talk) 23:58, 8 April 2024 (UTC)
I would say if necessary, the redirect page itself can link to a given section, if I'm understanding properly? Remsense 00:04, 9 April 2024 (UTC)
Yes, that makes sense. I can't see any other reasonable possibility. 𝕁𝕄𝔽 (talk) 07:43, 9 April 2024 (UTC)
Though there are cases where the nlink goes to a broad concept article (such as Gardiner's sign list) when there is no specific article. So nlink=<something other than one codepoint> is certainly valid and useful.
So to solve the current problem, we just need to change the behaviour of nlink=<nothing> so that it links to the target character article rather than its Unicode name. As you proposed already, I think? But we can't dispense with nlink= completely and just link everything willy-nilly since many codepoints (e.g., Chinese characters) don't have their own articles. --𝕁𝕄𝔽 (talk) 08:11, 9 April 2024 (UTC)
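
The three nlink= behaviours agreed in this thread can be sketched in Python as follows. This is a sketch of the logic only, not the template's wikitext; `linked_name` is a hypothetical name, and `None` stands for "nlink= not given at all".

```python
import unicodedata
from typing import Optional


def linked_name(code_hex: str, nlink: Optional[str] = None) -> str:
    """Return the name part of the output, as wikitext.

    nlink=None     -> parameter absent: plain name, no link
    nlink=""       -> blank: link to the character's own page
    nlink="Target" -> link to the given article
    """
    char = chr(int(code_hex, 16))
    name = unicodedata.name(char)
    if nlink is None:
        return name
    target = nlink.strip() or char
    return f"[[{target}|{name}]]"


print(linked_name("2120"))
print(linked_name("2120", ""))
print(linked_name("2120", "Service mark symbol"))
```

Characters that cannot be used as page titles would still need special handling in the blank case, as the full stop example in the Anomalies section shows.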


Testcases

As a template editor, I find it helpful, when people point out exceptions and cases like this, to put them in the testcases page so that future editors do not have to remember them. – Jonesey95 (talk) 21:52, 8 April 2024 (UTC)
Which testcases? I'm planning on ensuring there's an adequate library of them there once I'm done with this round of updates. Remsense 21:54, 8 April 2024 (UTC)


Per above... is there actually a purpose to being able to set a custom link other than to create easter eggs? I say we just have it link, in most cases, to Ƣ, i.e. the page for the character itself. Remsense 21:57, 8 April 2024 (UTC)

Almost there

Great to see it working again, thank you. Just one left on the to-do list, I think?

  • name=none so that {{unichar|0123|name=none}} produces just plain U+0123 ģ

I need to document alias=yes: I will copy Unicode#Alias. --𝕁𝕄𝔽 (talk) 14:48, 9 April 2024 (UTC)

And there you are: {{Unichar|1A2|alias=yes|name=none}} → U+01A2 Ƣ Remsense 15:15, 9 April 2024 (UTC)
It looks a lot like the use of the alias can be automatic, by just checking the alias database and using it instead of the real one if there is an entry. Is there a reason you did not do this? Spitzak (talk) 09:44, 10 April 2024 (UTC)

Anomalies

Problems as I discover them

  • U+002E . [[.|FULL STOP]] ({{unichar|002E|Full stop|nlink=}}) misbehaving. OTOH, {{unichar|002E|nlink=Full stop}} behaves as it should. --𝕁𝕄𝔽 (talk) 19:42, 9 April 2024 (UTC)
Knew I should've just looked at the page that definitely exists where they tell me what characters can't be used as article titles. Remsense 19:44, 9 April 2024 (UTC)
Some you win, some you lose. I just came back to say it must be something to do with that character because these work:
{{unichar|002A|Asterisk| nlink= }}, {{unichar|0023|Number sign |nlink= }} --𝕁𝕄𝔽 (talk) 20:01, 9 April 2024 (UTC)

Refs

References

  1. "Unicode chart" (PDF).
  2. "Unicode Technical Note #27: Known Anomalies in Unicode Character Names".

Cwith= and non-latin script

The Nepalese rupee sign, रू uses the combining diacritic technique of

  • U+0930 DEVANAGARI LETTER RA + U+0942 DEVANAGARI VOWEL SIGN UU.

Unfortunately, {{unichar|0930|cwith=ू}} produces

  • U+0930 ूर DEVANAGARI LETTER RA (A dog's breakfast).

Can anyone fix? 𝕁𝕄𝔽 (talk) 16:18, 21 April 2024 (UTC)

I see that it is also a problem with Latin script. In the example of "q with circumflex" below, the template fails to align the circumflex correctly over the q. --𝕁𝕄𝔽 (talk) 18:52, 21 April 2024 (UTC)
The cwith character is printed first. Also, you should not try to use this to show a character that is not a single code point. Spitzak (talk) 08:03, 22 April 2024 (UTC)
Ah yes, of course. The general solution is your response to the next question. 𝕁𝕄𝔽 (talk) 08:24, 22 April 2024 (UTC)
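
To make the ordering point concrete, here is a small Python sketch. It assumes that "combining" can be approximated by the Unicode general category starting with "M" (mark); `is_mark` and `well_ordered` are hypothetical helper names.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def well_ordered(sequence: str) -> bool:
    """A displayable sequence must start with a base character, not a mark."""
    return bool(sequence) and not is_mark(sequence[0])


rupee = "\u0930\u0942"       # RA followed by VOWEL SIGN UU: intended order
template_out = "\u0942\u0930"  # cwith character printed first: mark leads

print(well_ordered(rupee), well_ordered(template_out))
```

Because the cwith= character is printed first, passing the vowel sign as cwith= puts the mark ahead of its base letter, which is the order the second string shows.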

cwith handling generally

Suppose that somewhere there exists a letter q with circumflex, q̂. Before we enhanced the template to assert the canonical name (and only the canonical name), it was possible to write {{unichar|0071|cwith=̂|Latin small letter q with circumflex}} and get U+0071 LATIN SMALL LETTER Q WITH CIRCUMFLEX. Which of course was false: U+0071 is a common or garden q. The new arrangement is questionably better, producing U+0071 ̂q LATIN SMALL LETTER Q, which is a different kind of lie: the grapheme shown is not U+0071 and it is not (just) a Latin small letter q.

So I would like to propose that, when cwith=<combining diacritic>, we expose that fact in the description.

  • Thus, for example, {{unichar|0071|cwith=̂}} should produce U+0071 q LATIN SMALL LETTER Q with U+0302 ̂ COMBINING CIRCUMFLEX ACCENT : q̂

Comments? 𝕁𝕄𝔽 (talk) 18:50, 21 April 2024 (UTC)

Cwith should be limited to only the dotted circle.
I do think there should be a simple "print this instead" argument to replace all the size, font, IMG, and cwith stuff. Spitzak (talk) 08:07, 22 April 2024 (UTC)
Yes, I agree that the dotted circle should be the only valid option. Perhaps way back in the early developments, it also supported a coloured block to show the various forms of space character? These are now hardcoded but I guess there are too many combining diacritics to do the same here too.
I will revise the documentation accordingly.
As for all the other bells and whistles, it would take a full search of existing usage to determine where and why they are used. That is not a trivial task. 𝕁𝕄𝔽 (talk) 08:34, 22 April 2024 (UTC)
I have revised the documentation to formally restrict the base character to ◌ and to deprecate any other usage. Please review.
When someone has time to revise the template, can this restriction be enforced, please? --𝕁𝕄𝔽 (talk) 10:27, 22 April 2024 (UTC)
{{unichar|0302|cwith=q}} produces U+0302 COMBINING CIRCUMFLEX ACCENT. Spitzak (talk) 10:28, 22 April 2024 (UTC)
True, but should it? As per your earlier comment (with which I agree), the template should only produce real code points. --𝕁𝕄𝔽 (talk) 16:27, 23 April 2024 (UTC)
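
For reference, the description proposed at the start of this section could be generated along these lines. This is a Python sketch of the proposal only, not of anything the template does; `describe` is a hypothetical name, it takes the literal combining character as its second argument, and the output layout loosely follows the example given in the proposal.

```python
import unicodedata


def describe(code_hex: str, cwith: str = "") -> str:
    """Describe a base character and, if given, the combining mark shown with it."""
    base = chr(int(code_hex, 16))
    text = f"U+{ord(base):04X} {base} {unicodedata.name(base)}"
    if cwith:
        # Name the combining mark explicitly so the description is not a "lie".
        text += f" with U+{ord(cwith):04X} {unicodedata.name(cwith)}: {base}{cwith}"
    return text


print(describe("0071", "\u0302"))
```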

More detailed request for development

The only legitimate character to use to display a combining diacritic is the dotted circle. So I propose that

  • cwith= is redefined to mean "circle with".
  • The preferred syntax is cwith=yes
    • cwith=◌ and cwith=&#x25CC; are accepted alternatives.
  • Any other argument is flagged as an error.

Is that reasonable? --𝕁𝕄𝔽 (talk) 16:27, 23 April 2024 (UTC)

Is it possible to determine it is combining from the Unicode info database? If so maybe just ignore the field entirely and use that. Spitzak (talk) 07:15, 25 April 2024 (UTC)
Do we know how/whether that would work with non-Western scripts? Interestingly (at least on ChromeOS), this Devanagari combiner comes with dotted circle out of the box: U+0942 DEVANAGARI VOWEL SIGN UU. I don't know how typical that is. --𝕁𝕄𝔽 (talk) 17:10, 26 April 2024 (UTC)
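
Both suggestions in this section can be sketched in Python with the standard `unicodedata` module. This is an illustration under stated assumptions, not template code: `cwith_is_valid` and `is_combining` are hypothetical names, the accepted values are the three listed in the proposal above, and "combining" is taken to mean a general category beginning with "M".

```python
import unicodedata

# The three values the proposal would accept for cwith=.
ACCEPTED_CWITH = {"yes", "\u25cc", "&#x25CC;"}


def cwith_is_valid(value: str) -> bool:
    """Check a cwith= argument against the proposed whitelist."""
    return value.strip() in ACCEPTED_CWITH


def is_combining(code_hex: str) -> bool:
    """Decide from the Unicode database whether a code point is a combining mark.

    The general category (Mn, Mc, Me) is used rather than the canonical
    combining class, because some marks, such as U+0942, have class 0.
    """
    return unicodedata.category(chr(int(code_hex, 16))).startswith("M")


for cp in ("0301", "0942", "20E0", "0071"):
    print(cp, is_combining(cp))
```

On this basis the Devanagari vowel sign is detected in the same way as the Latin diacritics, which suggests automatic detection need not be limited to Western scripts, though how each font then draws the dotted circle is a separate matter.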