User talk:ChristTrekker/UnicodeSymbol

My comments


Allow me to write in telegraph style.

Concept: good plan to use specialised font selections per Unicode character group (whichever group that is). Good to aim for a single complete template that covers all.

Commons.css encoders (those who have to enter and maintain the font set per class) might not be happy with:

-Non-stable groups, class-names and font-sets. IOW: they [class-names and font-sets] should work well in most/all of the browsers, they should not change too often (any more), and they should not be too many especially if they can be correctly grouped (e.g., when some different groups can share a font set). Also the number of classes can become an issue (it could grow to 100 or 200).

User input (editors' input): use #switch to its best: it allows many inputs. Still, ambiguous input should give a warning.
Class naming: I suggest using all lowercase, with an underscore "_" standing in for a space character. So that would be "unicode_emoticon", not "UnicodeEmoticon". I think the prefix "unicode_" is fine; maybe a shorter one can be found.

-When it applies, use the proper formal Unicode block name or the Unicode script name or via. These are well defined. If there is such a name, there is no need to introduce a new name.

-If the block or script name does not cover the character set ("card suits", "ipa"), first check whether Unicode uses a subsection title in a block. "card suits" is not a block name, but it is a section title in the block "Miscellaneous symbols" [1]. So then name the class "unicode_card_suits".

-If that does not cover it, find a good short descriptive name. (I wouldn't use abbreviations like "PoliReli" or "poli_reli".) Remember you have to convince high-level editors who do not like or need more jargon.
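To illustrate the naming rule above (all lowercase, underscores for spaces, "unicode_" prefix), here is a minimal Python sketch. The function name class_name is hypothetical and not part of any existing template or stylesheet; it only shows how a Unicode block, script, or section title could be turned into a class name mechanically.

```python
import re


def class_name(unicode_name: str, prefix: str = "unicode_") -> str:
    """Turn a Unicode block, script, or section title into a CSS class name.

    Hypothetical helper for illustration only.
    """
    # Lowercase the title, then collapse every run of characters other than
    # a-z and 0-9 into a single underscore, trimming underscores at the ends.
    slug = re.sub(r"[^a-z0-9]+", "_", unicode_name.lower()).strip("_")
    return prefix + slug


print(class_name("Card suits"))
print(class_name("Miscellaneous Symbols"))
```

With this rule, the section title "Card suits" yields "unicode_card_suits", matching the example above.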

The process of introducing the classes. Now this is a more difficult part. It is nigh impossible to get the font set right the first time, let alone for 100 or 200 classes. Unless you find a commons.css editor who will enter your first proposal of ~100 classes with preferred font sets ;-). I suggest you contact the class editors now and ask what they can and cannot implement.

-My approach (at least in theory) is to first use subtemplates per font set (your class name) that have the fonts written down, using <span style="font-family:...">...</span> (a minor start is in {{script}}, with the Cyrl example). These can be edited, tested, and, when stable, turned into a class. One by one.
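As a rough sketch of what such a per-font-set subtemplate would emit, the Python below builds the inline-styled span for a given font list. The helper name font_span and the example font stack are assumptions made for illustration; a real subtemplate would be written in wikitext, and the right fonts per group still have to be found by testing.

```python
from html import escape


def font_span(text: str, fonts: list[str]) -> str:
    """Wrap text in a span with an inline font-family list (hypothetical helper)."""
    # Join the family names in order of preference; the last entry should be
    # a generic fallback such as sans-serif.
    stack = ", ".join(fonts)
    # Escape both the attribute value and the content so the markup stays valid.
    return f'<span style="font-family: {escape(stack)};">{escape(text)}</span>'


# Example font stack: an assumption, not a tested recommendation.
print(font_span("\u2660", ["Segoe UI Symbol", "sans-serif"]))
```

Once a font list like this has proven stable, the same list can be moved into a class and the inline style dropped.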

Another thought: naming the classes is manageable. Another issue can arise when character sets are to be combined (into one class). We definitely do not want knots in the future. I have no solution for this.
Of course the class should have the fallback font(s) too, and maybe some generic ones like for diacritics ("common script").
Question for commons.css editors (maybe you are one): how does "lang" affect this? I have the impression that browsers (HTML, CSS) try to wring a font (family) out of a language. That would turn things upside down.

That's all ;-) Have a nice edit. -DePiep (talk) 17:01, 12 December 2012 (UTC)

This is very good feedback; thank you. I've posted a RfC at mediawiki talk:common.css. ⇔ ChristTrekker 17:48, 12 December 2012 (UTC)

I welcome the cross-referral from Mediawiki talk:Common.css#improving unicode support - RfC for significant changes but share User:Edokter's reservations at Template talk:Unicode#suggestions for extending/improving this template outside the BMP.

As a starting point, a web browser ("User Agent" or "UA") is expected to make its own arrangements if a glyph/character is not available in any of the specified font-families:

"If [in the last resort] a particular character cannot be displayed using this font, then the UA may use other means to determine a suitable font for that character. The UA should map each character for which it has no suitable font to a visible symbol chosen by the UA, preferably a 'missing character' glyph from one of the font faces available to the UA."
– http://www.w3.org/TR/CSS21/fonts.html#algorithm

In practice, some browsers give up more easily than others:

If I recall correctly, Internet Explorer (at least in older versions) ignores installed fonts if they are neither listed as a specified font-family nor set in the browser settings as the default font for the specified script.
On the other hand, I believe that other browsers will search a wide range of installed fonts to find a matching glyph if none of the font-families and default fonts contains a glyph for the relevant character.

In the former case, a CSS helper template such as this may help. But I think browsers are increasingly reluctant to give up so easily! So it is questionable whether there is justification for the complexity of introducing and maintaining the proposed extended template (and its potentially numerous new classes).

In the latter case, a CSS helper template would not generally be necessary (unless it was to cover the exceptional cases where a browser's automatic fallback match identified an installed font containing an inferior glyph to one available in another installed font).

There are numerous Unicode scripts and character blocks. Some blocks contain more than one script, and some scripts cover more than one block. Moreover, fonts often omit some characters even in the scripts and blocks for which the fonts are intended! Obscure Unicode characters are often omitted from fonts that cover neighbouring characters. So mapping character ranges to supported fonts is extremely hard to do accurately and comprehensively.

Based on the above understanding, I would question whether a broad-based approach would be worthwhile.

Instead, I would suggest identifying:

which browsers currently cause problems (only older versions of IE?); and
which individual characters or character sets affect enough articles or are problematic enough to justify a template and/or site-wide CSS changes.

Otherwise, there is a risk of trying to tackle hypothetical problems at the expense of addressing real ones.

It would be worth noting how MediaWiki supports a range of solutions for displaying mathematical and scientific symbols, using conversion of characters to PNG images or (as a recent user preference option) MathJax in JavaScript to provide proper font support. Some symbol rendering could probably be solved by applying <math>...</math> tags.

– Richardguk (talk) 00:24, 13 December 2012 (UTC)

Thanks. I learn. Wow. Such reasoning should not be in a sub page. -DePiep (talk) 02:30, 13 December 2012 (UTC)

Firefox (for example) does do a pretty good job of finding glyphs among the installed fonts for characters it needs to display. The problem, even in this case, is that this font substitution may not be ideal. Let's say you come across several characters in the u+1f3xx range, fairly newly encoded and thus poorly supported. The font specified for the text does not support them, but you actually have two fonts installed that do. The browser substitutes FontX's glyph for the first one and FontY's glyph for the next. Hooray, you can see them, problem solved—right? Not if the font characteristics and design are wildly different. They look odd together. What you didn't know is that FontY contained both symbols. By specifying FontY for all, e.g., "astronomy" characters, you get a consistent appearance that is more pleasing.
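The difference can be sketched in a few lines of Python. This is only a toy model: the coverage sets for FontX and FontY are invented, the two code points are arbitrary picks from the U+1F3xx range, and real browsers use a more involved fallback algorithm than "first installed font that has the glyph".

```python
def per_character_fallback(chars, installed):
    """Toy model of browser fallback: first installed font covering each character."""
    return {
        ch: next((name for name, cov in installed.items() if ch in cov), None)
        for ch in chars
    }


def single_font_for_group(chars, installed):
    """Return the first installed font that covers every character, or None."""
    needed = set(chars)
    return next((name for name, cov in installed.items() if needed <= cov), None)


# Hypothetical coverage data: FontX has only the first symbol, FontY has both.
installed = {
    "FontX": {"\U0001F311"},
    "FontY": {"\U0001F311", "\U0001F312"},
}
symbols = ["\U0001F311", "\U0001F312"]

print(per_character_fallback(symbols, installed))  # mixes FontX and FontY
print(single_font_for_group(symbols, installed))   # one font for the whole group
```

In this model the per-character lookup mixes two fonts, while naming one font for the group keeps the glyphs consistent.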

Nobody is saying this template needs to be used in all situations. I actually recommend against that. If you have a one-off occurrence of a rare character, sure, just rely on the browser's font-substitution capabilities. (Even that may not be necessary. Following the previous example, if the "astronomy" character used is u+2600, the default font probably supports it. No need to substitute because it's not missing. Specifying a different font than the default would, in this case, actually be counter-productive.) As long as it doesn't display as the "missing character symbol" it's good enough. Maybe I never said explicitly, but part of my reason for initiating this was because I wanted to rectify some of the jarring font substitutions I've noticed. I follow Unicode, and fonts. These are areas in which I can contribute a bit on WP, so perhaps I tend to come across these situations more than most editors do. It just looks bad when the em size and x-height are jumping around all over the place, and style of glyphs is inconsistent.

Also, please note that I'm not following a range- or block-based approach for character support. Very early in this discussion I had considered that, but I'm not convinced it's the best idea. I'm well aware that many fonts do not provide coverage for entire blocks. Trying to provide that comprehensive support "piecemeal" as we're attempting here is not going to be simple. Even surveying the handful of fonts I have, for the couple dozen character groups I have, I know it's tedious and imperfect. Yet I think the result is still beneficial. ⇔ ChristTrekker 14:55, 13 December 2012 (UTC)

Some of the groups seem arbitrary; for example, ⟨⌘⟩ is used on keyboards, ⟨⏨⟩ in ALGOL, and ⟨📼⟩ as an emoji, and yet they are all in the "technology" group. Useful groups are those with characters likely to be used together, like {{Script}} (and its subtemplates), {{Music}}, and {{IPA}}. I agree with DePiep's points: we must identify groups that would actually be used. Gorobay (talk) 23:14, 31 December 2012 (UTC)

(Apologies for not replying sooner; I hadn't noticed the comment.) Yes, ⌘ is used on computer keyboards, ⏨ is used in a computer programming language, 📼 directly represents a piece of technology. Groupings like "technology" might not be quite as objective as the ones for "plant" and "food" but I think they'd still be useful. Perhaps it's a bit too generic or overly broad; I guess that's open for debate. On the other hand, I already have about two dozen groups. How many is too many? Remember, the main point is to suggest a font that would contain the symbols instead of leaving it completely up to the UA's font substitution: Segoe UI Symbol contains 117⁄125 of them (for technology) so that's a win. ⇔ ChristTrekker 14:27, 9 January 2013 (UTC)

A group is useful if a font targets it and its members would look silly in a browser-selected mish-mash of fonts. My three example characters are used in completely different domains, so it does not matter if multiple fonts are used.

I think it is not important how many groups there are. There should be exactly as many as necessary.

Is this meant to supplement or to replace {{Script}} etc.? Gorobay (talk) 00:52, 10 January 2013 (UTC)

OK, your three examples were different domains, unlikely to be used together, sure. And if there were a topical subgroup of them that couldn't be rendered using the font specified, then sure, split those off into a new group so that a different font could be suggested. I don't see that as necessary here. Are the "keyboard symbol", "ALGOL symbol", or "hardware icon" subgroups not adequately served by Segoe UI Symbol? No, they're fine, so I don't see the point in splitting the group.

I agree there should be as many as necessary. But OTOH there could be hundreds of groups, depending on how someone might want to categorize things. Are all of them really "necessary"? What determines the necessity? If multiple groups are fully and well-served by the same font stack, there's no harm in combining those groups under a more generic name. Having additional groups has consequences. It takes more editorial effort to support them. Someone needs to assess fonts to find out how well they support all the groups: additional groups mean additional effort. Someone has to update common.css (because it's locked) if font support changes: more small groups would likely mean a greater chance of something changing more frequently. So while, in theory, I think it would be great if we could precisely define each "grouping" of symbol usage, we need to be practical too.
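To make the "combine groups served by the same font stack" idea concrete, here is a small Python sketch. The function name and the font stacks in the sample data are assumptions for illustration; only the subgroup names and Segoe UI Symbol come from this discussion, and "SomeOtherFont" is a placeholder.

```python
from collections import defaultdict


def merge_by_font_stack(group_fonts):
    """Collect group names that share exactly the same ordered font stack."""
    merged = defaultdict(list)
    for group, stack in group_fonts.items():
        # A tuple keeps the order of preference and can be used as a dict key.
        merged[tuple(stack)].append(group)
    return dict(merged)


# Hypothetical survey result: three subgroups are served by the same stack.
survey = {
    "keyboard symbol": ["Segoe UI Symbol", "sans-serif"],
    "ALGOL symbol": ["Segoe UI Symbol", "sans-serif"],
    "hardware icon": ["Segoe UI Symbol", "sans-serif"],
    "card suits": ["SomeOtherFont", "sans-serif"],  # placeholder font name
}

for stack, groups in merge_by_font_stack(survey).items():
    print(", ".join(stack), "->", groups)
```

Groups that land under the same key are candidates for one shared class; a group only needs splitting off once its preferred stack differs.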

This template is meant to supplement existing ones (e.g. {{IPA}}, {{script}}), although the others could be subsumed by this one if that were desired. (Basically, they all are a way of saying "I choose this font stack to display these characters with".) No other template allows for "arbitrary" groups of characters, and none specifically deals with dingbats. ⇔ ChristTrekker 15:57, 10 January 2013 (UTC)

I think you proposed somewhere that before the fonts were put in common.css they should stay in this template for a trial period. If we do that at first, we could peruse Wikipedia looking for cases where this template might help, and update it as we find them. That is how to know what is necessary. Gorobay (talk) 17:47, 11 January 2013 (UTC)

That's not a bad idea. One question is whether we want this to remain a separate template, or fold it into {{unicode}}. I've made the default (no/unknown parameter) simply add the .Unicode class. ⇔ ChristTrekker 19:49, 11 January 2013 (UTC)

I support merging into {{Unicode}}. Eventually all font-selection templates should be merged. Gorobay (talk) 22:44, 11 January 2013 (UTC)

I'd prefer that, too. Personally, the biggest negative to that is that I wouldn't be able to edit it myself any more. ☹ It would entail redefining its given purpose to be more aligned with what is described here. ⇔ ChristTrekker 15:52, 14 January 2013 (UTC)

Should the political and religious symbols be split into their own groups? Then they could have a more meaningful name too, addressing another of DePiep's points. ⇔ ChristTrekker 15:52, 14 January 2013 (UTC)
