Talk:Unicode block


Proposed moves of unicode block articles

The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.

The result of the move request was: Not moved. No consensus to move any of these because it's unnecessary disambiguation. There may be consensus to move the most generic sounding ones, but probably only if we have a better destination, either a dab page or as primary redirect to another article, for the generic name. Those can be proposed separately, as I don't see consensus for anything specific like that here. (non-admin closure) В²C 01:08, 10 May 2019 (UTC)


Jorge Stolfi (talk) 22:01, 1 May 2019 (UTC)

– A little more than half of the articles about Unicode character blocks have the qualifier "(Unicode block)"; the others (listed above) don't. I propose to rename the latter so that all of them do. Here are some reasons:

  1. The current names are not descriptive of the subjects. A name like "Control Pictures", "Geometric Shapes", "Mathematical Operators", "Greek and Coptic" or even "Latin Extended-B" gives no clue that the topic of the article is a section of a specific computer character set.
  2. The current names are not specific to the subject of the article. Even names like "Greek Extended" or "Miscellaneous Symbols", that one could infer are about character encodings, could apply to other character sets, besides Unicode, that were in use in the rather recent past, and would still deserve articles of their own.
  3. The naming of the Unicode blocks is not consistent. The fact that only half of the articles have the "(Unicode block)" qualifier causes difficulties for editors and potential confusion for readers. For example, an editor intending to link to an article on the history and use of the Braille system may link to Braille Patterns instead by mistake.
  4. The articles violate the Wikipedia standards for titles. Many of the unqualified Unicode block articles are in the plural, have unnecessary capitalizations, or violate the standards in other ways. For example, to satisfy the standards the article Mathematical Operators should be named Mathematical operator. But of course that is not the name of the Unicode block; and it is the name of an article with a very different subject. Adding the qualifier "(Unicode block)" would satisfy the naming standards, besides avoiding confusion.
  5. Those topics are not very notable and are only of specialized and ephemeral technical interest. The division of the Unicode character space into blocks is mostly an artifact of the way the Unicode Consortium discusses, approves, and documents proposals to include characters. It has only tenuous (and often very questionable) connections to the history, usage, or semantics of those characters.
    The division is relevant only to those who are interested in the history of Unicode, or who intend to propose new symbols for it.
    The division is not relevant to users of Unicode. On the contrary, to find the Unicode code point for a desired glyph, like a special math symbol or a letter with a certain modifier, one should ignore the block division and use Google or some other generic search tool -- because one cannot tell which block that symbol has been put into.
    The division is not even useful to font designers. While at some point one would find computer fonts that were limited to one or two specific blocks, that has never been a rule, and fonts are increasingly cutting across the Unicode block boundaries.

Apparently the names above were assigned without the "(Unicode block)" because it was felt that the qualifier was unnecessary, since there was no other page in Wikipedia with that name. But that is not what "unnecessary" means. Most of the names above have a common-sense meaning that has nothing to do with Unicode; so a qualifier is necessary to differentiate them from those common meanings. If you say "Geometric Patterns", "Number Forms", or "Greek and Coptic" to someone, even to a computer expert, the last thing she will think of is the Unicode block of that name. Initially the moves will create a redirect from each unqualified name to the corresponding qualified name. I will try to replace all uses of the former by the latter. In some cases, like "Tai Viet" or "Mathematical Operators", the redirect is inappropriate or pointless, in which case it will be deleted or redirected to a more appropriate article. Note that, if one types "Tai Viet" into the search box, the "(Unicode block)" article will be listed anyway as one of the suggested alternatives. There are also half a dozen cases where the Unicode block article was merged into an article about a language, script, or typography article:

These merges should be undone, since the article about the Unicode block is supposed to have a lengthy section that documents the history of the block and the relevant Unicode Consortium publications, which do not belong in the articles above. For reference, the following articles are already named with the qualifier:

Jorge Stolfi (talk) 22:01, 1 May 2019 (UTC)

  • Agree. Makes sense, and I think this qualifies as a circumstance for which we can have a specific naming convention. Rajanala Samyak (talk) 01:52, 3 May 2019 (UTC)
  • In general, Support: a lot of the titles make more sense as redirects to more general coverage, Combining Diacritical Marks to combining character for example, given that combining diacritical marks as a concept are neither limited to that Unicode block nor unique to Unicode (see for example Windows-1258, ANSEL…). Control Pictures as a general concept would make most sense as a section in control character. And so forth. This is similar to the mentioned UK parliament constituencies, which take their name from a region, and said region would in general be the primary topic for the constituency name.

    The mentioned example of CJK Unified Ideographs Extension A would potentially be an exception, given that CJK Unified Ideographs is a specifically Unicode concept already: while JIS X 0208 does unify Kanji variants, it's only on the scale of one language; GB18030 (the current GB(K) version) is now based on Unicode; KS C 5601 didn't even unify certain different uses of identical written characters within the one orthography; Unicode's main competitors in pan-CJK encoding such as TRON (encoding) or CCCII eschew glyph form unification altogether. (I'm remaining agnostic on whether "Extension A" or "Supplement" names need the qualifier in general.)

    Halfwidth and fullwidth forms would logically take a broad view of the subject both in legacy encodings and in Unicode, possibly merging in (edit: or even moving into place and expanding) half-width kana to be honest. (As a speculative sidenote, the current article looks like it's expanded off from a Unicode block article as a place to provide more well-rounded coverage than just katakana, but that's a guess, I haven't dug through their histories.)

    But in general, just because there's a Unicode block named with a given exact form of a term, does not always logically make it the primary topic for that term. {{redirect}} can be used, {{redirect|Combining Diacritical Marks|the Unicode block|Combining Diacritical Marks (Unicode block)}} in the above example. -- HarJIT (talk) 16:49, 4 May 2019 (UTC)
  • Comment I can agree about "CJK" since the only meaning of the word "CJK" is the Unicode plane. --Jorge Stolfi (talk) 22:46, 5 May 2019 (UTC)
  • Oppose with the possible exception of the generic-sounding names like “Mathematical Operators”. Here is a point-by-point rebuttal.
    1. The current names are not descriptive of the subjects. – That does not matter. Names of things need not be descriptive. For example, Joseph Andrews sounds like a person, but it is about a novel; Abbey Road sounds like a road, but it is about an album; Tarantula hawk sounds like a kind of hawk, but it is about a kind of wasp. These Unicode blocks are no different.
    2. The current names are not specific to the subject of the article. – Names like “Mathematical Operators” are indeed ambiguous, if you ignore the capitalization. Specific individual arguments should be made for disambiguating them, not as part of a mass renaming. Names like “Greek Extended” could theoretically have been used for other character encodings, but you would need to demonstrate that each such name actually is used for something else, which you haven’t. Names like “Arabic Presentation Forms-B” could not plausibly refer to anything other than Unicode blocks.
    3. The naming of the Unicode blocks is not consistent. – The rule is that the article title includes “(Unicode block)” if and only if the bare title is ambiguous. This rule has been applied consistently. Adding “(Unicode block)” indiscriminately would be inconsistent with the rest of Wikipedia.
    4. The articles violate the Wikipedia standards for titles. Many of the unqualified Unicode block articles are in the plural, have unnecessary capitalizations, or violate the standards in other ways. – The rules about plurals and capital letters do not apply to proper nouns. For example, the article about the novel Vile Bodies is at “Vile Bodies”, not “Vile body”. Similarly, Unicode block names are proper nouns: they are names, not mere descriptions. That they are names is clear because of the idiosyncratic style of names like “Arabic Presentation Forms-B”, which is not standard English prose.
    5. Those topics are not very notable and are only of specialized and ephemeral technical interest. – In that case, nominate them for deletion. As long as they are considered notable enough to have articles, and as long as there are no other notable topics with the same names, there is no need to disambiguate them, so they should not be disambiguated. Gorobay (talk) 15:32, 5 May 2019 (UTC)
  • Comment: The examples are not quite convincing. The name of an article about a book like Joseph Andrews must obviously be the title of the book, no matter whether it is descriptive or not. The article about the album is named Abbey Road without the "(album)" only because editors decided that (for now!) 99% of the searches for that name are for the album, not the actual road. However, titles of books, albums, and songs are usually put in italics in English texts precisely because they would otherwise be confusing. They must somehow be marked to make it clear that they are not to be parsed for their English meaning, but taken unparsed as a whole, as the proper name of something.
    As for "Tarantula hawk", that is the common English name of the insect (not a proper name), so Wikipedia is not to be blamed if someone somehow thinks that it may be a spider or bird.
    But, otherwise, names of articles must be as descriptive of their concepts as possible, especially to readers who are not familiar with the subject but may want to read about it. The name of the article about X should not be how the X-ologists call X, but how a sufficiently broader community refers to X.
    Now, the names of the Unicode blocks are neither titles of novels nor common names for the concepts. They are merely the names of sections in the Unicode standard document, not in the character encoding itself. Again, not even a computer expert will think of the Unicode block when reading "IPA Extensions" or "Ornamental Dingbats" or "Mongolian Supplement". In common English texts, those references would have to be qualified as "the Mongolian Supplement block of Unicode".
    The rule that "a qualifier should be omitted when there is no other article with that name" should not be followed strictly and blindly. It is only a "testable" approximation to the ideal rule "avoid unnecessary qualifiers". One should take into account also the existence of articles or redirects whose names differ only in subtle details like capitalization or plurals. And it makes sense to consider also articles that don't exist, but should -- like "Rumi Numeral Symbols" -- or redirects that should point to other articles -- like "Linear B Ideograms". The wholesale renaming proposal above would clear the way for plugging such holes in the future, without having to go through a renaming discussion for each individual case. (Note that the plugging of many of those holes depends on editors who are not computer experts and don't have the time or knowhow to start such discussions.)
    Your suggestion that all Unicode block articles should be eliminated is not that absurd. Unicode is obviously a notable concept, but its division into blocks is merely an internal administrative choice by the Unicode Consortium, that has no relevance whatsoever for the users of the character set. (You may have noticed that the only reason why the block start and length must be multiples of 16 is that the Consortium document typesets the reference forms in tables with 16 columns. They say so themselves.)
    Indeed, those Unicode block articles are only another instance of an unfortunate trend, whereby some well-meaning editors decide to create inside Wikipedia a mirror of some arbitrary classification, index, or database that was created by some external agency. That is always a bad idea. Wikipedia should note the existence of such a database, and describe it generically -- but then group, split, organize, and rename its contents in whatever way is most appropriate for Wikipedia readers.
    In the case of Unicode, logically there should be just one master article that describes the whole set, and sub-articles on logically defined subsets (like "Numerals in Unicode", "Latin-based letters in Unicode", etc.) even if they cut across the Unicode blocks.
    But those "Unicode block" articles are here, so let them stand. Still, since they are rather obscure subjects that are not what their names seem to say, it is fair that they get ungainly names -- just as Barrackpore_(Vidhan_Sabha_constituency) should have a qualifier by default, whether the article on the city of Barrackpore exists or not.
    --Jorge Stolfi (talk) 23:48, 5 May 2019 (UTC)
  • Comment: This requested move has been listed at WP:CENT for wider community input. --qedk (t c) 18:21, 9 May 2019 (UTC)
  • Oppose unnecessary disambiguation. The article is for explaining what the title is about, it's not the title's job to give that explanation. -- Tavix (talk) 18:32, 9 May 2019 (UTC)
  • Support per nom, WP:NAMINGCRITERIA points 1, 3, 5, and WP:IAR. While I understand we should avoid unnecessary disambiguation, especially parenthetical, the proposal makes good sense on its face. I don't think we should be bending over backwards to satisfy WP:ATDAB when it's not even clear it is against this. There is a very reasonable argument that most of these names are either not precise enough (e.g., Mathematical Operators) or not natural enough (e.g. CJK Unified Ideographs Extension A), and that rather than going through these one by one, we can make the entire naming scheme consistent. The benefits seem to outweigh the small aesthetic cost of a parenthetical dab. Wugapodes [thɑk] [ˈkan.ˌʧɹɪbz] 20:32, 9 May 2019 (UTC)
  • Support per nom and Wugapodes. The proposal is simple and consistent, and Gorobay's rebuttal is not strong enough to convince me that this shouldn't be done. I particularly disagree with their response to point 1. -- 烏Γ (kaw)  23:50, 09 May 2019 (UTC)
  • Oppose It goes against Wikipedia naming practice to add unnecessary parenthetical disambiguation. The block names already have a consistent and easily-understood naming system, which is to use the exact Unicode name (spelling and capitalization), and only add "(Unicode block)" after the name if there is ambiguity. Adding "(Unicode block)" in all cases is unnecessary and in many cases would be so awkward that in future other editors will certainly attempt to remove the unnecessary modifier. BabelStone (talk) 00:11, 10 May 2019 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.
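
Note on two technical points raised in the discussion above (that block boundaries fall on multiples of 16, and that one cannot tell from a symbol which block it was put into): the following is only a minimal sketch, assuming a small hand-copied sample of block ranges. The table `SAMPLE_BLOCKS` and the helper names `is_16_aligned`, `block_of`, and `describe` are hypothetical and introduced here for illustration; the complete list of blocks is published in the Unicode Character Database file Blocks.txt.

```python
import unicodedata

# Hand-copied sample of block ranges (an assumption for illustration only;
# the authoritative, complete list is Blocks.txt in the Unicode Character Database).
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x2150, 0x218F, "Number Forms"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2400, 0x243F, "Control Pictures"),
    (0x25A0, 0x25FF, "Geometric Shapes"),
    (0x2800, 0x28FF, "Braille Patterns"),
]


def is_16_aligned(start, end):
    """True if the block starts on a multiple of 16 and its length is a multiple of 16."""
    return start % 16 == 0 and (end - start + 1) % 16 == 0


def block_of(char):
    """Return the sample block containing char, or None if it is outside the sample."""
    code_point = ord(char)
    for start, end, name in SAMPLE_BLOCKS:
        if start <= code_point <= end:
            return name
    return None


def describe(char):
    """Format a character as 'U+XXXX NAME: block'."""
    name = unicodedata.name(char, "<unnamed>")
    block = block_of(char) or "not in sample table"
    return f"U+{ord(char):04X} {name}: {block}"


if __name__ == "__main__":
    for start, end, name in SAMPLE_BLOCKS:
        print(f"{name}: {start:04X}..{end:04X} aligned={is_16_aligned(start, end)}")
    print(describe("\u2211"))  # the summation sign
```

The sketch only checks alignment for the sampled blocks and says nothing about why the Consortium chose that alignment. As an example of the lookup problem, `block_of("\u03a9")` (Greek capital omega) finds Greek and Coptic, while `block_of("\u2126")` (the ohm sign, which looks the same) returns `None` here because that character lives in a different block that is not in the sample table.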

change the font stack used in the inline css of some unicode block templates

The following templates have inline CSS written into their HTML using a style attribute:
{{Unicode chart CJK Unified Ideographs Extension E}} and {{Unicode chart CJK Unified Ideographs Extension F}}

The CSS in question is as follows: style="border-collapse:collapse;background:#FFFFFF;font-size:large; text-align:center;font-family: sans-serif, 'Unicode内码天珩输入法配套字体', '方正宋体S-超大字符集', '方正宋体S-超大字符集(SIP)', '文泉驿等宽正黑', 'HanaMinB', 'HanaMinC', 'HanaMinExC', 'BabelStone Han Plain', 'BabelStone Han', 'FZSong-Extended', 'Arial Unicode MS', Code2002, DFSongStd, 'STHeiti SC', unifont, LastResort;"

I request that the following font families get added into the font stack:
TH-Tshyn-P0,TH-Tshyn-P1,TH-Tshyn-P2,TH-Tshyn-P16

TH-Tshyn is a font that supports all the characters in Unicode 13.0; you can read more about it at http://cheonhyeong.com/English.html

P.S.: Why do these templates have inline CSS, as opposed to having their own class?

2806:264:4408:8E19:81B0:5A7:6C9F:5933 (talk) 07:30, 14 May 2021 (UTC)

Assuming these are reusing the font list, this would be a good use for template styles (which can all share one template style) - not seeing a need for a site-wide class here though. -- xaosflux Talk 22:26, 14 May 2021 (UTC)
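
For anyone applying the request above by script rather than by hand, here is a minimal sketch of adding the four families to the existing `font-family` value. The request does not say where in the stack they should go, so inserting them just before `unifont` is purely an assumption, and the helper name `add_families` is hypothetical.

```python
def add_families(stack, new_families, before):
    """Return a font-family value with new_families inserted ahead of `before`.

    Families already present are not added again; if `before` is missing,
    the new families are appended at the end.
    """
    # Splitting on commas is enough here because none of the font names
    # in this particular stack contain a comma.
    families = [family.strip() for family in stack.split(",")]
    additions = [family for family in new_families if family not in families]
    index = families.index(before) if before in families else len(families)
    return ", ".join(families[:index] + additions + families[index:])


# The font stack quoted in the request above.
CURRENT_STACK = (
    "sans-serif, 'Unicode内码天珩输入法配套字体', '方正宋体S-超大字符集', "
    "'方正宋体S-超大字符集(SIP)', '文泉驿等宽正黑', 'HanaMinB', 'HanaMinC', "
    "'HanaMinExC', 'BabelStone Han Plain', 'BabelStone Han', 'FZSong-Extended', "
    "'Arial Unicode MS', Code2002, DFSongStd, 'STHeiti SC', unifont, LastResort"
)

# The families named in the request.
REQUESTED = ["TH-Tshyn-P0", "TH-Tshyn-P1", "TH-Tshyn-P2", "TH-Tshyn-P16"]

# Assumed position: ahead of the fallback fonts at the end of the list.
NEW_STACK = add_families(CURRENT_STACK, REQUESTED, before="unifont")

if __name__ == "__main__":
    print("font-family: " + NEW_STACK + ";")
```

Running the helper a second time on its own output leaves the stack unchanged, so it should be safe to apply to both templates even if one of them has already been updated.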

Image previews beside the glyphs

Most/all Unicode block pages and List of Unicode characters have the template:Contains_special_characters, which is good (it can reduce confusion and save time looking for errors), but not everyone can install full fonts on every device. I propose we add images of the characters to the mentioned articles, either

  • inside the table cell, next to the glyph,
  • in a column (for List of Unicode characters),
  • in a separate table or
  • one image of a table containing the characters and their U+numbers.

Is there a simple method to get the server to make the images? MediaWiki has a way of converting e.g. <math> to images (probably using LaTeX), but that uses a fancy font, which I'm afraid will be illegible for many symbols.

Once we have a solution and agreement, can someone write a bot to make the edits? --Ziom 2.0 (talk) 12:38, 10 February 2022 (UTC)

Here are my thoughts:
  • I don't really have an opinion about adding images to List of Unicode characters although I wonder how the page would load with tens of thousands of images on it and how it would be kept up-to-date.
  • I oppose adding images inside the Unicode block tables/templates for several reasons:
  1. It would be visually confusing.
  2. It would make cut-and-paste of text more difficult.
  3. Usually when new characters are added there isn't an available font to use to create an image.
  4. How would you decide which font to use? Taking CJK Unified Ideographs as an example, would you choose a font for Japan or China or Taiwan for the image? Another example is U+1F52B PISTOL: Are you going to use a font that matches the Unicode chart of a gun or one that matches most phone implementations which look like a toy water gun?
  5. New Unicode releases sometimes update the example character shapes, making it difficult to keep the images up-to-date.
  • I also oppose adding an image of the entire block to the Unicode block articles for the same reasons as above, but mainly because every block table starts with a link to the "Official Unicode Consortium code chart (PDF)" which already has a chart that is independent of font support. BTW: I don't think it's fair use to just screenshot Unicode PDF charts to make the Wikipedia images.
    I do agree that the lack of good font support is frustrating. DRMcCreedy (talk) 23:01, 10 February 2022 (UTC)