Template talk:Unicodenover


Counts

So, "143,859 characters" isn't quite right either. With the new wording I think you probably want the assigned character count of 143,924. BTW: Unicode 13.0 looks like this:
  1,114,112 code points on 17 planes:
  +----------------------------------------------------------------------+
  |             +------------------------------------------------------+ |
  | 287,472     |            +---------------------------------------+ | |
  | allocated --| 143,924    | 143,696 graphical     +-------------+ | | |
  |             | assigned --| 228 special purpose --| 65 control  | | | |
  |             |            |                       | 163 format  | | | |
  |             |            |                       +-------------+ | | |
  |             |            +---------------------------------------+ | |
  |             | 137,468 private use (fixed)                          | |
  |             | 2,048 surrogates (fixed)                             | |
  |             | 66 noncharacters (fixed)                             | |
  |             | 3,966 reserved within allocated blocks               | |
  |             +------------------------------------------------------+ |
  |                                                                      |
  | 826,640 unallocated                                                  |
  +----------------------------------------------------------------------+

DRMcCreedy (talk) 02:20, 1 May 2020 (UTC)

Drmccreedy, So it's 143,696 + 228 = 143,924? ―Justin (koavf)TCM 02:34, 1 May 2020 (UTC)
Hm. See https://www.unicode.org/versions/Unicode13.0.0/ "a total of 143,859 characters". ―Justin (koavf)TCM 02:35, 1 May 2020 (UTC)
Unicode doesn't technically "define" the 65 control characters (like TAB and LINEFEED), so I'm guessing that's why they're excluded from its count. I'm fine with it being 143,859 to match the Unicode site, so long as you put a comment in the template like <!-- assigned characters minus the 65 control characters -->. DRMcCreedy (talk) 03:05, 1 May 2020 (UTC)
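As an illustration, the arithmetic in the Unicode 13.0 diagram, and the point that Unicode's published total is the assigned count minus the 65 control characters, can be cross-checked with a short Python sketch. The figures are copied from the diagram in this thread rather than computed, and the variable names are illustrative only.

```python
# Unicode 13.0 figures copied from the diagram in this thread (not derived here).
TOTAL_CODE_POINTS = 17 * 0x10000   # 17 planes of 65,536 code points each

graphic = 143_696
control = 65            # TAB, LINEFEED, etc.; these have no official character names
fmt = 163               # format characters
private_use = 137_468   # fixed
surrogates = 2_048      # fixed
noncharacters = 66      # fixed
reserved_in_blocks = 3_966
unallocated = 826_640

special_purpose = control + fmt
assigned = graphic + special_purpose
allocated = assigned + private_use + surrogates + noncharacters + reserved_in_blocks

# Unicode's published total leaves out the 65 control characters.
unicode_published_total = assigned - control

print(f"special purpose:         {special_purpose:,}")
print(f"assigned:                {assigned:,}")
print(f"allocated:               {allocated:,}")
print(f"assigned minus control:  {unicode_published_total:,}")
print(f"allocated + unallocated: {allocated + unallocated:,}")
```

Every level of the diagram adds up: 228 special-purpose characters, 143,924 assigned, 287,472 allocated, and 1,114,112 code points in total, while subtracting the control characters gives Unicode's 143,859.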
Drmccreedy, Are you actually sure this is true, tho? Where are you getting your numbers? ―Justin (koavf)TCM 03:08, 1 May 2020 (UTC)
The numbers are from scripts I run using the Unicode character database itself as input. I'm confident they're correct. DRMcCreedy (talk) 03:36, 1 May 2020 (UTC)
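DRMcCreedy's scripts aren't posted here, so the following is only a hypothetical sketch of the same kind of count. It uses Python's standard `unicodedata` module rather than the raw Unicode character database files, and tallies every code point by General_Category. `Cn` lumps reserved code points and noncharacters together. The totals reflect whichever UCD version the running Python bundles (`unicodedata.unidata_version`), so they match the Unicode 13.0 figures only when that version is 13.0.0.

```python
import unicodedata
from collections import Counter


def category_counts() -> Counter:
    """Tally every code point U+0000..U+10FFFF by its General_Category."""
    counts = Counter()
    for cp in range(0x110000):
        # chr() accepts lone surrogates, so the Cs range is counted too.
        counts[unicodedata.category(chr(cp))] += 1
    return counts


counts = category_counts()

# Cn covers both reserved code points and noncharacters.
unassigned = counts["Cn"]

# Whatever is left after removing unassigned, surrogate, private-use and
# control code points is the graphic + format total that Unicode publishes.
graphic_plus_format = (
    sum(counts.values()) - unassigned - counts["Cs"] - counts["Co"] - counts["Cc"]
)

print("UCD version bundled with this Python:", unicodedata.unidata_version)
print(f"control (Cc):      {counts['Cc']:,}")
print(f"private use (Co):  {counts['Co']:,}")
print(f"surrogates (Cs):   {counts['Cs']:,}")
print(f"graphic + format:  {graphic_plus_format:,}")
```

The control, surrogate and private-use counts (65, 2,048 and 137,468) are fixed, so they come out the same under any UCD version. The graphic-plus-format total grows with each release.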
Drmccreedy, Funny. I'll write to them now to clarify. Thanks, D. ―Justin (koavf)TCM 04:50, 1 May 2020 (UTC)
My preference is to keep with the figure of 143,859 as this matches the figure given by Unicode, and is the number of named characters. Control characters do not have official Unicode names, so it is reasonable to exclude control characters, along with PUA characters and non-characters, even though the 137,468 PUA characters are available for use by everyone. The fact that we're having this discussion here means that we should define what we mean by "characters" in the template text. I propose the following: "143,859 named characters (143,696 graphic characters and 163 format characters) and 137,468 private use characters as of Unicode 13.0". It is long-winded but it is accurate and avoids confusion. BabelStone (talk) 09:25, 1 May 2020 (UTC)
BabelStone, PUA would really make no sense, agreed. Also agreed that we should say what reliable sources say, which in this case is "a total of 143,859 characters". I did write to the Unicode Consortium, for what it's worth. As for the wording, what about cutting out the parenthetical part? ―Justin (koavf)TCM 09:58, 1 May 2020 (UTC)
@Drmccreedy and BabelStone: https://www.unicode.org/versions/stats/charcountv13_0.html ―Justin (koavf)TCM 19:51, 1 May 2020 (UTC)
It's nice to have confirmation of the numbers. I've updated the template documentation to note what the total represents, which will be helpful for future releases. DRMcCreedy (talk) 04:37, 2 May 2020 (UTC)

Reference


I think a reference for the figure should be included inside the template. The lede to Unicode originally had a reference to the figure[1], but since the figure has been replaced by this template, the reference is detached from the text it supports (if the template is updated, the reference attached to the template in an article becomes out of date), so I have removed the reference. However, the figure provided by the template still needs a reference, so I think one should be added to the template. BabelStone (talk) 09:14, 4 May 2020 (UTC)

BabelStone, There is a reference in the template: Unicode 13. ―Justin (koavf)TCM 10:39, 4 May 2020 (UTC)

As of


The Unicode article currently has the awkward sentence "... and as of March 2020, there is a repertoire of 143,859 characters as of Unicode 13.0", with "as of" appearing twice because the template uses "as of" for the Unicode version and the Unicode article requires an "as of" for the date. Can we combine the date and version in the template with a single "as of"? For example, "143,859 characters as of Unicode 13.0 released in March 2020". BabelStone (talk) 09:19, 4 May 2020 (UTC)

BabelStone, Good point. What do you think of it now? ―Justin (koavf)TCM 10:41, 4 May 2020 (UTC)
  1. Unicode, Inc. "Announcing The Unicode® Standard, Version 13.0".