This page attempts to document standards and infrastructure for the presentation of Unicode-related information on Wikipedia. It may also serve as a gathering point for work on building that infrastructure.

Templates

The following templates are used for Unicode-related content.

{{SpecialChars}}
Adds a small message box (floated right) informing the reader that the page uses special characters, which might not display properly. Here, "special" means roughly anything beyond ASCII, and possibly beyond Latin-1. This template should be added to the top of any page that makes extensive use of Unicode.
{{Unicode}}
Wraps the given character(s) in an HTML span element with class "Unicode". CSS can then be applied on a per-browser or per-platform basis to select appropriate fonts, or possibly to apply other fix-ups.
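The behaviour of both templates can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions: `chars_beyond` and `unicode_span` are hypothetical helpers, not part of MediaWiki. The first applies the "beyond ASCII, and possibly beyond Latin-1" criterion by comparing code points against 0x7F or 0xFF; the second reproduces only the span-with-class wrapping described for {{Unicode}}, not the template's actual source.

```python
import html


def chars_beyond(text, limit=0x7F):
    """Return the distinct characters in text whose code points exceed limit.

    Hypothetical heuristic: limit=0x7F treats anything outside ASCII as
    "special"; limit=0xFF also allows Latin-1. Order of first use is kept.
    """
    return list(dict.fromkeys(ch for ch in text if ord(ch) > limit))


def unicode_span(text):
    """Approximate the output described for {{Unicode}}: the text wrapped in
    an HTML span element with class "Unicode" (markup characters escaped)."""
    return '<span class="Unicode">' + html.escape(text, quote=False) + "</span>"


sample = "Café ☺"
print(chars_beyond(sample))        # ['é', '☺']
print(chars_beyond(sample, 0xFF))  # ['☺']
print(unicode_span("☺"))           # <span class="Unicode">☺</span>
```

Deciding what counts as "extensive use" is left to editorial judgment; the helper only lists which characters fall outside the chosen range.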

Glyph images

Wikipedia and Wikimedia Commons host many images of glyphs, that is, characters rendered in a given font. In article text, we generally prefer literal Unicode characters to these rendered images. These images are therefore used mainly in articles about characters, where an illustration is appropriate. In particular, Unicode tables (see below) provide both the literal character and an image of it.

Ideally, all such glyph images would be vector graphics in SVG format. However, many exist only in a raster format, such as GIF, and should eventually be converted to, or replaced with, SVG versions.

As of this writing, there is no standard naming scheme for these images. Some files are named after the code point, e.g., U+2122.svg; others use the character name, e.g., OCR-A char Quotation Mark.svg.
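The two naming styles observed so far can be illustrated with Python's standard `unicodedata` module. The helper names, the four-digit zero padding, and the title-casing of the character name are assumptions made for this sketch, not an agreed convention.

```python
import unicodedata


def codepoint_filename(ch, ext="svg"):
    """Code-point style, e.g. 'U+2122.svg' (uppercase hex, at least 4 digits)."""
    return f"U+{ord(ch):04X}.{ext}"


def name_filename(ch, font, ext="svg"):
    """Character-name style, e.g. 'OCR-A char Quotation Mark.svg'.

    The Unicode name (e.g. 'QUOTATION MARK') is converted to title case.
    unicodedata.name raises ValueError for characters without a name.
    """
    return f"{font} char {unicodedata.name(ch).title()}.{ext}"


print(codepoint_filename("\u2122"))  # U+2122.svg
print(name_filename('"', "OCR-A"))   # OCR-A char Quotation Mark.svg
```

Because most control characters have no Unicode name, the name-based style cannot cover every code point, whereas the code-point style always produces a file name.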

Unicode tables

Many articles dealing with Unicode include tables of Unicode characters. The standard form for such tables is shown in the following example.

Example table

Example caption
Char  Image  Name                             Hex     Decimal
☷            Trigram for Earth                U+2637  9783
☸            Wheel of Dharma                  U+2638  9784
☹            White frowning face (Emoticon)   U+2639  9785
☺            White smiling face (Emoticon)    U+263A  9786
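The Char, Name, Hex, and Decimal cells of a row can be derived mechanically from the code point. The sketch below assumes Python's `unicodedata` module as the source of names; it returns names in the uppercase form stored in the Unicode Character Database, whereas the example table shows them in sentence case, and it leaves out the Image column. `table_row` is a hypothetical helper.

```python
import unicodedata


def table_row(code_point):
    """Return the (Char, Name, Hex, Decimal) cells for one code point.

    The Image column is omitted. Names are the uppercase forms from the
    Unicode Character Database bundled with Python.
    """
    char = chr(code_point)
    return (char, unicodedata.name(char), f"U+{code_point:04X}", str(code_point))


# The four rows of the example table, U+2637 through U+263A.
for cp in range(0x2637, 0x263B):
    print(" | ".join(table_row(cp)))
```

Generating the Hex and Decimal cells from a single integer keeps the two columns consistent by construction.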

Legend

A copy of this legend, or something like it, will be linked from or displayed with all Unicode tables, once we figure out exactly how that should be done.
Char
The literal character. If your computer lacks Unicode support, you may see other symbols instead of the proper character.
Image
A sample image of the character, rendered in an example font.
Name
The official name of the character. Additional information may be given in parentheses.
Hex
The numeric code point for the character, in hexadecimal (base 16), with a "U+" prefix.
Decimal
The same code point value, expressed in decimal (base 10).
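Because the Hex and Decimal columns encode the same value, they can be cross-checked. This sketch assumes the conventional Unicode notation of "U+" followed by four to six uppercase hexadecimal digits, with values no greater than U+10FFFF; under that assumption a lowercase entry such as U+263a is reported as malformed. The function names are hypothetical.

```python
import re

# Assumed notation: "U+" plus 4 to 6 uppercase hex digits.
HEX_PATTERN = re.compile(r"U\+([0-9A-F]{4,6})")
DEC_PATTERN = re.compile(r"[0-9]+")
MAX_CODE_POINT = 0x10FFFF


def parse_hex(cell):
    """Return the code point in a Hex cell such as 'U+263A', or None if malformed."""
    match = HEX_PATTERN.fullmatch(cell)
    if match is None:
        return None
    value = int(match.group(1), 16)
    return value if value <= MAX_CODE_POINT else None


def check_row(hex_cell, decimal_cell):
    """List the problems found when comparing a row's Hex and Decimal cells."""
    code_point = parse_hex(hex_cell)
    if code_point is None:
        return [f"malformed hex: {hex_cell!r}"]
    if DEC_PATTERN.fullmatch(decimal_cell) is None or int(decimal_cell) != code_point:
        return [f"{hex_cell} is {code_point} in decimal, not {decimal_cell}"]
    return []


print(check_row("U+2637", "9783"))  # []
print(check_row("U+263a", "9786"))  # ["malformed hex: 'U+263a'"]
```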

Design features

The table format has the following design features:

  • Sortable
  • "Char" column
    • The literal Unicode character
    • For web browsers which support Unicode and can render it properly, gives the user a "native" presentation
    • Allows the reader to copy-and-paste the characters for real usage (like Charmap)
    • The {{Unicode}} template is used
  • "Image" column
    • A sample rendering of the Unicode glyph (see Glyph images above)
    • For systems/browsers which cannot render Unicode (or specific characters), allows the reader to see intended appearance
    • Provides a consistency check for character, image, and browser. Discrepancies will stand out.
    • When a glyph image isn't available, the table cell is left empty
  • "Name" column
    • The official codepoint name, as specified by the Unicode Consortium
    • Either the entire name, or individual words, may be wikilinked to articles
    • When the appropriate article title does not match the word(s) of the official name, piped links should be used to preserve the official name
    • Additional names or references can be provided in parentheses, if needed
    • For illustration, in the above table:
      • Only "Trigram" is wikilinked, because "for Earth" is not part of ba gua (concept)
      • All of "Wheel of Dharma" is wikilinked, because Dharmacakra is synonymous with "Wheel of Dharma"
      • "Emoticon" is a parenthetical, as that is not part of the official Unicode codepoint name
  • "Hex" and "Decimal" columns
    • The code point number, in both hexadecimal (base 16) and decimal (base 10) formats
    • The “U+” prefix is used for hex, per the Unicode standard
    • Decimal is not prefixed, per WP:MOSNUM
  • The plan is to eventually add some kind of standard explanation of the columns to the tables, most likely as an adjacent template, or maybe links from the headers. Ideas welcome!
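Two of the features above lend themselves to small helpers: building the Name cell with piped links, and sorting by code point. In this sketch, `name_cell` and `hex_sort_key` are hypothetical; the link targets are taken from the illustrations in the list and are placeholders rather than verified article titles. The sort example shows why a numeric key matters if the Hex column is ever compared as plain text: "U+10000" would then sort before "U+FFFF".

```python
def name_cell(parts):
    """Join (text, link_target) pairs into Name-column wikitext.

    target None -> plain text; target equal to text -> [[text]];
    otherwise -> [[target|text]], so the official wording stays visible.
    """
    pieces = []
    for text, target in parts:
        if target is None:
            pieces.append(text)
        elif target == text:
            pieces.append(f"[[{text}]]")
        else:
            pieces.append(f"[[{target}|{text}]]")
    return "".join(pieces)


def hex_sort_key(hex_cell):
    """Numeric sort key for a cell of the form 'U+XXXX'."""
    return int(hex_cell[2:], 16)


print(name_cell([("Trigram", "Ba gua (concept)"), (" for Earth", None)]))
# [[Ba gua (concept)|Trigram]] for Earth
print(name_cell([("Wheel of Dharma", "Dharmacakra")]))
# [[Dharmacakra|Wheel of Dharma]]

cells = ["U+FFFF", "U+10000", "U+263A"]
print(sorted(cells))                    # ['U+10000', 'U+263A', 'U+FFFF']
print(sorted(cells, key=hex_sort_key))  # ['U+263A', 'U+FFFF', 'U+10000']
```

The Decimal column gives the same numeric ordering directly, which is one practical reason to keep it alongside the hex form.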

Articles with Unicode tables

See also