Tags (Unicode block)

Tags
Range	U+E0000..U+E007F (128 code points)
Plane	SSP
Scripts	Common
Assigned	97 code points
Unused	31 reserved code points; 1 deprecated
Unicode version history
3.1 (2001)	97 (+97)
Unicode documentation
	Code chart ∣ Web page
	Note: ^[1]^[2]

Tags is a Unicode block containing formatting tag characters. The block is designed to mirror ASCII: each tag character sits at the code point of its ASCII counterpart plus 0xE0000. It was originally intended for language tags, but has since been repurposed for emoji modifiers, specifically for region flags.
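The ASCII mirroring can be illustrated with a short sketch. It assumes only that each tag character lies at a fixed offset of 0xE0000 from its ASCII counterpart, as the block layout shows; the helper names `to_tag` and `from_tag` are hypothetical and chosen for this illustration.

```python
TAG_BASE = 0xE0000  # first code point of the Tags block


def to_tag(ch: str) -> str:
    """Map a printable ASCII character (U+0020..U+007E) to its tag counterpart."""
    cp = ord(ch)
    if not 0x20 <= cp <= 0x7E:
        raise ValueError(f"no tag counterpart for U+{cp:04X}")
    return chr(TAG_BASE + cp)


def from_tag(tag: str) -> str:
    """Map a tag character (U+E0020..U+E007E) back to the ASCII character it mirrors."""
    cp = ord(tag)
    if not 0xE0020 <= cp <= 0xE007E:
        raise ValueError(f"U+{cp:04X} does not mirror a printable ASCII character")
    return chr(cp - TAG_BASE)


if __name__ == "__main__":
    tag_a = to_tag("A")
    print(f"U+{ord(tag_a):05X}")   # code point of the tag that mirrors "A"
    print(from_tag(tag_a))         # round trip back to ASCII
```

U+E0001 and U+E007F are deliberately excluded from both helpers, since they are control-like tags rather than mirrors of printable ASCII characters.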

Legacy use

U+E0001 and U+E0020–U+E007F were originally intended for invisibly tagging text by language,^[3] but that use is no longer recommended.^[4] All of those characters were deprecated in Unicode 5.1.

With the release of Unicode 8.0, U+E0020–U+E007E are no longer deprecated characters. The change was made "to clear the way for the potential future use of tag characters for a purpose other than to represent language tags".^[5] Unicode states that "the use of tag characters to represent language tags in a plain text stream is still a deprecated mechanism for conveying language information about text".^[5]

Current use

With the release of Unicode 9.0, U+E007F is no longer a deprecated character. (U+E0001 LANGUAGE TAG remains deprecated.) Emoji 5.0, released in May 2017,^[6] classifies these characters as emoji for use as modifiers in special sequences.

The only usage specified is for representing the flags of regions, alongside the use of Regional Indicator Symbols for national flags.^[7] These sequences consist of U+1F3F4 🏴 WAVING BLACK FLAG followed by a sequence of tags corresponding to the region as coded in the CLDR, then U+E007F CANCEL TAG. For example, using the tags for "gbeng" (🏴󠁧󠁢󠁥󠁮󠁧󠁿) will cause some systems to display the flag of England, those for "gbsct" (🏴󠁧󠁢󠁳󠁣󠁴󠁿) the flag of Scotland, and those for "gbwls" (🏴󠁧󠁢󠁷󠁬󠁳󠁿) the flag of Wales.^[7]
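The construction described above can be sketched in a few lines. The function name `subdivision_flag` is hypothetical, and restricting the input to ASCII letters and digits (lowercased) is an assumption made for this illustration rather than something stated in the text.

```python
WAVING_BLACK_FLAG = "\U0001F3F4"  # U+1F3F4, the base character of the sequence
CANCEL_TAG = "\U000E007F"         # U+E007F, terminates the tag sequence
TAG_BASE = 0xE0000


def subdivision_flag(region_code: str) -> str:
    """Build the emoji tag sequence for a region code such as "gbeng"."""
    code = region_code.lower()
    # Assumption: only ASCII letters and digits are meaningful in a region code.
    if not code or not all(c.isascii() and c.isalnum() for c in code):
        raise ValueError(f"unsupported region code: {region_code!r}")
    tags = "".join(chr(TAG_BASE + ord(c)) for c in code)
    return WAVING_BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    for code in ("gbeng", "gbsct", "gbwls"):
        seq = subdivision_flag(code)
        # Print the code points so the invisible tags can be inspected.
        print(code, " ".join(f"U+{ord(ch):04X}" for ch in seq))
```

Whether the resulting string renders as a flag depends entirely on the font and platform; systems without support typically show only the black flag.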

The tag sequences are derived from ISO 3166-2, but sequences representing other subnational flags (for example US states) are also possible using this mechanism. However, as of Unicode version 12.0 only the three flag sequences listed above are "Recommended for General Interchange" by the Unicode Consortium, meaning they are "most likely to be widely supported across multiple platforms".^[8]
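The distinction between a well-formed tag sequence and a recommended one can be shown with a small decoder. This is a sketch under stated assumptions: the names `decode_flag` and `is_recommended` are hypothetical, and the recommended set below contains only the three codes named in the text, not data read from emoji-sequences.txt.

```python
WAVING_BLACK_FLAG = "\U0001F3F4"
CANCEL_TAG = "\U000E007F"
TAG_BASE = 0xE0000

# Assumption: hard-coded from the three flags named in the text.
RECOMMENDED_CODES = {"gbeng", "gbsct", "gbwls"}


def decode_flag(seq: str):
    """Return the region code carried by a flag tag sequence, or None if malformed."""
    if len(seq) < 3 or seq[0] != WAVING_BLACK_FLAG or seq[-1] != CANCEL_TAG:
        return None
    body = seq[1:-1]
    # Every character between the flag and CANCEL TAG must mirror printable ASCII.
    if not all(0xE0020 <= ord(ch) <= 0xE007E for ch in body):
        return None
    return "".join(chr(ord(ch) - TAG_BASE) for ch in body)


def is_recommended(seq: str) -> bool:
    """True only for the three sequences the text lists as recommended."""
    return decode_flag(seq) in RECOMMENDED_CODES


if __name__ == "__main__":
    def make(code):  # helper for the demo only
        return WAVING_BLACK_FLAG + "".join(chr(TAG_BASE + ord(c)) for c in code) + CANCEL_TAG

    for code in ("gbsct", "usca"):
        seq = make(code)
        print(code, decode_flag(seq), is_recommended(seq))
```

A sequence for a US state such as "usca" decodes without error but is not in the recommended set, which matches the article's point that such sequences are possible yet not widely supported.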

Unicode block

Tags^[1]^[2]^[3] Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+E000x		BEGIN
U+E001x
U+E002x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
U+E003x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
U+E004x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
U+E005x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
U+E006x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
U+E007x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	END
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points
3.^ Unicode code points U+E0001 and U+E0020 through U+E007F were deprecated with Unicode version 5.1; however, as of Unicode version 9.0, only U+E0001 remains deprecated
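The assigned and reserved counts for the block can be cross-checked against the Unicode Character Database bundled with Python. This sketch assumes that unassigned code points report the general category "Cn" in the standard `unicodedata` module, and that the interpreter's database includes this block (it was added in Unicode 3.1).

```python
import unicodedata

BLOCK = range(0xE0000, 0xE0080)  # U+E0000..U+E007F, 128 code points

# A code point counts as assigned when its general category is not "Cn".
assigned = [cp for cp in BLOCK if unicodedata.category(chr(cp)) != "Cn"]
reserved = [cp for cp in BLOCK if unicodedata.category(chr(cp)) == "Cn"]

if __name__ == "__main__":
    print("assigned:", len(assigned))
    print("reserved:", len(reserved))
    # Show the formal names behind the table's BEGIN, "A" and END cells.
    for cp in (0xE0001, 0xE0041, 0xE007F):
        print(f"U+{cp:05X}", unicodedata.name(chr(cp)))
```

The table cells labelled BEGIN and END correspond to the characters formally named LANGUAGE TAG and CANCEL TAG.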

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Tags block:

Version	Final code points^[a]	Count	L2 ID	WG2 ID	Document
3.1	U+E0001	1	L2/97-203		Whistler, Ken; Adams, Glenn (1997-08-05), Plane 14 characters for generic tags
			L2/97-171R2		Whistler, Ken (1997-09-18), Plane 14 Characters for Generic Tags
			L2/97-256		Allouche, Mati (1997-10-20), Comments on Plane 14 Position Paper
			L2/97-255R		Aliprand, Joan (1997-12-03), "3.B. Lightweight language tagging", Approved Minutes - UTC #73 & L2 #170 joint meeting, Palo Alto, CA - August 4-5, 1997
			L2/98-027	N1670	Plane 14 characters for language tags, 1997-12-12
			L2/98-039		Aliprand, Joan; Winkler, Arnold (1998-02-24), "2.C REVISED PROPOSALS", Preliminary Minutes - UTC #74 & L2 #171, Mountain View, CA - December 5, 1997
			L2/98-286	N1703	Umamaheswaran, V. S.; Ksar, Mike (1998-07-02), "7.4", Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20
			L2/98-281R (pdf, html)		Aliprand, Joan (1998-07-31), "IETF and W3C Issues (VI)", Unconfirmed Minutes - UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998
			L2/00-010	N2103	Umamaheswaran, V. S. (2000-01-05), "9.1", Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16
			L2/01-301		Whistler, Ken (2001-08-01), "Tag Characters", Analysis of Character Deprecation in the Unicode Standard
			L2/02-166R2		Moore, Lisa (2002-08-09), "Character Deprecation", UTC #91 Minutes
	U+E0020..E007F	96	L2/15-042		Fonts, Agustin; Pournader, Roozbeh (2015-01-26), Clarifications Requested for "Full Emoji Data" and Emoji Flags
			L2/15-145R		Edberg, Peter (2015-05-07), Proposal for additional regional indicator symbols
			L2/15-107		Moore, Lisa (2015-05-12), "E.1.3 Proposal for additional regional indicator symbols", UTC #143 Minutes
			L2/15-190		Edberg, Peter (2015-06-29), PRI #299 Background: Representing Additional Types of Flags
			L2/15-206		Davis, Mark (2015-07-25), Region / Subdivision validity for flags
			L2/16-180R		Burge, Jeremy; Williams, Owen (2016-07-07), Proposal to include Emoji Flags for England, Scotland and Wales
			L2/17-016		Moore, Lisa (2017-02-08), "Action item 150-A59", UTC #150 Minutes, Add the three sequences for flags documented in L2/16-180R to emoji-sequences.txt for emoji 5.0.
			L2/17-048		Pournader, Roozbeh (2017-01-24), Feedback on PRI 343 (Unicode Emoji 5.0)
			L2/17-086		Burge, Jeremy; et al. (2017-03-27), Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component
			L2/17-103		Moore, Lisa (2017-05-18), "E.1.7 Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component", UTC #151 Minutes
a.^ Proposed code points and character names may differ from final code points and names

References

1.^ "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
2.^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
3.^ Whistler, K.; Adams, G. (January 1999). "RFC2482: Language Tagging in Unicode Plain Text". Network Working Group. doi:10.17487/RFC2482.
4.^ Whistler, K.; Adams, G.; Duerst, M.; Klensin, J. (November 2010). Presuhn, R. (ed.). "RFC6082: Deprecating Unicode Language Tag Characters: RFC 2482 is Historic". Internet Engineering Task Force (IETF). doi:10.17487/RFC6082.
5.^ "Unicode 8.0.0, Implications for Migration". Unicode Consortium.
6.^ "Emoji Version 5.0 List". Emojipedia. Retrieved 24 July 2021.
7.^ "UTR #51: Unicode Emoji". Unicode Consortium. 2017-05-18.
8.^ "emoji-sequences.txt". Unicode Consortium. 2023-06-05. Retrieved 5 March 2019.
