Chinese characters[a] are logographs used to write the Chinese languages and others from regions historically influenced by Chinese culture. Chinese characters have a documented history spanning over three millennia, representing one of the four independent inventions of writing accepted by scholars; of these, they comprise the only writing system continuously used since its invention. Over time, the function, style, and means of writing characters have evolved greatly. Unlike letters in alphabets that reflect the sounds of speech, Chinese characters generally represent morphemes, the units of meaning in a language. Writing a language's entire vocabulary requires thousands of different characters. Characters are created according to several different principles, where aspects of both shape and pronunciation may be used to indicate the character's meaning.

Chinese characters
"Chinese character" written in traditional (left) and simplified (right) forms
Time period: c. 13th century BCE – present
Direction: left-to-right; top-to-bottom, columns right-to-left
ISO 15924: Hani (500), Han (Hanzi, Kanji, Hanja)
Unicode range: U+4E00–U+9FFF CJK Unified Ideographs
Chinese name: 汉字 (simplified), 漢字 (traditional); literal meaning: Han characters
Vietnamese name: chữ Hán, chữ Nho, Hán tự, 𡨸漢, 𡨸儒; chữ Hán: 漢字

The first attested characters are oracle bone inscriptions made during the 13th century BCE in what is now Anyang, Henan, as part of divinations conducted by the Shang dynasty royal house. Character forms were originally highly pictographic in style, but evolved over time as writing spread across China. Numerous attempts have been made to reform the script, including the promotion of small seal script by the Qin dynasty (221–206 BCE). Clerical script, which had matured by the early Han dynasty (202 BCE – 220 CE), abstracted the forms of characters—obscuring their pictographic origins in favour of making them easier to write. Following the Han, regular script emerged as the result of cursive influence on clerical script, and has been the primary style used for characters since. Informed by a long tradition of lexicography, states using Chinese characters have standardised their forms: broadly, simplified characters are used to write Chinese in mainland China, Singapore, and Malaysia, while traditional characters are used in Taiwan, Hong Kong, and Macau.

After being introduced in order to write Literary Chinese, characters were often adapted to write local languages spoken throughout the Sinosphere. In Japanese, Korean, and Vietnamese, Chinese characters are known as kanji, hanja, and chữ Hán respectively. Writing traditions also emerged for some of the other languages of China, like the sawndip script used to write the Zhuang languages of Guangxi. Each of these written vernaculars used existing characters to write the language's native vocabulary, as well as the loanwords it borrowed from Chinese. In addition, each invented characters for local use. In written Korean and Vietnamese, Chinese characters have largely been replaced with alphabets, leaving Japanese as the only major non-Chinese language still written using them.

At the most basic level, characters are composed of strokes that are written in a fixed order. Methods of writing characters have historically included being carved into stone, being inked with a brush onto silk, bamboo, or paper, and being printed using woodblocks and movable type. Technologies invented since the 19th century allowing for wider use of characters include telegraph codes and typewriters, as well as input methods and text encodings on computers.



Chinese characters are accepted as representing one of four independent inventions of writing in human history.[b] In each instance, writing evolved from a system using two distinct types of ideographs. Ideographs could either be pictographs visually depicting objects or concepts, or fixed signs representing concepts only by shared convention. These systems are classified as proto-writing, because the techniques they used were insufficient to carry the meaning of spoken language by themselves.[3]

Various innovations were required for Chinese characters to emerge from proto-writing. Firstly, pictographs became distinct from simple pictures in use and appearance: for example, the pictograph 大, meaning 'large', was originally a picture of a large man, but one would need to be aware of its specific meaning in order to interpret the sequence 大鹿 as signifying 'large deer', rather than being a picture of a large man and a deer next to one another. Due to this process of abstraction, as well as to make characters easier to write, pictographs gradually became more simplified and regularised, often to the extent that the original objects represented are no longer obvious.[4]

This proto-writing system was limited to representing a relatively narrow range of ideas with a comparatively small library of symbols. This compelled innovations that allowed for symbols to directly encode spoken language.[5] In each historical case, this was accomplished by some form of the rebus technique, where the symbol for a word is used to indicate a different word with a similar pronunciation, depending on context.[6] This allowed for words that lacked a plausible pictographic representation to be written down for the first time. This technique pre-empted more sophisticated methods of character creation that would further expand the lexicon. The process whereby writing emerged from proto-writing took place over a long period; when the purely pictorial use of symbols disappeared, leaving only those representing spoken words, the process was complete.[7]



Chinese characters have been used in several different writing systems throughout history. The concept of a writing system includes both the written symbols themselves, called graphemes—which may include characters, numerals, or punctuation—as well as the rules by which they are used to record language.[8] Chinese characters are logographs, which are graphemes that represent units of meaning in a language. Specifically, characters represent the smallest units of meaning in a language, which are referred to as morphemes. Morphemes in Chinese—and therefore the characters used to write them—are nearly always a single syllable in length. In some special cases, characters may denote non-morphemic syllables as well; due to this, written Chinese is often characterised as morphosyllabic.[9][c] Logographs may be contrasted with letters in an alphabet, which generally represent phonemes, the distinct units of sound used by speakers of a language.[11] Despite their origins in picture-writing, Chinese characters are no longer ideographs capable of representing ideas directly; their comprehension relies on the reader's knowledge of the particular language being written.[12]

The areas where Chinese characters were historically used—sometimes collectively termed the Sinosphere—have a long tradition of lexicography attempting to explain and refine their use; for most of history, analysis revolved around a model first popularised in the 2nd-century Shuowen Jiezi dictionary.[13] More recent models have analysed the methods used to create characters, how characters are structured, and how they function in a given writing system.[14]

Structural analysis


Most characters can be analysed structurally as compounds made of smaller components (部件; bùjiàn), which are often independent characters in their own right, adjusted to occupy a given position in the compound.[15] Components within a character may serve a specific function: phonetic components provide a hint for the character's pronunciation, and semantic components indicate some element of the character's meaning. Components that serve neither function may be classified as pure signs with no particular meaning, other than their presence distinguishing one character from another.[16]

A straightforward structural classification scheme may consist of three pure classes of semantographs, phonographs and signs—having only semantic, phonetic, and form components respectively, as well as classes corresponding to each combination of component types.[17] Of the 3500 characters that are frequently used in Standard Chinese, pure semantographs are estimated to be the rarest, accounting for about 5% of the lexicon, followed by pure signs with 18%, and semantic–form and phonetic–form compounds together accounting for 19%. The remaining 58% are phono-semantic compounds.[18]
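The scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a published algorithm: the names COMPONENT_TYPES, structural_classes, and SHARES are hypothetical, and the percentages are simply the estimates quoted in the preceding paragraph.

```python
from itertools import combinations

# The three component functions named in the text (assumed labels).
COMPONENT_TYPES = ("semantic", "phonetic", "form")


def structural_classes():
    """Enumerate every non-empty combination of component types.

    Single-type combinations are the 'pure' classes; the others are
    the classes that combine two or three component types.
    """
    classes = []
    for size in range(1, len(COMPONENT_TYPES) + 1):
        classes.extend(combinations(COMPONENT_TYPES, size))
    return classes


# Estimated shares (percent) quoted for the 3500 frequently used characters.
SHARES = {
    "pure semantographs": 5,
    "pure signs": 18,
    "semantic-form and phonetic-form compounds": 19,
    "phono-semantic compounds": 58,
}

if __name__ == "__main__":
    for combo in structural_classes():
        print("-".join(combo))
    print("total share:", sum(SHARES.values()))
```

Enumerating the combinations gives seven possible classes: three pure classes, three two-way combinations, and one that combines all three component types. The four quoted shares sum to 100%.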

The Chinese palaeographer Qiu Xigui (b. 1935) presents three principles of character function adapted from earlier proposals by Tang Lan (1901–1979) and Chen Mengjia (1911–1966):[19] semantographs, which describe all characters whose forms are wholly related to their meaning, regardless of the method by which the meaning was originally depicted; phonographs, which include a phonetic component; and loangraphs, which encompass existing characters that have been borrowed to write other words. Qiu also acknowledges the existence of character classes that fall outside of these principles, such as pure signs.[20]




Graphical evolution of pictographs

Most of the oldest characters are pictographs (象形; xiàngxíng), representational pictures of physical objects.[21] Examples include 日 ('Sun'), 月 ('Moon'), and 木 ('tree'). Over time, the forms of pictographs have been simplified in order to make them easier to write.[22] As a result, it is often no longer evident what thing was originally being depicted by a pictograph; without knowing the context of its origin in picture-writing, it may be interpreted instead as a pure sign. However, if its use in compounds still reflects a pictograph's original meaning, as with 日 in 晴 ('clear sky'), it can still be analysed as a semantic component.[23][24]

Pictographs have often been extended from their original meanings to take on additional layers of metaphor and synecdoche, which sometimes displace the character's original sense. This process has sometimes created excess ambiguity between the different senses of a character, which is usually resolved by creating new compound characters.[25]



Indicatives (指事; zhǐshì), also called simple ideographs or self-explanatory characters,[21] are visual representations of abstract concepts that lack any tangible form. Examples include 上 ('up') and 下 ('down'); these characters were originally written as dots placed above and below a line, and later evolved into their present forms with less potential for graphical ambiguity in context.[26] More complex indicatives include 凸 ('convex'), 凹 ('concave'), and 平 ('flat and level').[27]

Compound ideographs


Compound ideographs (会意; 會意; huìyì), also called logical aggregates, associative idea characters, or syssemantographs, combine other characters to convey a new, synthetic meaning. A canonical example is 明 ('bright'), interpreted as the juxtaposition of the two brightest objects in the sky: 日 'SUN' and 月 'MOON', together expressing their shared quality of brightness. Other examples include 休 ('rest'), composed of pictographs 人 'MAN' and 木 'TREE', and 好 ('good'), composed of 女 'WOMAN' and 子 'CHILD'.[28]

The compound character 好 illustrated as its component characters 女 and 子 positioned side by side

Many traditional examples of compound ideographs are now believed to have actually originated as phono-semantic compounds, made obscure by subsequent changes in pronunciation.[29] For example, the Shuowen Jiezi describes 信 ('trust') as an ideographic compound of 人 'MAN' and 言 'SPEECH', but modern analyses instead identify it as a phono-semantic compound, though with disagreement as to which component is phonetic.[30] Peter A. Boodberg and William G. Boltz go so far as to deny that any compound ideographs were devised in antiquity, maintaining that secondary readings that are now lost are responsible for the apparent absence of phonetic indicators,[31] but their arguments have been rejected by other scholars.[32]



Phono-semantic compounds


Phono-semantic compounds (形声; 形聲; xíngshēng) are composed of at least one semantic component and one phonetic component.[33] They may be formed by one of several methods, often by adding a phonetic component to disambiguate a loangraph, or by adding a semantic component to represent a specific extension of a character's meaning.[34] Examples of phono-semantic compounds include 河 (hé; 'river'), 湖 (hú; 'lake'), 流 (liú; 'stream'), 沖 (chōng; 'surge'), and 滑 (huá; 'slippery'). Each of these characters has three short strokes on its left-hand side: 氵, a simplified combining form of 水 'WATER'. This component serves a semantic function in each example, indicating the character has some meaning related to water. The remainder of each character is its phonetic component: 湖 (hú) is pronounced identically to 胡 (hú) in Standard Chinese, 河 (hé) is pronounced similarly to 可 (kě), and 沖 (chōng) is pronounced similarly to 中 (zhōng).[35]
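The relationship between a compound and its phonetic component can be illustrated with a small Python sketch. It is a rough illustration rather than a linguistic tool: the pairs 湖/胡, 河/可, and 沖/中 are the standard characters matching the readings and glosses given above, while the function names, the simplified list of pinyin initials, and the category labels ("identical", "same final", and so on) are assumptions made for this example.

```python
import unicodedata

# Simplified list of pinyin initials; two-letter initials come first so
# that "zh", "ch", "sh" are matched before "z", "c", "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

# Water-component compounds from the text: (compound reading, phonetic
# component, phonetic component's reading).
COMPOUNDS = {
    "湖": ("hú", "胡", "hú"),
    "河": ("hé", "可", "kě"),
    "沖": ("chōng", "中", "zhōng"),
}


def strip_tone(pinyin):
    """Remove tone diacritics (this would also strip the dots of 'ü')."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_syllable(pinyin):
    """Split a pinyin syllable into (initial, final), ignoring tone."""
    base = strip_tone(pinyin)
    for initial in INITIALS:
        if base.startswith(initial):
            return initial, base[len(initial):]
    return "", base


def relation(char_reading, phonetic_reading):
    """Describe how closely two modern readings match."""
    a = unicodedata.normalize("NFC", char_reading)
    b = unicodedata.normalize("NFC", phonetic_reading)
    if a == b:
        return "identical"
    a_initial, a_final = split_syllable(a)
    b_initial, b_final = split_syllable(b)
    if (a_initial, a_final) == (b_initial, b_final):
        return "same syllable, different tone"
    if a_final == b_final:
        return "same final"
    if a_initial == b_initial:
        return "same initial"
    return "divergent"


if __name__ == "__main__":
    for char, (reading, phonetic, phonetic_reading) in COMPOUNDS.items():
        print(char, phonetic, relation(reading, phonetic_reading))
```

Under these assumptions, 湖 and 胡 come out as identical, while 河/可 and 沖/中 share only a final, which matches the "pronounced similarly" description in the text.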

The phonetic components of most compounds may only provide an approximate pronunciation, even before subsequent sound shifts in the spoken language. Some characters may only have the same initial or final sound of a syllable in common with phonetic components.[36] A phonetic series comprises all the characters created using the same phonetic component, which may have diverged significantly in their pronunciations over time. For example, 茶 (chá; caa4; 'tea') and 途 (tú; tou4; 'route') are part of the phonetic series of characters using 余 (yú; jyu4), a literary first-person pronoun. The Old Chinese pronunciations of these characters were similar, but the phonetic component no longer serves as a useful hint for their pronunciation due to subsequent sound shifts.[37]



The phenomenon of existing characters being adapted to write other words with similar pronunciations was necessary in the initial development of Chinese writing, and has remained common throughout its subsequent history. Some loangraphs (假借; jiǎjiè; 'borrowing') are introduced to represent words previously lacking another written form; this is often the case with abstract grammatical particles such as 之 and 其.[38] The process of characters being borrowed as loangraphs should not be conflated with the distinct process of semantic extension, where a word acquires additional senses, which often remain written with the same character. As both processes often result in a single character form being used to write several distinct meanings, loangraphs are often misidentified as being the result of semantic extension, and vice versa.[39]

Loangraphs are also used to write words borrowed from other languages, such as the Buddhist terminology introduced to China in antiquity, as well as contemporary non-Chinese words and names. For example, each character in the name 加拿大 (Jiānádà; 'Canada') is often used as a loangraph for its respective syllable. However, the barrier between a character's pronunciation and meaning is never total: when transcribing into Chinese, loangraphs are often chosen deliberately so as to create certain connotations. This is regularly done with corporate brand names: for example, Coca-Cola's Chinese name is 可口可乐; 可口可樂 (Kěkǒu Kělè; 'delicious enjoyable').[40][41][42]



Some characters and components are pure signs, whose meaning merely derives from their having a fixed and distinct form. Basic examples of pure signs are found with the numerals beyond four, e.g. 五 ('five') and 八 ('eight'), whose forms do not give visual hints to the quantities they represent.[43]

Traditional Shuowen Jiezi classification


The Shuowen Jiezi is a character dictionary authored c. 100 CE by the scholar Xu Shen (c. 58 – c. 148 CE). In its postface, Xu analyses what he sees as all the methods by which characters are created. Later authors iterated upon Xu's analysis, developing a categorisation scheme known as the 'six writings' (六书; 六書; liùshū), which identifies every character with one of six categories that had previously been mentioned in the Shuowen Jiezi. For nearly two millennia, this scheme was the primary framework for character analysis used throughout the Sinosphere.[44] Xu based most of his analysis on examples of Qin seal script that were written down several centuries before his time—these were usually the oldest specimens available to him, though he stated he was aware of the existence of even older forms.[45] The first five categories are pictographs, indicatives, compound ideographs, phono-semantic compounds, and loangraphs. The sixth category is given by Xu as 轉注 (zhuǎnzhù; 'reversed and refocused'); however, its definition is unclear, and it is generally disregarded by modern scholars.[46]

Modern scholars agree that the theory presented in the Shuowen Jiezi is problematic, failing to fully capture the nature of Chinese writing, both in the present and at the time Xu was writing.[47] Traditional Chinese lexicography as embodied in the Shuowen Jiezi has suggested implausible etymologies for some characters.[48] Moreover, several categories are considered to be ill-defined: for example, it is unclear whether characters like 大 ('large') should be classified as pictographs or indicatives.[34] However, awareness of the 'six writings' model has remained a common component of character literacy, and often serves as a tool for students memorising characters.[49]


Diagram comparing the abstraction of pictographs in cuneiform, Egyptian hieroglyphs, and Chinese characters – from an 1870 publication by French Egyptologist Gaston Maspero[A]

The broadest trend in the evolution of Chinese characters over their history has been simplification, both in graphical shape (字形; zìxíng), the "external appearances of individual graphs", and in graphical form (字体; 字體; zìtǐ), "overall changes in the distinguishing features of graphic[al] shape and calligraphic style, [...] in most cases refer[ring] to rather obvious and rather substantial changes".[50] The traditional notion of an orderly procession of script styles, each suddenly appearing and displacing the one previous, has been disproven by later scholarship and archaeological work. Instead, scripts evolved gradually, with several coexisting in a given area.[51]

Traditional invention narrative


Several of the Chinese classics indicate that knotted cords were used to keep records prior to the invention of writing.[52] Works that reference the practice include chapter 80 of the Tao Te Ching[B] and the "Xici II" chapter within the I Ching.[C] According to one tradition, Chinese characters were invented during the 3rd millennium BCE by Cangjie, a scribe of the legendary Yellow Emperor. Cangjie is said to have invented symbols called 字 (zì) due to his frustration with the limitations of knotting, taking inspiration from his study of the tracks of animals, landscapes, and the stars in the sky. On the day that these first characters were created, grain rained down from the sky; that night, the people heard the wailing of ghosts and demons, lamenting that humans could no longer be cheated.[53][54]



Collections of graphs and pictures have been discovered at the sites of several Neolithic settlements throughout the Yellow River valley, including Jiahu (c. 6500 BCE), Dadiwan and Damaidi (6th millennium BCE), and Banpo (5th millennium BCE). Symbols at each site were inscribed or drawn onto artifacts, appearing one at a time and without indicating any greater context. Qiu concludes, "We simply possess no basis for saying that they were already being used to record language."[55] A historical connection with the symbols used by the late Neolithic Dawenkou culture (c. 4300 – c. 2600 BCE) in Shandong has been deemed possible by palaeographers, with Qiu concluding that they "cannot be definitively treated as primitive writing, nevertheless they are symbols which resemble most the ancient pictographic script discovered thus far in China... They undoubtedly can be viewed as the forerunners of primitive writing."[56]

Oracle bone script






Ox scapula inscribed with characters recording the result of divinations – dated c. 1200 BCE

The oldest attested Chinese writing comprises a body of inscriptions produced during the Late Shang period (c. 1250 – 1050 BCE), with the very earliest examples from the reign of Wu Ding dated between 1250 and 1200 BCE.[57] Many of these inscriptions were made on oracle bones—usually either ox scapulae or turtle shells—and recorded official divinations carried out by the Shang royal house. Contemporaneous inscriptions in a related but distinct style were also made on ritual bronze vessels. This oracle bone script (甲骨文; jiǎgǔwén) was first documented in 1899, after specimens were discovered being sold as "dragon bones" for medicinal purposes, with the symbols carved into them identified as early character forms. By 1928, the source of the bones had been traced to a village near Anyang in Henan—discovered to be the site of Yin, the final Shang capital—which was excavated by a team led by Li Ji (1896–1979) from the Academia Sinica between 1928 and 1937.[58] To date, over 150,000 oracle bone fragments have been found.[59]

Oracle bone inscriptions recorded divinations undertaken to communicate with the spirits of royal ancestors. The inscriptions range from a few characters in length at their shortest, to several dozen at their longest. The Shang king would communicate with his ancestors by means of scapulimancy, inquiring about subjects such as the royal family, military success, and the weather. Inscriptions were made in the divination material itself before and after it had been cracked by exposure to heat; they generally include a record of the questions posed, as well as the answers as interpreted in the cracks.[60][61] A minority of bones feature characters that were inked with a brush before their strokes were incised; the evidence of this also shows that the conventional stroke orders used by later calligraphers had already been established for many characters by this point.[62]

Oracle bone script is the direct ancestor of later forms of written Chinese. The oldest known inscriptions already represent a well-developed writing system, which suggests an initial emergence predating the late second millennium BCE. Although written Chinese is first attested in official divinations, it is widely believed that writing was also used for other purposes during the Shang, but that the media used in other contexts—likely bamboo and wooden slips—were less durable than bronzes or oracle bones, and have not been preserved.[63]

Zhou scripts

Bronze script
The Shi Qiang pan, a bronze ritual basin bearing inscriptions describing the deeds and virtues of the first seven Zhou kings – dated c. 900 BCE[64]

As early as the Shang, the oracle bone script existed as a simplified form alongside another that was used in bamboo books, in addition to elaborate pictorial forms often used in clan emblems. These other forms have been preserved in what is called bronze script (金文; jīnwén), where inscriptions were made using a stylus in a clay mould, which was then used to cast ritual bronzes.[65] These differences in technique generally resulted in character forms that were less angular in appearance than their oracle bone script counterparts.[66]

Study of these bronze inscriptions has revealed that the mainstream script underwent slow, gradual evolution during the late Shang, which continued during the Zhou dynasty (c. 1046 – 256 BCE) until assuming the form now known as small seal script (小篆; xiǎozhuàn) within the Zhou state of Qin.[67][68] Other scripts in use during the late Zhou include the bird-worm seal script (鸟虫书; 鳥蟲書; niǎochóngshū), as well as the regional forms used in non-Qin states. Examples of these styles were preserved as variants in the Shuowen Jiezi.[69] Historically, Zhou forms were collectively referred to as large seal script (大篆; dàzhuàn), a term which has fallen out of favour due to its lack of precision.[70]

Qin unification and small seal script

Small seal script

Following Qin's conquest of the other Chinese states that culminated in the founding of the imperial Qin dynasty in 221 BCE, the Qin small seal script was standardised for use throughout the entire country under the direction of Chancellor Li Si (c. 280 – 208 BCE).[71] It was traditionally believed that Qin scribes only used small seal script, and the later clerical script was a sudden invention during the early Han. However, more than one script was used by Qin scribes: a rectilinear vulgar style had also been in use in Qin for centuries prior to the wars of unification. The popularity of this form grew as writing became more widespread.[72]

Clerical script


By the Warring States period (c. 475 – 221 BCE), an immature form of clerical script (隶书; 隸書; lìshū) had emerged based on the vulgar form developed within Qin, often called "early clerical" or "proto-clerical".[73] The proto-clerical script evolved gradually; by the Han dynasty (202 BCE – 220 CE), it had arrived at a mature form, also called 八分 (bāfēn). Bamboo slips discovered during the late 20th century point to this maturation being completed during the reign of Emperor Wu of Han (r. 141–87 BCE). This process, called libian (隶变; 隸變), involved character forms being mutated and simplified, with many components being consolidated, substituted, or omitted. In turn, the components themselves were regularised to use fewer, straighter, and more well-defined strokes. The resulting clerical forms largely lacked any of the pictorial qualities that remained in seal script.[74]

Around the midpoint of the Eastern Han (25–220 CE), a simplified and easier form of clerical script appeared, which Qiu terms 'neo-clerical' (新隶体; 新隸體; xīnlìtǐ).[75] By the end of the Han, this had become the dominant script used by scribes, though clerical script remained in use for formal works, such as engraved stelae. Qiu describes neo-clerical as a transitional form between clerical and regular script which remained in use through the Three Kingdoms period (220–280 CE) and beyond.[76]

Cursive and semi-cursive

Cursive script

Cursive script (草书; 草書; cǎoshū) was in use as early as 24 BCE, synthesising elements of the vulgar writing that had originated in Qin with flowing cursive brushwork. By the Jin dynasty (266–420), the Han cursive style became known as 章草 (zhāngcǎo; 'orderly cursive'), sometimes known in English as 'clerical cursive', 'ancient cursive', or 'draft cursive'. Some attribute this name to the fact that the style was considered more orderly than a later form referred to as 今草 (jīncǎo; 'modern cursive'), which had first emerged during the Jin and was influenced by semi-cursive and regular script. This later form was exemplified by the work of figures like Wang Xizhi (303–361), who is often regarded as the most important calligrapher in Chinese history.[77][78]

Semi-cursive script

An early form of semi-cursive script (行书; 行書; xíngshū; 'running script') can be identified during the late Han, with its development stemming from a cursive form of neo-clerical script. Liu Desheng (劉德升; c. 147 – 188 CE) is traditionally recognised as the inventor of the semi-cursive style, though accreditations of this kind often indicate a given style's early masters, rather than its earliest practitioners. Later analysis has suggested popular origins for semi-cursive, as opposed to it being an invention of Liu.[79] It can be characterised partly as the result of clerical forms being written more quickly, without formal rules of technique or composition: what would be discrete strokes in clerical script frequently flow together instead. The semi-cursive style is commonly adopted in contemporary handwriting.[80]

Regular script

A page from a Song-era publication printed using a regular script typeface[D]

Regular script (楷书; 楷書; kǎishū), based on clerical and semi-cursive forms, is the predominant form in which characters are written and printed.[81] Its innovations have traditionally been credited to the calligrapher Zhong Yao (c. 151 – 230), who was living in the state of Cao Wei (220–266); he is often called the "father of regular script".[82] The earliest surviving writing in regular script comprises copies of Zhong Yao's work, including at least one copy by Wang Xizhi. Characteristics of regular script include the 'pause' (頓; dùn) technique used to end horizontal strokes, as well as heavy tails on diagonal strokes made going down and to the right. It developed further during the Eastern Jin (317–420) in the hands of Wang Xizhi and his son Wang Xianzhi (344–386).[83] However, most Jin-era writers continued to use neo-clerical and semi-cursive styles in their daily writing. It was not until the Northern and Southern period (420–589) that regular script became the predominant form.[84] The system of imperial examinations for the civil service established during the Sui dynasty (581–618) required test takers to write in Literary Chinese using regular script, which contributed to the prevalence of both throughout later Chinese history.[85]



Each character of a text is written within a uniform square allotted for it. As part of the evolution from seal script into clerical script, character components became regularised as discrete series of strokes (笔画; 筆畫; bǐhuà).[86] Strokes can be considered both the basic unit of handwriting and the writing system's basic unit of graphemic organisation. In clerical and regular script, individual strokes traditionally belong to one of eight categories according to their technique and graphemic function. In what is known as the Eight Principles of Yong, calligraphers practice their technique using the character 永 (yǒng; 'eternity'), which can be written with one stroke of each type.[87] In ordinary writing, 永 is now written with five strokes instead of eight, and a system of five basic stroke types is commonly employed in analysis, with certain compound strokes treated as sequences of basic strokes made in a single motion.[88]

Characters are constructed according to predictable visual patterns. Some components have distinct combining forms when occupying specific positions within a character; for example, the 刀 'KNIFE' component appears as 刂 on the right side of characters, but as ⺈ at the top of characters.[89] The order in which components are drawn within a character is fixed. The order in which the strokes of a component are drawn is also largely fixed, but may vary according to several different standards.[90][91] This is summed up in practice with a few rules of thumb, including that characters are generally assembled from left to right, then from top to bottom, with "enclosing" components started before, then closed after, the components they enclose.[92] For example, 永 is drawn in the following order:

Sequence and placement of the strokes in 永

Variant characters

Variants of the Chinese character for 'turtle', collected c. 1800 from printed sources.[E] The traditional form 龜 (left) is used in Taiwan and Hong Kong. The simplified form 龟 (not pictured) is used in China, and the simplified form 亀 (top row; third from the right) is used in Japan.

Over a character's history, variant character forms (异体字; 異體字; yìtǐzì) emerge via several processes. Variant forms have distinct structures, but represent the same morpheme; as such, they can be considered instances of the same underlying character. This is comparable to visually distinct double-storey |a| and single-storey |ɑ| forms both representing the Latin letter A. Variants also emerge for aesthetic reasons, to make handwriting easier, or to correct what the writer perceives to be errors in a character's form.[93] Individual components may be replaced with visually, phonetically, or semantically similar alternatives.[94] The boundary between character structure and style—and thus whether forms represent different characters, or are merely variants of the same character—is often non-trivial or unclear.[95]

For example, prior to the Qin dynasty the character meaning 'bright' was written as either 明 or 朙, with either 日 'SUN' or 囧 'WINDOW' on the left, and 月 'MOON' on the right. As part of the Qin programme to standardise small seal script across China, the 朙 form was promoted. Some scribes ignored this, and continued to write the character as 明. However, the increased usage of 朙 was followed by the proliferation of a third variant: 眀, with 目 'EYE' on the left, likely derived as a contraction of 朙. Ultimately, 明 became the character's standard form.[96]



From the earliest inscriptions until the 20th century, texts were generally laid out vertically, with characters written from top to bottom in columns arranged from right to left. A horizontal writing direction, with characters written from left to right in rows arranged from top to bottom, only became predominant in the Sinosphere during the 20th century as a result of Western influence.[97] Many publications outside mainland China continue to use the traditional vertical writing direction.[98] Word boundaries are generally not indicated with spaces. Western influence also led to the widespread adoption of punctuation in print during the 19th and 20th centuries. Prior to this, the context of a passage was considered adequate to guide readers; this was enabled by characters being easier than alphabets to read when written scriptio continua, due to their more discretised shapes.[99]

Methods of writing

Ordinary handwriting on a lunch menu in Hong Kong. Here, 反 (fǎn) is being used as an unofficial short form of 飯 (fàn; 'meal') by omitting the latter's 飠 'EAT' component.

The earliest attested Chinese characters were carved into bone, or marked using a stylus in clay moulds used to cast ritual bronzes. Characters have also been incised into stone, or written in ink onto slips of silk, wood, and bamboo. The invention of paper for use as a writing medium occurred during the 1st century CE, and is traditionally credited to Cai Lun (d. 121 CE).[100] There are numerous styles, or scripts (书; 書; shū), in which characters can be written, including historical forms like seal script and clerical script. Most styles used throughout the Sinosphere originated within China, though they may display regional variation. Styles that have been created outside of China tend to remain localised in their use: these include the Japanese edomoji and Vietnamese lệnh thư scripts.[101]


Chinese calligraphy of mixed styles by Song poet Mi Fu (1051–1107)

Calligraphy was traditionally one of the four arts to be mastered by Chinese scholars, considered to be an artful means of expressing thoughts and teachings. Chinese calligraphy typically makes use of an ink brush to write characters. Strict regularity is not required, and character forms may be accentuated to evoke a variety of aesthetic effects.[102] Traditional ideals of calligraphic beauty often tie into broader philosophical concepts native to East Asia. For example, aesthetics can be conceptualised using the framework of yin and yang, where the extremes of any number of mutually reinforcing dualities are balanced by the calligrapher—such as the duality between strokes made quickly or slowly, between applying ink heavily or lightly, between characters written with symmetrical or asymmetrical forms, and between characters representing concrete or abstract concepts.[103]

Printing and typefaces

Sample of Prison Gothic, a sans-serif typeface

Woodblock printing was invented in China between the 6th and 9th centuries,[104] followed by the invention of movable type by Bi Sheng (972–1051) during the 11th century.[105] The increasing use of print during the Ming (1368–1644) and Qing dynasties (1644–1912) led to considerable standardisation in character forms, which prefigured later script reforms during the 20th century. This print orthography, exemplified by the 1716 Kangxi Dictionary, was later dubbed the jiu zixing ('old character shapes').[106] Printed Chinese characters may use different typefaces,[107] of which there are four broad classes in use:[108]

  • Song (宋体; 宋體) or Ming (明体; 明體) typefaces—with "Song" generally used with simplified Chinese typefaces, and "Ming" with others—broadly correspond to Western serif styles. Song typefaces are broadly within the tradition of historical Chinese print; both names for the style refer to eras regarded as high points for printing in the Sinosphere. While type during the Song dynasty (960–1279) generally resembled the regular script style of a particular calligrapher, most modern Song typefaces are intended for general purpose use and emphasise neutrality in their design.
  • Sans-serif typefaces are called 'black form' (黑体; 黑體; hēitǐ) in Chinese and 'Gothic' (ゴシック体) in Japanese. Sans-serif strokes are rendered as simple lines of even thickness.
  • "Kai" typefaces (楷体; 楷體) imitate a handwritten style of regular script.
  • Fangsong typefaces (仿宋体; 仿宋體), called "Song" in Japan, correspond to semi-script styles in the Western paradigm.

Use with computers

The first four characters of the 6th-century Thousand Character Classic in different styles. From right to left: seal script, clerical script, regular script, Song type, and sans-serif type.

Before computers became ubiquitous, earlier electro-mechanical communications devices like telegraphs and typewriters were originally designed for use with alphabets, often by means of alphabetic text encodings like Morse code and ASCII. Adapting these technologies for use with a writing system comprising thousands of distinct characters was non-trivial.[109][110]

Input methods


Chinese characters are predominantly input on computers using a standard keyboard. Many input methods (IMEs) are phonetic, where typists enter characters according to schemes like pinyin or bopomofo for Mandarin, Jyutping for Cantonese, or Hepburn for Japanese. For example, 香港 ('Hong Kong') could be input as xiang1gang3 using pinyin, or as hoeng1gong2 using Jyutping.[111]

Character input methods may also be based on form, using the shape of characters and existing rules of handwriting to assign unique codes to each character, potentially increasing the speed of typing. Popular form-based input methods include Wubi on the mainland, and Cangjie (named after the mythological inventor of writing) in Taiwan and Hong Kong.[111] Often, unnecessary parts are omitted from the encoding according to predictable rules. For example, 疆 ('border') is encoded using the Cangjie method as NGMWM, which corresponds to the components 弓土一田一.[112]
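The correspondence between a Cangjie code and its components can be illustrated with a short Python sketch. The table below lists the conventional key assignments for the 24 basic Cangjie letters; the function name `code_to_components` is an illustrative choice, and the special-purpose X and Z keys are deliberately left out. It only decodes a given code, and does not attempt the decomposition rules that an actual input method applies.

```python
# Conventional Cangjie key assignments for the 24 basic letters.
# The special-purpose keys X and Z are omitted from this sketch.
CANGJIE_KEYS = {
    "A": "日", "B": "月", "C": "金", "D": "木", "E": "水", "F": "火",
    "G": "土", "H": "竹", "I": "戈", "J": "十", "K": "大", "L": "中",
    "M": "一", "N": "弓", "O": "人", "P": "心", "Q": "手", "R": "口",
    "S": "尸", "T": "廿", "U": "山", "V": "女", "W": "田", "Y": "卜",
}


def code_to_components(code: str) -> str:
    """Translate a Cangjie key sequence into its component names."""
    return "".join(CANGJIE_KEYS[key] for key in code.upper())


print(code_to_components("NGMWM"))  # the code cited in the text above
```

Decoding NGMWM in this way reproduces the component sequence 弓土一田一 given in the text.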

Contextual constraints may be used to improve candidate character selection. When tones are ignored, 大学; 大學 and 大雪 are both transcribed as daxue; the system may then prioritise which candidate appears first based on the surrounding context.[113]
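A minimal Python sketch of context-based candidate ranking follows. Everything in it is hypothetical: the candidate table, the co-occurrence counts, and the function name `rank_candidates` are invented for illustration, and real input methods rely on much larger statistical language models. The idea shown is only that candidates sharing a toneless spelling can be reordered by how often each follows the preceding word.

```python
# Hypothetical candidate table: toneless pinyin -> words sharing that spelling.
CANDIDATES = {"daxue": ["大学", "大雪"]}

# Hypothetical counts of how often a candidate follows a given preceding word.
COOCCURRENCE = {
    ("上", "大学"): 50,  # e.g. "to attend university"
    ("下", "大雪"): 40,  # e.g. "heavy snow is falling"
}


def rank_candidates(pinyin: str, previous_word: str) -> list[str]:
    """Order candidates by their (made-up) affinity with the preceding word."""
    candidates = CANDIDATES.get(pinyin, [])
    # sorted() is stable, so unseen pairs keep their original table order.
    return sorted(
        candidates,
        key=lambda word: COOCCURRENCE.get((previous_word, word), 0),
        reverse=True,
    )


print(rank_candidates("daxue", "上"))
print(rank_candidates("daxue", "下"))
```

With these invented counts, 大学 is offered first after 上 and 大雪 is offered first after 下; with no usable context, the table order is kept.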

Encoding and interchange


While special text encodings for Chinese characters were introduced prior to its popularisation, the Unicode Standard is the predominant text encoding worldwide.[114] According to the philosophy of the Unicode Consortium, each distinct graph is assigned a number in the standard, but specifying its appearance or the particular allograph used is a choice made by the engine rendering the text.[F] Unicode's Basic Multilingual Plane (BMP) represents the standard's 2^16 (65536) smallest code points. Of these, 20992 (or 32%) are assigned to CJK Unified Ideographs, a designation comprising characters used in each of the Chinese family of scripts. As of version 15.1, Unicode defines a total of 98682 Chinese characters.[G]
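The BMP figures can be checked with a few lines of Python. The sketch assumes the CJK Unified Ideographs block range U+4E00–U+9FFF given in the infobox; the helper name `in_cjk_unified_block` is illustrative, and it tests only this one block, not the extension blocks that hold most of the remaining characters.

```python
# Range of the CJK Unified Ideographs block, as listed in the infobox.
CJK_START, CJK_END = 0x4E00, 0x9FFF

block_size = CJK_END - CJK_START + 1   # code points in the block
bmp_size = 2 ** 16                      # code points in the BMP
share = block_size / bmp_size           # fraction of the BMP


def in_cjk_unified_block(char: str) -> bool:
    """Return True if a single character lies in U+4E00-U+9FFF."""
    return CJK_START <= ord(char) <= CJK_END


print(block_size, bmp_size, f"{share:.0%}")
print(in_cjk_unified_block("漢"), in_cjk_unified_block("a"))
```

The block spans 20992 code points out of 65536, which rounds to the 32% stated above.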

Vocabulary and adaptation


Writing first emerged during the historical stage of the Chinese language known as Old Chinese. Most characters correspond to morphemes that originally functioned as stand-alone Old Chinese words.[115] Classical Chinese is the form of written Chinese used in the classic works of Chinese literature between roughly the 5th century BCE and the 2nd century CE.[116] This form of the language was imitated by later authors, even as it began to diverge from the language they spoke. This later form, referred to as "Literary Chinese", remained the predominant written language in China until the 20th century. Its use in the Sinosphere was loosely analogous to that of Latin in pre-modern Europe. While it was not static over time, Literary Chinese retained many properties of spoken Old Chinese. Informed by the local spoken vernaculars, texts were read aloud using literary and colloquial readings that varied by region. Over time, sound mergers created ambiguities in vernacular speech as more words became homophonic. This ambiguity was often reduced through the introduction of multi-syllable compound words,[117] which comprise much of the vocabulary in modern varieties of Chinese.[118][119]

Over time, use of Literary Chinese spread to neighbouring countries, including Vietnam, Korea, and Japan. Alongside other aspects of Chinese culture, local elites adopted writing for record-keeping, histories, and official communications, forming what is sometimes called the Sinosphere.[120] Excepting hypotheses by some linguists of the latter two sharing a common ancestor, Chinese, Vietnamese, Korean, and Japanese each belong to different language families,[121] and tend to function differently from one another. Reading systems were devised to enable non-Chinese speakers to interpret Literary Chinese texts in terms of their native language, a phenomenon that has been variously described as either a form of diglossia, as reading by gloss,[122] or as a process of translation into and out of Chinese. Compared to other traditions that wrote using alphabets or syllabaries, the literary culture that developed in this context was less directly tied to a specific spoken language. This is exemplified by the cross-linguistic phenomenon of brushtalk, where mutual literacy allowed speakers of different languages to engage in face-to-face conversations.[123][124]

Following the introduction of Literary Chinese, characters were later adapted to write many non-Chinese languages spoken throughout the Sinosphere. These new writing systems used characters to write both native vocabulary and the numerous loanwords each language had borrowed from Chinese, collectively referred to as Sino-Xenic vocabulary. Characters may have native readings, Sino-Xenic readings, or both.[125] Comparison of Sino-Xenic vocabulary across the Sinosphere has been useful in the reconstruction of Middle Chinese phonology.[126] Literary Chinese was used in Vietnam during the millennium of Chinese rule that began in 111 BCE. By the 15th century, a system that adapted characters to write Vietnamese called chữ Nôm had fully matured.[127] The 2nd century BCE is the earliest possible period for the introduction of writing to Korea; the oldest surviving manuscripts in the country date to the early 5th century CE. Also during the 5th century, writing spread from Korea to Japan.[128] Characters were being used to write both Korean and Japanese by the 6th century.[129] By the late 20th century, characters had largely been replaced with alphabets designed to write Vietnamese and Korean. This leaves Japanese as the only major non-Sinitic language typically written using Chinese characters.[130]

Literary and vernacular Chinese

Excerpt from a 1436 primer on Chinese characters[131]

Words in Classical Chinese were generally a single character in length.[132] An estimated 25–30% of the vocabulary used in Classical Chinese texts consists of two-character words.[133] Over time, the introduction of multi-syllable vocabulary into vernacular varieties of Chinese was encouraged by phonetic shifts that increased the number of homophones.[134] The most common process of Chinese word formation after the Classical period has been to create compounds of existing words. Words have also been created by appending affixes to words, by reduplication, and by borrowing words from other languages.[135] While multi-syllable words are generally written with one character per syllable, abbreviations are occasionally used.[136] For example, 二十 (èrshí; 'twenty') may be written as the contracted form 廿.[137]

Sometimes, different morphemes come to be represented by characters with identical shapes. For example, 行 may represent either 'road' (xíng) or the extended sense of 'row' (háng): these morphemes are ultimately cognates that diverged in pronunciation but remained written with the same character. However, Qiu reserves the term homograph to describe identically shaped characters with different meanings that emerge via processes other than semantic extension. An example homograph is 铊; 鉈, which originally meant 'weight used at a steelyard' (tuó). In the 20th century, this character was created again with the meaning 'thallium' (tā). Both of these characters are phono-semantic compounds with 金 'GOLD' as the semantic component and 它 as the phonetic component, but the words represented by each are not related.[138]

There are a number of 'dialect characters' (方言字; fāngyánzì) that are not used in standard written vernacular Chinese, but reflect the vocabulary of other spoken varieties. The most complete example of an orthography based on a variety other than Standard Chinese is Written Cantonese. A common Cantonese character is 冇 (mou5; 'to not have'), derived by removing two strokes from 有 (jau5; 'to have').[139] It is common to use standard characters to transcribe previously unwritten words in Chinese dialects when obvious cognates exist. When no obvious cognate exists due to factors like irregular sound changes, semantic drift, or an origin in a non-Chinese language, characters are often borrowed or invented to transcribe the word, either ad hoc or according to existing principles.[140] These new characters are generally phono-semantic compounds.[141]

Japanese



In Japanese, Chinese characters are referred to as kanji. Beginning in the Nara period (710–794), readers and writers of kanbun—the Japanese term for Literary Chinese writing—began employing a system of reading techniques and annotations called kundoku. When reading, Japanese speakers would adapt the syntax and vocabulary of Literary Chinese texts to reflect their Japanese-language equivalents. Writing essentially involved the inverse of this process, and resulted in ordinary Literary Chinese.[142] When adapted to write Japanese, characters were used to represent both Sino-Japanese vocabulary loaned from Chinese, as well as the corresponding native synonyms. Most kanji were subject to both borrowing processes, and as a result have both Sino-Japanese and native readings, known as on'yomi and kun'yomi respectively. Moreover, kanji may have multiple readings of either kind. Distinct classes of on'yomi were borrowed into Japanese at different points in time from different varieties of Chinese.[143]

The Japanese writing system is a mixed script, and has also incorporated syllabaries called kana to represent phonetic units called moras, rather than morphemes. Prior to the Meiji era (1868–1912), writers used certain kanji to represent their sound values instead, in a system known as man'yōgana. Starting in the 9th century, specific man'yōgana were graphically simplified to create two distinct syllabaries called hiragana and katakana, which slowly replaced the earlier convention. Modern Japanese retains the use of kanji to represent most word stems, while kana syllabograms are generally used for grammatical affixes, particles, and loanwords. The forms of hiragana and katakana are visually distinct from one another, owing in large part to different methods of simplification: katakana were derived from smaller components of each man'yōgana, while hiragana were derived from the cursive forms of man'yōgana in their entirety. In addition, the hiragana and katakana for some moras were derived from different man'yōgana.[144] Characters invented for Japanese-language use are called kokuji. The methods employed to create kokuji are equivalent to those used for characters of Chinese origin, though most are ideographic compounds. For example, 峠 (tōge; 'mountain pass') is a compound kokuji composed of 山 'MOUNTAIN', 上 'ABOVE', and 下 'BELOW'.[145]

While characters used to write Chinese are monosyllabic, many kanji have multi-syllable readings. For example, the kanji 刀 has a native kun'yomi reading of katana. In different contexts, it can also be read with the on'yomi reading tō, such as in the Chinese loanword 日本刀 (nihontō; 'Japanese sword'), with a pronunciation corresponding to that in Chinese at the time of borrowing. Prior to the universal adoption of katakana, loanwords were typically written with unrelated kanji with on'yomi readings matching the syllables in the loanword. These spellings are called ateji: for example, 亜米利加 (Amerika) was the ateji spelling of 'America', now rendered as アメリカ. As opposed to man'yōgana used solely for their pronunciation, ateji still corresponded to specific Japanese words. Some are still in use: the official list of jōyō kanji includes 106 ateji readings.[146]

Korean



In Korean, Chinese characters are known as hanja. Literary Chinese may have been written in Korea as early as the 2nd century BCE. During the Three Kingdoms period (57 BCE – 668 CE), characters were also used to write idu, a form of Korean-language literature that mostly made use of Sino-Korean vocabulary. During the Goryeo period (918–1392), Korean writers developed a system of phonetic annotations for Literary Chinese called gugyeol, comparable to kundoku in Japan, though it only entered widespread use during the later Joseon period (1392–1897).[147] While the hangul alphabet was invented by the Joseon king Sejong (r. 1418–1450) in 1443, it was not adopted by the Korean literati and was relegated to use in glosses in Literary Chinese texts until the late 19th century.[148]

Much of the Korean lexicon consists of Chinese loanwords, especially technical and academic vocabulary.[149] While hanja were usually only used to write this Sino-Korean vocabulary, there is evidence that vernacular readings were sometimes used.[126] Compared to the other written vernaculars, very few characters were invented to write Korean words; these are called gukja.[150] During the late 19th and early 20th centuries, Korean was written either using a mixed script of hangul and hanja, or only using hangul.[151] Following the end of the Empire of Japan's occupation of Korea in 1945, the total replacement of hanja with hangul was advocated throughout the country as part of a broader "purification movement" of the national language and culture.[152] However, due to the lack of tones in spoken Korean, there are many Sino-Korean words that are homophones with identical hangul spellings. For example, the phonetic dictionary entry for 기사 (gisa) yields more than 30 different entries. This ambiguity had historically been resolved by also including the associated hanja. While hanja are still sometimes used for Sino-Korean vocabulary, it is much rarer for native Korean words to be written using them.[153] When learning new characters, Korean students are instructed to associate each one with both its Sino-Korean pronunciation and a native Korean synonym.[154] Examples include:

Example Korean dictionary listings
Hanja | Hangul: native translation | Hangul: Sino-Korean | Gloss
水 | 물; mul | 수; su | 'water'
人 | 사람; saram | 인; in | 'person'
大 | 큰; keun | 대; dae | 'big'
小 | 작을; jakeul | 소; so | 'small'
下 | 아래; arae | 하; ha | 'down'
父 | 아비; abi | 부; bu | 'father'

Vietnamese


The first two lines of the 19th-century Vietnamese epic poem The Tale of Kieu, written in both chữ Nôm and the Vietnamese alphabet
  Borrowed characters representing Sino-Vietnamese words
  Borrowed characters representing native Vietnamese words
  Invented chữ Nôm representing native Vietnamese words

Chinese characters are called chữ Hán (𡨸漢), chữ Nho (𡨸儒; 'Confucian characters'), or Hán tự (漢字) in Vietnamese. Literary Chinese was used for all formal writing in Vietnam until the modern era,[155] having first acquired official status in 1010. Literary Chinese written by Vietnamese authors is first attested in the late 10th century, though the local practice of writing is likely several centuries older.[156] Characters used to write Vietnamese called chữ Nôm (𡨸喃) are first attested in an inscription dated to 1209 made at the site of a pagoda.[157] A mature chữ Nôm script had likely emerged by the 13th century, and was initially used to record Vietnamese folk literature. Some chữ Nôm characters are phono-semantic compounds corresponding to spoken Vietnamese syllables.[158] Another technique with no equivalent in China created chữ Nôm compounds using two phonetic components. This was done because Vietnamese phonology included consonant clusters not found in Chinese, which were thus poorly approximated by the sound values of borrowed characters. Compounds used components with two distinct consonant sounds to specify the cluster, e.g. 𢁋 (blăng;[d] 'Moon') was created as a compound of 巴 (ba) and 陵 (lăng).[159] As a system, chữ Nôm was highly complex, and the literacy rate among the Vietnamese population never exceeded 5%.[160] Both Literary Chinese and chữ Nôm fell out of use during the French colonial period, and were gradually replaced by the Latin-based Vietnamese alphabet. Following the end of colonial rule in 1954, the Vietnamese alphabet has been the sole official writing system in Vietnam, and is used exclusively in Vietnamese-language media.[161]

Other languages


Several minority languages of South and Southwest China have been written with scripts using both borrowed and locally created characters. The most well-documented of these is the sawndip script for the Zhuang languages of Guangxi. While little is known about its early development, a tradition of vernacular Zhuang writing likely first emerged during the Tang dynasty (618–907). Modern scholarship on sawndip has described a network of regional writing traditions exhibiting both mutual influence and characteristic differences from one another.[162] Like Vietnamese, some invented Zhuang characters are phonetic–phonetic compounds, though not primarily ones intended to describe consonant clusters.[163] Despite the Chinese government encouraging its replacement with a Latin-based Zhuang alphabet, sawndip remains in use.[164] Other non-Sinitic languages of China written with Chinese characters include Miao, Yao, Bouyei, Bai, and Hani. Each of these languages is now written with a Latin-based alphabet in official contexts.[165]

Graphically derived scripts

Excerpt from a 1908 edition of the 13th-century Secret History of the Mongols, featuring Chinese characters used to transcribe Mongolian and glosses to the right of each column

Between the 10th and 13th centuries, dynasties founded by non-Han peoples in northern China also created scripts for their languages that were inspired by Chinese characters, but did not use them directly: these included the Khitan large script, Khitan small script, Tangut script, and Jurchen script.[165] This has occurred in other contexts as well: Nüshu was a script used by Yao women to write the Xiangnan Tuhua language,[166] and bopomofo (注音符号; 注音符號; zhùyīn fúhào) is a semi-syllabary first invented in 1907[167] to represent the sounds of Standard Chinese;[168] both use forms graphically derived from Chinese characters. Other scripts within China that have adapted some characters but are otherwise distinct include the Geba syllabary used to write the Naxi language, the script for the Sui language, the script for the Yi languages, and the syllabary for the Lisu language.[165]

Chinese characters have also been repurposed phonetically to transcribe the sounds of non-Chinese languages. For example, the only manuscripts of the 13th-century Secret History of the Mongols that have survived from the medieval era use characters in this manner to write the Mongolian language.[169]

Literacy and lexicography


The memorisation of thousands of different characters is required to achieve literacy in languages written with them, in contrast to the relatively small inventory of graphemes used in phonetic writing.[170] Historically, character literacy was often acquired via Chinese primers like the 6th-century Thousand Character Classic and 13th-century Three Character Classic,[171] as well as surname dictionaries like the Song-era Hundred Family Surnames.[172] Studies of Chinese-language literacy suggest that literate individuals generally have an active vocabulary of three to four thousand characters; for specialists in fields like literature or history, this figure may be between five and six thousand.[173]



According to analyses of mainland Chinese, Taiwanese, Hong Kong, Japanese, and Korean sources, the total number of characters in the modern lexicon is around 15000.[174] Dozens of schemes have been devised for indexing Chinese characters and arranging them in dictionaries, though relatively few have achieved widespread use. Characters may be ordered according to methods based on their meaning, visual structure, or pronunciation.[175]

The Erya (c. 3rd century BCE) organised the Chinese lexicon into 16 semantic categories, as well as 3 describing abstract characters such as grammatical particles.[176] The Shuowen Jiezi (c. 100 CE) introduced what would ultimately become the predominant method of organisation used by later character dictionaries, whereby characters are grouped according to certain visually prominent components called radicals (部首; bùshǒu; 'section headers'). The Shuowen Jiezi used a system of 540 radicals, while subsequent dictionaries have generally used fewer.[177] The set of 214 Kangxi radicals was popularised by the Kangxi Dictionary (1716), but originally appeared in the earlier Zihui (1615).[178] Character dictionaries have historically been indexed using radical-and-stroke sorting, where characters are grouped by radical and sorted within each group by stroke number. Some modern dictionaries arrange character entries alphabetically according to their pinyin spelling, while also providing a traditional radical-based index.[179]
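Radical-and-stroke sorting amounts to ordering entries by a two-part key. The Python sketch below assumes each entry carries its Kangxi radical number and the count of strokes beyond the radical; the `Entry` type, its field names, and the small sample list are illustrative choices, and a real dictionary would take these values from a reference source rather than a hand-written table.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    char: str
    radical: int        # Kangxi radical number (1-214)
    extra_strokes: int  # strokes in addition to the radical


# Small hand-picked sample under radicals 9 (人), 75 (木), and 85 (水).
ENTRIES = [
    Entry("海", 85, 7), Entry("林", 75, 4), Entry("人", 9, 0),
    Entry("河", 85, 5), Entry("木", 75, 0), Entry("休", 9, 4),
    Entry("水", 85, 0),
]


def radical_stroke_order(entries: list[Entry]) -> list[Entry]:
    """Group by radical number, then sort each group by stroke count."""
    return sorted(entries, key=lambda e: (e.radical, e.extra_strokes))


ordered = "".join(e.char for e in radical_stroke_order(ENTRIES))
print(ordered)
```

Each radical heads its own group, followed by the characters filed under it in ascending stroke count.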

Before the invention of romanisation systems for Chinese, the pronunciation of characters was transmitted via rime dictionaries. These used the fanqie (反切; 'reverse cut') method, where each entry lists a common character with the same initial sound as the character in question, alongside one with the same final sound.[180]
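The fanqie principle can be sketched in Python as taking the initial of one spelling character and the final of another. The example uses the well-known spelling of 東 with 德 and 紅; as a loud simplification, the readings are given in toneless modern pinyin rather than in the historical pronunciations that rime dictionaries actually recorded, and the `READINGS` table and `fanqie` function are hypothetical names.

```python
# Hypothetical readings split into (initial, final).
# Toneless modern pinyin is a stand-in for historical pronunciations.
READINGS = {
    "德": ("d", "e"),
    "紅": ("h", "ong"),
}


def fanqie(upper: str, lower: str) -> str:
    """Combine the initial of `upper` with the final of `lower`."""
    initial, _ = READINGS[upper]
    _, final = READINGS[lower]
    return initial + final


print(fanqie("德", "紅"))  # spelling given for 東
```

Because pronunciations have shifted since the rime dictionaries were compiled, many historical fanqie spellings no longer yield the modern reading when applied this mechanically.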



Using functional magnetic resonance imaging (fMRI), neurolinguists have studied the brain activity associated with literacy. Compared to phonetic systems, reading and writing with characters involves additional areas of the brain—including those associated with visual processing.[181] While the level of memorisation required for character literacy is significant, identification of the phonetic and semantic components in compounds—which constitute the vast majority of characters—also plays a key role in reading comprehension. The ease of recognition for a given character is impacted by how regular the positioning of its components is, as well as how reliable its phonetic component is in indicating a specific pronunciation.[182] Moreover, due to the high level of homophony in Chinese languages and the more irregular correspondences between writing and the sounds of speech, it has been suggested that knowledge of orthography plays a greater role in speech recognition for literate Chinese speakers.[183]

Developmental dyslexia in readers of character-based languages appears to involve independent visuospatial and phonological disorders co-occurring. This seems to be a distinct phenomenon from dyslexia as experienced with phonetic orthographies, which can result from only one of the aforementioned disorders.[184]

Reform and standardisation

The first official list of simplified character forms, published in 1935 and including 324 characters[185]

Attempts to reform and standardise the use of characters—including aspects of form, stroke order, and pronunciation—have been undertaken by states throughout history. Thousands of simplified characters were standardised and adopted in mainland China during the 1950s and 1960s, with most either already existing as common variants, or being produced via the systematic simplification of their components.[186] After World War II, the Japanese government also simplified hundreds of character forms, including some simplifications distinct from those adopted in China.[187] Orthodox forms that have not undergone simplification are referred to as traditional characters. Across Chinese-speaking polities, mainland China, Malaysia, and Singapore use simplified characters, while Taiwan, Hong Kong, and Macau use traditional characters.[188] In general, Chinese and Japanese readers can successfully identify characters from all three standards.[189]

Prior to the 20th century, reforms were generally conservative and sought to reduce the use of simplified variants.[190] During the late 19th and early 20th centuries, an increasing number of intellectuals in China began to see both the Chinese writing system and the lack of a national spoken dialect as serious impediments to achieving the mass literacy and mutual intelligibility required for the country's successful modernisation. Many began advocating for the replacement of Literary Chinese with a written language that more closely reflected speech, as well as for a mass simplification of character forms, or even the total replacement of characters with an alphabet tailored to a specific spoken variety. In 1909, the educator and linguist Lufei Kui (1886–1941) formally proposed the adoption of simplified characters in education for the first time.[191]

In 1911, the Xinhai Revolution toppled the Qing dynasty, and resulted in the establishment of the Republic of China the following year. The early Republican era (1912–1949) was characterised by growing social and political discontent that erupted into the 1919 May Fourth Movement, catalysing the replacement of Literary Chinese with written vernacular Chinese over the subsequent decades. Alongside the corresponding spoken variety now known as Standard Chinese, this written vernacular was promoted by intellectuals and writers such as Lu Xun (1881–1936) and Hu Shih (1891–1962).[192] It was based on the Beijing dialect of Mandarin,[193] as well as on the existing body of vernacular literature authored over the preceding centuries, which included classic novels such as Journey to the West (c. 1592) and Dream of the Red Chamber (mid-18th century).[194] At this time, character simplification and phonetic writing were being discussed within both the ruling Kuomintang (KMT) and the Chinese Communist Party (CCP). In 1935, the Republican government published the first official list of simplified characters, comprising 324 forms collated by Peking University professor Qian Xuantong (1887–1939). However, strong opposition within the party resulted in the list being rescinded in 1936.[195]

People's Republic of China

Comparison between traditional and simplified character forms, showing systematic simplification of the component 'GATE' (traditional 門, simplified 门)

The project of script reform in China was ultimately inherited by the Communists, who resumed work following the proclamation of the People's Republic of China in 1949. In 1951, Premier Zhou Enlai (1898–1976) ordered the formation of a Script Reform Committee, with subgroups investigating both simplification and alphabetisation. The simplification subgroup began surveying and collating simplified forms the following year,[196] ultimately publishing a draft scheme of simplified characters and components in 1956. In 1958, Zhou Enlai announced the government's intent to focus on simplification, as opposed to replacing characters with Hanyu Pinyin, which had been introduced earlier that year.[197] The 1956 scheme was largely ratified by a revised list of 2235 characters promulgated in 1964.[198] The majority of these characters were drawn from conventional abbreviations or ancient forms with fewer strokes.[199] The committee also sought to reduce the total number of characters in use by merging some forms together.[199] For example, 雲 ('cloud') was written as 云 in oracle bone script. The simpler form remained in use as a loangraph meaning 'to say'; it was replaced in its original sense of 'cloud' with a form that added a semantic 雨 'RAIN' component. The simplified forms of these two characters have been merged into 云.[200]

A second round of simplified characters was promulgated in 1977, but was poorly received by the public and quickly fell out of official use. It was formally rescinded in 1986.[201] The second-round simplifications were unpopular in large part because most of the forms were completely new, in contrast to the familiar variants comprising the majority of the first round.[202] With the rescission of the second round, work toward further character simplification largely came to an end.[203] The Chart of Generally Utilised Characters of Modern Chinese was published in 1988 and included 7000 simplified and unsimplified characters. Of these, half were also included in the revised List of Commonly Used Characters in Modern Chinese, which specified 2500 common characters and 1000 less common characters.[204] In 2013, the Table of General Standard Chinese Characters was published as a revision of the 1988 lists; it included a total of 8105 characters.[205]


Japan

Regional forms of the character in the Noto Serif typeface family. From left to right: forms used in mainland China, Taiwan, and Hong Kong (top), and in Japan and Korea (bottom)

After World War II, the Japanese government instituted its own program of orthographic reforms. Some characters were assigned simplified forms called shinjitai; the older forms were then labelled kyūjitai. Inconsistent use of different variant forms was discouraged, and lists of characters to be taught to students at each grade level were developed. The first of these was the 1850-character tōyō kanji list published in 1946, later replaced by the 1945-character jōyō kanji list in 1981. In 2010, the jōyō kanji were expanded to include a total of 2136 characters.[206][207] The Japanese government restricts characters that may be used in names to the jōyō kanji, plus an additional list of 863 jinmeiyō kanji whose use is historically prevalent in names.[208][209]

South Korea


Hanja are still used in South Korea, though not to the extent that kanji are used in Japan. In general, there is a trend toward the exclusive use of hangul in ordinary contexts.[210] Characters remain in use in place names and newspapers, and to disambiguate homophones. They are also used in the practice of calligraphy. Use of hanja in education is politically contentious, with official policy regarding the prominence of hanja in curricula having vacillated since the country's independence.[211][212] Some support the total abandonment of hanja, while others advocate an increase in use to levels previously seen during the 1970s and 1980s. Students in grades 7–12 are presently taught with a principal focus on simple recognition and attaining sufficient literacy to read a newspaper.[148] The South Korean Ministry of Education published the Basic Hanja for Educational Use in 1972, which specified 1800 characters meant to be learned by secondary school students.[213] In 1991, the Supreme Court of Korea published the Table of Hanja for Use in Personal Names (인명용한자; Inmyeong-yong hanja), which initially included 2854 characters.[214] The list has been expanded several times since; as of 2022, it includes 8319 characters.[215]

North Korea


In the years following its establishment, the North Korean government sought to eliminate the use of hanja in standard writing; by 1949, characters had been almost entirely replaced with hangul in North Korean publications.[216] While mostly unused in writing, hanja remain an important part of North Korean education: a 1971 textbook for university history departments contained 3323 distinct characters, and in the 1990s North Korean schoolchildren were still expected to learn 2000 characters.[217] A 2013 textbook appears to integrate the use of hanja in secondary school education.[218] It has been estimated that North Korean students learn around 3000 hanja by the time they graduate from university.[219]



Taiwan

The Chart of Standard Forms of Common National Characters was published by Taiwan's Ministry of Education in 1982, and lists 4808 traditional characters.[220] The Ministry of Education also compiles dictionaries of characters used in Taiwanese Hokkien and Hakka.[H]

Other regional standards


Singapore's Ministry of Education promulgated three successive rounds of simplifications: the first round in 1969 included 502 simplified characters, and the second round in 1974 included 2287 simplified characters—including 49 that differed from those in the PRC, which were ultimately removed in the final round in 1976. In 1993, Singapore adopted the revisions made in mainland China in 1986.[221]

The Hong Kong Education and Manpower Bureau's List of Graphemes of Commonly-Used Chinese Characters includes 4762 traditional characters used in elementary and junior secondary education.[222]


  1. ^ 漢字; simplified as 汉字. Also referred to as sinographs[223] or sinograms.[224]
  2. ^ Zev Handel lists:[2]
    1. Sumerian cuneiform emerging c. 3200 BCE
    2. Egyptian hieroglyphs emerging c. 3100 BCE
    3. Chinese characters emerging c. 13th century BCE
    4. Maya script emerging c. 1 CE
  3. ^ According to Handel: "While monosyllabism generally trumps morphemicity—that is to say, a bisyllabic morpheme is nearly always written with two characters rather than one—there is an unmistakable tendency for script users to impose a morphemic identity on the linguistic units represented by these characters."[10]
  4. ^ This is the Middle Vietnamese pronunciation; the word is pronounced in modern Vietnamese as trăng.




  1. ^ Guangxi Nationalities Publishing House 1989.
  2. ^ Handel 2019, p. 1.
  3. ^ Qiu 2000, p. 2.
  4. ^ Qiu 2000, pp. 3–4.
  5. ^ Qiu 2000, p. 5.
  6. ^ Norman 1988, p. 59; Li 2020, p. 48.
  7. ^ Qiu 2000, pp. 11, 16.
  8. ^ Qiu 2000, p. 1; Handel 2019, pp. 4–5.
  9. ^ Qiu 2000, pp. 22–26; Norman 1988, p. 74.
  10. ^ Handel 2019, p. 33.
  11. ^ Qiu 2000, pp. 13–15; Coulmas 1991, pp. 104–109.
  12. ^ Li 2020, pp. 56–57; Boltz 1994, pp. 3–4.
  13. ^ Handel 2019, p. 51; Yong & Peng 2008, pp. 95–98.
  14. ^ Qiu 2000, pp. 19, 162–168.
  15. ^ Boltz 2011, pp. 57, 60.
  16. ^ Qiu 2000, pp. 14–18.
  17. ^ Yin 2007, pp. 97–100; Su 2014, pp. 102–111.
  18. ^ Yang 2008, pp. 147–148.
  19. ^ Demattè 2022, p. 14.
  20. ^ Qiu 2000, pp. 163–171.
  21. ^ a b Yong & Peng 2008, p. 19.
  22. ^ Qiu 2000, pp. 44–45; Zhou 2003, p. 61.
  23. ^ Qiu 2000, pp. 18–19.
  24. ^ Qiu 2000, p. 154; Norman 1988, p. 68.
  25. ^ Yip 2000, pp. 39–42.
  26. ^ Qiu 2000, p. 46.
  27. ^ Norman 1988, p. 68; Qiu 2000, pp. 185–187.
  28. ^ Qiu 2000, pp. 15, 190–202.
  29. ^ Sampson & Chen 2013, p. 261.
  30. ^ Qiu 2000, p. 155.
  31. ^ Boltz 1994, pp. 104–110.
  32. ^ Sampson & Chen 2013, pp. 265–268.
  33. ^ Norman 1988, p. 68.
  34. ^ a b Qiu 2000, p. 154.
  35. ^ Cruttenden 2021, pp. 167–168.
  36. ^ Williams 2010.
  37. ^ Vogelsang 2021, pp. 51–52.
  38. ^ Qiu 2000, pp. 261–265.
  39. ^ Qiu 2000, pp. 273–274, 302.
  40. ^ Taylor & Taylor 2014, pp. 30–32.
  41. ^ Ramsey 1987, p. 60.
  42. ^ Gnanadesikan 2011, p. 61.
  43. ^ Qiu 2000, p. 168; Norman 1988, p. 60.
  44. ^ Norman 1988, pp. 67–69; Handel 2019, p. 48.
  45. ^ Norman 1988, pp. 170–171.
  46. ^ Handel 2019, pp. 48–49.
  47. ^ Qiu 2000, pp. 153–154, 161; Norman 1988, p. 170.
  48. ^ Qiu 2013, pp. 102–108; Norman 1988, p. 69.
  49. ^ Handel 2019, p. 43.
  50. ^ Qiu 2000, pp. 44–45.
  51. ^ Qiu 2000, pp. 59–60, 66.
  52. ^ Demattè 2022, pp. 79–80.
  53. ^ Yang & An 2008, pp. 84–86.
  54. ^ Boltz 1994, pp. 130–138.
  55. ^ Qiu 2000, p. 31.
  56. ^ Qiu 2000, p. 39.
  57. ^ Boltz 1999, pp. 74, 107–108; Liu et al. 2017, pp. 155–175.
  58. ^ Liu & Chen 2012, p. 6.
  59. ^ Kern 2010, p. 1; Wilkinson 2012, pp. 681–682.
  60. ^ Keightley 1978, pp. 28–42.
  61. ^ Kern 2010, p. 1.
  62. ^ Keightley 1978, pp. 46–47.
  63. ^ Boltz 1986, p. 424; Kern 2010, p. 2.
  64. ^ Shaughnessy 1991, pp. 1–4.
  65. ^ Qiu 2000, pp. 63–66.
  66. ^ Qiu 2000, pp. 88–89.
  67. ^ Qiu 2000, pp. 76–78.
  68. ^ Chen 2003.
  69. ^ Louis 2003.
  70. ^ Qiu 2000, p. 77.
  71. ^ Boltz 1994, p. 156.
  72. ^ Qiu 2000, pp. 104–107.
  73. ^ Qiu 2000, pp. 59, 119.
  74. ^ Qiu 2000, pp. 119–124.
  75. ^ Qiu 2000, pp. 113, 139, 466.
  76. ^ Qiu 2000, pp. 138–139.
  77. ^ Qiu 2000, pp. 130–148.
  78. ^ Knechtges & Chang 2014, pp. 1257–1259.
  79. ^ Qiu 2000, pp. 113, 139–142.
  80. ^ Li 2020, p. 51; Qiu 2000, p. 149; Norman 1988, p. 70.
  81. ^ Qiu 2000, pp. 113, 149.
  82. ^ Chan 2020, p. 125.
  83. ^ Qiu 2000, p. 143.
  84. ^ Qiu 2000, pp. 144–145.
  85. ^ Li 2020, p. 41.
  86. ^ Li 2020, pp. 54, 196–197; Peking University 2004, pp. 148–152; Zhou 2003, p. 88.
  87. ^ Norman 1988, p. 86; Zhou 2003, p. 58; Zhang 2013.
  88. ^ Li 2009, pp. 65–66; Zhou 2003, p. 88.
  89. ^ Handel 2019, pp. 43–44.
  90. ^ Yin 2016, pp. 58–59.
  91. ^ Myers 2019, pp. 106–116.
  92. ^ Li 2009, p. 70.
  93. ^ Qiu 2000, pp. 204–215, 373.
  94. ^ Zhou 2003, pp. 57–60, 63–65.
  95. ^ Qiu 2000, pp. 297–300, 373.
  96. ^ Bökset 2006, pp. 16, 19.
  97. ^ Li 2020, p. 54; Handel 2019, p. 27; Keightley 1978, p. 50.
  98. ^ Taylor & Taylor 2014, pp. 372–373; Bachner 2014, p. 245.
  99. ^ Needham & Harbsmeier 1998, pp. 175–176; Taylor & Taylor 2014, pp. 374–375.
  100. ^ Needham & Tsien 2001, pp. 23–25, 38–41.
  101. ^ Nawar 2020.
  102. ^ Li 2009, pp. 180–183.
  103. ^ Li 2009, pp. 175–179.
  104. ^ Needham & Tsien 2001, pp. 146–147, 159.
  105. ^ Needham & Tsien 2001, pp. 201–205.
  106. ^ Yong & Peng 2008, pp. 280–282, 293–297.
  107. ^ Li 2013, p. 62.
  108. ^ Lunde 2008, pp. 23–25.
  109. ^ Su 2014, p. 218.
  110. ^ Mullaney 2017, p. 25.
  111. ^ a b Li 2020, pp. 152–153.
  112. ^ Zhang 2016, p. 422.
  113. ^ Su 2014, p. 222.
  114. ^ Lunde 2008, p. 193.
  115. ^ Norman 1988, pp. 74–75.
  116. ^ Vogelsang 2021, pp. xvii–xix.
  117. ^ Wilkinson 2012, p. 22.
  118. ^ Tong, Liu & McBride-Chang 2009, p. 203.
  119. ^ Yip 2000, p. 18.
  120. ^ Handel 2019, pp. 11–12; Kornicki 2018, pp. 15–16.
  121. ^ Handel 2019, pp. 28, 69, 126, 169.
  122. ^ Kin 2021, p. XII.
  123. ^ Denecke 2014, pp. 204–216.
  124. ^ Kornicki 2018, pp. 72–73.
  125. ^ Handel 2019, p. 212.
  126. ^ a b Kornicki 2018, p. 168.
  127. ^ Handel 2019, pp. 124–125, 133.
  128. ^ Handel 2019, pp. 64–65.
  129. ^ Kornicki 2018, p. 57.
  130. ^ Hannas 1997, pp. 136–138.
  131. ^ Ebrey 1996, p. 205.
  132. ^ Norman 1988, p. 58.
  133. ^ Wilkinson 2012, pp. 22–23.
  134. ^ Norman 1988, pp. 86–87.
  135. ^ Norman 1988, pp. 155–156.
  136. ^ Norman 1988, p. 74.
  137. ^ Handel 2019, p. 34.
  138. ^ Qiu 2000, pp. 301–302.
  139. ^ Handel 2019, p. 59.
  140. ^ Cheung & Bauer 2002, pp. 12–20.
  141. ^ Norman 1988, pp. 75–77.
  142. ^ Li 2020, p. 88.
  143. ^ Coulmas 1991, pp. 122–129.
  144. ^ Coulmas 1991, pp. 129–132.
  145. ^ Handel 2019, pp. 192–196.
  146. ^ Taylor & Taylor 2014, pp. 275–279.
  147. ^ Li 2020, pp. 78–80.
  148. ^ a b Fischer 2004, pp. 189–194.
  149. ^ Hannas 1997, p. 49; Taylor & Taylor 2014, p. 435.
  150. ^ Handel 2019, pp. 88, 102.
  151. ^ Handel 2019, pp. 112–113; Hannas 1997, pp. 60–61.
  152. ^ Hannas 1997, pp. 64–66.
  153. ^ Norman 1988, p. 79.
  154. ^ Handel 2019, pp. 75–82.
  155. ^ Handel 2019, pp. 124–126; Kin 2021, p. XI.
  156. ^ Hannas 1997, p. 73.
  157. ^ DeFrancis 1977, pp. 23–24.
  158. ^ Kornicki 2018, p. 63.
  159. ^ Handel 2019, pp. 145, 150.
  160. ^ DeFrancis 1977, p. 19.
  161. ^ Coulmas 1991, pp. 113–115; Hannas 1997, pp. 73, 84–87.
  162. ^ Handel 2019, pp. 239–240.
  163. ^ Handel 2019, pp. 251–252.
  164. ^ Handel 2019, pp. 231, 234–235; Zhou 2003, pp. 140–142, 151.
  165. ^ a b c Zhou 1991.
  166. ^ Zhao 1998.
  167. ^ Kuzuoğlu 2023, p. 71.
  168. ^ DeFrancis 1984, p. 242; Taylor & Taylor 2014, p. 14; Li 2020, p. 123.
  169. ^ Hung 1951, p. 481.
  170. ^ Demattè 2022, p. 8; Taylor & Taylor 2014, pp. 110–111.
  171. ^ Kornicki 2018, pp. 273–277.
  172. ^ Yong & Peng 2008, pp. 55–58.
  173. ^ Norman 1988, p. 73.
  174. ^ Su 2014, pp. 47, 51.
  175. ^ Su 2014, p. 183; Needham & Harbsmeier 1998, pp. 65–66.
  176. ^ Xue 1982, pp. 152–153.
  177. ^ Yong & Peng 2008, pp. 100–103, 203.
  178. ^ Zhou 2003, p. 88; Norman 1988, pp. 170–172; Needham & Harbsmeier 1998, pp. 79–80.
  179. ^ Yong & Peng 2008, pp. 145, 400–401.
  180. ^ Norman 1988, pp. 27–28.
  181. ^ Demattè 2022, p. 9.
  182. ^ Lee 2015b.
  183. ^ Lee 2015a, The Brain Network for Chinese Language Processing.
  184. ^ McBride, Tong & Mo 2015, pp. 688–690; Ho 2015; Taylor & Taylor 2014, pp. 150–151, 346–349, 393–394.
  185. ^ Chen 1999, p. 153.
  186. ^ Zhou 2003, pp. 60–67.
  187. ^ Taylor & Taylor 2014, pp. 117–118.
  188. ^ Li 2020, p. 136.
  189. ^ Wang 2016, p. 171.
  190. ^ Qiu 2000, p. 404.
  191. ^ Zhou 2003, pp. xvii–xix; Li 2020, p. 136.
  192. ^ Zhou 2003, pp. xviii–xix.
  193. ^ DeFrancis 1972, pp. 11–13.
  194. ^ Zhong 2019, pp. 113–114; Chen 1999, pp. 70–74, 80–82.
  195. ^ Chen 1999, pp. 150–153.
  196. ^ Bökset 2006, p. 26.
  197. ^ Zhong 2019, pp. 157–158.
  198. ^ Li 2020, p. 142.
  199. ^ a b Chen 1999, pp. 154–156.
  200. ^ Zhou 2003, p. 63.
  201. ^ Chen 1999, pp. 155–156.
  202. ^ Chen 1999, pp. 159–160.
  203. ^ Chen 1999, pp. 196–197.
  204. ^ Zhou 2003, p. 79; Chen 1999, p. 136.
  205. ^ Li 2020, pp. 145–146.
  206. ^ Taylor & Taylor 2014, p. 275.
  207. ^ 改定常用漢字表、30日に内閣告示 閣議で正式決定 [The amended list of jōyō kanji receives cabinet notice on 30th: to be officially confirmed in cabinet meeting]. The Nikkei (in Japanese). 24 November 2010.
  208. ^ 人名用漢字に「渾」追加 司法判断を受け法務省 改正戸籍法施行規則を施行、計863字に ["渾" added to kanji usable in personal names; Ministry of Justice enacts revised Family Registration Law Enforcement Regulations following judicial ruling, totaling 863 characters]. The Nikkei (in Japanese). 25 September 2017.
  209. ^ Lunde 2008, pp. 82–84.
  210. ^ Hannas 1997, p. 48.
  211. ^ Hannas 1997, pp. 65–66, 69–72.
  212. ^ Choo & O'Grady 1996, p. ix.
  213. ^ Lunde 2008, p. 84.
  214. ^ Taylor & Taylor 2014, p. 179.
  215. ^ 乻(땅이름 늘)·賏(목치장 영)... ‘인명용 한자’ 40자 추가된다 [乻 · 賏... 40 Hanja for Use in Personal Names added]. The Chosun Ilbo (in Korean). 26 December 2021.
  216. ^ Handel 2019, p. 113; Hannas 1997, pp. 66–67.
  217. ^ Hannas 1997, pp. 67–68.
  218. ^ 북한의 한문교과서를 보다 [A look at North Korea’s "Literary Chinese" textbooks]. Chosun NK (in Korean). The Chosun Ilbo. 14 March 2014.
  219. ^ Kim Mi-young (김미영) (4 June 2001). '3000자까지 배우되 쓰지는 마라' ["Learn up to 3000 characters, but don't write them"]. Chosun NK (in Korean). The Chosun Ilbo.
  220. ^ Lunde 2008, p. 81.
  221. ^ Shang & Zhao 2017, p. 320.
  222. ^ Chen 1999, p. 161.
  223. ^ Tam 2020, p. 29.
  224. ^ Fischer 2004, p. 166; DeFrancis 1984, p. 71.

Works cited


Primary sources

  1. ^ Maspero, Gaston (1870). Recueil de travaux relatifs à la philologie et à l'archéologie égyptiennes et assyriennes (in French). Paris: Librairie Honoré Champion. p. 243.
  2. ^ Laozi (老子) (1891). "80". 道德經 [Tao Te Ching] (in Literary Chinese and English). Translated by Legge, James – via the Chinese Text Project. [I would make the people return to the use of knotted cords (instead of the written characters).]
  3. ^ 系辞下 [Xi Ci II]. 易經 [I Ching] (in Literary Chinese and English). Translated by Legge, James. 1899 – via the Chinese Text Project. [In the highest antiquity, government was carried on successfully by the use of knotted cords (to preserve the memory of things). In subsequent ages the sages substituted for these written characters and bonds.]
  4. ^ Shao Si (邵思) (1035). Explaining Surnames 姓解 (in Literary Chinese). Tokyo. 2. doi:10.11501/1287529. Retrieved 30 May 2024 – via National Diet Library.
  5. ^ Morrison, Robert; Montucci, Antonio (1817). Urh-chih-tsze-tëen-se-yin-pe-keáou: Being a Parallel Drawn Between the Two Intended Chinese Dictionaries. London: Cadell & Davies, T. Boosey. p. 18.
  6. ^ Technical Introduction. The Unicode Consortium. 22 August 2019. Retrieved 11 May 2024.
  7. ^ Lunde, Ken; Cook, Richard, eds. (2023). "Standard Annex #38: Unicode Han Database (Unihan)". The Unicode Standard, Version 15.1.0. South San Francisco, CA: The Unicode Consortium. ISBN 978-1-936-21333-7.
  8. ^
    • "Introduction". 常用詞辭典 [Dictionary of Frequently-Used Taiwan Minnan]. Taiwan Ministry of Education. 2024.
    • "Introduction". 客語辭典 [Dictionary of Taiwan Hakka]. Taiwan Ministry of Education. 2023.

Further reading


Works of historical interest

  • Unihan Database – Reference glyphs, readings, and meanings for characters in The Unicode Standard, with information about the history of Han unification
  • Chinese Text Project Dictionary – Comprehensive character dictionary, including examples of Classical Chinese usage
  • – Character lookup by orthography, phonology, and etymology
  • Chinese Etymology by Richard Sears