Language complexity

Language complexity is a topic in linguistics that can be divided into several sub-topics, such as phonological, morphological, syntactic, and semantic complexity.[1][2] The subject is also relevant to the study of language evolution.[3] Although the concept of language complexity is an old one, current interest has largely emerged since the beginning of the 21st century, because the topic had previously been considered problematic in terms of political correctness.[4][page needed]

Language complexity has been studied less than many other traditional fields of linguistics. While the consensus is shifting towards recognizing complexity as a suitable research area, much of the attention has focused on methodological choices. Some languages, particularly pidgins and creoles, are considered simpler than most other languages, but there is no direct ranking and no universal method of measurement, although several approaches have been proposed within different schools of analysis.[5]


Throughout the 19th century, differential complexity was taken for granted. The classical languages Latin and Greek, as well as Sanskrit, were considered to possess qualities that the rising European national languages could achieve only through elaboration, which would give them the structural and lexical complexity required by an advanced civilization. At the same time, languages described as 'primitive' were naturally considered to reflect the simplicity of their speakers. On the other hand, Friedrich Schlegel noted that the languages of some nations "which appear to be at the very lowest grade of intellectual culture", such as Basque, Sámi and some Native American languages, possess a striking degree of elaborateness.[5]

Darwin considered the apparent complexity of many non-Western languages problematic for evolutionary theory, which in his time held that less advanced peoples should have less complex languages. Darwin suggested that simplicity and irregularity were the result of extensive language contact, while "the extremely complex and regular construction of many barbarous languages" should be seen as the utmost perfection of one and the same evolutionary process.[6]

Equal complexity hypothesis

During the 20th century, linguists and anthropologists adopted a standpoint that rejected nationalist ideas about the superiority of the languages of the establishment. The earliest known statement of the idea that all languages are equally complex comes from Rulon S. Wells III in 1954, who attributes it to Charles F. Hockett. Within a year, the same idea found its way into the Encyclopædia Britannica:

"All languages of today are equally complex(.) -- There are no 'primitive' languages, but all languages seem to be equally old and equally developed."[5]

Although laypeople never stopped regarding certain languages as simple and others as complex, such views disappeared from official contexts. For instance, the 1971 edition of the Guinness Book of World Records featured Saramaccan, a creole language, as "the world's least complex language". According to linguists, this claim was "not founded on any serious evidence", and it was removed from later editions.[7] Apparent differences in complexity in particular areas were explained by a balancing force whereby simplicity in one area is compensated by complexity in another, as David Crystal put it in 1987:

"All languages have a complex grammar: there may be relative simplicity in one respect (e.g., no word-endings), but there seems always to be relative complexity in another (e.g., word-position)".[8]

In 2001 the creolist John McWhorter challenged the compensation hypothesis, pointing out the absurdity of the idea that, as languages change, each would have to include a mechanism that calibrates it according to the complexity of all the other 6,000 or so languages in the world. He stressed that linguistics knows of no such mechanism.[8]

Revisiting the idea of differential complexity, McWhorter argued that it is indeed creole languages, such as Saramaccan, that are structurally "much simpler than all but very few older languages". In McWhorter's view this does not undermine the equality of creole languages, because simpler structures convey logical meanings in the most straightforward manner, while increased language complexity is largely a matter of features that may add little to the functionality or usefulness of a language. Examples of such features are inalienable possessive marking, switch-reference marking, syntactic asymmetries between matrix and subordinate clauses, grammatical gender, and other secondary features that are typically absent in creoles.[8]

In the years following McWhorter's article, several books and dozens of articles were published on the topic.[4][page needed] To date, there have been research projects on language complexity, and various universities have organised workshops for researchers.[1]

Complexity metrics

At a general level, language complexity can be characterized as the number and variety of elements, and the elaborateness of their interrelational structure.[9][10] This general characterisation can be broken down into sub-areas:

  • Syntagmatic complexity: number of parts, such as word length in terms of phonemes, syllables etc.
  • Paradigmatic complexity: variety of parts, such as phoneme inventory size, number of distinctions in a grammatical category, e.g. aspect
  • Organizational complexity: e.g. ways of arranging components, phonotactic restrictions, variety of word orders.
  • Hierarchic complexity: e.g. recursion, lexical–semantic hierarchies.[10]
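As a rough illustration of the four sub-areas above, the sketch below assigns one toy number to each. The function names (syntagmatic, paradigmatic, organizational, hierarchic) and the statistic chosen for each are hypothetical simplifications, not measures proposed in the cited literature. Words are assumed to be pre-segmented, so that a multi-letter segment such as the Tolomako affricate ts counts once, and constituent structure is assumed to be given as nested lists.

```python
from statistics import mean


def syntagmatic(words: list[list[str]]) -> float:
    """Number of parts: mean word length in segments (words are pre-segmented)."""
    return mean(len(word) for word in words)


def paradigmatic(inventory: set[str]) -> int:
    """Variety of parts: size of an inventory (phonemes, aspect values, ...)."""
    return len(inventory)


def organizational(orders: set[tuple[str, ...]]) -> int:
    """Ways of arranging components: number of distinct attested orders."""
    return len(orders)


def hierarchic(tree: object) -> int:
    """Hierarchy: maximum embedding depth of a nested-list constituent structure."""
    if not isinstance(tree, list):
        return 0  # a bare word adds no level of embedding
    return 1 + max((hierarchic(child) for child in tree), default=0)


if __name__ == "__main__":
    # Hand-segmented Tolomako stems "tsiɣo-" (mouth) and "βulu-" (hair).
    print(syntagmatic([["ts", "i", "ɣ", "o"], ["β", "u", "l", "u"]]))
    print(paradigmatic({"i", "u", "e", "o", "a"}))  # Tolomako vowels
    # The two object orders given for the Sakao "hit the pig with a club" example.
    print(organizational({("V", "pig", "club"), ("V", "club", "pig")}))
    print(hierarchic(["the", ["man", ["who", ["saw", "me"]]]]))
```

Each number is only meaningful relative to decisions about segmentation and about which structures count, which is where the real methodological difficulty lies.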

Measuring complexity is considered difficult, and comparing whole natural languages is considered a daunting task. At a more detailed level, it is possible to demonstrate that some structures are more complex than others. Phonology and morphology are areas where such comparisons have traditionally been made; for instance, linguistics has tools for assessing the phonological system of any given language. For the study of syntactic complexity, grammatical rules have been proposed as a basis,[8] but generative frameworks, such as the Minimalist Program and Simpler Syntax, have been less successful than non-formal descriptive approaches in defining complexity and its predictions.[11][page needed]

Many researchers suggest that several different concepts may be needed when approaching complexity: entropy, size, description length, effective complexity, information, connectivity, irreducibility, low probability, syntactic depth etc. Research suggests that while methodological choices affect the results, even rather crude analytic tools may provide a feasible starting point for measuring grammatical complexity.[10]
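Two of the concepts listed above, entropy and description length, are easy to approximate crudely. The sketch below computes the Shannon entropy (in bits per symbol) of a sequence, where symbols may be characters or pre-segmented phonemes, and uses the byte length of the zlib-compressed text as a rough upper-bound proxy for description length. Compression is only one possible stand-in, and the function names are illustrative rather than taken from any cited study.

```python
import math
import zlib
from collections import Counter
from collections.abc import Iterable, Hashable


def shannon_entropy(symbols: Iterable[Hashable]) -> float:
    """Entropy in bits per symbol of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def compressed_length(text: str) -> int:
    """Bytes after zlib compression: a crude upper bound on description length."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    repetitive = "ab" * 500
    print(shannon_entropy(repetitive), compressed_length(repetitive), len(repetitive))
```

The two measures can disagree: entropy ignores symbol order, while compression rewards repeated patterns, so the same data may be ranked differently depending on which one is chosen.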

A comparison

Guy (1994)[12] illustrates the point[which?] by comparing two Santo languages he has worked on, Tolomako and Sakao, which are about as closely related as French and Spanish and are both spoken in the village of Port Olry, Vanuatu. Because these languages are very similar to each other and equally distant from English, he holds that an English speaker would not be biased toward perceiving either one as easier or more difficult than the other (see difficulty of learning languages).


Sakao has more, and more difficult, vowel distinctions than Tolomako:

Tolomako vowels
close i u
mid e o
open a
Sakao vowels (partial)
close i y u
close mid e ø o
open mid ɛ œ ɔ
open a ɒ

In addition, Sakao has a close vowel /ɨ/ that is unspecified for being rounded or unrounded, front or back, and is always unstressed. It also has the two diphthongs /œɛ, ɒɔ/, whereas Tolomako has none.

Sakao also has more, and more difficult, consonant distinctions:

Tolomako consonants
              labial   alveolar   velar
nasal         m        n
plosive       p        t          k
affricate              ts
fricative     β                   ɣ
trill                  r
approximant            l
Sakao consonants
                  labial   alveolar   palatal   velar   glottal
nasal             m        n                    ŋ
plosive           p        t                    k
fricative         β        ð                    ɣ       h
trill                      r
voiceless trill            r̥
approximant       w        l          j

Sakao consonants may also be long or short, for example /œβe/ "drum" versus /œββe/ "bed".
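The tables above lend themselves to a simple paradigmatic count (inventory size). The sketch below transcribes the segments listed in this section into Python sets. The Sakao vowel set adds /ɨ/ from the text to the partial table, the voiceless trill is assumed to be written r̥ in standard IPA notation, and consonant length is left out because the text does not say which consonants can be long, so the Sakao totals are lower bounds.

```python
# Segment inventories as listed in this section (Sakao figures are lower bounds).
TOLOMAKO = {
    "vowels": {"i", "u", "e", "o", "a"},
    "diphthongs": set(),
    "consonants": {"m", "n", "p", "t", "k", "ts", "β", "ɣ", "r", "l"},
}
SAKAO = {
    "vowels": {"i", "y", "u", "e", "ø", "o", "ɛ", "œ", "ɔ", "a", "ɒ", "ɨ"},
    "diphthongs": {"œɛ", "ɒɔ"},
    # "r\u0325" is r̥, the voiceless trill (IPA notation assumed here).
    "consonants": {"m", "n", "ŋ", "p", "t", "k", "β", "ð", "ɣ", "h",
                   "r", "r\u0325", "w", "l", "j"},
}


def inventory_sizes(inventory: dict[str, set[str]]) -> dict[str, int]:
    """Paradigmatic complexity per category: the number of listed segments."""
    return {category: len(segments) for category, segments in inventory.items()}


if __name__ == "__main__":
    for name, inventory in (("Tolomako", TOLOMAKO), ("Sakao", SAKAO)):
        sizes = inventory_sizes(inventory)
        print(name, sizes, "total:", sum(sizes.values()))
```

On these listings Tolomako has 15 segments and Sakao at least 29, before the consonant length contrast is taken into account.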

Tolomako has a simple syllable structure, maximally consonant–vowel–vowel. It is not clear whether Sakao has syllables at all, that is, whether Sakao words can be meaningfully divided into syllables.

Tolomako syllable structure: maximally CVV (consonant–vowel–vowel)
Sakao syllable structure: V (a vowel or diphthong) surrounded by any number of consonants, for example:
  V: /i/ "thou"
  CCVCCCC (?): /mhɛrtpr/ "having sung and stopped singing thou kept silent" [m- 2nd pers., hɛrt "to sing", -p perfective, -r continuous]


With inalienably possessed nouns, Tolomako inflections are consistently regular, whereas Sakao is full of irregular nouns:

Tolomako       Sakao       English
na tsiɣo-ku    œsɨŋœ-ɣ     "my mouth"
na tsiɣo-mu    œsɨŋœ-m     "thy mouth"
na tsiɣo-na    ɔsɨŋɔ-n     "his/her/its mouth"
na tsiɣo-...   œsœŋ-...    "...'s mouth"

Tolomako       Sakao       English
na βulu-ku     uly-ɣ       "my hair"
na βulu-mu     uly-m       "thy hair"
na βulu-na     ulœ-n       "his/her/its hair"
na βulu-...    nøl-...     "...'s hair"

Here Tolomako "mouth" is invariably tsiɣo- and "hair" invariably βulu-, whereas Sakao "mouth" is variably œsɨŋœ-, ɔsɨŋɔ-, œsœŋ- and "hair" variably uly-, ulœ-, nøl-.
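The contrast between invariable and variable stems can be made countable by collecting the distinct stems in each paradigm. The sketch below treats everything before the last hyphen as the stem and omits the Tolomako article na; this is a simplifying assumption that happens to work for the forms in the tables, not a general morphological parser.

```python
# Inalienably possessed forms from the tables above (Tolomako article "na" omitted).
PARADIGMS = {
    ("Tolomako", "mouth"): ["tsiɣo-ku", "tsiɣo-mu", "tsiɣo-na", "tsiɣo-..."],
    ("Tolomako", "hair"): ["βulu-ku", "βulu-mu", "βulu-na", "βulu-..."],
    ("Sakao", "mouth"): ["œsɨŋœ-ɣ", "œsɨŋœ-m", "ɔsɨŋɔ-n", "œsœŋ-..."],
    ("Sakao", "hair"): ["uly-ɣ", "uly-m", "ulœ-n", "nøl-..."],
}


def stem_variants(forms: list[str]) -> set[str]:
    """Distinct stems, taking the stem to be the part before the last hyphen."""
    return {form.rsplit("-", 1)[0] for form in forms}


if __name__ == "__main__":
    for (language, noun), forms in PARADIGMS.items():
        variants = stem_variants(forms)
        print(f"{language} '{noun}': {len(variants)} stem(s): {', '.join(sorted(variants))}")
```

Each Tolomako noun yields one stem and each Sakao noun three, which restates the irregularity described above as a number that could be compared across a larger sample of nouns.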


With deixis, Tolomako has three degrees (here/this, there/that, yonder/yon), whereas Sakao has seven.

Tolomako has a preposition to distinguish the object of a verb from an instrument; indeed, a single preposition, ne, is used for all relationships of space and time. Sakao, on the other hand, treats both as objects of the verb, with a transitive suffix -ɨn that shows the verb has two objects, but letting context disambiguate which is which:

mo losi na poe ne na matsa
S/he hits ART pig PREP ART club
"He hits (kills) the pig with a club"
mɨ-jil-ɨn a-ra a-mas
S/he-hits-TRANS ART-pig ART-club
"He hits (kills) the pig with a club"

The Sakao sentence could also be ordered mɨjilɨn amas ara, with the two objects reversed.

The Sakao strategy involves polysynthetic syntax, as opposed to the isolating syntax of Tolomako:

Sakao polysynthesis
Mɔssɔnɛshɔβrɨn aða ɛðɛ     (or: ɛðɛ aða)
mɔ-sɔn-nɛs-hɔβ-r-ɨn a-ða ɛ-ðɛ
s/he-shoots-fish-follows-CONT-TRANS ART-bow ART-sea
"He kept on walking along the shore shooting fish with a bow."

Here aða "the bow" is the instrument of sɔn "to shoot", and ɛðɛ "the sea" is the direct object of hɔβ "to follow"; because these verbs are combined into a single verb, it is marked as ditransitive with the suffix -ɨn. Because sɔn "to shoot" has the incorporated object nɛs "fish", its first consonant geminates, giving ssɔn; ssɔn-nɛs, being part of one word, then reduces to ssɔnɛs. Indeed, the previous example of killing a pig could be expressed more succinctly, but grammatically more complexly, in Sakao by incorporating the object 'pig' into the verb:

mɨjilrapɨn amas
mɨ-jil-ra-p-ɨn a-mas
s/he-hit-pig-PFV-TRANS ART-club
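The isolating versus polysynthetic contrast can be quantified with the index of synthesis, the average number of morphemes per word. The sketch below computes it from the hyphen-segmented lines of the examples above. Because the Tolomako example is written without morpheme breaks, each of its words counts as a single morpheme; this follows its one-to-one gloss but is an assumption.

```python
def index_of_synthesis(segmented: str) -> float:
    """Average morphemes per word; words are space-separated, morphemes hyphen-separated."""
    words = segmented.split()
    if not words:
        raise ValueError("empty example")
    return sum(len(word.split("-")) for word in words) / len(words)


EXAMPLES = {
    "Tolomako 'hits the pig with a club'": "mo losi na poe ne na matsa",
    "Sakao 'hits the pig with a club'": "mɨ-jil-ɨn a-ra a-mas",
    "Sakao 'shooting fish along the shore'": "mɔ-sɔn-nɛs-hɔβ-r-ɨn a-ða ɛ-ðɛ",
    "Sakao, 'pig' incorporated": "mɨ-jil-ra-p-ɨn a-mas",
}

if __name__ == "__main__":
    for label, line in EXAMPLES.items():
        print(f"{label}: {index_of_synthesis(line):.2f}")
```

For these four lines the script prints 1.00, 2.33, 3.33 and 3.50 respectively: the incorporating Sakao sentences pack the most morphemes into each word.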

Guy asks rhetorically, "Which of the two languages spoken in Port-Olry do you think the Catholic missionaries learnt and used? Could that possibly be because it was easier than the other?"

Language complexity and learning

It is commonly believed that some languages are inherently harder than others to learn, as first or second languages, because of their greater complexity. However, this belief is not yet supported by sufficient scientific evidence.

The perceived difficulty of second language acquisition seems to depend largely on the similarity between the learner's native language and the target language. In a 2013 study, Michael Cysouw[13] used data from the Foreign Service Institute (FSI) to identify the criteria that influence the difficulty of foreign language learning.

  • First, a language that is genetically related to the learner's native language will be easier to learn than a language from a different family. This is mostly due to language structure. The closer a language is to another, the more similar their structures will be (this applies to sounds, grammar, vocabulary, and so on).
  • Another criterion is the writing system: learners will learn a language more quickly if it uses the same writing system as their native language.

Therefore, one of the most difficult languages for a native English speaker to learn would be, for example, a non-Indo-European ergative language with a different writing system and with postpositions instead of prepositions.

Another study,[14] conducted in 2006, started from the common belief that Arabic is harder for a native English speaker to learn than Spanish or German. This study is also based on the FSI classification of languages by difficulty, which places Arabic in the fourth (relatively difficult) group. The study compares Arabic with languages usually perceived as easier to learn and concludes that Arabic is not inherently more complex than these languages. It lists linguistic properties in which Arabic is actually simpler than they are. For instance, despite the complexity of Arabic consonantal roots, the Arabic verbal system relies on very specific sub-rules and uses only a single verb paradigm. By contrast, Spanish is more complex than Arabic in its verbal tenses; French is more complex in its phoneme–grapheme correspondences; German, Polish and Greek have more complex systems of case inflection; and Japanese has a more complex writing system. On this view, English native speakers perceive Arabic as particularly difficult not because it is inherently harder, but because its structure and writing system are very different from those of English.

The belief that some languages are inherently harder to learn is less common with respect to first language acquisition, although first language acquisition should arguably correlate more strongly with a language's inherent complexity. Some studies have addressed this question. For instance, there is evidence from Danish that children learning a language with a complex sound structure may be slightly delayed in their lexical development.[15] Danish has a complex phonological system, with extensive lenition of plosives. In line with the hypothesis that a more complex phonology makes word learning harder, Danish children were found to show a slight delay in early lexical development compared with children speaking other languages, although they seem to catch up by the time they reach two years of age. This suggests that sound structure may influence the difficulty of a language. There is, however, not yet enough evidence to say confidently that some languages are globally easier or harder to learn as a first language.

Language complexity and creoles

It is generally acknowledged that, as young languages, creoles are necessarily simpler than non-creoles.[16] Guy believes this to be untrue;[citation needed] after a comparison with Antillean Creole, he writes, "I assure you that it is far, far more complex than Tolomako!", even though that creole is based on his native language, French.

Computational tools


  1. Miestamo, Matti; Sinnemäki, Kaius; Karlsson, Fred (eds.) (2008). Language Complexity: Typology, Contact, Change. Studies in Language Companion Series. 94. Amsterdam: John Benjamins. p. 356. doi:10.1075/slcs.94. ISBN 978-90-272-3104-8.
  2. Wurzel, Wolfgang Ullrich (2001). "Creoles, complexity, and linguistic change". Linguistic Typology. 5 (2/3): 377–387. ISSN 1430-0532.
  3. Sampson, Geoffrey; Gil, David; Trudgill, Peter (eds.) (2009). Language Complexity as an Evolving Variable. Oxford: Oxford University Press. p. 328. ISBN 9780199545223.
  4. Newmeyer, Frederick J.; Preston, Laurel B. (eds.) (2014). Measuring Grammatical Complexity. Oxford: Oxford University Press. ISBN 9780199685301.
  5. Joseph, John E.; Newmeyer, Frederick J. (2012). "'All Languages Are Equally Complex': The rise and fall of a consensus". Historiographia Linguistica. 39 (2–3): 341–368. doi:10.1075/hl.39.2-3.08jos.
  6. Darwin, Charles (1871). The Descent of Man, and Selection in Relation to Sex. London: John Murray. OCLC 39301709.
  7. Arends, Jacques (2001). "Simple grammars, complex languages". Linguistic Typology. 5 (2/3): 180–182. ISSN 1430-0532.
  8. McWhorter, John H. (2001). "The world's simplest grammars are creole grammars". Linguistic Typology. 5 (2/3): 125–166. doi:10.1515/lity.2001.001. ISSN 1430-0532.
  9. Rescher, Nicholas (1998). Complexity: A Philosophical Overview. New Brunswick: Transaction. ISBN 978-1560003779.
  10. Sinnemäki, Kaius (2011). Language Universals and Linguistic Complexity: Three Case Studies in Core Argument Marking (Thesis). University of Helsinki. Retrieved 2016-04-28.
  11. Hawkins, John A. (2014). "Major contributions from formal linguistics to the complexity debate". In Newmeyer, Frederick J.; Preston, Laurel B. (eds.). Measuring Grammatical Complexity. Oxford: Oxford University Press. pp. 14–36. doi:10.1093/acprof:oso/9780199685301.003.0002. ISBN 9780199685301.
  12. Guy, Jacques (1 December 1994). "sci.lang FAQ". sci.lang. Message-ID: 3bjmtc$ci3@medici.trl.OZ.AU.
  13. Cysouw, Michael (2013). "Predicting language-learning difficulty". In Borin, Lars; Saxena, Anju (eds.). Approaches to Measuring Linguistic Differences. De Gruyter Mouton. pp. 57–82. ISBN 978-3-11-048808-1.
  14. Stevens, Paul B. (2006). "Is Spanish really easy? Is Arabic really so hard? Perceived difficulty in learning Arabic as a second language". In Wahba, Kassem M.; Taha, Zeinab A.; England, Liz (eds.). Handbook for Arabic Language Teaching Professionals in the 21st Century. Lawrence Erlbaum Associates. pp. 35–66. ISBN 978-0-203-76390-2.
  15. Bleses, Dorthe; Vach, Werner; Slott, Malene; Wehberg, Sonja; Thomsen, Pia; Madsen, Thomas O.; Basbøll, Hans (2008). "Early vocabulary development in Danish and other languages: A CDI-based comparison". Journal of Child Language. 35 (3): 619–650. doi:10.1017/S0305000908008714. Retrieved 2020-05-18.
  16. "Creole and pidgin language structure in cross-linguistic perspective | Abstracts". Retrieved 2015-08-11.