In historical linguistics, the homeland or Urheimat (/ˈʊərhaɪmɑːt/, from German ur- "original" and Heimat "home") of a proto-language is the region in which it was spoken before splitting into different daughter languages. A proto-language is the reconstructed or historically attested parent language of a group of languages that are genetically related.

Depending on the age of the language family under consideration, its homeland may be known with near-certainty (in the case of historical or near-historical migrations) or it may be very uncertain (in the case of deep prehistory). In addition to internal linguistic evidence, the reconstruction of a prehistoric homeland draws on a variety of disciplines, including archaeology and archaeogenetics.


Several methods are used to determine the homeland of a given language family. One method is based on the vocabulary that can be reconstructed for the proto-language. This vocabulary – especially terms for flora and fauna – can provide clues to the geographical and ecological environment in which the proto-language was spoken. An estimate of the time depth of the proto-language is necessary in order to account for prehistoric changes in climate and in the distribution of flora and fauna.[1][2]
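The vocabulary-based method can be pictured as intersecting the ranges of the plants and animals named by reconstructed terms, using the ranges that held at the estimated time depth rather than today's. The following is a minimal sketch under stated assumptions, not an established tool: the function name `candidate_regions`, the term labels, the region labels and the toy ranges are all hypothetical and chosen only for illustration.

```python
def candidate_regions(ranges_by_period, period):
    """Return the regions where every named species occurred in `period`.

    `ranges_by_period` maps a period label to a dict that maps each
    reconstructed term to the set of regions where its referent lived.
    """
    ranges = list(ranges_by_period[period].values())
    if not ranges:
        return set()
    # A region is a candidate only if it lies inside every species' range.
    return set.intersection(*ranges)


# Hypothetical toy data: region and term labels are placeholders, not real findings.
toy_ranges = {
    "present": {
        "tree_term": {"region_1", "region_2"},
        "fish_term": {"region_2", "region_3"},
    },
    "estimated_time_depth": {
        "tree_term": {"region_1", "region_2", "region_3"},
        "fish_term": {"region_3", "region_4"},
    },
}

print(sorted(candidate_regions(toy_ranges, "present")))
print(sorted(candidate_regions(toy_ranges, "estimated_time_depth")))
```

In this toy data the two periods give different candidate regions, which illustrates why the time depth has to be estimated before species ranges are used as evidence.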

Another method is based on the linguistic migration theory (first proposed by Edward Sapir), which states that the most likely candidate for the last homeland of a language family is the area of its highest linguistic diversity.[3] This presupposes an established view of the internal subgrouping of the language family. Different assumptions about high-order subgrouping can thus lead to very divergent proposals for a linguistic homeland (e.g. Isidore Dyen's proposal for New Guinea as the center of dispersal of the Austronesian languages).[4] The linguistic migration theory has its limits, because it only works when linguistic diversity evolves continuously without major disruptions. Its results can be distorted, for example, when this diversity is wiped out by more recent migrations.[5]
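The center-of-diversity reasoning can be sketched as counting how many primary branches of a family are represented in each region and picking the region with the most. This is a simplified illustration under stated assumptions: the function name `center_of_diversity`, the branch labels and the region labels are hypothetical, and real applications weigh the depth of each split rather than merely counting branches.

```python
from collections import defaultdict


def center_of_diversity(branch_regions):
    """Return (region, count) for the region hosting the most primary branches.

    `branch_regions` maps each primary branch to the set of regions where
    its languages are spoken. Ties are broken alphabetically by region name.
    """
    branches_by_region = defaultdict(set)
    for branch, regions in branch_regions.items():
        for region in regions:
            branches_by_region[region].add(branch)
    # Iterating over sorted names makes max() return the alphabetically
    # first region when several regions host the same number of branches.
    best = max(sorted(branches_by_region), key=lambda r: len(branches_by_region[r]))
    return best, len(branches_by_region[best])


# Hypothetical toy family: three primary branches confined to one region.
toy_family = {
    "Branch A": {"Island"},
    "Branch B": {"Island"},
    "Branch C": {"Island"},
    "Branch D": {"Archipelago", "Mainland coast"},
}

# Same languages, but A, B and C are assumed to form a single subgroup.
regrouped_family = {
    "Branch ABC": {"Island"},
    "Branch D": {"Archipelago", "Mainland coast"},
}

print(center_of_diversity(toy_family))
print(center_of_diversity(regrouped_family))
```

With the first subgrouping the island clearly stands out. With the second, every region hosts exactly one primary branch, so the method no longer singles out a homeland, which mirrors the dependence on subgrouping assumptions described above.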

Limitations of the concept

The concept of a (single, identifiable) "homeland" of a given language family implies a purely genealogical view of the development of languages. This assumption is often reasonable and useful, but it is by no means a logical necessity, as languages are well known to be susceptible to areal change such as substrate or superstrate influence.

Time depth

Over a sufficient period of time, in the absence of evidence of intermediary steps in the process, it may be impossible to observe linkages between languages that have a shared Urheimat: given enough time, natural language change will obliterate any meaningful linguistic evidence of a common genetic source. This general concern is a manifestation of the larger issue of "time depth" in historical linguistics.[6]

For example, the languages of the New World are believed to be descended from a relatively "rapid" peopling of the Americas (relative to the duration of the Upper Paleolithic) within a few millennia (roughly between 20,000 and 15,000 years ago),[7] but their genetic relationship has become completely obscured over the more than ten millennia which have passed between their separation and their first written record in the early modern period. Similarly, the Australian Aboriginal languages are divided into some 28 families and isolates for which no genetic relationship can be shown.[8]

Reconstructions of Urheimaten using the methods of comparative linguistics typically yield separation times dating to the Neolithic or later. It is undisputed that fully developed languages were present throughout the Upper Paleolithic, and possibly into the deep Middle Paleolithic (see origin of language, behavioral modernity). These languages would have spread with the early human migrations of the first "peopling of the world", but they are no longer amenable to linguistic reconstruction. The Last Glacial Maximum (LGM) imposed linguistic separation lasting several millennia on many Upper Paleolithic populations in Eurasia, as they were forced to retreat into "refugia" before the advancing ice sheets. After the end of the LGM, Mesolithic populations of the Holocene again became more mobile, and most of the prehistoric spread of the world's major linguistic families seems to reflect the expansion of population cores during the Mesolithic followed by the Neolithic Revolution.

The Nostratic theory is the best-known attempt to expand the deep prehistory of the main language families of Eurasia (excepting Sino-Tibetan and the languages of Southeast Asia) to the beginning of the Holocene. First proposed in the early 20th century, the Nostratic theory still receives serious consideration, but it is by no means generally accepted. The more recent and more speculative "Borean" hypothesis attempts to unite Nostratic with Dené–Caucasian and Austric, in a "mega-phylum" that would unite most languages of Eurasia, with a time depth going back to the Last Glacial Maximum.

Finally, the debate over the "Proto-Human language" is almost completely detached from linguistic reconstruction and instead centers on questions of phonology and the origin of speech. Time depths involved in the deep prehistory of all the world's extant languages are of the order of at least 100,000 years.[9]

Language contact and creolization

The concept of an Urheimat only applies to populations speaking a proto-language defined by the tree model. This is not always the case.

For example, in places where language families meet, the relationship between a group that speaks a language and the Urheimat for that language is complicated, since "processes of migration, language shift and group absorption are documented by linguists and ethnographers" in groups that are themselves "transient and plastic." Thus, in the contact area in western Ethiopia between languages belonging to the Nilo-Saharan and Afroasiatic families, the Nilo-Saharan-speaking Nyangatom and the Afroasiatic-speaking Daasanach have been observed to be closely related to each other but genetically distinct from neighboring Afroasiatic-speaking populations. This reflects the fact that the Daasanach, like the Nyangatom, originally spoke a Nilo-Saharan language, with the ancestral Daasanach later adopting an Afroasiatic language around the 19th century.[10]

Creole languages are hybrids of languages that are sometimes unrelated. Similarities arise from the creole formation process, rather than from genetic descent.[11] For example, a creole language may lack significant inflectional morphology, lack tone on monosyllabic words, or lack semantically opaque word formation, even if these features are found in all of the parent languages of the languages from which the creole was formed.[12]


Some languages are language isolates. That is to say, they have no well-accepted language family connection, no nodes in a family tree, and therefore no known Urheimat. An example is the Basque language of northern Spain and southwest France. Nevertheless, all languages evolve, so even an isolate descends from earlier forms. An unknown Urheimat may therefore still be hypothesized, such as that for a Proto-Basque, and may be supported by archaeological and historical evidence.

Sometimes relatives are found for a language originally believed to be an isolate. An example is the Etruscan language, which, even though only partially understood, is believed to be related to the Rhaetic language and to the Lemnian language. A single family may be an isolate. In the case of the non-Austronesian indigenous languages of Papua New Guinea and the indigenous languages of Australia, there is no published linguistic hypothesis supported by any evidence that these languages have links to any other families. Nevertheless, an unknown Urheimat is implied. The entire Indo-European family itself is a language isolate: no further connections are known. This lack of information does not prevent some professional linguists from formulating additional hypothetical nodes (Nostratic) and additional homelands for the speakers.

Homelands of major language families

Western and central Eurasia

Map showing the present-day distribution of Indo-European languages in Eurasia (light green) and the likely Proto-Indo-European homeland (dark green).
The identification of the Proto-Indo-European homeland has been debated for centuries, but the steppe hypothesis is now widely accepted, placing it in the Pontic–Caspian steppe in the late 5th millennium BCE.[13] The leading alternative is the Anatolian hypothesis, proposing a homeland in Anatolia in the early 7th millennium BCE.[14]
The unrelated Kartvelian, Northwest Caucasian (Abkhaz-Adygean) and Northeast Caucasian (Nakh-Daghestanian) language families are presumed to be indigenous to the Caucasus.[15] There is extensive evidence for contact between the Caucasian languages, especially Proto-Kartvelian, and Proto-Indo-European, indicating that they were spoken in close proximity at least three to four thousand years ago.[16][17]
Although Dravidian languages are now concentrated in southern India, isolated pockets further north, placenames and substrate influences on Indo-Aryan languages indicate that they were once spoken more widely across the Indian subcontinent. Reconstructed Proto-Dravidian terms for flora and fauna support the idea that Dravidian is indigenous to India. Proponents of a migration from the northwest cite the location of Brahui, a hypothesized connection to the undeciphered Indus script, and claims of a link to Elamite.[18]
The homeland of the Turkic languages is thought to lie somewhere between the Transcaspian Steppe and Northeastern Asia (Manchuria),[19] with genetic evidence pointing to the region near South Siberia and Mongolia as the "Inner Asian Homeland" of the Turkic ethnicity.[19] Similarly, several linguists, including Juha Janhunen, Roger Blench and Matthew Spriggs, suggest that modern-day Mongolia is the homeland of the early Turkic language.[20] Relying on Proto-Turkic lexical items for climate, topography, flora, fauna and people's modes of subsistence, Turkologist Peter Benjamin Golden locates the Proto-Turkic Urheimat in the southern, taiga-steppe zone of the Sayan-Altai region.[21]
Inherited tree names seem to indicate a Uralic homeland to the east of the Ural Mountains. The internal branching of the family suggests an area between the Ob River and Yenisey River.[22] Uralic speakers are not genetically distinguished from their neighbours, but do share a genetic component that is of Siberian origin.[23][24]

Eastern Eurasia

Most scholars believe that Japonic was brought to northern Kyushu from the Korean Peninsula around 700 to 300 BCE by wet-rice farmers of the Yayoi culture, spreading from there throughout the Japanese Archipelago and somewhat later to the Ryukyu Islands.[25][26] There is fragmentary placename evidence that now-extinct Japonic languages were still spoken in central and southern parts of the Korean peninsula several centuries later.[27]
All modern Koreanic varieties are descended from the language of Unified Silla, which ruled the southern two-thirds of the Korean peninsula between the 7th and 10th centuries.[28][29] Evidence for the earlier linguistic history of the peninsula is extremely sparse.[30] The orthodox view among Korean social historians is that the Korean people migrated to the peninsula from the north, but no archaeological evidence of such a migration has been found.[31][32]
The reconstruction of Sino-Tibetan is much less developed than for other major families, so its higher-level structure and time depth remain unclear.[33] Proposed homelands and periods include: the upper and middle reaches of the Yellow River about 4–8 kya, associated with the hypothesis of a top-level branching between Chinese and the rest; southwestern Sichuan around 9 kya, associated with the hypothesis that Chinese and Tibetan form a subbranch; Northeast India (the area of maximal diversity) 9–10 kya.[34]
The most likely homeland of the Hmong–Mien languages is in Southern China between the Yangtze and Mekong rivers, but speakers of these languages may have migrated from Central China as a result of the expansion of the Han Chinese.[35]
Most scholars locate the homeland of the Kra–Dai languages in Southern China, possibly coastal Fujian or Guangdong.[36]
Austroasiatic is widely held to be the oldest family in mainland Southeast Asia, with its current discontinuous distribution resulting from the later arrival of other families. The various branches share a great deal of vocabulary concerning rice cultivation, but few related to metals.[37] Identification of the homeland of the family has been hampered by the lack of progress on its branching. The main proposals are Northern India (favoured by those who assume an early branching of Munda), Southeast Asia (the area of maximal diversity) and southern China (based on claimed loanwords in Chinese).[38]
The homeland of the Austronesian languages is widely accepted by linguists to be Taiwan, since nine of its ten branches are found there, with all Austronesian languages found outside Taiwan belonging to the remaining Malayo-Polynesian branch.[39]

North America

The Eskimo–Aleut languages originated in the region of the Bering Strait or Southwest Alaska.[40]
Na-Dené and Yeniseian
The Dené–Yeniseian hypothesis proposes that the Na-Dené languages of North America and the Yeniseian languages of Central Siberia share a common ancestor. Suggested homelands for this family include Central or West Asia,[41] Siberia,[42] or Beringia,[43] but there is currently not enough evidence to resolve the question.[44]
The Algic languages are distributed from the Pacific coast to the Atlantic coast of North America. It is suggested that Proto-Algic was spoken on the Columbia Plateau. From there, pre-Wiyot and pre-Yurok speakers moved southwest to the North Coast of California, while the pre-Proto-Algonquian speakers moved to the Great Plains, which was the center of dispersal of the Algonquian languages.[45][46]
Some authorities on the history of the Uto-Aztecan language group place the Proto-Uto-Aztecan homeland in the border region between the USA and Mexico, namely the upland regions of Arizona and New Mexico and the adjacent areas of the Mexican states of Sonora and Chihuahua, roughly corresponding to the Sonoran Desert. The proto-language would have been spoken by foragers, about 5,000 years ago. Hill (2001) proposes instead a homeland further south, making the assumed speakers of Proto-Uto-Aztecan maize cultivators in Mesoamerica, who were gradually pushed north, bringing maize cultivation with them, during the period of roughly 4,500 to 3,000 years ago, the geographic diffusion of speakers corresponding to the breakup of linguistic unity.[47]

South America

Proto-Tupian, the reconstructed common ancestor of the Tupian languages of South America, was probably spoken in the region between the Guaporé and Aripuanã rivers, around 5,000 years ago.[48]

Africa and Middle East

There is no consensus on the location of the Afroasiatic homeland, though based on current evidence somewhere in the eastern Sahara or adjacent regions is considered most likely.[49] Proto-Afroasiatic is estimated to have begun to break up in the 8th millennium BCE.[49] Proto-Semitic is thought to have been spoken in the Near East between 4400 and 7400 BCE, with Akkadian representing its earliest known branch.[50]
The validity of Niger–Congo as a language family has become controversial. The family probably originated in or near the area where these languages were spoken prior to the Bantu expansion (i.e. West Africa or Central Africa). Its spread may have been associated with the expansion of Sahel agriculture in the African Neolithic period, following the desiccation of the Sahara in c. 3500 BCE.[51][52]
Valentin Vydrin concluded that "the Mande homeland at the second half of the 4th millennium BC was located in Southern Sahara, somewhere to the North of 16° or even 18° of Northern latitude and between 3° and 12° of Western longitude."[53] That is now Mauritania and/or southern Western Sahara.[54]
The validity of the Nilo-Saharan family remains controversial. Proponents of the family view the border area between Chad, Sudan, and the Central African Republic as a likely candidate for its homeland prior to its dispersal around 10,000–8,000 BP.[55]
The original homeland of Central Sudanic speakers is likely somewhere in the Bahr el Ghazal region.[56]
The homeland of Khoe-Kwadi was likely the middle Zambezi Valley over 2,000 years ago.[57]

