The Inner–Outer hypothesis of the subclassification of the Indo-Aryan language family argues for a division of the family into two groups, an Inner core (focused on the Madhyadeśa in the Indo-Gangetic plain) and an Outer periphery, on the evidence of traits shared by the languages within each group. Proponents of the theory generally believe the distinction to be the result of gradual migrations of Indo-Aryan speakers into India, with the Inner languages representing a second wave of migrants who spoke a different dialect of Old Indo-Aryan and displaced the first-wave speakers, who were relegated to the outer region.

The Inner–Outer hypothesis has taken many forms in its various iterations since it was proposed by Rudolf Hoernlé in 1880.[1] Some of its notable proponents include George Abraham Grierson[2][3] (who organised the Linguistic Survey of India), Franklin Southworth,[4] and Claus Peter Zoller.[5] It has faced robust opposition from scholars such as Suniti Kumar Chatterji[6] and Colin P. Masica,[7] and John Peterson has proposed an alternative East–West theory of the top-level subclassification of Indo-Aryan.[8] Recent work by Chundra Cathcart[9] and Peterson has sought to test the hypothesis with statistical methods; it remains a contentious proposal with no clear consensus among scholars of Indo-Aryan.


The general structure of the hypothesis is that there were two groups of Indo-Aryan speakers who migrated into South Asia through differing routes and spoke different dialects of Old Indo-Aryan. This is meant to explain commonalities between the Outer languages (specifically the Eastern Indo-Aryan languages and Sindhi, Gujarati, Marathi, and associated smaller lects) that are not found in the Madhyadeśa region.

Hoernlé's original 1880 formulation of the Inner–Outer hypothesis posited an initial Māgadhan invasion (at the time, the migration of Indo-Aryans was considered to be an invasion, a theory now largely refuted) into the Indo-Gangetic plain, followed by a second Śaurasenī invasion that pushed the earlier Aryans outwards to the south, east, and west. He further thought that Pashto and the Nuristani languages belonged to the Outer Māgadhan group.[10] Grierson calls this the wedge theory and does not accept it in its entirety, but agrees on the political result, adding that there is textual evidence of hostility between the two groups, e.g. in the Mahābhārata, in which the Indus Aryans are called mleccha (roughly "barbarians").[11]

The most recent iteration of the theory is by Franklin Southworth.[12] His hypothesis states that at the time of the Rigveda, the Indo-Aryans resided in the upper Indus valley and had largely lost contact with Iranian speakers. Some dialectal variation was already present at this stage but the Inner and Outer groups had not split. The Inner Indo-Aryan speakers, associated with the post-Rigvedic textual tradition of Hinduism, then migrated directly into the Madhyadeśa, while the Outer IA speakers took a path south into Sindh, then east to the Deccan and Malwa, and finally to the east into Bengal by the time of Aśoka. The Inner and Outer languages regrouped around Awadh, leading to dialect mixture in that area (reflected in Grierson's classification of those languages as intermediate between Inner and Outer IA).

The basic point agreed upon by proponents of the hypothesis from the very beginning is that modern Marathi–Konkani and Bengali–Assamese–Odia share more features with each other than with the other Indo-Aryan languages. Thus, they form the backbone of the Outer language family. Grouping of the transitional languages is thornier; most formulations of the theory add Gujarati, Sindhi, and the Bihari languages to the Outer family, while there is more contention (and lack of data) in categorising Western Punjabi, Dardic, Sinhala–Dhivehi, and Pahari. Eastern Hindi is thought to be transitional between Inner and Outer.

Below are the groupings proposed by proponents of the hypothesis.

Outer Indo-Aryan languages

| Theory | Bengali–Assamese–Odia | Bihari | E. Hindi | Marathi–Konkani | Gujarati | Sindhi | W. Punjabi | Rajasthani | Pahari | Dardic | Sinhala–Dhivehi |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hoernlé (1880) | Yes | Yes | Yes | Yes | No | No | N/A | N/A | No | N/A | N/A |
| Grierson (1927) | Yes | Yes | Mediate | Yes | No | Yes | Yes | No | No | N/A | N/A |
| Southworth (2005) | Yes (core) | Yes (core) | No | Yes (core) | Yes | Yes | No | No | No | Maybe | No |
| Zoller (2016) | Maybe | No | No | Yes | Yes | Yes | Yes | Partially | Yes | Yes | Yes |
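
The groupings in the table can also be read as a small data structure, which makes it easy to see where the proposals overlap. The following is a minimal sketch, not part of any of the cited works: the verdicts are transcribed from the table, "Yes (core)" is folded into "yes", and the helper names `outer_set` and `agreed_outer` are illustrative assumptions. Only an unqualified "yes" is counted as membership; "Maybe", "Mediate", "Partially", and "N/A" are treated as non-membership.

```python
# Column order of the table above.
GROUPS = [
    "Bengali–Assamese–Odia", "Bihari", "E. Hindi", "Marathi–Konkani",
    "Gujarati", "Sindhi", "W. Punjabi", "Rajasthani", "Pahari",
    "Dardic", "Sinhala–Dhivehi",
]

# Verdicts transcribed row by row; "Yes (core)" is recorded as "yes".
VERDICTS = {
    "Hoernlé (1880)": ["yes", "yes", "yes", "yes", "no", "no",
                       "n/a", "n/a", "no", "n/a", "n/a"],
    "Grierson (1927)": ["yes", "yes", "mediate", "yes", "no", "yes",
                        "yes", "no", "no", "n/a", "n/a"],
    "Southworth (2005)": ["yes", "yes", "no", "yes", "yes", "yes",
                          "no", "no", "no", "maybe", "no"],
    "Zoller (2016)": ["maybe", "no", "no", "yes", "yes", "yes",
                      "yes", "partially", "yes", "yes", "yes"],
}


def outer_set(theory):
    """Groups that a theory classifies as Outer without qualification."""
    return {g for g, v in zip(GROUPS, VERDICTS[theory]) if v == "yes"}


def agreed_outer(theories):
    """Groups that every listed theory classifies as Outer."""
    return set.intersection(*(outer_set(t) for t in theories))


all_four = agreed_outer(VERDICTS)
first_three = agreed_outer(
    ["Hoernlé (1880)", "Grierson (1927)", "Southworth (2005)"]
)
```

Under this strict reading, Marathi–Konkani is the only group all four proposals place in Outer, because Zoller marks Bengali–Assamese–Odia as "Maybe"; the three earlier proposals agree on Bengali–Assamese–Odia, Bihari, and Marathi–Konkani.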

Zoller (2016, p. 79) noted that the main difficulty in assigning Indo-Aryan languages firmly to either group is dialect stratification: large koiné languages such as Hindi and Bengali, or the older Dramatic Prakrits, underwent dialect levelling that obscured their affiliation, and local village dialects may have a very different composition of features that is unfortunately not historically preserved. He therefore takes a gradient rather than categorical view of the grouping into Inner and Outer:

"[A]n individual language is either more Outer and less Inner Language or vice versa, depending on the amount of typical Outer Language features characterizing that individual language."

For him, Nuristani, Dardic, and West Pahari are among the most Outer languages, transitional into Old Iranian historically. Within branches, Zoller claims Assamese and Odia have more Outer features than Bengali, and Konkani has more than Marathi.
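
Zoller's gradient formulation can be read as a proportion: the more diagnostic Outer features a language shows, the more Outer it is. The sketch below illustrates that reading only. It assumes features are binary and equally weighted, which the quoted passage does not specify, and the feature labels and lect names are hypothetical placeholders rather than features or languages taken from Zoller (2016).

```python
def outerness(language_features, outer_features):
    """Share of the diagnostic Outer features that a language exhibits (0.0 to 1.0)."""
    outer_features = set(outer_features)
    if not outer_features:
        raise ValueError("need at least one diagnostic feature")
    return len(set(language_features) & outer_features) / len(outer_features)


# Hypothetical placeholder labels, not features taken from Zoller (2016).
OUTER_FEATURES = {"feature_a", "feature_b", "feature_c", "feature_d"}
lect_x = {"feature_a", "feature_b", "feature_c"}  # shows 3 of the 4 features
lect_y = {"feature_a"}                            # shows 1 of the 4 features

score_x = outerness(lect_x, OUTER_FEATURES)
score_y = outerness(lect_y, OUTER_FEATURES)
```

On this reading `lect_x` is "more Outer" than `lect_y`, but no threshold separates the two groups, which is the point of the gradient view.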


The evidence underlying the hypothesis is shared innovations between the Outer languages. Zoller narrows the proof for the hypothesis down to two criteria: the Outer languages must show some Proto-Indo-European feature that is not reflected in Vedic Sanskrit (thus proving that they descend from a different OIA dialect) and also they must show stronger substrate influence from Dravidian and Munda (thus proving that they represent an earlier migration).[13]

Grierson's linguistic evidence was almost entirely refuted by Chatterji; some of the points that were discarded in modern formulations of the hypothesis are the preservation of final short vowels in semi-tatsamas (e.g. Bihari mūratⁱ "idol"), the movement of the Outer languages towards a more synthetic paradigm, and various innovative developments of the sibilants (s > ś, h, x).[3] However, Grierson's primary evidence of the past in -l- has remained the most important piece of linguistic evidence in all forms of the hypothesis.

Southworth, based on the earlier work by Grierson, adds historical correlates of these features in the Ashokan Prakrit inscriptions (the northwestern inscriptions reflect a different dialect from the eastern and western ones) and potentially as early as the dialectal variation of Vedic Sanskrit.[4] He also tightens the diagnostic evidence of the grouping, focusing only on exclusively shared innovations as a basis for genetic classification of languages. His evidence is:

  • Past forms in -l-: This form is found in the Outer IA languages to mark the past indicative and/or the past/perfective participle. Evidence suggests that it was suffixed to (rather than replacing) the original Sanskrit past form in -ta, e.g.: OIA ga-ta "gone" > MIA *ga-y-alla > Marathi gelā. The morpheme -alla/ulla/illa- was widespread in Middle Indo-Aryan as a general adjectival suffix (e.g. Hindi ag-lā "next" < MIA agg-alla). Southworth's examination of textual evidence finds that the past in -l- is attested earliest in Marathi, with variable attestation in the Eastern IA languages, suggesting a diffusion of the change from west to east rather than a fully shared innovation. Nevertheless, this is the strongest example of a shared Outer feature.
  • Verb forms in -(i)tavya: This was generalised from its necessitative use in late OIA into a gerund, an infinitive, and the future tense, while the Inner languages have a gerund and infinitive < OIA -anīya and assorted future forms.
  • Phonological change r̥ > a: The vocalic rhotic was changing into a non-rhotic vowel by the OIA stage, as evidenced by changes such as Proto-Indo-Iranian r̥H > OIA īr/ūr and variant forms in Vedic. By the time of Ashokan Prakrit the reflex is split between i in the Northwest and a elsewhere (i.e. in the East and West, corresponding to Outer IA). A modern example is OIA mr̥ttikā "earth" > Punjabi miṭṭī but Bengali māṭī, Marathi mātī.
  • Loss of length contrast in i/u: The length distinction from OIA i ~ ī and u ~ ū is lost in Outer languages, becoming positionally determined.
  • Word-initial accent: The Inner languages have a weight-based stress accent while the Outer languages have a default initial-syllable accent, reflected by vowel lengthening in that position, e.g. OIA karpāsa > Hindi kapās, but Marathi kāpus, Bengali kāpās.
  • Phonological change l > n: This change is largely confined to Maithili and broadly Eastern IA, but Southworth suggests cases of lexical diffusion from east to west bypassing the Madhyadeśa languages, and thus linguistic links between the two that persisted quite late.

Zoller (2016) further claimed that only the first feature is necessary to judge an Outer subfamily, by linking it to the Indo-European adjectival suffix *-ulo/elo/ilo- and *-ah₂-lo-, *-eh₁-lo- that is not preserved as broadly in Vedic Sanskrit (besides individual lexical items such as bahulá "thick"). He notes similar developments of a gerund in Tocharian and participle forms in Slavic. The gemination in the MIA forms is explained as a reflex of PIE *-Vl-yo as in Tocharian. He also suggested that the d ~ alternation and c, j > ċ, (d)z (in Nuristani, Dardic, and Pahari) are relevant features to the hypothesis but left their investigation for future work.

For Zoller, further linguistic evidence lay in substrate influence from Munda, Tibeto-Burman, and Dravidian found in the Outer languages. He postulates that north India was largely Munda-speaking, citing Munda substrata in the West Himalayish languages. He goes on to cite parallels between syllable structure in West Pahari and Munda (the sesquisyllabic structure), convergence of ideophones in Indo-Aryan with Munda, consonant fluctuations in Outer IA deśī vocabulary and Munda, and lexical parallels between Munda, Tibeto-Burman, Burushaski, and Outer IA.[14]

There is a historical split between the Indo-Aryans of the Madhyadeśa region (focused on modern-day western Uttar Pradesh, and more generally the Indo-Gangetic plain) and Indo-Aryans of other regions. Religious texts divide the lands of India into ārya "Aryan" and mleccha "barbarian", the latter including all of the non-Madhyadeśa regions even after their Aryanisation and adoption of Indo-Aryan languages. Grammarians such as Patañjali relate that the dialect of the Asuras (demons) had the change r > l, a distinctly eastern Indo-Aryan change. Listings such as the Pañca-draviḍa group Gujarat and Maharashtra with the non-Indo-Aryan Deccan.[15]


The first point-by-point refutation of the Inner–Outer hypothesis was by Chatterji (1926, pp. 150–169), in response to the evidence put forth in Grierson (1920). All of the phonological and morphological features connecting the Outer languages cited by Grierson were found to be either coincidental retentions (only shared innovations are diagnostic of language relations) or faultily grouped. As an example, he strikes down the retention of final short vowels as an Outer feature, noting that (1) the loss of short vowels is in progress in all Indo-Aryan languages, which are simply at different stages of it, and (2) a retention is not a shared innovation and thus not diagnostic of a language grouping. As Zoller (2016, p. 76) notes, "Chatterji's rejection of the hypothesis brought the discussion to an effective standstill until it was revived almost hundred years later by Franklin Southworth."

After Southworth's work was published, Cardona & Jain (2003, p. 26) responded, "I think it fair to say that these [Southworth's] conclusions are not sufficiently backed up by detailed facts about the chronology of changes to merit their being accepted as established"; that is to say, for many of the supposed differences between Inner and Outer, there is no compelling historical evidence attesting that they reflect OIA divisions and not more recent changes or areal diffusions.

Cathcart (2020) conducted a probabilistic assessment of the Inner–Outer hypothesis using various statistical approaches to modelling sound change (adopting the suggestion of phonology-first analysis put forth by Masica) based on data from the Comparative Dictionary of the Indo-Aryan Languages compiled by Ralph Lilley Turner. His logistic normal distribution model found evidence for a core-periphery distinction, while the Dirichlet distribution model was less convincing. Cathcart concluded that "neither model provides full support for" the Inner–Outer hypothesis, but there is "at least vague support for an areal core and periphery" that could be in line with Zoller's model but not with Southworth's.
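
The two distributions named above are both distributions over probability vectors (non-negative values summing to 1). The sketch below is a generic illustration of how a draw from each can be generated; it is not a reconstruction of Cathcart's models, whose structure and parameters are not described here. The function names and all parameter values are assumptions chosen for the example.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalising gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_logistic_normal(mean, sd, rng):
    """Draw one probability vector by applying softmax to independent Gaussian draws."""
    z = [rng.gauss(m, s) for m, s in zip(mean, sd)]
    top = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # fixed seed so the draws are reproducible
# Illustrative parameters only: three outcomes, no outcome favoured a priori.
p_dirichlet = sample_dirichlet([1.0, 1.0, 1.0], rng)
p_logistic_normal = sample_logistic_normal([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rng)
```

One general difference between the two is that a logistic normal can be given a full covariance matrix over its Gaussian components (the sketch uses independent components for brevity), whereas a Dirichlet has a single concentration parameter per outcome.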

Without taking a side in the debate, Stroński & Verbeke (2020) studied isoglosses from a diachronic perspective, examining morphosyntactic alignment in Awadhi (a transitional language) and V2 word order in Kashmiri (thought to be Outer), and compared both to Pahari in order to make sense of Zoller's Inner–Outer hypothesis and Peterson's East–West hypothesis. They argued that synchronic features are not sufficient for assessing the validity of these hypotheses, as shown by the complex history and boundaries of the features they examined.

Following work on the structural typology of Indo-Aryan, Munda, and Dravidian, Peterson (2017) proposed an East–West split in Indo-Aryan, with Eastern Indo-Aryan and Bihari undergoing historical convergence with Munda due to a long period of contact. Some of the Eastern features he put forth supporting this proposal are lack of ergativity, loss of gender marking, numeral classifiers, lack of oblique nominal stems, and lack of attributive agreement.[16] He does not explicitly reject the Inner–Outer hypothesis in the text, but his grouping puts Marathi–Konkani as closer to the Central IA languages (e.g. Hindi) than to Eastern IA, so it is incompatible with the Inner–Outer hypothesis.



