Wikipedia:WikiProject Languages

This WikiProject aims primarily to provide a consistent treatment of each human language on Wikipedia. Many languages already have extensive pages, but the systematic information on those pages is not presented in a consistent way. The purpose of this WikiProject is to present that information consistently, and to ensure that each of the major areas is covered at least briefly for each language.

These are only suggestions, things to give you focus and to get you going, and you shouldn't feel obligated in the least to follow them. However, try to stick to the format for the Infobox for each language. See the template for an example Infobox.

The easiest way to start writing about a language that doesn't already have an article, or to convert an existing article to the WikiProject format, is to start with the template.

Article alerts

  Notice: An RFC on the future of Wikipedia portals is ongoing.

Articles for deletion
Good article nominees
Featured article reviews

Quality articles

Article assessment

Place the {{WikiProject Languages}} project banner template on the talk pages of any language-related articles. To rate the article on the quality scale, add one of the following parameters:

  • class=FA for featured articles
  • class=A for A-class articles
  • class=GA for good articles
  • class=B for B-class articles
  • class=start for Start-class articles
  • class=stub for Stub-class articles (which may not necessarily have a "stub" message on them!)
  • class=NA for non-articles (templates, images, etc.)

See WP:GRADES for pointers on classification.
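
As a minimal sketch of how the banner and its class parameter fit together, the following Python helper builds the wikitext for a talk page. The function name banner_wikitext and the VALID_CLASSES tuple are hypothetical, introduced only for illustration; the template name and the class values are the ones listed above.

```python
# Quality-scale values as listed above. The helper below is hypothetical
# and only assembles the wikitext string; it does not edit any page.
VALID_CLASSES = ("FA", "A", "GA", "B", "start", "stub", "NA")


def banner_wikitext(quality_class=None):
    """Return the project banner, optionally with a class= parameter."""
    if quality_class is None:
        return "{{WikiProject Languages}}"
    if quality_class not in VALID_CLASSES:
        raise ValueError(f"unknown class: {quality_class!r}")
    return "{{WikiProject Languages|class=" + quality_class + "}}"


print(banner_wikitext("B"))
```

Rejecting unknown values is a design choice of this sketch, not a documented behavior of the template.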


Index · Statistics · Log

Article names

Most language articles should be on a page titled XXX language. Reasons for this recommendation:

  1. Ambiguity. While some languages have special forms that refer unambiguously to the language, English is inherently ambiguous about language names. Having a standard of "XXX language" ensures that it's always unambiguous. There is always the possibility of "XXX literature", "XXX grammar", but these cannot be referred to simply as "XXX", and so are not a reason for disambiguation.
  2. Precedent. This is how Encyclopædia Britannica and many other English-language encyclopedias name their articles.
Please note that when there is nothing to disambiguate a language name from, such as Hindi, Esperanto or Inuktitut, there is no need for the "language". See Wikipedia:Naming conventions#Languages, both spoken and programming and Wikipedia:Naming conventions (languages) for the relevant naming policy.

Whether the varieties of Arabic and Chinese should be called "languages" or "dialects" continues to be a highly controversial issue. The current convention is: use NAME + Arabic for Arabic varieties (e.g. Egyptian Arabic) and NAME + Chinese for Chinese varieties (e.g. Mandarin Chinese). Infoboxes are put at both Arabic language and Chinese language and at their first-level subdivisions.
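
The naming conventions above can be sketched as a small decision function. This is an illustration only: article_title and its parameters are hypothetical names, and deciding whether a bare name is ambiguous still takes editorial judgment.

```python
def article_title(name, ambiguous=True, macro=None):
    """Suggest an article title under the conventions described above.

    name      -- the language or variety name, e.g. "Hindi" or "Egyptian"
    ambiguous -- whether the bare name could refer to something other than
                 the language (an editorial judgment, assumed as input here)
    macro     -- "Arabic" or "Chinese" for varieties of those languages
    """
    if macro in ("Arabic", "Chinese"):
        # NAME + Arabic / NAME + Chinese for varieties
        return f"{name} {macro}"
    if not ambiguous:
        # Nothing to disambiguate from, so no "language" suffix
        return name
    return f"{name} language"


print(article_title("Egyptian", macro="Arabic"))
print(article_title("Hindi", ambiguous=False))
print(article_title("Southern Luri"))
```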

Even in cases in which there is a consensus that varieties of a language have dialect status, the number of such dialects and the divisions between them are often vaguely defined, and controversies exist among dialectologists over whether certain varieties should be treated in a unified way or are best understood as separate though related varieties. Separate articles should only be written on varieties (e.g., Estuary English) or related groups of varieties (e.g., Hispanic English) that have been studied well enough by linguists that at least a minimal body of literature exists about that variety or group of varieties as a distinct dialect or group of dialects. Phonological, morphosyntactic, or lexical variation that may be considered subdialectal should be noted as "differences within X dialect", where X is a dialect as discussed in the relevant literature. Controversies over dialect status can be noted in articles as such, but should also be based on citable work. In the title, names used for the dialect in the relevant literature should be preferred over folk-linguistic terms (e.g., Inland North rather than Midwestern Accent).

Article structure

There are templates for the structure of articles about spoken (oral) languages at /Template and for signed languages at /Template (sign language).

Open tasks



Population data has been mostly updated from Ethnologue 16 to 17. However, an unknown number of articles which did not have the ref field set to "e16" slipped through the cracks; an example is Cumanagoto, which did not have a ref'd population figure because E16 had mistakenly listed it as extinct. Articles which are not ref'd to Ethnologue could be checked in case E17 has a more recent figure.

User:PotatoBot helps keep ISO redirects in sync with changing WP articles and ISO standards. The results of the latest run are displayed at ISO 639 log and ISO 639 language articles missing.

Names at Spurious_languages#Spurious_according_to_Glottolog with asterisks have not been addressed.

Articles to be created

Red links should either be redirected or have their own articles.

Articles with red links

99.9% of ISO language names have articles, though not always one-to-one (e.g. Fulani, Zhuang, and Mazatec); the few which do not are spurious, dubious, or insufficiently attested to justify their own article, and are redirected to an article stating that.

Lists for evaluation

The lists below are of self-links in our articles, language names from various sources which do not have articles or redirects, and suspicious cases to keep track of.

Lists of obscure names from common refs
  • 48 at INALI names for Mexican languages (27 Mixtec & 6 Nahuatl to be reviewed; 12 Zapotec & 3 others attempted). Even blue links may be wrong, due to confusion of similar town names or misidentification at Ethnologue.
  • 7 potential languages with data. The AIATSIS database is periodically updated, with new languages confirmed.
Ethnologue 11
  • Holima ["near Dobu" – misreading of Molima?], Waelulu ["existence unconfirmed"; taken from V&V]
Voegelin (1977)
36 red-linked names; the list doesn't bother with red links for what Loukotka says is unattested.
Blue links have not been checked. Many are presumably inadvertent homonyms rather than the language intended by V&V.
Ruhlen (1987)
  • S.Am.: 12 (see key) extremely obscure names of mostly unattested languages, not even listed in Campbell & Grondona 2012, and for only a few does Loukotka say anything other than 'unknown'. Those not found in Loukotka might be copy errors.
There are also at least half a dozen names in Ruhlen which take you to what is apparently the wrong article. One is a typo, 3 are unidentified, and 2 have perhaps just been reclassified.
Campbell & Grondona
Linguist List local-use ISO
25 at Talk:Glottolog#Unclassified_languages
93 more at Wikipedia:WikiProject Languages/Glottolog languages without ISO codes -- both for Glottolog 2.2.

Lists not yet updated for Glottolog 2.3!

Circular and suspicious links
Identity suspect
Nshi, Sotatipo, Lui, Pasto (wrong ISO?), Kanamarí and Karipuná (contradicted by E17), Gulei (marked "?" in list), Sonde, Ngoni, Pretoria-Tsonga (marked "§" in list) & Mangala
Circular links of ISO names with summary data
Loloish, Qiangic (3 listed + old name Pingfang, which I can't ID), unclassified Asian (Bhatola: presumably a Gond dialect, Warduji: presumably a Persian dialect), Hindi (Ghera: Pakistani enclave of unidentified Indian language), conlang codes (Kotava, Romanova: old articles were deleted as not-notable)
Cases to track
No 1-to-1 correspondence to ISO
Tracking only; no need to fix.
Gbaya language (Central African Republic), Gbaya language (Sudan), Syriac language
ISO languages without info box
Typically because there are problems in defining the language. Tracking only; no need to fix.
Minor languages covered in family article: Loloish (4)
Language uncertain: Mina, Majhwar
Rd. to script or history article: Epi-Olmec (undeciphered), Ancient Zapotec, Middle Korean
Rd. to spurious-language article: Parsi-Dari, Parsi, Tapeba
Newly discovered or unattested languages without ISO codes
Lubu (unattested and extinct)
Cuyama (unattested and extinct)

Requests for expansion

Images for articles in Category:Wikipedia requested photographs of languages.

Requests for attention

(no article Ashéninka people; Keres functions as the lang article but reads as a family article)

Tagged categories

Category:Articles lacking sources

Only language varieties are included here. Subjects such as 'French language in Jordan' and 'Westernized Chinese language', though in bad shape, are not listed because they would not be representative of the many unreferenced articles that are not about specific varieties.

  • 2004–2014: (only articles with 'language', 'dialect', 'creole', or 'pidgin' in name are included; distilled from an insane number of articles)
English: Manningham accent, Jewish English languages
Germanic: Central Franconian dialects, Eastphalian dialect, Hamburgisch dialect, Norwegian dialects, Orsamål dialect, Ripuarian language, Sognamål dialect
Romance: Chipilo Venetian dialect, Comasco-Lecchese dialects, Fornes dialects, Pavese dialect, Sabino dialect, Sutsilvan dialects (Romansh)
Slavic: Debar dialect, Reka dialect, Strumica dialect
Maltese: Qormi dialect, Żejtun dialect
Chinese: Luoyang dialect, Mango dialect, Qihai dialect, Weihai dialect, Ningbo dialect, Ganyu dialect, Fu'an dialect, Xuzhou dialect
other: Kfar Kama Adyghe dialect (Adyghe), Enuani dialect (Igbo), Thanjavur Marathi dialect, South Korean standard language

Category:Orphaned articles

(same search terms as missing sources)

Ordek-Burnu language (moved to 'stele')

Open ISO issues

The following ISO change requests from previous years were still open in 2016 Jan. The articles should be updated if they are accepted. (See the current list, reviewed to 2016-06.)

Old open ISO change requests[4]
Request   Code  Name                      Action
2006-084  gkm   Medieval Greek            Create
2009-060  ecg   Ecclesiastical Greek      Create
2009-081  elr   Katharevousa Greek        Create
2011-041  vsn   Vedic Sanskrit            Create
2011-165  jpd   Pando                     Create
2011-171  jkt   Kantana                   Create
2012-090  lgo   Looma macrolanguage       members Toma [tod] and Loma [lom] (add as iso3 to Loma)
2015-005  fmu   Far Western Muria         Update
2015-048  myi   Mina (India)              Retire
2017-009  dwz   Dewas Rai                 Create
2017-020  dno   Ndrulo (Northern Lendu)   Create

Articles proposed for deletion

including WP:AFD, WP:PROD and other processes

Articles to watch

The following are language articles which come under repeated POV attack, often for ethnic or nationalistic reasons. Feel free to add ones you've noticed, and to remove languages which have not been a problem for some time. That way, if one of us drops out from editing, the articles we've been watching hopefully won't go to pot.

(Note: Ethnologue 17 and the Swedish Nationalencyklopedin use Indian census data, which is not an RS because it does not have a consistent definition of Hindi. For example, part of the Awadhi population is listed under Awadhi, but most is counted as Hindi. This problem is acknowledged in the presentation of the census results, but has gotten lost in secondary sources.)
  • Serbo-Croatian & Croatian (subject to ARBMAC)
  • Saraiki dialect, Punjabi dialects, and "Panjistani" (requires text searches to purge repeated additions of contradictory claims of "Panjistani" to multiple articles)
  • Southern Luri language. It may be worthwhile splitting the Luri article, but so far the attempts to do so have been incompetent and motivated by OR redefinition of the language. The present description of the two varieties in the Luri article is so intertwined that splitting them would create something close to a content fork. — kwami (talk) 02:32, 4 September 2015 (UTC)
  • Assyrian Neo-Aramaic and Chaldean Neo-Aramaic, along with the ethnic articles. A seemingly chronic ethnic dispute.
  • Luganda and Baganda: deletion of ISO name
  • Misleading maps: Many national languages have had maps with half the world filled in because of emigration, with no apparent standard for what counts as a speaking population. Most of these will be caught by checking the top 100 at List of languages by number of native speakers.

Interpreting Ethnologue data

Ethnologue is the default source for language data on WP. There are several obvious advantages to Ethnologue, besides its universal accessibility: For many languages, it's all we have. For others, it provides a check on the politicization and population inflation that we experience when we allow advocates of the language to cherry-pick sources. Nonetheless, Ethnologue data needs to be carefully evaluated, and if possible, its sources should be verified and cited directly, or better sources used instead of Ethnologue where these are known. There are a few common and serious problems:

  • The family trees are auto-generated, and should not be relied on. Auto-generation is skewed by idiosyncratic entries in the language articles. In E16, for example, the Maban family was listed as a branch of the Luo languages, because one of the Luo languages was named Maban; meanwhile, there were two separate Luo branches of Nilotic because the spelling of "Luo" did not match across articles. The more obvious problems of this sort have been remedied in E17, but the trees are still not an RS for classification, and the nodes are not RSs for the languages in a particular group. Many of our articles say that there are X languages in the Y branch, based on Ethnologue, but all that can be relied on is the classification cited in individual Ethnologue articles.
  • Speaker data is inconsistent. For instance, in E14, Gawwada was cited as having 32,698 mother-tongue speakers, including 27,477 monolinguals, based on the 1998 census. In E17, it is cited as having 68,600 speakers based on the 2007 census, but still 27,500 monolinguals. Taken at face value, that would mean the monolingual share fell from about 84% to about 40%; there is no reason to think that the percentage changed so drastically in ten years, so adding the cited number of monolinguals to a Wikipedia article would be irresponsible. Similarly, the cited size of the ethnic group may be only half the cited number of speakers, because the ethnic figure is several decades older. If the number of monolinguals or ethnic members is not given a citation date by Ethnologue, it is useless and should not be repeated by us. The number of speakers and the dialects of the language may be from different sources, with the result that the number of speakers may not be that of all dialects. Very commonly, when a language is named after one of its dialects, the speaker number is that of the dialect, not of the language as a whole. Also, a language may be split up into separate ISO codes, with the result that one article covers one variety but inherits the number of speakers of all varieties from the old article. Ethnologue has handled this well in recent years, but has not been able to go back and fix such errors inherited from old editions.
  • Ethnologue's arithmetic is consistently bad. For instance, Ethnologue lists five Central Iranian languages as having had 7,030 speakers reported in 2000. It appears that their source listed 35,000 speakers total, and Ethnologue divided that figure by 5 for the individual articles, with no indication that the result was no more than a guess. This kind of problem is not uncommon. Even more commonly, Ethnologue will add together incompatible data from various sources, paying no attention to significant figures. For example, if one source reported 2 to 5 million speakers in country A in 1975, and another 5 to 10 thousand in country B in 2006, Ethnologue will report the total as 3,507,500 speakers (3.5 million, the median of 2 and 5 million, plus 7,500, the median of 5–10,000). Old editions such as E14 are actually more reliable in this regard, as they tend to note that the estimate for country A was 2 to 5 million, when later editions will simply report 3.5 million as if that were the figure in the source. If the original source cannot be verified, we should at least look at each of the figures that make up the total and redo the math, so that we avoid spurious precision as much as practicable.
  • Dates are not reliable indicators of when the data was taken. Unless they are census data, which has the problem all censuses do of speakers intentionally misreporting their language, the dates given by Ethnologue are the date of publication of their source. They can be several decades after when the data was collected. The result is that an old date may report the same or more recent data than a newer date. For instance, several Australian languages are cited as "SIL 2011" in E17. However, in E16 they all had the same numbers of speakers cited to "Wurm and Hattori 1983". In other cases the source that Ethnologue uses may cite an old edition of Ethnologue, or the source that Ethnologue used in an old edition. And the sources themselves may have problems that are not mentioned in Ethnologue. For instance, one source from the 1990s notes that its numbers are copied from a publication from the 1980s that was based on field work in the 1950s. In the Ethnologue entry, however, only the date from the 1990s is given. For another example, the data for the Hindi languages was updated between E16 and E17, based on the new Indian census. However, the census makes it clear that many Awadhi speakers, for example, reported their language to be "Hindi" rather than Awadhi. The result is that the E17 figure for Hindi is inflated by perhaps 100 million people who should be listed under other languages, but there is no warning about this in Ethnologue. Many entries are also undated. Some of these are recent oversights that will be fixed in the next edition, but many are inherited from old editions of Ethnologue. In such cases, citing the edition of Ethnologue that first reported the figure might give the reader some indication that it is not recent data.
  • Figures may be ethnic numbers and an order of magnitude greater than the actual number of speakers. A good start in cleaning this up has been made in E17, but it's not clear how complete it is.
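
One way to redo the math for the country A and country B example above is to carry the ranges through the sum and round only at the end. The following Python sketch contrasts that with the midpoint-sum approach described in the arithmetic point. The helper names round_sig and combine_ranges are hypothetical, and rounding to one significant figure is an assumption chosen to match the precision of the original estimates.

```python
from math import floor, log10


def round_sig(x, sig=1):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    return round(x, sig - 1 - floor(log10(abs(x))))


def combine_ranges(ranges):
    """Add (low, high) estimates end to end instead of adding midpoints."""
    low = sum(lo for lo, hi in ranges)
    high = sum(hi for lo, hi in ranges)
    return low, high


# Country A: 2 to 5 million; country B: 5 to 10 thousand (figures from the text)
estimates = [(2_000_000, 5_000_000), (5_000, 10_000)]

# Midpoint sum, which yields a spuriously precise total
midpoint_total = sum((lo + hi) / 2 for lo, hi in estimates)
print(midpoint_total)  # 3507500.0

# Summed range, rounded to the precision the sources actually support
low, high = combine_ranges(estimates)
print(round_sig(low), round_sig(high))  # 2000000 5000000
```

Reported this way, the total stays "2 to 5 million": the second estimate is too small to change the figure at the precision of the first.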

Such problems are understandable: Ethnologue is an enormous project with a very small editorial team and budget. For years, Ethnologue had a reputation for being unresponsive, so many linguists do not bother to correct the errors they find, but since ca. 2012 they have been appreciative of feedback.

Linguist List / Multitree includes a large number of language names not found in Ethnologue, but their identification is highly unreliable, and can often be seen to be spurious with even a cursory glance at the literature. Glottolog[5] often does a better job than either of these sources, for instance in verifying and updating classifications, in marking languages as 'spurious' when they cannot be verified to exist, and in specifying their sources, but cannot be relied on for dialects, where they blindly copy Multitree. Global Recordings Network copies much of its data from Ethnologue, misidentifies alternative names as languages, and contradicts itself with speaker numbers. In all these cases, primary sources should be used to check for the accuracy of such claims.



Project banner

Please add {{WikiProject Languages}} to talk pages of relevant articles. Articles with this template are put into Category:WikiProject Languages articles.


Language stubs should be tagged with the most appropriate template of these:


After you sign up, you can add the project userbox to your user page by adding the following: {{User WikiProject Languages}}. Your username will then automatically be added to the Category:WikiProject Language members.

Related WikiProjects

Project volunteers

If you'd like to help out, be contacted by others interested in this WikiProject's subject, and receive task assignments and project-related updates on your talk page, please add your name here:

