Most common words in English
Studies that estimate and rank the most common words in English examine texts written in English. Perhaps the most comprehensive such analysis is one that was conducted against the Oxford English Corpus (OEC), a very large collection of texts from around the world that are written in the English language. A text corpus is a large collection of written works that are organised in a way that makes such analysis easier.
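As a minimal sketch of what such an analysis involves, the following Python snippet counts word forms in a short sample text and ranks them by frequency. The sample text, the tokenising regular expression, and the function name `rank_words` are illustrative assumptions; this is not the procedure used for the OEC or any other corpus.

```python
import re
from collections import Counter

def rank_words(text, top_n=5):
    """Return the top_n most frequent word forms as (word, count) pairs."""
    # Lower-case the text and treat runs of letters and apostrophes as tokens.
    # This tokenisation rule is an assumption made for illustration only.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

# Hypothetical sample text standing in for a corpus.
sample = "The cat sat on the mat. The dog sat by the door, and the cat slept."
print(rank_words(sample, top_n=3))
```

A real corpus study works at a far larger scale and typically adds steps such as part-of-speech tagging, which this sketch leaves out.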
In total, the texts in the Oxford English Corpus contain more than 2 billion words. The OEC includes a wide variety of writing samples, such as literary works, novels, academic journals, newspapers, magazines, Hansard's Parliamentary Debates, blogs, chat logs, and emails.
Another English corpus that has been used to study word frequency is the Brown Corpus, which was compiled by researchers at Brown University in the 1960s. The researchers published their analysis of the Brown Corpus in 1967. Their findings were similar, but not identical, to the findings of the OEC analysis.
According to The Reading Teacher's Book of Lists, the first 25 words in the OEC make up about one-third of all printed material in English, and the first 100 words make up about half of all written English. According to a study cited by Robert McCrum in The Story of English, all of the first hundred of the most common words in English are of Anglo-Saxon origin, except for "people", ultimately from Latin "populus", and "because", in part from Latin "causa".
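A coverage figure of this kind can be computed as the share of all running words accounted for by the top-ranked entries. The sketch below assumes a table of word counts is already available; the counts shown are invented toy values, not figures from the OEC, and the function name `coverage` is hypothetical.

```python
from collections import Counter

def coverage(counts, top_n):
    """Fraction of all running words covered by the top_n most frequent entries."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / total

# Hypothetical toy counts, not taken from any real corpus.
toy_counts = Counter({"the": 50, "of": 25, "and": 15, "corpus": 6, "lexeme": 4})
print(coverage(toy_counts, 1))  # 0.5
print(coverage(toy_counts, 2))  # 0.75
```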
Some lists of common words distinguish between word forms, while others rank all forms of a word as a single lexeme, represented by its lemma (the form of the word as it would appear in a dictionary). For example, the lexeme be (as in to be) comprises all its conjugations (is, was, are, were, etc.) and contractions of those conjugations. The top 100 lemmas listed below account for about 50% of all the words in the Oxford English Corpus.
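The difference between counting word forms and counting lemmas can be illustrated with a short Python sketch. The mapping `FORM_TO_LEMMA` is a small hand-written table assumed for this example, and `count_lemmas` is a hypothetical helper; an actual study would rely on a full lemmatiser.

```python
from collections import Counter

# Hypothetical, hand-written mapping from word forms to lemmas.
# Forms not listed here are treated as their own lemma.
FORM_TO_LEMMA = {"is": "be", "was": "be", "are": "be", "were": "be"}

def count_lemmas(tokens):
    """Count tokens after folding each known form into its lemma."""
    counts = Counter()
    for token in tokens:
        form = token.lower()
        counts[FORM_TO_LEMMA.get(form, form)] += 1
    return counts

tokens = "It was late and they were tired but she is here and we are ready".split()
form_counts = Counter(t.lower() for t in tokens)
lemma_counts = count_lemmas(tokens)

print(form_counts.most_common(1))   # most frequent word form
print(lemma_counts.most_common(1))  # most frequent lemma
```

In this toy sentence no single form of be occurs more than once, yet the lemma be outranks every other entry once its forms are grouped, which is why the two kinds of list can differ.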
100 most common words
A list of the 100 words that occur most frequently in written English is given below, based on an analysis of the Oxford English Corpus (a collection of texts in the English language, comprising over 2 billion running words). A part of speech is provided for most of the words, but part-of-speech categories vary between analyses, and not all possibilities are listed. For example, "I" may be a pronoun or a Roman numeral; "to" may be a preposition or an infinitive marker; "time" may be a noun or a verb. Also, a single spelling can represent more than one root word. For example, "singer" may be a form of either "sing" or "singe". Different corpora may treat such cases differently.
The table also includes ranks from other corpora. Note that, in addition to usage differences, lemmatisation may differ from corpus to corpus, for example by splitting the prepositional use of "to" from its use as a particle. The COCA list also uses dispersion as well as frequency to calculate rank.
|Word||Parts of speech||OEC rank||COCA rank||Dolch level|
|in||Preposition||7||6, 128, 3038||Pre-primer|
|that||Conjunction et al.||8||12, 27, 903||Primer|
|not||Adverb et al.||13||28, 2929||Pre-primer|
|as||Adverb, conjunction, et al.||17||33, 49, 129||Grade 1|
|this||Determiner, adverb, noun||21||20, 4665||Primer|
|but||Preposition, adverb, conjunction||22||23, 1715||Primer|
|his||Possessive pronoun||23||25, 1887||Grade 1|
|by||Preposition||24||30, 1190||Grade 1|
|say||Verb et al.||28||19||Primer|
|her||Possessive pronoun||29||42, 106||Grade 1|
|will||Verb, noun||33||48, 1506||Primer|
|one||Noun, adjective, et al.||35||51, 104, 839||Pre-primer|
|there||Adverb, pronoun, et al.||38||53, 116||Primer|
|their||Possessive pronoun||39||36||Grade 2|
|what||Pronoun, adverb, et al.||40||34||Primer|
|so||Conjunction, adverb, et al.||41||55, 196||Primer|
|up||Adverb, preposition, et al.||42||50, 456||Pre-primer|
|about||Preposition, adverb, et al.||45||46, 179||Grade 3|
|when|| ||51||57, 136||Grade 1|
|like|| ||54||74, 208, 1123, 1684, 2702|
|no|| ||56||93, 699, 916, 1111, 4555|
|other|| ||70||75, 715, 2355|
|back|| ||81||108, 323, 1877|
|use||Verb, noun||83||92, 429|
|work||Verb, noun||87||117, 199|
|new||Adjective et al.||92||88|
See also
- Basic English
- Frequency analysis, the study of the frequency of letters or groups of letters
- Letter frequencies
- Oxford English Corpus
- Swadesh list, a compilation of basic concepts for the purpose of historical-comparative linguistics
- Zipf's law, a theory stating that the frequency of any word is inversely proportional to its rank in a frequency table
- "The Oxford English Corpus: Facts about the language". OxfordDictionaries.com. Oxford University Press. "What is the commonest word?". Archived from the original on December 26, 2011. Retrieved June 22, 2011.
- "The Oxford English Corpus". AskOxford.com. Retrieved June 22, 2006.
- The First 100 Most Commonly Used English Words.
- Bill Bryson, The Mother Tongue: English and How It Got That Way, Harper Perennial, 2001, page 58
- Benjamin Zimmer. June 22, 2006. "Time after time after time...". Language Log. Retrieved June 22, 2006.
- "Word frequency: based on 450 million word COCA corpus". www.wordfrequency.info. Retrieved 11 April 2018.