Content analysis is a research method for studying documents and communication artifacts, which can be texts of various formats, pictures, audio or video. Social scientists use content analysis to quantify patterns in communication in a replicable and systematic manner.[1] One of the key advantages of this research method is that it analyses social phenomena non-invasively, in contrast to simulating social experiences or collecting survey answers.

Practices and philosophies of content analysis vary between scholarly communities. They all involve systematic reading or observation of texts or artifacts which are assigned labels (sometimes called codes) to indicate the presence of interesting, meaningful patterns.[2][3] After labeling a large set of media, a researcher is able to statistically estimate the proportions of patterns in the texts, as well as correlations between patterns.
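
The estimation step described above can be illustrated with a minimal sketch. The coded documents and code names below are hypothetical, and the phi coefficient is only one of several possible ways to express a correlation between two binary codes; it is used here as an assumption, not as a method prescribed by the sources.

```python
from math import sqrt

# Hypothetical coded data: each document is the set of codes assigned to it.
coded_docs = [
    {"economy", "crime"},
    {"economy"},
    {"crime"},
    {"economy", "crime"},
    set(),
    {"economy"},
]

def proportion(docs, code):
    """Share of documents that carry the given code."""
    return sum(code in d for d in docs) / len(docs)

def phi(docs, a, b):
    """Phi coefficient: Pearson correlation between two presence/absence codes."""
    n11 = sum(a in d and b in d for d in docs)
    n10 = sum(a in d and b not in d for d in docs)
    n01 = sum(a not in d and b in d for d in docs)
    n00 = sum(a not in d and b not in d for d in docs)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If either code is constant across documents the correlation is undefined;
    # this sketch returns 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

print(proportion(coded_docs, "economy"))
print(phi(coded_docs, "economy", "crime"))
```

With real data, the proportions would normally be reported with a measure of sampling uncertainty, which this sketch omits.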

Computers are increasingly used in content analysis, to automate the labeling (or coding) of documents. Simple computational techniques can provide descriptive data such as word frequencies and document lengths. Machine learning classifiers can greatly increase the number of texts which can be labeled, but the scientific utility of doing so is a matter of debate.
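
As a rough illustration of the simple descriptive techniques mentioned above, the following sketch counts word frequencies and document lengths. The tokenization rule (lowercase runs of letters and apostrophes) and the two sample sentences are assumptions made for the example; real studies usually need more careful preprocessing.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z']+", text.lower())

def describe(documents):
    """Return per-document lengths (in tokens) and corpus-wide word frequencies."""
    lengths = []
    frequencies = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        lengths.append(len(tokens))
        frequencies.update(tokens)
    return lengths, frequencies

# Hypothetical two-document corpus.
docs = [
    "Taxes rise as the economy slows.",
    "The economy grows; taxes fall.",
]
lengths, freqs = describe(docs)
print(lengths)
print(freqs.most_common(3))
```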


Goals of Content Analysis

Content analysis is best understood as a broad family of techniques. Effective researchers choose techniques that best help them answer their substantive questions. That said, according to Klaus Krippendorff, six questions must be addressed in every content analysis:[4]

  1. Which data are analyzed?
  2. How are the data defined?
  3. From what population are data drawn?
  4. What is the relevant context?
  5. What are the boundaries of the analysis?
  6. What is to be measured?

The simplest and most objective form of content analysis considers unambiguous characteristics of the text such as word frequencies, the page area taken by a newspaper column, or the duration of a radio or television program. Analysis of simple word frequencies is limited because the meaning of a word depends on surrounding text. Keyword In Context routines address this by placing words in their textual context. This helps resolve ambiguities such as those introduced by synonyms and homonyms.
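
A Keyword In Context listing can be sketched in a few lines. The function name `kwic`, the window size and the sample sentence are illustrative assumptions; dedicated concordance software offers far more options. The example uses the homonym "bank" to show how the surrounding words separate the two senses.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

sample = "She sat on the river bank. The bank approved her loan."
for left, kw, right in kwic(sample, "bank"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>20} | {kw} | {right}")
```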

A further step in analysis is the distinction between dictionary-based (quantitative) approaches and qualitative approaches. Dictionary-based approaches set up a list of categories derived from the frequency list of words and examine the distribution of words and their respective categories across the texts. Whereas quantitative content analysis thus transforms observations of found categories into quantitative statistical data, qualitative content analysis focuses more on intentionality and its implications. There are strong parallels between qualitative content analysis and thematic analysis.[5]
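
A dictionary-based approach of the kind described above might look like the following sketch. The category dictionary, its word lists and the text identifiers are invented for illustration; in practice the categories would be derived from the corpus and validated by the researcher.

```python
import re
from collections import Counter

# Hypothetical category dictionary: category name -> words that signal it.
CATEGORIES = {
    "economy": {"tax", "taxes", "jobs", "inflation"},
    "security": {"crime", "police", "border"},
}

def code_text(text, categories=CATEGORIES):
    """Count how many tokens of the text fall into each dictionary category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for category, words in categories.items():
        counts[category] = sum(tok in words for tok in tokens)
    return counts

# Hypothetical texts to be coded.
texts = {
    "manifesto_a": "Lower taxes and more jobs.",
    "manifesto_b": "More police to fight crime and inflation.",
}
for name, text in texts.items():
    print(name, dict(code_text(text)))
```

Exact word matching is the simplest design; it misses inflected forms and cannot distinguish word senses, which is one reason dictionary results are usually checked against human coding.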

Computational Tools

More generally, content analysis is research using the categorization and classification of speech, written text, interviews, images, or other forms of communication. In its beginnings, using the first newspapers at the end of the 19th century, analysis was done manually by measuring the number of lines and the amount of space given to a subject. With the rise of common computing facilities like PCs, computer-based methods of analysis are growing in popularity. Answers to open-ended questions, newspaper articles, political party manifestos, medical records or systematic observations in experiments can all be subject to systematic analysis of textual data.

When the contents of communication are available in the form of machine-readable texts, the input is analyzed for frequencies and coded into categories from which inferences are built.


Robert Weber notes: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way".[6] Validity, inter-coder reliability and intra-coder reliability have been the subject of intense methodological research for many years.[4] Neuendorf suggests that when human coders are used in content analysis, two coders should be used. Reliability of human coding is often measured using a statistical measure of intercoder reliability, or "the amount of agreement or correspondence among two or more coders".[7]
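
The text does not name a specific reliability statistic. As one illustration, the sketch below computes simple percent agreement and Cohen's kappa, a widely used chance-corrected measure for two coders. The choice of kappa and the two lists of codes are assumptions made for the example.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of units on which the two coders assigned the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must code the same non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Expected agreement if each coder assigned codes independently
    # according to their own marginal frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1:
        return 1.0  # both coders used a single identical code throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes assigned by two coders to the same eight texts.
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

print(percent_agreement(coder_1, coder_2))
print(cohens_kappa(coder_1, coder_2))
```

Here the coders agree on six of eight texts, but half of that agreement would be expected by chance given their marginal frequencies, so kappa is noticeably lower than the raw agreement rate.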

Kinds of Text

There are five types of texts in content analysis:

  1. written text, such as books and papers
  2. oral text, such as speech and theatrical performance
  3. iconic text, such as drawings, paintings, and icons
  4. audio-visual text, such as TV programs, movies, and videos
  5. hypertexts, which are texts found on the Internet


Over the years, content analysis has been applied in a variety of fields. Hermeneutics and philology have long used content analysis to interpret sacred and profane texts and, in many cases, to establish texts' authorship and authenticity.[3][4]

In recent times, particularly with the advent of mass communication, content analysis has been increasingly used to analyze and understand media content and media logic in depth. The political scientist Harold Lasswell formulated the core questions of content analysis in its early-to-mid 20th-century mainstream version: "Who says what, to whom, why, to what extent and with what effect?".[8] The strong emphasis on a quantitative approach initiated by Lasswell was carried through by another "father" of content analysis, Bernard Berelson, who proposed a definition of content analysis that is emblematic of this point of view: "a research technique for the objective, systematic and quantitative description of the manifest content of communication".[9]

Quantitative content analysis has enjoyed renewed popularity in recent years thanks to technological advances and fruitful application in mass communication and personal communication research. Content analysis of textual big data produced by new media, particularly social media and mobile devices, has become popular. These approaches take a simplified view of language that ignores the complexity of semiosis, the process by which meaning is formed out of language. Quantitative content analysts have been criticized for appealing to statistical measures to justify the objectivity and systematic nature of their methods while ignoring the limitations of their approach.[citation needed]

Recently, Arash Heydarian Pashakhanlou has argued for combining quantitative, qualitative, manual and computer-assisted content analysis in a single study to offset the weaknesses of a partial content analysis and enhance the reliability and validity of a research project.[10]

Content analysis can also be described as studying traces, which are documents from past times, and artifacts, which are non-linguistic documents. Texts are understood to be produced by communication processes in a broad sense of that phrase, often gaining meaning through abduction.[3][11]

More elaborate description

The method of content analysis enables the researcher to include large amounts of textual information and systematically identify its properties, such as the frequencies of the most-used keywords, by locating the more important structures of its communication content. Such amounts of textual information must be categorized to provide a meaningful reading of the content under scrutiny. For example, David Robertson created a coding frame for a comparison of modes of party competition between British and American parties.[12] It was developed further in 1979 by the Manifesto Research Group, which aimed at a comparative content-analytic approach to the policy positions of political parties. This group created the Manifesto Project Database.

Since the 1980s, content analysis has become an increasingly important tool in the measurement of success in public relations (notably media relations) programs and the assessment of media profiles, such as political media slant—orientation towards one of the two major parties.[13][14] In 1982, John Naisbitt published his popular Megatrends, based on content analysis in the US media. In analyses of this type, data from content analysis is usually combined with media data (circulation, readership, number of viewers and listeners, frequency of publication). It has also been used by futurists to identify trends.

The creation of coding frames is intrinsically related to a creative approach to variables that influence textual content. In political analysis, these variables could be political scandals, the impact of public opinion polls, sudden events in external politics, inflation, etc. Mimetic Convergence, created by Fátima Carvalho for the comparative analysis of electoral proclamations on free-to-air television, is an example of creative articulation of variables in content analysis.[15] The methodology describes the construction of party identities during long-term party competitions on TV, from a dynamic perspective, governed by the logic of the contingent. This method aims to capture the contingent logic observed in electoral campaigns by focusing on the repetition and innovation of themes sustained in party broadcasts. According to the post-structuralist perspective from which electoral competition is analysed, party identities, 'the real', cannot speak without mediation because there is no natural centre fixing the meaning of a party structure; meaning depends instead on ad hoc articulations. There is no empirical reality outside articulations of meaning. Reality is an outcome of power struggles that unify ideas of social structure as a result of contingent interventions. In Brazil, these contingent interventions have proven to be mimetic and convergent rather than divergent and polarised, being integral to the repetition of dichotomised world-views.

Mimetic Convergence aims to show the process of fixation of meaning through discursive articulations that repeat, alter and subvert political issues that come into play. For this reason, parties are not taken as the pure expression of conflicts for the representation of interests (of different classes, religions, ethnic groups[16][17]) but as attempts to recompose and re-articulate ideas of an absent totality around signifiers gaining positivity.

Every content analysis should depart from a hypothesis. The hypothesis of Mimetic Convergence supports the Downsian interpretation that, in general, rational voters converge in the direction of uniform positions in most thematic dimensions. The hypothesis guiding the analysis of Mimetic Convergence between political parties' broadcasts is: 'public opinion polls on vote intention, published throughout campaigns on TV, will contribute to successive revisions of candidates' discourses'. Candidates re-orient their arguments and thematic selections in part by the signals sent by voters. One must also consider the interference of other kinds of input on electoral propaganda, such as internal and external political crises and the arbitrary interference of private interests in the dispute. Moments of internal crisis in disputes between candidates might result from the exhaustion of a certain strategy. The moments of exhaustion might consequently precipitate an inversion in the thematic flux.

As an evaluation approach, content analysis is considered by some to be quasi-evaluation because content analysis judgements need not be based on value statements if the research objective is aimed at presenting subjective experiences. Thus, they can be based on knowledge of everyday lived experiences. Such content analyses are not evaluations. On the other hand, when content analysis judgements are based on values, such studies are evaluations.[18]

Qualitative content analysis is "a systematic, replicable technique for compressing many words of text into fewer content categories based on explicit rules of coding".[19] It often involves building and applying a "concept dictionary" or fixed vocabulary of terms on the basis of which words are extracted from the textual data for concording or statistical computation.


Holsti groups fifteen uses of content analysis into three basic categories:[20]

  • make inferences about the antecedents of a communication
  • describe and make inferences about characteristics of a communication
  • make inferences about the effects of a communication.

He also places these uses into the context of the basic communication paradigm.

The following table shows fifteen uses of content analysis in terms of their general purpose, element of the communication paradigm to which they apply, and the general question they are intended to answer.

Uses of Content Analysis by Purpose, Communication Element, and Question

| Purpose | Element | Question | Use |
|---|---|---|---|
| Make inferences about the antecedents of communications | Source | Who? | |
| Make inferences about the antecedents of communications | Encoding process | Why? | Secure political & military intelligence; Analyse traits of individuals; Infer cultural aspects & change; Provide legal & evaluative evidence |
| Describe & make inferences about the characteristics of communications | Channel | How? | Analyse techniques of persuasion; Analyse style |
| Describe & make inferences about the characteristics of communications | Message | What? | Describe trends in communication content; Relate known characteristics of sources to messages they produce; Compare communication content to standards |
| Describe & make inferences about the characteristics of communications | Recipient | To whom? | |
| Make inferences about the consequences of communications | Decoding process | With what effect? | |

Note. Purpose, communication element, & question from Holsti.[20] Uses primarily from Berelson[21] as adapted by Holsti.[20]

References


  1. Bryman, Alan; Bell, Emma (2011). Business Research Methods (3rd ed.). Cambridge: Oxford University Press. ISBN 9780199583409. OCLC 746155102.
  2. Hodder, I. (1994). The interpretation of documents and material culture. Thousand Oaks etc.: Sage. p. 155. ISBN 0761926879.
  3. Tipaldo, G. (2014). L'analisi del contenuto e i mass media. Bologna, IT: Il Mulino. p. 42. ISBN 978-88-15-24832-9.
  4. Krippendorff, Klaus (2004). Content Analysis: An Introduction to Its Methodology (2nd ed.). Thousand Oaks, CA: Sage. p. 413. ISBN 9780761915454.
  5. Vaismoradi, Mojtaba; Turunen, Hannele; Bondas, Terese (2013-09-01). "Content analysis and thematic analysis: Implications for conducting a qualitative descriptive study". Nursing & Health Sciences. 15 (3): 398–405. doi:10.1111/nhs.12048. ISSN 1442-2018.
  6. Weber, Robert Philip (1990). Basic Content Analysis (2nd ed.). Newbury Park, CA: Sage. p. 12. ISBN 9780803938632.
  7. Neuendorf, Kimberly A. (2002). The Content Analysis Guidebook. Thousand Oaks, CA: Sage. p. 10.
  8. Lasswell, Harold Dwight (1948). Power and Personality. New York, NY.
  9. Berelson, B. (1952). Content Analysis in Communication Research. Glencoe: Free Press. p. 18.
  10. Heydarian Pashakhanlou, Arash (2017). "Fully integrated content analysis in International Relations". International Relations. 31 (4): 447–465. doi:10.1177/0047117817723060.
  11. Timmermans, Stefan; Tavory, Iddo (2012). "Theory Construction in Qualitative Research: From Grounded Theory to Abductive Analysis". Sociological Theory. 30 (3): 167–186.
  12. Robertson, David Bruce (1976). A theory of party competition. London and New York: J. Wiley. ISBN 0471727377.
  13. Gentzkow, Matthew; Shapiro, Jesse M. (2010). "What Drives Media Slant? Evidence from U.S. Daily Newspapers". Econometrica. 78 (1): 35–71. doi:10.3982/ecta7195.
  14. "Methods for Media Analysis". ReStore. Economic and Social Research Council. Retrieved 13 June 2013.
  15. Carvalho, Fátima Lampreia (2000). "Continuidade e Inovação: conservadorismo e política da comunicação no Brasil" [Continuity and Innovation: Conservatism and Politics of Communication in Brazil]. Revista Brasileira de Ciências Sociais. São Paulo. 15 (43): 147–162. doi:10.1590/S0102-69092000000200008. Retrieved 12 June 2013.
  16. Lipset, Seymour M.; Rokkan, Stein (1967). Cleavage structures, party systems, and voter alignments: an introduction. Free Press. pp. 1–64.
  17. Lijphart, Arend (1984). Democracies: Patterns of majoritarian and consensus government in twenty-one countries. New Haven: Yale University Press. p. 229. ISBN 0300031157.
  18. Frisbie, Richard (7–11 April 1986). The use of microcomputer programs to improve the reliability and validity of content analysis in evaluation. Annual Meeting of the American Educational Research Association. San Francisco, CA.
  19. Stemler, Steve (2001). "An Overview of Content Analysis". Practical Assessment, Research & Evaluation. 7 (17). Retrieved 12 June 2013.
  20. Holsti, Ole R. (1969). Content Analysis for the Social Sciences and Humanities. Reading, MA: Addison-Wesley.
  21. Berelson, Bernard (1952). Content Analysis in Communication Research. Glencoe, Ill: Free Press.

Further reading

  • Pashakhanlou, Arash Heydarian (2017). "Fully integrated content analysis in International Relations". International Relations. 31 (4): 447–465. doi:10.1177/0047117817723060. 
  • Graneheim, Ulla Hällgren; Lundman, Berit (2004). "Qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness". Nurse Education Today. 24 (2): 105–112. doi:10.1016/j.nedt.2003.10.001. 
  • Budge, Ian (ed.) (2001). Mapping Policy Preferences. Estimates for Parties, Electors and Governments 1945-1998. Oxford, UK: Oxford University Press. ISBN 978-0199244003.
  • Krippendorff, Klaus, and Bock, Mary Angela (eds) (2008). The Content Analysis Reader. Thousand Oaks, CA: Sage. ISBN 978-1412949668.
  • Roberts, Carl W. (ed.) (1997). Text Analysis for the Social Sciences: Methods for Drawing Inferences from Texts and Transcripts. Mahwah, NJ: Lawrence Erlbaum. ISBN 978-0805817348.
  • Wimmer, Roger D. and Dominick, Joseph R. (2005). Mass Media Research: An Introduction, 8th ed. Belmont, CA: Wadsworth. ISBN 978-0534647186.