Content validity

In psychometrics, content validity (also known as logical validity) refers to the extent to which a measure represents all facets of a given construct. For example, a depression scale may lack content validity if it only assesses the affective dimension of depression but fails to take into account the behavioral dimension. Determining content validity involves an element of subjectivity, since it requires a degree of agreement about what a particular construct, such as the personality trait extraversion, represents; where such agreement is lacking, high content validity cannot be established.[1]


Content validity is different from face validity, which refers not to what the test actually measures, but to what it superficially appears to measure. Face validity assesses whether the test "looks valid" to the examinees who take it, the administrative personnel who decide on its use, and other technically untrained observers. Content validity requires recognized subject matter experts to evaluate whether test items assess defined content, and it demands more rigorous statistical tests than the assessment of face validity. Content validity is most often addressed in academic and vocational testing, where test items need to reflect the knowledge actually required for a given topic area (e.g., history) or job skill (e.g., accounting). In clinical settings, content validity refers to the correspondence between test items and the symptom content of a syndrome. In educational research, content validity refers to the consensus of what course content (e.g., learning objectives) is essential for the curriculum (e.g., anatomy in physical therapist education[2]).


One widely used method of measuring content validity was developed by C. H. Lawshe. It is essentially a method for gauging agreement among raters or judges regarding how essential a particular item is. In an article regarding pre-employment testing, Lawshe (1975) [3] proposed that each of the subject matter expert raters (SMEs) on the judging panel respond to the following question for each item: "Is the skill or knowledge measured by this item 'essential,' 'useful, but not essential,' or 'not necessary' to the performance of the job?" According to Lawshe, if more than half the panelists indicate that an item is essential, that item has at least some content validity. Greater levels of content validity exist as larger numbers of panelists agree that a particular item is essential. Using these assumptions, Lawshe developed a formula termed the content validity ratio: CVR = (n_e − N/2) / (N/2), where CVR = content validity ratio, n_e = number of SME panelists indicating "essential", and N = total number of SME panelists. This formula yields values which range from +1 to −1; positive values indicate that at least half the SMEs rated the item as essential. The mean CVR across items may be used as an indicator of overall test content validity.
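The ratio above is straightforward to compute. The helper below is a minimal sketch; the function name and the example ratings are illustrative, not taken from Lawshe's paper:

```python
def content_validity_ratio(n_essential, n_total):
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1.

    Positive values mean more than half the panel rated the item
    "essential"; +1 means a unanimous panel.
    """
    return (n_essential - n_total / 2) / (n_total / 2)

# Example: 7 of 8 hypothetical panelists rate an item "essential".
print(content_validity_ratio(7, 8))  # 0.75
print(content_validity_ratio(8, 8))  # 1.0
```

Averaging the per-item CVRs then gives the overall test-level indicator mentioned above.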

Lawshe (1975) provided a table of critical values for the CVR by which a test evaluator could determine, for a panel of SMEs of a given size, the minimum CVR necessary to exceed chance expectation. The table was calculated for Lawshe by his friend, Lowell Schipper. Close examination of the published table reveals an anomaly: the critical value increases monotonically from the case of 40 SMEs (minimum value = .29) to the case of 9 SMEs (minimum value = .78), only to drop unexpectedly at 8 SMEs (minimum value = .75) before reaching its ceiling at 7 SMEs (minimum value = .99). Applying the formula to 8 raters, 7 "essential" ratings and 1 other rating yield a CVR of exactly .75. If .75 were not the critical value, all 8 of 8 raters would have to rate the item essential, yielding a CVR of 1.00; yet requiring a "perfect" value for a panel of 8 while not requiring it for panels either larger or smaller would itself break the table's pattern. Whether this departure from the otherwise monotonic progression was due to a calculation error on Schipper's part or to a typing or typesetting error is unclear. Wilson, Pan & Schumsky (2012), seeking to correct the error, found no explanation in Lawshe's writings and no publication by Schipper describing how the table of critical values was computed. Wilson and colleagues determined that Schipper's values closely matched the normal approximation to the binomial distribution. By comparing Schipper's values to newly calculated binomial values, they also found that Lawshe and Schipper had erroneously labeled their published table as representing a one-tailed test, when in fact the values mirrored the binomial values for a two-tailed test.
Wilson and colleagues published a recalculation of the critical values for the content validity ratio, providing values in unit steps of panel size at multiple alpha levels.[4]
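A two-tailed exact binomial criterion of the kind Wilson and colleagues compared against can be sketched as follows. This is an illustrative reconstruction under the stated null hypothesis (each rater votes "essential" with probability .5), not their published procedure, and the exact binomial values differ somewhat from Schipper's normal-approximation figures; notably, it yields 1.00 rather than the anomalous .75 for a panel of 8:

```python
from math import comb

def min_significant_cvr(n_panel, alpha=0.05):
    """Smallest CVR whose count of "essential" votes exceeds chance
    under a two-tailed exact binomial test (null hypothesis: each
    rater independently votes "essential" with probability 0.5).

    Returns None if even a unanimous panel is not significant at
    the given alpha, as happens for very small panels.
    """
    for n_e in range(n_panel + 1):
        # Upper-tail probability P(X >= n_e) for X ~ Binomial(n_panel, 0.5)
        tail = sum(comb(n_panel, k) for k in range(n_e, n_panel + 1)) / 2 ** n_panel
        if tail <= alpha / 2:
            return (n_e - n_panel / 2) / (n_panel / 2)
    return None

print(round(min_significant_cvr(9), 2))  # 0.78, matching Schipper's table
print(min_significant_cvr(8))            # 1.0, not the anomalous .75
```

For a panel of 9, eight "essential" votes are the fewest that reach significance (P(X ≥ 8) = 10/512 ≈ .0195 ≤ .025), giving a CVR of 7/9 ≈ .78; for a panel of 8, only a unanimous vote qualifies, which is consistent with the anomaly discussed above.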

Lawshe and Schipper's table of critical values is as follows:[3]

No. of panelists   Min. value
 5                 .99
 6                 .99
 7                 .99
 8                 .75
 9                 .78
10                 .62
11                 .59
12                 .56
20                 .42
30                 .33
40                 .29

References

  1. ^ Pennington, Donald (2003). Essential Personality. Arnold. p. 37. ISBN 0-340-76118-0.
  2. ^ Pascoe, Michael (June 9, 2022). "An Assessment of Essential Anatomy Course Content in an Entry-Level Doctor of Physical Therapy Program". Medical Science Educator. 32 (4): 827–835. doi:10.1007/s40670-022-01574-1. PMC 9411453. PMID 36035529.
  3. ^ a b Lawshe, Charles H. (1975). "A Quantitative Approach to Content Validity". Personnel Psychology. 28 (4): 563–575. doi:10.1111/j.1744-6570.1975.tb01393.x. S2CID 34660500.
  4. ^ Wilson, F. Robert; Pan, Wei; Schumsky, Donald A. (2012). "Recalculation of the Critical Values for Lawshe's Content Validity Ratio". Measurement and Evaluation in Counseling and Development. Informa UK Limited. 45 (3): 197–210. doi:10.1177/0748175612440286. ISSN 0748-1756. S2CID 145201317.

External links

  • Handbook of Management Scales, a Wikibook containing previously used multi-item scales to measure constructs in the empirical management research literature. For many scales, content validity is discussed.