Psychological testing is the administration of psychological tests, which are administered by trained evaluators. A person's responses are evaluated according to carefully prescribed guidelines, and scores are thought to reflect individual differences in the construct the test purports to measure. The science behind psychological testing is psychometrics.
According to Anastasi and Urbina, psychological tests involve observations made on a "carefully chosen sample [emphasis in original] of an individual's behavior." A psychological test is often designed to measure unobserved constructs, also known as latent variables. Psychological tests can consist of a series of tasks or problems that the respondent has to solve, or of questionnaires and interviews, which are also designed to measure unobserved constructs. Questionnaire- and interview-based scales typically differ from psychoeducational tests: psychoeducational tests ask for a respondent's maximum performance, whereas questionnaire- and interview-based scales ask for the respondent's typical behavior. Symptom and attitude tests are more often called scales. A useful psychological test or scale must be both valid (i.e., there is evidence to support the idea that the test or scale measures what it is purported to measure and "how well it does so") and reliable (i.e., internally consistent, or giving consistent results over time, across raters, etc.).
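Internal consistency is frequently summarized with Cronbach's alpha, which compares the variances of the individual items with the variance of the total score. The sketch below is a minimal illustration, assuming complete data arranged as a respondents-by-items matrix; the function name and the example data are hypothetical and not taken from any particular test.

```python
from statistics import pvariance

def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent and one column per item."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Three respondents answering two items on a 1-5 scale (invented data)
print(round(cronbach_alpha([[2, 3], [4, 5], [3, 3]]), 3))  # prints 0.923
```

Population variances are used for both the items and the totals; because alpha depends only on their ratio, using sample variances throughout would give the same value.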
It is important that people who are equal on the measured construct (e.g., mathematics ability, depression) have an approximately equal probability of answering a test item correctly or acknowledging the presence of a symptom. Consider an item on a mathematics test that might be used in the United Kingdom but not in the United States: "In a football match two players get a red card; how many players are left on the pitch?" Answering this item correctly requires knowledge of football (soccer), not just mathematical ability. Thus, group membership can influence the chance of correctly answering an item, a phenomenon captured by the concept of differential item functioning. Tests are often constructed for a specific population, and the nature of that population should be taken into account when administering a test outside it. A test that functions well in one population (e.g., schoolchildren in the United Kingdom) does not necessarily function the same way in another (e.g., schoolchildren in the United States).
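Differential item functioning is often screened for with the Mantel–Haenszel procedure: test takers are grouped by total score, and within each score level the odds of answering the item correctly are compared between a reference group and a focal group. A common odds ratio near 1 suggests the item behaves similarly for both groups once overall performance is matched. The sketch below assumes the counts have already been tallied per score level; the `Stratum` layout and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """2x2 counts for one total-score level (hypothetical layout)."""
    ref_correct: int
    ref_wrong: int
    focal_correct: int
    focal_wrong: int

def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    """Common odds ratio across score levels; values far from 1 flag possible DIF."""
    num = 0.0
    den = 0.0
    for s in strata:
        n = s.ref_correct + s.ref_wrong + s.focal_correct + s.focal_wrong
        if n == 0:
            continue  # empty score levels contribute nothing
        num += s.ref_correct * s.focal_wrong / n
        den += s.ref_wrong * s.focal_correct / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den

# e.g. UK pupils (reference) vs US pupils (focal) on the football item, one score level
print(mantel_haenszel_odds_ratio([Stratum(30, 10, 10, 30)]))  # prints 9.0
```

In this invented case, matched reference-group test takers have nine times the odds of success of the focal group, which would flag the item for review. Operational DIF analyses also apply a significance test and an effect-size classification that this sketch omits.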
Psychological assessment is similar to psychological testing but usually involves a more comprehensive evaluation of the individual. It is a process that involves integrating information from multiple sources, such as tests of normal and abnormal personality, tests of ability or intelligence, tests of interests or attitudes, and personal interviews. Collateral information about personal, occupational, or medical history is also collected, for example from records or from interviews with parents, spouses, teachers, or previous therapists or physicians. A psychological test is one of the sources of data used within the process of assessment; usually more than one test is used. Many psychologists do some level of assessment when providing services to clients or patients and may use, for example, simple checklists. Assessment may be used to help make a diagnosis for treatment settings; to assess a particular area of functioning or disability, often for school settings; to help select a type of treatment or to assess treatment outcomes; to help courts decide issues such as child custody or competency to stand trial; or to help assess job applicants or employees and provide career development counseling or training.
The first large-scale tests may have been examinations that were part of the imperial examination system in China. These examinations, an early form of psychological testing, assessed candidates based on their proficiency in topics such as civil law and fiscal policies. Other early tests of intelligence were made for entertainment rather than analysis. Modern mental testing began in France in the 19th century. It helped to separate mental retardation from mental illness and to reduce the neglect, torture, and ridicule heaped on both groups.
Englishman Francis Galton coined the terms psychometrics and eugenics, and developed a method for measuring intelligence based on nonverbal sensory-motor tests. The method was initially popular but was abandoned after it was found to have no relationship to outcomes such as college grades. After about 15 years of development, French psychologist Alfred Binet, together with psychologists Victor Henri and Théodore Simon, published the Binet–Simon test in 1905. The test focused on verbal abilities and was intended to identify mental retardation in schoolchildren.
The origins of personality testing date back to the 18th and 19th centuries, when personality was assessed through phrenology, the measurement of the human skull, and physiognomy, the assessment of personality from a person's outward appearance. These early pseudoscientific techniques were eventually replaced by more empirical methods in the 20th century. One of the earliest modern personality tests was the Woodworth Personality Data Sheet, a self-report inventory developed during World War I for the psychiatric screening of new draftees.
Proper psychological testing is conducted after rigorous research and development, in contrast to quick web-based or magazine questionnaires such as "Find out your Personality Color" or "What's your Inner Age?" Proper psychological testing has the following characteristics:
- Standardization - All procedures and steps must be conducted consistently and under the same conditions, so that differences in performance reflect the test takers rather than the testing situation.
- Objectivity - Scoring is done such that subjective judgments and biases are minimized, with results for each test taker obtained in the same way.
- Test Norms - The average test score within a large group of people, which serves as a point of comparison or frame of reference against which one individual's performance can be compared with the results of others.
- Reliability - Obtaining consistent results when the test is administered repeatedly.
- Validity - The type of test being administered must measure what it is intended to measure.
Sample of behavior
The term sample of behavior refers to an individual's performance on tasks that have usually been prescribed beforehand. The samples of behavior that make up a paper-and-pencil test, the most common type of psychological test, are a series of test items, and performance on these items produces a test score. A score on a well-constructed test is believed to reflect a psychological construct such as achievement in a school subject (e.g., mathematics), cognitive ability, aptitude, emotional functioning, or personality. Differences in test scores are thought to reflect individual differences in the construct the test is purported to measure.
There are several broad categories of psychological tests:
Achievement tests assess an individual's knowledge in a subject domain. Academic achievement tests are designed to be administered by a trained evaluator to an individual or a group of people. During an achievement test, a series of test items is presented to the person being evaluated, and the resulting score is believed to reflect achievement in a school subject.
Many achievement tests are norm-referenced. The person's responses are scored according to standardized protocols and the results can be compared to the responses of a norming group after the test is completed.
Some achievement tests are criterion-referenced: their purpose is to find out whether the test taker has mastered a predetermined body of knowledge, rather than to compare the test taker with everyone else taking the test.
The Kaufman Test of Educational Achievement is an example of an individually administered achievement test for students.
Psychological tests have been designed to measure specific abilities, such as clerical, perceptual, numerical, or spatial aptitude. Sometimes these tests must be specially designed for a particular job, but there are also tests available that measure general clerical and mechanical aptitudes, or even general learning ability. An example of an occupational aptitude test is the Minnesota Clerical Test, which measures the perceptual speed and accuracy required to perform various clerical duties. A widely used aptitude test in business is the Wonderlic Test. Tests of aptitudes believed to be related to specific occupations are used for career guidance as well as for selection and recruitment.
Although aptitude tests such as IQ tests were once thought to measure untutored ability, evidence suggests that they are sensitive to past learning and cannot avoid measuring past achievement. The SAT, which used to be called the Scholastic Aptitude Test, had its name changed because performance on the test is sensitive to training.
An attitude scale assesses an individual's disposition regarding an event (e.g., a Supreme Court decision), person (e.g., a governor), concept (e.g., wearing face masks during a pandemic), organization (e.g., the Boy Scouts), or object (e.g., nuclear weapons) on a unidimensional favorable-unfavorable attitude continuum. Attitude scales are used in marketing to determine individuals' preferences for brands. Historically, social psychologists have developed attitude scales to assess individuals' attitudes toward the United Nations and race relations. Likert scales are typically used in attitude research; the Likert scale has largely supplanted the Thurstone scale, which was used before the Likert scale was developed.
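A Likert-type attitude scale is usually scored by summing or averaging the item responses, after reverse-scoring items worded in the unfavorable direction so that a high score consistently means a favorable attitude. The sketch below assumes a 5-point response format; the item names and the choice of which items are reverse-keyed are hypothetical.

```python
def score_likert(responses: dict[str, int], reverse_keyed: set[str],
                 points: int = 5) -> float:
    """Mean attitude score on a 1..points Likert scale; higher means more favorable."""
    if not responses:
        raise ValueError("no responses to score")
    total = 0
    for item, answer in responses.items():
        if not 1 <= answer <= points:
            raise ValueError(f"{item}: response {answer} is outside 1..{points}")
        # Reverse-keyed items are flipped: 1 -> points, 2 -> points - 1, and so on
        total += (points + 1 - answer) if item in reverse_keyed else answer
    return total / len(responses)

# Hypothetical face-mask items; "masks_pointless" is worded unfavorably, so it is reverse-keyed
answers = {"masks_protect": 5, "masks_pointless": 1, "would_wear_mask": 4}
print(round(score_likert(answers, reverse_keyed={"masks_pointless"}), 2))  # prints 4.67
```

Averaging rather than summing keeps the result on the original 1 to 5 metric, which makes scores comparable across respondents who skipped different numbers of items.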
Biographical Information Blank
The Biographical Information Blank (BIB) is a paper-and-pencil form with items that ask about detailed personal and work history. It is used to aid in hiring by matching individuals' backgrounds to the requirements of the job.
The purpose of clinical tests is to assess the presence of symptoms of psychopathology. Examples of clinical assessments include the Minnesota Multiphasic Personality Inventory, the Millon Clinical Multiaxial Inventory-IV, the Child Behavior Checklist, the Symptom Checklist 90, and the Beck Depression Inventory.
Clinical tests like the MMPI are also norm-referenced: on a symptom subscale such as the Depression scale, a score of 50 is the mean and a score of 60 places the individual one standard deviation above the mean.
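The numbers in the MMPI example correspond to a T-score metric, in which a raw score is converted to a standard score with mean 50 and standard deviation 10 in the norming sample: T = 50 + 10(x − mean)/SD. The sketch below shows only this linear version; published instruments may use other transformations (the MMPI-2, for example, uses "uniform" T-scores for many scales), and the norm mean and SD in the example are invented.

```python
def linear_t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw scale score to a linear T-score (mean 50, SD 10 in the norm group)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return 50 + 10 * z

# Hypothetical norms: mean raw score 20, SD 5
print(linear_t_score(25, norm_mean=20, norm_sd=5))  # prints 60.0
```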
A criterion-referenced test is an achievement test in a specific knowledge domain. An individual's performance on the test is compared to a criterion; test takers are not compared to each other. A passing score, i.e., the criterion performance, is established by the teacher or an educational institution. Criterion-referenced tests are part and parcel of mastery-based education.
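Because a criterion-referenced test compares each person with a fixed passing score rather than with other test takers, the scoring decision can be made without looking at anyone else's results. The sketch below assumes a hypothetical cut score of 80% correct; in practice the criterion is set by the teacher or institution.

```python
def mastery_decisions(scores: dict[str, int], max_score: int,
                      cut_percent: int = 80) -> dict[str, bool]:
    """Pass/fail each test taker against a fixed criterion, ignoring everyone else's scores."""
    # Integer comparison (score / max_score >= cut_percent / 100) avoids float edge cases
    return {name: score * 100 >= cut_percent * max_score
            for name, score in scores.items()}

print(mastery_decisions({"Ana": 40, "Ben": 39, "Chloe": 50}, max_score=50))
# prints {'Ana': True, 'Ben': False, 'Chloe': True}
```

Unlike a norm-referenced score, each decision would be unchanged if the other test takers had scored differently, and it is possible for everyone, or no one, to pass.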
Psychological assessment can involve the observation of people as they complete activities. This type of assessment is usually conducted with families in a laboratory or at home. Sometimes the observation can involve children in a classroom or the schoolyard. The purpose may be clinical, such as to establish a pre-intervention baseline of a child's hyperactive or aggressive classroom behaviors or to observe the nature of a parent-child interaction in order to understand a relational disorder. Time sampling methods are also part of direct observational research. The reliability of observers in direct observational research can be evaluated using Cohen's kappa.
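Cohen's kappa corrects the raw agreement between two observers for the agreement expected by chance, given how often each observer used each code: κ = (p_o − p_e) / (1 − p_e). The sketch below assumes two observers coded the same sequence of observation intervals; the behavior codes are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters' categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same, non-empty set of intervals")
    n = len(rater_a)
    # Observed agreement: proportion of intervals given the same code
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over codes
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Two observers coding four classroom intervals as aggressive ("aggr") or not ("none")
print(cohens_kappa(["aggr", "aggr", "none", "none"],
                   ["aggr", "none", "none", "none"]))  # prints 0.5
```

Here the observers agree on 75% of intervals, but half of that agreement would be expected by chance, so kappa is 0.5.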
The Parent-Child Interaction Assessment-II (PCIA) is an example of a direct observation procedure that is used with school-age children and parents. The parents and children are video recorded playing at a make-believe zoo. The Parent-Child Early Relational Assessment is used to study parents and young children and involves a feeding and a puzzle task. The MacArthur Story Stem Battery (MSSB) is used to elicit narratives from children. The Dyadic Parent-Child Interaction Coding System-II tracks the extent to which children follow the commands of parents and vice versa, and is well suited to the study of children with oppositional defiant disorder and their parents.
Psychological tests include interest inventories, which are used primarily for career counseling. Interest inventories include items that ask about the preferred activities and interests of people seeking career counseling. The rationale is that if an individual's pattern of activities and interests is similar to the modal pattern for people who are successful in a given occupation, then the chances are high that the individual would find satisfaction in that occupation. A widely used interest test is the Strong Interest Inventory, which is used in career assessment, career counseling, and educational guidance.
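The matching rationale can be illustrated by comparing a person's interest profile with the typical profile of people in each occupation and ranking occupations by similarity. The sketch below uses Euclidean distance between profiles as the similarity index; this is an assumption for illustration, not the scoring method of the Strong Interest Inventory or any other published inventory, and the interest areas and numbers are invented.

```python
from math import dist

def rank_occupations(person: dict[str, float],
                     occupations: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank occupations by Euclidean distance between interest profiles (smaller = closer)."""
    areas = sorted(person)  # interest areas, in a fixed order
    mine = [person[a] for a in areas]
    ranked = [(title, dist(mine, [profile[a] for a in areas]))
              for title, profile in occupations.items()]
    return sorted(ranked, key=lambda pair: pair[1])

person = {"artistic": 2, "investigative": 5, "social": 3}
occupations = {
    "chemist": {"artistic": 2, "investigative": 5, "social": 2},
    "art teacher": {"artistic": 5, "investigative": 2, "social": 5},
}
for title, distance in rank_occupations(person, occupations):
    print(f"{title}: {distance:.2f}")
# prints "chemist: 1.00" then "art teacher: 4.69"
```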
Neuropsychological tests are designed to be an objective and standardized measure of a sample of behavior.
Items on norm-referenced tests have been tried out on a norming group, so scores on the test can be classified as high, medium, or low, with gradations in between. These tests allow for the study of individual differences. Scores on norm-referenced achievement tests are associated with percentile ranks relative to other individuals of the test taker's age or grade.
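A percentile rank states the percentage of the norming group that scored below a given score. The sketch below uses one common definition that also counts half of any tied scores; the norming sample is invented, and published tests typically derive percentile ranks from much larger samples and smoothed norm tables.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percent of the norm group scoring below `score`, counting ties as half."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)           # scores strictly lower
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    return 100 * (below + 0.5 * ties) / len(ordered)

print(percentile_rank(30, [10, 20, 30, 40]))  # prints 62.5
```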
Personality tests assess constructs that are thought to be the constituents of personality. Examples of personality constructs include traits in the Big Five, such as introversion-extroversion and conscientiousness. Personality constructs are thought to be dimensional. Personality measures are used in research and in the selection of employees. They include self-report and observer-report scales. Examples of norm-referenced personality tests include the NEO-PI, the 16PF, the OPQ, and the FFPI-C.
Projective testing originated in the first half of the 1900s.
Examples of projective tests are story-telling, drawings, or sentence-completion tasks. 
Public safety employment tests
Vocations within the public safety field (i.e., fire service, law enforcement, corrections, emergency medical services) often require industrial and organizational psychology tests for initial employment and for advancement through the ranks. The National Firefighter Selection Inventory (NFSI), the National Criminal Justice Officer Selection Inventory (NCJOSI), and the Integrity Inventory are prominent examples of these tests.
Many psychological and psychoeducational tests are not available to the public. Test publishers put restrictions on who has access to the test. Psychology licensing boards also restrict access to the tests used in licensing psychologists. Test publishers hold that both copyright and professional ethics require them to protect the tests. Publishers sell tests only to people who have proved their educational and professional qualifications. Purchasers are legally bound not to give test answers or the tests themselves to members of the public unless permitted by the publisher.
The International Test Commission (ITC), an international association of national psychological societies and test publishers, publishes the International Guidelines for Test Use, which prescribes measures to take to "protect the integrity" of the tests by not publicly describing test techniques and by not "coaching individuals" so that they "might unfairly influence their test performance."
- Urbina, Susana; Anastasi, Anne (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall. p. 4. ISBN 9780023030857. OCLC 35450434.
- Mellenbergh, G.J. (2008). Chapter 10: Surveys. In H.J. Adèr & G.J. Mellenbergh (Eds.) (with contributions by D.J. Hand), Advising on Research Methods: A consultant's companion (pp. 183-209). Huizen, The Netherlands: Johannes van Kessel Publishing.
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
- Mellenbergh, Gideon J. (1989). "Item bias and item response theory". International Journal of Educational Research. 13 (2): 127–143. doi:10.1016/0883-0355(89)90002-5.
- Board of Trustees of the Society for Personality Assessment (2006). "Standards for Education and Training in Psychological Assessment" (PDF). Journal of Personality Assessment. 87 (3): 355–357. doi:10.1207/s15327752jpa8703_17. PMID 17134344.
- Gregory, Robert J. (2003). "The History of Psychological Testing" (PDF). Psychological Testing: History, Principles, and Applications. Allyn & Bacon. p. 4 in chapter 1. ISBN 9780205354726.
- Shi, Jiannong (2 February 2004). Sternberg, Robert J. (ed.). International Handbook of Intelligence. Cambridge University Press. pp. 330–331. ISBN 978-0-521-00402-2.
- Kaufman, Alan S. (2009). IQ testing 101. Springer Pub. Co. ISBN 978-0826106292. OCLC 255892649.
- Gillham, Nicholas W. (2001). "Sir Francis Galton and the birth of eugenics". Annual Review of Genetics. 35 (1): 83–101. doi:10.1146/annurev.genet.35.102401.090055. PMID 11700278.
- Nezami, Elahe; Butcher, James N. (16 February 2000). Goldstein, G.; Hersen, Michel (eds.). Handbook of Psychological Assessment. Elsevier. p. 415. ISBN 978-0-08-054002-3.
- Schultz, Duane P.; Schultz, Sydney Ellen (2010). Psychology and work today: An introduction to industrial and organizational psychology (10th ed.). Upper Saddle River, N.J.: Prentice Hall. pp. 99–102. ISBN 978-0205683581. OCLC 318765451.
- Kaufman Test of Educational Achievement, Third Edition.
- Aiken, Lewis R. (1998). Tests and examinations: Measuring abilities and performance. Wiley. ISBN 9780471192633. OCLC 37820003.
- Ceci, S. J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722. https://doi.org/10.1037/0012-1649.27.5.703
- Lemann, N. (1999). The big test: The secret history of the American meritocracy. New York: Farrar, Straus and Giroux.
- Brown, R. (1965). Social psychology. New York: The Free Press.
- Beck, A. T.; Steer, R. A.; Brown, G. K. (1996). Manual for the Beck Depression Inventory-II (2nd ed.). San Antonio, TX: Psychological Corporation.
- Millon, T. (1994). Millon Clinical Multiaxial Inventory-III. Minneapolis, MN: National Computer Systems.
- Achenbach, T. M.; Rescorla, Leslie A. (2001). Manual for the ASEBA school-age forms & profiles: An integrated system of multi-informant assessment. Burlington, Vt: ASEBA. ISBN 978-0938565734. OCLC 53902766.
- Derogatis L. R. (1983). SCL90: Administration, Scoring and Procedures Manual for the Revised Version. Baltimore: Clinical Psychometric Research.
- Reid, J. B., Eddy, J. M., Fetrow, R. A., & Stoolmiller, M. (1999). Description and immediate impacts of a preventive intervention for conduct problems. American Journal of Community Psychology, 27, 483–517.
- Waters, E., & Deane, K. E. (1985). Defining and assessing individual differences in attachment relationships: Q-methodology and the organization of behavior in infancy and early childhood. Monographs of the Society for Research in Child Development, 50, 41–65.
- Holigrocki, R. J; Kaminski, P. L.; Frieswyk, S. H. (1999). "Introduction to the Parent-Child Interaction Assessment". Bulletin of the Menninger Clinic. 63 (3): 413–428. PMID 10452199.
- Clark, R (1999). "The Parent-Child Early Relational Assessment: A Factorial Validity Study". Educational and Psychological Measurement. 59 (5): 821–846. doi:10.1177/00131649921970161.
- Bretherton, I., Oppenheim, D., Buchsbaum, H., Emde, R. N., & the MacArthur Narrative Group. (1990). MacArthur Story-Stem battery. Unpublished manual.
- Robinson, Elizabeth A.; Eyberg, Sheila M. (1981). "The dyadic parent–child interaction coding system: Standardization and validation". Journal of Consulting and Clinical Psychology. 49 (2): 245–250. doi:10.1037/0022-006x.49.2.245. PMID 7217491.
- Ashton, M. C., (2017). Individual Differences and Personality (3rd ed.). Amsterdam: Elsevier.
- International Personality Item Pool. Accessed July 14, 2020.
- Wasserman, John D. (2003). "Nonverbal Assessment of Personality and Psychopathology". In McCallum, Steve R. (ed.). Handbook of Nonverbal Assessment. New York: Kluwer Academic / Plenum Publishers. ISBN 978-0-306-47715-7. Retrieved 20 November 2010.
- Murray, Henry A. (1943). Thematic Apperception Test manual. Cambridge, MA: Harvard University Press. OCLC 223083.
- The Committee on Psychological Tests and Assessment (CPTA), American Psychological Association (1994). "Statement on the Use of Secure Psychological Tests in the Education of Graduate and Undergraduate Psychology Students". American Psychological Association. "It should be recognized that certain tests used by psychologists and related professionals may suffer irreparable harm to their validity if their items, scoring keys or protocols, and other materials are publicly disclosed."
- Kenneth R. Morel (2009-09-24). "Test Security in Medicolegal Cases: Proposed Guidelines for Attorneys Utilizing Neuropsychology Practice". Archives of Clinical Neuropsychology. 24 (7): 635–646. doi:10.1093/arclin/acp062. PMID 19778915. Retrieved 2009-11-08.
- Pearson Assessments (2009). "Legal Policies". Psychological Corporation. Archived from the original on 2011-07-15. Retrieved 2009-11-15.
- International Test Commission (2000). International Guidelines for Test Use.
- American Psychological Association webpage on testing and assessment
- British Psychological Society Psychological Testing Centre
- Guidelines of the International Test Commission
- List of Common Psychological Tests
- International Item Pool, an alternative and free source of items available for research on personality