The Implicit Association Test (IAT) is a widely used psychological tool designed to measure implicit biases and attitudes toward groups, individuals, or concepts. Building on work on implicit cognition by social psychologists Anthony Greenwald and Mahzarin Banaji, the test was introduced in the late 1990s to uncover unconscious biases that individuals may hold, which can often contradict their explicit beliefs and values. It rests on the idea that people may have biases they are not aware of, and that these biases can influence their thoughts and behavior. The IAT measures such biases through a series of timed categorization tasks, providing insight into the automatic associations that individuals have with different social groups.

The Implicit Association Test (IAT) is a measure within social psychology designed to detect the strength of a person’s automatic association between mental representations of objects (concepts) in memory. The IAT was introduced in the scientific literature in 1998 by Anthony Greenwald, Debbie McGhee, and Jordan Schwartz. The IAT is now widely used in social psychology research and is used to some extent in clinical, cognitive, and developmental psychology research. Although some controversy still exists regarding the IAT and what it measures, much research into its validity and psychometric properties has been conducted since its introduction into the literature.

History

Implicit Cognition and Measurement

In 1995, social psychology researchers Anthony Greenwald and Mahzarin Banaji proposed the extension of ideas already existing in cognitive psychology to social psychology. They asserted that the idea of implicit and explicit memory can apply to social constructs as well. If memories that are not accessible to awareness can influence our actions, associations can also influence our attitudes and behavior. Thus, measures that tap into individual differences in associations of concepts should be developed. This would allow researchers to understand attitudes that cannot be measured through explicit self-report methods due to lack of awareness or social desirability bias.

Application and use

A computer-based measure, the IAT requires that users rapidly categorize two target concepts with an attribute (e.g. the concepts “male” and “female” with the attribute “logical”), such that easier pairings (faster responses) are interpreted as more strongly associated in memory than more difficult pairings (slower responses).
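The comparison of response times described above can be illustrated with a short Python sketch. It is a simplification, not the scoring procedure used in research: the function name and the latencies are hypothetical, and it assumes the raw effect is simply the difference between the mean latencies of the two pairings.

```python
from statistics import mean


def raw_iat_effect(pairing_a_ms, pairing_b_ms):
    """Return mean latency of pairing B minus mean latency of pairing A (ms).

    A positive value means pairing A was answered faster on average,
    which is interpreted as the stronger association.
    """
    return mean(pairing_b_ms) - mean(pairing_a_ms)


# Hypothetical response latencies in milliseconds, for illustration only.
pairing_a = [620, 580, 640, 600]  # e.g. "male" shares a key with "logical"
pairing_b = [780, 820, 760, 800]  # e.g. "female" shares a key with "logical"

effect = raw_iat_effect(pairing_a, pairing_b)
print(effect)
```

With these invented numbers the second pairing is slower, so the sketch would be read as a stronger association for the first pairing.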

The IAT is thought to measure implicit attitudes: “introspectively unidentified (or inaccurately identified) traces of past experience that mediate favorable or unfavorable feeling, thought, or action toward social objects.” In research, the IAT has been used to develop theories to understand implicit cognition (i.e. cognitive processes of which a person has no conscious awareness). These processes may include memory, perception, attitudes, self-esteem, and stereotypes. Because the IAT requires that users make a series of rapid judgments, researchers believe that IAT scores may reflect attitudes which people are unwilling to reveal publicly. The IAT may allow researchers to get around the difficult problem of social desirability bias and for that reason it has been used extensively to assess people’s attitudes towards commonly stigmatized groups.

IAT Procedure

A typical IAT procedure involves a series of seven tasks. In the first task, an individual is asked to categorize stimuli into two categories. For example, a person might be presented with a computer screen on which the word “Black” appears in the top left-hand corner and the word “White” appears in the top right-hand corner. In the middle of the screen is a word, such as a first name, that is typically associated with either the category “Black” or the category “White.” For each word that appears in the middle of the screen, the person is asked to sort the word into the appropriate category by pressing the appropriate left-hand or right-hand key. On the second task, the person would complete a similar sorting procedure with an attribute of some kind. For example, the word “Pleasant” might now appear in the top left-hand corner of the screen and the word “Unpleasant” in the top right-hand corner. In the middle of the screen would appear a word that is either pleasant or unpleasant. Once again, the person would be asked to sort each word as being either pleasant or unpleasant by pressing the appropriate key.

On the third task, individuals are asked to complete a combined task that includes both the categories and attributes from the first two tasks. In this example, the words “Black/Pleasant” might appear in the top left-hand corner while the words “White/Unpleasant” would appear in the top right-hand corner. Individuals would then see a series of stimuli in the center of the screen, each consisting of either a name or a word. They would be asked to press the left-hand key if the name or word belongs to the “Black/Pleasant” category or the right-hand key if it belongs to the “White/Unpleasant” category. The fourth task is a repeat of the third task but with more repetitions of the names, words, or images.

The fifth task is a repeat of the first task with the exception that the position of the two target words would be reversed. For example, “Black” would now appear in the top right-hand corner of the screen and “White” in the top left-hand corner. The sixth task would be a repeat of the third, combined task with the exception that the target categories keep their reversed positions while the attributes stay where they were, so that “White/Pleasant” would now appear in the top left-hand corner and “Black/Unpleasant” in the top right-hand corner. The seventh task is a repeat of the sixth task but with more repetitions of the names, words, or images. If the categories under study (e.g. Black or White) are differentially associated with the presented attributes (e.g. Pleasant/Unpleasant), one of the combined tasks (the third/fourth or the sixth/seventh task) would be expected to be considerably easier for the participant.
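The seven-task sequence can be written down as a small data structure. The following Python sketch is illustrative only: the `Block` class and the helper names are hypothetical, and it assumes that in the reversed combined tasks the two categories swap sides while the attributes stay in place, which is what changes the category/attribute pairings between tasks 3/4 and tasks 6/7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    left: tuple             # labels shown in the top left-hand corner
    right: tuple            # labels shown in the top right-hand corner
    combined: bool = False  # True when categories and attributes share keys
    extended: bool = False  # True for the longer repeat of a combined task


def seven_block_layout(cat_a, cat_b, attr_a, attr_b):
    """Build the seven tasks for two categories and two attributes."""
    return [
        Block(1, (cat_a,), (cat_b,)),
        Block(2, (attr_a,), (attr_b,)),
        Block(3, (cat_a, attr_a), (cat_b, attr_b), combined=True),
        Block(4, (cat_a, attr_a), (cat_b, attr_b), combined=True, extended=True),
        Block(5, (cat_b,), (cat_a,)),  # categories swap sides
        Block(6, (cat_b, attr_a), (cat_a, attr_b), combined=True),
        Block(7, (cat_b, attr_a), (cat_a, attr_b), combined=True, extended=True),
    ]


def correct_key(block, label):
    """Return which key sorts a stimulus belonging to `label` in this block."""
    return "left" if label in block.left else "right"


layout = seven_block_layout("Black", "White", "Pleasant", "Unpleasant")
for block in layout:
    print(block.number, "/".join(block.left), "|", "/".join(block.right))
```

Under this assumption a stimulus from the “Black” category is sorted with the left-hand key in tasks 1, 3, and 4 and with the right-hand key in tasks 5, 6, and 7, while pleasant words stay on the left throughout.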

Variations of the IAT include the Go/No-go Association Test (GNAT), the Brief-IAT and the Single-Category IAT. An idiographic approach using the IAT and the SC-IAT for measuring implicit anxiety showed that personalized stimulus selection did not affect the outcome, reliabilities and correlations to outside criteria.

Types of IATs

Valence IAT

Valence IATs measure associations between concepts and positive or negative valence. They are generally interpreted as a preference for one category over another. For example, the Race IAT shows that most White individuals have an implicit preference for Whites over Blacks. On the other hand, approximately half of Black individuals prefer Blacks over Whites. Similarly, the Age IAT generally shows that most individuals have an implicit preference for young over old, regardless of the age of the person taking the IAT. Some other valence IATs include the Weight IAT, the Sexuality IAT, the Arab-Muslim IAT, and the Skin-tone IAT.

Stereotype IAT

Stereotype IATs measure associations between concepts that often reflect the degree to which a person holds a particular societal stereotype. For example, the Gender-Science IAT reveals that most people associate women more strongly with liberal arts and men more strongly with science. Similarly, the Gender-Career IAT indicates that most people associate women more strongly with family and men more strongly with careers. The Asian IAT shows that many people associate Asian Americans more strongly with foreign landmarks and European Americans more strongly with American landmarks. Some other stereotype IATs include the Weapons IAT and the Native IAT.

Self-esteem IAT

The Self-esteem IAT measures implicit self-esteem by pairing “self” and “other” words with words of positive and negative valence. Those who find it easier to pair “self” with positive words than negative words are purported to have higher implicit self-esteem. Generally, measures of implicit self-esteem, including the IAT, are not strongly related to one another and are not strongly related to explicit measures of self-esteem.

Brief IAT

The Brief IAT (BIAT) uses a similar procedure to the standard IAT but requires fewer classifications. It involves four tasks rather than seven and only uses combined tasks (corresponding most closely to tasks 3, 4, 6, and 7 on the standard IAT). Additionally, it requires specification of a focal category in each task. For example, rather than focusing on “White” and “Black” as in the standard Race IAT, it asks participants to focus on one of these concepts in the first task and the other in the second task.

Child IAT

The Child IAT (Ch-IAT) allows for children as young as four years of age to take the IAT. Rather than words and pictures, the Ch-IAT uses sound and pictures. For example, positive and negative valence are indicated with smiling and frowning faces. Positive and negative words to be classified are voiced out loud to children.

Studies using the Ch-IAT have revealed that six-year-old White children, ten-year-old White children, and White adults have comparable implicit attitudes on the Race IAT.

Criticism and controversy

The IAT has engendered some controversy in both the scientific literature and in the public sphere (e.g., in the Wall Street Journal). For example, critics have interpreted it as assessing familiarity, perceptual salience asymmetries, or mere cultural knowledge irrespective of personal endorsement of that knowledge. A more recent critique argued that there is a lack of empirical research justifying the diagnostic statements that are given to the lay public. For instance, feedback may report that someone has a [slight/moderate/strong] automatic preference for [European Americans/African Americans]. Proponents of the IAT have responded to these charges, but the debate continues. According to The New York Times, “there isn’t even that much consistency in the same person’s scores if the test is taken again”. In addition, researchers have recently claimed that results of the IAT might be biased by participants’ limited ability to adjust to switching categories, which would bias results in favor of the first category pairing (e.g., pairing “Asian” with positive stimuli first, instead of pairing “Asian” with negative stimuli first).

Some of these issues have been settled in the research literature, but others continue to inspire debate among researchers and lay people alike.

Validity Research

Since its introduction into the scientific literature in 1998, a great deal of research has been conducted in order to examine the psychometric properties of the IAT as well as to address other criticisms on validity and reliability.

Construct Validity

The IAT is purported to measure relative strength of associations. However, some researchers have asserted that the IAT may instead be measuring constructs such as salience of attributes or cultural knowledge.

Predictive validity

A recent meta-analysis has concluded that the IAT has predictive validity independent of the predictive validity of explicit measures. Further, the IAT tends to be a better predictor of behavior in socially sensitive contexts (e.g., discrimination and suicidal behaviour) than traditional ‘explicit’ self-report methods, whereas explicit measures tend to be better predictors of behavior in less socially sensitive contexts (e.g., political preferences). Specifically, the IAT has been shown to predict voting behavior (e.g., ultimate candidate choice of undecided voters), mental health (e.g., a self-injury IAT differentiated between adolescents who injured themselves and those who did not), medical outcomes (e.g., medical recommendations by physicians), employment outcomes (e.g., interviewing Muslim-Arab versus Swedish job applicants), and education outcomes (e.g., gender-science stereotypes predict gender disparities in nations’ science and math test scores).

In applied settings, the IAT has been used in marketing and industrial psychology. For example, in determining the predictors of risk-taking behaviour in general aviation, attitudes towards risky flight behaviour as measured through an IAT have been shown to be a more accurate predictor of risky flight behaviour than traditional explicit attitude or personality scales. The IAT has also been used in clinical psychology research to test the hypothesis that implicit associations may be a causal factor in the development of anxiety disorders.

Salience asymmetry

Researchers have argued that the IAT may measure salience of concepts rather than associations. Whereas IAT proponents claim that faster response times when pairing concepts indicate stronger associations, critics claim that faster response times indicate that concepts are similar in salience (and slower response times indicate that concepts differ in salience). There is some support for this claim. For example, in an old-young IAT, old faces would be more salient than young faces. As a result, researchers created an old-young IAT that involved pairing young and old faces with neutral words (non-salient attribute) and non-words (salient attribute). Response times were faster when old faces (salient) were paired with non-words (salient) than when old faces (salient) were paired with neutral words (non-salient), supporting the assertion that faster response time can be facilitated by matching salience.

Although proponents of the IAT acknowledge that it may be influenced by salience asymmetry, they argue that this does not preclude interpreting the IAT as a measure of associations.

Culture versus person

Another criticism of the IAT is that it may measure associations that are picked up from cultural knowledge rather than associations actually residing within a person. The counter-argument is that such associations may indeed arise from the culture, but they can nonetheless influence behavior.

To address the possibility that the IAT picks up on cultural knowledge rather than beliefs that are present in a person, some critics of the standard IAT created the Personalized IAT. The primary difference between a standard valence IAT and the Personalized IAT is that the latter uses “I like” and “I don’t like” as category labels rather than pleasant and unpleasant words. Additionally, the Personalized IAT does not provide error feedback for an incorrect response, as the standard IAT does. This form of the IAT is more strongly related to explicit self-report measures of bias.

Proponents of the standard IAT argue that the Personalized IAT increases the likelihood that those taking it will evaluate the concept rather than classify it. This would increase its relationship with explicit measures without necessarily removing the effect of cultural knowledge. In fact, some researchers have examined the relationship between perceptions of general American attitudes and Personalized IAT scores and have concluded that the relationship between the IAT and cultural knowledge is not decreased by personalizing it. However, it is important to note that there was no relationship between cultural knowledge and standard IAT scores either.

Internal Validity

Fakeability

The IAT has also demonstrated a reasonable amount of resistance to social desirability bias. Individuals asked to fake their responses on the IAT have demonstrated difficulty in doing so. For example, participants who were asked to present a positive impression of themselves were able to do so on a self-report measure of anxiety but not an IAT measuring anxiety. Nonetheless, faking is possible, and recent research indicates that the most effective method of faking the IAT is to intentionally slow down responses for pairings that should be relatively easy. Most subjects, however, do not discover this strategy on their own, so faking is relatively rare. An algorithm developed to estimate IAT faking can identify those who are faking with approximately 75% accuracy.

Familiarity

A common criticism of the IAT is that it may be difficult to associate positive attributes with less familiar concepts. For example, if a person has had less contact with members of a particular ethnic group, he or she may have a more difficult time associating members of that ethnic group with positive words simply because of this lack of familiarity. There is some evidence against the familiarity explanation, based on studies that have ensured equal familiarity with the African American and White names as well as the faces appearing on the Race IAT.

Order

As the IAT relies on a comparison of response times in different tasks pairing concepts and attributes, researchers and others taking the IAT have speculated that the pairing on the first combined task may affect performance on the next combined task. For example, a participant who begins a gender stereotype IAT by pairing female names with family words may subsequently find the task of pairing female names with career words more difficult. Research has indeed shown a small effect of order. As a result, it is recommended to increase the number of classifications required in the fifth IAT task. This gives participants more practice before doing the second pairing, thus reducing the order effect.

Cognitive fluency and Age

The IAT is influenced by individual differences in average IAT response times such that those with slower overall response times tend to have more extreme IAT scores. Older subjects also tend to have more extreme IAT scores, and this may be related to cognitive fluency, or slower overall response times.

An improved scoring algorithm for the IAT, which reduces the effect of cognitive fluency on the IAT, has been introduced. A summary of the scoring algorithm can be found on Dr. Anthony Greenwald’s webpage.
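One way to see how a scoring rule can reduce the influence of overall response speed is to divide the latency difference by the respondent’s own latency variability. The Python sketch below illustrates that idea only; it is not the published algorithm, which includes further steps (for example, the handling of very fast or very slow trials and of errors) that are not reproduced here. The function name and all latencies are hypothetical.

```python
from statistics import mean, stdev


def standardized_iat_score(block_a_ms, block_b_ms):
    """Latency difference divided by the SD of all latencies in both blocks."""
    pooled_sd = stdev(list(block_a_ms) + list(block_b_ms))
    return (mean(block_b_ms) - mean(block_a_ms)) / pooled_sd


# Hypothetical latencies (ms) for a faster respondent.
fast_a = [600, 650, 700, 620]
fast_b = [800, 850, 900, 820]

# A slower respondent with the same pattern: every latency is 1.5 times longer.
slow_a = [1.5 * t for t in fast_a]
slow_b = [1.5 * t for t in fast_b]

raw_fast = mean(fast_b) - mean(fast_a)
raw_slow = mean(slow_b) - mean(slow_a)
score_fast = standardized_iat_score(fast_a, fast_b)
score_slow = standardized_iat_score(slow_a, slow_b)

print(raw_fast, raw_slow)      # raw difference grows with overall slowness
print(score_fast, score_slow)  # standardized scores agree up to rounding
```

In this toy case the slower respondent shows a larger raw difference even though the response pattern is identical, whereas the standardized score is unaffected by the uniform slowdown.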

Experience with the IAT

Repeated administrations of the IAT tend to decrease the magnitude of the effect for a particular person. This issue is somewhat ameliorated with the improved scoring algorithm. An additional safeguard to control for IAT experience is to include a different type of IAT as a comparison. This allows researchers to evaluate the degree of magnitude decrease when administering subsequent IATs.

Reliability

The IAT demonstrates satisfactory internal consistency and test-retest reliability. However, IAT scores do seem to vary between multiple administrations, indicating that it may measure a combination of trait (stable characteristics of people) and state (subject to variation based on situation-specific circumstances) characteristics. One example of the latter case is that scores on the Race IAT are known to be less biased against African Americans when those taking it imagine positive Black exemplars beforehand (e.g., Martin Luther King).
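Test-retest reliability is commonly summarized as the correlation between scores from two administrations of the same test. The following Python sketch shows that calculation with a hand-written Pearson correlation; the scores are invented for illustration and do not come from any study.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical IAT scores for five people at two administrations.
first_session = [0.10, 0.45, 0.30, 0.80, 0.55]
second_session = [0.20, 0.35, 0.40, 0.60, 0.50]

r = pearson_r(first_session, second_session)
print(round(r, 2))
```

A value near 1 would indicate that people keep their rank order across sessions (the trait component), while lower values leave more room for situation-specific variation (the state component).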

In popular culture

After establishing the IAT in the scientific literature, Dr. Anthony Greenwald, along with Mahzarin Banaji (Professor of Psychology at Harvard University) and Brian Nosek (Associate Professor of Psychology at the University of Virginia), co-founded Project Implicit, a virtual laboratory and educational outreach organization that facilitates research on implicit cognition.

The IAT has also been profiled in major media outlets (e.g. in the Washington Post) and in the popular book Blink, where it was suggested that one could score better on the implicit racism test by visualizing respected black leaders such as Nelson Mandela.