
Theory of Mind Task Selector

Purpose

This skill encodes expert knowledge for selecting, administering, and interpreting Theory of Mind (ToM) assessments. It provides a construct taxonomy, task selection decision trees, age-appropriate recommendations, psychometric properties, and guidance on confounds. A general-purpose programmer would not know which ToM tasks are appropriate for which populations, the developmental sequence of ToM abilities, or the psychometric limitations of common measures.

When to Use This Skill

Selecting a ToM measure for a developmental, clinical, or adult study
Matching a ToM task to the target population (children, adults, ASD, brain injury, aging)
Designing a comprehensive ToM assessment battery
Evaluating the psychometric properties of a proposed ToM measure
Identifying confounds (language, executive function, IQ) that may affect ToM task performance
Interpreting ceiling/floor effects in ToM data

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

State the research question -- What specific question is this analysis/paradigm addressing?
Justify the method choice -- Why is this approach appropriate? What alternatives were considered?
Declare expected outcomes -- What results would support vs. refute the hypothesis?
Note assumptions and limitations -- What does this method assume? Where could it mislead?
Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

ToM Construct Taxonomy

Developmental Hierarchy

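ToM develops in a predictable sequence. Levels 1-5 below approximate the five-task scale of Wellman & Liu (2004); levels 6-8 extend the sequence with later-developing abilities. Match tasks to the expected level: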

Level | Construct | Age of Emergence | Key Task | Source
1 | Diverse desires | ~3 years | Diverse desires task | Wellman & Liu, 2004
2 | Diverse beliefs | ~3-4 years | Diverse beliefs task | Wellman & Liu, 2004
3 | Knowledge access | ~4 years | Knowledge access task | Wellman & Liu, 2004
4 | First-order false belief | ~4-5 years | Sally-Anne (Wimmer & Perner, 1983) | Wellman et al., 2001
5 | Hidden emotion | ~5-6 years | Appearance-reality emotion task | Wellman & Liu, 2004
6 | Second-order false belief | ~6-7 years | Ice-cream van task | Perner & Wimmer, 1985
7 | Faux pas recognition | ~9-11 years | Faux pas stories | Baron-Cohen et al., 1999
8 | Advanced/adult ToM | Adolescence-adult | Strange Stories, RMET | Happe, 1994; Baron-Cohen et al., 2001
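The Wellman & Liu (2004) scale is Guttman-scalable. A child who passes a harder task is therefore expected to have passed every easier one.

The sketch below checks that pattern in coded data, under these assumptions:
- Each child's results are stored as pass/fail per task.
- The task keys are hypothetical labels for the five scale tasks: diverse desires, diverse beliefs, knowledge access, contents false belief, and hidden emotion.
- Errors are counted as mismatches against the ideal pattern for the child's total. This is one common convention, not necessarily the one Wellman & Liu used.

```python
# Hypothetical keys for the five Wellman & Liu (2004) scale tasks, easiest first.
SCALE_ORDER = [
    "diverse_desires",
    "diverse_beliefs",
    "knowledge_access",
    "contents_false_belief",
    "hidden_emotion",
]


def scale_score(passes: dict[str, bool]) -> int:
    """Number of scale tasks passed (0-5)."""
    return sum(passes[task] for task in SCALE_ORDER)


def guttman_errors(passes: dict[str, bool]) -> int:
    """Mismatches between the observed pattern and the ideal pattern for this total.

    The ideal pattern for a score of k is: pass the k easiest tasks, fail the rest.
    """
    k = scale_score(passes)
    ideal = [position < k for position in range(len(SCALE_ORDER))]
    observed = [passes[task] for task in SCALE_ORDER]
    return sum(o != i for o, i in zip(observed, ideal))


def coefficient_of_reproducibility(children: list[dict[str, bool]]) -> float:
    """1 - (total Guttman errors / (number of children x number of tasks))."""
    if not children:
        raise ValueError("need at least one child")
    total_errors = sum(guttman_errors(child) for child in children)
    return 1 - total_errors / (len(children) * len(SCALE_ORDER))
```

Some data need a closer look. Examples are a child who passes knowledge access but fails diverse beliefs, or a sample whose reproducibility falls well below the published value. The task order or administration may not match the intended scale.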

Construct Dimensions

Dimension | Description | Example Tasks
Belief attribution | Understanding others' beliefs, especially false beliefs | Sally-Anne, unexpected contents
Desire attribution | Understanding that others' desires can differ from one's own | Diverse desires task
Intention attribution | Understanding goal-directed action and intentionality | Intentional vs. accidental actions
Emotion attribution | Understanding others' emotions from context and cues | Hidden emotion, RMET
Visual perspective-taking | Level 1: what others see; Level 2: how others see it | Director task, Flavell tasks
Implicit/spontaneous ToM | Automatic, non-verbal ToM processing | Anticipatory looking, violation-of-expectation (VoE) paradigms

Task Selection Decision Tree

By Age Group

What is the participant's age?
|
+-- Infants and toddlers (6 months to ~3 years)
|     --> Implicit ToM tasks only
|     --> Anticipatory looking (Southgate et al., 2007)
|     --> Violation-of-expectation (Onishi & Baillargeon, 2005)
|
+-- Preschoolers (3-5 years)
|     --> Wellman & Liu (2004) scale (5 tasks)
|     --> Sally-Anne / Change of location (Wimmer & Perner, 1983)
|     --> Unexpected contents / Smarties task (Gopnik & Astington, 1988)
|
+-- School-age (6-12 years)
|     --> Second-order false belief (Perner & Wimmer, 1985)
|     --> Faux pas stories (Baron-Cohen et al., 1999)
|     --> Strange Stories (Happe, 1994) -- simplified versions
|
+-- Adolescents and adults
      --> Strange Stories (Happe, 1994)
      --> RMET (Baron-Cohen et al., 2001)
      --> Director task (Keysar et al., 2003)
      --> Faux pas test (Baron-Cohen et al., 1999)
      --> Movie for the Assessment of Social Cognition (MASC; Dziobek et al., 2006)
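The age branch of the tree can be written as a lookup function for screening scripts. This is a minimal sketch. The function name, the cut-points, and the task strings are assumptions:
- Anyone under 3 is routed to the implicit branch, and the 6-month lower bound is not enforced.
- "School-age" ends at the 13th birthday.
- The task strings are abbreviated labels, not standardized identifiers.

```python
def tasks_for_age(age_years: float) -> list[str]:
    """Return candidate ToM tasks for one participant age, following the age tree.

    Cut-points are assumptions: under 3 -> implicit tasks; 3 to <6 -> preschool;
    6 to <13 -> school-age; 13 and over -> adolescent/adult tasks.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 3:
        # Implicit (non-verbal) paradigms only.
        return [
            "Anticipatory looking (Southgate et al., 2007)",
            "Violation-of-expectation (Onishi & Baillargeon, 2005)",
        ]
    if age_years < 6:
        return [
            "Wellman & Liu (2004) scale",
            "Sally-Anne / change of location (Wimmer & Perner, 1983)",
            "Unexpected contents / Smarties (Gopnik & Astington, 1988)",
        ]
    if age_years < 13:
        return [
            "Second-order false belief (Perner & Wimmer, 1985)",
            "Faux pas stories (Baron-Cohen et al., 1999)",
            "Strange Stories, simplified (Happe, 1994)",
        ]
    return [
        "Strange Stories (Happe, 1994)",
        "RMET (Baron-Cohen et al., 2001)",
        "Director task (Keysar et al., 2003)",
        "Faux pas test (Baron-Cohen et al., 1999)",
        "MASC (Dziobek et al., 2006)",
    ]
```

Age is only a first filter. A child's developmental level, language ability, and the population branches below can all override it.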

By Population

What is the target population?
|
+-- Typically developing children
|     --> Wellman & Liu (2004) scale (most validated)
|     --> Standard false belief tasks
|
+-- Autism spectrum (children)
|     --> Sally-Anne (Baron-Cohen et al., 1985)
|     --> Unexpected contents (Perner et al., 1989)
|     --> Happe Strange Stories (if verbal)
|
|     NOTE: Many verbally able autistic individuals pass standard
|     false belief tasks. Use advanced tasks to avoid ceiling
|     effects (Happe, 1994).
|
+-- Autism spectrum (adults)
|     --> RMET (Baron-Cohen et al., 2001)
|     --> Faux pas test (Baron-Cohen et al., 1999)
|     --> MASC (Dziobek et al., 2006)
|     --> Director task (Keysar et al., 2003)
|
+-- Brain injury / neurological
|     --> Faux pas test (Stone et al., 1998)
|     --> Strange Stories (Happe, 1994)
|     --> RMET (Baron-Cohen et al., 2001)
|     --> Yoni task (Shamay-Tsoory & Aharon-Peretz, 2007)
|
+-- Aging / dementia
      --> Faux pas test (Gregory et al., 2002)
      --> RMET (Baron-Cohen et al., 2001)
      --> Strange Stories (Happe, 1994)
      --> Note: control for processing speed and working memory
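The population branch can also be encoded as a lookup table. Storing each branch's note next to its task list keeps the caveat attached when the battery is assembled. This is a sketch with these assumptions:
- The population keys (`asd_children`, `aging_dementia`, etc.) are hypothetical.
- The task labels are abbreviated.
- The ceiling note is paraphrased, not quoted from any source.

```python
# Hypothetical population keys; task lists and notes mirror the population tree.
POPULATION_TASKS: dict[str, tuple[list[str], str]] = {
    "typical_children": (["Wellman & Liu (2004) scale", "Standard false belief tasks"], ""),
    "asd_children": (
        ["Sally-Anne", "Unexpected contents", "Strange Stories (if verbal)"],
        "Standard false belief tasks may be at ceiling; add advanced tasks.",
    ),
    "asd_adults": (["RMET", "Faux pas test", "MASC", "Director task"], ""),
    "brain_injury": (["Faux pas test", "Strange Stories", "RMET", "Yoni task"], ""),
    "aging_dementia": (
        ["Faux pas test", "RMET", "Strange Stories"],
        "Control for processing speed and working memory.",
    ),
}


def tasks_for_population(population: str) -> tuple[list[str], str]:
    """Look up (tasks, note) for a population key; note is '' when none applies."""
    try:
        tasks, note = POPULATION_TASKS[population]
    except KeyError:
        known = ", ".join(sorted(POPULATION_TASKS))
        raise ValueError(f"unknown population {population!r}; expected one of: {known}") from None
    return list(tasks), note  # copy so callers cannot mutate the table
```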

By Construct
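Selecting by construct amounts to reading the Construct Dimensions table in reverse. The sketch below returns the example tasks for one or more target constructs, without duplicates. It assumes hypothetical short keys for each dimension and copies the task lists from that table.

```python
# Hypothetical keys; task lists copied from the Construct Dimensions table.
CONSTRUCT_TASKS: dict[str, list[str]] = {
    "belief": ["Sally-Anne", "Unexpected contents"],
    "desire": ["Diverse desires task"],
    "intention": ["Intentional vs. accidental actions"],
    "emotion": ["Hidden emotion", "RMET"],
    "visual_perspective": ["Director task", "Flavell tasks"],
    "implicit": ["Anticipatory looking", "Violation-of-expectation"],
}


def tasks_for_constructs(constructs: list[str]) -> list[str]:
    """Union of example tasks for the requested constructs, in first-seen order."""
    selected: list[str] = []
    for construct in constructs:
        if construct not in CONSTRUCT_TASKS:
            raise ValueError(f"unknown construct {construct!r}")
        for task in CONSTRUCT_TASKS[construct]:
            if task not in selected:
                selected.append(task)
    return selected
```

The lookup only narrows the candidate list. Whether a task suits the participants still depends on the age and population branches above.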

Key Tasks with Parameters

First-Order False Belief: Sally-Anne Task

Property | Value | Source
Original citation | Wimmer & Perner, 1983; Baron-Cohen et al., 1985 |
Age range | 3-6 years (standard); used in ASD at any age | Wellman et al., 2001
Administration | Acted out with dolls/puppets or illustrated story | Baron-Cohen et al., 1985
Test question | "Where will Sally look for her marble?" |
Control questions | Reality question + memory question (must pass both) | Baron-Cohen et al., 1985
Scoring | Pass/fail (binary) |
Passing criterion | Correct test question + both control questions | Baron-Cohen et al., 1985
Typical passing rates | ~20% at 3 years, ~50% at 4 years, ~90% by 5-6 years | Wellman et al., 2001
Limitations | Ceiling by age 6; binary scoring limits sensitivity | Wellman et al., 2001
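The passing criterion above can be applied in code. This is a minimal sketch, with these assumptions:
- Free-text location answers are compared case-insensitively.
- The default container labels "basket" (Sally's) and "box" (Anne's) are placeholders; adjust them to the props actually used.
- `SallyAnneResponse` and `score_sally_anne` are hypothetical names.

The control-question result is returned separately. Analysts can then decide whether failed-control cases are scored as failures, as the criterion implies, or inspected further.

```python
from dataclasses import dataclass


@dataclass
class SallyAnneResponse:
    belief: str   # answer to "Where will Sally look for her marble?"
    reality: str  # reality control: where the marble really is now
    memory: str   # memory control: where the marble was in the beginning


def _norm(answer: str) -> str:
    """Case- and whitespace-insensitive form of a free-text location answer."""
    return answer.strip().lower()


def score_sally_anne(
    response: SallyAnneResponse,
    original_location: str = "basket",  # assumed label for Sally's container
    new_location: str = "box",          # assumed label for Anne's container
) -> tuple[bool, bool]:
    """Return (passed, controls_ok).

    passed is True only if the belief answer AND both control answers are correct.
    controls_ok is reported separately so failed-control cases can be inspected.
    """
    original, new = _norm(original_location), _norm(new_location)
    controls_ok = _norm(response.reality) == new and _norm(response.memory) == original
    passed = controls_ok and _norm(response.belief) == original
    return passed, controls_ok
```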

Unexpected Contents (Smarties Task)

Property | Value | Source
Original citation | Gopnik & Astington, 1988; Perner et al., 1987 |
Age range | 3-6 years | Gopnik & Astington, 1988
Administration | Show a container (e.g., Smarties box) with unexpected contents (e.g., pencils) |
Test question | "What will [name] think is in the box?" (other's belief) |
Self question | "What did you think was in the box before I opened it?" (own prior belief) |
Scoring | Pass: predicts other will say "Smarties" (or typical contents) |

Second-Order False Belief

Property | Value | Source
Original citation | Perner & Wimmer, 1985 |
Age range | 6-9 years | Perner & Wimmer, 1985
Construct | "She thinks that he thinks that..." |
Administration | Story scenario (ice-cream van paradigm) | Perner & Wimmer, 1985
Test question | "Where does John think Mary has gone to buy ice cream?" |
Passing rates | ~10% at 5 years, ~50% at 7 years, ~90% by 9 years | Perner & Wimmer, 1985
Comprehension questions | 2-3 memory/comprehension checks required | Standard practice

RMET (Reading the Mind in the Eyes Test)

Property | Value | Source
Original citation | Baron-Cohen et al., 2001 |
Version | Revised version (2001), 36 items | Baron-Cohen et al., 2001
Age range | Adults (16+ years); child version available (28 items) | Baron-Cohen et al., 2001
Administration | Forced choice: pick 1 of 4 mental state words that matches the eye-region photo |
Scoring | Total correct out of 36 (adults) or 28 (children) |
Adult norms | Mean ~26.2 (SD ~3.6) in typical adults | Baron-Cohen et al., 2001
ASD norms | Mean ~21.9 (SD ~6.6) in autistic adults | Baron-Cohen et al., 2001
Reliability | Internal consistency: Cronbach's alpha ~0.60-0.70 (modest) | Olderbak et al., 2015
Limitations | Low reliability; scores may partly reflect emotion recognition rather than ToM per se | Olderbak et al., 2015
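Two quick checks help when interpreting an individual adult RMET total. The first is its distance from the typical-adult norms quoted above. The second is whether the score exceeds guessing: with 4 options per item, chance is 9/36.

This is a minimal sketch, with these assumptions:
- Guessing is modelled as an independent binomial choice per item.
- The norms come from the single Baron-Cohen et al. (2001) sample; local norms are preferable when available.
- The function names are hypothetical.

```python
from math import comb

# Typical-adult norms quoted in the RMET table (Baron-Cohen et al., 2001).
ADULT_MEAN = 26.2
ADULT_SD = 3.6
N_ITEMS = 36    # revised adult version
N_OPTIONS = 4   # four mental state words per item


def rmet_z(total_correct: int) -> float:
    """z-score of an adult total relative to the typical-adult norms."""
    if not 0 <= total_correct <= N_ITEMS:
        raise ValueError(f"total must be between 0 and {N_ITEMS}")
    return (total_correct - ADULT_MEAN) / ADULT_SD


def p_at_least_by_chance(total_correct: int, n_items: int = N_ITEMS) -> float:
    """One-sided binomial probability of scoring >= total_correct by pure guessing."""
    p = 1 / N_OPTIONS
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(total_correct, n_items + 1)
    )
```

A low z-score combined with a high chance probability suggests a floor or disengagement rather than a graded deficit. Given the modest reliability, single-person z-scores warrant caution.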

Faux Pas Test

Property | Value | Source
Original citation | Baron-Cohen et al., 1999 |
Age range | 9 years to adult | Baron-Cohen et al., 1999
Administration | Read 10 faux pas stories + 10 control stories |
Questions per story | Detection ("Did someone say something awkward?"), identification, belief, empathy | Baron-Cohen et al., 1999
Scoring | 0-2 points per question; max 60 for faux pas stories | Baron-Cohen et al., 1999
Control stories | Comprehension questions must also be scored for control stories |
Sensitivity | Good for detecting subtle ToM deficits in ASD, orbitofrontal lesions, and frontotemporal dementia | Baron-Cohen et al., 1999; Stone et al., 1998; Gregory et al., 2002
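Tallying the detection question separately for faux pas stories (hits) and control stories (correct rejections) is informative. It shows whether a participant is detecting faux pas or simply answering "yes" to every story.

This is a minimal sketch, with these assumptions:
- Detection answers are already coded as booleans, one per story (10 + 10 in the standard set).
- The function and dictionary key names are hypothetical.
- It does not replace the full question-by-question scoring.

```python
def faux_pas_detection(faux_pas_detected: list[bool], control_rejected: list[bool]) -> dict[str, float]:
    """Summarise answers to the detection question across story types.

    faux_pas_detected[i]: True if the participant said someone said something
        awkward in faux pas story i (a hit).
    control_rejected[i]: True if the participant correctly said nobody did in
        control story i (a correct rejection).
    """
    if not faux_pas_detected or not control_rejected:
        raise ValueError("need at least one faux pas and one control story")
    hits = sum(faux_pas_detected)
    correct_rejections = sum(control_rejected)
    return {
        "hit_rate": hits / len(faux_pas_detected),
        "correct_rejection_rate": correct_rejections / len(control_rejected),
        "false_alarms": float(len(control_rejected) - correct_rejections),
    }
```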

Strange Stories (Happe, 1994)

Property | Value | Source
Original citation | Happe, 1994 |
Construct | Advanced ToM: irony, white lie, double bluff, misunderstanding, persuasion, appearance/reality, figure of speech, sarcasm, forgetting, contrary emotions |
Administration | Read vignettes; open-ended question: "Why did X say that?" |
Scoring | 0 (incorrect), 1 (partial), 2 (full mental state reference) | Happe, 1994
Number of stories | 8-16 ToM stories + physical control stories | Happe, 1994
Age range | Children (8+) and adults | Happe, 1994
Reliability | Inter-rater reliability for scoring: kappa > 0.85 recommended | Happe, 1994
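Checking two raters' 0/1/2 Strange Stories codes against the kappa target above needs only the standard library. This is a minimal sketch of unweighted Cohen's kappa; the function name is hypothetical. Because the scores are ordinal, a weighted kappa is a defensible alternative, but it is not implemented here.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of responses on which the raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)
```

For example, raters scoring `[2, 2, 1, 0, 2, 1]` and `[2, 1, 1, 0, 2, 1]` agree on 5 of 6 stories. Chance agreement is 13/36, so kappa = 17/23 (about 0.74), below the 0.85 target despite 83% raw agreement.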

Director Task (Visual Perspective-Taking)

Property | Value | Source
Original citation | Keysar et al., 2003 |
Construct | Level 2 perspective-taking under communicative demand |
Administration | Grid of objects; a director (behind the grid) instructs the participant to move objects; some slots are occluded from the director's view |
Measure | Eye movements (egocentric intrusions), accuracy, RT | Keysar et al., 2003
Key finding | Even adults show egocentric errors on ~30-50% of critical trials | Keysar et al., 2003
Age range | 7 years to adult | Dumontheil et al., 2010
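The behavioural measures in the Director task reduce to an error rate on critical trials plus RTs. This is a minimal sketch, with these assumptions:
- The trial fields are a hypothetical coding scheme.
- Choosing the occluded object is the only error type recorded.
- Eye-movement measures are not covered.

```python
from dataclasses import dataclass


@dataclass
class DirectorTrial:
    critical: bool        # True if an occluded competitor object fits the instruction
    chose_occluded: bool  # True if the participant reached for/moved the occluded object
    rt_ms: float          # response time in milliseconds


def egocentric_error_rate(trials: list[DirectorTrial]) -> float:
    """Proportion of critical trials on which the participant chose the occluded object."""
    critical = [t for t in trials if t.critical]
    if not critical:
        raise ValueError("no critical trials")
    return sum(t.chose_occluded for t in critical) / len(critical)


def mean_rt(trials: list[DirectorTrial], critical: bool) -> float:
    """Mean RT on trials of one type (critical or control) without an occluded-object error."""
    rts = [t.rt_ms for t in trials if t.critical == critical and not t.chose_occluded]
    if not rts:
        raise ValueError("no correct trials of this type")
    return sum(rts) / len(rts)
```

Comparing correct-trial RTs on critical versus control trials gives a second index of egocentric interference, even when the error rate is low.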

See references/task-database.md for the full task list with administration protocols.

Psychometric Considerations

Reliability Summary

Task | Internal Consistency | Test-Retest | Source
Sally-Anne (single item) | N/A (binary) | Variable | Wellman et al., 2001
Wellman & Liu Scale | Guttman scalability > 0.90 | Moderate | Wellman & Liu, 2004
RMET | alpha ~0.60-0.70 | r ~0.63-0.83 | Olderbak et al., 2015; Fernandez-Abascal et al., 2013
Faux pas test | alpha ~0.70-0.80 | Not well established | Baron-Cohen et al., 1999
Strange Stories | Inter-rater: kappa > 0.85 | Moderate | Happe, 1994
MASC | alpha ~0.70 | Adequate | Dziobek et al., 2006
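Published alphas do not guarantee the same reliability in a new sample. Computing Cronbach's alpha on your own item-level data is a useful check. This is a minimal sketch, assuming a participants × items matrix with no missing values; the function name is hypothetical. It uses the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of totals).

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a participants x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    if len(scores) < 2:
        raise ValueError("need at least 2 participants")
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least 2 items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every participant needs a score for every item")
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```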

Validity Concerns

Ceiling effects: Standard false belief tasks show ceiling by age 5-6 in typical children. Use Wellman & Liu scale or advanced tasks (Wellman & Liu, 2004).
Floor effects: RMET and faux pas tests may show floor effects in clinical populations with severe deficits. Consider graded scoring.
Ecological validity: Structured ToM tasks may not predict real-world social behavior (German & Hehman, 2006).
Task purity: No ToM task measures only ToM. Every task also draws on some combination of language, memory, executive function, and attention.
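Ceiling and floor effects can be screened by reporting the share of participants at each end of the scale. This is a minimal sketch, with these assumptions:
- The 15% flag threshold is a common psychometric rule of thumb, not a ToM-specific standard.
- For forced-choice tasks such as the RMET, the meaningful floor is chance, so pass `min_score=9` rather than 0.
- The function name is hypothetical.

```python
def ceiling_floor(scores: list[float], min_score: float, max_score: float,
                  threshold: float = 0.15) -> dict:
    """Share of participants at or beyond the scale minimum and maximum.

    threshold (default 0.15) is a rule of thumb for flagging a floor/ceiling
    problem; adjust it to the conventions of your field.
    """
    if not scores:
        raise ValueError("no scores")
    n = len(scores)
    at_floor = sum(s <= min_score for s in scores) / n
    at_ceiling = sum(s >= max_score for s in scores) / n
    return {
        "prop_at_floor": at_floor,
        "prop_at_ceiling": at_ceiling,
        "floor_flag": at_floor > threshold,
        "ceiling_flag": at_ceiling > threshold,
    }
```

Report these proportions alongside the full score distribution, not just means.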

Confounds and Controls

Language

Confound | Impact | Mitigation | Source
Verbal demands | False belief tasks require comprehension of complex sentences | Include a vocabulary/language control measure | Milligan et al., 2007
Narrative complexity | Second-order tasks carry a heavy memory load | Add comprehension check questions | Perner & Wimmer, 1985
Word knowledge (RMET) | Vocabulary confound in forced-choice emotion labels | Control for verbal IQ | Olderbak et al., 2015

Executive Function

Confound | Impact | Mitigation | Source
Inhibitory control | Must inhibit own knowledge to attribute false belief | Include an inhibition measure (e.g., Stroop, day-night) | Carlson & Moses, 2001
Working memory | Must hold multiple perspectives simultaneously | Control for WM span | Carlson & Moses, 2001
Cognitive flexibility | Must switch between self and other perspectives | Include a set-shifting measure | Carlson & Moses, 2001

Recommended Control Measures

For any ToM study, include at minimum:

Verbal ability: Receptive vocabulary (e.g., PPVT) or verbal IQ subscale
Inhibitory control: Age-appropriate inhibition task
Working memory: Forward/backward digit span or equivalent
Non-ToM comprehension: Physical causality control stories (for Strange Stories and faux pas)

Task Combination Recommendations

Comprehensive Battery by Population

Population | Recommended Battery | Rationale
Preschool (3-5y) | Wellman & Liu Scale (5 tasks: diverse desires, diverse beliefs, knowledge access, contents false belief, hidden emotion) | Guttman-scalable, captures developmental progression (Wellman & Liu, 2004)
School-age (6-12y) | First-order FB + second-order FB + faux pas + Strange Stories subset | Spans first-order to advanced ToM
ASD (children) | Sally-Anne + unexpected contents + Strange Stories (simplified) | Avoids ceiling; includes advanced items
ASD (adults) | RMET + faux pas + MASC + Director task | Multiple constructs; includes real-time and reflective tasks
Neurological (adults) | Faux pas + Strange Stories + RMET | Sensitive to frontal, particularly orbitofrontal, lesions (Stone et al., 1998)
Aging research | Faux pas + RMET + Strange Stories | Control for processing speed; established aging norms

Minimum Battery (2-3 tasks)

If time is limited, prioritize:

One false belief task (for belief attribution)
Faux pas or Strange Stories (for advanced ToM / social reasoning)
RMET (for emotion/mental state recognition -- if construct-relevant)

Common Pitfalls

Using a single task as the sole ToM measure: ToM is multidimensional. Single tasks have low reliability and capture only one construct. Use a battery (Wellman & Liu, 2004).
Ignoring ceiling/floor effects: Standard false belief tasks reach ceiling by age 5-6, and the RMET has only modest reliability. Check for restricted range.
Not controlling for language: Most ToM tasks have substantial verbal demands. Group differences in ToM may reflect language differences, especially in ASD (Milligan et al., 2007).
Confounding ToM with executive function: False belief tasks require inhibitory control. Include EF measures and control statistically or use low-EF-demand tasks (Carlson & Moses, 2001).
Age-inappropriate task selection: Giving first-order false belief to adults (ceiling) or faux pas to 4-year-olds (floor). Match task to developmental level.
Treating the RMET as a pure ToM measure: The RMET has low reliability (alpha ~ 0.60-0.70) and may measure emotion recognition more than mental state inference (Olderbak et al., 2015).
Assuming that failing a task means absent ToM: Implicit studies using violation-of-expectation (Onishi & Baillargeon, 2005) and anticipatory looking (Southgate et al., 2007) suggest that infants and toddlers may have ToM understanding that explicit tasks fail to capture. Distinguish competence from performance.
Not including control stories: For faux pas and Strange Stories, physical/non-mental-state control stories are essential to rule out general comprehension deficits.

Minimum Reporting Checklist

ToM construct(s) targeted (belief, desire, emotion, perspective-taking)
Task(s) used with full citation and version
Administration method (live, video, computerized)
Scoring criteria and inter-rater reliability (for open-ended tasks)
Control questions included and pass rates
Confound measures included (language, EF, IQ)
Ceiling/floor analysis: report distribution of scores, not just means
Age and developmental level of participants
Clinical classification criteria (if clinical population)
Effect sizes and confidence intervals for group comparisons
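For the effect-size item, a group comparison can be reported as Cohen's d with an approximate confidence interval. This is a minimal sketch, with these assumptions:
- It uses a pooled SD.
- The interval uses the common large-sample standard-error approximation.
- For small samples, a Hedges' g correction or exact noncentral-t intervals may be preferable.
- The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d_with_ci(group1: list[float], group2: list[float],
                     z: float = 1.96) -> tuple[float, float, float]:
    """Cohen's d (pooled SD) with an approximate normal-theory confidence interval.

    SE uses the large-sample approximation
    sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))); z = 1.96 gives ~95%.
    Returns (d, lower, upper).
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 scores")
    pooled_sd = sqrt(((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("pooled SD is zero")
    d = (mean(group1) - mean(group2)) / pooled_sd
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, d - z * se, d + z * se
```

With very small groups the interval is wide and often spans zero. Report it even when d looks large.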

References

Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a "theory of mind"? Cognition, 21(1), 37-46.
Baron-Cohen, S., O'Riordan, M., Stone, V., Jones, R., & Plaisted, K. (1999). Recognition of faux pas by normally developing children and children with Asperger syndrome or high-functioning autism. Journal of Autism and Developmental Disorders, 29(5), 407-418.
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001). The "Reading the Mind in the Eyes" test revised version. Journal of Child Psychology and Psychiatry, 42(2), 241-251.
Carlson, S. M., & Moses, L. J. (2001). Individual differences in inhibitory control and children's theory of mind. Child Development, 72(4), 1032-1053.
Dumontheil, I., Apperly, I. A., & Blakemore, S. J. (2010). Online usage of theory of mind continues to develop in late adolescence. Developmental Science, 13(2), 331-338.
Dziobek, I., Fleck, S., Kalbe, E., Rogers, K., Hassenstab, J., Brand, M., ... & Convit, A. (2006). Introducing MASC: A movie for the assessment of social cognition. Journal of Autism and Developmental Disorders, 36(5), 623-636.
Fernandez-Abascal, E. G., Cabello, R., Fernandez-Berrocal, P., & Baron-Cohen, S. (2013). Test-retest reliability of the "Reading the Mind in the Eyes" test. Journal of Autism and Developmental Disorders, 43(9), 2220-2223.
German, T. P., & Hehman, J. A. (2006). Representational and executive selection resources in "theory of mind." Psychological Science, 17(2), 130-132.
Gopnik, A., & Astington, J. W. (1988). Children's understanding of representational change and its relation to the understanding of false belief. Child Development, 59(1), 26-37.
Gregory, C., Lough, S., Stone, V., Erzinclioglu, S., Martin, L., Baron-Cohen, S., & Hodges, J. R. (2002). Theory of mind in patients with frontal variant frontotemporal dementia and Alzheimer's disease. Journal of Neurology, Neurosurgery & Psychiatry, 72(6), 752-756.
Happe, F. G. (1994). An advanced test of theory of mind. Journal of Autism and Developmental Disorders, 24(2), 129-154.
Keysar, B., Lin, S., & Barr, D. J. (2003). Limits on theory of mind use in adults. Cognition, 89(1), 25-41.
Milligan, K., Astington, J. W., & Dack, L. A. (2007). Language and theory of mind: Meta-analysis of the relation between language ability and false-belief understanding. Child Development, 78(2), 622-646.
Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015). A psychometric analysis of the Reading the Mind in the Eyes test. Assessment, 22(6), 798-806.
Onishi, K. H., & Baillargeon, R. (2005). Do 15-month-old infants understand false beliefs? Science, 308(5719), 255-258.
Perner, J., Leekam, S. R., & Wimmer, H. (1987). Three-year-olds' difficulty with false belief. British Journal of Developmental Psychology, 5(2), 125-137.
Perner, J., & Wimmer, H. (1985). "John thinks that Mary thinks that..." Attribution of second-order beliefs. Journal of Experimental Child Psychology, 39(3), 437-471.
Samson, D., Apperly, I. A., Braithwaite, J. J., Andrews, B. J., & Bodley Scott, S. E. (2010). Seeing it their way: Evidence for rapid and involuntary computation of what other people see. Journal of Experimental Psychology: Human Perception and Performance, 36(5), 1255-1266.
Shamay-Tsoory, S. G., & Aharon-Peretz, J. (2007). Dissociable prefrontal networks for cognitive and affective theory of mind. Neuropsychologia, 45(13), 3054-3067.
Southgate, V., Senju, A., & Csibra, G. (2007). Action anticipation through attribution of false belief by 2-year-olds. Psychological Science, 18(7), 587-592.
Stone, V. E., Baron-Cohen, S., & Knight, R. T. (1998). Frontal lobe contributions to theory of mind. Journal of Cognitive Neuroscience, 10(5), 640-656.
Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of theory-of-mind development: The truth about false belief. Child Development, 72(3), 655-684.
Wellman, H. M., & Liu, D. (2004). Scaling of theory-of-mind tasks. Child Development, 75(2), 523-541.
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception. Cognition, 13(1), 103-128.

See references/ for the full task database with administration protocols and scoring rubrics.
