Self-Paced Reading Designer
This skill encodes expert knowledge for designing self-paced reading (SPR) experiments in psycholinguistics. SPR is the most widely used behavioral method for studying real-time sentence comprehension during reading (Jegerski, 2014). A competent programmer without psycholinguistics training will reliably make errors in region segmentation, spillover design, and comprehension question construction -- all of which invalidate the resulting data.
For detailed region segmentation strategies, see references/region-segmentation.md.
For statistical analysis guidance, see references/analysis-guide.md.
Why SPR Design Requires Domain Expertise
Self-paced reading appears deceptively simple: participants press a button to reveal successive words. But the scientific value of an SPR experiment depends entirely on decisions that require psycholinguistic training:
- Region boundaries determine what you can measure. A critical region that spans a clause boundary conflates syntactic processing with wrap-up effects (Just & Carpenter, 1980). A non-specialist would not know this.
- Spillover is not a bug -- it is the primary data pattern. In SPR, processing difficulty at word N often appears in reading times at words N+1 and N+2, not at word N itself (Mitchell, 2004; Rayner, 1998). Failing to include and analyze spillover regions means missing the effect entirely.
- Comprehension questions that target the critical manipulation create demand characteristics. Participants learn to attend strategically to the manipulation, distorting natural reading patterns (Jegerski, 2014).
- Word length and frequency confounds are invisible to non-specialists. If the critical word in condition A is longer or less frequent than in condition B, reading time differences reflect lexical properties, not the intended manipulation (Keating & Jegerski, 2015).
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question — What specific sentence processing question is this SPR study addressing?
- Justify the method choice — Why SPR (not eye-tracking, ERP, acceptability judgment)? What alternatives were considered?
- Declare expected outcomes — What reading time pattern (at which region) would support vs. refute the hypothesis?
- Note assumptions and limitations — What does SPR assume? Where could it mislead (e.g., lack of regressive eye movements)?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Core Workflow
Step 1: Select a Presentation Method
Choose based on your research question, population, and resources:
1A. Non-Cumulative Moving Window (Standard)
- The sentence is displayed as dashes; each button press reveals the next word and re-masks the previous one (Just, Carpenter, & Woolley, 1982)
- Advantages: Most widely used, large existing literature for comparison, preserves spatial layout information
- Disadvantages: Prevents regressions (unlike natural reading), produces spillover effects that spread over 2-4 words (Mitchell, 2004)
- Use when: You need comparability with the existing SPR literature; you are studying incremental sentence processing
1B. Cumulative Moving Window
- Each button press reveals the next word, but previously revealed words remain visible
- Advantages: More similar to natural reading (partial text context remains)
- Disadvantages: Rarely used; harder to compare with the dominant non-cumulative literature; participants may re-read prior context, introducing noise (Jegerski, 2014)
- Use when: Naturalness of reading is more important than comparability with prior work
1C. Phrase-by-Phrase Presentation
- Sentences are segmented into multi-word regions; each button press reveals a phrase
- Advantages: Faster for participants; appropriate when word-level resolution is not needed
- Disadvantages: Region boundaries must be linguistically principled (see references/region-segmentation.md); reduces temporal resolution; risks confounding region length with reading time
- Use when: Your manipulation spans a multi-word constituent and word-by-word resolution is unnecessary
1D. Centered (RSVP-style) Presentation
- Words appear one at a time at a fixed screen position (typically center)
- Advantages: Eliminates eye movement confounds; simpler programming
- Disadvantages: Destroys spatial layout; removes positional information that readers normally use; rarely used in modern SPR (Jegerski, 2014)
- Avoid unless: You have a specific theoretical reason to eliminate spatial layout
1E. Maze Task (Modern Alternative)
- Two words appear simultaneously; the participant selects the word that continues the sentence (Forster, Guerrera, & Elliot, 2009; Witzel, Witzel, & Forster, 2012)
- L-maze (Lexicality maze): Distractor is a pronounceable nonword
- G-maze (Grammaticality maze): Distractor is a real word that is ungrammatical in context
- A-maze (Auto-maze): Distractors generated automatically via NLP (Boyce, Futrell, & Levy, 2020)
- Advantages: Dramatically reduced spillover compared to SPR; forced incremental processing; works well for web-based data collection (Boyce et al., 2020); better statistical power per item than SPR for syntactic effects (Witzel et al., 2012)
- Disadvantages: Slower overall pace; dual-task demand (comprehension + selection); less natural than button-press SPR; requires distractor generation
- Use when: You need precise localization of effects, want to reduce spillover, or plan web-based data collection
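To make the non-cumulative moving window (method 1A) concrete, here is a minimal sketch of its display logic: on each button press exactly one word is visible and every other word is masked by dashes of matching length, preserving the sentence's spatial layout. The function name and plain-string output are illustrative, not tied to any experiment-building package.

```python
def moving_window_frames(sentence: str) -> list[str]:
    """Return the display string shown after each button press in a
    non-cumulative moving window: the current word is visible, all
    other words are masked by dashes of the same length."""
    words = sentence.split()
    frames = []
    for i in range(len(words)):
        frame = " ".join(
            w if j == i else "-" * len(w)
            for j, w in enumerate(words)
        )
        frames.append(frame)
    return frames

frames = moving_window_frames("The cat slept")
# press 1: "The --- -----"
# press 3: "--- --- slept"
```

Because mask length matches word length, readers retain word-boundary information, which is the main reason this method is preferred over centered (RSVP-style) presentation.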
Step 2: Configure Timing Parameters
| Parameter | Recommended Value | Rationale |
|---|---|---|
| Response timeout | None (self-paced) or 3000-5000 ms per region | No timeout is standard for in-lab SPR; timeout prevents excessively slow responses in web-based studies (Boyce et al., 2020) |
| Inter-stimulus interval (ISI) | 0 ms for non-cumulative moving window | Standard practice; the next word appears immediately when the previous is masked (Just et al., 1982) |
| ISI for phrase-by-phrase | 0 ms (typical) | Any nonzero ISI introduces a blank that disrupts reading and may introduce strategic pausing |
| Pre-sentence fixation | + or * for 500-1000 ms | Orients attention to display location; standard in SPR (Jegerski, 2014) |
| Post-sentence delay | 0-500 ms before comprehension question | Brief delay prevents motor interference between last word button-press and question response |
| Practice trials | 6-10 items minimum | Familiarizes participants with button-press rhythm and comprehension questions; use different sentences than experimental items (Jegerski, 2014; Keating & Jegerski, 2015) |
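The timing table above can be collected into a single configuration object. This is a hypothetical sketch; the field names are illustrative and not tied to any specific experiment package, and the concrete values are one defensible choice within the recommended ranges.

```python
# Illustrative SPR timing configuration (field names are hypothetical).
# Values follow the recommendations in the table above.
SPR_TIMING = {
    "response_timeout_ms": None,    # self-paced in the lab; 3000-5000 for web studies
    "isi_ms": 0,                    # next word appears immediately on button press
    "fixation_char": "+",           # pre-sentence fixation marker ("+" or "*")
    "fixation_duration_ms": 500,    # within the 500-1000 ms range
    "post_sentence_delay_ms": 250,  # within the 0-500 ms range before the question
    "n_practice_trials": 8,         # within the 6-10 item minimum
}
```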
Step 3: Design Critical Regions
This is the most consequential design decision in an SPR experiment. See references/region-segmentation.md for full guidelines.
Core Principles
- Match critical regions across conditions for word length (in characters) and lexical frequency. If your manipulation requires different words, match them on length (+/- 1 character) and log frequency (use SUBTLEX-US; Brysbaert & New, 2009). Unmatched items introduce confounds that mimic or mask experimental effects.
- Include at least 2-3 spillover words after the critical region. Processing difficulty at the critical region reliably spills over to subsequent words in SPR (Just et al., 1982; Mitchell, 2004; Rayner, 1998). Without spillover regions, you will miss your effect. These spillover words must be identical across conditions.
- Avoid placing critical regions at clause or sentence boundaries. Reading times at clause-final and sentence-final positions are inflated by wrap-up processes -- integration of clause-level meaning, discourse updating, and possibly implicit prosodic boundary effects (Just & Carpenter, 1980; Warren, White, & Reichle, 2009). This inflation is independent of your manipulation and adds noise.
- Keep critical regions short (ideally a single word). Multi-word critical regions reduce temporal resolution and introduce length confounds. If you must use a multi-word region, it must have the same number of words and matched total character length across conditions.
- Ensure the pre-critical region is identical across conditions. Any difference before the critical word can create baseline differences in reading time that propagate into the critical region via spillover.
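The length and frequency matching principle lends itself to an automated check. The sketch below flags mismatched critical-word pairs; the length criterion (+/- 1 character) comes from the guideline above, while the 0.3 log-frequency threshold (roughly a 2x frequency difference) is an illustrative assumption, not a published cutoff. Frequencies are assumed to be per-million counts, e.g. from SUBTLEX-US.

```python
import math

def check_critical_match(word_a: str, word_b: str,
                         freq_a: float, freq_b: float) -> list[str]:
    """Flag length and frequency mismatches between the critical words
    of two conditions. freq_a/freq_b are per-million frequency counts
    (e.g., SUBTLEX-US). The 0.3 log10 threshold (~2x frequency gap)
    is an illustrative assumption, not a standard criterion."""
    problems = []
    if abs(len(word_a) - len(word_b)) > 1:  # guideline: +/- 1 character
        problems.append("length mismatch")
    if abs(math.log10(freq_a) - math.log10(freq_b)) > 0.3:
        problems.append("log-frequency mismatch")
    return problems

# A well-matched pair passes; a badly mismatched pair is flagged twice.
check_critical_match("sofa", "couch", 20.0, 18.0)      # -> []
check_critical_match("cab", "limousine", 50.0, 0.5)    # -> both flags
```

Run such a check over every item pair before finalizing materials, since a single unmatched item can distort by-item analyses.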
Step 4: Design Comprehension Questions
Comprehension questions serve two purposes: ensuring participants read for meaning, and providing an exclusion criterion for inattentive participants.
Guidelines
| Parameter | Recommendation | Rationale |
|---|---|---|
| Proportion of trials with questions | 1/3 to 1/2 of all trials (experimental + filler) | Fewer than 1/3: participants may stop reading carefully; more than 1/2: task becomes tedious, and participants may shift to a question-anticipation strategy (Just et al., 1982; Jegerski, 2014) |
| Answer balance | 50% yes / 50% no for yes/no questions | Prevents response bias toward one answer |
| Question content | Target semantic content of the sentence, NOT the critical manipulation | Questions about the manipulation teach participants what you are studying, inducing strategic reading (Jegerski, 2014) |
| Accuracy exclusion threshold | >80% correct to retain participant | Standard criterion; lower accuracy suggests the participant was not reading for comprehension (Jegerski, 2014; common practice across SPR studies) |
| Question timing | Immediately after the sentence (or after the final button press) | Delayed questions test memory, not comprehension |
Example of Good vs. Bad Comprehension Questions
Suppose the experimental sentence manipulates relative clause attachment:
The maid of the actress who was on the balcony shouted to the crowd.
- Good question: "Did someone shout to the crowd?" (targets overall meaning, not the critical attachment)
- Bad question: "Who was on the balcony?" (directly probes the ambiguity under investigation, alerting participants to the manipulation)
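Two of the table's guidelines, answer balance and avoiding manipulation-targeting questions, can be checked mechanically over a question set. This is a hedged sketch: the dict fields (`answer`, `targets_manipulation`) are hypothetical names, and the 40-60% tolerance band around the 50/50 target is an illustrative assumption.

```python
def check_question_set(questions: list[dict]) -> list[str]:
    """Sanity-check comprehension questions against the guidelines
    above. Each question is a dict with 'answer' ('yes'/'no') and
    'targets_manipulation' (bool); field names are illustrative."""
    warnings = []
    yes_rate = sum(q["answer"] == "yes" for q in questions) / len(questions)
    if not 0.4 <= yes_rate <= 0.6:  # roughly 50/50 yes/no balance
        warnings.append("yes/no answers are unbalanced")
    if any(q["targets_manipulation"] for q in questions):
        warnings.append("some questions probe the critical manipulation")
    return warnings
```

The second check is only as good as the hand-coded `targets_manipulation` flag, so coding it for each question is itself a useful design review step.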
Step 5: Design Item and Condition Structure
Latin Square Design
For within-subjects manipulations, use a Latin square design so that each participant sees each item in exactly one condition, and each condition is seen equally often across participants (Keating & Jegerski, 2015).
- For a 2-condition design: 2 lists; each item appears in condition A for half the participants, condition B for the other half
- For a 2x2 design: 4 lists (one per condition combination)
- Assign participants to lists in rotation
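The Latin square rotation described above can be sketched in a few lines: each list assigns every item to exactly one condition, and conditions rotate across lists so that each condition is seen equally often. The function name is illustrative.

```python
def latin_square_lists(n_items: int, n_conditions: int) -> list[list[int]]:
    """Build Latin-square lists: lists[k][i] is the condition (0-based)
    in which list k presents item i. Each list shows every item in
    exactly one condition; conditions rotate across lists."""
    return [
        [(i + k) % n_conditions for i in range(n_items)]
        for k in range(n_conditions)
    ]

lists = latin_square_lists(n_items=24, n_conditions=2)
# list 0: item 0 -> condition 0, item 1 -> condition 1, ...
# list 1: item 0 -> condition 1, item 1 -> condition 0, ...
```

Participants are then assigned to lists in rotation (participant 1 to list 0, participant 2 to list 1, and so on), so total participants should be a multiple of the number of lists.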
Items Per Condition
| Population | Minimum Items per Condition | Rationale |
|---|---|---|
| L1 speakers, robust effect (e.g., garden-path) | 24 items per condition | Sufficient for medium-to-large effects in mixed models (Keating & Jegerski, 2015) |
| L1 speakers, subtle effect (e.g., pragmatic inference) | 32-40 items per condition | Smaller effects require more items for adequate power (Keating & Jegerski, 2015; Brysbaert & Stevens, 2018) |
| L2 speakers | 32-40 items per condition | Higher variability in L2 populations requires more observations (Marsden, Thompson, & Plonsky, 2018) |
Filler Items
| Parameter | Recommendation | Rationale |
|---|---|---|
| Filler-to-experimental ratio | 2:1 or 3:1 (fillers : experimental items) | Prevents participants from identifying the experimental pattern; higher ratios reduce strategic processing (Keating & Jegerski, 2015) |
| Filler variety | Include multiple sentence types, lengths, and structures | Monotonous fillers fail to mask the experimental manipulation |
| Filler complexity | Include some fillers of similar complexity to experimental items | If only experimental items are complex, participants learn to attend differently to them |
| Comprehension questions on fillers | Yes -- at least the same rate as on experimental items | If questions only follow experimental items, participants learn that complex sentences predict questions |
Step 6: Decide Between SPR and Eye-Tracking
This is a design-level decision that should be made before programming the experiment.
| Criterion | SPR | Eye-Tracking |
|---|---|---|
| Equipment cost | Low (any computer) | High (dedicated eye-tracker, ~$20,000-$50,000) |
| Online data collection | Yes (web-based SPR and Maze work well) | No (requires in-lab calibration) |
| Temporal resolution | Word-by-word, with substantial spillover | Multiple fixation measures (first fixation, gaze duration, go-past, total time, regressions) |
| Regressions | Not measurable (non-cumulative display prevents rereading) | Yes -- regressions are a primary measure of reanalysis |
| Ecological validity | Moderate (button-press is unnatural, but spatial layout preserved) | Higher (closer to natural reading) |
| Sensitivity to early/late processing stages | Low (only a single RT per region, which blends all processing stages) | High (first-pass vs. second-pass measures separate early from late processing; Rayner, 1998) |
| Best for | Robust syntactic/semantic effects, web-based or underfunded studies, L2 populations without lab access | Nuanced temporal dynamics, distinguishing processing stages, studying regressions, garden-path recovery |
Rule of thumb: If you only need to know whether a manipulation affects reading time, SPR is sufficient. If you need to know when during processing the effect occurs (early lexical access vs. late reanalysis), use eye-tracking.
Common Pitfalls
These are errors that non-specialists routinely make:
- No spillover region. The most common fatal flaw. If the sentence ends at or immediately after the critical word, spillover effects have nowhere to appear, and the effect is lost. Always include 2-3 words of identical post-critical material across conditions.
- Critical region at a clause boundary. Wrap-up effects at clause-final positions (Just & Carpenter, 1980) inflate reading times by 50-100+ ms regardless of condition, swamping the experimental effect or producing spurious interactions.
- Length/frequency mismatch. Longer words take approximately 30-40 ms per additional character in SPR (Ferreira & Clifton, 1986). A 2-character difference between conditions creates a ~60-80 ms confound, which can easily exceed the size of most psycholinguistic effects.
- Comprehension questions targeting the manipulation. This transforms the experiment from measuring natural reading into measuring strategic disambiguation. Participants adapt within 10-15 trials (Jegerski, 2014).
- Too few items per condition. With fewer than 24 items per condition, even large effects (d = 0.8) may not reach significance in mixed-effects models, particularly with by-item random slopes (Brysbaert & Stevens, 2018).
- No fillers or insufficient fillers. Without a 2:1 filler-to-item ratio, participants identify the experimental manipulation and shift to strategic reading (Keating & Jegerski, 2015).
- Analyzing only the critical region. Even when an effect appears on the critical word, it typically continues into the spillover region. Analyzing only one region provides an incomplete picture and may miss effects that appear exclusively in spillover.
- Using raw reading times without controlling for word length. Raw RTs conflate lexical processing speed with the experimental manipulation. Either match word length precisely or use residual RTs / include word length as a covariate in the statistical model (Ferreira & Clifton, 1986).
- Ignoring trial position effects. Reading speed increases across the experiment as participants become practiced. Include trial order as a covariate or present items in a randomized order (Jegerski, 2014).
- Not checking comprehension accuracy before analyzing RTs. Participants with low accuracy (<80%) may not be reading for comprehension. Their RT data are uninterpretable and should be excluded (Jegerski, 2014).
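The residual-RT correction mentioned in the word-length pitfall can be sketched with a simple least-squares regression of raw reading times on word length (per participant, in the classic Ferreira & Clifton, 1986 approach); the residuals are the length-corrected reading times. This is a minimal pure-Python illustration; in practice a mixed-effects model with length as a covariate is the modern alternative.

```python
def residual_rts(rts: list[float], lengths: list[int]) -> list[float]:
    """Residualize raw reading times against word length using a
    simple least-squares line. Residuals are RTs with the linear
    effect of word length removed; typically computed per participant."""
    n = len(rts)
    mean_x = sum(lengths) / n
    mean_y = sum(rts) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, rts))
    slope = sxy / sxx                     # ms of RT per extra character
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(lengths, rts)]
```

If reading times were a pure function of word length, every residual would be zero; what remains after the correction is the variance available to your experimental manipulation.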
Quick Reference: SPR Design Checklist
Before running your experiment, verify:
- Critical regions are matched for word length and frequency across conditions
- At least 2-3 identical spillover words follow the critical region in all conditions
- Critical region is not at a clause or sentence boundary
- Pre-critical region is identical across conditions
- At least 24 items per condition (32+ for subtle effects or L2 populations)
- Filler-to-experimental ratio is at least 2:1
- Comprehension questions on 1/3 to 1/2 of trials, balanced yes/no
- No comprehension question directly targets the experimental manipulation
- Latin square counterbalancing across lists
- 6-10 practice trials with different sentences than experimental items
- Analysis plan includes spillover regions (at least critical +1, +2)
- RT trimming criteria defined a priori (see references/analysis-guide.md)
References
- Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390-412.
- Boyce, V., Futrell, R., & Levy, R. P. (2020). Maze Made Easy: Better and easier measurement of incremental processing difficulty. Journal of Memory and Language, 111, 104082.
- Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977-990.
- Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models. Journal of Cognition, 1(1), 9.
- Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348-368.
- Forster, K. I., Guerrera, C., & Elliot, L. (2009). The maze task: Measuring forced incremental sentence processing time. Behavior Research Methods, 41, 163-171.
- Jegerski, J. (2014). Self-paced reading. In J. Jegerski & B. VanPatten (Eds.), Research methods in second language psycholinguistics. Routledge.
- Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.
- Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111, 228-238.
- Keating, G. D., & Jegerski, J. (2015). Experimental designs in sentence processing research. Studies in Second Language Acquisition, 37, 1-32.
- Marsden, E., Thompson, S., & Plonsky, L. (2018). A methodological synthesis of self-paced reading in second language research. Applied Psycholinguistics, 39, 861-904.
- Mitchell, D. C. (2004). On-line methods in language processing: Introduction and historical review. In M. Carreiras & C. Clifton (Eds.), The on-line study of sentence comprehension. Psychology Press.
- Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.
- Warren, T., White, S. J., & Reichle, E. D. (2009). Investigating the causes of wrap-up effects: Evidence from eye movements and E-Z Reader. Cognition, 111, 132-137.
- Witzel, N., Witzel, J., & Forster, K. (2012). Comparisons of online reading paradigms: Eye tracking, moving-window, and maze. Journal of Psycholinguistic Research, 41, 105-128.