Self-Paced Reading Designer
This skill encodes expert knowledge for designing self-paced reading (SPR) experiments in psycholinguistics. SPR is the most widely used behavioral method for studying real-time sentence comprehension during reading (Jegerski, 2014). A competent programmer without psycholinguistics training will reliably make errors in region segmentation, spillover design, and comprehension question construction -- all of which invalidate the resulting data.
For detailed region segmentation strategies, see references/region-segmentation.md.
For statistical analysis guidance, see references/analysis-guide.md.
Why SPR Design Requires Domain Expertise
Self-paced reading appears deceptively simple: participants press a button to reveal successive words. But the scientific value of an SPR experiment depends entirely on decisions that require psycholinguistic training:
- Region boundaries determine what you can measure. A critical region that spans a clause boundary conflates syntactic processing with wrap-up effects (Just & Carpenter, 1980). A non-specialist would not know this.
- Spillover is not a bug -- it is the primary data pattern. In SPR, processing difficulty at word N often appears in reading times at words N+1 and N+2, not at word N itself (Mitchell, 2004; Rayner, 1998). Failing to include and analyze spillover regions means missing the effect entirely.
- Comprehension questions that target the critical manipulation create demand characteristics. Participants learn to attend strategically to the manipulation, distorting natural reading patterns (Jegerski, 2014).
- Word length and frequency confounds are invisible to non-specialists. If the critical word in condition A is longer or less frequent than in condition B, reading time differences reflect lexical properties, not the intended manipulation (Keating & Jegerski, 2015).
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question — What specific sentence processing question is this SPR study addressing?
- Justify the method choice — Why SPR (not eye-tracking, ERP, acceptability judgment)? What alternatives were considered?
- Declare expected outcomes — What reading time pattern (at which region) would support vs. refute the hypothesis?
- Note assumptions and limitations — What does SPR assume? Where could it mislead (e.g., lack of regressive eye movements)?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Core Workflow
Step 1: Select a Presentation Method
Choose based on your research question, population, and resources:
1A. Non-Cumulative Moving Window (Standard)
- The sentence is displayed as dashes; each button press reveals the next word and re-masks the previous one (Just, Carpenter, & Woolley, 1982)
- Advantages: Most widely used, large existing literature for comparison, preserves spatial layout information
- Disadvantages: Prevents regressions (unlike natural reading), produces spillover effects that spread over 2-4 words (Mitchell, 2004)
- Use when: You need comparability with the existing SPR literature; you are studying incremental sentence processing
1B. Cumulative Moving Window
- Each button press reveals the next word, but previously revealed words remain visible
- Advantages: More similar to natural reading (partial text context remains)
- Disadvantages: Rarely used; harder to compare with the dominant non-cumulative literature; participants may re-read prior context, introducing noise (Jegerski, 2014)
- Use when: Naturalness of reading is more important than comparability with prior work
1C. Phrase-by-Phrase Presentation
- Sentences are segmented into multi-word regions; each button press reveals a phrase
- Advantages: Faster for participants; appropriate when word-level resolution is not needed
- Disadvantages: Region boundaries must be linguistically principled (see references/region-segmentation.md); reduces temporal resolution; risks confounding region length with reading time
- Use when: Your manipulation spans a multi-word constituent and word-by-word resolution is unnecessary
1D. Centered (RSVP-style) Presentation
- Words appear one at a time at a fixed screen position (typically center)
- Advantages: Eliminates eye movement confounds; simpler programming
- Disadvantages: Destroys spatial layout; removes positional information that readers normally use; rarely used in modern SPR (Jegerski, 2014)
- Avoid unless: You have a specific theoretical reason to eliminate spatial layout
1E. Maze Task (Modern Alternative)
- Two words appear simultaneously; the participant selects the word that continues the sentence (Forster, Guerrera, & Elliot, 2009; Witzel, Witzel, & Forster, 2012)
- L-maze (Lexicality maze): Distractor is a pronounceable nonword
- G-maze (Grammaticality maze): Distractor is a real word that is ungrammatical in context
- A-maze (Auto-maze): Distractors generated automatically via NLP (Boyce, Futrell, & Levy, 2020)
- Advantages: Dramatically reduced spillover compared to SPR; forced incremental processing; works well for web-based data collection (Boyce et al., 2020); better statistical power per item than SPR for syntactic effects (Witzel et al., 2012)
- Disadvantages: Slower overall pace; dual-task demand (comprehension + selection); less natural than button-press SPR; requires distractor generation
- Use when: You need precise localization of effects, want to reduce spillover, or plan web-based data collection
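To make the non-cumulative moving window (method 1A) concrete, here is a minimal sketch of its display logic: on each button press exactly one word is visible and every other word is masked by dashes of matching length, preserving the sentence's spatial layout. The function name and plain-string output are illustrative, not tied to any experiment-building package.

```python
def moving_window_frames(sentence: str) -> list[str]:
    """Return the display string shown after each button press in a
    non-cumulative moving window: the current word is visible, all
    other words are masked by dashes of the same length."""
    words = sentence.split()
    frames = []
    for i in range(len(words)):
        frame = " ".join(
            w if j == i else "-" * len(w)
            for j, w in enumerate(words)
        )
        frames.append(frame)
    return frames

frames = moving_window_frames("The cat slept")
# press 1: "The --- -----"
# press 3: "--- --- slept"
```

Because mask length matches word length, readers retain word-boundary information, which is the main reason this method is preferred over centered (RSVP-style) presentation.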
Step 2: Configure Timing Parameters
| Parameter | Recommended Value | Rationale |
|---|---|---|
| Response timeout | None (self-paced) or 3000-5000 ms per region | No timeout is standard for in-lab SPR; timeout prevents excessively slow responses in web-based studies (Boyce et al., 2020) |
| Inter-stimulus interval (ISI) | 0 ms for non-cumulative moving window | Standard practice; the next word appears immediately when the previous is masked (Just et al., 1982) |
| ISI for phrase-by-phrase | 0 ms (typical) | Any nonzero ISI introduces a blank that disrupts reading and may introduce strategic pausing |
| Pre-sentence fixation | + or * for 500-1000 ms | Orients attention to display location; standard in SPR (Jegerski, 2014) |
| Post-sentence delay | 0-500 ms before comprehension question | Brief delay prevents motor interference between last word button-press and question response |
| Practice trials | 6-10 items minimum | Familiarizes participants with button-press rhythm and comprehension questions; use different sentences than experimental items (Jegerski, 2014; Keating & Jegerski, 2015) |
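The timing table above can be collected into a single configuration object. This is a hypothetical sketch; the field names are illustrative and not tied to any specific experiment package, and the concrete values are one defensible choice within the recommended ranges.

```python
# Illustrative SPR timing configuration (field names are hypothetical).
# Values follow the recommendations in the table above.
SPR_TIMING = {
    "response_timeout_ms": None,    # self-paced in the lab; 3000-5000 for web studies
    "isi_ms": 0,                    # next word appears immediately on button press
    "fixation_char": "+",           # pre-sentence fixation marker ("+" or "*")
    "fixation_duration_ms": 500,    # within the 500-1000 ms range
    "post_sentence_delay_ms": 250,  # within the 0-500 ms range before the question
    "n_practice_trials": 8,         # within the 6-10 item minimum
}
```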
Step 3: Design Critical Regions
This is the most consequential design decision in an SPR experiment. See references/region-segmentation.md for full guidelines.
Core Principles
- Match critical regions across conditions for word length (in characters) and lexical frequency. If your manipulation requires different words, match them on length (+/- 1 character) and log frequency (use SUBTLEX-US; Brysbaert & New, 2009). Unmatched items introduce confounds that mimic or mask experimental effects.
- Include at least 2-3 spillover words after the critical region. Processing difficulty at the critical region reliably spills over to subsequent words in SPR (Just et al., 1982; Mitchell, 2004; Rayner, 1998). Without spillover regions, you will miss your effect. These spillover words must be identical across conditions.
- Avoid placing critical regions at clause or sentence boundaries. Reading times at clause-final and sentence-final positions are inflated by wrap-up processes -- integration of clause-level meaning, discourse updating, and possibly implicit prosodic boundary effects (Just & Carpenter, 1980; Warren, White, & Reichle, 2009). This inflation is independent of your manipulation and adds noise.
- Keep critical regions short (ideally a single word). Multi-word critical regions reduce temporal resolution and introduce length confounds. If you must use a multi-word region, it must have the same number of words and matched total character length across conditions.
- Ensure the pre-critical region is identical across conditions. Any difference before the critical word can create baseline differences in reading time that propagate into the critical region via spillover.
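The length and frequency matching principle lends itself to an automated check. The sketch below flags mismatched critical-word pairs; the length criterion (+/- 1 character) comes from the guideline above, while the 0.3 log-frequency threshold (roughly a 2x frequency difference) is an illustrative assumption, not a published cutoff. Frequencies are assumed to be per-million counts, e.g. from SUBTLEX-US.

```python
import math

def check_critical_match(word_a: str, word_b: str,
                         freq_a: float, freq_b: float) -> list[str]:
    """Flag length and frequency mismatches between the critical words
    of two conditions. freq_a/freq_b are per-million frequency counts
    (e.g., SUBTLEX-US). The 0.3 log10 threshold (~2x frequency gap)
    is an illustrative assumption, not a standard criterion."""
    problems = []
    if abs(len(word_a) - len(word_b)) > 1:  # guideline: +/- 1 character
        problems.append("length mismatch")
    if abs(math.log10(freq_a) - math.log10(freq_b)) > 0.3:
        problems.append("log-frequency mismatch")
    return problems

# A well-matched pair passes; a badly mismatched pair is flagged twice.
check_critical_match("sofa", "couch", 20.0, 18.0)      # -> []
check_critical_match("cab", "limousine", 50.0, 0.5)    # -> both flags
```

Run such a check over every item pair before finalizing materials, since a single unmatched item can distort by-item analyses.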
Step 4: Design Comprehension Questions
Comprehension questions serve two purposes: ensuring participants read for meaning, and providing an exclusion criterion for inattentive participants.
Guidelines
| Parameter | Recommendation | Rationale |
|---|---|---|
| Proportion of trials with questions | 1/3 to 1/2 of all trials (experimental + filler) | Fewer than 1/3: participants may stop reading carefully; more than 1/2: task becomes tedious, and participants may shift to a question-anticipation strategy (Just et al., 1982; Jegerski, 2014) |
| Answer balance | 50% yes / 50% no for yes/no questions | Prevents response bias toward one answer |
| Question content | Target semantic content of the sentence, NOT the critical manipulation | Questions about the manipulation teach participants what you are studying, inducing strategic reading (Jegerski, 2014) |
| Accuracy exclusion threshold | >80% correct to retain participant | Standard criterion; lower accuracy suggests the participant was not reading for comprehension (Jegerski, 2014; common practice across SPR studies) |
| Question timing | Immediately after the sentence (or after the final button press) | Delayed questions test memory, not comprehension |
Example of Good vs. Bad Comprehension Questions
Suppose the experimental sentence manipulates relative clause attachment:
The maid of the actress who was on the balcony shouted to the crowd.
- Good question: "Did someone shout to the crowd?" (targets overall meaning, not the critical attachment)
- Bad question: "Who was on the balcony?" (directly probes the ambiguity under investigation, alerting participants to the manipulation)
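Two of the table's guidelines, answer balance and avoiding manipulation-targeting questions, can be checked mechanically over a question set. This is a hedged sketch: the dict fields (`answer`, `targets_manipulation`) are hypothetical names, and the 40-60% tolerance band around the 50/50 target is an illustrative assumption.

```python
def check_question_set(questions: list[dict]) -> list[str]:
    """Sanity-check comprehension questions against the guidelines
    above. Each question is a dict with 'answer' ('yes'/'no') and
    'targets_manipulation' (bool); field names are illustrative."""
    warnings = []
    yes_rate = sum(q["answer"] == "yes" for q in questions) / len(questions)
    if not 0.4 <= yes_rate <= 0.6:  # roughly 50/50 yes/no balance
        warnings.append("yes/no answers are unbalanced")
    if any(q["targets_manipulation"] for q in questions):
        warnings.append("some questions probe the critical manipulation")
    return warnings
```

The second check is only as good as the hand-coded `targets_manipulation` flag, so coding it for each question is itself a useful design review step.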
Step 5: Design Item and Condition Structure
Latin Square Design
For within-subjects manipulations, use a Latin square design so that each participant sees each item in exactly one condition, and each condition is seen equally often across participants (Keating & Jegerski, 2015).
- For a 2-condition design: 2 lists; each item appears in condition A for half the participants, condition B for the other half
- For a 2x2 design: 4 lists (one per condition combination)
- Assign participants to lists in rotation
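The Latin square rotation described above can be sketched in a few lines: each list assigns every item to exactly one condition, and conditions rotate across lists so that each condition is seen equally often. The function name is illustrative.

```python
def latin_square_lists(n_items: int, n_conditions: int) -> list[list[int]]:
    """Build Latin-square lists: lists[k][i] is the condition (0-based)
    in which list k presents item i. Each list shows every item in
    exactly one condition; conditions rotate across lists."""
    return [
        [(i + k) % n_conditions for i in range(n_items)]
        for k in range(n_conditions)
    ]

lists = latin_square_lists(n_items=24, n_conditions=2)
# list 0: item 0 -> condition 0, item 1 -> condition 1, ...
# list 1: item 0 -> condition 1, item 1 -> condition 0, ...
```

Participants are then assigned to lists in rotation (participant 1 to list 0, participant 2 to list 1, and so on), so total participants should be a multiple of the number of lists.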
Items Per Condition
| Population | Minimum Items per Condition | Rationale |
|---|---|---|
| L1 speakers, robust effect (e.g., garden-path) | 24 items per condition | Sufficient for medium-to-large effects in mixed models (Keating & Jegerski, 2015) |
| L1 speakers, subtle effect (e.g., pragmatic inference) | 32-40 items per condition | Smaller effects require more items for adequate power (Keating & Jegerski, 2015; Brysbaert & Stevens, 2018) |
| L2 speakers | 32-40 items per condition | Higher variability in L2 populations requires more observations (Marsden, Thompson, & Plonsky, 2018) |
Filler Items
| Parameter | Recommendation | Rationale |
|---|---|---|
| Filler-to-experimental ratio | 2:1 or 3:1 (fillers : experimental items) | Prevents participants from identifying the experimental pattern; higher ratios reduce strategic processing (Keating & Jegerski, 2015) |
| Filler variety | Include multiple sentence types, lengths, and structures | Monotonous fillers fail to mask the experimental manipulation |
| Filler complexity | Include some fillers of similar complexity to experimental items | If only experimental items are complex, participants learn to attend differently to them |
| Comprehension questions on fillers | Yes -- at least the same rate as on experimental items | If questions only follow experimental items, participants learn that complex sentences predict questions |
Step 6: Decide Between SPR and Eye-Tracking
This is a design-level decision that should be made before programming the experiment.
| Criterion | SPR | Eye-Tracking |
|---|---|---|
| Equipment cost | Low (any computer) | High (dedicated eye-tracker, ~$20,000-$50,000) |
| Online data collection | Yes (web-based SPR and Maze work well) | No (requires in-lab calibration) |
| Temporal resolution | Word-by-word, with substantial spillover | Multiple fixation measures (first fixation, gaze duration, go-past, total time, regressions) |
| Regressions | Not measurable (non-cumulative display prevents rereading) | Yes -- regressions are a primary measure of reanalysis |
| Ecological validity | Moderate (button-press is unnatural, but spatial layout preserved) | Higher (closer to natural reading) |
| Sensitivity to early/late processing stages | Low (only a single RT per region, which blends all processing stages) | High (first-pass vs. second-pass measures separate early from late processing; Rayner, 1998) |
| Best for | Robust syntactic/semantic effects, web-based or underfunded studies, L2 populations without lab access | Nuanced temporal dynamics, distinguishing processing stages, studying regressions, garden-path recovery |
Rule of thumb: If you only need to know whether a manipulation affects reading time, SPR is sufficient. If you need to know when during processing the effect occurs (early lexical access vs. late reanalysis), use eye-tracking.
Common Pitfalls
These are errors that non-specialists routinely make:
- No spillover region. The most common fatal flaw. If the sentence ends at or immediately after the critical word, spillover effects have nowhere to appear, and the effect is lost. Always include 2-3 words of identical post-critical material across conditions.
- Critical region at a clause boundary. Wrap-up effects at clause-final positions (Just & Carpenter, 1980) inflate reading times by 50-100+ ms regardless of condition, swamping the experimental effect or producing spurious interactions.
- Length/frequency mismatch. Longer words take approximately 30-40 ms per additional character in SPR (Ferreira & Clifton, 1986). A 2-character difference between conditions creates a ~60-80 ms confound, which can easily exceed the size of most psycholinguistic effects.
- Comprehension questions targeting the manipulation. This transforms the experiment from measuring natural reading into measuring strategic disambiguation. Participants adapt within 10-15 trials (Jegerski, 2014).
- Too few items per condition. With fewer than 24 items per condition, even large effects (d = 0.8) may not reach significance in mixed-effects models, particularly with by-item random slopes (Brysbaert & Stevens, 2018).
- No fillers or insufficient fillers. Without a 2:1 filler-to-item ratio, participants identify the experimental manipulation and shift to strategic reading (Keating & Jegerski, 2015).
- Analyzing only the critical region. Even when an effect appears on the critical word, it typically continues into the spillover region. Analyzing only one region provides an incomplete picture and may miss effects that appear exclusively in spillover.
- Using raw reading times without controlling for word length. Raw RTs conflate lexical processing speed with the experimental manipulation. Either match word length precisely or use residual RTs / include word length as a covariate in the statistical model (Ferreira & Clifton, 1986).
- Ignoring trial position effects. Reading speed increases across the experiment as participants become practiced. Include trial order as a covariate or present items in a randomized order (Jegerski, 2014).
- Not checking comprehension accuracy before analyzing RTs. Participants with low accuracy (<80%) may not be reading for comprehension. Their RT data are uninterpretable and should be excluded (Jegerski, 2014).
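The residual-RT correction mentioned in the word-length pitfall can be sketched with a simple least-squares regression of raw reading times on word length (per participant, in the classic Ferreira & Clifton, 1986 approach); the residuals are the length-corrected reading times. This is a minimal pure-Python illustration; in practice a mixed-effects model with length as a covariate is the modern alternative.

```python
def residual_rts(rts: list[float], lengths: list[int]) -> list[float]:
    """Residualize raw reading times against word length using a
    simple least-squares line. Residuals are RTs with the linear
    effect of word length removed; typically computed per participant."""
    n = len(rts)
    mean_x = sum(lengths) / n
    mean_y = sum(rts) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, rts))
    slope = sxy / sxx                     # ms of RT per extra character
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(lengths, rts)]
```

If reading times were a pure function of word length, every residual would be zero; what remains after the correction is the variance available to your experimental manipulation.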
Quick Reference: SPR Design Checklist
Before running your experiment, verify:
- Critical regions are matched for word length and frequency across conditions
- At least 2-3 identical spillover words follow the critical region in all conditions
- Critical region is not at a clause or sentence boundary
- Pre-critical region is identical across conditions
- At least 24 items per condition (32+ for subtle effects or L2 populations)
- Filler-to-experimental ratio is at least 2:1
- Comprehension questions on 1/3 to 1/2 of trials, balanced yes/no
- No comprehension question directly targets the experimental manipulation
- Latin square counterbalancing across lists
- 6-10 practice trials with different sentences than experimental items
- Analysis plan includes spillover regions (at least critical +1, +2)
- RT trimming criteria defined a priori (see references/analysis-guide.md)
References
- Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390-412.
- Boyce, V., Futrell, R., & Levy, R. P. (2020). Maze Made Easy: Better and easier measurement of incremental processing difficulty. Journal of Memory and Language, 111, 104082.
- Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977-990.
- Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models. Journal of Cognition, 1(1), 9.
- Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348-368.
- Forster, K. I., Guerrera, C., & Elliot, L. (2009). The maze task: Measuring forced incremental sentence processing time. Behavior Research Methods, 41, 163-171.
- Jegerski, J. (2014). Self-paced reading. In J. Jegerski & B. VanPatten (Eds.), Research methods in second language psycholinguistics. Routledge.
- Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.
- Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111, 228-238.
- Keating, G. D., & Jegerski, J. (2015). Experimental designs in sentence processing research. Studies in Second Language Acquisition, 37, 1-32.
- Marsden, E., Thompson, S., & Plonsky, L. (2018). A methodological synthesis of self-paced reading in second language research. Applied Psycholinguistics, 39, 861-904.
- Mitchell, D. C. (2004). On-line methods in language processing: Introduction and historical review. In M. Carreiras & C. Clifton (Eds.), The on-line study of sentence comprehension. Psychology Press.
- Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.
- Warren, T., White, S. J., & Reichle, E. D. (2009). Investigating the causes of wrap-up effects: Evidence from eye movements and E-Z Reader. Cognition, 111, 132-137.
- Witzel, N., Witzel, J., & Forster, K. (2012). Comparisons of online reading paradigms: Eye tracking, moving-window, and maze. Journal of Psycholinguistic Research, 41, 105-128.