Neuroimaging Sample Size Calculator

Simulation-based sample-size planning for neuroimaging studies using effect-size maps

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "Neuroimaging Sample Size Calculator" with this command: npx skills add haoxuanlithuai/awesome_cognitive_and_neuroscience_skills/haoxuanlithuai-awesome-cognitive-and-neuroscience-skills-neuroimaging-sample-size-calculator


Purpose

Traditional power analysis (e.g., using G*Power for a t-test) fails for neuroimaging because it cannot account for the massive multiple comparisons problem, spatial correlation structure, or the multi-level nature of neuroimaging inference. Neuroimaging requires simulation-based approaches that generate synthetic datasets, apply the full analysis pipeline including multiple comparison correction, and estimate power as the proportion of simulations detecting the effect.

A competent programmer without neuroimaging training would use standard power formulas and dramatically overestimate the power of a whole-brain analysis. They would not know that cluster-extent thresholds, random field theory corrections, and spatial smoothness all affect the effective number of tests, nor that pilot-data-based simulation is the gold standard for neuroimaging power analysis. This skill encodes the domain-specific methodology for simulation-based sample size planning.

When to Use This Skill

  • Planning sample size for a new fMRI, EEG, or MEG study
  • Conducting power analysis for a grant application or registered report
  • Estimating required N when pilot data or published effect size maps are available
  • Choosing between whole-brain and ROI-based analysis based on power constraints
  • Evaluating the statistical adequacy of a proposed or completed study

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

  1. State the research question — What specific question is this analysis/paradigm addressing?
  2. Justify the method choice — Why is this approach appropriate? What alternatives were considered?
  3. Declare expected outcomes — What results would support vs. refute the hypothesis?
  4. Note assumptions and limitations — What does this method assume? Where could it mislead?
  5. Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

Why Traditional Power Analysis Fails for Neuroimaging

The Fundamental Problem

Standard power analysis computes the sample size for a single statistical test at a given effect size, alpha, and power. Neuroimaging violates every assumption of this framework:

| Standard assumption | Neuroimaging reality | Consequence |
| --- | --- | --- |
| Single test | ~100,000 voxels tested | Alpha must be corrected, dramatically reducing per-test sensitivity |
| Independent tests | Voxels are spatially correlated (due to smoothing and neural organization) | Effective number of tests is much less than 100,000, but hard to compute analytically |
| Known effect size | Effect size varies across voxels and depends on ROI definition | No single "effect size" characterizes a study |
| Simple test statistic | Cluster-based, TFCE, and permutation tests have complex null distributions | Power depends on the specific inference method used |
| One-level inference | Subject-level estimation plus group-level test | Within-subject and between-subject variance both affect power |

Source: Mumford & Nichols, 2008; Poldrack et al., 2017.

The Pilot-Data-Based Simulation Approach

The gold standard for neuroimaging power analysis uses pilot data to simulate full datasets at varying sample sizes (Mumford & Nichols, 2008).

Step-by-Step Procedure

  1. Obtain pilot data or published effect-size maps
  2. Estimate expected effect sizes at regions of interest
  3. Simulate datasets with varying N
  4. Apply the full analysis pipeline (including multiple comparison correction)
  5. Compute power as the proportion of simulations detecting the effect
  6. Find the N that achieves target power (typically 80% or 90%)

Step 1: Obtain Pilot Data

| Source | Quality | Requirements | Caveats |
| --- | --- | --- | --- |
| Own pilot study | Best | At least 10-15 subjects for stable variance estimates | Effect sizes from small pilots are inflated; use conservative estimates |
| Published group map | Good | Unthresholded statistical map (t-map or z-map) | May not match your exact paradigm or population |
| NeuroVault repository | Good | Search for comparable paradigms | Maps may use different preprocessing/analysis pipelines |
| Meta-analytic map (NeuroSynth, NiMARE) | Moderate | Coordinate-based or image-based meta-analysis | Provides average effect across studies; may underestimate for specific paradigms |

Source: Mumford & Nichols, 2008; Poldrack et al., 2017.

Critical warning: Effect sizes from small pilot studies (N < 20) are inflated due to the winner's curse. Assume the true effect is 50-75% of the pilot estimate (Button et al., 2013).

Step 2: Estimate Effect Sizes

For ROI-based analysis:

  1. Define the ROI a priori (from atlas, meta-analysis, or independent data)
  2. Extract the mean effect size (Cohen's d or percent signal change) from the pilot data within the ROI
  3. Apply the deflation correction (multiply by 0.5-0.75) for conservative estimation

For whole-brain analysis:

  1. Use the full unthresholded statistical map as the effect-size map
  2. The map captures spatial variation in effect size across the brain
  3. Power will vary by region; focus on the primary region of interest for sample size determination

Step 3: Simulate Datasets

For each candidate sample size N:

  1. Generate 1,000-5,000 simulated group maps by:
  a. Sampling N subjects from a population with the estimated effect size and variance
  b. Adding realistic noise (estimated from pilot residuals, or assumed Gaussian with spatial smoothness matching the pilot data)
  c. Creating a group-level statistical map

  2. Apply the smoothness estimate from the pilot data (or the planned smoothing kernel) to each simulated map
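The simulate-and-smooth steps above can be sketched with NumPy and SciPy. The grid size, kernel width, and effect magnitude below are illustrative assumptions, not values from the cited references; a real simulation would use the pilot data's estimated smoothness and residual variance.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_group_map(effect_map, sd_map, n_subjects, fwhm_vox=2.0, seed=0):
    """Simulate one group-level t-map: draw n_subjects subject maps around a
    pilot effect map, smooth each with a Gaussian kernel, then compute a
    one-sample t statistic at every voxel."""
    rng = np.random.default_rng(seed)
    sigma = fwhm_vox / 2.355  # convert FWHM (in voxels) to Gaussian sigma
    subjects = np.stack([
        gaussian_filter(effect_map + sd_map * rng.standard_normal(effect_map.shape), sigma)
        for _ in range(n_subjects)
    ])
    mean = subjects.mean(axis=0)
    sem = subjects.std(axis=0, ddof=1) / np.sqrt(n_subjects)
    return mean / sem

# Hypothetical 16x16x16 grid with a cubic "active" region of d = 0.5
effect = np.zeros((16, 16, 16))
effect[6:10, 6:10, 6:10] = 0.5
tmap = simulate_group_map(effect, np.ones_like(effect), n_subjects=20)
```

Repeating this over many seeds at each candidate N, and feeding each `tmap` through the planned correction method, yields the power estimates described in Steps 4-5.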

Step 4: Apply Full Analysis Pipeline

For each simulated dataset:

  1. Compute the group-level statistical map (e.g., one-sample t-test)
  2. Apply the planned multiple comparison correction method:
  • Cluster-based inference: apply cluster-defining threshold (CDT) of p < 0.001 (Eklund et al., 2016) and identify significant clusters
  • Voxelwise FWE: apply random field theory correction at p < 0.05 FWE
  • TFCE: compute TFCE image and apply permutation-based correction
  • FDR: apply Benjamini-Hochberg at q < 0.05
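Of these corrections, FDR is simple enough to sketch directly; the Benjamini-Hochberg implementation below is a generic illustration (cluster-based and TFCE inference depend on spatial structure and permutation machinery not shown here).

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg FDR: return a boolean mask of tests surviving q.
    Rejects the k smallest p-values, where k is the largest rank i such
    that p_(i) <= q * i / m."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    sig = np.zeros(m, dtype=bool)
    sig[order[:k]] = True
    return sig
```

Applied to each simulated statistical map, the mask's nonzero voxels are the "detections" counted in Step 5.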

Step 5: Compute Power

  • Voxel-level power: For each voxel, power = proportion of simulations in which that voxel is significant
  • ROI-level power: Power = proportion of simulations in which at least one voxel in the target ROI is significant
  • Cluster-level power: Power = proportion of simulations in which a significant cluster overlaps with the target region

Report the power metric most relevant to your planned analysis (Mumford & Nichols, 2008).
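As a toy illustration of these metrics, the sketch below simulates a 1-D "brain" of independent voxels with a true effect confined to a small ROI, applies Bonferroni FWE correction, and reports voxel-level and ROI-level power. All parameters are illustrative assumptions; real voxels are spatially correlated, so Bonferroni is conservative here.

```python
import numpy as np
from scipy import stats

def power_metrics(d, n_subjects, n_voxels=500, roi=slice(0, 10),
                  n_sims=500, alpha_fwe=0.05, seed=0):
    """Toy whole-'brain' power simulation on a 1-D strip of independent
    voxels. The true effect d is confined to `roi`; Bonferroni correction
    controls FWE across all voxels. Returns (voxel-level power at the first
    ROI voxel, ROI-level power = any ROI voxel significant)."""
    rng = np.random.default_rng(seed)
    t_crit = stats.t.ppf(1 - alpha_fwe / n_voxels, df=n_subjects - 1)  # one-sided
    effect = np.zeros(n_voxels)
    effect[roi] = d
    hit_voxel = hit_roi = 0
    for _ in range(n_sims):
        data = effect + rng.standard_normal((n_subjects, n_voxels))
        t = data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n_subjects))
        sig = t > t_crit
        hit_voxel += bool(sig[roi][0])
        hit_roi += bool(sig[roi].any())
    return hit_voxel / n_sims, hit_roi / n_sims

for n in (10, 40):
    v, r = power_metrics(d=0.6, n_subjects=n)
    print(f"N={n}: voxel-level power={v:.2f}, ROI-level power={r:.2f}")
```

Note how ROI-level power always equals or exceeds voxel-level power: "any voxel in the ROI" is a weaker detection criterion than "this particular voxel".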

Tools and Implementations

fMRIpower (Mumford & Nichols, 2008)

| Feature | Description |
| --- | --- |
| Input | Pilot group-level statistical maps (from FSL) |
| Method | Resamples from pilot data to estimate power at varying N |
| Output | Power curves for specified ROIs at different sample sizes |
| Requirements | FSL, R; pilot data from at least 10-15 subjects |
| Strengths | Uses actual pilot data; accounts for design-specific temporal autocorrelation |
| Limitations | Assumes pilot effect sizes are representative; FSL-specific |

NeuroPowerTools (Durnez et al., 2016)

| Feature | Description |
| --- | --- |
| Input | Unthresholded statistical map (any software) |
| Method | Fits a mixture model to the peak distribution; estimates prevalence and effect size |
| Output | Power estimates at varying N; optimal sample size for target power |
| Access | Web-based: https://neuropowertools.org |
| Strengths | Does not require individual subject data; works with published maps |
| Limitations | Peak-based approximation; may underestimate power for distributed effects |

PowerMap (Joyce & Hayasaka, 2012)

| Feature | Description |
| --- | --- |
| Input | Assumed effect-size map, noise model, smoothness |
| Method | Full simulation with parametric statistical testing |
| Output | Voxelwise power maps at specified N |
| Requirements | MATLAB |
| Strengths | Voxel-level power visualization; flexible correction methods |
| Limitations | Computationally intensive; requires specification of a noise model |

AFNI 3dClustSim

| Feature | Description |
| --- | --- |
| Input | Smoothness estimates (from 3dFWHMx), voxel dimensions, mask |
| Method | Monte Carlo simulation of random fields |
| Output | Cluster-size thresholds for a given alpha level |
| Use for power | Estimate minimum detectable cluster size at a given sample size; not a full power tool |
| Strengths | Fast; accounts for non-Gaussian smoothness (ACF model; Cox et al., 2017) |
| Limitations | Does not compute power directly; only provides cluster-extent thresholds |

ROI-Based Power Shortcuts

When full simulation is impractical, ROI-based power analysis provides a reasonable alternative:

Procedure

  1. Define the target ROI a priori (from atlas, meta-analysis, or independent data)
  2. Extract the expected effect size (Cohen's d) from pilot data or literature:
  • Mean activation within the ROI / standard deviation of activation across subjects
  3. Use standard power formulas (G*Power or similar) with the ROI-level effect size
  4. No multiple comparison correction is needed for a single a priori ROI
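This analytical shortcut can be reproduced without G*Power using SciPy's noncentral t distribution. The sketch below covers only the one-sample (within-subject, single-ROI) case, with the conventional two-sided alpha and 80% power target as assumed defaults.

```python
import numpy as np
from scipy import stats

def one_sample_power(d, n, alpha=0.05):
    """Power of a two-sided one-sample t-test at effect size d and sample
    size n, via the noncentral t with noncentrality d * sqrt(n)."""
    df = n - 1
    ncp = d * np.sqrt(n)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

def required_n(d, target=0.80, alpha=0.05):
    """Smallest n whose power reaches the target, by linear search."""
    n = 3
    while one_sample_power(d, n, alpha) < target:
        n += 1
    return n

print(required_n(0.5))  # a medium ROI-level effect
```

For a between-group design, the same logic applies with the two-sample degrees of freedom and noncentrality d * sqrt(n/2) per group.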

Effect Size Extraction from Published Results

| Published statistic | Conversion to Cohen's d | Source |
| --- | --- | --- |
| t-value (within-subject) | d = t / sqrt(N) | Standard formula |
| t-value (between-group) | d = 2t / sqrt(df) | Standard formula (equal group sizes) |
| z-value | d = z / sqrt(N) | Approximation; reasonable for large N |
| Percent signal change + SD | d = mean_PSC / SD_PSC | Direct computation |
| Partial eta-squared | f = sqrt(eta^2 / (1 - eta^2)); d = 2f for two equal groups | Conversion via Cohen's f |
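Helper functions for the conversions above (a sketch; note that the partial eta-squared row goes through Cohen's f, with d = 2f holding only for two equal groups, and the deflation factor follows the 0.5-0.75 range recommended earlier):

```python
import math

def d_from_t_within(t, n):
    """Within-subject (one-sample or paired) t to Cohen's d."""
    return t / math.sqrt(n)

def d_from_t_between(t, df):
    """Between-group t to Cohen's d, assuming equal group sizes."""
    return 2 * t / math.sqrt(df)

def d_from_eta2(eta2):
    """Partial eta-squared -> Cohen's f -> d (d = 2f for two equal groups)."""
    f = math.sqrt(eta2 / (1 - eta2))
    return 2 * f

def deflate(d, factor=0.6):
    """Winner's-curse deflation for small-pilot estimates (0.5-0.75 range)."""
    return d * factor
```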

Meta-Analytic Effect Sizes

Use coordinate-based meta-analysis tools to estimate effect sizes at specific brain locations:

| Tool | Method | Output | Source |
| --- | --- | --- | --- |
| NiMARE | ALE, MKDA, or other CBMA | Meta-analytic map; extract effect at ROI | Salo et al., 2023 |
| NeuroSynth | Automated term-based meta-analysis | Association maps; extract effect at coordinates | Yarkoni et al., 2011 |
| BrainMap | ALE meta-analysis | Coordinate-based likelihood maps | Laird et al., 2005 |

Caveat: Meta-analytic effect sizes aggregate across many studies with different designs, populations, and analysis pipelines. They provide a reasonable lower bound but may not match your specific paradigm (Yarkoni et al., 2011).

Current Sample Size Recommendations

Landmark Findings

| Finding | Recommendation | Source |
| --- | --- | --- |
| Brain-behavior associations require massive samples for replicability | N > 2,000 for whole-brain brain-behavior correlations | Marek et al., 2022 |
| N = 20 gives ~50% power for medium fMRI effects | N = 40+ for 80% power with medium effects | Poldrack et al., 2017 |
| 80% power at uncorrected p < 0.001 requires N ~ 40 for d = 0.8 | N = 40 per group for large between-group effects | Turner et al., 2018 |
| Cluster-based inference with CDT p < 0.01 produces inflated false positives | Use CDT p < 0.001 and increase N to compensate for reduced sensitivity | Eklund et al., 2016 |
| Within-subject designs are much more powerful than between-subject designs | Prefer within-subject designs when scientifically appropriate | Mumford & Nichols, 2008 |

Minimum Sample Size Table

| Analysis type | Minimum N (80% power) | Effect size assumed | Correction method | Source |
| --- | --- | --- | --- | --- |
| Within-subject activation (whole-brain) | 25-30 | d = 0.8 (large) | Cluster-based, CDT p < 0.001 | Desmond & Glover, 2002 |
| Between-group (whole-brain, large effect) | 20-25 per group | d = 0.8 | Cluster-based, CDT p < 0.001 | Thirion et al., 2007 |
| Between-group (whole-brain, medium effect) | 40-50 per group | d = 0.5 | Cluster-based, CDT p < 0.001 | Poldrack et al., 2017 |
| ROI-based (single a priori ROI) | 15-25 | d = 0.5-0.8 | Uncorrected (single test) | Desmond & Glover, 2002 |
| Resting-state connectivity (group mean) | 25-40 | r = 0.3-0.5 | FDR or NBS | Smith et al., 2011 |
| Brain-behavior correlation (whole-brain) | 2,000+ | r < 0.1 (replicable) | Permutation | Marek et al., 2022 |
| Brain-behavior correlation (single ROI) | 80-200 | r = 0.2-0.3 | Uncorrected | Standard formula |

Registered Report Considerations

Registered reports require pre-specification of sample size with a formal power analysis. For neuroimaging registered reports:

  1. Specify the primary analysis (whole-brain vs. ROI) and the corresponding power analysis method
  2. Use simulation-based power when possible; if not, use ROI-based power with conservative effect size estimates
  3. Pre-specify the multiple comparison correction method and document its impact on required N
  4. Include sensitivity analysis: What is the minimum detectable effect size at the planned N?
  5. State stopping rules: Pre-register the exact N and analysis plan; sequential analysis requires adjustment (Lakens, 2014)
  6. Account for attrition: Specify expected exclusion rate (typically 10-20% for fMRI) and over-recruit
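The sensitivity analysis in point 4 can be sketched by inverting the power function for the simple single-ROI (one-sample t) case; the bisection bounds below are assumptions chosen to bracket plausible fMRI effect sizes, and whole-brain analyses would need the full simulation approach instead.

```python
import numpy as np
from scipy import stats

def min_detectable_d(n, target_power=0.80, alpha=0.05):
    """Smallest one-sample-t effect size detectable with the target power at
    a fixed n (a sensitivity analysis), found by bisection on d."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)

    def power(d):
        ncp = d * np.sqrt(n)
        return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

    lo, hi = 0.0, 5.0  # assumed bracket for plausible standardized effects
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if power(mid) < target_power else (lo, mid)
    return hi

print(f"Minimum detectable d at N=30: {min_detectable_d(30):.2f}")
```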

Domain insight: Reviewers will be suspicious of power analyses based on large effect sizes from small pilot studies. Use conservative (deflated) effect size estimates and show power curves across a range of plausible effect sizes.

Practical Workflow for Grant Applications

When Pilot Data Are Available

  1. Run fMRIpower or NeuroPowerTools with pilot maps
  2. Generate power curves showing power vs. N for the primary contrast and ROI
  3. Select N that achieves 80-90% power for the primary analysis
  4. Add 15-20% for expected participant exclusions
  5. Report: pilot study details, effect size estimates, power tool used, correction method, target power, final N

When No Pilot Data Are Available

  1. Search NeuroVault for comparable paradigms; download unthresholded maps
  2. Use NeuroPowerTools with the published map
  3. Alternatively, estimate ROI-level effect sizes from published papers:
  • Extract t-values and convert to Cohen's d
  • Apply deflation (multiply by 0.5-0.75; Button et al., 2013)
  • Use G*Power for ROI-based power
  4. As a last resort, use the benchmark table above with the analysis type closest to your planned study
  5. Document all assumptions and state that the power analysis is based on estimated (not measured) effect sizes

Common Pitfalls

  1. Using G*Power for whole-brain analyses: Standard power tools compute power for a single test and do not account for multiple comparison correction. This overestimates power by an order of magnitude (Mumford & Nichols, 2008)
  2. Trusting pilot study effect sizes: Small pilot studies (N < 20) produce inflated effect sizes. Always deflate by 25-50% (Button et al., 2013)
  3. Ignoring the correction method: Power depends critically on whether you use voxelwise FWE, cluster-based, FDR, or permutation-based correction. Power at FDR q < 0.05 can be 2-3x higher than voxelwise FWE p < 0.05 for the same N
  4. Conflating within-subject and between-subject power: Within-subject designs (one-sample t-test on contrast maps) are much more powerful than between-subject designs (two-sample t-test) because they eliminate between-subject variance (Mumford & Nichols, 2008)
  5. Not accounting for attrition: In fMRI, 10-20% of data may be unusable due to motion, scanner artifacts, or task non-compliance. Over-recruit accordingly
  6. Treating all regions equally: Power varies across the brain because effect sizes and noise vary spatially. Power at your primary ROI may be adequate even if whole-brain power is low
  7. Assuming published N is adequate: Most published fMRI studies are underpowered (Button et al., 2013). Matching a published study's N does not guarantee adequate power
  8. Not reporting sensitivity analysis: Always report the minimum detectable effect size at your planned N, in addition to the power estimate for the expected effect

Minimum Reporting Checklist

  • Source of effect size estimate (pilot data, published study, meta-analysis)
  • Effect size metric (Cohen's d, r, percent signal change) and value used
  • Whether effect size deflation was applied and the correction factor
  • Power analysis method (simulation-based, ROI-based analytical, benchmark-based)
  • Power analysis tool and version (fMRIpower, NeuroPowerTools, G*Power, custom simulation)
  • Number of simulations (for simulation-based approaches)
  • Multiple comparison correction method assumed in power analysis
  • Statistical threshold used (e.g., CDT p < 0.001, cluster p < 0.05 FWE)
  • Target power level (80% or 90%)
  • Planned total N and N per group (if applicable)
  • Expected attrition rate and over-recruitment plan
  • Sensitivity analysis (minimum detectable effect at planned N)

References

  • Button, K. S., Ioannidis, J. P. A., Mokrysz, C., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
  • Cox, R. W., Chen, G., Glen, D. R., Reynolds, R. C., & Taylor, P. A. (2017). FMRI clustering in AFNI: False-positive rates redux. Brain Connectivity, 7(3), 152-171.
  • Desmond, J. E., & Glover, G. H. (2002). Estimating sample size in functional MRI (fMRI) neuroimaging studies. Journal of Neuroscience Methods, 118(2), 115-128.
  • Durnez, J., Degryse, J., Moerkerke, B., et al. (2016). Power and sample size calculations for fMRI studies based on the prevalence of active peaks. bioRxiv, 049429.
  • Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS, 113(28), 7900-7905.
  • Joyce, K. E., & Hayasaka, S. (2012). Development of PowerMap: A software package for statistical power calculation in neuroimaging studies. Neuroinformatics, 10(4), 351-365.
  • Laird, A. R., Fox, P. M., Price, C. J., et al. (2005). ALE meta-analysis: Controlling the false discovery rate and performing statistical contrasts. Human Brain Mapping, 25(1), 155-164.
  • Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701-710.
  • Marek, S., Tervo-Clemmens, B., Calabro, F. J., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902), 654-660.
  • Mumford, J. A., & Nichols, T. E. (2008). Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. NeuroImage, 39(1), 261-268.
  • Poldrack, R. A., Baker, C. I., Durnez, J., et al. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115-126.
  • Salo, T., Yarkoni, T., Nichols, T. E., et al. (2023). NiMARE: Neuroimaging Meta-Analysis Research Environment. NeuroImage, 268, 119862.
  • Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., et al. (2011). Network modelling methods for FMRI. NeuroImage, 54(2), 875-891.
  • Thirion, B., Pinel, P., Meriaux, S., et al. (2007). Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses. NeuroImage, 35(1), 105-120.
  • Turner, B. O., Paul, E. J., Miller, M. B., & Barbey, A. K. (2018). Small sample sizes reduce the replicability of task-based fMRI studies. Communications Biology, 1, 62.
  • Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8(8), 665-670.

See references/ for worked examples and simulation code templates.

