Neuroimaging Sample Size Calculator
Purpose
Traditional power analysis (e.g., using G*Power for a t-test) fails for neuroimaging because it cannot account for the massive multiple comparisons problem, spatial correlation structure, or the multi-level nature of neuroimaging inference. Neuroimaging requires simulation-based approaches that generate synthetic datasets, apply the full analysis pipeline including multiple comparison correction, and estimate power as the proportion of simulations detecting the effect.
A competent programmer without neuroimaging training would use standard power formulas and dramatically overestimate the power of a whole-brain analysis. They would not know that cluster-extent thresholds, random field theory corrections, and spatial smoothness all affect the effective number of tests, nor that pilot-data-based simulation is the gold standard for neuroimaging power analysis. This skill encodes the domain-specific methodology for simulation-based sample size planning.
When to Use This Skill
- Planning sample size for a new fMRI, EEG, or MEG study
- Conducting power analysis for a grant application or registered report
- Estimating required N when pilot data or published effect size maps are available
- Choosing between whole-brain and ROI-based analysis based on power constraints
- Evaluating the statistical adequacy of a proposed or completed study
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question — What specific question is this analysis/paradigm addressing?
- Justify the method choice — Why is this approach appropriate? What alternatives were considered?
- Declare expected outcomes — What results would support vs. refute the hypothesis?
- Note assumptions and limitations — What does this method assume? Where could it mislead?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Why Traditional Power Analysis Fails for Neuroimaging
The Fundamental Problem
Standard power analysis computes the sample size for a single statistical test at a given effect size, alpha, and power. Neuroimaging violates every assumption of this framework:
| Standard Assumption | Neuroimaging Reality | Consequence |
|---|---|---|
| Single test | ~100,000 voxels tested | Alpha must be corrected, dramatically reducing per-test sensitivity |
| Independent tests | Voxels are spatially correlated (due to smoothing and neural organization) | Effective number of tests is much less than 100,000, but hard to compute analytically |
| Known effect size | Effect size varies across voxels and depends on ROI definition | No single "effect size" characterizes a study |
| Simple test statistic | Cluster-based, TFCE, and permutation tests have complex null distributions | Power depends on the specific inference method used |
| One-level inference | Subject-level estimation + group-level test | Within-subject variance and between-subject variance both affect power |
Source: Mumford & Nichols, 2008; Poldrack et al., 2017.
The Pilot-Data-Based Simulation Approach
The gold standard for neuroimaging power analysis uses pilot data to simulate full datasets at varying sample sizes (Mumford & Nichols, 2008).
Step-by-Step Procedure
- Step 1: Obtain pilot data or published effect-size maps
- Step 2: Estimate expected effect sizes at regions of interest
- Step 3: Simulate datasets with varying N
- Step 4: Apply the full analysis pipeline (including multiple comparison correction)
- Step 5: Compute power = proportion of simulations detecting the effect
- Step 6: Find the N that achieves target power (typically 80% or 90%)
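The loop above can be sketched in a few lines. This is a toy Monte Carlo sketch with assumed numbers (a synthetic 100-voxel ROI, independent Gaussian noise, and Bonferroni correction standing in for the real correction pipeline), not the implementation of any specific tool:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(n_subjects, d=0.8, n_voxels=100, n_sims=500, alpha=0.05):
    """Voxel-level power: fraction of (simulation, voxel) tests surviving a
    Bonferroni correction across the ROI's voxels. Bonferroni here stands in
    for the planned correction method (cluster-based, TFCE, ...)."""
    alpha_corr = alpha / n_voxels
    hits = []
    for _ in range(n_sims):
        # subjects x voxels contrast values; true effect d at every voxel
        data = rng.normal(loc=d, scale=1.0, size=(n_subjects, n_voxels))
        _, p = stats.ttest_1samp(data, 0.0, axis=0)
        hits.append(np.mean(p < alpha_corr))
    return float(np.mean(hits))

# Sweep candidate sample sizes and pick the smallest N reaching 80% power
for n in (15, 20, 30, 40):
    print(f"N={n:3d}  power={simulated_power(n):.2f}")
```

A real simulation would replace the independent-noise draw with spatially smoothed noise and the Bonferroni step with the full group-level pipeline, but the outer structure (simulate, correct, count detections, sweep N) is the same.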
Step 1: Obtain Pilot Data
| Source | Quality | Requirements | Caveats |
|---|---|---|---|
| Own pilot study | Best | At least 10-15 subjects for stable variance estimates | Effect sizes from small pilots are inflated; use conservative estimates |
| Published group map | Good | Unthresholded statistical map (t-map or z-map) | May not match your exact paradigm or population |
| NeuroVault repository | Good | Search for comparable paradigms | Maps may use different preprocessing/analysis pipelines |
| Meta-analytic map (NeuroSynth, NiMARE) | Moderate | Coordinate-based or image-based meta-analysis | Provides average effect across studies, may underestimate for specific paradigms |
Source: Mumford & Nichols, 2008; Poldrack et al., 2017.
Critical warning: Effect sizes from small pilot studies (N < 20) are inflated due to the winner's curse. Assume the true effect is 50-75% of the pilot estimate (Button et al., 2013).
Step 2: Estimate Effect Sizes
For ROI-based analysis:
- Define the ROI a priori (from atlas, meta-analysis, or independent data)
- Extract the mean effect size (Cohen's d or percent signal change) from the pilot data within the ROI
- Apply the deflation correction (multiply by 0.5-0.75) for conservative estimation
For whole-brain analysis:
- Use the full unthresholded statistical map as the effect-size map
- The map captures spatial variation in effect size across the brain
- Power will vary by region; for sample size determination, focus on the primary region of interest
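The ROI route can be sketched as follows. The function name, the synthetic arrays, and the 0.6 deflation factor are illustrative; in practice the t-map would be loaded from a NIfTI file (e.g., with nibabel):

```python
import numpy as np

def roi_effect_size(tmap, roi_mask, n_pilot, deflation=0.6):
    """Convert a pilot one-sample t-map to Cohen's d (d = t / sqrt(N)),
    average within an a priori ROI, and deflate for the winner's curse.
    `deflation` should fall in 0.5-0.75 (Button et al., 2013)."""
    d_map = tmap / np.sqrt(n_pilot)   # within-subject t-to-d conversion
    return deflation * d_map[roi_mask].mean()

# Toy example: synthetic 10x10x10 "t-map" with a hot spot inside the ROI
tmap = np.zeros((10, 10, 10))
roi = np.zeros_like(tmap, dtype=bool)
roi[4:7, 4:7, 4:7] = True
tmap[roi] = 4.0                       # pilot t of about 4 inside the ROI
print(roi_effect_size(tmap, roi, n_pilot=12))
```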
Step 3: Simulate Datasets
For each candidate sample size N:
- Generate 1,000-5,000 simulated group maps by:
  - Sampling N subjects from a population with the estimated effect size and variance
  - Adding realistic noise (estimated from pilot residuals or assumed Gaussian with spatial smoothness matching the pilot data)
  - Creating a group-level statistical map
- Apply the smoothness estimate from the pilot data (or the planned smoothing kernel) to each simulated map
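A minimal simulation of a single group map might look like the sketch below; the grid size, smoothing width, and noise model are assumed for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

def simulate_group_map(effect_map, n_subjects, sigma_vox=2.0, noise_sd=1.0):
    """One synthetic group-level t-map: each subject map is the true effect
    plus spatially smoothed Gaussian noise (smoothness matched to pilot
    data), rescaled so the noise keeps the intended standard deviation."""
    subj_maps = []
    for _ in range(n_subjects):
        noise = gaussian_filter(
            rng.normal(0.0, 1.0, size=effect_map.shape), sigma=sigma_vox)
        noise *= noise_sd / noise.std()   # restore the intended noise scale
        subj_maps.append(effect_map + noise)
    subj = np.stack(subj_maps)
    se = subj.std(axis=0, ddof=1) / np.sqrt(n_subjects)
    return subj.mean(axis=0) / se         # voxelwise one-sample t

effect = np.zeros((16, 16, 16))
effect[6:10, 6:10, 6:10] = 0.5            # d = 0.5 inside a cubic "ROI"
tmap = simulate_group_map(effect, n_subjects=20)
print(tmap.shape)
```

Repeating this 1,000-5,000 times per candidate N and passing each map through the planned correction yields the power estimates in Step 5.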
Step 4: Apply Full Analysis Pipeline
For each simulated dataset:
- Compute the group-level statistical map (e.g., one-sample t-test)
- Apply the planned multiple comparison correction method:
- Cluster-based inference: apply cluster-defining threshold (CDT) of p < 0.001 (Eklund et al., 2016) and identify significant clusters
- Voxelwise FWE: apply random field theory correction at p < 0.05 FWE
- TFCE: compute TFCE image and apply permutation-based correction
- FDR: apply Benjamini-Hochberg at q < 0.05
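The cluster-based branch can be sketched with `scipy.ndimage`. The 50-voxel extent threshold below is a placeholder; in practice it would come from 3dClustSim, random field theory, or permutation:

```python
import numpy as np
from scipy import ndimage, stats

def significant_clusters(tmap, df, cdt_p=0.001, min_cluster_size=50):
    """Cluster-based inference sketch: threshold the t-map at the
    cluster-defining threshold (CDT), then keep only clusters larger than
    a precomputed extent threshold."""
    t_crit = stats.t.ppf(1.0 - cdt_p, df)    # one-sided CDT, p < 0.001
    # ndimage.label uses 6-connectivity in 3D by default; pass a full
    # structure element for 26-connectivity
    labels, n_clusters = ndimage.label(tmap > t_crit)
    sizes = np.bincount(labels.ravel())[1:]  # voxels per cluster (labels 1..n)
    keep = np.flatnonzero(sizes >= min_cluster_size) + 1
    return np.isin(labels, keep)             # mask of surviving voxels

# Toy map: one large suprathreshold blob and one isolated voxel
tmap = np.zeros((20, 20, 20))
tmap[2:10, 2:10, 2:10] = 5.0                 # 512-voxel cluster (survives)
tmap[15, 15, 15] = 5.0                       # 1-voxel cluster (rejected)
mask = significant_clusters(tmap, df=29)
print(int(mask.sum()))
```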
Step 5: Compute Power
- Voxel-level power: For each voxel, power = proportion of simulations in which that voxel is significant
- ROI-level power: Power = proportion of simulations in which at least one voxel in the target ROI is significant
- Cluster-level power: Power = proportion of simulations in which a significant cluster overlaps with the target region
Report the power metric most relevant to your planned analysis (Mumford & Nichols, 2008).
Tools and Implementations
fMRIpower (Mumford & Nichols, 2008)
| Feature | Description |
|---|---|
| Input | Pilot group-level statistical maps (from FSL) |
| Method | Resamples from pilot to estimate power at varying N |
| Output | Power curves for specified ROIs at different sample sizes |
| Requirements | FSL, R; pilot data from at least 10-15 subjects |
| Strengths | Uses actual pilot data; accounts for design-specific temporal autocorrelation |
| Limitations | Assumes pilot effect sizes are representative; FSL-specific |
NeuroPowerTools (Durnez et al., 2016)
| Feature | Description |
|---|---|
| Input | Unthresholded statistical map (any software) |
| Method | Fits mixture model to peak distribution; estimates prevalence and effect size |
| Output | Power estimates at varying N; optimal sample size for target power |
| Access | Web-based: https://neuropowertools.org |
| Strengths | Does not require individual subject data; works with published maps |
| Limitations | Peak-based approximation; may underestimate power for distributed effects |
PowerMap (Joyce & Hayasaka, 2012)
| Feature | Description |
|---|---|
| Input | Assumed effect size map, noise model, smoothness |
| Method | Full simulation with parametric statistical testing |
| Output | Voxelwise power maps at specified N |
| Requirements | MATLAB |
| Strengths | Voxel-level power visualization; flexible correction methods |
| Limitations | Computationally intensive; requires specification of noise model |
AFNI 3dClustSim
| Feature | Description |
|---|---|
| Input | Smoothness estimates (from 3dFWHMx), voxel dimensions, mask |
| Method | Monte Carlo simulation of random fields |
| Output | Cluster-size thresholds for a given alpha level |
| Use for power | Estimate minimum detectable cluster size at a given sample size; not a full power tool |
| Strengths | Fast, accounts for non-Gaussian smoothness (ACF model; Cox et al., 2017) |
| Limitations | Does not compute power directly; only provides cluster-extent thresholds |
ROI-Based Power Shortcuts
When full simulation is impractical, ROI-based power analysis provides a reasonable alternative:
Procedure
- Define the target ROI a priori (from atlas, meta-analysis, or independent data)
- Extract the expected effect size (Cohen's d) from pilot data or literature:
- Mean activation within ROI / standard deviation of activation across subjects
- Use standard power formulas (G*Power or similar) with the ROI-level effect size
- No multiple comparison correction is needed for a single a priori ROI
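This shortcut can be computed directly from the noncentral t distribution rather than through G*Power; a sketch, assuming a two-sided one-sample test on the ROI mean:

```python
import numpy as np
from scipy import stats

def one_sample_power(n, d, alpha=0.05):
    """Power of a two-sided one-sample t-test (a single a priori ROI,
    so no multiple comparison correction)."""
    df = n - 1
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    nc = d * np.sqrt(n)                      # noncentrality parameter
    return (1.0 - stats.nct.cdf(t_crit, df, nc)
            + stats.nct.cdf(-t_crit, df, nc))

def required_n(d, target=0.80, alpha=0.05, n_max=1000):
    """Smallest N reaching the target power for effect size d."""
    for n in range(5, n_max):
        if one_sample_power(n, d, alpha) >= target:
            return n
    return None

print(required_n(0.5))   # N for a medium (deflated) ROI effect
```

The same function doubles as the sensitivity check: evaluate `one_sample_power` at your planned N across a range of plausible d values.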
Effect Size Extraction from Published Results
| Published Statistic | Conversion to Cohen's d | Source |
|---|---|---|
| t-value (within-subject) | d = t / sqrt(N) | Standard formula |
| t-value (between-group) | d = 2t / sqrt(df) | Standard formula |
| z-value | d = z / sqrt(N) (approximate) | Approximate for large N |
| Percent signal change + SD | d = mean_PSC / SD_PSC | Direct computation |
| Partial eta-squared | d = sqrt(eta^2 / (1 - eta^2)) | Conversion formula |
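The conversions in the table translate directly into code; a sketch (the equal-group-sizes assumption for the between-group formula is carried over from the table):

```python
import numpy as np

def d_from_t_within(t, n):
    """Within-subject (one-sample or paired) t to Cohen's d."""
    return t / np.sqrt(n)

def d_from_t_between(t, df):
    """Between-group t to Cohen's d, assuming equal group sizes."""
    return 2.0 * t / np.sqrt(df)

def d_from_eta2(eta2):
    """Partial eta-squared to Cohen's d."""
    return np.sqrt(eta2 / (1.0 - eta2))

# Example conversions from hypothetical published statistics
print(d_from_t_within(3.5, 20))   # t = 3.5, N = 20 within-subject
print(d_from_t_between(3.0, 38))  # t = 3.0, df = 38 between-group
print(d_from_eta2(0.14))          # partial eta-squared = 0.14
```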
Meta-Analytic Effect Sizes
Use coordinate-based meta-analysis tools to estimate effect sizes at specific brain locations:
| Tool | Method | Output | Source |
|---|---|---|---|
| NiMARE | ALE, MKDA, or other CBMA | Meta-analytic map; extract effect at ROI | Salo et al., 2023 |
| NeuroSynth | Automated term-based meta-analysis | Association maps; extract effect at coordinates | Yarkoni et al., 2011 |
| BrainMap | ALE meta-analysis | Coordinate-based likelihood maps | Laird et al., 2005 |
Caveat: Meta-analytic effect sizes aggregate across many studies with different designs, populations, and analysis pipelines. They provide a reasonable lower bound but may not match your specific paradigm (Yarkoni et al., 2011).
Current Sample Size Recommendations
Landmark Findings
| Finding | Recommendation | Source |
|---|---|---|
| Brain-behavior associations require massive samples for replicability | N > 2,000 for whole-brain brain-behavior correlations | Marek et al., 2022 |
| N = 20 gives ~50% power for medium fMRI effects | N = 40+ for 80% power with medium effects | Poldrack et al., 2017 |
| 80% power at uncorrected p < 0.001 requires N ~ 40 for d = 0.8 | N = 40 per group for large between-group effects | Turner et al., 2018 |
| Cluster-based inference with CDT p < 0.01 produces inflated false positives | Use CDT p < 0.001 and increase N to compensate for reduced sensitivity | Eklund et al., 2016 |
| Within-subject designs are much more powerful than between-subject | Prefer within-subject designs when scientifically appropriate | Mumford & Nichols, 2008 |
Minimum Sample Size Table
| Analysis Type | Minimum N (80% Power) | Effect Size Assumed | Correction Method | Source |
|---|---|---|---|---|
| Within-subject activation (whole-brain) | 25-30 | d = 0.8 (large) | Cluster-based, CDT p < 0.001 | Desmond & Glover, 2002 |
| Between-group (whole-brain, large effect) | 20-25 per group | d = 0.8 | Cluster-based, CDT p < 0.001 | Thirion et al., 2007 |
| Between-group (whole-brain, medium effect) | 40-50 per group | d = 0.5 | Cluster-based, CDT p < 0.001 | Poldrack et al., 2017 |
| ROI-based (single a priori ROI) | 15-25 | d = 0.5-0.8 | Uncorrected (single test) | Desmond & Glover, 2002 |
| Resting-state connectivity (group mean) | 25-40 | r = 0.3-0.5 | FDR or NBS | Smith et al., 2011 |
| Brain-behavior correlation (whole-brain) | 2,000+ | r < 0.1 (replicable) | Permutation | Marek et al., 2022 |
| Brain-behavior correlation (single ROI) | 80-200 | r = 0.2-0.3 | Uncorrected | Standard formula |
Registered Report Considerations
Registered reports require pre-specification of sample size with a formal power analysis. For neuroimaging registered reports:
- Specify the primary analysis (whole-brain vs. ROI) and the corresponding power analysis method
- Use simulation-based power when possible; if not, use ROI-based power with conservative effect size estimates
- Pre-specify the multiple comparison correction method and document its impact on required N
- Include sensitivity analysis: What is the minimum detectable effect size at the planned N?
- State stopping rules: Pre-register the exact N and analysis plan; sequential analysis requires adjustment (Lakens, 2014)
- Account for attrition: Specify expected exclusion rate (typically 10-20% for fMRI) and over-recruit
Domain insight: Reviewers will be suspicious of power analyses based on large effect sizes from small pilot studies. Use conservative (deflated) effect size estimates and show power curves across a range of plausible effect sizes.
Practical Workflow for Grant Applications
When Pilot Data Are Available
- Run fMRIpower or NeuroPowerTools with pilot maps
- Generate power curves showing power vs. N for the primary contrast and ROI
- Select N that achieves 80-90% power for the primary analysis
- Add 15-20% for expected participant exclusions
- Report: pilot study details, effect size estimates, power tool used, correction method, target power, final N
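The over-recruitment step is simple arithmetic; a sketch (the 15% attrition rate is an assumed example):

```python
import math

def recruit_n(analyzable_n, attrition_rate):
    """Recruit enough participants that the expected usable sample still
    meets the target after the assumed exclusion rate (e.g., 0.15 for a
    typical 15% fMRI attrition)."""
    return math.ceil(analyzable_n / (1.0 - attrition_rate))

print(recruit_n(40, 0.15))   # target N = 40 analyzable, 15% attrition
```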
When No Pilot Data Are Available
- Search NeuroVault for comparable paradigms; download unthresholded maps
- Use NeuroPowerTools with the published map
- Alternatively, estimate ROI-level effect sizes from published papers:
- Extract t-values and convert to Cohen's d
- Apply deflation (multiply by 0.5-0.75; Button et al., 2013)
- Use G*Power for ROI-based power
- As a last resort, use the benchmark table above with the analysis type closest to your planned study
- Document all assumptions and state that the power analysis is based on estimated (not measured) effect sizes
Common Pitfalls
- Using G*Power for whole-brain analyses: Standard power tools compute power for a single test and do not account for multiple comparison correction. This overestimates power by an order of magnitude (Mumford & Nichols, 2008)
- Trusting pilot study effect sizes: Small pilot studies (N < 20) produce inflated effect sizes. Always deflate by 25-50% (Button et al., 2013)
- Ignoring the correction method: Power depends critically on whether you use voxelwise FWE, cluster-based, FDR, or permutation-based correction. Power at FDR q < 0.05 can be 2-3x higher than voxelwise FWE p < 0.05 for the same N
- Conflating within-subject and between-subject power: Within-subject designs (one-sample t-test on contrast maps) are much more powerful than between-subject designs (two-sample t-test) because they eliminate between-subject variance (Mumford & Nichols, 2008)
- Not accounting for attrition: In fMRI, 10-20% of data may be unusable due to motion, scanner artifacts, or task non-compliance. Over-recruit accordingly
- Treating all regions equally: Power varies across the brain because effect sizes and noise vary spatially. Power at your primary ROI may be adequate even if whole-brain power is low
- Assuming published N is adequate: Most published fMRI studies are underpowered (Button et al., 2013). Matching a published study's N does not guarantee adequate power
- Not reporting sensitivity analysis: Always report the minimum detectable effect size at your planned N, in addition to the power estimate for the expected effect
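The sensitivity analysis from the last pitfall can be computed by inverting the power function; a sketch for a two-sided one-sample test (ROI-level, uncorrected):

```python
import numpy as np
from scipy import optimize, stats

def min_detectable_d(n, target=0.80, alpha=0.05):
    """Smallest one-sample effect size detectable with `target` power at
    the planned N, found by root-finding on the power function."""
    df = n - 1
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)

    def power_gap(d):
        nc = d * np.sqrt(n)
        power = (1.0 - stats.nct.cdf(t_crit, df, nc)
                 + stats.nct.cdf(-t_crit, df, nc))
        return power - target

    # Power rises monotonically from ~alpha at d ~ 0 to ~1 at d = 5
    return optimize.brentq(power_gap, 1e-6, 5.0)

print(round(min_detectable_d(34), 3))   # minimum detectable d at N = 34
```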
Minimum Reporting Checklist
- Source of effect size estimate (pilot data, published study, meta-analysis)
- Effect size metric (Cohen's d, r, percent signal change) and value used
- Whether effect size deflation was applied and the correction factor
- Power analysis method (simulation-based, ROI-based analytical, benchmark-based)
- Power analysis tool and version (fMRIpower, NeuroPowerTools, G*Power, custom simulation)
- Number of simulations (for simulation-based approaches)
- Multiple comparison correction method assumed in power analysis
- Statistical threshold used (e.g., CDT p < 0.001, cluster p < 0.05 FWE)
- Target power level (80% or 90%)
- Planned total N and N per group (if applicable)
- Expected attrition rate and over-recruitment plan
- Sensitivity analysis (minimum detectable effect at planned N)
References
- Button, K. S., Ioannidis, J. P. A., Mokrysz, C., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
- Cox, R. W., Chen, G., Glen, D. R., Reynolds, R. C., & Taylor, P. A. (2017). FMRI clustering in AFNI: False-positive rates redux. Brain Connectivity, 7(3), 152-171.
- Desmond, J. E., & Glover, G. H. (2002). Estimating sample size in functional MRI (fMRI) neuroimaging studies. Journal of Neuroscience Methods, 118(2), 115-128.
- Durnez, J., Degryse, J., Moerkerke, B., et al. (2016). Power and sample size calculations for fMRI studies based on the prevalence of active peaks. bioRxiv, 049429.
- Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS, 113(28), 7900-7905.
- Joyce, K. E., & Hayasaka, S. (2012). Development of PowerMap: A software package for statistical power calculation in neuroimaging studies. Neuroinformatics, 10(4), 351-365.
- Laird, A. R., Fox, P. M., Price, C. J., et al. (2005). ALE meta-analysis: Controlling the false discovery rate and performing statistical contrasts. Human Brain Mapping, 25(1), 155-164.
- Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701-710.
- Marek, S., Tervo-Clemmens, B., Calabro, F. J., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902), 654-660.
- Mumford, J. A., & Nichols, T. E. (2008). Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. NeuroImage, 39(1), 261-268.
- Poldrack, R. A., Baker, C. I., Durnez, J., et al. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115-126.
- Salo, T., Yarkoni, T., Nichols, T. E., et al. (2023). NiMARE: Neuroimaging Meta-Analysis Research Environment. NeuroImage, 268, 119862.
- Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., et al. (2011). Network modelling methods for FMRI. NeuroImage, 54(2), 875-891.
- Thirion, B., Pinel, P., Meriaux, S., et al. (2007). Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses. NeuroImage, 35(1), 105-120.
- Turner, B. O., Paul, E. J., Miller, M. B., & Barbey, A. K. (2018). Small sample sizes reduce the replicability of task-based fMRI studies. Communications Biology, 1, 62.
- Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8(8), 665-670.
See references/ for worked examples and simulation code templates.