mcmc-sampling-stan

MCMC Sampling with Stan

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "mcmc-sampling-stan" with this command: npx skills add letta-ai/skills/letta-ai-skills-mcmc-sampling-stan

MCMC Sampling with Stan

Overview

This skill provides procedural guidance for implementing Markov Chain Monte Carlo (MCMC) sampling using Stan and RStan. It covers the complete workflow from environment setup through model specification to convergence validation, with emphasis on avoiding common pitfalls in hierarchical Bayesian modeling.

When to Use This Skill

  • Installing and configuring RStan in R environments

  • Writing Stan models for Bayesian inference

  • Implementing hierarchical/multilevel models

  • Configuring MCMC sampling parameters

  • Diagnosing and validating MCMC convergence

Pre-Implementation Checklist

Before writing any code, verify the following in order:

  1. System Dependencies First

Critical: Always check system dependencies before attempting R package installation.

Check for C++ compiler and build tools

which g++ || which clang++ dpkg -l | grep -E "(build-essential|r-base-dev)" # Debian/Ubuntu

Required system packages for RStan:

  • build-essential or equivalent C++ toolchain

  • r-base-dev for R development headers

  • Sufficient memory (Stan compilation is memory-intensive)

  1. RStan Installation Strategy

Install RStan with explicit version pinning when specified:

If specific version required (e.g., 2.32.7)

install.packages("rstan", repos = "https://cloud.r-project.org/")

Verify installation and version

library(rstan) packageVersion("rstan")

Common installation failures:

  • Missing C++ compiler → Install build-essential first

  • Compilation errors → Check g++ version compatibility

  • Memory exhaustion → Close other applications, increase swap

  1. Environment Verification

Before proceeding with model development:

Verify RStan loads without errors

library(rstan)

Test Stan compilation with minimal model

stan_model(model_code = "parameters { real x; } model { x ~ normal(0,1); }")

Stan Model Specification

Hierarchical Model Structure

For hierarchical Bayesian models, follow this block structure:

data { // Declare all input data with types and constraints int<lower=0> N; // Number of observations int<lower=0> y[N]; // Observed counts int<lower=0> n[N]; // Trial sizes }

parameters { // Hyperparameters (population-level) real<lower=0> alpha; real<lower=0> beta;

// Group-level parameters real<lower=0, upper=1> theta[N]; }

model { // Hyperprior (if using custom/improper priors) // Example: p(α,β) ∝ (α+β)^(-5/2) target += -2.5 * log(alpha + beta);

// Prior for group parameters theta ~ beta(alpha, beta);

// Likelihood y ~ binomial(n, theta); }

Custom Prior Implementation

When implementing custom or improper priors:

Prior Form Stan Implementation

p(x) ∝ x^a target += a * log(x);

p(x) ∝ (a+b)^c target += c * log(a + b);

Flat/improper No explicit statement needed

Document the mathematical transformation: Always comment the relationship between the mathematical prior and the target += statement.

MCMC Sampling Configuration

Recommended Control Parameters

For hierarchical models with potential sampling difficulties:

fit <- sampling( model, data = stan_data, chains = 4, # Multiple chains for convergence assessment iter = 100000, # Total iterations (including warmup) seed = 1, # Reproducibility control = list( adapt_delta = 0.95, # Increase for divergent transitions max_treedepth = 15 # Increase for complex posteriors ) )

Parameter Selection Rationale

Parameter Default When to Increase Why

adapt_delta

0.8 Divergent transitions Smaller step sizes improve exploration

max_treedepth

10 "Maximum treedepth" warnings Allows longer trajectories

iter

2000 Low effective sample size More samples for inference

warmup

iter/2 Slow convergence More adaptation time

Warmup Considerations

  • Default warmup is 50% of iter

  • For 100,000 iterations: 50,000 warmup + 50,000 sampling per chain

  • Explicitly specify if different warmup proportion needed:

sampling(..., iter = 100000, warmup = 25000) # 25% warmup

Convergence Diagnostics

Required Validation Steps

Always verify these after sampling:

1. Check Rhat (potential scale reduction factor)

summary(fit)$summary[, "Rhat"]

Target: All Rhat < 1.01 (ideally < 1.005)

2. Check effective sample size

summary(fit)$summary[, "n_eff"]

Target: n_eff > 400 for reliable inference

3. Check for divergent transitions

get_num_divergent(fit)

Target: 0 divergent transitions

4. Check treedepth saturation

get_num_max_treedepth(fit)

Target: 0 or minimal saturation

Diagnostic Interpretation

Issue Indicator Resolution

Poor mixing Rhat > 1.01 Increase iter, reparameterize

Inefficient sampling Low n_eff Increase iter, adjust adapt_delta

Geometric problems Divergent transitions Increase adapt_delta, reparameterize

Complex posterior Max treedepth hits Increase max_treedepth

Common Pitfalls and Solutions

  1. Installation Order Errors

Wrong: Install RStan → Installation fails → Check dependencies Right: Check dependencies → Install prerequisites → Install RStan

  1. Missing Convergence Checks

Wrong: Run sampling → Extract means → Report results Right: Run sampling → Check Rhat/n_eff/divergences → Validate → Extract results

  1. Improper Prior Issues

When using improper priors like p(α,β) ∝ (α+β)^(-5/2):

  • Ensure parameters are constrained (<lower=0> )

  • Verify posterior is proper (converges)

  • Check for boundary behavior near zero

  1. Command Syntax in Shell

Avoid piping R output with shell redirection that causes parsing errors:

Problematic

Rscript -e "code" 2>&1 | grep ...

Safer

Rscript -e "code" > output.txt 2>&1 grep ... output.txt

Verification Strategy

Stepwise Verification Approach

  • Environment: Verify RStan installation before model development

  • Compilation: Test Stan model compiles without errors

  • Sampling: Run with reduced iterations first to catch issues

  • Diagnostics: Check all convergence metrics before trusting results

  • Results: Extract posteriors only after validation passes

Sanity Checks for Results

  • Posterior means should be plausible given prior and data

  • 95% credible intervals should have reasonable width

  • Multiple chains should show similar posteriors

  • Results should be consistent across different seeds (approximately)

References

For detailed Stan documentation and examples, consult:

  • Stan User's Guide: Model specification and best practices

  • RStan Getting Started: Installation and basic usage

  • Stan Functions Reference: Available distributions and functions

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

extracting-pdf-text

No summary provided by upstream source.

Repository SourceNeeds Review
General

video-processing

No summary provided by upstream source.

Repository SourceNeeds Review
General

google-workspace

No summary provided by upstream source.

Repository SourceNeeds Review
General

portfolio-optimization

No summary provided by upstream source.

Repository SourceNeeds Review