stata

Comprehensive Stata reference for writing correct .do files, data management, econometrics, causal inference, graphics, Mata programming, and 20 community packages (reghdfe, estout, did, rdrobust, etc.). Covers syntax, options, gotchas, and idiomatic patterns. Use this skill whenever the user asks you to write, debug, or explain Stata code.

Install skill "stata" with this command: npx skills add dylantmoore/stata-skill/dylantmoore-stata-skill-stata

Stata Skill

You have access to comprehensive Stata reference files. Do not load all files. Read only the 1-3 files relevant to the user's current task using the routing table below.


Critical Gotchas

These are Stata-specific pitfalls that lead to silent bugs. Internalize these before writing any code.

Missing Values Sort to +Infinity

Stata's . (and .a-.z) are greater than all numbers, so any > or >= comparison evaluates to true when the variable is missing.

* WRONG — includes observations where income is missing!
gen high_income = (income > 50000)

* RIGHT
gen high_income = (income > 50000) if !missing(income)

* WRONG — missing ages appear in this list
list if age > 60

* RIGHT
list if age > 60 & !missing(age)
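
As a minimal sketch of the pitfall, the toy dataset below (three invented rows, not taken from any real source) shows the unguarded comparison flagging the missing row as high income, while the guarded version leaves it missing.

```stata
clear
input income
30000
60000
.
end

gen wrong = (income > 50000)                       // missing row gets 1
gen right = (income > 50000) if !missing(income)   // missing row stays missing
list

* Equivalent guard: every missing value sorts above every number
count if income > 50000 & income < .
```

Both `!missing(income)` and `income < .` exclude the extended missing values .a-.z as well as the plain `.`.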

= vs ==

= is assignment; == is comparison. Mixing them up is a syntax error or silent bug.

* WRONG — syntax error
gen employed = 1 if status = 1

* RIGHT
gen employed = 1 if status == 1

Local Macro Syntax

Locals use `name' (backtick + single-quote). Globals use $name or ${name}. Forgetting the closing quote is the #1 macro bug.

local controls "age education income"
regress wage `controls'        // correct
regress wage `controls         // WRONG — missing closing quote
regress wage 'controls'        // WRONG — wrong quote characters
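
The sketch below puts locals and globals side by side on the built-in auto dataset. The macro names `controls`, `outcome`, and the deliberately misspelled `contrls` are illustrative choices, not part of any upstream file. The last line shows a related trap: a misspelled local is not an error, it simply expands to nothing.

```stata
sysuse auto, clear

local controls "mpg weight"
global outcome "price"

regress ${outcome} `controls'

* Loop over the words stored in the local
foreach v of local controls {
    summarize `v', meanonly
    display "`v' mean = " %9.2f r(mean)
}

* Gotcha: an undefined local expands to an empty string, with no error
regress price `contrls' mpg      // typo in the macro name; runs as: regress price mpg
```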

by Requires Prior Sort (Use bysort)

* WRONG — error if data not sorted by id
by id: gen first = (_n == 1)

* RIGHT — bysort sorts automatically
bysort id: gen first = (_n == 1)

* Also RIGHT — explicit sort
sort id
by id: gen first = (_n == 1)
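
One related detail worth a sketch: `bysort id:` alone does not fix the order of rows within each group, so `_n == 1` may pick an arbitrary row. Putting the ordering variable in parentheses sorts by it without grouping on it. The variables `time` and `value` below are made up for illustration.

```stata
clear
input id time value
1 2 20
1 1 10
2 3 7
2 1 5
2 2 6
end

* Sort by id and time, but form groups on id only
bysort id (time): gen first = (_n == 1)
bysort id (time): gen last  = (_n == _N)
list, sepby(id)
```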

Factor Variable Notation (i. and c.)

Use i. for categorical, c. for continuous. Omitting i. treats categories as continuous.

* WRONG — treats race as continuous (e.g., race=3 has 3x effect of race=1)
regress wage race education

* RIGHT — creates dummies automatically
regress wage i.race education

* Interactions
regress wage i.race##c.education    // full interaction
regress wage i.race#c.education     // interaction only (no main effects)

generate vs replace

generate creates new variables; replace modifies existing ones. Using generate on an existing variable name is an error.

gen x = 1
gen x = 2          // ERROR: x already defined
replace x = 2      // correct

String Comparison Is Case-Sensitive

* May miss "Male", "MALE", etc.
keep if gender == "male"

* Safer
keep if lower(gender) == "male"

merge Always Check _merge

merge 1:1 id using other.dta
tab _merge                      // always inspect
assert _merge == 3              // or handle mismatches
drop _merge
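
When mismatches are expected, a sketch like the following handles them explicitly instead of asserting a perfect match. It builds two small invented datasets in temporary files so it can run on its own; `other`, `x`, and `y` are placeholder names.

```stata
clear
input id x
1 10
2 20
3 30
end
tempfile other
save `other'

clear
input id y
1 1
2 2
4 4
end

* keep(match master) drops using-only rows (id 3) at merge time
merge 1:1 id using `other', keep(match master)
tab _merge
count if _merge == 1          // master-only rows (here id 4)
assert inlist(_merge, 1, 3)
drop _merge
```

If every row must match, `merge ..., assert(match)` makes the merge itself stop with an error rather than relying on a separate check afterwards.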

preserve / restore for Temporary Changes

preserve
collapse (mean) income, by(state)
* ... do something with collapsed data ...
restore   // original data is back

Weights Are Not Interchangeable

  • fweight — frequency weights (replication)
  • aweight — analytic/regression weights (inverse variance)
  • pweight — probability/sampling weights (survey data, implies robust SE)
  • iweight — importance weights (rarely used)
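
To illustrate why the weight types are not interchangeable, here is a small sketch on invented data. With fweights the second column is treated as a count of identical observations, so the result matches physically expanding the data. With aweights the weighted mean is the same but the observation count is not.

```stata
clear
input x n
1 2
3 1
end

summarize x [fweight = n]
display "fweight: N = " r(N) ", mean = " r(mean)

summarize x [aweight = n]
display "aweight: N = " r(N) ", mean = " r(mean)

* Replicating each row n times reproduces the fweight result
expand n
summarize x
display "expanded: N = " r(N) ", mean = " r(mean)
```

Because the reported N differs, standard errors and tests differ too, which is where choosing the wrong weight type usually does its damage.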

capture Swallows Errors

capture some_command
if _rc != 0 {
    di as error "Failed with code: " _rc
    exit _rc
}
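
A short sketch of the common uses, run on the built-in auto dataset. The variable names `tmp` and `nosuchvar` are placeholders chosen so that the commands fail on purpose.

```stata
sysuse auto, clear

* Harmless whether or not tmp exists
capture drop tmp

* Test for something without stopping the script
capture confirm variable nosuchvar
if _rc != 0 {
    display as text "nosuchvar not found (return code " _rc ")"
}

* capture noisily shows the error message but still continues
capture noisily regress price nosuchvar
display "regress returned code " _rc
```

Note that `_rc` is reset by the next `capture`, so read it immediately after the captured command.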

Line Continuation Uses ///

regress y x1 x2 x3 ///
    x4 x5 x6, ///
    vce(robust)

Stored Results: r() vs e() vs s()

  • r() — r-class commands (summarize, tabulate, etc.)
  • e() — e-class commands (estimation: regress, logit, etc.)
  • s() — s-class commands (parsing)

A new estimation command overwrites previous e() results. Store them first:

regress y x1 x2
estimates store model1
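
The sketch below, using the built-in auto dataset, shows both halves of this: copying an r() value into a local before the next r-class command replaces it, and restoring a stored model so that e() points at it again. The names `m1`, `m2`, and `mean_price` are arbitrary.

```stata
sysuse auto, clear

summarize price
local mean_price = r(mean)      // copy before the next r-class command replaces r()
summarize mpg
display "mean price = " `mean_price' ", mean mpg = " r(mean)

regress price mpg
estimates store m1
regress price mpg weight
estimates store m2

estimates restore m1            // e() now refers to the first model again
display "N = " e(N) ", R-squared = " %5.3f e(r2)
estimates table m1 m2, se
```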

Running Stata from the Command Line

Claude can execute Stata code by running .do files in batch mode from the terminal. This is how to run Stata non-interactively.

Finding the Stata Binary

Stata on macOS is a .app bundle. The actual binary is inside it. Common locations:

# Stata 18 / StataNow (most common)
/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp
/Applications/StataNow/StataMP.app/Contents/MacOS/stata-mp

# Other editions (SE, BE)
/Applications/Stata/StataSE.app/Contents/MacOS/stata-se
/Applications/Stata/StataBE.app/Contents/MacOS/stata-be

If Stata isn't on $PATH, find it with: mdfind -name "stata-mp" | grep MacOS

Batch Mode (-b)

# Run a .do file in batch mode — output goes to <filename>.log
/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp -b do analysis.do

# If stata-mp is on PATH (e.g., via symlink or alias):
stata-mp -b do analysis.do

  • -b = batch mode (non-interactive, no GUI)
  • Output (everything Stata would display) is written to analysis.log in the working directory
  • Do not rely on the exit code alone: on macOS and Linux, batch mode typically exits with 0 even when the .do file stops on an error
  • The log file contains all output, including error messages, so check it for r(#); error codes after every run

Running Inline Stata Code

To run a quick Stata snippet without creating a .do file:

# Write a temp .do file and run it
cat > /tmp/stata_run.do << 'EOF'
sysuse auto, clear
summarize price mpg
EOF
stata-mp -b do /tmp/stata_run.do
cat stata_run.log    # the log is written to the current directory, not /tmp

Checking Results

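# The shell exit code is not a reliable signal; a failed run leaves an r(#); line in the log
stata-mp -b do tests/run_tests.do
grep -qE "^r\([0-9]+\);" run_tests.log && echo "FAILED" || echo "SUCCESS"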

# Search the log for pass/fail
grep -E "PASS|FAIL|error|r\([0-9]+\)" run_tests.log

Tips

  • clear all at the top of batch scripts: each batch run starts a fresh Stata session, so this mainly guards against stale data and programs when the same .do file is later run interactively or called from another .do file.
  • set more off: prevents Stata from pausing at --more-- prompts, which would stall a non-interactive run.
  • Log files overwrite silently: analysis.do always writes to analysis.log in the current directory. If you run multiple .do files, check the right log.
  • Working directory: Stata's working directory is wherever you run the command from, not where the .do file lives. Use cd in the .do file or absolute paths if needed.
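
These tips can be combined into a do-file preamble along the following lines. This is only a sketch: the log name `analysis_main.log`, the log handle `main`, and the `version 17` line are assumptions to adapt, not settings taken from the reference files.

```stata
* Hypothetical skeleton for a batch-run .do file
version 17                          // assumption: adjust to your installed release
clear all
set more off

* Explicitly named log, so runs of different .do files do not collide
capture log close _all
log using "analysis_main.log", replace text name(main)

display "Working directory: `c(pwd)'"
sysuse auto, clear
summarize price mpg

log close main
```

The batch-mode log (named after the .do file) is still written alongside the named log; the explicit one just makes it unambiguous which run produced which output.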

Routing Table

Read only the files relevant to the user's task. Paths are relative to this SKILL.md file.

Data Operations

| File | Topics & Key Commands |
|---|---|
| references/basics-getting-started.md | use, save, describe, browse, sysuse, basic workflow |
| references/data-import-export.md | import delimited, import excel, ODBC, export, web data |
| references/data-management.md | generate, replace, merge, append, reshape, collapse, recode, egen, encode/decode |
| references/variables-operators.md | Variable types, byte/int/long/float/double, operators, missing values (.<.a), if/in qualifiers |
| references/string-functions.md | substr(), regexm(), strtrim(), split, ustrlen(), regex, Unicode |
| references/date-time-functions.md | date(), clock(), %td/%tc formats, mdy(), dofm(), business calendars |
| references/mathematical-functions.md | round(), log(), exp(), abs(), mod(), cond(), distributions, random numbers |

Statistics & Econometrics

| File | Topics & Key Commands |
|---|---|
| references/descriptive-statistics.md | summarize, tabulate, correlate, tabstat, codebook, weighted stats |
| references/linear-regression.md | regress, vce(robust), vce(cluster), test, lincom, margins, predict, ivregress |
| references/panel-data.md | xtset, xtreg fe/re, Hausman test, xtabond, dynamic panels |
| references/time-series.md | tsset, ARIMA, VAR, dfuller, pperron, irf, forecasting |
| references/limited-dependent-variables.md | logit, probit, tobit, poisson, nbreg, mlogit, ologit, margins for nonlinear |
| references/bootstrap-simulation.md | bootstrap, simulate, permute, Monte Carlo |
| references/survey-data-analysis.md | svyset, svy:, subpop(), complex survey design, replicate weights |
| references/missing-data-handling.md | mi impute, mi estimate, FIML, misstable, diagnostics |
| references/maximum-likelihood.md | ml model, custom likelihood functions, ml init, gradient-based optimization |
| references/gmm-estimation.md | gmm, moment conditions, estat overid, J-test |

Causal Inference

| File | Topics & Key Commands |
|---|---|
| references/treatment-effects.md | teffects ra/ipw/ipwra/aipw, stteffects, ATE/ATT/ATET |
| references/difference-in-differences.md | DiD, parallel trends, event studies, staggered adoption |
| references/regression-discontinuity.md | Sharp/fuzzy RD, bandwidth selection, rdplot |
| references/matching-methods.md | PSM, nearest neighbor, kernel matching, teffects nnmatch |
| references/sample-selection.md | heckman, heckprobit, treatment models, exclusion restrictions |

Advanced Methods

| File | Topics & Key Commands |
|---|---|
| references/survival-analysis.md | stset, stcox, streg, Kaplan-Meier, parametric models |
| references/sem-factor-analysis.md | sem, gsem, CFA, path analysis, alpha, reliability |
| references/nonparametric-methods.md | kdensity, rank tests, qreg, npregress |
| references/spatial-analysis.md | spmatrix, spregress, spatial weights, Moran's I |
| references/machine-learning.md | lasso, elasticnet, cvlasso, cross-validation |

Graphics

| File | Topics & Key Commands |
|---|---|
| references/graphics.md | twoway, scatter, line, bar, histogram, graph combine, graph export, schemes |

Programming

| File | Topics & Key Commands |
|---|---|
| references/programming-basics.md | local, global, foreach, forvalues, program define, syntax, return |
| references/advanced-programming.md | syntax, mata, classes, _prefix, dialog boxes, tempfile/tempvar |
| references/mata-introduction.md | Mata basics, when to use Mata vs ado, data types |
| references/mata-programming.md | Mata functions, flow control, structures, pointers |
| references/mata-matrix-operations.md | Matrix creation, decompositions, solvers, st_matrix() |
| references/mata-data-access.md | st_data(), st_view(), st_store(), performance tips |

Output & Workflow

| File | Topics & Key Commands |
|---|---|
| references/tables-reporting.md | putexcel, putdocx, putpdf, LaTeX integration, collect |
| references/workflow-best-practices.md | Project structure, master do-files, version control, debugging, common mistakes |
| references/external-tools-integration.md | Python via python:, R via rsource, shell commands, Git |

Community Packages

| File | What It Does |
|---|---|
| packages/reghdfe.md | High-dimensional fixed effects OLS (absorbs multiple FE sets efficiently) |
| packages/estout.md | esttab/estout: publication-quality regression tables |
| packages/outreg2.md | Alternative regression table exporter (Word, Excel, TeX) |
| packages/asdoc.md | One-command Word document creation for any Stata output |
| packages/tabout.md | Cross-tabulations and summary tables to file |
| packages/coefplot.md | Coefficient plots from stored estimates |
| packages/graph-schemes.md | grstyle, schemepack, plotplain: better graph themes |
| packages/did.md | Modern DiD: csdid, did_multiplegt, did_imputation (Callaway-Sant'Anna, de Chaisemartin-D'Haultfoeuille, Borusyak-Jaravel-Spiess) |
| packages/event-study.md | eventstudyinteract, eventdd: event study estimators |
| packages/rdrobust.md | Robust RD estimation with optimal bandwidth (rdrobust, rdplot, rdbwselect) |
| packages/psmatch2.md | Propensity score matching (nearest neighbor, kernel, radius) |
| packages/synth.md | Synthetic control method (synth, synth_runner) |
| packages/ivreg2.md | Enhanced IV/2SLS: ivreg2, xtivreg2 with additional diagnostics |
| packages/xtabond2.md | Dynamic panel GMM (Arellano-Bond/Blundell-Bond) |
| packages/binsreg.md | Binned scatter plots with CI (binsreg, binstest) |
| packages/nprobust.md | Nonparametric kernel estimation and inference |
| packages/diagnostics.md | bacondecomp, xttest3, collinearity, heteroskedasticity tests |
| packages/winsor.md | Winsorizing and trimming: winsor2, winsor |
| packages/data-manipulation.md | gtools (fast collapse/egen), rangestat, egenmore |
| packages/package-management.md | ssc install, net install, ado update, finding packages |

Common Patterns

Regression Table Workflow

* Estimate models
eststo clear
eststo: regress y x1 x2, vce(robust)
eststo: regress y x1 x2 x3, vce(robust)
eststo: regress y x1 x2 x3 x4, vce(cluster id)

* Export table
esttab using "results.tex", replace ///
    se star(* 0.10 ** 0.05 *** 0.01) ///
    label booktabs ///
    title("Main Results") ///
    mtitles("(1)" "(2)" "(3)")

Panel Data Setup

xtset panelid timevar          // declare panel structure
xtdescribe                      // check balance
xtsum outcome                   // within/between variation

* Fixed effects
xtreg y x1 x2, fe vce(cluster panelid)
* Or with reghdfe (preferred for multiple FE)
reghdfe y x1 x2, absorb(panelid timevar) vce(cluster panelid)

Difference-in-Differences

* Classic 2x2 DiD
gen post = (year >= treatment_year)
gen treat_post = treated * post
regress y treated post treat_post, vce(cluster id)

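* Event study (uniform timing; must interact with treatment group)
* Factor variables cannot take negative values, so shift rel_time to start at 0
summarize rel_time, meanonly
local shift = -r(min)
gen rel_shift = rel_time + `shift'
local base = `shift' - 1            // shifted value of rel_time == -1
reghdfe y ib`base'.rel_shift#1.treated, absorb(id year) vce(cluster id)
testparm i.rel_shift#1.treated   // joint test of all leads and lags; restrict to leads for a pre-trend test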

* Modern staggered DiD (Callaway & Sant'Anna)
csdid y x1 x2, ivar(id) time(year) gvar(first_treat) agg(event)
csdid_plot

Graph Export

* Publication-quality scatter with fit line
twoway (scatter y x, mcolor(navy%50) msize(small)) ///
       (lfit y x, lcolor(cranberry) lwidth(medthick)), ///
    title("Title Here") ///
    xtitle("X Label") ytitle("Y Label") ///
    legend(off) scheme(s2color)
graph export "figure1.pdf", replace as(pdf)
graph export "figure1.png", replace as(png) width(2400)

Data Cleaning Pipeline

* Load and inspect
import delimited "raw_data.csv", clear varnames(1)
describe
codebook, compact

* Clean
rename *, lower                 // lowercase all varnames
destring income, replace force  // string to numeric; force turns non-numeric text into missing
replace income = . if income < 0

* Label
label variable income "Annual household income (USD)"
label define yesno 0 "No" 1 "Yes"
label values employed yesno

* Save
compress
save "clean_data.dta", replace

Multiple Imputation

mi set mlong
mi register imputed income education
mi impute chained (regress) income (ologit) education = age i.gender, add(20) rseed(12345)
mi estimate: regress wage income education age i.gender
