R Econometrics
Purpose
This skill helps economists run rigorous econometric analyses in R, including Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It generates publication-ready code with proper diagnostics and robust standard errors.
When to Use
-
Running causal inference analyses
-
Estimating treatment effects with panel data
-
Creating publication-ready regression tables
-
Implementing modern econometric methods (two-way fixed effects, event studies)
Instructions
Step 1: Understand the Research Design
Before generating code, ask the user:
-
What is your identification strategy? (IV, DiD, RDD, or simple regression)
-
What is the unit of observation? (individual, firm, country-year, etc.)
-
What fixed effects do you need? (entity, time, two-way)
-
How should standard errors be clustered?
Step 2: Generate Analysis Code
Based on the research design, generate R code that:
-
Uses the fixest package - Modern, fast, and feature-rich for panel data
-
Includes proper diagnostics:
-
For IV: First-stage F-statistics, weak instrument tests
-
For DiD: Parallel trends visualization, event study plots
-
For RDD: Bandwidth selection, density tests
-
Uses robust/clustered standard errors appropriate for the data structure
-
Creates publication-ready output using modelsummary or etable
Step 3: Structure the Output
Always include:
1. Setup and packages
2. Data loading and preparation
3. Descriptive statistics
4. Main specification
5. Robustness checks
6. Visualization
7. Export results
Step 4: Add Documentation
Include comments explaining:
-
Why each specification choice was made
-
Interpretation of key coefficients
-
Limitations and assumptions
Example Prompts
-
"Run a DiD analysis with state and year fixed effects, clustering at the state level"
-
"Estimate the effect of X on Y using Z as an instrument"
-
"Create an event study plot showing treatment effects by year"
-
"Run a sharp RDD with optimal bandwidth selection"
Example Output
============================================
Difference-in-Differences Analysis
============================================
Setup
library(tidyverse) library(fixest) library(modelsummary)
Load data
df <- read_csv("data.csv")
Prepare treatment variable
df <- df %>% mutate( post = year >= treatment_year, treated = state %in% treatment_states, treat_post = treated * post )
----------------------------------------
Main DiD Specification
----------------------------------------
Two-way fixed effects
did_model <- feols( outcome ~ treat_post | state + year, data = df, cluster = ~state )
View results
summary(did_model)
----------------------------------------
Event Study
----------------------------------------
Create relative time variable
df <- df %>% mutate(rel_time = year - treatment_year)
Event study regression
event_study <- feols( outcome ~ i(rel_time, treated, ref = -1) | state + year, data = df, cluster = ~state )
Plot coefficients
iplot(event_study, main = "Event Study: Effect on Outcome", xlab = "Years Relative to Treatment")
----------------------------------------
Robustness: Alternative Specifications
----------------------------------------
Different clustering
did_robust <- feols( outcome ~ treat_post | state + year, data = df, cluster = ~state + year # Two-way clustering )
----------------------------------------
Export Results
----------------------------------------
modelsummary( list("Main" = did_model, "Two-way Cluster" = did_robust), stars = c('' = 0.1, '' = 0.05, '' = 0.01), output = "results/did_table.tex" )
Requirements
Software
- R 4.0+
Packages
-
fixest
-
Fast fixed effects estimation
-
modelsummary
-
Publication-ready tables
-
tidyverse
-
Data manipulation
-
ggplot2
-
Visualization
Install with:
install.packages(c("fixest", "modelsummary", "tidyverse"))
Best Practices
-
Always cluster standard errors at the level of treatment assignment
-
Run pre-trend tests for DiD designs
-
Report first-stage F-statistics for IV (should be > 10)
-
Use feols over lm for panel data (faster and more features)
-
Document all specification choices in your code comments
Common Pitfalls
-
❌ Not clustering standard errors at the right level
-
❌ Ignoring weak instruments in IV estimation
-
❌ Using TWFE with staggered treatment timing (use did or sunab() instead)
-
❌ Not reporting robustness checks
References
-
fixest documentation
-
Cunningham (2021) Causal Inference: The Mixtape
-
Angrist & Pischke (2009) Mostly Harmless Econometrics
Changelog
v1.0.0
- Initial release with IV, DiD, RDD support