R Econometrics

Purpose

This skill helps economists run rigorous econometric analyses in R, including Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It generates publication-ready code with proper diagnostics and robust standard errors.

When to Use

Running causal inference analyses
Estimating treatment effects with panel data
Creating publication-ready regression tables
Implementing modern econometric methods (two-way fixed effects, event studies)

Instructions

Step 1: Understand the Research Design

Before generating code, ask the user:

What is your identification strategy? (IV, DiD, RDD, or simple regression)
What is the unit of observation? (individual, firm, country-year, etc.)
What fixed effects do you need? (entity, time, two-way)
How should standard errors be clustered?
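
As an illustration of how the answers to these questions might translate into code, the sketch below fits the same regression with three fixed-effect and standard-error choices. The simulated panel and its variable names (firm, year, x, y) are hypothetical and stand in for the user's data.

```r
library(fixest)

# Hypothetical firm-year panel: 50 firms observed over 8 years
set.seed(1)
panel <- data.frame(
  firm = rep(1:50, each = 8),
  year = rep(2010:2017, times = 50)
)
firm_effect <- rnorm(50)
panel$x <- rnorm(nrow(panel))
panel$y <- 0.5 * panel$x + firm_effect[panel$firm] + rnorm(nrow(panel))

# Simple regression, heteroskedasticity-robust standard errors
m_pooled <- feols(y ~ x, data = panel, vcov = "hetero")

# Entity fixed effects, standard errors clustered by firm
m_entity <- feols(y ~ x | firm, data = panel, cluster = ~firm)

# Two-way fixed effects (firm and year), clustered by firm
m_twfe <- feols(y ~ x | firm + year, data = panel, cluster = ~firm)

# Side-by-side comparison of the three specifications
etable(m_pooled, m_entity, m_twfe)
```

Fixed effects go after the `|` in the formula, and the clustering level is passed as a one-sided formula, so each answer from the list above corresponds to one argument.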

Step 2: Generate Analysis Code

Based on the research design, generate R code that:

Uses the fixest package (modern, fast, and feature-rich for panel data)
Includes proper diagnostics:
  For IV: first-stage F-statistics, weak instrument tests
  For DiD: parallel trends visualization, event study plots
  For RDD: bandwidth selection, density tests
Uses robust or clustered standard errors appropriate for the data structure
Creates publication-ready output using modelsummary or etable
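
The IV diagnostics listed above could look roughly like the following sketch. It uses simulated data, so every variable name (y, x, w, z, firm) is an assumption made for illustration; the three-part formula and the fitstat() types come from fixest.

```r
library(fixest)

# Hypothetical data: u is an unobserved confounder, z is the instrument
set.seed(42)
n    <- 2000
firm <- rep(1:200, each = 10)
z <- rnorm(n)                 # instrument
w <- rnorm(n)                 # exogenous control
u <- rnorm(n)                 # confounder, not available to the analyst
x <- 0.8 * z + u + rnorm(n)   # endogenous regressor
y <- 1.5 * x + 0.5 * w + 2 * u + rnorm(n)
df_iv <- data.frame(y, x, w, z, firm)

# Naive OLS for comparison (biased because u is omitted)
ols_model <- feols(y ~ x + w, data = df_iv, cluster = ~firm)

# 2SLS: outcome ~ exogenous controls | endogenous ~ instruments
iv_model <- feols(y ~ w | x ~ z, data = df_iv, cluster = ~firm)

# First-stage regression and weak-instrument diagnostics
summary(iv_model, stage = 1)
fitstat(iv_model, ~ ivf1 + ivwald1 + wh)

# Second-stage results; the instrumented regressor is reported as fit_x
summary(iv_model)
```

Here `ivf1` is the first-stage F-statistic, `ivwald1` its Wald counterpart under the chosen standard errors, and `wh` the Wu-Hausman endogeneity test.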

Step 3: Structure the Output

Always include:

1. Setup and packages

2. Data loading and preparation

3. Descriptive statistics

4. Main specification

5. Robustness checks

6. Visualization

7. Export results

Step 4: Add Documentation

Include comments explaining:

Why each specification choice was made
Interpretation of key coefficients
Limitations and assumptions

Example Prompts

"Run a DiD analysis with state and year fixed effects, clustering at the state level"
"Estimate the effect of X on Y using Z as an instrument"
"Create an event study plot showing treatment effects by year"
"Run a sharp RDD with optimal bandwidth selection"

Example Output

# ============================================
# Difference-in-Differences Analysis
# ============================================

# Setup
library(tidyverse)
library(fixest)
library(modelsummary)

# Load data
df <- read_csv("data.csv")

# Prepare treatment variables
# (treatment_year and treatment_states are assumed to be defined beforehand)
df <- df %>%
  mutate(
    post = as.integer(year >= treatment_year),
    treated = as.integer(state %in% treatment_states),
    treat_post = treated * post
  )

# ----------------------------------------
# Main DiD Specification
# ----------------------------------------

# Two-way fixed effects; treated and post main effects are absorbed
# by the state and year fixed effects
did_model <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state
)

# View results
summary(did_model)

# ----------------------------------------
# Event Study
# ----------------------------------------

# Create relative time variable
df <- df %>%
  mutate(rel_time = year - treatment_year)

# Event study regression (the year before treatment is the reference period)
event_study <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | state + year,
  data = df,
  cluster = ~state
)

# Plot coefficients
iplot(
  event_study,
  main = "Event Study: Effect on Outcome",
  xlab = "Years Relative to Treatment"
)

# ----------------------------------------
# Robustness: Alternative Specifications
# ----------------------------------------

# Different clustering: two-way clustering by state and year
did_robust <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state + year
)

# ----------------------------------------
# Export Results
# ----------------------------------------

# Create the output folder if needed, then write the LaTeX table
dir.create("results", showWarnings = FALSE)
modelsummary(
  list("Main" = did_model, "Two-way Cluster" = did_robust),
  stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01),
  output = "results/did_table.tex"
)

Requirements

Software

R 4.0+

Packages

fixest: fast fixed effects estimation
modelsummary: publication-ready tables
tidyverse: data manipulation
ggplot2: visualization (installed as part of tidyverse)

Install with:

install.packages(c("fixest", "modelsummary", "tidyverse"))

Best Practices

Always cluster standard errors at the level of treatment assignment
Run pre-trend tests for DiD designs
Report first-stage F-statistics for IV (a common rule of thumb treats F > 10 as the minimum for a strong instrument)
Use feols over lm for panel data (it absorbs fixed effects efficiently and has built-in clustered standard errors)
Document all specification choices in your code comments
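
One possible form of a pre-trend test is a joint Wald test on the event-study leads, sketched below with fixest's wald(). The simulated state-year panel, the common treatment date of 2010, and all variable names are assumptions chosen for illustration.

```r
library(fixest)

# Hypothetical panel: 40 states, 2005-2014, half treated from 2010 onward
set.seed(123)
n_states <- 40
df_es <- expand.grid(state = 1:n_states, year = 2005:2014)
df_es$treated  <- as.integer(df_es$state <= 20)
df_es$rel_time <- df_es$year - 2010
state_effect   <- rnorm(n_states)
df_es$outcome  <- state_effect[df_es$state] +
  0.1 * (df_es$year - 2005) +                      # common time trend
  2 * df_es$treated * (df_es$year >= 2010) +       # true effect of 2
  rnorm(nrow(df_es), sd = 0.5)

# Event study with the year before treatment as the reference period
es <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | state + year,
  data = df_es,
  cluster = ~state
)

# Joint test that all pre-treatment (lead) coefficients are zero;
# the pattern matches coefficient names such as "rel_time::-3:treated"
pre_test <- wald(es, keep = "rel_time::-")
pre_test
```

A small p-value here would suggest that treated and control states were already diverging before treatment. Failing to reject does not prove parallel trends, so the event-study plot is still worth inspecting.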

Common Pitfalls

❌ Not clustering standard errors at the right level
❌ Ignoring weak instruments in IV estimation
❌ Using TWFE with staggered treatment timing (use the did package or fixest's sunab() instead)
❌ Not reporting robustness checks
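
For the staggered-timing pitfall, a minimal sketch of the sunab() alternative follows. It relies on base_stagg, an example dataset shipped with fixest, so the variable names (y, x1, id, year, year_treated) belong to that dataset and not to the user's data.

```r
library(fixest)

# Example staggered-adoption panel bundled with fixest
data(base_stagg)

# Sun and Abraham interaction-weighted estimator:
# sunab(cohort, period) replaces the single treat_post dummy
res_sa <- feols(
  y ~ x1 + sunab(year_treated, year) | id + year,
  data = base_stagg,
  cluster = ~id
)

# Event-study coefficients by period relative to treatment
summary(res_sa)

# Aggregate to a single average treatment effect on the treated
att <- summary(res_sa, agg = "att")
att
```

Calling iplot(res_sa) would draw the corresponding event-study plot, in the same way as for the i() specification.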

References

fixest documentation
Cunningham (2021) Causal Inference: The Mixtape
Angrist & Pischke (2009) Mostly Harmless Econometrics

Changelog

v1.0.0

Initial release with IV, DiD, RDD support

