data-analysis-litiao

When to Load

User asks about: analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, statistical significance.

Core Principle

Analysis without a decision is just arithmetic. Always clarify: What would change if this analysis shows X vs Y?

Methodology First

Before touching data:

What decision is this analysis supporting?
What would change your mind? (the real question)
What data do you actually have vs what you wish you had?
What timeframe is relevant?

Statistical Rigor Checklist

Sample size sufficient? (small N = wide confidence intervals)
Comparison groups fair? (same time period, similar conditions)
Multiple comparisons? (20 tests = 1 "significant" by chance)
Effect size meaningful? (statistically significant ≠ practically important)
Uncertainty quantified? ("12-18% lift" not just "15% lift")

Analytical Pitfalls to Catch

Pitfall	What it looks like	How to avoid
Simpson's Paradox	Trend reverses when you segment	Always check by key dimensions
Survivorship bias	Only analyzing current users	Include churned/failed in dataset
Comparing unequal periods	Feb (28d) vs March (31d)	Normalize to per-day or same-length windows
p-hacking	Testing until something is "significant"	Pre-register hypotheses or adjust for multiple comparisons
Correlation in time series	Both went up = "related"	Check if controlling for time removes relationship
Aggregating percentages	Averaging percentages directly	Re-calculate from underlying totals

For detailed examples of each pitfall, see pitfalls.md.

Approach Selection

Question type	Approach	Key output
"Is X different from Y?"	Hypothesis test	p-value + effect size + CI
"What predicts Z?"	Regression/correlation	Coefficients + R² + residual check
"How do users behave over time?"	Cohort analysis	Retention curves by cohort
"Are these groups different?"	Segmentation	Profiles + statistical comparison
"What's unusual?"	Anomaly detection	Flagged points + context

For technique details and when to use each, see techniques.md.

Output Standards

Lead with the insight, not the methodology
Quantify uncertainty — ranges, not point estimates
State limitations — what this analysis can't tell you
Recommend next steps — what would strengthen the conclusion

Red Flags to Escalate

User wants to "prove" a predetermined conclusion
Sample size too small for reliable inference
Data quality issues that invalidate analysis
Confounders that can't be controlled for

data-analysis-litiao

Safety Notice

Copy this and send it to your AI assistant to learn

When to Load

Core Principle

Methodology First

Statistical Rigor Checklist

Analytical Pitfalls to Catch

Approach Selection

Output Standards

Red Flags to Escalate

Source Transparency

Related Skills

Img2img

Habitat-GS-Navigator

Memory Palace

Podcast Transcript Mining Authority Positioning