Data Quality Audit
Quick Start
Goal: find correctness issues quickly and return a small, actionable report.
When to use this skill
- bug reports that smell like data drift (missing rows, double counts, weird nulls)
- after migrations/backfills to prove invariants still hold
- when analytics numbers disagree between systems
Before you run checks
- database engine
- target tables + expected primary keys
- expected invariants (unique, not null, fk relationships, allowed ranges)
- performance constraints (large tables, peak hours)
Workflow (default)
- pick 1–3 core tables for the issue
- run cheap checks first (nulls, duplicates on keys)
- run relationship checks (orphans)
- run domain checks (ranges, enums)
- write a short findings report + safe remediation plan
Core checks (portable sql)
null checks
SELECT COUNT(*) AS null_count
FROM your_table
WHERE important_col IS NULL;
duplicates on a candidate key
SELECT key_col, COUNT(*) AS c
FROM your_table
GROUP BY key_col
HAVING COUNT(*) > 1
ORDER BY c DESC
LIMIT 50;
orphan rows (broken references)
SELECT COUNT(*) AS orphan_count
FROM child c
LEFT JOIN parent p ON p.id = c.parent_id
WHERE c.parent_id IS NOT NULL
AND p.id IS NULL;
invalid ranges
SELECT COUNT(*) AS bad_count
FROM your_table
WHERE amount < 0;
time sanity (example)
SELECT COUNT(*) AS bad_count
FROM your_table
WHERE created_at > NOW();
Performance-safe tips
- always start with
count(*)+ targeted where clauses - add
limitwhen inspecting example rows - scope by time window if tables are huge (last 7/30 days)
- prefer indexed predicates (id ranges, created_at) for sampling
Remediation patterns
fix duplicates
- decide on a canonical row rule (latest by updated_at, highest priority status, etc.)
- write a deterministic dedupe query
- add a unique constraint or unique index after cleanup
fix orphans
- pick policy: delete orphans, reattach to parent, or set fk to null
- add fk constraint after data is corrected
fix nulls
- backfill from source columns or defaults
- add not null only after verification
Output format (copy/paste)
## data quality audit
### scope
- tables:
- time window:
### findings
- [severity] issue: evidence
### likely impact
- user impact:
- analytics impact:
### remediation
- step 1:
- step 2:
### verification
- query checks: