Overview
Deduplicate CSV rows using one or more key columns. When several rows share the same key, the first one is kept by default.
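The keep-first behavior can be illustrated with a minimal sketch. This is not the code in scripts/csv_dedupe.py; the function name dedupe_rows and the sample data are hypothetical, and the sketch assumes the CSV has a header row.

```python
import csv
import io


def dedupe_rows(rows, keys=None):
    """Yield the first row seen for each key; later duplicates are skipped."""
    seen = set()
    for row in rows:
        # With no key columns, the whole row acts as the key (assumption).
        if keys:
            key = tuple(row[k] for k in keys)
        else:
            key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            yield row


# Hypothetical sample data: the third row repeats the (id, email) of the first.
sample = io.StringIO(
    "id,email,name\n"
    "1,a@example.com,Ann\n"
    "2,b@example.com,Bob\n"
    "1,a@example.com,Ann B.\n"
)
deduped = list(dedupe_rows(csv.DictReader(sample), keys=["id", "email"]))
print([row["name"] for row in deduped])  # ['Ann', 'Bob']
```

Tracking seen keys in a set preserves the input order and needs only one pass, at the cost of holding every distinct key in memory.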
Inputs
- A CSV file.
- Optional key columns (comma-separated). If omitted, the whole row is used as the key.
Outputs
- A new CSV file with duplicates removed.
Workflow
- Choose the key columns (or use the whole row).
- Run the script to produce a deduped CSV.
- Validate row counts: the output should contain one row per distinct key and never more rows than the input.
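The row-count check in the last step could be sketched as follows. The helper names (read_rows, key_of, validate_counts) are hypothetical and not part of the script; the sketch assumes both files are UTF-8 and have a header row.

```python
import csv


def read_rows(path):
    """Load all data rows of a CSV file as dictionaries keyed by header name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def key_of(row, keys=None):
    """Build the dedupe key: the chosen columns, or the whole row if none given."""
    return tuple(row[k] for k in keys) if keys else tuple(row.items())


def validate_counts(input_path, output_path, keys=None):
    """Return (input_rows, output_rows, removed); raise if the counts disagree."""
    source = read_rows(input_path)
    result = read_rows(output_path)
    # A correct dedupe leaves exactly one row per distinct key in the input.
    distinct = len({key_of(row, keys) for row in source})
    if len(result) != distinct:
        raise ValueError(
            f"expected {distinct} rows after dedupe, found {len(result)}"
        )
    return len(source), len(result), len(source) - len(result)
```

Calling validate_counts("data.csv", "data.deduped.csv", keys=["id", "email"]) after the run returns the three counts, or raises ValueError if the output does not match the number of distinct keys in the input.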
Usage
Dedupe on specific key columns:
python scripts/csv_dedupe.py --input data.csv --output data.deduped.csv --keys id,email
Dedupe on the whole row (no key columns given):
python scripts/csv_dedupe.py --input data.csv --output data.deduped.csv
Safety
- No network access.
- Only reads/writes the file paths you pass in.