AgentWard Sanitize
Detect and redact personally identifiable information (PII) from text files.
IMPORTANT — PII Safety Rules
- Do NOT read the input file directly. It may contain sensitive PII.
- ALWAYS use --output FILE to write sanitized output to a file.
- Only read the OUTPUT file, never the raw input.
- Only show the user the redacted output, never the raw input.
- --json and --preview are safe: they do NOT print raw PII values to stdout.
- The entity map (raw PII → placeholder mapping) is written to a separate sidecar file (*.entity-map.json) only when --output is used. Do NOT read the entity map file.
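For an agent that calls the script programmatically, the rules above might be wrapped as in the following minimal sketch. The script path and the --output flag come from this document; the helper names `build_sanitize_command` and `sanitize_and_read` are hypothetical and not part of the skill.

```python
import subprocess
import sys
from pathlib import Path


def build_sanitize_command(input_path: str, output_path: str) -> list[str]:
    """Build the argv for a sanitize run that always writes to --output."""
    # scripts/sanitize.py and --output are taken from this document;
    # this helper itself is a hypothetical convenience wrapper.
    return [sys.executable, "scripts/sanitize.py", input_path, "--output", output_path]


def sanitize_and_read(input_path: str, output_path: str) -> str:
    """Run the sanitizer, then read only the redacted output file."""
    # The raw input is passed by path and is never opened here.
    subprocess.run(build_sanitize_command(input_path, output_path), check=True)
    # Only the sanitized file is read back; the entity map sidecar is left alone.
    return Path(output_path).read_text(encoding="utf-8")
```

Calling `sanitize_and_read("patient-notes.txt", "clean.txt")` would then return only the redacted text, assuming the script exits successfully.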
What it does
Scans files for PII (credit cards, CVVs, card expiry dates, SSNs, emails, phone numbers, API keys, IP addresses, mailing addresses, dates of birth, passport numbers, driver's license numbers, bank routing numbers, medical license numbers, and insurance member IDs) and replaces each instance with a numbered placeholder like [CREDIT_CARD_1].
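The replacement step can be illustrated with a small sketch. This is not the script's actual code: the two regular expressions are simplified stand-ins, the `redact` function is hypothetical, and reusing the same number for a repeated value is an assumption.

```python
import re

# Simplified, illustrative patterns; the real detectors are not shown here.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with [CATEGORY_N]; return the text and an entity map."""
    entity_map: dict[str, str] = {}  # placeholder -> raw value
    for label, pattern in PATTERNS.items():
        seen: dict[str, str] = {}  # raw value -> placeholder, per category

        def substitute(match: re.Match) -> str:
            raw = match.group(0)
            if raw not in seen:
                # Assumption: a repeated value keeps its first placeholder.
                seen[raw] = f"[{label}_{len(seen) + 1}]"
                entity_map[seen[raw]] = raw
            return seen[raw]

        text = pattern.sub(substitute, text)
    return text, entity_map
```

In this sketch the entity map holds the raw values, which is why the real tool keeps it in a separate sidecar file rather than in the sanitized output.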
Usage
Sanitize a file (RECOMMENDED — always use --output)
python scripts/sanitize.py patient-notes.txt --output clean.txt
Preview mode (detect PII categories/offsets without showing raw values)
python scripts/sanitize.py notes.md --preview
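As a rough illustration of what "categories/offsets without raw values" can mean, the sketch below reports only where a match was found. The `preview` function, the single SSN pattern, and the field names `category`, `start`, and `end` are all hypothetical; the script's real preview format is not documented here.

```python
import re

# Hypothetical detector standing in for the script's internal patterns.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def preview(text: str) -> list[dict]:
    """Report where PII was found without including the matched text."""
    return [
        # Only the category and character offsets are kept, never m.group(0).
        {"category": "ssn", "start": m.start(), "end": m.end()}
        for m in SSN_PATTERN.finditer(text)
    ]
```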
JSON output (safe — no raw PII in stdout)
python scripts/sanitize.py report.txt --json --output clean.txt
Filter to specific categories
python scripts/sanitize.py log.txt --categories ssn,credit_card,email --output clean.txt
Supported PII categories
See references/SUPPORTED_PII.md for the full list with detection methods and false positive mitigation.
| Category | Pattern type | Example |
| --- | --- | --- |
| credit_card | Luhn-validated 13-19 digits | 4111 1111 1111 1111 |
| ssn | 3-2-4 digit groups | 123-45-6789 |
| cvv | Keyword-anchored 3-4 digits | CVV: 123 |
| expiry_date | Keyword-anchored MM/YY | expiry 01/30 |
| api_key | Provider prefix patterns | sk-abc..., ghp_..., AKIA... |
| email | Standard email format | user@example.com |
| phone | US/intl phone numbers | +1 (555) 123-4567 |
| ip_address | IPv4 addresses | 192.168.1.100 |
| date_of_birth | Keyword-anchored dates | DOB: 03/15/1985 |
| passport | Keyword-anchored alphanumeric | Passport: AB1234567 |
| drivers_license | Keyword-anchored alphanumeric | DL: D12345678 |
| bank_routing | Keyword-anchored 9 digits | routing: 021000021 |
| address | Street + city/state/zip | 742 Evergreen Terrace Dr, Springfield, IL 62704 |
| medical_license | Keyword-anchored license ID | License: CA-MD-8827341 |
| insurance_id | Keyword-anchored member/policy ID | Member ID: BCB-2847193 |
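The "Luhn-validated" entry for credit_card refers to the standard Luhn checksum, which filters out most random 13-19 digit runs. A minimal sketch of that check follows; the function name `luhn_valid` is hypothetical and the script's own implementation may differ.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep ASCII digits only, so spaces and dashes in the input are ignored.
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

The table's example value 4111 1111 1111 1111 passes this check, while changing its last digit makes it fail.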
Security and Privacy
- All processing is local. The script makes zero network calls. No data leaves your machine.
- Zero dependencies. Uses only the Python standard library, so there are no third-party packages to audit.
- PII never reaches stdout. The --json and --preview modes strip raw PII values from output. The entity map (containing raw PII to placeholder mappings) is only written to a sidecar file on disk when --output is used.
- Designed for agent safety. The skill instructions above tell the agent to never read the raw input file or the entity map file, only the sanitized output.
Requirements
- Python 3.11+
- No external dependencies (stdlib only)
About
Built by AgentWard — the open-source permission control plane for AI agents.