sanitize

Detect and redact PII from text files. Supports 15 categories including credit cards, SSNs, emails, API keys, addresses, and more — with zero dependencies.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "sanitize" with this command: npx skills add agentward-ai/sanitize

AgentWard Sanitize

Detect and redact personally identifiable information (PII) from text files.

IMPORTANT — PII Safety Rules

  • Do NOT read the input file directly. It may contain sensitive PII.
  • ALWAYS use --output FILE to write sanitized output to a file.
  • Only read the OUTPUT file, never the raw input.
  • Only show the user the redacted output, never the raw input.
  • --json and --preview are safe — they do NOT print raw PII values to stdout.
  • The entity map (raw PII → placeholder mapping) is written to a separate sidecar file (*.entity-map.json) only when --output is used. Do NOT read the entity map file.
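The rules above can be sketched as a small wrapper that an agent-side script might use. This is only an illustration: the function names `build_sanitize_command` and `sanitize_and_read` are hypothetical, and the only CLI behavior assumed is the documented `--output FILE` flag. It uses `sys.executable` in place of a bare `python` so the same interpreter runs the script.

```python
import subprocess
import sys
from pathlib import Path


def build_sanitize_command(input_path: str, output_path: str) -> list[str]:
    """Build the documented CLI call. --output is always included."""
    return [sys.executable, "scripts/sanitize.py", input_path, "--output", output_path]


def sanitize_and_read(input_path: str, output_path: str) -> str:
    """Run the sanitizer, then read ONLY the sanitized output file."""
    subprocess.run(build_sanitize_command(input_path, output_path), check=True)
    # The raw input and the *.entity-map.json sidecar are never opened here.
    return Path(output_path).read_text(encoding="utf-8")
```

The input path is passed to the script as an argument but never opened by the wrapper itself, which keeps raw PII out of the caller's context.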

What it does

Scans files for 15 categories of PII: credit cards, card verification codes (CVVs), card expiry dates, SSNs, emails, phone numbers, API keys, IP addresses, mailing addresses, dates of birth, passport numbers, driver's license numbers, bank routing numbers, medical license numbers, and insurance member IDs. Each instance is replaced with a numbered placeholder such as [CREDIT_CARD_1].
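As a rough illustration of numbered placeholders, the sketch below redacts two categories with simplified regular expressions. It is not the skill's actual implementation: the patterns, the `redact` function, and the choice to reuse one placeholder for a repeated value are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Illustrative patterns only; the real detectors may be stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Return the redacted text and a raw-value -> placeholder map."""
    counters = defaultdict(int)
    entity_map: dict[str, str] = {}

    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            raw = match.group(0)
            if raw not in entity_map:
                # First sighting of this value: assign the next number.
                counters[label] += 1
                entity_map[raw] = f"[{label}_{counters[label]}]"
            return entity_map[raw]

        text = pattern.sub(replace, text)
    return text, entity_map
```

In this sketch the map is returned in memory for simplicity. The skill itself writes its entity map to a sidecar file, which an agent should not read.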

Usage

Sanitize a file (RECOMMENDED — always use --output)

python scripts/sanitize.py patient-notes.txt --output clean.txt

Preview mode (detect PII categories/offsets without showing raw values)

python scripts/sanitize.py notes.md --preview

JSON output (safe — no raw PII in stdout)

python scripts/sanitize.py report.txt --json --output clean.txt

Filter to specific categories

python scripts/sanitize.py log.txt --categories ssn,credit_card,email --output clean.txt

Supported PII categories

See references/SUPPORTED_PII.md for the full list with detection methods and false positive mitigation.

| Category | Pattern type | Example |
|---|---|---|
| credit_card | Luhn-validated 13-19 digits | 4111 1111 1111 1111 |
| ssn | 3-2-4 digit groups | 123-45-6789 |
| cvv | Keyword-anchored 3-4 digits | CVV: 123 |
| expiry_date | Keyword-anchored MM/YY | expiry 01/30 |
| api_key | Provider prefix patterns | sk-abc..., ghp_..., AKIA... |
| email | Standard email format | user@example.com |
| phone | US/intl phone numbers | +1 (555) 123-4567 |
| ip_address | IPv4 addresses | 192.168.1.100 |
| date_of_birth | Keyword-anchored dates | DOB: 03/15/1985 |
| passport | Keyword-anchored alphanumeric | Passport: AB1234567 |
| drivers_license | Keyword-anchored alphanumeric | DL: D12345678 |
| bank_routing | Keyword-anchored 9 digits | routing: 021000021 |
| address | Street + city/state/zip | 742 Evergreen Terrace Dr, Springfield, IL 62704 |
| medical_license | Keyword-anchored license ID | License: CA-MD-8827341 |
| insurance_id | Keyword-anchored member/policy ID | Member ID: BCB-2847193 |
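Two of the pattern types in the table can be sketched briefly. The code below shows a standard Luhn checksum over 13 to 19 digits and a keyword-anchored match for a CVV. The names `luhn_valid` and `CVV_PATTERN` are hypothetical, and the exact regular expression is an assumption. The skill's own detectors are described in references/SUPPORTED_PII.md and may differ.

```python
import re


def luhn_valid(candidate: str) -> bool:
    """Check a card-like string with the Luhn checksum (13-19 digits)."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Keyword anchoring: a bare 3-4 digit number matches only after "CVV".
CVV_PATTERN = re.compile(r"\bCVV\s*:?\s*(\d{3,4})\b", re.IGNORECASE)
```

Requiring the keyword is what keeps short numbers such as order quantities or years from being flagged, while the checksum filters out most random digit runs that merely look like card numbers.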

Security and Privacy

  • All processing is local. The script makes zero network calls. No data leaves your machine.
  • Zero dependencies. Uses only Python standard library — no third-party packages to audit.
  • Raw PII never reaches stdout. The --json and --preview modes strip raw PII values from their output. The entity map (the raw PII to placeholder mapping) is written only to a sidecar file on disk, and only when --output is used.
  • Designed for agent safety. The skill instructions above tell the agent to never read the raw input file or the entity map file — only the sanitized output.

Requirements

  • Python 3.11+
  • No external dependencies (stdlib only)

About

Built by AgentWard — the open-source permission control plane for AI agents.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
