SharpAgent Content Safety Engine v1.0.0
The last line of defense for content output. The question is not "should we say it" but "how should it be said in this jurisdiction." Independent of five-factor review (credibility ≠ compliance). Layer 3 of the four-layer architecture.
Architecture Position
Layer 1: Five-Factor Review ← Trust verification (global, immutable)
Layer 2: Calibration Framework ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output
Why independent? Five-factor review asks "can I trust this?" Safety engine asks "can I say this?" The first is information quality, the second is compliance and safety. Mixing them contaminates both judgments.
Contract
```yaml
contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent
```
Core Design
Pluggable Rule Engine
```yaml
rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personally identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"
  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"
  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"
  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"
  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"
```
Rule Structure
```yaml
rule:
  id: "{jurisdiction}/{name}-{seq}"            # Unique identifier
  type: "block" | "flag" | "pass"              # Action
  description: "..."                           # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                         # Match patterns (optional)
  keywords: [string...]                        # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                                # Exceptions
    "educational context",
    "news reporting"
  ]
```
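The schema above maps naturally onto an in-memory record. A minimal sketch, assuming Python and a `Rule` dataclass whose field names mirror the YAML keys (the class itself is illustrative, not a published API):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Rule:
    id: str                       # "{jurisdiction}/{name}-{seq}"
    type: str                     # "block" | "flag" | "pass"
    description: str
    severity: str                 # "low" | "medium" | "high" | "critical"
    jurisdiction: str = "global"  # assumed default when the key is omitted
    patterns: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    exemptions: List[str] = field(default_factory=list)

# Example: the global PII rule from the ruleset above.
pii = Rule(
    id="global/PII-001",
    type="block",
    description="Detect and block personally identifiable information",
    severity="high",
    patterns=["email", "phone_number", "id_card", "address"],
)
```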
Jurisdiction Configuration
Runtime selection (multi-select):
```python
safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])
```
Each loaded jurisdiction stacks its rules. Conflicting rules: strictest wins.
Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
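The stacking behavior can be sketched directly from these two rules: action priority picks the strictest verdict, and severity takes the cross-jurisdiction maximum. A minimal sketch, assuming Python; `merge_verdicts` and the rank tables are illustrative names:

```python
# "Strictest wins" when rules from several jurisdictions match the same content.
ACTION_PRIORITY = {"block": 3, "flag": 2, "pass": 1}
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def merge_verdicts(matches):
    """matches: list of (action, severity) pairs from all active jurisdictions."""
    if not matches:
        return ("pass", None)
    action = max((m[0] for m in matches), key=ACTION_PRIORITY.__getitem__)
    severity = max((m[1] for m in matches), key=SEVERITY_RANK.__getitem__)
    return (action, severity)

# A cn "block/critical" plus a us "flag/medium" collapses to block/critical.
print(merge_verdicts([("flag", "medium"), ("block", "critical")]))
```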
Workflow
Step 1: Pre-Flight
- Content empty?
- Content too long? Chunk at ≤4096 chars.
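The chunking step can be sketched with the 4096-char limit above and the ±200-char overlap mentioned under Edge Cases; the function name and defaults are assumptions:

```python
def chunk(content: str, size: int = 4096, overlap: int = 200):
    """Split long content into scan-sized chunks with overlap, so a
    sensitive phrase spanning a chunk boundary is still rescanned."""
    if len(content) <= size:
        return [content]
    chunks, start = [], 0
    while start < len(content):
        chunks.append(content[start:start + size])
        start += size - overlap  # step back by the overlap window
    return chunks

parts = chunk("x" * 10000)  # 3 chunks, each boundary covered twice
```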
Step 2: Rule Matching
```
for each chunk:
    for each loaded rule:
        skip if jurisdiction not active
        check patterns / keywords
        check exemptions
        record match
```
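The Step 2 loop above can be rendered as a short Python sketch. Rules are plain dicts with the schema fields; pattern matching is reduced to substring checks here (a real engine would compile regexes), and `match_rules` is an illustrative name:

```python
def match_rules(chunk, rules, active_jurisdictions, context=None):
    """Return the rules that match this chunk under the active jurisdictions."""
    matched = []
    for rule in rules:
        if rule.get("jurisdiction", "global") not in active_jurisdictions:
            continue  # skip if jurisdiction not active
        terms = rule.get("patterns", []) + rule.get("keywords", [])
        if not any(t in chunk for t in terms):
            continue  # no pattern/keyword hit
        if context and context in rule.get("exemptions", []):
            continue  # exemption met: skip rule (and log the exemption reason)
        matched.append(rule)
    return matched
```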
Step 3: Verdict
| Verdict | Meaning | Action |
|---|---|---|
| ✅ pass | No matches | Let through to output |
| ⚠️ flag | Low severity match | Tag + allow + log |
| 🚫 block | High severity match | Block + return alternative content |
Block replacement:
```
[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.
```
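Mapping matched rules to the verdict table and the block replacement can be sketched as follows; `decide` and the severity ranking are assumptions, with the highest-severity blocking rule supplying the reason:

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def decide(matches):
    """matches: rule dicts from Step 2. Returns (verdict, replacement_or_None)."""
    if not matches:
        return "pass", None
    blocks = [r for r in matches if r["type"] == "block"]
    if not blocks:
        return "flag", None  # tag + allow + log
    top = max(blocks, key=lambda r: SEVERITY_RANK[r["severity"]])
    replacement = (
        "[Content blocked by safety engine]\n"
        f"Reason: {top['description']}\n"
        "Contact administrator for full content."
    )
    return "block", replacement
```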
Step 4: Logging
```json
{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}
```
Ruleset Management
Built-in Rulesets
| Ruleset | Coverage | File |
|---|---|---|
| global | Universal safety (hate speech / PII / privacy) | rules/global.yaml |
| cn | China internet content regulations | rules/cn.yaml |
| us | US export control / safe harbor | rules/us.yaml |
| eu | GDPR-related | rules/eu.yaml |
Custom Rules
```
rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md
```
Edge Cases
| Situation | Action |
|---|---|
| Conflicting jurisdiction rules | Strictest wins (block > flag > pass) |
| Rule false positive | Add exemption, log false positive |
| Cross-chunk sensitive phrase | Overlap scanning (±200 chars) |
| No jurisdiction configured | Load global only |
| Corrupt rule file | Skip + log error, don't crash engine |
| Exemption conditions met | Skip rule, log exemption reason |
Quality Gates
| Check | What | Fail action |
|---|---|---|
| At least 1 ruleset | No rules = nothing blocked | Don't start |
| Verdict unambiguous | pass/flag/block | Default block |
| Block provides reason | User knows why | Add reason |
| Complete audit log | Every check recorded | Backfill |
| Rules versioned | Updates don't break running checks | Semver rules |
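The first two gates are mechanical enough to sketch: refuse to start with no rules, and fail closed by defaulting any unrecognized verdict to block. Function names here are assumptions:

```python
VALID_VERDICTS = {"pass", "flag", "block"}

def assert_startup(rules):
    # Gate 1: no rules means nothing would be blocked, so don't start.
    if not rules:
        raise RuntimeError("no ruleset loaded: refusing to start")

def normalize_verdict(v):
    # Gate 2: an ambiguous verdict defaults to block (fail closed).
    return v if v in VALID_VERDICTS else "block"
```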
Integration Points
Five-Factor Review
- Safety engine output (compliance_check: fail) can trigger a five-factor re-review
- Independent but cooperative
Calibration Framework
- Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
- The calibration compliance field maps to safety engine rule selection
Self-Evolving
- Safety false positives/negatives trigger self-evolving reflection
- New rules as improvement hypotheses
Layered Memory
- Safety logs go to L6 archive (legal compliance)
Version History
- v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.
SharpAgent · MIT-0 · 2026-05-11