sharpagent-content-safety

SharpAgent Content Safety Engine — Pluggable multi-jurisdiction content policy enforcer. Blocks, flags, or passes content based on loaded rule sets. Supports concurrent jurisdictions (global/China/US/EU). Coordinates with the calibration framework and five-factor review. Independent Layer 3 of the SharpAgent four-layer architecture.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "sharpagent-content-safety" with this command: npx skills add yezhaowang888-stack/sharpagent-content-safety

SharpAgent Content Safety Engine v1.0.0

The last line of defense for content output. It's not about "should we say it" — it's "how should it be said in this jurisdiction." Independent from five-factor review (credibility ≠ compliance). Layer 3 of the four-layer architecture.

Architecture Position

Layer 1: Five-Factor Review     ← Trust verification (global, immutable)
Layer 2: Calibration Framework  ← Output adaptation (warm/professional/deep)
Layer 3: Content Safety Engine  ← Compliance interception (per-jurisdiction rules) ← YOU ARE HERE
Layer 4: Final Output

Why independent? Five-factor review asks "can I trust this?" Safety engine asks "can I say this?" The first is information quality, the second is compliance and safety. Mixing them contaminates both judgments.

Contract

contract:
  name: sharpagent-content-safety
  version: "1.0.0"
  category: analysis
  trust_level: verified
  reads:
    - Content
    - CompliancePolicy
  writes:
    - SafetyVerdict
  preconditions:
    - "At least one compliance policy loaded"
    - "Content is not empty"
  postconditions:
    - "Verdict is one of: pass | flag | block"
    - "If flag or block, reason and rule reference are provided"
  calibration:
    default_mode: professional
    modes_supported: [warm, professional, deep]
  compliance:
    jurisdiction: global
    safety_level: strict
  lifecycle:
    status: active
    publish_as: SharpAgent
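The contract's postconditions (verdict is one of pass | flag | block; flag and block must carry a reason and rule reference) can be sketched as a small record type. This is an illustrative Python sketch, not the engine's actual API; the field names are assumptions.

```python
# Hypothetical sketch of the SafetyVerdict record implied by the contract:
# verdict must be pass|flag|block, and non-pass verdicts must carry a
# reason plus a rule reference. Names are illustrative.
from dataclasses import dataclass
from typing import Optional

VALID_VERDICTS = {"pass", "flag", "block"}

@dataclass
class SafetyVerdict:
    verdict: str                    # "pass" | "flag" | "block"
    reason: Optional[str] = None    # required when verdict != "pass"
    rule_ref: Optional[str] = None  # e.g. "global/PII-001"

    def __post_init__(self):
        if self.verdict not in VALID_VERDICTS:
            raise ValueError(f"invalid verdict: {self.verdict}")
        if self.verdict != "pass" and not (self.reason and self.rule_ref):
            raise ValueError("flag/block verdicts need a reason and rule reference")
```

Enforcing the postconditions at construction time keeps the "verdict is unambiguous" quality gate out of every downstream consumer.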

Core Design

Pluggable Rule Engine

rules:
  - id: "global/PII-001"
    type: "block"
    description: "Detect and block personally identifiable information"
    patterns:
      - "email"
      - "phone_number"
      - "id_card"
      - "address"
    severity: "high"

  - id: "cn/content-001"
    type: "block"
    description: "Block prohibited content per China Internet regulations"
    jurisdiction: "cn"
    severity: "critical"

  - id: "us/export-001"
    type: "flag"
    description: "Flag export-controlled technology references"
    jurisdiction: "us"
    severity: "medium"

  - id: "global/hate-speech-001"
    type: "block"
    description: "Block hate speech and discriminatory content"
    severity: "high"

  - id: "global/privacy-003"
    type: "flag"
    description: "Flag privacy-sensitive content for human review"
    severity: "medium"

Rule Structure

rule:
  id: "{jurisdiction}/{name}-{seq}"   # Unique identifier
  type: "block" | "flag" | "pass"      # Action
  description: "..."                   # Human-readable
  jurisdiction: "cn" | "us" | "eu" | "global"  # Applicable jurisdiction
  patterns: [regex...]                 # Match patterns (optional)
  keywords: [string...]                # Keyword matching (optional)
  severity: "low" | "medium" | "high" | "critical"
  exemptions: [                        # Exceptions
    "educational context",
    "news reporting"
  ]
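A loader can reject malformed rules against this schema before they reach the matcher. The sketch below validates a rule dict using only the field names documented above; the function name and defaulting behavior are assumptions, not the engine's real loader.

```python
# Minimal validator for the rule schema above (illustrative, stdlib-only).
SEVERITIES = ("low", "medium", "high", "critical")
ACTIONS = ("block", "flag", "pass")

def validate_rule(rule: dict) -> dict:
    """Check required fields and fill documented optional fields with defaults."""
    if not rule.get("id") or "/" not in rule["id"]:
        raise ValueError("id must follow '{jurisdiction}/{name}-{seq}'")
    if rule.get("type") not in ACTIONS:
        raise ValueError("type must be one of: block | flag | pass")
    if rule.get("severity") not in SEVERITIES:
        raise ValueError("severity must be low | medium | high | critical")
    # Optional fields from the schema get safe defaults.
    rule.setdefault("jurisdiction", "global")
    rule.setdefault("patterns", [])
    rule.setdefault("keywords", [])
    rule.setdefault("exemptions", [])
    return rule
```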

Jurisdiction Configuration

Runtime selection (multi-select):

safety_engine.load_policies(jurisdictions=["cn", "us", "eu"])

Each loaded jurisdiction stacks its rules. Conflicting rules: strictest wins.

Rule priority (high to low):
1. block → 2. flag → 3. pass
Cross-jurisdiction: take max severity
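The "strictest wins" resolution above can be sketched as a max over a rank order: block outranks flag outranks pass, with severity as the tiebreaker. This is a sketch under the priorities stated in this document, not the engine's internal code.

```python
# "Strictest wins" across stacked jurisdictions: action rank first
# (block > flag > pass), then max severity as tiebreaker.
ACTION_RANK = {"pass": 0, "flag": 1, "block": 2}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def combine(verdicts):
    """verdicts: list of (action, severity) pairs from matched rules."""
    if not verdicts:
        return ("pass", None)  # no jurisdiction objected
    return max(verdicts, key=lambda v: (ACTION_RANK[v[0]], SEVERITY_RANK[v[1]]))
```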

Workflow

Step 1: Pre-Flight

  • Content empty?
  • Content too long? Chunk at ≤4096 chars.
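Chunking at a hard boundary would let a sensitive phrase straddle two chunks unseen; the Edge Cases section calls for ±200-char overlap scanning. A minimal sketch combining both (the 4096/200 numbers come from this document; the function name is illustrative):

```python
def chunk(text: str, size: int = 4096, overlap: int = 200):
    """Split content into <=size windows that overlap by `overlap` chars,
    so a phrase crossing a chunk boundary still appears whole in one window."""
    if len(text) <= size:
        return [text]
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```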

Step 2: Rule Matching

For each chunk:
    for each loaded rule:
        skip if jurisdiction not active
        check patterns/keywords
        check exemptions
        record match
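The matching loop above can be sketched in Python. Assumptions: rules are dicts shaped like the Rule Structure section, exemptions are matched against a caller-supplied context string, and `patterns` are regexes while `keywords` are plain substrings. Names are illustrative.

```python
import re

def match_rules(chunk_text, rules, active_jurisdictions, context=None):
    """One pass of the rule-matching loop for a single chunk (sketch)."""
    matches = []
    for rule in rules:
        # skip if jurisdiction not active
        if rule.get("jurisdiction", "global") not in active_jurisdictions:
            continue
        # check exemptions (e.g. "educational context" suppresses the rule)
        if context and context in rule.get("exemptions", []):
            continue
        # check patterns/keywords
        hit = any(re.search(p, chunk_text) for p in rule.get("patterns", [])) \
            or any(k in chunk_text for k in rule.get("keywords", []))
        if hit:
            # record match
            matches.append({"rule": rule["id"], "severity": rule["severity"]})
    return matches
```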

Step 3: Verdict

| Verdict | Meaning | Action |
|---|---|---|
| ✅ pass | No matches | Let through to output |
| ⚠️ flag | Low-severity match | Tag + allow + log |
| 🚫 block | High-severity match | Block + return alternative content |

Block replacement:

[Content blocked by safety engine]
Reason: {top_reason}
Contact administrator for full content.

Step 4: Logging

{
  "event": "safety_check",
  "jurisdictions": ["cn", "global"],
  "rules_matched": [
    {"rule": "cn/content-001", "severity": "critical"}
  ],
  "verdict": "block",
  "timestamp": "2026-05-11T06:10:00Z",
  "agent": "sharpagent"
}
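A helper that builds the audit record shown above might look like this. The field names come from the example log; the function name and timestamp formatting are assumptions (the example uses a `Z` suffix, while `isoformat()` below emits `+00:00`).

```python
import json
from datetime import datetime, timezone

def audit_event(jurisdictions, matches, verdict, agent="sharpagent"):
    """Serialize one safety_check audit record (sketch; fields from the example log)."""
    return json.dumps({
        "event": "safety_check",
        "jurisdictions": sorted(jurisdictions),
        "rules_matched": matches,
        "verdict": verdict,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "agent": agent,
    })
```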

Ruleset Management

Built-in Rulesets

| Ruleset | Coverage | File |
|---|---|---|
| global | Universal safety (hate speech/PII/privacy) | rules/global.yaml |
| cn | China internet content regulations | rules/cn.yaml |
| us | US export control / safe harbor | rules/us.yaml |
| eu | GDPR-related | rules/eu.yaml |

Custom Rules

rules/custom/
├── my-company-policy.yaml
├── my-project-policy.yaml
└── README.md

Edge Cases

| Situation | Action |
|---|---|
| Conflicting jurisdiction rules | Strictest wins (block > flag > pass) |
| Rule false positive | Add exemption, log false positive |
| Cross-chunk sensitive phrase | Overlap scanning (±200 chars) |
| No jurisdiction configured | Load global only |
| Corrupt rule file | Skip + log error; don't crash engine |
| Exemption conditions met | Skip rule, log exemption reason |
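The "corrupt rule file" behavior (skip + log, keep the engine alive) can be sketched as a defensive loader. This example uses JSON so it stays stdlib-only, while the shipped rulesets are YAML; the function name is illustrative.

```python
import json
import logging

def load_ruleset(path):
    """Load one ruleset file; on corruption, skip it and log instead of crashing.
    (Sketch: JSON here for self-containment; real rulesets are YAML files.)"""
    try:
        with open(path) as f:
            rules = json.load(f)
        if not isinstance(rules, list):
            raise ValueError("ruleset must be a list of rules")
        return rules
    except (OSError, ValueError) as exc:
        logging.error("skipping ruleset %s: %s", path, exc)
        return []  # engine keeps running with whatever rulesets did load
```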

Quality Gates

| Check | What | Fail action |
|---|---|---|
| At least 1 ruleset | No rules = nothing blocked | Don't start |
| Verdict unambiguous | pass / flag / block only | Default to block |
| Block provides reason | User knows why | Add reason |
| Complete audit log | Every check recorded | Backfill |
| Rules versioned | Updates don't break running checks | Semver the rulesets |

Integration Points

Five-Factor Review

  • Safety engine output (compliance_check: fail) can trigger five-factor
  • Independent but cooperative

Calibration Framework

  • Safety engine sits between Layer 2 (calibration) and Layer 4 (output)
  • Calibration compliance field maps to safety engine rule selection

Self-Evolving

  • Safety false positives/negatives trigger self-evolving reflection
  • New rules as improvement hypotheses

Layered Memory

  • Safety logs go to L6 archive (legal compliance)

Version History

  • v1.0.0 — Initial release. Pluggable multi-jurisdiction content safety engine.

SharpAgent · MIT-0 · 2026-05-11
