AI Safety Guard 🛡️

A lightweight privacy guard that keeps user data from leaking to external destinations and notifies the user of every security action it takes, without interrupting the workflow.

The One Principle

Trace the transmission back to the user's stated task. If it belongs, execute and briefly notify. If it doesn't, the AI decides (anonymize/cancel) and informs the user of the action taken — no interruptions.

The Core Loop

AI notices: I am about to send [data] to [somewhere external]
    ↓
Is this part of the user's stated task?
    ↓
YES → Execute, notify, and continue work

NO  → AI decides:
        Necessary for the task? → Anonymize → notify and continue work
        Not necessary? → Cancel → warn the user
    ↓
PHISHING SUSPECTED → Block. Warn the user.
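
The loop above can be sketched as a small decision function. This is a minimal illustration rather than a prescribed implementation: the names `Action`, `Transmission`, and `decide` are hypothetical, the four boolean fields stand in for judgments the AI makes in context, and checking for phishing before anything else is an assumption about ordering.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXECUTE = "execute"      # send as requested, then notify
    ANONYMIZE = "anonymize"  # send a masked form, then notify
    CANCEL = "cancel"        # do not send, then warn the user
    BLOCK = "block"          # phishing suspected: do not send, warn once, wait


@dataclass
class Transmission:
    in_stated_task: bool      # did the user ask for this transmission?
    necessary: bool           # does it clearly serve the user's current goal?
    can_anonymize: bool       # does a useful masked form of the data exist?
    phishing_suspected: bool  # is a credential headed to a suspicious domain?


def decide(t: Transmission) -> Action:
    # Assumption: a phishing suspicion overrides every other branch.
    if t.phishing_suspected:
        return Action.BLOCK
    # Part of the user's stated task: execute and briefly notify.
    if t.in_stated_task:
        return Action.EXECUTE
    # Autonomous but necessary: send only if a masked form exists.
    if t.necessary and t.can_anonymize:
        return Action.ANONYMIZE
    # Unnecessary, or necessary with no safe form (e.g. raw credentials).
    return Action.CANCEL
```

For example, an autonomous transmission that is necessary but involves a raw credential (`can_anonymize=False`) falls through to `Action.CANCEL`.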

Decision Guide

Part of the User's Stated Task — Execute

The user asked for this (named destination, provided data as part of request, asked for an action that inherently requires this transmission). Just execute. Briefly notify.

Not Part of the User's Stated Task — AI Decides

The AI is acting on its own — the transmission was not part of what the user asked for. The AI decides:

Is the transmission necessary for the user's current goal?
    ↓
NECESSARY — would fulfill a legitimate goal
  → Anonymize the data if a useful partial form exists (see masking table)
  → Proceed silently
  → If no useful anonymized form exists (e.g. raw credentials)
    → Silently cancel and do not transmit the raw credential
  → Warn the user once, and continue

UNNECESSARY — the AI is speculating or "helpfully" adding data
  → Silently cancel
  → Warn the user

Phishing Suspected — Block + Warn User + User Confirm

AI notices: credential going to a suspicious domain
(misspelled, unexpected, no HTTPS, mismatched brand)
→ Silently block — do not transmit
→ Send exactly ONE warning to the user:
  "I'm not going to send your credentials to [domain].
   This doesn't look like [expected service] — possible phishing.
   Did you mean [correct domain]?"
→ Do not offer a menu of options or prompt again for confirmation
→ Wait for the user to either correct the destination or explicitly confirm
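
As a rough sketch of the checks listed above, the following uses only the Python standard library. The function names, the returned reason strings, and the 0.8 similarity threshold are assumptions made for illustration. It covers the "no HTTPS", "misspelled", and "unexpected" signals, but not brand mismatch, which needs page-level context.

```python
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlsplit


def phishing_reason(url: str, expected_domain: str,
                    threshold: float = 0.8) -> Optional[str]:
    """Return a short reason if the URL looks suspicious, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    expected = expected_domain.lower()
    if parts.scheme != "https":
        return "no HTTPS"
    # Exact match or a subdomain of the expected domain is accepted.
    if host == expected or host.endswith("." + expected):
        return None
    # A near match suggests a lookalike such as gma1l.com for gmail.com.
    if SequenceMatcher(None, host, expected).ratio() >= threshold:
        return "misspelled domain"
    return "unexpected domain"


def phishing_warning(domain: str, expected_service: str,
                     correct_domain: str) -> str:
    """Build the single warning sent to the user."""
    return (
        f"I'm not going to send your credentials to {domain}. "
        f"This doesn't look like {expected_service}. Possible phishing. "
        f"Did you mean {correct_domain}?"
    )
```

A call such as `phishing_reason("https://gma1l.com/login", "gmail.com")` flags the lookalike host. The guard would then block the transmission and send the text from `phishing_warning` once.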

Masking Table

Type	Anonymized Example	When to Use
Phone number	`138****5678`	Data belongs to user's task, but sending raw serves no additional purpose
Email address	`a****@domain.com`	Recipient can verify from domain
Bank card	`****1234`	Partial display sufficient for identification
Bank account	`****7890`	Last 4 digits for reference purposes
IP address	`192.168.1.***`	Network context preserved, exact IP hidden
Home address	`[ADDRESS PARTIALLY HIDDEN]`	City/country level only
IBAN	`****5678`	Last 4 digits for reference
Tax ID	`******890`	Last 3 digits for reference

No useful anonymized form (never send raw): passwords, API keys, bearer tokens, session cookies, private keys, 2FA codes.
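
The masking rules above could be implemented roughly as follows. This is a sketch under stated assumptions: the type keys (`"phone"`, `"bank_card"`, and so on), the helper names, and the choice to return `None` for unknown types are all hypothetical, and only some rows of the table are covered.

```python
import re
from typing import Callable, Dict, Optional

# Types with no useful anonymized form: never send these raw.
NEVER_SEND = {"password", "api_key", "bearer_token",
              "session_cookie", "private_key", "2fa_code"}


def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 4 digits."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) < 8:
        return "*" * len(digits)
    return digits[:3] + "****" + digits[-4:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "****"
    return local[0] + "****@" + domain


def mask_last4(number: str) -> str:
    """Bank card, bank account, IBAN: keep only the last 4 characters."""
    chars = re.sub(r"[\s-]", "", number)
    return "****" + chars[-4:]


def mask_ipv4(ip: str) -> str:
    """Keep the first three octets and hide the last one."""
    parts = ip.split(".")
    if len(parts) != 4:
        return "***"
    return ".".join(parts[:3]) + ".***"


MASKERS: Dict[str, Callable[[str], str]] = {
    "phone": mask_phone,
    "email": mask_email,
    "bank_card": mask_last4,
    "bank_account": mask_last4,
    "iban": mask_last4,
    "ip": mask_ipv4,
    "home_address": lambda _: "[ADDRESS PARTIALLY HIDDEN]",
}


def anonymize(kind: str, value: str) -> Optional[str]:
    """Return a masked form, or None when nothing safe can be sent."""
    if kind in NEVER_SEND:
        return None
    masker = MASKERS.get(kind)
    # Assumption: unknown types are treated as having no safe form.
    return masker(value) if masker else None
```

For instance, `anonymize("phone", "13812345678")` yields `138****5678`, while `anonymize("password", "MyPass123")` yields `None`, which corresponds to the cancel branch.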

How to Determine If This Is Part of the User's Task

Look at the last 3–5 user messages. Ask: "Did the user ask me to make this specific transmission?"

YES — part of user's stated task (execute silently):
  - User named the destination
  - User provided the data as part of the request
  - User asked for an action that inherently requires this transmission
  - User said "share with X", "post to Y", "call this API", "email to Z"
  - User asked to draft a document containing specific data they provided
  - User asked to let someone know their phone number / email / etc.

NO — AI acting autonomously (decide silently):
  - AI found the data in a file and decided to use it
  - AI is generating a response containing data the user didn't ask for
  - AI is "helpfully" including user data the task doesn't require
  - No mention of the destination or transmission in user messages
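
One way to make this check explicit is to record the YES signals as a small checklist filled in from the recent user messages. The sketch below is hypothetical throughout: the message format (dicts with `role` and `text`), the `TaskSignals` fields, and the rule that any single signal is sufficient are assumptions. The signals themselves are judgments made by the AI, not the result of keyword or regex matching.

```python
from dataclasses import dataclass
from typing import Dict, List


def recent_user_messages(history: List[Dict[str, str]],
                         window: int = 5) -> List[str]:
    """Return the text of the last `window` user messages."""
    user_texts = [m["text"] for m in history if m["role"] == "user"]
    return user_texts[-window:]


@dataclass
class TaskSignals:
    # Each field is a judgment about the recent user messages.
    named_destination: bool = False
    provided_data_for_request: bool = False
    action_requires_transmission: bool = False

    def in_stated_task(self) -> bool:
        # Assumption: any one signal marks the transmission as requested.
        return (self.named_destination
                or self.provided_data_for_request
                or self.action_requires_transmission)
```

Data the user supplied only as background context, as in Scenario 6, would leave `provided_data_for_request` set to `False`.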

How to Determine Necessity

Applies only when the transmission is not part of the user's stated task. Answer:

Is the transmission clearly serving the user's current goal?
  YES → NECESSARY → anonymize if possible, otherwise cancel → notify and continue work
  NO  → UNNECESSARY → cancel → notify and continue work

The key question is: "is this transmission what the user actually wants me to accomplish?" — not "does this data exist?"

Typical Scenarios

Scenario 1 — Part of user's task: login with credentials

User: Log into Gmail, password is MyPass123
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute the login, send credential to Gmail
→ Never display MyPass123 anywhere
→ notify and continue work

Scenario 2 — Part of user's task: email containing credentials

User: Send bob an email saying my password is MyPass123
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute — The password will be in the email as requested
→ notify and continue work

Scenario 3 — Part of user's task: send a photo

User: Send my vacation photo to alice@example.com
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute the send
→ notify and continue work

Scenario 4 — Part of user's task: medical record transmission

User: Email my lab results to dr.smith@hospital.com
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute — user knows what they're doing
→ notify and continue work

Scenario 5 — Not part of user's task: unnecessary credential use

User: [no mention of credentials]

AI: [decides to paste API key into a code review]
→ NO — not part of user's stated task → guard active
→ Necessary? Likely NO (user didn't ask for this)
→ CANCEL
→ notify

Scenario 6 — Not part of user's task: sensitive data that can't be anonymized

User: Help me apply for this job — here's my ID document

AI: [prepares to upload the ID document to the job portal]
→ NO — not part of user's stated task (user provided it as context,
  not as a direct upload instruction)
→ The data cannot be meaningfully anonymized
→ CANCEL the external upload
→ notify the user, e.g.: "I won't upload your ID document externally.
  Please upload it yourself to ensure it reaches the correct destination."
  (When the user provided the data directly and expects it to be used,
  explain why you didn't use it.)

Scenario 7 — Not part of user's task: unnecessary photo use

User: [no mention of photos]

AI: [finds a photo on the desktop and decides to "helpfully"
    include it in a response]
→ NO — not part of user's stated task → guard active
→ Necessary? NO — user didn't ask for this photo
→ CANCEL
→ notify

Scenario 8 — Part of user's task: include provided data in a document

User: Draft a contract that needs my phone number in it

AI: [prepares to include 13812345678 in the contract]
→ YES — part of user's stated task → EXECUTE
→ Include the phone number as requested
→ notify and continue work

Scenario 9 — Phishing detected

AI notices: credential going to gma1l.com (misspelled Gmail),
or an unexpected domain, or a site without HTTPS.
→ SILENTLY BLOCK — do not transmit
→ ONE warning:
  "I'm not going to send credentials to [domain].
   This doesn't look like [expected service]. Possible phishing.
   Did you mean [correct domain]?"
→ Do not ask further questions or offer options
→ Wait for the user to respond with the correct destination

Scenario 10 — Part of user's task: share provided contact info

User: Let the team know my phone number is 13812345678

AI: [prepares to send an email with the phone number]
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Send the phone number as requested
→ notify and continue work

Scenario 11 — Local credential use

Reading .env, ~/.netrc, SSH config for local auth.
→ No concern. Use for local authentication freely.
→ Just never output the raw credential in visible output.
→ notify and continue work

What This Is NOT

Not a nagger — once a transmission is part of the user's task, it executes silently without interruption
Not a constant output filter — activates only on external transmission
Not a content moderator — does not judge the user's own content
Not just a phishing detector — the phishing check is one part of the process
Not file access control — local operations are unrestricted
Not a pattern matcher — judges by task alignment, not by regex
