Repo Sentinel

Everything in a public repo is permanent attacker surface. This skill defines what belongs in a public repo, what does not, how to detect violations across 12 attack surfaces, how to remediate when the boundary is violated, and how to enforce continuously.

Reference files

This skill uses bundled reference files for detailed patterns and templates. Read them as needed:

File	When to read
`references/scan-patterns.md`	When running any audit (fast-path or full) — contains all detection commands
`references/templates.md`	When setting up enforcement, generating .gitignore, or creating CI gates
`references/remediation.md`	When fixing findings or scrubbing history — contains all fix procedures

Prerequisites

gh CLI installed and authenticated (gh auth status must pass) — required for GitHub-specific surface checks (Surface 10)
Active git repository context — the skill operates on git objects; non-git directories are out of scope
trufflehog or gitleaks — optional but strongly recommended for Surface 0 (git history) secret detection with entropy analysis; without them, fall back to git log -p grep patterns from references/scan-patterns.md
Read access to the full git object store — shallow clones (--depth N) will miss history secrets; warn the user if a shallow clone is detected

Calibration Rules

Public vs. private visibility: Apply stricter severity ratings for public repos — findings classified MEDIUM in a private repo (e.g., internal URL in a comment) escalate to HIGH in a public repo. Confirm repo visibility before scoring.
Stack-scoped surfaces: Scope the audit to attack surfaces relevant to the detected tech stack. A static HTML repo has no meaningful Surface 6 (containers) or Surface 7 (lock files) exposure — mark those surfaces N/A rather than penalizing.
N/A handling: Surfaces scored N/A are not penalized and do not lower the overall risk posture. Document N/A surfaces explicitly so the user understands what was skipped.
Tool availability: If trufflehog/gitleaks are unavailable, note this in the audit header and describe the reduced confidence in Surface 0 coverage.
False positive discipline: Flag a finding only when there is evidence of actual exposure, not just pattern proximity. A variable named api_key with a placeholder value is LOW, not CRITICAL.

Foundational Principle

The public/private boundary is a one-way valve. Once a byte reaches a public remote — via push, PR, issue, wiki, release asset, or GitHub Pages — assume it is indexed, cached, mirrored, and archived permanently. git push --force, PR deletion, issue edits, and release removal do NOT guarantee erasure. Scraping infrastructure (GitHub Archive, GH Torrent, Software Heritage, Google Cache, Wayback Machine, and dozens of proprietary security scanners) operates continuously with sub-hour latency.

Decision framework for every artifact:

Question	If YES →	If NO →
Could this help an attacker who has no other access?	EXCLUDE	Continue
Does this reveal internal topology not inferable from public signals?	EXCLUDE	Continue
Does this contain values that grant access to anything?	EXCLUDE	Continue
Does this violate a license obligation or expose legal risk?	EXCLUDE	Continue
Would removing this reduce the repo's utility to legitimate users?	INCLUDE (if above = all NO)	EXCLUDE

When in doubt, exclude. False negatives (leaked secrets) are catastrophic and irreversible. False positives (over-redaction) are trivially correctable.

The 12 Attack Surfaces

Each surface defines what belongs, what doesn't, why it leaks, and how to detect it. Scan commands are in references/scan-patterns.md; remediation procedures in references/remediation.md.

Surface 0 — Git Object Store (History)

The most dangerous and most commonly missed surface. git grep only scans HEAD. An attacker with clone access gets the entire commit history. A file deleted in commit N remains in the object store forever unless explicitly scrubbed.

What leaks: Any secret, credential, internal URL, PII, or sensitive file that was ever committed — even if removed in a subsequent commit. Squash merges don't help; the original commits persist in reflog and may exist in forks.

Audit approach: Run history scans BEFORE working-tree scans. Use trufflehog or gitleaks for verified secret detection with entropy analysis. Fall back to git log -p grep if tools are unavailable. See references/scan-patterns.md § Surface 0.

Surface 1 — Source Code

Belongs: Application logic, algorithms, public API contracts, type definitions, tests with synthetic data, utility libraries, schema-only migrations.

Does NOT belong:

Category	Examples	Why
Hardcoded credentials	`API_KEY = "sk-..."`	Direct access grant
Internal URLs/IPs	`10.0.x.x`, `.internal`, `.corp`	Network topology
Cloud resource IDs	AWS account IDs, GCP project IDs, ARNs, S3 bucket names	Resource targeting
PII / seed data	Real emails, names, phone numbers in fixtures	Privacy violation
Cryptographic material	Private keys, certs, JWTs, signing secrets	Auth bypass
Business logic comments	`// HACK: bypass rate limit for enterprise`	Reveals security gaps
Licensing/billing logic	Entitlement checks, license key validation	Revenue loss
Debug/admin endpoints	`/admin/reset-all`, `/__debug/dump-state`	Privileged access
Vendor workarounds	`// Workaround for Stripe API bug #4521`	Stack disclosure

Surface 2 — Documentation

Belongs: Setup instructions with placeholders, architecture overviews (external-appropriate abstraction), public API reference, contributing guidelines, license, feature-level changelog.

Does NOT belong: Internal URLs, private tracker references (JIRA-xxx, Linear ENG-xxx), team/individual names, deployment runbooks, unredacted postmortems, security architecture details, environment-specific configs.

CLAUDE.md and .claude/ — unconditional exclusion. Both contain comprehensive reconnaissance payloads. Always in .gitignore. No exceptions. No conditional logic.

Surface 3 — Configuration Files

Belongs: .env.example with placeholder values only, toolchain config (tsconfig, eslint, prettier), deployment configs with parameterized values, IaC with variable-only resource names.

Does NOT belong: .env and all .env.* (non-example), configs with embedded secrets, IaC with hardcoded identifiers, SSH config, cloud CLI config, editor config with paths, private registry references in .npmrc.

Surface 4 — .gitignore as Reconnaissance Vector

The .gitignore itself is a public file that leaks information.

Rules: Zero comments (comments are attacker documentation). Extension globs over filenames (*.credentials not oauth-credentials.json). No environment names in paths. No internal doc names. Directory patterns absorb children. Always verify with git ls-files -i --exclude-standard.

.claude/ and CLAUDE.md — always in .gitignore, unconditional.

Surface 5 — CI/CD Pipeline Definitions

Belongs: Workflow definitions, build/test commands, matrix strategies, caching configs.

Does NOT belong: Inline secrets, internal runner labels, private artifact registries, deployment target IPs/hostnames, hardcoded cloud identifiers. All secrets via platform secret store (${{ secrets.X }} for GitHub Actions).

Surface 6 — Container & IaC Definitions

Dockerfiles — safe: Public base images, build steps, EXPOSE ports, multi-stage patterns, non-secret ARG/ENV.

Dockerfiles — exclude: ARG/ENV with credentials, COPY of secret files, internal base images, infrastructure-revealing comments.

Docker Compose: All secrets via env_file or external secret management. Service names are public — don't reveal non-public capabilities. Volume mounts must not reference secret paths.

Terraform/IaC: All identifiers via variables with no real defaults. State files (*.tfstate) ALWAYS excluded. Variable files (*.tfvars) excluded with example templates.

Surface 7 — Dependencies & Lock Files

Often overlooked. Lock files and manifests leak internal infrastructure.

What leaks:

Category	Examples	Why
Private registry URLs	`registry.internal.corp` in lock files	Internal infra
Internal package names	`@corp-internal/auth-sdk` in package.json	Org structure
Git+SSH dependencies	`git+ssh://...private-org/internal-lib.git`	Private repo exposure
Pinned internal forks	Version pins revealing upstream vuln workarounds	Patch intelligence

Surface 8 — Binary & Large File Artifacts

What leaks:

Category	Examples	Why
Compiled binaries	May embed paths, credentials at compile time	Credential extraction
Database dumps	`.sql`, `.sqlite`, `.db` with real data	Data exposure
Jupyter notebook outputs	API responses, tokens, internal URLs in cell output	Credential + topology
Image/PDF metadata	EXIF data, PDF author fields, internal paths	Author/org enumeration
Archive files	`.zip`, `.tar.gz` bundling secrets	Nested secret exposure

Surface 9 — Metadata & Git History

Commit messages: Don't reference what was vulnerable (Fix auth bypass in /admin/reset), only what changed. Don't paste error messages with credentials or internal stack traces.

PR descriptions / issue templates: Don't prompt users to paste credentials. PR templates should not reference internal processes. Bug reports: sanitized repro steps, not raw logs.

Branch names: Avoid names revealing unannounced features or internal codenames.

Release assets: Must not bundle config files, .env, or credentials.

Surface 10 — Platform-Specific Metadata (GitHub/GitLab)

Artifact	Risk	Mitigation
`CODEOWNERS`	Leaks team structure and responsibility mapping	Use team handles, not individuals
`.github/FUNDING.yml`	Exposes financial platform accounts	Verify intentional disclosure
GitHub Actions `@main` refs	Supply chain attack vector	Pin to full SHA, not tag
Workflow `permissions: write-all`	Over-privilege	Use minimum required permissions
Wiki pages	Separately cloneable, often contain sensitive runbooks	Audit or disable
GitHub Discussions	Accidental leak surface	Monitor or disable
`dependabot.yml`	Private registry references	Parameterize registries
Repository topics/description	Internal project codenames	Review before public
GitHub Pages config	Reveals deployment targets	Verify intentional

Surface 11 — License & Legal Compliance

Check	Risk	Fix
Missing LICENSE file	Defaults to "all rights reserved"	Add explicit license
License incompatibility	GPL dep in MIT project	Audit with license-checker/pip-licenses
Internal copyright headers	Reveals parent company/acquisition	Genericize or remove
Missing NOTICE file	Required by Apache 2.0	Generate from dependencies
CLA/DCO requirements	Legal risk for external contributions	Add if accepting PRs
Third-party attribution	License violation	Audit dependency licenses

Dependency license audit commands:

# Node
npx license-checker --summary 2>/dev/null
# Python
pip-licenses 2>/dev/null
# Rust
cargo license 2>/dev/null

Flag GPL/AGPL contamination if the target license is permissive (MIT, BSD, Apache).

Private registry search patterns — grep lock files and configs:

Files: package-lock.json, poetry.lock, Cargo.lock, pip.conf, pyproject.toml, .npmrc, .yarnrc
Grep for: @company, internal-registry, private-pypi, artifactory, nexus, verdaccio

Copyright header check: If the license requires file-level headers (Apache 2.0: recommended; MIT: not required), verify presence in source files and genericize internal copyright notices that reveal parent company or acquisition history.

Surface 12 — Community Surface

Required for credible open-source projects accepting contributions:

Artifact	Purpose	Risk if missing/wrong
`SECURITY.md`	Responsible disclosure policy	Signals immaturity to attackers
Issue templates	Guide reporters away from pasting secrets	Accidental credential leaks
PR templates	Warn contributors about sensitive data	Topology leaks in diffs
`CONTRIBUTING.md`	Set expectations without revealing internals	Internal tooling exposure
Bot configs	`.github/stale.yml`, Probot	Internal policy leakage

Severity Classification

All findings are classified by severity. The classification drives action priority:

Severity	Criteria	Action
CRITICAL	Active credential exposure, private key, auth token	Block push. Fix immediately.
HIGH	Infrastructure/topology enabling targeted attack	Resolve before push.
MEDIUM	Information leakage aiding reconnaissance	Fix in next commit.
LOW	Hygiene, style, redundancy issues	Fix at convenience.

CRITICAL and HIGH in git history → full history scrub + credential rotation required.

Operations

Fast-Path Audit (Staged Changes Only)

Use when pushing a single file or small changeset. Scans only staged changes, not the full repo. Read references/scan-patterns.md § Fast-Path for the commands.

Full Repo Audit (20+ checks)

Run before making any repo public or before first push to a public remote. Read references/scan-patterns.md § Full Audit for the complete 20-check sequence.

Quick-Reference Scan Commands

The most critical inline checks. Full pattern set is in references/scan-patterns.md.

# 1. Secrets in code
git grep -rnE '(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key|private[_-]?key|password|passwd|credential)\s*[:=]\s*["\x27][^\s"'\'']{8,}' -- ':!*.lock' ':!node_modules' ':!vendor'

# 2. Internal URLs
git grep -rnE 'https?://[^\s)>"]*\.(internal|corp|local|intranet|private)' -- ':!*.lock'

# 3. Private IPs
git grep -rnE '(10\.\d+\.\d+\.\d+|172\.(1[6-9]|2\d|3[01])\.\d+\.\d+|192\.168\.\d+\.\d+)' -- ':!*.lock' ':!node_modules'

# 4. Cloud resource identifiers
git grep -rnE '(arn:aws:|projects/[a-z][\w-]+/locations|/subscriptions/[0-9a-f-]{36})' -- ':!*.lock'

# 5. Connection strings
git grep -rnE '(mongodb|postgres|mysql|redis|amqp|mssql)(\+\w+)?://[^${\s]+@' -- ':!*.lock'

# 6. .env files tracked
git ls-files | grep -iE '\.env(\.|$)' | grep -v '\.example$\|\.template$'

# 7. Credential files tracked
git ls-files | grep -iE '\.(pem|key|p12|pfx|keystore|jks|credentials)$'

# 8. .gitignore leakage
grep -n '^#\|secret\|credential\|oauth\|service.account\|password\|token' .gitignore 2>/dev/null

# 9. .claude/ tracked
git ls-files | grep '\.claude/'

# 10. Tracked files contradicting .gitignore
git ls-files -i --exclude-standard 2>/dev/null

# 11. Sensitive TODO/FIXME/HACK comments
git grep -rnE '(TODO|FIXME|HACK|XXX)\b.*\b(security|auth|bypass|vulnerability|exploit|hack|password|credential|secret|token|admin)' -- ':!*.lock'

# 12. CI/CD secrets inline
git grep -rnE '(password|token|key|secret)\s*[:=]\s*[^\s${\[]' -- '.github/workflows/' '.gitlab-ci.yml' 'Jenkinsfile' '.circleci/'

# 13. Internal URLs in docs
git grep -nE 'https?://[^\s)>]*\.(internal|corp|local|intranet|private)' -- '*.md' '*.rst' '*.txt' '*.adoc'

# 14. Private tracker references in docs
git grep -nE '(JIRA|LINEAR|ASANA|SHORTCUT|CLUBHOUSE|NOTION)-?\s*[A-Z]*-?\d+' -- '*.md' '*.rst' '*.txt'

# 15. Person names in docs
git grep -nE '(@[a-zA-Z][\w-]+|(ask|contact|ping|reach out to)\s+[A-Z][a-z]+)' -- '*.md' '*.rst' '*.txt'

# 16. CI hardcoded IPs
git grep -nE '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b' -- '.github/workflows/*.yml' '.gitlab-ci.yml'

# 17. .env.example real values
grep -E '=' .env.example 2>/dev/null | grep -vE '=(your-|placeholder|changeme|xxx|example|TODO|REPLACE|""|\x27\x27|$)'

# 18. AWS account IDs
git grep -nE '\b\d{12}\b' -- '*.ts' '*.js' '*.py' '*.yaml' '*.yml' '*.json' '*.tf' | grep -iE '(account|arn|aws)'

Output format:

REPO SENTINEL AUDIT — <repo> — <date>

[CRITICAL — Direct credential exposure]
  src/config.ts:14 — API_KEY = "sk-live-..." → parameterize
  .env.production — tracked, contains real values → git rm --cached + history scrub

[HIGH — Infrastructure disclosure]
  docker-compose.yml:8 — redis://admin:pass@10.0.3.42:6379 → parameterize
  package-lock.json:892 — resolved: "https://registry.internal.corp/..." → remove internal dep

[MEDIUM — Information leakage]
  .gitignore:24 — oauth-credentials.json → replace with *.credentials.json
  README.md:45 — "See https://wiki.internal.corp/auth-design" → remove
  CODEOWNERS:3 — @john-smith → replace with @team-handle

[LOW — Hygiene]
  .gitignore:1-8 — verbose comment header → remove all comments
  LICENSE — missing → add appropriate license file

[TRACKED-BUT-IGNORED CONTRADICTIONS]
  .env.local — in .gitignore but tracked → git rm --cached

[MISSING FROM .gitignore]
  .claude/ — directory exists, not ignored
  *.sqlite — database files present, not ignored

[LICENSE COMPLIANCE]
  GPL-3.0 dependency in MIT-licensed project: package-x → evaluate compatibility

[ENFORCEMENT STATUS]
  Pre-commit hooks: NOT CONFIGURED → see references/templates.md
  CI secret scanning: NOT CONFIGURED → see references/templates.md
  GitHub secret scanning: UNKNOWN → enable in repo settings

Pre-Release Audit Mode (4-Stage DAG)

When preparing a repo for open-source release, run this 4-stage pre-release audit instead of the surface-based audit. Each stage emits PASS / WARN / FAIL with actionable remediation. Hard blockers in stages 1–3 halt the pipeline. Stage 4 produces advisory output.

Stage 1: Sensitive Assets        [HARD BLOCKER] → Surfaces 0–4, 8–9
Stage 2: Legal & Compliance      [HARD BLOCKER] → Surface 11
Stage 3: Public Surface Hygiene  [HARD BLOCKER] → Surfaces 4–7, 9–10
Stage 4: Contribution & Release  [SOFT BLOCKER] → Surface 12 + Pre-Release Checklist

Run stages sequentially. Report results in a structured audit table at the end.

Continuous Enforcement Setup

Shift-left prevention is the highest-leverage action. Read references/templates.md for ready-to-use pre-commit config, GitHub Actions workflow, and .gitignore generator.

Pre-Release Readiness Checklist

Run during Stage 4 of the Pre-Release Audit Mode, or standalone before any public release. All items are soft blockers — failures produce advisory output, not hard halts.

§4.1 Documentation Completeness

File	Required	Check
`README.md`	YES	Has install + quickstart sections
`CONTRIBUTING.md`	YES	Fork/branch strategy, dev setup
`CODE_OF_CONDUCT.md`	YES	Adopted standard (Contributor Covenant)
`CHANGELOG.md`	RECOMMENDED	Keep-a-changelog format
`LICENSE`	YES	Verified in Surface 11
`SECURITY.md`	RECOMMENDED	Disclosure process + contact
`ARCHITECTURE.md` or `docs/`	RECOMMENDED	Module overview
`.github/ISSUE_TEMPLATE/`	RECOMMENDED	Bug + feature templates
`.github/PULL_REQUEST_TEMPLATE.md`	RECOMMENDED	PR checklist

§4.2 Code Quality Gates

Linter config: .eslintrc*, ruff.toml, pyproject.toml [tool.ruff], .clippy.toml
Formatter config: .prettierrc*, pyproject.toml [tool.black], rustfmt.toml
Pre-commit: .pre-commit-config.yaml
Type checking: tsconfig.json (strict), py.typed marker, mypy/pyright config

§4.3 Test Infrastructure

Test runner configured and documented
CI pipeline exists (.github/workflows/, .gitlab-ci.yml)
Test data is synthetic (not production-derived)
Smoke test or single-command verify path documented

§4.4 API Surface

Public API explicitly demarcated (__all__, exports, pub)
No internal implementation leaked across module boundaries
Configuration via env vars / config files, not hardcoded constants

§4.5 Package Metadata

Check manifest completeness across: package.json, pyproject.toml, Cargo.toml, *.csproj

Required fields: name, version, description, repository, homepage, keywords, author, license

§4.6 Reproducible Builds

Lock files committed
Toolchain versions documented: .tool-versions, .python-version, .nvmrc, rust-toolchain.toml
CI runner images pinned

§4.7 Binary Asset Policy

No files >1MB without Git LFS
No build artifacts committed
.gitattributes for LFS if needed

§4.8 Community Setup

Issue labels defined: good-first-issue, help-wanted, bug, enhancement
Discussions or external channel linked
Maintainer expectations documented

History Contamination Remediation

When secrets have already been committed. Read references/remediation.md for the full triage decision tree, git filter-repo commands, and post-scrub protocol.

Quick-Reference Remediation

Triage decision table:

Pushed to public remote?	Contains real credentials?	Action
No	Any	`git rm --cached` + fix `.gitignore`
Yes	No (placeholder)	`git rm --cached` + fix `.gitignore`. Scrub optional.
Yes	Yes	Full history scrub + credential rotation. Assume compromise.

git filter-repo (preferred):

cp -r .git .git-backup

# By path
git filter-repo --invert-paths --path <file> --force

# By glob
git filter-repo --invert-paths --path-glob '*.pem' --force

# By regex
git filter-repo --invert-paths --path-regex '.*secret.*' --force

# Re-add remote (filter-repo strips it)
git remote add origin <url>
git push --force --all && git push --force --tags

BFG Repo-Cleaner (fallback):

java -jar bfg.jar --delete-files <filename> .git
git reflog expire --expire=now --all && git gc --prune=now --aggressive

Post-scrub protocol (non-optional):

Rotate every exposed credential — scrubbing does not un-expose. GitHub caches objects ~90 days. Mirrors and forks retain indefinitely.
Verify: git log --all --full-history -- <path> must return empty.
Update all ignore/exclude rules before next commit.
For severe exposure: consider repo deletion + recreation. Contact GitHub support for cache invalidation.
Rotate CI/CD secrets independently — pipeline stores are unaffected by git history operations.
Document incident internally: what was exposed, how long, which remotes, what was rotated.

.gitignore Generation

Generate a complete, opinionated .gitignore tailored to detected project type with all hygiene rules baked in. Read references/templates.md § .gitignore Generator.

Limitations

History scrubbing does not guarantee removal of exposure. Force-push is required, and external mirrors (forks, GitHub Archive, Software Heritage) retain history indefinitely regardless of local operations.
External mirrors, caches, and search engine indexes cannot be verified as de-indexed after content removal.
Single-repo scope only — not designed for monorepo audits without adaptation. Cross-package secret propagation requires separate analysis per package root.
GitHub-specific checks (branch protection, secret scanning alerts, security advisories) require the gh CLI with authenticated access. Without it, Surface 10 coverage is reduced.
Secret scanning depth depends on available tooling. trufflehog and gitleaks provide verified detection with entropy analysis; manual regex patterns used as fallback have higher false-positive rates and miss obfuscated credentials.
Artifact decisions for package registry publishing (npm, PyPI, crates) have ecosystem-specific norms that differ from source repo inclusion rules — apply ecosystem conventions when auditing published artifacts.