Git Repository Auditor

What This Does

A CLI tool to audit Git repositories for security issues, code quality problems, and repository health. Scan repositories for secrets, large files, sensitive data, and common security anti-patterns.

Key features:

Secrets detection: Scan Git history for API keys, passwords, tokens, and other sensitive data using regex patterns
Large file detection: Identify large files (>10MB) in repository history that may impact performance
Security anti-patterns: Detect hardcoded credentials, insecure configuration files, and dangerous permissions
Repository health: Check for merge conflicts, stale branches, and other repository hygiene issues
Compliance reporting: Generate security compliance reports for audits and team reviews
Multiple output formats: Human-readable, JSON, and CSV output for integration with other tools
Custom scanning: Configure custom regex patterns and file extensions to scan
Historical analysis: Scan entire Git history or specific time ranges
Remediation guidance: Suggest fixes for identified security issues

When To Use

You need to audit a Git repository for security compliance
You want to detect accidental commits of secrets or sensitive data
You're preparing a repository for open-source release
You need to identify performance issues (large files in history)
You're onboarding new developers and want to ensure repository hygiene
You need to generate security audit reports for compliance requirements
You want to automate security scanning in CI/CD pipelines
You're cleaning up old repositories and need to identify issues

Usage

Basic commands:

# Scan current directory repository
python3 scripts/main.py scan .

# Scan specific repository path
python3 scripts/main.py scan /path/to/repo

# Scan with custom secrets patterns file
python3 scripts/main.py scan . --patterns custom-patterns.json

# Generate JSON report for automation
python3 scripts/main.py scan . --json

# Check only for large files (>50MB)
python3 scripts/main.py scan . --check large-files --threshold 50

# Scan specific branch or commit range
python3 scripts/main.py scan . --branch main --since "2024-01-01"

# Generate remediation report with suggested fixes
python3 scripts/main.py scan . --remediation

# List all branches with last commit age
python3 scripts/main.py branches .

Examples

Example 1: Basic security scan

python3 scripts/main.py scan ~/projects/my-app

Output:

🔍 Scanning repository: /home/user/projects/my-app
📊 Repository info: 247 commits, 5 branches, 3 contributors

🔐 SECURITY ISSUES FOUND (3):
⚠️  High: AWS_ACCESS_KEY_ID found in commit abc123 (2024-02-15)
    File: config/old-config.env
    Pattern: AWS_ACCESS_KEY_ID=AKIA.*
    Remediation: Rotate key immediately, remove from history with BFG

⚠️  Medium: Hardcoded database password in commit def456 (2024-01-20)
    File: src/database.js
    Pattern: password: "secret123"
    Remediation: Move to environment variables, use secret manager

⚠️  Low: Private key file extension in commit ghi789 (2023-12-05)
    File: backup/id_rsa.old
    Pattern: Private key file (.pem, .key, .ppk, id_rsa)
    Remediation: Remove file from repository history

💾 LARGE FILES FOUND (2):
📦 42MB: assets/video/demo.mp4 (commit xyz123)
📦 18MB: database/backup.sql (commit uvw456)

✅ Repository health: Good
⏰ Stale branches: 2 branches older than 90 days

Example 2: JSON output for CI/CD integration

python3 scripts/main.py scan . --json > security-report.json

Output (excerpt):

{
  "repository": "/home/user/projects/my-app",
  "scan_date": "2024-03-06T10:30:00Z",
  "security_issues": [
    {
      "severity": "high",
      "type": "aws_access_key",
      "commit": "abc123",
      "date": "2024-02-15",
      "file": "config/old-config.env",
      "pattern": "AWS_ACCESS_KEY_ID=AKIA.*",
      "remediation": "Rotate key immediately, remove from history with BFG"
    }
  ],
  "large_files": [
    {
      "size_mb": 42,
      "path": "assets/video/demo.mp4",
      "commit": "xyz123"
    }
  ],
  "summary": {
    "total_issues": 3,
    "by_severity": {"high": 1, "medium": 1, "low": 1},
    "large_files_count": 2,
    "total_size_mb": 60
  }
}

Example 3: Check repository health

python3 scripts/main.py health .

Output:

📈 Repository Health Report: /home/user/projects/my-app

📊 Basic Metrics:
- Commits: 1,247
- Branches: 12 (3 active, 9 stale)
- Contributors: 8
- First commit: 2022-05-15
- Last commit: 2024-03-06

⚠️  Health Issues:
- Stale branches: 9 branches with no commits in >90 days
- Large files: 2 files >10MB in history
- Binary files: 45 binary files (consider Git LFS)
- Merge conflicts: 3 unresolved merge markers in code

✅ Good Practices:
- .gitignore present and comprehensive
- No secrets detected in recent commits
- Regular commit activity (avg 15 commits/week)
- Meaningful commit messages (87% good)

💡 Recommendations:
1. Clean up stale branches: git branch -d branch1 branch2...
2. Consider Git LFS for binary files
3. Resolve merge conflicts in: src/app.js, config/settings.yaml

Example 4: Large files detection only

python3 scripts/main.py scan . --check large-files --threshold 20

Output:

💾 Large Files (>20MB) in Repository History:

1. assets/videos/presentation.mp4
   - Size: 42MB
   - Commit: xyz123 (2024-01-15)
   - Author: Jane Doe
   - Message: "Add presentation video"

2. database/backup/archive.sql.gz
   - Size: 38MB
   - Commit: uvw456 (2023-12-20)
   - Author: John Smith
   - Message: "Database backup"

Total: 2 files, 80MB
Recommendation: Consider using Git LFS for files >20MB

Requirements

Git 2.20+ installed and available in PATH
Python 3.x
No external Python dependencies required (uses standard library)

Limitations

Scanning large repositories with extensive history may be slow
Secrets detection uses regex patterns; may have false positives/negatives
Does not automatically remove secrets from history (requires manual remediation)
Limited to Git repositories (does not work with other VCS)
No support for scanning encrypted repositories
Large file detection scans entire history; may miss files in ignored directories
Does not integrate with external secret managers (Vault, AWS Secrets Manager, etc.)
No real-time monitoring; scans only historical commits
Limited to text file scanning; cannot detect secrets in binary files
May not detect all secret patterns; custom patterns may be needed
Performance depends on repository size and history depth
No support for scanning Git submodules automatically
No built-in integration with secret management systems (Vault, AWS Secrets Manager)
Limited to text file scanning; cannot detect secrets in binary files
No support for custom Git hooks or pre-commit integration
Performance may be impacted on repositories with millions of commits
No support for distributed scanning across multiple repositories
Limited error handling for corrupted Git repositories
No support for scanning Git worktrees or shallow clones
Cannot scan remote repositories without local clone
No built-in notification system for new issues

Directory Structure

The tool works with any local Git repository. No special configuration directories are required, but you can provide custom patterns files for secrets detection.

Error Handling

Invalid repository paths show helpful error messages with suggestions
Git command failures show the underlying error and suggest troubleshooting steps
Permission errors suggest checking repository access rights
Pattern file parsing errors show line numbers and validation issues
Memory errors suggest using smaller commit ranges or more specific scanning

Contributing

This is a skill built by the Skill Factory. Issues and improvements should be reported through the OpenClaw project.

git-repo-auditor

Safety Notice

Copy this and send it to your AI assistant to learn

Git Repository Auditor

What This Does

When To Use

Usage

Examples

Example 1: Basic security scan

Example 2: JSON output for CI/CD integration

Example 3: Check repository health

Example 4: Large files detection only

Requirements

Limitations

Directory Structure

Error Handling

Contributing

Source Transparency

Related Skills

Code Review Inspector

Vorim AI — Agent Identity & Trust

Meta Ads CLI - control your ad campaigns

agent-bom registry