Code Validation

Validates code changes through automated scanning and LLM-guided heuristics to detect:

Test disabling patterns (skip, only, todo)
Secret exposure (hardcoded credentials, API keys)
Path portability issues (user-specific paths)
Dangerous security flags
Large deletions
Dependency/import changes
Broad exception handling

When to Use

Execute code-validation as part of QA validation protocol:

After Action Agent completes implementation
Before approving changes for merge
When reviewing diffs for red flags
During final validation phase

Validation Workflow

1. Automated Scanning (Scripts)

Run scripts first for deterministic, fast checks:

For Git Diffs (comparing branches):

# Compare feature branch against main
python scripts/diff_analyzer.py --base main --format json

# Compare specific commit range
python scripts/diff_analyzer.py --range HEAD~5..HEAD --format json

# Save report to file
python scripts/diff_analyzer.py --base main --output validation-report.json

For Static File Analysis (staged changes or specific files):

# Scan specific files
python scripts/static_analyzer.py src/app.py src/utils.py --format json

# Scan entire directory
python scripts/static_analyzer.py ./src --format json

# Exclude patterns
python scripts/static_analyzer.py ./src --exclude node_modules .git --format json

# Save report
python scripts/static_analyzer.py ./src --output scan-report.json

2. Interpret Scan Results

Parse JSON output and evaluate findings:

Finding Structure:

{
  "category": "test_disabling|secret_exposure|path_portability|security_flags|large_deletion|dependency_change",
  "severity": "CRITICAL|HIGH|MEDIUM|LOW",
  "file": "path/to/file.py",
  "line": 42,
  "pattern": "regex pattern matched",
  "context": "actual line content",
  "message": "human-readable description"
}

Severity Guidelines:

CRITICAL: Secrets, user-specific paths in docs - BLOCK merge immediately
HIGH: Test disabling, security flags, user-specific paths in code - Require Action Agent fixes
MEDIUM: Large deletions, dependency changes, broad exceptions - Review and justify
LOW: Minor issues - Optional fixes

3. LLM Heuristic Review (Context-Dependent)

After automated scanning, apply human judgment for:

Test Assertion Weakening

Scripts cannot detect semantic changes. Manually review:

Reduced assertion count without clear reason
Replaced specific assertions with generic checks
Removed edge case validations
Changed from behavior validation to mock validation only

Red Flags:

// Before
expect(response.data).toMatchObject({
  id: expect.any(String),
  status: 'active',
  count: expect.any(Number)
});

// After - WEAKENED
expect(response.data).toBeDefined(); // ❌ Lost specificity

Broad Try/Catch Appropriateness

Evaluate if exception handling is justified:

Top-level error boundaries: Often acceptable
Business logic: Usually inappropriate
Missing error logging/reporting: Red flag
Swallowing errors without assertions in tests: Red flag

When Acceptable:

// Top-level boundary
app.use((err, req, res, next) => {
  logger.error(err);
  res.status(500).json({ error: 'Internal error' });
});

Red Flag:

// Business logic swallowing errors
try {
  await processPayment(data);
} catch (e) {
  // ❌ Silent failure, no logging
}

Scope Creep vs Legitimate Refactoring

Assess if changes align with issue scope:

Issue describes feature X, but changes include unrelated Y
"While I was here" refactoring without issue reference
Architecture changes not mentioned in acceptance criteria

Legitimate:

Refactoring directly related to implementation
Fixing bugs discovered during implementation (document in scratch notes)
Updating tests to match new implementation

Scope Creep:

Reformatting unrelated files
Adding features not in issue
Changing patterns/conventions beyond issue scope

Architecture Alignment

Verify changes match current production architecture:

Check against ADRs referenced in .project-context.md
Verify stack matches documented tech stack
Confirm patterns follow project standards
Ensure deprecated approaches aren't reintroduced

4. Generate Validation Report

Combine automated findings with heuristic review:

## Code Validation Results for [ISSUE-ID]

### Automated Scan Summary
- Files Changed: X
- Total Findings: Y
- CRITICAL: Z findings
- HIGH: A findings
- MEDIUM: B findings

### Critical Findings (BLOCK)
[List CRITICAL severity findings with file:line references]

### High Priority Findings (FIX REQUIRED)
[List HIGH severity findings]

### Heuristic Review
- Test Assertion Quality: [PASS/FAIL with specifics]
- Exception Handling: [PASS/WARN/FAIL with examples]
- Scope Alignment: [PASS/WARN/FAIL with details]
- Architecture Compliance: [PASS/FAIL with ADR references]

### Recommendation
[APPROVED | CHANGES REQUIRED | BLOCKED]

### Action Items
[Specific fixes needed with file:line references]

Script Output Format

Both scripts output JSON with this structure:

{
  "commit_range": "main..HEAD",
  "files_scanned": 42,
  "files_changed": 15,
  "total_findings": 8,
  "findings_by_severity": {
    "CRITICAL": 1,
    "HIGH": 3,
    "MEDIUM": 4,
    "LOW": 0
  },
  "findings": [
    {
      "category": "secret_exposure",
      "severity": "CRITICAL",
      "file": "src/config.py",
      "line": 12,
      "pattern": "...",
      "context": "API_KEY = 'sk_live_abc123...'",
      "message": "Potential hardcoded secret"
    }
  ],
  "summary": {
    "test_disabling": 2,
    "secret_exposure": 1,
    "path_portability": 3,
    "security_flags": 1,
    "dependency_changes": 1,
    "large_deletions": 0
  }
}

Red Flag Categories

Test Disabling (HIGH)

Patterns indicating tests were disabled rather than fixed:

.skip(), .only(), .todo()
xit(), xdescribe(), fit(), fdescribe()
@pytest.skip, @unittest.skip

Action: Require Action Agent to fix tests or justify with comment

Secret Exposure (CRITICAL)

Hardcoded credentials or API keys:

API keys, tokens, passwords
AWS credentials
GitHub tokens
Stripe keys

Action: BLOCK merge, require environment variables

Path Portability (CRITICAL in docs, HIGH in code)

User-specific paths that won't work for other developers:

/Users/username/
/home/username/
C:\Users\username\
~/Desktop, ~/Documents

Action: BLOCK if in documentation, require repo-relative paths

Security Flags (HIGH)

Commands that weaken security:

--no-verify, --insecure, -k
chmod 777
StrictHostKeyChecking no
--allow-root

Action: Require justification comment or removal

Large Deletions (MEDIUM)

Files with >100 lines removed:

May indicate legitimate refactoring
Could hide removed validation logic
Might remove important error handling

Action: Manual review to verify deletions are intentional

Dependency Changes (MEDIUM)

New imports or package additions:

Track new dependencies for security review
Verify necessity for issue scope
Check for unnecessary additions

Action: Verify in package.json/requirements.txt, run security audit

Integration with QA Protocol

Execute code-validation at Step 3: Change Review (Diff) in QA workflow:

Switch to feature branch
Run code-validation scripts
Interpret automated findings
Apply LLM heuristics
Continue with Claude Code Review (MCP)
Proceed with remaining QA steps

If CRITICAL or multiple HIGH findings:

BLOCK validation
Report to Traycer with specific file:line references
Delegate to Action Agent for fixes
Re-run validation after fixes

Resources

Scripts:
- scripts/diff_analyzer.py - Analyzes git diffs for red flags
- scripts/static_analyzer.py - Scans files without git context

Notes

Scripts are deterministic and fast; use them first
LLM heuristics handle context-dependent evaluation
Always provide file:line references in reports
CRITICAL findings must block merge
Document justified exceptions in code comments

code-validation

Safety Notice

Copy this and send it to your AI assistant to learn