Code Validation
Validates code changes through automated scanning and LLM-guided heuristics to detect:
- Test disabling patterns (skip, only, todo)
- Secret exposure (hardcoded credentials, API keys)
- Path portability issues (user-specific paths)
- Dangerous security flags
- Large deletions
- Dependency/import changes
- Broad exception handling
When to Use
Run code-validation as part of the QA validation protocol:
- After Action Agent completes implementation
- Before approving changes for merge
- When reviewing diffs for red flags
- During final validation phase
Validation Workflow
1. Automated Scanning (Scripts)
Run the scripts first for fast, deterministic checks:
For Git Diffs (comparing branches):
```bash
# Compare feature branch against main
python scripts/diff_analyzer.py --base main --format json
# Compare specific commit range
python scripts/diff_analyzer.py --range HEAD~5..HEAD --format json
# Save report to file
python scripts/diff_analyzer.py --base main --output validation-report.json
```
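The internals of diff_analyzer.py are not documented here. The sketch below is a hypothetical illustration of how a diff scanner can attach file:line references to its findings. It walks unified diff text, such as the output of `git diff`, and yields only the added lines, numbered by their position in the new file. `added_lines` and `HUNK_RE` are illustrative names and are not part of the scripts.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,3 +12,4 @@" and captures the new-file start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text: str):
    """Yield (file, new_line_number, content) for every added line in a unified diff.

    Simplified sketch: it assumes "--- " and "+++ " lines are always file headers.
    """
    current_file = None
    new_line = 0
    for raw in diff_text.splitlines():
        if raw.startswith("+++ "):
            # "+++ b/src/app.py" -> "src/app.py"; "+++ /dev/null" means the file was deleted.
            path = raw[4:]
            current_file = path[2:] if path.startswith("b/") else None
            continue
        if raw.startswith("--- "):
            continue
        match = HUNK_RE.match(raw)
        if match:
            new_line = int(match.group(1))
            continue
        if current_file is None:
            continue
        if raw.startswith("+"):
            yield current_file, new_line, raw[1:]
            new_line += 1
        elif raw.startswith(" "):
            new_line += 1  # context lines exist in the new file too
        # "-" lines exist only in the old file, so they do not advance new_line
```

Scanning only added lines keeps a finding from being reported against code the change actually removed.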
For Static File Analysis (staged changes or specific files):
```bash
# Scan specific files
python scripts/static_analyzer.py src/app.py src/utils.py --format json
# Scan entire directory
python scripts/static_analyzer.py ./src --format json
# Exclude patterns
python scripts/static_analyzer.py ./src --exclude node_modules .git --format json
# Save report
python scripts/static_analyzer.py ./src --output scan-report.json
```
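This document does not specify how static_analyzer.py applies `--exclude`. The following sketch assumes exclusion by directory name, which fits the `node_modules .git` example. It shows one common way to collect the files to scan. `iter_files` is a hypothetical helper.

```python
import os
from pathlib import Path


def iter_files(targets, exclude=("node_modules", ".git")):
    """Yield files under the given paths, skipping any directory whose name is excluded."""
    excluded = set(exclude)
    for target in map(Path, targets):
        if target.is_file():
            yield target  # explicit file arguments are always scanned
            continue
        for root, dirs, files in os.walk(target):
            # Pruning dirs in place stops os.walk from descending into excluded directories.
            dirs[:] = sorted(d for d in dirs if d not in excluded)
            for name in sorted(files):
                yield Path(root) / name
```

Pruning directories during the walk avoids reading large vendored trees at all, instead of filtering their files afterwards.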
2. Interpret Scan Results
Parse the JSON output and evaluate each finding.
Finding Structure:
```json
{
  "category": "test_disabling|secret_exposure|path_portability|security_flags|large_deletion|dependency_change",
  "severity": "CRITICAL|HIGH|MEDIUM|LOW",
  "file": "path/to/file.py",
  "line": 42,
  "pattern": "regex pattern matched",
  "context": "actual line content",
  "message": "human-readable description"
}
```
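Here is a minimal sketch for loading these findings into typed objects. It assumes each finding carries exactly the seven keys shown above. It rejects unknown severities and sorts the most severe findings first, which is the order the report lists them in. `Finding` and `load_findings` are illustrative names.

```python
import json
from dataclasses import dataclass

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")  # most to least severe


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str
    file: str
    line: int
    pattern: str
    context: str
    message: str

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    @property
    def location(self) -> str:
        """The file:line reference used throughout the validation report."""
        return f"{self.file}:{self.line}"


def load_findings(report_json: str) -> list[Finding]:
    """Parse the "findings" array of a scan report, most severe first."""
    raw = json.loads(report_json).get("findings", [])
    findings = [Finding(**item) for item in raw]
    return sorted(findings, key=lambda f: (SEVERITIES.index(f.severity), f.file, f.line))
```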
Severity Guidelines:
- CRITICAL: Secrets; user-specific paths in docs. BLOCK the merge immediately.
- HIGH: Test disabling, security flags, user-specific paths in code. Require fixes from the Action Agent.
- MEDIUM: Large deletions, dependency changes, broad exceptions. Review and justify.
- LOW: Minor issues. Fixes are optional.
3. LLM Heuristic Review (Context-Dependent)
After the automated scan, apply judgment to these context-dependent checks:
Test Assertion Weakening
The scripts cannot detect semantic changes, so review test diffs for:
- Fewer assertions without a clear reason
- Specific assertions replaced with generic checks
- Removed edge-case validations
- A switch from validating behavior to validating only mocks
Red Flags:
```javascript
// Before
expect(response.data).toMatchObject({
  id: expect.any(String),
  status: 'active',
  count: expect.any(Number)
});
// After - WEAKENED
expect(response.data).toBeDefined(); // ❌ Lost specificity
```
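Simply counting assertions would miss the weakening in the example above, because both versions contain one `expect(` call. A cheap pre-filter can do better. The sketch below flags added existence-only matchers when the same change removed a specific matcher. This only points the reviewer at lines to inspect and does not replace the semantic review. The matcher lists are illustrative and assume Jest-style `expect` matchers.

```python
import re

# Matchers that only check existence or truthiness (illustrative, not exhaustive).
GENERIC_MATCHERS = re.compile(r"\.(toBeDefined|toBeTruthy|toBeFalsy)\(\)|\.not\.toBeNull\(\)")
# Matchers that pin down an actual value or shape.
SPECIFIC_MATCHERS = re.compile(r"\.(toEqual|toStrictEqual|toMatchObject|toBe|toHaveProperty)\(")


def weakened_assertion_hints(removed: list[str], added: list[str]) -> list[str]:
    """Return added generic-matcher lines if the change also removed a specific matcher."""
    lost_specific = any(SPECIFIC_MATCHERS.search(line) for line in removed)
    gained_generic = [line for line in added if GENERIC_MATCHERS.search(line)]
    return gained_generic if lost_specific else []
```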
Broad Try/Catch Appropriateness
Evaluate whether each broad exception handler is justified:
- Top-level error boundaries: Often acceptable
- Business logic: Usually inappropriate
- Missing error logging/reporting: Red flag
- Swallowing errors without assertions in tests: Red flag
When Acceptable:
```javascript
// Top-level boundary
app.use((err, req, res, next) => {
  logger.error(err);
  res.status(500).json({ error: 'Internal error' });
});
```
Red Flag:
```javascript
// Business logic swallowing errors
try {
  await processPayment(data);
} catch (e) {
  // ❌ Silent failure, no logging
}
```
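The examples above are JavaScript. The same "silent broad catch" heuristic can be sketched for Python sources with the standard `ast` module. It flags bare `except:` and `except Exception` / `except BaseException` clauses whose body is only `pass`. A JavaScript version would need a JavaScript parser. Tuple handlers such as `except (Exception, ValueError)` are not covered here. `silent_broad_handlers` is a hypothetical helper.

```python
import ast

BROAD = {"Exception", "BaseException"}


def silent_broad_handlers(source: str) -> list[int]:
    """Return line numbers of except clauses that catch broadly and only `pass`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # node.type is None for a bare "except:".
        broad = node.type is None or (isinstance(node.type, ast.Name) and node.type.id in BROAD)
        silent = all(isinstance(stmt, ast.Pass) for stmt in node.body)
        if broad and silent:
            lines.append(node.lineno)
    return sorted(lines)
```

Narrow handlers like `except ValueError: pass` are left to the reviewer, since swallowing a specific, expected error is sometimes deliberate.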
Scope Creep vs Legitimate Refactoring
Assess whether the changes stay within the issue's scope. Warning signs:
- Issue describes feature X, but changes include unrelated Y
- "While I was here" refactoring without issue reference
- Architecture changes not mentioned in acceptance criteria
Legitimate:
- Refactoring directly related to implementation
- Fixing bugs discovered during implementation (document in scratch notes)
- Updating tests to match new implementation
Scope Creep:
- Reformatting unrelated files
- Adding features not in issue
- Changing patterns/conventions beyond issue scope
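Scope review is a judgment call, but a file-level pre-check can surface candidates for it. The sketch below assumes the reviewer derives a list of expected path globs from the issue. It reports changed files that match none of them, for example from `git diff --name-only main...HEAD`. The glob list and `out_of_scope` are hypothetical. Note that `fnmatch`'s `*` also matches `/`, so `src/payments/*` covers nested files.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], expected_globs: list[str]) -> list[str]:
    """Return changed files that match none of the globs the issue is expected to touch."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in expected_globs)
    ]
```

A non-empty result is not automatically scope creep. It lists the files whose changes need a justification tied to the issue.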
Architecture Alignment
Verify changes match current production architecture:
- Check against ADRs referenced in .project-context.md
- Verify the stack matches the documented tech stack
- Confirm patterns follow project standards
- Ensure deprecated approaches aren't reintroduced
4. Generate Validation Report
Combine the automated findings with the heuristic review using this template:
```markdown
## Code Validation Results for [ISSUE-ID]
### Automated Scan Summary
- Files Changed: X
- Total Findings: Y
- CRITICAL: Z findings
- HIGH: A findings
- MEDIUM: B findings
### Critical Findings (BLOCK)
[List CRITICAL severity findings with file:line references]
### High Priority Findings (FIX REQUIRED)
[List HIGH severity findings]
### Heuristic Review
- Test Assertion Quality: [PASS/FAIL with specifics]
- Exception Handling: [PASS/WARN/FAIL with examples]
- Scope Alignment: [PASS/WARN/FAIL with details]
- Architecture Compliance: [PASS/FAIL with ADR references]
### Recommendation
[APPROVED | CHANGES REQUIRED | BLOCKED]
### Action Items
[Specific fixes needed with file:line references]
```
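The automated half of this template can be filled mechanically from the scan JSON. The sketch below renders the summary and the CRITICAL/HIGH sections, and leaves the heuristic review, recommendation, and action items to the reviewer. It assumes the report keys shown under Script Output Format. `render_summary` is a hypothetical helper.

```python
def render_summary(issue_id: str, report: dict) -> str:
    """Render the automated-scan sections of the validation report from scan JSON."""
    counts = report.get("findings_by_severity", {})
    findings = report.get("findings", [])

    def refs(severity: str) -> list[str]:
        # One bullet per finding, always carrying a file:line reference.
        items = [
            f"- {f['file']}:{f['line']}: {f['message']}"
            for f in findings
            if f["severity"] == severity
        ]
        return items or ["- None"]

    lines = [
        f"## Code Validation Results for {issue_id}",
        "### Automated Scan Summary",
        f"- Files Changed: {report.get('files_changed', 0)}",
        f"- Total Findings: {report.get('total_findings', len(findings))}",
    ]
    lines += [f"- {sev}: {counts.get(sev, 0)} findings" for sev in ("CRITICAL", "HIGH", "MEDIUM")]
    lines += ["### Critical Findings (BLOCK)", *refs("CRITICAL")]
    lines += ["### High Priority Findings (FIX REQUIRED)", *refs("HIGH")]
    return "\n".join(lines)
```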
Script Output Format
Both scripts output JSON with this structure:
```json
{
  "commit_range": "main..HEAD",
  "files_scanned": 42,
  "files_changed": 15,
  "total_findings": 8,
  "findings_by_severity": {
    "CRITICAL": 1,
    "HIGH": 3,
    "MEDIUM": 4,
    "LOW": 0
  },
  "findings": [
    {
      "category": "secret_exposure",
      "severity": "CRITICAL",
      "file": "src/config.py",
      "line": 12,
      "pattern": "...",
      "context": "API_KEY = 'sk_live_abc123...'",
      "message": "Potential hardcoded secret"
    }
  ],
  "summary": {
    "test_disabling": 2,
    "secret_exposure": 1,
    "path_portability": 3,
    "security_flags": 1,
    "dependency_changes": 1,
    "large_deletions": 0
  }
}
```
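Before relying on the counters in a report, a quick consistency check can confirm that they agree with the findings actually listed. The check is only meaningful on full reports. The sample above lists a single finding for brevity, so it would not pass. `check_report` is a hypothetical helper.

```python
from collections import Counter

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def check_report(report: dict) -> list[str]:
    """Return inconsistencies between a report's counters and its findings list."""
    problems = []
    findings = report.get("findings", [])
    if report.get("total_findings") != len(findings):
        problems.append(
            f"total_findings={report.get('total_findings')} but {len(findings)} findings listed"
        )
    actual = Counter(f["severity"] for f in findings)
    declared_counts = report.get("findings_by_severity", {})
    for sev in SEVERITIES:
        declared = declared_counts.get(sev, 0)
        if declared != actual[sev]:
            problems.append(f"{sev}: declared {declared}, found {actual[sev]}")
    return problems
```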
Red Flag Categories
Test Disabling (HIGH)
Patterns indicating tests were disabled rather than fixed:
`.skip()`, `.only()`, `.todo()`, `xit()`, `xdescribe()`, `fit()`, `fdescribe()`, `@pytest.mark.skip` / `pytest.skip()`, `@unittest.skip`
Action: Require the Action Agent to fix the tests, or to justify each skip in a code comment
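One way to express these markers as a single regular expression is sketched below. It is illustrative and is not the pattern set diff_analyzer.py uses. pytest's decorator is spelled `@pytest.mark.skip` and its imperative form is `pytest.skip()`, so both are covered. Expect false positives: `\bfit\(` also matches calls such as `model.fit(...)`. `TEST_DISABLING_RE` and `find_test_disabling` are hypothetical names.

```python
import re

TEST_DISABLING_RE = re.compile(
    r"\.(skip|only|todo)\("                # it.skip(, describe.only(, test.todo(
    r"|\b(xit|xdescribe|fit|fdescribe)\("  # Jasmine/Jest x- (skipped) and f- (focused) blocks
    r"|@pytest\.mark\.skip"                # also matches @pytest.mark.skipif
    r"|\bpytest\.skip\("                   # imperative skip inside a test
    r"|@unittest\.skip"                    # also matches skipIf / skipUnless
)


def find_test_disabling(lines):
    """Yield (line_number, text) for lines that look like skipped or focused tests."""
    for number, text in enumerate(lines, start=1):
        if TEST_DISABLING_RE.search(text):
            yield number, text.strip()
```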
Secret Exposure (CRITICAL)
Hardcoded credentials or API keys:
- API keys, tokens, passwords
- AWS credentials
- GitHub tokens
- Stripe keys
Action: BLOCK the merge and require secrets to be loaded from environment variables
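The sketch below shows what secret patterns of this kind can look like. It relies on publicly documented token shapes: AWS access key IDs begin with `AKIA`, GitHub tokens with prefixes such as `ghp_`, and Stripe live keys with `sk_live_` or `rk_live_`. It adds a generic quoted-assignment rule. These rules are illustrative. Dedicated scanners use many more rules plus entropy checks, and this is not the scripts' actual rule set.

```python
import re

# Illustrative token shapes only; not an exhaustive or authoritative rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\b[sr]k_live_[0-9A-Za-z]{10,}\b"),
    # name = "long quoted value" for common secret-ish variable names, case-insensitive
    "generic_assignment": re.compile(
        r"(?i)\b(api_?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(line: str) -> list[str]:
    """Return the names of the secret patterns that match a line."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(line)]
```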
Path Portability (CRITICAL in docs, HIGH in code)
User-specific paths that won't work for other developers:
`/Users/username/`, `/home/username/`, `C:\Users\username\`, `~/Desktop`, `~/Documents`
Action: BLOCK if found in documentation; in all cases, require repo-relative paths
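Here is a sketch of the path check, including the docs-versus-code severity split described above. Which file extensions count as documentation is an assumption (`.md`, `.rst`, `.txt` here). `USER_PATH_RE` and `path_severity` are hypothetical names.

```python
import re

# User-specific path prefixes: macOS and Linux home dirs, Windows profiles, ~ shortcuts.
USER_PATH_RE = re.compile(
    r"/Users/[^/\s]+/"
    r"|/home/[^/\s]+/"
    r"|[A-Za-z]:\\Users\\[^\\\s]+\\"
    r"|~/(Desktop|Documents)\b"
)

DOC_SUFFIXES = (".md", ".rst", ".txt")  # assumption: what counts as documentation


def path_severity(file_path: str, line: str):
    """Return "CRITICAL" for docs, "HIGH" for code, or None when no user path is found."""
    if not USER_PATH_RE.search(line):
        return None
    return "CRITICAL" if file_path.endswith(DOC_SUFFIXES) else "HIGH"
```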
Security Flags (HIGH)
Commands that weaken security:
`--no-verify`, `--insecure`, `-k`, `chmod 777`, `StrictHostKeyChecking no`, `--allow-root`
Action: Require justification comment or removal
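The flags above can be matched with boundary-aware patterns. Without boundaries, `-k` would also match inside `--keep`. The sketch below is illustrative. `-k` means `--insecure` for curl but has unrelated meanings for other tools, so matches still need human review. `security_flags` is a hypothetical helper.

```python
import re

SECURITY_FLAG_RE = re.compile(
    # Flag must not be glued to other word characters or dashes on either side.
    r"(?<![\w-])(--no-verify|--insecure|-k|--allow-root)(?![\w-])"
    r"|\bchmod\s+777\b"
    r"|StrictHostKeyChecking[\s=]+no\b"  # both "StrictHostKeyChecking no" and "=no"
)


def security_flags(line: str) -> list[str]:
    """Return every security-weakening flag found in a shell or config line."""
    return [match.group(0) for match in SECURITY_FLAG_RE.finditer(line)]
```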
Large Deletions (MEDIUM)
Files with more than 100 lines removed. A large deletion:
- May indicate legitimate refactoring
- Could hide removed validation logic
- Might remove important error handling
Action: Manual review to verify deletions are intentional
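Git can report per-file deletion counts directly with `git diff --numstat main...HEAD`, which prints `added<TAB>deleted<TAB>path` per file, and `-` for binary files. The sketch below applies the 100-line threshold to that output. `large_deletions` is a hypothetical helper.

```python
def large_deletions(numstat: str, threshold: int = 100) -> list[tuple[str, int]]:
    """Parse `git diff --numstat` output and return (path, deleted) above the threshold."""
    flagged = []
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        _added, deleted, path = parts
        if deleted == "-":  # binary files report "-" instead of line counts
            continue
        if int(deleted) > threshold:
            flagged.append((path, int(deleted)))
    return flagged
```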
Dependency Changes (MEDIUM)
New imports or package additions:
- Track new dependencies for security review
- Verify necessity for issue scope
- Check for unnecessary additions
Action: Verify the change in package.json or requirements.txt and run a security audit on the new packages (for example, npm audit or pip-audit)
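For Python projects, newly added packages can be isolated by comparing requirements.txt before and after the change. The sketch below handles only simple requirement lines and skips options such as `-r` and `-e`. It normalizes names as PEP 503 describes: case-insensitive, with runs of `-`, `_`, and `.` treated as `-`. package.json would need a separate comparison of its dependency maps. Both helper names are hypothetical.

```python
import re


def requirement_names(text: str) -> set[str]:
    """Extract normalized package names from requirements.txt content."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r or -e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def new_dependencies(before: str, after: str) -> set[str]:
    """Return packages present after the change but not before."""
    return requirement_names(after) - requirement_names(before)
```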
Integration with QA Protocol
Run code-validation during Step 3 of the QA workflow, Change Review (Diff):
- Switch to feature branch
- Run code-validation scripts
- Interpret automated findings
- Apply LLM heuristics
- Continue with Claude Code Review (MCP)
- Proceed with remaining QA steps
If there are any CRITICAL findings or more than one HIGH finding:
- BLOCK validation
- Report to Traycer with specific file:line references
- Delegate to Action Agent for fixes
- Re-run validation after fixes
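The gating rules can be read as a small decision function over the severity counts. The sketch below is one reading, under two assumptions: a single HIGH finding maps to CHANGES REQUIRED, and MEDIUM findings have already been justified. A failing heuristic check should still override an APPROVED result. `recommendation` is a hypothetical helper.

```python
def recommendation(findings_by_severity: dict) -> str:
    """Map severity counts to the report's recommendation values."""
    critical = findings_by_severity.get("CRITICAL", 0)
    high = findings_by_severity.get("HIGH", 0)
    if critical > 0 or high > 1:
        return "BLOCKED"  # report to Traycer and delegate fixes to the Action Agent
    if high == 1:
        return "CHANGES REQUIRED"
    # MEDIUM/LOW findings still need review and justification before approval.
    return "APPROVED"
```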
Resources
- Scripts:
  - scripts/diff_analyzer.py: Analyzes git diffs for red flags
  - scripts/static_analyzer.py: Scans files without git context
Notes
- Scripts are deterministic and fast; use them first
- LLM heuristics handle context-dependent evaluation
- Always provide file:line references in reports
- CRITICAL findings must block merge
- Document justified exceptions in code comments