# Agent Verifier

## Purpose
Verify code against business rules, security policies, and framework best practices. All analysis happens locally—code never leaves your machine.
## When to Use
Trigger this skill when the user asks to:
- "verify my code/agent"
- "review code"
- "check compliance"
- "audit my code"
- "check against rules"
- "validate implementation"
## Process

### Step 1: Detect Mode and Load Context

Check for Kahuna integration (the `.kahuna/` directory exists):

If the `.kahuna/` directory exists:
- Context is pre-loaded from `kahuna_prepare_context`
- Use `kahuna_ask` for additional queries
- Read `.kahuna/context-guide.md` for organizational rules and framework patterns

If the `.kahuna/` directory does NOT exist (standalone mode):
- Scan for project configuration files
- Apply built-in best practices (see Step 1b)
### Step 1b: Standalone Context Discovery
When running without Kahuna, gather context from the project itself:
Project configuration files:
- `package.json`, `pyproject.toml`, `Cargo.toml` - Dependencies and project type
- `tsconfig.json`, `biome.json`, `.eslintrc*` - Linting/formatting rules
- `.env.example` - Expected environment variables
Documentation:
- `README.md` - Project overview
- `docs/`, `CONTRIBUTING.md`, `.github/CONTRIBUTING.md` - Guidelines
- `ARCHITECTURE.md`, `DESIGN.md` - Design decisions
Detect language/framework from:
- File extensions (`.py`, `.ts`, `.js`, `.go`, `.rs`)
- Dependencies in manifest files
- Directory structure patterns
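A minimal sketch of this discovery pass (the manifest-to-language map and helper name are illustrative assumptions, not required implementation details):

```python
from pathlib import Path

# Illustrative mapping; extend as needed for other ecosystems
MANIFESTS = {
    "package.json": "javascript/typescript",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
    "go.mod": "go",
}

def detect_project_context(root: str = ".") -> dict:
    """Best-effort language/framework detection from manifests and file extensions."""
    root_path = Path(root)
    manifests = {name: lang for name, lang in MANIFESTS.items() if (root_path / name).exists()}
    extensions = {p.suffix for p in root_path.rglob("*") if p.is_file() and p.suffix}
    return {"manifests": manifests, "extensions": sorted(extensions)}
```

Dependencies listed inside the manifests (e.g. `langgraph`, `crewai`) can then refine the framework guess in Step 3b.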
### Step 2: Discover Files to Analyze
Locate implementation files in the project. Check these common locations:
Directories:
- `src/agent/`, `agent/`, `src/`, project root
- `lib/`, `app/`, `packages/`
Key files to analyze:
- `graph.py` - Workflow/graph definition (LangGraph)
- `tools.py` - Tool implementations
- `state.py` - State schema definitions
- `prompts.py` - Prompt templates
- `nodes.py` - Node function implementations
- `config.py`, `settings.py` - Configuration
- `*.ts`, `*.js` - TypeScript/JavaScript sources
- `*.go`, `*.rs` - Go/Rust sources
AI Agent files to prioritize (when agent patterns detected):
- `graph.py`, `graph.ts` - Agent workflow definitions
- `tools.py`, `tools.ts`, `tools/*.py`, `tools/*.ts` - Tool implementations
- `state.py`, `state.ts` - State schemas
- `prompts.py`, `prompts/*.md`, `system.md` - Prompt templates
- `agent.py`, `agent.ts` - Main agent logic
- `langgraph.json`, `crew.yaml` - Framework configurations
Exclude from prompt size analysis:
- `skills/` directory and all subdirectories — these are skill definition files (e.g. `SKILL.md`) loaded on demand by the coding agent, not static system prompts embedded in every LLM call. Flagging them for size would be a false positive.

Use `list_files` or equivalent to discover the actual structure, then read relevant files.
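As a sketch of the discovery logic (assuming a plain filesystem walk; in practice, use whatever file-listing tool the host environment provides, as noted above):

```python
from pathlib import Path

# File name patterns to prioritize when agent patterns are detected (from the lists above)
AGENT_PATTERNS = ["graph.*", "tools.*", "state.*", "prompts.*", "agent.*", "nodes.py",
                  "langgraph.json", "crew.yaml"]

def discover_agent_files(root: str = ".") -> list[Path]:
    root_path = Path(root)
    found: list[Path] = []
    for pattern in AGENT_PATTERNS:
        found.extend(root_path.rglob(pattern))
    # skills/ is excluded from prompt-size analysis; skill files are loaded on demand
    return sorted({p for p in found if "skills" not in p.parts})
```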
### Step 3: Verify Code Against Rules

#### Check Tiers
Every check in this skill is classified as one of two tiers. Apply them differently:
- `[PATTERN]` — Mechanical. The answer is objectively correct or incorrect based on code structure. Apply the rule exactly as written. Do not use judgment to soften or skip. A missing `stop=` parameter on `@retry` is always an ❌ Issue, regardless of context. Report these with high confidence.
- `[HEURISTIC]` — Judgment required. The rule describes a quality signal that requires interpretation. Apply it as a best-effort assessment. Mark these findings clearly so the reader knows they reflect analysis, not a deterministic check.

In the report, tag every finding with its tier: `[P]` for pattern, `[H]` for heuristic.
#### Fallback Rule
Every [PATTERN] check has a paired [HEURISTIC] fallback. After running the pattern table for a given category, also scan for behavior that resembles the anti-pattern but doesn't match any specific table entry. This catches novel libraries, custom implementations, and pattern drift that rigid tables cannot anticipate.
Heuristic fallback findings are always:
- Tagged `[H]` in the report
- Reported as ⚠️ Warning (never ❌ Issue) — they reflect judgment, not a definitive match
- Accompanied by a note explaining why a manual review is recommended
The pattern check is the primary detector (high confidence, low false positives). The heuristic fallback is the safety net (lower confidence, catches what patterns miss).
Analyze code against all available rules:
With Kahuna (enhanced mode):
- Organizational rules - Company policies from `.kahuna/context-guide.md`
- IT/Security rules - Security requirements from knowledge base
- Framework best practices - Patterns from surfaced context
Standalone (built-in rules):
- `[HEURISTIC]` General code quality
  - Clear naming conventions (descriptive, consistent)
  - Appropriate code organization and structure
  - Error handling patterns
  - No magic numbers/strings without constants
- Security basics — mixed tier (see per-item labels below):
  - `[PATTERN]` No hardcoded secrets — scan for assignments matching `API_KEY`, `SECRET`, `PASSWORD`, `TOKEN`, `PRIVATE_KEY` (case-insensitive) assigned to string literals. Flag any match as ❌ Issue.
  - `[HEURISTIC]` Input validation on external data
  - `[HEURISTIC]` Proper error messages (no stack traces in production)
  - `[HEURISTIC]` Secure defaults
- Language-specific (auto-detected):
  - TypeScript/JavaScript:
    - `[PATTERN]` Type safety — flag if `tsconfig.json` does not have `"strict": true`
    - `[HEURISTIC]` Async/await error handling
    - `[PATTERN]` No `any` types — flag unqualified `: any` annotations as ⚠️ Warning
    - `[HEURISTIC]` Dependency security (outdated/vulnerable packages)
  - Python:
    - `[PATTERN]` Type hints — flag any `def` function in public scope (no leading `_`) that has parameters without type annotations, as ⚠️ Warning
    - `[HEURISTIC]` Docstrings for modules, classes, functions
    - `[PATTERN]` Requirements pinning — flag any line in `requirements.txt` / `pyproject.toml` dependencies using `>=`, `>`, or no version specifier (should use `==`), as ❌ Issue
  - Go:
    - `[PATTERN]` No ignored errors — flag any `_ =` assignments where the right-hand side is a function call returning `error`, as ❌ Issue
    - `[HEURISTIC]` Context propagation
    - `[HEURISTIC]` Proper package structure
- AI Agent-specific (if detected):
  - `[HEURISTIC]` State schema validation
  - `[HEURISTIC]` Tool error handling
  - `[HEURISTIC]` Prompt injection considerations
  - `[HEURISTIC]` Rate limiting awareness
Agent Observability Patterns:
- `[PATTERN]` Loop Safety — Apply mechanically. Do not pass a loop because it "looks like it might terminate."

  | Pattern to find | Pass condition | Severity |
  |---|---|---|
  | `while True:` in Python | A `break` statement exists within the same block scope | ⚠️ Warning if absent |
  | `for { }` in Go | A `break` or `return` exists within the block | ⚠️ Warning if absent |
  | `while (true)` in TS/JS | A `break` or `return` exists within the block | ⚠️ Warning if absent |
  | Function calls itself recursively | A non-recursive return path exists (base case), OR a depth/counter parameter is present | ⚠️ Warning if absent |

  `[HEURISTIC]` Fallback: Unrecognized Loop Patterns — After applying the pattern table above, also scan for loop-like behavior that may run indefinitely but doesn't match any specific pattern above:

  - Any loop where the termination condition depends entirely on external/runtime state with no timeout or max-iteration guard
  - Generator functions that `yield` indefinitely without a documented exit condition
  - Event loops or polling loops (e.g. `while not done:`, `while queue:`) without timeout parameters
  - Recursive call chains across multiple functions (A calls B, B calls A) without depth tracking

  If found, flag as ⚠️ Warning: "Potential unbounded loop not matching known patterns — verify termination condition manually"
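  A minimal sketch of the Python half of the mechanical check, using the standard `ast` module (assumptions: the source parses as Python, and a `break` anywhere inside the loop body, including nested loops, counts as a pass; a stricter same-scope check would need extra bookkeeping):

  ```python
  import ast

  def unbounded_while_true(source: str) -> list[int]:
      """Line numbers of `while True:` loops with no break anywhere in their body."""
      findings = []
      for node in ast.walk(ast.parse(source)):
          is_while_true = (
              isinstance(node, ast.While)
              and isinstance(node.test, ast.Constant)
              and node.test.value is True
          )
          if is_while_true and not any(isinstance(n, ast.Break) for n in ast.walk(node)):
              findings.append(node.lineno)
      return findings
  ```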
- `[PATTERN]` Retry Limit Enforcement — Apply mechanically. Check each decorator or call against the tables below. If the required parameter is absent, flag as ❌ Issue regardless of other parameters present.

  Decorator-based (Python):

  | Library/Pattern | Required parameter | Fail condition |
  |---|---|---|
  | `@retry` (tenacity) | `stop=stop_after_attempt(n)` or `stop=stop_after_delay(n)` | `stop=` absent |
  | `@backoff.on_exception` | `max_tries=n` | `max_tries=` absent |

  HTTP client retry configuration (Python):

  | Library/Pattern | Required parameter | Fail condition |
  |---|---|---|
  | `urllib3.Retry(...)` | `total=n` where n > 0 | `total=` absent or `total=0` |
  | `HTTPAdapter(max_retries=Retry(...))` | The `Retry` object must have `total=n` | `total=` absent in the `Retry` object passed to `max_retries=` |
  | `httpx.HTTPTransport(retries=n)` | `retries=n` where n > 0 | `retries=` absent or `retries=0` |

  AWS SDK (Python/boto3):

  | Library/Pattern | Required parameter | Fail condition |
  |---|---|---|
  | `Config(retries={...})` (botocore) | `max_attempts` key with value > 1 | `max_attempts` absent, or `max_attempts: 0` or `max_attempts: 1` (no retries) |

  Note: boto3 clients without any explicit `Config(retries=...)` use the SDK default (3 attempts, standard mode) — do not flag the absence of retry config as an issue. Only flag when retry config is present but disables retries.

  JavaScript/TypeScript:

  | Library/Pattern | Required parameter | Fail condition |
  |---|---|---|
  | `retry(...)` (async-retry) | `retries: n` in options object | `retries:` absent |
  | `pRetry(...)` (p-retry) | `retries: n` in options object | `retries:` absent |

  Custom retry loops (all languages):

  A `while True:` / `while (true)` / `for {}` block that contains a `try/except` (or `try/catch`) with a `continue` or a re-invocation of the same call is a manual retry loop. Apply the same rule as Loop Safety: a bounded counter must be present.

  | Pattern to find | Pass condition | Fail condition |
  |---|---|---|
  | Loop + `try/except` + `continue` | An integer counter is declared before the loop and incremented inside it, with a conditional check against a max | No counter present → ❌ Issue |

  `[HEURISTIC]` Fallback: Unrecognized Retry Patterns — After applying the pattern tables above, also scan for retry-like behavior that doesn't match any known library or pattern:

  - Any function or decorator containing "retry" in its name not covered by the tables above
  - Any imported module with "retry" in its package name not listed in the tables (e.g. `stamina`, `retry`, `aiohttp_retry`)
  - A loop containing sleep/delay + exception handling + re-invocation of the same external call, where no counter or max-attempt mechanism is visible
  - Configuration objects with keys like `max_retries`, `retry_count`, `attempts` that may belong to unlisted libraries

  If found, flag as ⚠️ Warning: "Potential retry pattern not matching known libraries — verify retry bounds manually"
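  For reference, a manual retry loop that satisfies the Custom Retry table above could look like this (a sketch; the exception type and external call are placeholders, not names from any analyzed project):

  ```python
  import time

  class TransientError(Exception):
      """Placeholder for whatever transient failure the real call raises."""

  def call_external_api():
      """Placeholder for the real external call."""
      raise TransientError("simulated failure")

  MAX_ATTEMPTS = 3  # integer bound declared before the loop

  def fetch_with_retry():
      attempt = 0
      while True:  # manual retry loop; needs an explicit bound to pass
          try:
              return call_external_api()
          except TransientError:
              attempt += 1
              if attempt >= MAX_ATTEMPTS:  # conditional check against a max → ✅ Pass
                  raise
              time.sleep(2 ** attempt)  # backoff before the next attempt
  ```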
- `[PATTERN]` Tool Registry Consistency — Apply mechanically:

  1. Collect all tool names from definition files. A tool name is defined by:
     - `@tool` or `@function_tool` decorator on a function → the function name
     - A dict/object with a `"name":` or `name:` key at the top level of a tools file
  2. Collect all tool name references from prompt files (`.md`, `.txt`, `prompts.py`). A reference is any backtick-quoted identifier or string that names a capability the agent is told it can use.
  3. Flag every reference not in the definition list as ❌ Issue (hallucinated tool).
  4. Flag every defined tool not mentioned in any prompt as ⚠️ Warning (undocumented tool). (A sketch of steps 2–4 appears below, after the fallback.)

- `[HEURISTIC]` Tools never bound to LLM — Find where tools are defined (any list, registry, or decorated set of functions intended as agent tools). Then find where the LLM is invoked (the call that sends messages to the model). Check whether the tools are passed to that invocation point. If a tools collection exists but is never connected to the LLM call, flag as ❌ Issue: the LLM has no knowledge of these tools and cannot invoke them — the tool-calling architecture is broken or incomplete.

  The connection can take many forms depending on the framework (e.g. a `tools=` argument, a bind method, a plugin registration API, an agent constructor parameter). Do not look for any specific method name — reason about whether the defined tools actually reach the LLM invocation. Example of the broken pattern: tools decorated with `@tool` and collected in `ALL_TOOLS`, but the LLM call never receives `ALL_TOOLS` in any form.

  `[HEURISTIC]` Fallback: Unrecognized Tool Definitions — After applying the known tool definition patterns above, also scan for tool-like structures that don't match any known format:

  - Any dict/object with both `"description"` and `"parameters"` keys that resembles a tool schema but doesn't match the specific patterns listed
  - Functions with structured docstrings that look like tool descriptions (name, parameter list, return description) but lack a `@tool` decorator
  - Any variable named `tools`, `tool_list`, `available_tools`, `functions`, or similar containing callable references or schema objects
  - Class-based tool patterns (classes with `run()`, `execute()`, or `__call__()` methods that appear to wrap external capabilities)

  If found, include in the tool registry count and note: "Tool detected via heuristic — format not in known pattern table. Verify this is an intended agent tool."
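  A minimal sketch of the cross-reference in steps 2–4 (it assumes prompt references are backtick-quoted identifiers, per the definition above; real prompts may also contain unrelated backticked identifiers, so the results still warrant a quick manual scan):

  ```python
  import re

  def cross_check_tools(defined: set[str], prompt_text: str) -> dict[str, set[str]]:
      """Compare backtick-quoted identifiers in a prompt against the registered tool names."""
      referenced = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", prompt_text))
      return {
          "hallucinated": referenced - defined,  # referenced but never defined → ❌ Issue
          "undocumented": defined - referenced,  # defined but never mentioned → ⚠️ Warning
      }
  ```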
- `[PATTERN]` Context Size Awareness — Apply mechanically using the formula: `token_estimate = len(file_content_chars) / 4`

  | Content | ⚠️ Warning threshold | ❌ Issue threshold |
  |---|---|---|
  | System prompt file | > 4,000 tokens | > 8,000 tokens |
  | Single tool description block | > 500 tokens | > 1,000 tokens |
  | All tool descriptions combined | > 2,000 tokens | > 4,000 tokens |

  Exclude `skills/` directories from this check (see Step 2).

  `[HEURISTIC]` Fallback: Borderline and Non-Standard Context Sizes — After applying the mechanical threshold check above, also apply judgment in these cases:

  - When a token estimate falls within 20% of any threshold, flag as ⚠️ Warning: "Token estimate is approximate (chars/4). Actual token count may differ — consider measuring with a tokenizer if close to limit."
  - Dynamically assembled prompts (e.g. f-strings, `.format()`, template concatenation) where the final size depends on runtime data — flag as ⚠️ Warning if the static template alone is large, since runtime content will add to it
  - Multiple prompt files that are concatenated or chained together — estimate the combined size, not just individual files
  - Prompt files that include or import other files (e.g. `{% include %}`, `{read_file(...)}`) — note that the effective size may be larger than the single file

  If found, flag as ⚠️ Warning with an explanation of why the actual context size may differ from the static estimate.
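  A sketch of the mechanical estimate plus the borderline heuristic (thresholds are passed in from the table above):

  ```python
  def estimate_tokens(text: str) -> float:
      """Rough token estimate using the chars / 4 rule."""
      return len(text) / 4

  def classify_size(estimate: float, warn: int, issue: int) -> str:
      if estimate > issue:
          return "❌ Issue"
      if estimate > warn:
          return "⚠️ Warning"
      if estimate > warn * 0.8:  # within 20% of the warning threshold: borderline
          return "⚠️ Warning (borderline: estimate is approximate, verify with a real tokenizer)"
      return "✅ Pass"

  # e.g. classify_size(estimate_tokens(open("prompts/system.md").read()), warn=4000, issue=8000)
  ```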
- `[HEURISTIC]` Explicit Tool Listing
  - System prompts should list available tools
  - Tool capabilities should be clearly described

  Detection: Check for tool listing sections (headers like "Available Tools", "You have access to")
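  A sketch of the detection (the marker phrases are the ones listed above; others can be added):

  ```python
  import re

  TOOL_LISTING_MARKERS = re.compile(r"available tools|you have access to", re.IGNORECASE)

  def has_tool_listing(prompt_text: str) -> bool:
      """True if the system prompt appears to contain an explicit tool listing section."""
      return bool(TOOL_LISTING_MARKERS.search(prompt_text))
  ```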
For each check, determine:
- ✅ Pass - Code complies with the rule
- ⚠️ Warning - Potential concern worth reviewing
- ❌ Issue - Clear violation that needs fixing
### Step 3b: Agent Pattern Analysis (if AI agent detected)
If the project appears to be an AI agent (LangGraph, CrewAI, AutoGen, LangChain, or custom), perform additional analysis:
Framework Detection:
- `langgraph` in imports → LangGraph agent
- `crewai` in imports → CrewAI agent
- `autogen` in imports → AutoGen agent
- `langchain` in imports → LangChain agent
- Custom patterns → Custom agent framework
Analysis Steps:
1. **Build tool registry**

   Scan all tool definition files: `tools.py`, `tools.ts`, `tools/*.py`, `tools/*.ts`, and any file whose name or content suggests it defines agent tools. Extract tool names using the patterns below. A name found by any pattern counts as a registered tool.

   Python — decorator patterns:

   | Pattern | How to extract name |
   |---|---|
   | `@tool` (LangChain) on a `def` | Function name immediately below the decorator |
   | `@function_tool` (OpenAI Agents SDK) on a `def` | Function name immediately below the decorator |
   | `@tool(name="...")` with explicit name arg | Use the `name=` argument value, not the function name |

   Python — dict/list patterns:

   | Pattern | How to extract name |
   |---|---|
   | `{"type": "function", "function": {"name": "..."}}` (OpenAI function calling) | Value of the `"name"` key inside `"function"` |
   | `{"name": "...", "input_schema": {...}}` (Anthropic tool use) | Value of the top-level `"name"` key |
   | `{"name": "...", "description": "...", "parameters": {...}}` (generic schema) | Value of the top-level `"name"` key |
   | `ToolNode([func1, func2, ...])` (LangGraph) | Each function name in the list — these must already be registered via decorator or schema above |
   | `tools = [func1, func2]` / `TOOLS = [...]` list assigned to a variable | Each identifier in the list — resolve to function names already found by other patterns |

   TypeScript/JavaScript — patterns:

   | Pattern | How to extract name |
   |---|---|
   | `{ type: "function", function: { name: "..." } }` (OpenAI) | Value of `name:` inside `function:` |
   | `tool({ description: "...", parameters: z.object({...}) })` assigned to a `const name =` (Vercel AI SDK) | The `const` variable name |
   | `new DynamicTool({ name: "...", ... })` (LangChain.js) | Value of `name:` |
   | `zodFunction({ name: "...", ... })` | Value of `name:` |

   After collecting all names, note the total count and source format for the report.
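   A minimal sketch of the Python decorator extraction (the first table above); the dict/list and TypeScript patterns would need their own parsing, and the heuristic fallback still applies:

   ```python
   import ast

   TOOL_DECORATORS = {"tool", "function_tool"}

   def registered_tool_names(source: str) -> set[str]:
       """Tool names from @tool / @function_tool decorators, honoring an explicit name= argument."""
       names: set[str] = set()
       for node in ast.walk(ast.parse(source)):
           if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
               continue
           for dec in node.decorator_list:
               target = dec.func if isinstance(dec, ast.Call) else dec
               dec_name = target.id if isinstance(target, ast.Name) else getattr(target, "attr", None)
               if dec_name not in TOOL_DECORATORS:
                   continue
               explicit = None
               if isinstance(dec, ast.Call):
                   for kw in dec.keywords:
                       if kw.arg == "name" and isinstance(kw.value, ast.Constant):
                           explicit = kw.value.value  # @tool(name="...") overrides the function name
               names.add(explicit or node.name)
       return names
   ```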
2. **Analyze agent loops**

   - Find main execution loops
   - Check for termination conditions (break, return, max iterations)
   - Verify retry limits on all retry mechanisms
3. **Analyze prompts**

   - Measure prompt sizes (estimate tokens)
   - Check for tool listings in system prompts
   - Verify tool references against registry
   - Exclude `skills/` directories — files under `skills/` (e.g. `SKILL.md`) are skill definitions loaded on demand, not static system prompts. Do not flag them for context size.
4. **Cross-reference**

   - Tools in prompts vs registry (flag mismatches)
   - State fields vs usage
   - Config vs implementation
5. **LangGraph graph cycle analysis** (only when LangGraph is detected)

   LangGraph agents define control flow as a directed graph of nodes and edges, not `while` loops. A cycle in the graph is intentional (the agent loops between "agent" and "tools" nodes), but every cycle must have at least one conditional edge that can route to `END`. A cycle with no reachable `END` is an infinite loop at the graph level.

   Detection steps:

   a. Find the graph file (`graph.py`, `graph.ts`, or file containing `StateGraph` / `MessageGraph`)

   b. Build an edge map by scanning for:
      - `workflow.add_edge(source, dest)` — unconditional edge
      - `workflow.add_conditional_edges(source, fn, mapping)` — conditional edges; extract all destination values from the mapping dict

   c. Identify cycles: find any node that is reachable from itself by following edges

   d. For each cycle, check if `END` (or `"__end__"`) is reachable from any node in the cycle via a conditional edge mapping

   e. Flag accordingly:

   | Condition | Severity |
   |---|---|
   | Cycle exists, `END` reachable via conditional edge | ✅ Pass |
   | Cycle exists, no path to `END` from any node in cycle | ❌ Issue |
   | Graph has no `END` node at all | ❌ Issue |
   | Node has no outgoing edges and is not `END` | ⚠️ Warning (dead-end node) |

   Example — infinite cycle (❌ Issue):

   ```python
   workflow.add_edge("agent", "tools")
   workflow.add_edge("tools", "agent")  # cycle, but no path to END
   ```

   Example — cycle with exit (✅ Pass):

   ```python
   workflow.add_conditional_edges("agent", should_continue, {
       "continue": "tools",
       "end": END  # END is reachable → cycle is safe
   })
   workflow.add_edge("tools", "agent")
   ```

   Note: The check inspects the static structure of `add_edge` / `add_conditional_edges` calls. It does not evaluate the routing function itself (`should_continue`) — that is runtime behaviour. If the mapping dict contains `END` as a possible destination, the check passes.

   `[HEURISTIC]` Fallback: Unrecognized Graph and State Machine Patterns — After applying LangGraph-specific edge parsing above, also scan for graph-like control flow structures in other frameworks or custom implementations:

   - State machine definitions with transition tables or dispatch dicts (e.g. the `transitions` library, custom `state_map = {"state_a": handler_a, ...}`)
   - Custom routing logic where nodes call other nodes based on conditions, forming implicit cycles
   - LangGraph.js patterns using camelCase methods (`.addNode()`, `.addEdge()`, `.addConditionalEdges()`) — apply the same cycle analysis
   - CrewAI, AutoGen, or other multi-agent frameworks where agents hand off to each other in a potentially circular pattern
   - Any adjacency list, dict-of-lists, or graph object that defines node connections with no clear termination path

   If a cycle is detected with no visible termination condition, flag as ⚠️ Warning: "Potential cyclic control flow detected in non-LangGraph graph structure — verify that a termination condition exists"
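   Once the edge map from step (b) exists, steps (c) and (d) reduce to a small reachability routine (a sketch; building the edge map itself means parsing the `add_edge` / `add_conditional_edges` calls, for example with `ast`):

   ```python
   def nodes_on_unsafe_cycles(edges: dict[str, set[str]], end: str = "__end__") -> list[str]:
       """Nodes that lie on a cycle from which END is unreachable (❌ Issue per the table above)."""
       def reachable_from(start: str) -> set[str]:
           seen, stack = set(), [start]
           while stack:
               for nxt in edges.get(stack.pop(), set()):
                   if nxt not in seen:
                       seen.add(nxt)
                       stack.append(nxt)
           return seen

       flagged = []
       for node in edges:
           downstream = reachable_from(node)
           if node in downstream and end not in downstream:
               flagged.append(node)
       return flagged

   # nodes_on_unsafe_cycles({"agent": {"tools"}, "tools": {"agent"}})              -> ["agent", "tools"]
   # nodes_on_unsafe_cycles({"agent": {"tools", "__end__"}, "tools": {"agent"}})   -> []
   ```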
Example findings:
### ⚠️ Warnings
- Potential infinite loop: `agent/loop.py:45`
- **Pattern:** `while True:` without visible break condition
- **Suggestion:** Add explicit max iteration counter: `for i in range(MAX_ITERATIONS):`
- Large system prompt: `prompts/system.md`
- **Size:** ~6,200 tokens (estimated)
- **Threshold:** 4,000 tokens (warning)
- **Risk:** May cause context overflow with long conversations
- **Suggestion:** Consider splitting into base prompt + dynamic sections
### ❌ Issues
- Missing retry limit: `tools/api_client.py:23`
- **Pattern:** `@retry` decorator without `stop` parameter
- **Rule:** All retry mechanisms must have explicit bounds
- **Fix:** Add `@retry(stop=stop_after_attempt(3))` or use `tenacity.stop_after_attempt(3)`
- Hallucinated tool reference: `prompts/system.md:34`
- **Reference:** `execute_sql_query`
- **Available tools:** search_docs, write_file, run_tests
- **Rule:** Tool references must match registered tools
- **Fix:** Either add tool definition or remove reference from prompt
### Step 4: Generate Report
Output a structured verification report with agent-specific sections when applicable:
# Verification Report
**Project:** [project name or path]
**Date:** [current date]
**Mode:** [Kahuna-enhanced | Standalone]
**Files analyzed:** [count]
**Agent type detected:** [LangGraph | CrewAI | AutoGen | LangChain | Custom | None]
## Summary
✅ X checks passed | ⚠️ Y warnings | ❌ Z issues
### By Category
| Category | Pass | Warn | Issue |
|----------|------|------|-------|
| Code Quality | X | X | X |
| Security | X | X | X |
| Agent Patterns | X | X | X |
## Agent Pattern Analysis
*(Include this section only when Agent type detected ≠ None)*
### Loop Safety
- [x] All retry mechanisms have explicit limits
- [ ] ⚠️ Potential unbounded loop at `[file:line]`
- [ ] ❌ Missing retry limit at `[file:line]`
### Tool Consistency
- [x] Tool registry found: X tools defined
- [ ] ❌ Y hallucinated tool references in prompts
- [ ] ❌ `[H]` Tools defined but never connected to LLM invocation — LLM cannot invoke tools
- [ ] ⚠️ Z tools not documented in system prompt
### Context Management
- [x] System prompt within limits (~X tokens)
- [ ] ⚠️ System prompt exceeds recommended size (~X tokens)
- [x] Tool descriptions within limits
## Findings
> `[P]` = pattern-matched (structurally reliable) · `[H]` = heuristic (best-effort judgment)
### ✅ Passing
- `[P]` [Check name]: [Brief confirmation of compliance]
### ⚠️ Warnings
- `[P|H]` [Check name]: [Description of concern]
- **Location:** [file:line if applicable]
- **Suggestion:** [How to address]
### ❌ Issues
- `[P|H]` [Check name]: [Description of violation]
- **Location:** [file:line]
- **Rule:** [Which rule this violates]
- **Fix:** [Specific remediation steps]
## Recommendations
*(Generic recommendations for all projects)*
1. [Priority recommendation based on findings]
2. [Additional improvements]
## Agent-Specific Recommendations
*(Include this section only when Agent type detected ≠ None)*
1. **Loop Safety:** [Add iteration limits / Add retry bounds]
2. **Tool Registry:** [Remove or define hallucinated tools]
3. **Context Management:** [Split large prompts / Add tool documentation]
### Step 5: Export Report (Optional)
After presenting the report, ask the user:
Would you like to save this verification report to a file?
If confirmed:
1. Create the reports directory if it doesn't exist: `mkdir -p reports/verification`
2. Generate the filename with the actual current timestamp (not zeros): `reports/verification/YYYY-MM-DD_HH-MM-SS.md`

   Example: `reports/verification/2026-03-04_16-48-21.md`

   Use the current time from the system, not placeholder values. The format is `{year}-{month}-{day}_{hour}-{minute}-{second}.md`.
3. Save the complete report to that file.
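A sketch of the path logic, equivalent to steps 1–2 above (standard library only):

```python
from datetime import datetime
from pathlib import Path

def verification_report_path() -> Path:
    """reports/verification/YYYY-MM-DD_HH-MM-SS.md using the actual system time."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    path = Path("reports/verification") / f"{stamp}.md"
    path.parent.mkdir(parents=True, exist_ok=True)  # mkdir -p reports/verification
    return path
```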
## Notes
- Privacy first: All code analysis happens locally. Nothing is sent to external services.
- Kahuna enhances, not requires: The skill works standalone with built-in rules. Kahuna adds organization-specific knowledge.
- Be specific: Include file names and line numbers when reporting issues.
- Explain the "why": Help developers understand why each rule matters.
- Honor existing configs: Respect the project's existing lint rules, `.editorconfig`, etc.
- Respect tier discipline: `[PATTERN]` checks must be applied exactly as specified — do not use judgment to pass something the rule says should fail. `[HEURISTIC]` checks require judgment — apply them thoughtfully and mark findings clearly so the reader understands the confidence level.