verification

Verify code against organizational policies, security requirements, and framework best practices. Use when asked to "verify", "review", "audit", "check compliance", or "validate" code.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "verification" with this command: npx skills add aurite-ai/agent-verifier/aurite-ai-agent-verifier-verification

Agent Verifier

Purpose

Verify code against business rules, security policies, and framework best practices. All analysis happens locally—code never leaves your machine.

When to Use

Trigger this skill when the user asks to:

  • "verify my code/agent"
  • "review code"
  • "check compliance"
  • "audit my code"
  • "check against rules"
  • "validate implementation"

Process

Step 1: Detect Mode and Load Context

Check for Kahuna integration (.kahuna/ directory exists):

If .kahuna/ directory exists:

  • Context is pre-loaded from kahuna_prepare_context
  • Use kahuna_ask for additional queries
  • Read .kahuna/context-guide.md for organizational rules and framework patterns

If .kahuna/ directory does NOT exist (standalone mode):

  • Scan for project configuration files
  • Apply built-in best practices (see Step 1b)

Step 1b: Standalone Context Discovery

When running without Kahuna, gather context from the project itself:

Project configuration files:

  • package.json, pyproject.toml, Cargo.toml - Dependencies and project type
  • tsconfig.json, biome.json, .eslintrc* - Linting/formatting rules
  • .env.example - Expected environment variables

Documentation:

  • README.md - Project overview
  • docs/, CONTRIBUTING.md, .github/CONTRIBUTING.md - Guidelines
  • ARCHITECTURE.md, DESIGN.md - Design decisions

Detect language/framework from:

  • File extensions (.py, .ts, .js, .go, .rs)
  • Dependencies in manifest files
  • Directory structure patterns

Step 2: Discover Files to Analyze

Locate implementation files in the project. Check these common locations:

Directories:

  • src/agent/, agent/, src/, project root
  • lib/, app/, packages/

Key files to analyze:

  • graph.py - Workflow/graph definition (LangGraph)
  • tools.py - Tool implementations
  • state.py - State schema definitions
  • prompts.py - Prompt templates
  • nodes.py - Node function implementations
  • config.py, settings.py - Configuration
  • *.ts, *.js - TypeScript/JavaScript sources
  • *.go, *.rs - Go/Rust sources

AI Agent files to prioritize (when agent patterns detected):

  • graph.py, graph.ts - Agent workflow definitions
  • tools.py, tools.ts, tools/*.py, tools/*.ts - Tool implementations
  • state.py, state.ts - State schemas
  • prompts.py, prompts/*.md, system.md - Prompt templates
  • agent.py, agent.ts - Main agent logic
  • langgraph.json, crew.yaml - Framework configurations

Exclude from prompt size analysis:

  • skills/ directory and all subdirectories — these are skill definition files (e.g. SKILL.md) loaded on demand by the coding agent, not static system prompts embedded in every LLM call. Flagging them for size would be a false positive.

Use list_files or equivalent to discover the actual structure, then read relevant files.

Step 3: Verify Code Against Rules

Check Tiers

Every check in this skill is classified as one of two tiers. Apply them differently:

  • [PATTERN] — Mechanical. The answer is objectively correct or incorrect based on code structure. Apply the rule exactly as written. Do not use judgment to soften or skip. A missing stop= parameter on @retry is always an ❌ Issue, regardless of context. Report these with high confidence.

  • [HEURISTIC] — Judgment required. The rule describes a quality signal that requires interpretation. Apply it as a best-effort assessment. Mark these findings clearly so the reader knows they reflect analysis, not a deterministic check.

In the report, tag every finding with its tier: [P] for pattern, [H] for heuristic.

Fallback Rule

Every [PATTERN] check has a paired [HEURISTIC] fallback. After running the pattern table for a given category, also scan for behavior that resembles the anti-pattern but doesn't match any specific table entry. This catches novel libraries, custom implementations, and pattern drift that rigid tables cannot anticipate.

Heuristic fallback findings are always:

  • Tagged [H] in the report
  • Reported as ⚠️ Warning (never ❌ Issue) — they reflect judgment, not a definitive match
  • Accompanied by a note explaining why a manual review is recommended

The pattern check is the primary detector (high confidence, low false positives). The heuristic fallback is the safety net (lower confidence, catches what patterns miss).


Analyze code against all available rules:

With Kahuna (enhanced mode):

  1. Organizational rules - Company policies from .kahuna/context-guide.md
  2. IT/Security rules - Security requirements from knowledge base
  3. Framework best practices - Patterns from surfaced context

Standalone (built-in rules):

  1. [HEURISTIC] General code quality

    • Clear naming conventions (descriptive, consistent)
    • Appropriate code organization and structure
    • Error handling patterns
    • No magic numbers/strings without constants
  2. Security basics — mixed tier (see per-item labels below):

    • [PATTERN] No hardcoded secrets — scan for assignments matching API_KEY, SECRET, PASSWORD, TOKEN, PRIVATE_KEY (case-insensitive) assigned to string literals. Flag any match as ❌ Issue.
    • [HEURISTIC] Input validation on external data
    • [HEURISTIC] Proper error messages (no stack traces in production)
    • [HEURISTIC] Secure defaults
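The hardcoded-secret scan above can be sketched as a regex pass. This is a minimal sketch: the regexes and function name are illustrative, not the definitive rule set.

```python
import re

# Case-insensitive name fragments from the rule above.
SECRET_NAME = re.compile(
    r"(api_key|secret|password|token|private_key)", re.IGNORECASE
)
# Matches e.g.:  API_KEY = "sk-123"   or   db_password: str = 'hunter2'
ASSIGNMENT = re.compile(
    r"""^\s*(?P<name>\w+)\s*(?::\s*\w+\s*)?=\s*["'][^"']+["']"""
)

def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like secret string literals."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = ASSIGNMENT.match(line)
        if m and SECRET_NAME.search(m.group("name")):
            findings.append((lineno, line.strip()))
    return findings
```

Each match is a ❌ Issue per the rule; a real scanner would also want to skip test fixtures and obvious placeholders.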
  3. Language-specific (auto-detected):

    TypeScript/JavaScript:

    • [PATTERN] Type safety — flag if tsconfig.json does not have "strict": true
    • [HEURISTIC] Async/await error handling
    • [PATTERN] No any types — flag unqualified : any annotations as ⚠️ Warning
    • [HEURISTIC] Dependency security (outdated/vulnerable packages)

    Python:

    • [PATTERN] Type hints — flag any def function in public scope (no leading _) that has parameters without type annotations, as ⚠️ Warning
    • [HEURISTIC] Docstrings for modules, classes, functions
    • [PATTERN] Requirements pinning — flag any line in requirements.txt / pyproject.toml dependencies using >=, >, or no version specifier (should use ==), as ❌ Issue

    Go:

    • [PATTERN] No ignored errors — flag any _ = assignments where the right-hand side is a function call returning error, as ❌ Issue
    • [HEURISTIC] Context propagation
    • [HEURISTIC] Proper package structure
  4. AI Agent-specific (if detected):

    • [HEURISTIC] State schema validation
    • [HEURISTIC] Tool error handling
    • [HEURISTIC] Prompt injection considerations
    • [HEURISTIC] Rate limiting awareness

    Agent Observability Patterns:

    1. [PATTERN] Loop Safety

      Apply mechanically. Do not pass a loop because it "looks like it might terminate."

      | Pattern to find | Pass condition | Severity |
      |---|---|---|
      | `while True:` in Python | A `break` statement exists within the same block scope | ⚠️ Warning if absent |
      | `for { }` in Go | A `break` or `return` exists within the block | ⚠️ Warning if absent |
      | `while (true)` in TS/JS | A `break` or `return` exists within the block | ⚠️ Warning if absent |
      | Function calls itself recursively | A non-recursive return path exists (base case), OR a depth/counter parameter is present | ⚠️ Warning if absent |

      [HEURISTIC] Fallback: Unrecognized Loop Patterns

      After applying the pattern table above, also scan for loop-like behavior that may run indefinitely but doesn't match any specific pattern above:

      • Any loop where the termination condition depends entirely on external/runtime state with no timeout or max-iteration guard
      • Generator functions that yield indefinitely without a documented exit condition
      • Event loops or polling loops (e.g. while not done:, while queue:) without timeout parameters
      • Recursive call chains across multiple functions (A calls B, B calls A) without depth tracking

      If found, flag as ⚠️ Warning: "Potential unbounded loop not matching known patterns — verify termination condition manually"
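A bounded rewrite of the flagged `while True:` form might look like the following sketch. The `MAX_ITERATIONS` bound and the queue-draining scenario are invented for illustration.

```python
import queue

MAX_ITERATIONS = 50  # illustrative bound; any explicit cap passes the check

def drain(q: queue.Queue) -> int:
    """Bounded replacement for a `while True:` polling loop."""
    handled = 0
    for _ in range(MAX_ITERATIONS):
        try:
            item = q.get_nowait()
        except queue.Empty:
            break            # explicit exit path, so the pattern check passes
        handled += 1
    return handled
```

The unbounded original (`while True:` with no `break` in scope) would be flagged ⚠️ Warning by the table above; this version passes because both a bound and a `break` are present.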

    2. [PATTERN] Retry Limit Enforcement

      Apply mechanically. Check each decorator or call against the table below. If the required parameter is absent, flag as ❌ Issue regardless of other parameters present.

      Decorator-based (Python):

      | Library/Pattern | Required parameter | Fail condition |
      |---|---|---|
      | `@retry` (tenacity) | `stop=stop_after_attempt(n)` or `stop=stop_after_delay(n)` | `stop=` absent |
      | `@backoff.on_exception` | `max_tries=n` | `max_tries=` absent |

      HTTP client retry configuration (Python):

      | Library/Pattern | Required parameter | Fail condition |
      |---|---|---|
      | `urllib3.Retry(...)` | `total=n` where n > 0 | `total=` absent or `total=0` |
      | `HTTPAdapter(max_retries=Retry(...))` | The `Retry` object must have `total=n` | `total=` absent in the `Retry` object passed to `max_retries=` |
      | `httpx.HTTPTransport(retries=n)` | `retries=n` where n > 0 | `retries=` absent or `retries=0` |

      AWS SDK (Python/boto3):

      | Library/Pattern | Required parameter | Fail condition |
      |---|---|---|
      | `Config(retries={...})` (botocore) | `max_attempts` key with value > 1 | `max_attempts` absent, or `max_attempts: 0` or `max_attempts: 1` (no retries) |

      Note: boto3 clients without any explicit Config(retries=...) use the SDK default (3 attempts, standard mode) — do not flag the absence of retry config as an issue. Only flag when retry config is present but disables retries.

      JavaScript/TypeScript:

      | Library/Pattern | Required parameter | Fail condition |
      |---|---|---|
      | `retry(...)` (async-retry) | `retries: n` in options object | `retries:` absent |
      | `pRetry(...)` (p-retry) | `retries: n` in options object | `retries:` absent |

      Custom retry loops (all languages):

      A while True: / while (true) / for {} block that contains a try/except (or try/catch) with a continue or a re-invocation of the same call is a manual retry loop. Apply the same rule as Loop Safety: a bounded counter must be present.

      | Pattern to find | Pass condition | Fail condition |
      |---|---|---|
      | Loop + try/except + `continue` | An integer counter is declared before the loop and incremented inside it, with a conditional check against a max | No counter present → ❌ Issue |
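A manual retry loop that satisfies the bounded-counter rule could look like this sketch. `MAX_ATTEMPTS` and the backoff values are illustrative.

```python
import time

MAX_ATTEMPTS = 3  # illustrative cap; the counter + check is what the rule requires

def call_with_retry(fn):
    """Manual retry loop that passes the bounded-counter pattern check."""
    attempts = 0                          # counter declared before the loop
    while True:
        try:
            return fn()
        except Exception:
            attempts += 1                 # incremented inside the loop
            if attempts >= MAX_ATTEMPTS:  # conditional check against a max
                raise
            time.sleep(0.01 * attempts)   # simple linear backoff
```

Without the `attempts` counter and its check, the same loop would be flagged ❌ Issue by the table above.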

      [HEURISTIC] Fallback: Unrecognized Retry Patterns

      After applying the pattern tables above, also scan for retry-like behavior that doesn't match any known library or pattern:

      • Any function or decorator containing "retry" in its name not covered by the tables above
      • Any imported module with "retry" in its package name not listed in the tables (e.g. stamina, retry, aiohttp_retry)
      • A loop containing: sleep/delay + exception handling + re-invocation of the same external call, where no counter or max-attempt mechanism is visible
      • Configuration objects with keys like max_retries, retry_count, attempts that may belong to unlisted libraries

      If found, flag as ⚠️ Warning: "Potential retry pattern not matching known libraries — verify retry bounds manually"

    3. [PATTERN] Tool Registry Consistency

      Apply mechanically:

      1. Collect all tool names from definition files. A tool name is defined by:

        • @tool or @function_tool decorator on a function → the function name
        • A dict/object with a "name": or name: key at the top level of a tools file
      2. Collect all tool name references from prompt files (.md, .txt, prompts.py). A reference is any backtick-quoted identifier or string that names a capability the agent is told it can use.

      3. Flag every reference not in the definition list as ❌ Issue (hallucinated tool).

      4. Flag every defined tool not mentioned in any prompt as ⚠️ Warning (undocumented tool).

      5. [HEURISTIC] Tools never bound to LLM — Find where tools are defined (any list, registry, or decorated set of functions intended as agent tools). Then find where the LLM is invoked (the call that sends messages to the model). Check whether the tools are passed to that invocation point. If a tools collection exists but is never connected to the LLM call, flag as ❌ Issue: the LLM has no knowledge of these tools and cannot invoke them — the tool-calling architecture is broken or incomplete.

        The connection can take many forms depending on the framework (e.g. a tools= argument, a bind method, a plugin registration API, an agent constructor parameter). Do not look for any specific method name — reason about whether the defined tools actually reach the LLM invocation. Example of the broken pattern: tools decorated with @tool and collected in ALL_TOOLS, but the LLM call never receives ALL_TOOLS in any form.

      [HEURISTIC] Fallback: Unrecognized Tool Definitions

      After applying the known tool definition patterns above, also scan for tool-like structures that don't match any known format:

      • Any dict/object with both "description" and "parameters" keys that resembles a tool schema but doesn't match the specific patterns listed
      • Functions with structured docstrings that look like tool descriptions (name, parameter list, return description) but lack a @tool decorator
      • Any variable named tools, tool_list, available_tools, functions, or similar containing callable references or schema objects
      • Class-based tool patterns (classes with run(), execute(), or __call__() methods that appear to wrap external capabilities)

      If found, include in the tool registry count and note: "Tool detected via heuristic — format not in known pattern table. Verify this is an intended agent tool."

    4. [PATTERN] Context Size Awareness

      Apply mechanically using the formula: token_estimate = len(file_content_chars) / 4

      | Content | ⚠️ Warning threshold | ❌ Issue threshold |
      |---|---|---|
      | System prompt file | > 4,000 tokens | > 8,000 tokens |
      | Single tool description block | > 500 tokens | > 1,000 tokens |
      | All tool descriptions combined | > 2,000 tokens | > 4,000 tokens |

      Exclude skills/ directories from this check (see Step 2).
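The mechanical check can be sketched in a few lines. The constant and function names are illustrative; the threshold values come from the system-prompt row of the table.

```python
WARN_TOKENS = 4_000   # system-prompt warning threshold
ISSUE_TOKENS = 8_000  # system-prompt issue threshold

def estimate_tokens(text: str) -> int:
    """The chars/4 heuristic used by this check. Approximate by design."""
    return len(text) // 4

def classify_prompt(text: str) -> str:
    """Map a prompt's estimated size to pass / warning / issue."""
    tokens = estimate_tokens(text)
    if tokens > ISSUE_TOKENS:
        return "issue"
    if tokens > WARN_TOKENS:
        return "warning"
    return "pass"
```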

      [HEURISTIC] Fallback: Borderline and Non-Standard Context Sizes

      After applying the mechanical threshold check above, also apply judgment in these cases:

      • When a token estimate falls within 20% of any threshold, flag as ⚠️ Warning: "Token estimate is approximate (chars/4). Actual token count may differ — consider measuring with a tokenizer if close to limit."
      • Dynamically assembled prompts (e.g. f-strings, .format(), template concatenation) where the final size depends on runtime data — flag as ⚠️ Warning if the static template alone is large, since runtime content will add to it
      • Multiple prompt files that are concatenated or chained together — estimate the combined size, not just individual files
      • Prompt files that include or import other files (e.g. {% include %}, {read_file(...)}) — note that the effective size may be larger than the single file

      If found, flag as ⚠️ Warning with an explanation of why the actual context size may differ from the static estimate.

    5. [HEURISTIC] Explicit Tool Listing

      • System prompts should list available tools
      • Tool capabilities should be clearly described

      Detection: Check for tool listing sections (headers like "Available Tools", "You have access to")

For each check, determine:

  • Pass - Code complies with the rule
  • ⚠️ Warning - Potential concern worth reviewing
  • Issue - Clear violation that needs fixing

Step 3b: Agent Pattern Analysis (if AI agent detected)

If the project appears to be an AI agent (LangGraph, CrewAI, AutoGen, LangChain, or custom), perform additional analysis:

Framework Detection:

  • langgraph in imports → LangGraph agent
  • crewai in imports → CrewAI agent
  • autogen in imports → AutoGen agent
  • langchain in imports → LangChain agent
  • Custom patterns → Custom agent framework

Analysis Steps:

  1. Build tool registry

    Scan all tool definition files: tools.py, tools.ts, tools/*.py, tools/*.ts, and any file whose name or content suggests it defines agent tools. Extract tool names using the patterns below. A name found by any pattern counts as a registered tool.

    Python — decorator patterns:

    | Pattern | How to extract name |
    |---|---|
    | `@tool` (LangChain) on a `def` | Function name immediately below the decorator |
    | `@function_tool` (OpenAI Agents SDK) on a `def` | Function name immediately below the decorator |
    | `@tool(name="...")` with explicit name arg | Use the `name=` argument value, not the function name |

    Python — dict/list patterns:

    | Pattern | How to extract name |
    |---|---|
    | `{"type": "function", "function": {"name": "..."}}` (OpenAI function calling) | Value of the `"name"` key inside `"function"` |
    | `{"name": "...", "input_schema": {...}}` (Anthropic tool use) | Value of the top-level `"name"` key |
    | `{"name": "...", "description": "...", "parameters": {...}}` (generic schema) | Value of the top-level `"name"` key |
    | `ToolNode([func1, func2, ...])` (LangGraph) | Each function name in the list — these must already be registered via decorator or schema above |
    | `tools = [func1, func2]` / `TOOLS = [...]` list assigned to a variable | Each identifier in the list — resolve to function names already found by other patterns |

    TypeScript/JavaScript — patterns:

    | Pattern | How to extract name |
    |---|---|
    | `{ type: "function", function: { name: "..." } }` (OpenAI) | Value of `name:` inside `function:` |
    | `tool({ description: "...", parameters: z.object({...}) })` assigned to a `const name =` (Vercel AI SDK) | The const variable name |
    | `new DynamicTool({ name: "...", ... })` (LangChain.js) | Value of `name:` |
    | `zodFunction({ name: "...", ... })` | Value of `name:` |

    After collecting all names, note the total count and source format for the report.

  2. Analyze agent loops

    • Find main execution loops
    • Check for termination conditions (break, return, max iterations)
    • Verify retry limits on all retry mechanisms
  3. Analyze prompts

    • Measure prompt sizes (estimate tokens)
    • Check for tool listings in system prompts
    • Verify tool references against registry
    • Exclude skills/ directories — files under skills/ (e.g. SKILL.md) are skill definitions loaded on demand, not static system prompts. Do not flag them for context size.
  4. Cross-reference

    • Tools in prompts vs registry (flag mismatches)
    • State fields vs usage
    • Config vs implementation
  5. LangGraph graph cycle analysis (only when LangGraph is detected)

    LangGraph agents define control flow as a directed graph of nodes and edges, not while loops. A cycle in the graph is intentional (the agent loops between "agent" and "tools" nodes), but every cycle must have at least one conditional edge that can route to END. A cycle with no reachable END is an infinite loop at the graph level.

    Detection steps:

    a. Find the graph file (graph.py, graph.ts, or file containing StateGraph/MessageGraph)

    b. Build an edge map by scanning for:

    • workflow.add_edge(source, dest) — unconditional edge
    • workflow.add_conditional_edges(source, fn, mapping) — conditional edges; extract all destination values from the mapping dict

    c. Identify cycles: find any node that is reachable from itself by following edges

    d. For each cycle, check if END (or "__end__") is reachable from any node in the cycle via a conditional edge mapping

    e. Flag accordingly:

    | Condition | Severity |
    |---|---|
    | Cycle exists, END reachable via conditional edge | ✅ Pass |
    | Cycle exists, no path to END from any node in cycle | ❌ Issue |
    | Graph has no END node at all | ❌ Issue |
    | Node has no outgoing edges and is not END | ⚠️ Warning (dead-end node) |

    Example — infinite cycle (❌ Issue):

    workflow.add_edge("agent", "tools")
    workflow.add_edge("tools", "agent")  # cycle, but no path to END
    

    Example — cycle with exit (✅ Pass):

    workflow.add_conditional_edges("agent", should_continue, {
        "continue": "tools",
        "end": END          # END is reachable → cycle is safe
    })
    workflow.add_edge("tools", "agent")
    

    Note: The check inspects the static structure of add_edge / add_conditional_edges calls. It does not evaluate the routing function itself (should_continue) — that is runtime behavior. If the mapping dict contains END as a possible destination, the check passes.
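Detection steps c and d reduce to reachability analysis over the edge map. A minimal sketch, assuming the edge map already merges `add_edge` destinations and the destination values of `add_conditional_edges` mappings:

```python
END = "__end__"  # LangGraph's terminal sentinel

def find_unsafe_cycles(edges: dict[str, set[str]]) -> list[str]:
    """Return nodes that sit on a cycle from which END is unreachable."""
    def reachable(start: str) -> set[str]:
        seen: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for dest in edges.get(node, set()):
                if dest not in seen:
                    seen.add(dest)
                    stack.append(dest)
        return seen

    unsafe = []
    for node in edges:
        seen = reachable(node)
        if node in seen and END not in seen:  # on a cycle, no path to END
            unsafe.append(node)
    return sorted(unsafe)
```

Applied to the two examples above: the agent↔tools cycle with no END yields both nodes (❌ Issue), while adding END to the conditional mapping empties the result (✅ Pass).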

    [HEURISTIC] Fallback: Unrecognized Graph and State Machine Patterns

    After applying LangGraph-specific edge parsing above, also scan for graph-like control flow structures in other frameworks or custom implementations:

    • State machine definitions with transition tables or dispatch dicts (e.g. transitions library, custom state_map = {"state_a": handler_a, ...})
    • Custom routing logic where nodes call other nodes based on conditions, forming implicit cycles
    • LangGraph.js patterns using camelCase methods (.addNode(), .addEdge(), .addConditionalEdges()) — apply the same cycle analysis
    • CrewAI, AutoGen, or other multi-agent frameworks where agents hand off to each other in a potentially circular pattern
    • Any adjacency list, dict-of-lists, or graph object that defines node connections with no clear termination path

    If a cycle is detected with no visible termination condition, flag as ⚠️ Warning: "Potential cyclic control flow detected in non-LangGraph graph structure — verify that a termination condition exists"

Example findings:

### ⚠️ Warnings
- Potential infinite loop: `agent/loop.py:45`
  - **Pattern:** `while True:` without visible break condition
  - **Suggestion:** Add explicit max iteration counter: `for i in range(MAX_ITERATIONS):`

- Large system prompt: `prompts/system.md`
  - **Size:** ~6,200 tokens (estimated)
  - **Threshold:** 4,000 tokens (warning)
  - **Risk:** May cause context overflow with long conversations
  - **Suggestion:** Consider splitting into base prompt + dynamic sections

### ❌ Issues
- Missing retry limit: `tools/api_client.py:23`
  - **Pattern:** `@retry` decorator without `stop` parameter
  - **Rule:** All retry mechanisms must have explicit bounds
  - **Fix:** Add `@retry(stop=stop_after_attempt(3))` or use `tenacity.stop_after_attempt(3)`

- Hallucinated tool reference: `prompts/system.md:34`
  - **Reference:** `execute_sql_query`
  - **Available tools:** search_docs, write_file, run_tests
  - **Rule:** Tool references must match registered tools
  - **Fix:** Either add tool definition or remove reference from prompt

Step 4: Generate Report

Output a structured verification report with agent-specific sections when applicable:

# Verification Report

**Project:** [project name or path]
**Date:** [current date]
**Mode:** [Kahuna-enhanced | Standalone]
**Files analyzed:** [count]
**Agent type detected:** [LangGraph | CrewAI | AutoGen | LangChain | Custom | None]

## Summary

✅ X checks passed | ⚠️ Y warnings | ❌ Z issues

### By Category
| Category | Pass | Warn | Issue |
|----------|------|------|-------|
| Code Quality | X | X | X |
| Security | X | X | X |
| Agent Patterns | X | X | X |

## Agent Pattern Analysis

*(Include this section only when Agent type detected ≠ None)*

### Loop Safety
- [x] All retry mechanisms have explicit limits
- [ ] ⚠️ Potential unbounded loop at `[file:line]`
- [ ] ❌ Missing retry limit at `[file:line]`

### Tool Consistency
- [x] Tool registry found: X tools defined
- [ ] ❌ Y hallucinated tool references in prompts
- [ ] ❌ `[H]` Tools defined but never connected to LLM invocation — LLM cannot invoke tools
- [ ] ⚠️ Z tools not documented in system prompt

### Context Management
- [x] System prompt within limits (~X tokens)
- [ ] ⚠️ System prompt exceeds recommended size (~X tokens)
- [x] Tool descriptions within limits

## Findings

> `[P]` = pattern-matched (structurally reliable) · `[H]` = heuristic (best-effort judgment)

### ✅ Passing
- `[P]` [Check name]: [Brief confirmation of compliance]

### ⚠️ Warnings
- `[P|H]` [Check name]: [Description of concern]
  - **Location:** [file:line if applicable]
  - **Suggestion:** [How to address]

### ❌ Issues
- `[P|H]` [Check name]: [Description of violation]
  - **Location:** [file:line]
  - **Rule:** [Which rule this violates]
  - **Fix:** [Specific remediation steps]

## Recommendations

*(Generic recommendations for all projects)*

1. [Priority recommendation based on findings]
2. [Additional improvements]

## Agent-Specific Recommendations

*(Include this section only when Agent type detected ≠ None)*

1. **Loop Safety:** [Add iteration limits / Add retry bounds]
2. **Tool Registry:** [Remove or define hallucinated tools]
3. **Context Management:** [Split large prompts / Add tool documentation]

Step 5: Export Report (Optional)

After presenting the report, ask the user:

Would you like to save this verification report to a file?

If confirmed:

  1. Create the reports directory if it doesn't exist:

    mkdir -p reports/verification
    
  2. Generate filename with actual current timestamp (not zeros):

    reports/verification/YYYY-MM-DD_HH-MM-SS.md
    

    Example: reports/verification/2026-03-04_16-48-21.md

    Use the current time from the system, not placeholder values. The format is: {year}-{month}-{day}_{hour}-{minute}-{second}.md

  3. Save the complete report to that file.
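The filename construction above can be sketched as a small helper. The function name is illustrative; the format string matches the documented pattern.

```python
from datetime import datetime
from pathlib import Path

def report_path(base: str = "reports/verification") -> Path:
    """Build the timestamped report path from the real current time."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return Path(base) / f"{stamp}.md"
```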

Notes

  • Privacy first: All code analysis happens locally. Nothing is sent to external services.
  • Kahuna enhances, not requires: The skill works standalone with built-in rules. Kahuna adds organization-specific knowledge.
  • Be specific: Include file names and line numbers when reporting issues.
  • Explain the "why": Help developers understand why each rule matters.
  • Honor existing configs: Respect project's existing lint rules, .editorconfig, etc.
  • Respect tier discipline: [PATTERN] checks must be applied exactly as specified — do not use judgment to pass something the rule says should fail. [HEURISTIC] checks require judgment — apply them thoughtfully and mark findings clearly so the reader understands the confidence level.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
