Code Reviewer

Overview

Perform thorough, structured code reviews that catch real issues and provide actionable feedback. Every finding is categorized by type, assigned a severity level, and paired with a concrete fix suggestion. The goal is a review that a developer can act on immediately — not vague observations.

When to use

When reviewing a pull request or merge request
When a user asks for feedback on code they've written
When examining a diff or changeset for issues
When auditing code for security, performance, or quality
When asked to "review this code" or "check this for bugs"

Do NOT use when:

The user wants code written from scratch (use a code generation skill)
The user wants a refactoring plan (use a refactoring skill)
The user only wants formatting/linting (suggest an automated tool instead)

Severity Classification

Every finding MUST have one of these severity levels:

Severity	Meaning	Examples
critical	Will cause data loss, security breach, or system failure	SQL injection, unvalidated auth, data corruption
high	Likely to cause bugs in production or significant issues	Race conditions, missing error handling, N+1 queries
medium	Code smell or moderate risk that should be addressed	Magic numbers, overly complex logic, missing validation
low	Minor improvement that would make code better	Naming conventions, minor style issues, documentation gaps
info	Observation or suggestion, not a problem	Alternative approaches, FYI notes, praise for good patterns

Category Definitions

Every finding MUST be assigned one category:

Category	What it covers
bug	Logic errors, off-by-one, null derefs, incorrect conditions
security	Injection, auth bypass, data exposure, insecure defaults
performance	N+1 queries, unnecessary allocations, missing caching, O(n^2)
style	Naming, formatting, idiomatic usage, readability
maintainability	Complexity, coupling, missing abstractions, testability
testing	Missing tests, untested edge cases, test quality

Workflow

1. Understand Context

Before reviewing line-by-line, understand the big picture:

What does this code do? Read the PR description, commit messages, or ask.
What language/framework? Adjust expectations to the ecosystem's conventions.
What's the scope? A 10-line utility function gets different scrutiny than a 500-line auth module.
Are there tests? Note their presence or absence early.

Output: A 1-2 sentence summary of what the code is doing and its context.

2. Scan for Bugs and Logic Errors

Read through the code looking for correctness issues:

Off-by-one errors in loops and array access
Null/undefined dereferences
Incorrect boolean logic or operator precedence
Missing return statements or unreachable code
Race conditions in concurrent code
Incorrect error propagation

For each bug found, note the exact line, what's wrong, and what the fix is.

3. Check Security

Systematically check for common vulnerability classes:

Injection: SQL, command, XSS, template injection — any place user input reaches a sink
Authentication/Authorization: Missing auth checks, privilege escalation paths
Data exposure: Secrets in code, verbose error messages, logging sensitive data
Insecure defaults: Disabled TLS verification, permissive CORS, weak crypto
Input validation: Missing or insufficient validation of external input

Security findings are almost always critical or high severity.

4. Evaluate Performance

Look for patterns that will cause performance problems at scale:

N+1 queries: Database calls inside loops
Unbounded operations: Missing pagination, loading entire tables
Unnecessary work: Redundant computations, repeated parsing
Missing caching: Expensive operations that could be cached
Algorithmic complexity: O(n^2) or worse where O(n) or O(n log n) is possible
Resource leaks: Unclosed connections, file handles, streams

5. Assess Style and Readability

Check that the code is clear and follows conventions:

Consistent naming (camelCase, snake_case — match the project)
Functions and variables have descriptive names
No deeply nested conditionals (> 3 levels)
Comments explain "why", not "what"
Dead code or commented-out code removed
Consistent error handling patterns

Style findings are typically low or info severity.

6. Check Maintainability

Evaluate long-term code health:

Function length: Functions over 50 lines likely need extraction
Cyclomatic complexity: Too many branches = hard to test and reason about
Magic numbers/strings: Unnamed constants scattered through code
Coupling: Does this code depend on too many other modules?
Testability: Can this code be unit tested without mocking the world?
Duplication: Copy-pasted logic that should be extracted

7. Produce Structured Report

Compile all findings into the output format below. Order findings by severity (critical first, info last). Include a summary section with overall assessment.

Output Format

Structure every review as follows:

## Code Review Summary

**Context**: [1-2 sentence summary of what the code does] **Overall
Assessment**: [PASS | PASS WITH CONCERNS | NEEDS CHANGES | REJECT] **Findings**:
[N critical, N high, N medium, N low, N info]

## Findings

### [SEVERITY] [CATEGORY]: [Brief title]

**Location**: [file:line or description of where] **Issue**: [What's wrong and
why it matters] **Suggestion**: [Specific code or approach to fix it]

```diff
- problematic code
+ suggested fix
```

---

[Repeat for each finding, ordered by severity]

## Recommendations

[2-3 sentences on the most important things to address before merging]

Also produce a structured findings.json:

{
  "summary": {
    "context": "Brief description",
    "overall_assessment": "NEEDS CHANGES",
    "total_findings": 5,
    "by_severity": {
      "critical": 1,
      "high": 1,
      "medium": 2,
      "low": 1,
      "info": 0
    },
    "by_category": {
      "security": 1,
      "bug": 1,
      "performance": 1,
      "style": 1,
      "maintainability": 1
    }
  },
  "findings": [
    {
      "severity": "critical",
      "category": "security",
      "title": "SQL injection via unsanitized user input",
      "location": "app/db.py:42",
      "issue": "User input is concatenated directly into SQL query string",
      "suggestion": "Use parameterized queries instead of string concatenation",
      "fix_diff": "- cursor.execute(f\"SELECT * FROM users WHERE id = {user_id}\")\n+ cursor.execute(\"SELECT * FROM users WHERE id = %s\", (user_id,))"
    }
  ]
}

Example

Input code to review:

def get_user_orders(db, user_id):
    query = "SELECT * FROM orders WHERE user_id = " + user_id
    results = db.execute(query)
    orders = []
    for row in results:
        product = db.execute(f"SELECT name FROM products WHERE id = {row['product_id']}")
        orders.append({"order": row, "product": product[0]["name"]})
    return orders

Review output:

Code Review Summary

Context: Function retrieves a user's orders with product names from a database. Overall Assessment: NEEDS CHANGES Findings: 1 critical, 1 high, 1 medium, 0 low, 0 info

Findings

CRITICAL SECURITY: SQL injection via string concatenation

Location: Line 2 Issue: user_id is concatenated directly into the SQL query. An attacker can inject arbitrary SQL (e.g., 1 OR 1=1; DROP TABLE orders). This is the #1 web application vulnerability (OWASP A03:2021). Suggestion: Use parameterized queries.

- query = "SELECT * FROM orders WHERE user_id = " + user_id
+ query = "SELECT * FROM orders WHERE user_id = %s"
+ results = db.execute(query, (user_id,))

HIGH PERFORMANCE: N+1 query pattern

Location: Lines 4-6 Issue: For each order row, a separate database query fetches the product name. With 100 orders, this executes 101 queries instead of 2. This will degrade linearly with data volume. Suggestion: Use a JOIN or batch query.

- for row in results:
-     product = db.execute(f"SELECT name FROM products WHERE id = {row['product_id']}")
-     orders.append({"order": row, "product": product[0]["name"]})
+ query = """
+     SELECT o.*, p.name as product_name
+     FROM orders o JOIN products p ON o.product_id = p.id
+     WHERE o.user_id = %s
+ """
+ results = db.execute(query, (user_id,))
+ orders = [{"order": row, "product": row["product_name"]} for row in results]

MEDIUM BUG: No error handling for empty results

Location: Line 7 Issue: If product query returns empty results, product[0]["name"] will raise an IndexError. This is likely when products are deleted but orders remain. Suggestion: Add a guard for empty results.

- orders.append({"order": row, "product": product[0]["name"]})
+ product_name = product[0]["name"] if product else "Unknown Product"
+ orders.append({"order": row, "product": product_name})

Recommendations

Fix the SQL injection immediately — it's a critical security vulnerability. Then refactor the N+1 query pattern with a JOIN, which will also eliminate the empty result bug. Add parameterized queries throughout.

Common Mistakes

Mistake	Fix
Vague findings like "this could be improved"	Always state WHAT is wrong, WHY it matters, and HOW to fix it
Missing severity levels	Every finding needs critical/high/medium/low/info
Reviewing only for style, ignoring security	Follow the full workflow: bugs → security → performance → style → maintainability
Suggesting fixes without showing code	Include diff-style code suggestions for non-trivial fixes
Treating all issues as equal priority	Order by severity; developers should fix critical/high first
Reviewing code you don't understand	Ask for context first; state assumptions explicitly
Nitpicking style in code with critical bugs	Focus on what matters most; don't bury critical findings in style noise

Key Principles

Be specific, not vague — "Line 42 has an SQL injection because user_id is concatenated into the query" beats "watch out for security issues." Every finding must point to a specific location and explain the concrete problem.
Always suggest a complete fix — Identifying a problem without a solution is only half the job. Include code diffs or clear instructions for every finding above info severity. A fix must be concrete enough that a developer can implement it without guessing — show the actual code change, not just "add error handling." If the fix requires imports, setup, or configuration changes, include those too.
Explain the why — Developers learn and prioritize better when they understand consequences. "This N+1 pattern will execute 1001 queries for 1000 rows" is more compelling than "this is an N+1 query."
Prioritize ruthlessly — A review with 3 critical findings and 2 high findings is more useful than one with 47 low-severity style nits. Lead with what matters.
Categorize consistently — Use the severity and category system for every finding. This lets developers filter, triage, and track review feedback systematically.

code-reviewer

Safety Notice

Copy this and send it to your AI assistant to learn