cm-mongoose-schema-auditor

Audit Mongoose schema definitions for best practices, performance, and data integrity. Checks schema design, indexing strategy, population efficiency, lean queries, virtual fields, middleware hooks, discriminators, pagination patterns, and connection management. Use when asked to review Mongoose schemas, audit MongoDB models, check Mongoose best practices, optimize query performance, review population chains, or troubleshoot Mongoose issues. Triggers on "mongoose", "mongoose schema", "mongodb model", "mongoose index", "mongoose populate", "mongoose lean", "mongoose middleware", "mongoose audit", "mongoose performance", "mongoose review".

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "cm-mongoose-schema-auditor" with this command: npx skills add charlie-morrison/mongoose-schema-auditor

Mongoose Schema Auditor

Audit Mongoose schema definitions for correctness, performance, and data integrity. Reviews schema design, index strategy, population chains, middleware hooks, virtual fields, discriminators, pagination patterns, and connection management. Acts as a senior MongoDB engineer auditing your Mongoose models for production readiness.

Usage

Invoke this skill when you need to review Mongoose schemas, optimize query performance, or validate data modeling best practices.

Basic invocation:

Audit the Mongoose schemas in /path/to/models/
Review this Mongoose model for best practices
Check Mongoose query performance across the project

Focused analysis:

Audit indexing strategy for all Mongoose models
Review population chains for N+1 risks
Check middleware hooks for side-effect issues
Analyze pagination patterns for cursor vs. offset

The agent reads Mongoose model files, parses schema definitions and query patterns, and produces a comprehensive quality report.

How It Works

Step 1: Discover and Parse Model Files

The agent locates all Mongoose model definitions:

# Find model files
find /path/to/src/ -name "*.model.ts" -o -name "*.model.js" -o -name "*.schema.ts" -o -name "*.schema.js"

# Find files with mongoose.model() calls
grep -rl "mongoose.model\|new Schema\|Schema(\|model(" /path/to/src/ --include="*.ts" --include="*.js"

# Find population patterns
grep -rn "\.populate(" /path/to/src/ --include="*.ts" --include="*.js"

# Find query patterns
grep -rn "\.find(\|\.findOne(\|\.findById(\|\.aggregate(" /path/to/src/ --include="*.ts" --include="*.js"

The agent parses each model file to extract:

  • Schema definitions (fields, types, nested schemas)
  • Index definitions (single, compound, text, TTL, geospatial)
  • Virtual fields (getters, setters, population virtuals)
  • Middleware hooks (pre/post save, validate, remove, find)
  • Static and instance methods
  • Discriminators (inheritance patterns)
  • Plugin usage (mongoose-paginate, mongoose-delete, etc.)
  • Population references (ref, populate paths)

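As an illustration of this extraction step, a small helper can pull ref targets out of a schema source string to seed the population map. This is a hypothetical sketch (the function name and regex are illustrative, not part of the skill); a production auditor would parse the AST rather than use a regex:

```javascript
// Hypothetical sketch: collect ref: "Model" targets from schema source text.
function extractRefs(source) {
  const refs = [];
  const re = /ref:\s*["']([A-Za-z0-9_]+)["']/g;
  let match;
  while ((match = re.exec(source)) !== null) {
    refs.push(match[1]); // captured model name, e.g. "User"
  }
  return refs;
}
```

For `author: { type: Schema.Types.ObjectId, ref: "User" }` this yields ["User"], which the auditor can cross-check against defined model names.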
Step 2: Audit Schema Design

The agent checks each schema's structural design:

Field type analysis:

// GOOD: Well-designed schema
const userSchema = new Schema({
  email: {
    type: String,
    required: [true, "Email is required"],
    unique: true,
    lowercase: true,
    trim: true,
    match: [/^\S+@\S+\.\S+$/, "Invalid email format"],
    index: true,
  },
  name: {
    type: String,
    required: true,
    trim: true,
    minlength: [1, "Name cannot be empty"],
    maxlength: [100, "Name too long"],
  },
  role: {
    type: String,
    enum: {
      values: ["user", "admin", "moderator"],
      message: "{VALUE} is not a valid role",
    },
    default: "user",
  },
  metadata: {
    type: Map,
    of: String,
  },
}, {
  timestamps: true,
  toJSON: { virtuals: true },
  toObject: { virtuals: true },
});

// PROBLEMS the agent detects:
FAIL: Schema "User"
  Field "email" — type: String, no trim or lowercase
  Users can register with " User@EXAMPLE.com " and "user@example.com" as different accounts
  FIX: Add lowercase: true, trim: true

FAIL: Schema "Product"
  Field "price" — type: Number, no min constraint
  Negative prices possible at DB level
  FIX: Add min: [0, "Price cannot be negative"]

FAIL: Schema "Order"
  Field "status" — type: String, no enum constraint
  Any string value accepted — "active", "ACTIVE", "actve" all valid
  FIX: Add enum: ["pending", "processing", "shipped", "delivered", "cancelled"]

WARN: Schema "Post"
  Field "content" — type: String, no maxlength
  RISK: Unbounded string field — single document can grow to 16 MB limit
  FIX: Add maxlength: [50000, "Content exceeds maximum length"]

FAIL: Schema "Comment"
  Deeply nested subdocument array (3 levels deep)
  comments[].replies[].reactions[]
  RISK: Unbounded nested arrays can exceed 16 MB document limit
  RISK: Cannot index fields inside deeply nested arrays efficiently
  FIX: Model replies and reactions as separate collections with references

WARN: Schema "UserProfile"
  Using Mixed type for "preferences" field
  Mixed type disables change detection — must call markModified()
  FIX: Define explicit sub-schema for preferences:
    preferences: { theme: String, language: String, notifications: Boolean }

WARN: Schema "Event"
  No timestamps option
  FIX: Add { timestamps: true } to schema options
  Automatically manages createdAt and updatedAt fields

Schema options audit:

Schema Options Analysis:

  FAIL: Schema "User" — no toJSON transform
    Password hash, internal IDs, and __v exposed in API responses
    FIX: Add transform to toJSON:
      toJSON: {
        transform(doc, ret) {
          delete ret.password;
          delete ret.__v;
          ret.id = ret._id;
          delete ret._id;
        }
      }

  WARN: Schema "Post" — versionKey enabled (__v)
    The __v field consumes storage but is rarely used correctly
    Most apps don't implement optimistic concurrency control
    FIX: If not using OCC: { versionKey: false }
    If using OCC: Implement proper version checking in updates

  WARN: Schema "Product" — strict mode not explicitly set
    Default: strict = true (ignores fields not in schema)
    Silently drops data — hard to debug missing fields
    RECOMMEND: strict: "throw" in development, strict: true in production

  FAIL: Schema "AuditLog" — no collection name specified
    Mongoose pluralizes: "AuditLog" becomes "auditlogs"
    FIX: Explicit collection name: { collection: "audit_logs" }
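
A standalone sketch of the toJSON transform recommended above. It follows Mongoose's (doc, ret) transform signature, with doc unused here since only the plain object is reshaped:

```javascript
// Strips sensitive and internal fields from serialized output and renames
// _id to id, as recommended in the toJSON finding above.
function transform(doc, ret) {
  delete ret.password;
  delete ret.__v;
  ret.id = ret._id;
  delete ret._id;
  return ret;
}
```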

Step 3: Analyze Index Strategy

The agent evaluates index coverage:

Index Analysis:

  Collection "users" (estimated: 500K documents)
    Indexes:
      _id (default)
      email_1 (unique)

    FAIL: No index on "createdAt"
      Query pattern: User.find().sort({ createdAt: -1 }).limit(20)
      Without index: Full collection scan + in-memory sort
      FIX: userSchema.index({ createdAt: -1 })

    FAIL: No compound index for filtered + sorted queries
      Pattern: User.find({ role: "admin" }).sort({ name: 1 })
      FIX: userSchema.index({ role: 1, name: 1 })
      Index must match filter fields first, then sort fields

  Collection "orders"
    Indexes:
      _id (default)
      userId_1

    FAIL: Missing compound index for common dashboard query
      Pattern: Order.find({ userId, status: "active" }).sort({ createdAt: -1 })
      Current: Only userId indexed — scans all user orders for status
      FIX: orderSchema.index({ userId: 1, status: 1, createdAt: -1 })

    WARN: No TTL index on completed orders
      Old completed orders accumulate indefinitely
      FIX: orderSchema.index({ completedAt: 1 }, { expireAfterSeconds: 7776000 })
      Auto-deletes completed orders after 90 days

  Collection "sessions"
    FAIL: No TTL index for session expiry
      Expired sessions accumulate — collection grows unbounded
      FIX: sessionSchema.index({ expiresAt: 1 }, { expireAfterSeconds: 0 })
      MongoDB automatically removes documents when expiresAt passes

  Collection "products"
    WARN: Text index missing for search functionality
      Pattern: Product.find({ $text: { $search: query } })
      FIX: productSchema.index({ name: "text", description: "text" })
      AND: Set weights: { name: 10, description: 5 } for relevance ranking

  Index Count Warning:
    Collection "orders" has 8 indexes
    WARN: Each additional index slows writes and consumes RAM
    Review: Are all 8 indexes actively used?
    Check with: db.orders.aggregate([{ $indexStats: {} }])
    Remove unused indexes to improve write performance
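
The "filter fields first, then sort fields" rule can be sketched as a tiny heuristic check, assuming equality filters only (a real planner also weighs range predicates and sort direction). indexServesQuery is an illustrative helper, not a Mongoose or MongoDB API:

```javascript
// Simplified heuristic: a compound index can serve find(filter).sort(sortField)
// without an in-memory sort when the index lists every equality-filter field
// as a prefix, followed immediately by the sort field.
function indexServesQuery(indexFields, filterFields, sortField) {
  const prefix = indexFields.slice(0, filterFields.length);
  const filterCovered = filterFields.every(f => prefix.includes(f));
  const sortNext = indexFields[filterFields.length] === sortField;
  return filterCovered && sortNext;
}
```

So { userId: 1, status: 1, createdAt: -1 } serves find({ userId, status }).sort({ createdAt: -1 }), while an index on userId alone does not.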

Step 4: Review Population Patterns

The agent audits .populate() usage for performance:

Population Analysis:

  FAIL: Deep population chain detected
    Order.findById(id)
      .populate("user")
      .populate("items.product")
      .populate("items.product.category")
      .populate("items.product.reviews")
      .populate("shippingAddress")

    This generates 5+ additional MongoDB queries per request
    For a list of 20 orders: 100+ database round trips
    FIX: Use aggregation pipeline with $lookup for controlled joins:
      Order.aggregate([
        { $match: { _id: ObjectId(id) } },
        { $lookup: { from: "users", localField: "user", foreignField: "_id", as: "user" } },
        { $unwind: "$user" },
      ])
    OR: Denormalize frequently accessed fields into the order document

  FAIL: Population inside a loop
    const orders = await Order.find({ userId });
    for (const order of orders) {
      await order.populate("items.product");  // N+1 query!
    }
    FIX: Populate in the initial query:
      Order.find({ userId }).populate("items.product")

  WARN: Population without field selection
    User.findById(id).populate("posts")
    Loads ALL fields of ALL posts — potentially megabytes of data
    FIX: Select only needed fields:
      .populate({ path: "posts", select: "title createdAt -_id" })
    AND: Limit results:
      .populate({ path: "posts", options: { limit: 10, sort: { createdAt: -1 } } })

  WARN: Circular population possible
    User -> posts (populate) -> author (populate) -> posts ...
    RISK: No built-in depth limit — can cause infinite recursion
    FIX: Cap populate depth explicitly (e.g. the mongoose-autopopulate plugin's maxDepth option) or use field selection to break the cycle

  FAIL: Population on field without ref
    commentSchema: { author: { type: String } }
    Code: Comment.find().populate("author")
    RISK: Silently fails — returns null for author without error
    FIX: Add ref: author: { type: Schema.Types.ObjectId, ref: "User" }
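
The round-trip cost of the loop pattern can be demonstrated with a mock data layer that counts queries. Here findProductById and findProductsByIds are stand-ins for per-document populate() versus one batched $in lookup; none of these names are Mongoose APIs:

```javascript
// Mock data layer that counts round trips, to make the N+1 cost concrete.
function makeMockDb(products) {
  return {
    queries: 0,
    findProductById(id) { this.queries++; return products.get(id); },
    findProductsByIds(ids) { this.queries++; return ids.map(id => products.get(id)); },
  };
}

// N+1 shape: one query per order.
function populateInLoop(db, orders) {
  return orders.map(o => ({ ...o, product: db.findProductById(o.productId) }));
}

// Batched shape: a single query for all orders.
function populateBatched(db, orders) {
  const products = db.findProductsByIds(orders.map(o => o.productId));
  return orders.map((o, i) => ({ ...o, product: products[i] }));
}
```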

Step 5: Audit Middleware Hooks

The agent checks middleware for correctness and performance:

Middleware Analysis:

  FAIL: Pre-save hook with async operation and no error handling
    userSchema.pre("save", async function() {
      this.password = await bcrypt.hash(this.password, 10);
    });
    PROBLEMS:
      1. Hashes password on EVERY save, not just when changed
      2. No error handling — bcrypt failure silently corrupts data
    FIX:
      userSchema.pre("save", async function() {
        if (!this.isModified("password")) return;
        try {
          this.password = await bcrypt.hash(this.password, 12);
        } catch (err) {
          throw new Error("Password hashing failed: " + err.message);
        }
      });

  FAIL: Pre-save hook not triggered by findOneAndUpdate
    userSchema.pre("save", function() { this.updatedAt = new Date(); });
    But code uses: User.findOneAndUpdate(filter, update)
    findOneAndUpdate BYPASSES save middleware
    FIX: Add pre-findOneAndUpdate hook:
      userSchema.pre("findOneAndUpdate", function() {
        this.set({ updatedAt: new Date() });
      });
    OR: Use timestamps: true in schema options (handles both)

  WARN: Post-remove hook for cleanup but using deleteMany
    productSchema.post("remove", async function() {
      await Review.deleteMany({ productId: this._id });
    });
    But code uses: Product.deleteMany({ category: "old" })
    deleteMany does NOT trigger document middleware
    FIX: Use query middleware:
      productSchema.pre("deleteMany", async function() {
        const products = await this.model.find(this.getFilter());
        const productIds = products.map(p => p._id);
        await Review.deleteMany({ productId: { $in: productIds } });
      });

  WARN: Heavy computation in pre-find middleware
    postSchema.pre("find", function() {
      // Runs on EVERY find query — adds latency
      this.where({ deletedAt: null }); // soft delete filter
    });
    PASS: Logic is correct (soft delete pattern)
    WARN: Consider using mongoose-delete plugin for consistent soft delete
    OR: Use a discriminator pattern for archived documents

  FAIL: Middleware execution order dependency
    orderSchema.pre("save", calculateTotal);    // Depends on items
    orderSchema.pre("save", validateInventory); // Modifies items
    RISK: calculateTotal runs before validateInventory — stale data
    FIX: Ensure correct order or combine into single middleware
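
The isModified guard from the first finding can be simulated without a database. fakeHash stands in for bcrypt.hash (kept synchronous for illustration) and the modifiedPaths array for Mongoose's change tracking; both are assumptions of this sketch:

```javascript
// Stand-in for bcrypt.hash.
function fakeHash(pw) {
  return "hashed:" + pw;
}

// Hash only when the password field actually changed; otherwise an already
// hashed value would be hashed again on every save.
function preSaveHashPassword(doc) {
  if (!doc.modifiedPaths.includes("password")) return doc;
  doc.password = fakeHash(doc.password);
  return doc;
}
```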

Step 6: Check Query Patterns

The agent reviews query efficiency:

Query Pattern Analysis:

  FAIL: No .lean() on read-only queries
    const users = await User.find({ role: "admin" });
    Returns full Mongoose documents with change tracking overhead
    For API responses, lean() is 2-5x faster:
    FIX: const users = await User.find({ role: "admin" }).lean();
    CAUTION: lean() documents don't have methods, virtuals, or middleware

  FAIL: Fetching all documents without pagination
    const allProducts = await Product.find({});
    RISK: Collection with 100K+ documents — response is 100+ MB
    RISK: MongoDB cursor timeout, Node.js memory exhaustion
    FIX: Always paginate:
      Product.find({}).skip(page * limit).limit(limit)
    BETTER: Cursor-based pagination:
      Product.find({ _id: { $gt: lastId } }).limit(limit).sort({ _id: 1 })

  FAIL: Using skip() for deep pagination
    Product.find().skip(10000).limit(20)
    MongoDB must iterate through 10,000 documents to skip them
    Performance degrades linearly with page number
    FIX: Use cursor-based pagination:
      Product.find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(20)

  WARN: Select projection missing on large documents
    const post = await Post.findById(id);
    Post has "content" field (10-50 KB per document)
    If only showing title and date in a list, content wastes bandwidth
    FIX: Post.findById(id).select("title author createdAt")

  FAIL: Using find() + filter in JavaScript instead of MongoDB query
    const users = await User.find({});
    const admins = users.filter(u => u.role === "admin");
    Fetches ALL users, filters in Node.js — wastes bandwidth and memory
    FIX: const admins = await User.find({ role: "admin" });

  WARN: Aggregation without allowDiskUse for large datasets
    Order.aggregate([
      { $group: { _id: "$userId", total: { $sum: "$amount" } } },
    ]);
    If results exceed 100 MB RAM limit, aggregation fails
    FIX: Order.aggregate([...]).allowDiskUse(true)
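
Cursor-based pagination can be illustrated in memory: instead of skipping n documents, filter on _id greater than the last id seen, so each page is an index seek rather than a scan. cursorPage is an illustrative helper using numeric ids in place of ObjectIds:

```javascript
// Returns the next page after lastId (pass null for the first page), assuming
// documents are keyed by a monotonically increasing _id.
function cursorPage(docs, lastId, limit) {
  return docs
    .filter(d => lastId === null || d._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, limit);
}
```

The caller threads the last _id of each page into the next request, which is why cursor pagination stays fast at any depth.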

Step 7: Review Connection Management

The agent audits connection configuration:

Connection Analysis:

  FAIL: No connection options configured
    mongoose.connect("mongodb://localhost:27017/myapp");
    Using all defaults — not production-ready
    FIX: mongoose.connect(uri, {
      maxPoolSize: 10,
      minPoolSize: 2,
      socketTimeoutMS: 45000,
      serverSelectionTimeoutMS: 5000,
      heartbeatFrequencyMS: 10000,
      retryWrites: true,
      w: "majority",
      readPreference: "secondaryPreferred", // note: secondary reads can be stale
    });

  FAIL: No connection error handling
    mongoose.connect(uri); // No .catch() or error event handler
    FIX: Handle connection events:
      mongoose.connection.on("error", (err) => { logger.error(err); });
      mongoose.connection.on("disconnected", () => { logger.warn("Disconnected"); });
      mongoose.connection.on("reconnected", () => { logger.info("Reconnected"); });

  FAIL: Multiple mongoose.connect() calls
    Found in: app.ts (line 15), test-setup.ts (line 8), seed.ts (line 5)
    RISK: Connection pool exhaustion, race conditions
    FIX: Single connection module imported everywhere:
      // db.ts
      // mongoose.connect() resolves to the Mongoose instance, not a Connection
      let conn: typeof mongoose | null = null;
      export async function getConnection() {
        if (!conn) conn = await mongoose.connect(uri, opts);
        return conn.connection;
      }

  WARN: No graceful shutdown handler
    Application exits without closing MongoDB connection
    RISK: Pending operations lost, connection pool leak
    FIX: process.on("SIGTERM", async () => {
      await mongoose.connection.close();
      process.exit(0);
    });
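
The single-connection fix boils down to a lazy singleton. This dependency-free sketch shows the shape, with connectFn standing in for mongoose.connect:

```javascript
// Every caller shares one connection: connectFn runs at most once, and later
// calls return the cached result instead of opening a new pool.
function makeConnectionProvider(connectFn) {
  let conn = null;
  return function getConnection() {
    if (!conn) conn = connectFn();
    return conn;
  };
}
```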

Step 8: Audit Discriminator Usage

The agent checks schema inheritance patterns:

Discriminator Analysis:

  Model "Event" uses discriminators:
    ClickEvent (discriminatorKey: "type")
    PurchaseEvent (discriminatorKey: "type")
    SignupEvent (discriminatorKey: "type")

  PASS: Discriminator key matches query pattern
    Event.find({ type: "click" }) — uses discriminator for efficient queries

  WARN: No index on discriminator key
    Queries filtering by event type scan entire collection
    FIX: eventSchema.index({ type: 1, createdAt: -1 })

  FAIL: Base schema has fields only used by one discriminator
    "paymentMethod" field on base Event schema — only used by PurchaseEvent
    RISK: Wasted storage on 90% of documents, confusing API
    FIX: Move paymentMethod to PurchaseEvent discriminator schema

  WARN: Discriminator without base schema validation
    PurchaseEvent allows fields from ClickEvent (no strict separation)
    FIX: Use strict: "throw" on discriminator schemas during development

Step 9: Check Data Integrity Patterns

The agent reviews data consistency approaches:

Data Integrity Analysis:

  FAIL: No unique compound index for business rules
    Orders have "userId" + "productId" + "status" uniqueness requirement
    (user can't have two active orders for same product)
    Database allows duplicates — enforced only in application code
    FIX: orderSchema.index(
      { userId: 1, productId: 1, status: 1 },
      { unique: true, partialFilterExpression: { status: "active" } }
    )

  FAIL: Reference integrity not enforced
    Order.userId references User — but no validation that user exists
    FIX: Add pre-save validation:
      orderSchema.pre("save", async function() {
        const userExists = await mongoose.model("User").exists({ _id: this.userId });
        if (!userExists) throw new Error("Referenced user does not exist");
      });

  WARN: No schema validation on update operations
    Schema validators only run on save() by default
    findOneAndUpdate bypasses required field validation
    FIX: Set runValidators globally:
      mongoose.set("runValidators", true);
    OR: Per-query: { runValidators: true } option

  FAIL: Atomic counter without $inc
    const user = await User.findById(id);
    user.loginCount += 1;
    await user.save();
    RISK: Race condition — two concurrent logins both read same count
    FIX: User.findByIdAndUpdate(id, { $inc: { loginCount: 1 } })
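
The lost-update risk can be shown with a synchronous simulation: both requests read loginCount before either writes, so one increment disappears. atomicInc models a server-side $inc, which applies a delta rather than writing back a stale absolute value:

```javascript
// Read-modify-write: the request captures a stale snapshot, then writes it back.
function readModifyWrite(doc) {
  const read = doc.loginCount; // stale snapshot held by this request
  return () => { doc.loginCount = read + 1; };
}

// Models MongoDB applying $inc server-side: a delta, so no update is lost.
function atomicInc(doc, delta) {
  doc.loginCount += delta;
}
```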

Step 10: Produce the Analysis Report

The agent generates a comprehensive report:

# Mongoose Schema Audit Report
# Models Path: /src/models/ | Date: April 30, 2026

## Overview
  Models: 12
  Total fields: 94
  Indexes defined: 18
  Middleware hooks: 9
  Population refs: 15
  Discriminators: 3

## Overall Health Score: 55/100

## Category Scores
  Schema Design:         6/10  (missing validators, Mixed types)
  Index Strategy:        4/10  (missing compounds, no TTL indexes)
  Population Patterns:   4/10  (deep chains, N+1, no field selection)
  Middleware:            5/10  (bypass risks, order dependency)
  Query Efficiency:      4/10  (no lean, no pagination, full scans)
  Connection Mgmt:       5/10  (no options, no error handling)
  Data Integrity:        5/10  (no compound unique, no ref validation)
  Discriminators:        7/10  (correct usage, missing index)
  Schema Organization:   6/10  (good file structure, missing docs)

## Critical Issues
  1. No .lean() on 80% of read queries — 2-5x performance penalty
  2. Deep population chain (5 levels) on order detail endpoint
  3. find() without pagination on products collection (100K+ docs)
  4. skip()-based pagination on high-traffic listing endpoint
  5. No connection error handling — silent disconnection

## Recommendations Summary
  Estimated effort: 3-5 days for critical + high priority fixes
  Expected improvement: 55 -> 82 health score
  Risk reduction: Eliminates N+1 queries and memory exhaustion

Output

The agent produces:

  • Health score: 0-100 overall schema quality rating
  • Category scores: granular ratings for each quality dimension
  • Critical issues: problems that pose performance or data risk
  • Per-model analysis: field, index, and middleware audit
  • Population map: visual representation of reference chains
  • Query efficiency review: analysis of query patterns with optimizations
  • Index recommendations: specific compound indexes for detected patterns
  • Remediation code: exact Mongoose code to fix each issue
  • Priority matrix: issues ranked by risk and effort

Scope Options

Scope            What It Covers
Full (default)   All models, queries, and connections
Single model     Deep analysis of one schema definition
Index audit      Index coverage against detected query patterns
Population       Population chain analysis and optimization
Middleware       Hook correctness and bypass risk
Changed          Only model files changed in current git branch

Tips for Best Results

  • Point the agent at both model files and query/service files for full coverage
  • Include controller or route files so the agent can trace query patterns
  • Share MongoDB explain() output for query plan verification
  • Run db.collection.stats() and share output for size-aware recommendations
  • For TypeScript projects, include type definition files for type safety review
  • Combine with MongoDB Atlas Performance Advisor data for production-informed optimization

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
