Mongoose Schema Auditor
Audit Mongoose schema definitions for correctness, performance, and data integrity. Reviews schema design, index strategy, population chains, middleware hooks, virtual fields, discriminators, pagination patterns, and connection management. Acts as a senior MongoDB engineer auditing your Mongoose models for production readiness.
Usage
Invoke this skill when you need to review Mongoose schemas, optimize query performance, or validate data modeling best practices.
Basic invocation:
"Audit the Mongoose schemas in /path/to/models/"
"Review this Mongoose model for best practices"
"Check Mongoose query performance across the project"
Focused analysis:
"Audit indexing strategy for all Mongoose models"
"Review population chains for N+1 risks"
"Check middleware hooks for side-effect issues"
"Analyze pagination patterns for cursor vs. offset"
The agent reads Mongoose model files, parses schema definitions and query patterns, and produces a comprehensive quality report.
How It Works
Step 1: Discover and Parse Model Files
The agent locates all Mongoose model definitions:
# Find model files
find /path/to/src/ -name "*.model.ts" -o -name "*.model.js" -o -name "*.schema.ts" -o -name "*.schema.js"
# Find files with mongoose.model() calls
grep -rl "mongoose.model\|new Schema\|Schema(\|model(" /path/to/src/ --include="*.ts" --include="*.js"
# Find population patterns
grep -rn "\.populate(" /path/to/src/ --include="*.ts" --include="*.js"
# Find query patterns
grep -rn "\.find(\|\.findOne(\|\.findById(\|\.aggregate(" /path/to/src/ --include="*.ts" --include="*.js"
The agent parses each model file to extract:
- Schema definitions (fields, types, nested schemas)
- Index definitions (single, compound, text, TTL, geospatial)
- Virtual fields (getters, setters, population virtuals)
- Middleware hooks (pre/post save, validate, remove, find)
- Static and instance methods
- Discriminators (inheritance patterns)
- Plugin usage (mongoose-paginate, mongoose-delete, etc.)
- Population references (ref, populate paths)
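The discovery heuristic above can be sketched as a small predicate. This is an illustration, not the agent's actual parser — the regexes simply mirror the grep patterns, applied to a file's source already read into a string:

```javascript
// Heuristic: does a source file define or use a Mongoose model?
// Mirrors the grep patterns used in the discovery step.
const MODEL_PATTERNS = [
  /mongoose\.model\s*\(/, // mongoose.model("User", schema)
  /new\s+Schema\s*\(/,    // new Schema({ ... })
  /\.populate\s*\(/,      // population call sites
];

function looksLikeModelFile(source) {
  return MODEL_PATTERNS.some((re) => re.test(source));
}
```

A real implementation would follow this with an AST parse to extract fields and indexes; the regex pass only narrows the file set.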
Step 2: Audit Schema Design
The agent checks each schema's structural design:
Field type analysis:
// GOOD: Well-designed schema
const userSchema = new Schema({
email: {
type: String,
required: [true, "Email is required"],
unique: true,
lowercase: true,
trim: true,
match: [/^\S+@\S+\.\S+$/, "Invalid email format"],
index: true,
},
name: {
type: String,
required: true,
trim: true,
minlength: [1, "Name cannot be empty"],
maxlength: [100, "Name too long"],
},
role: {
type: String,
enum: {
values: ["user", "admin", "moderator"],
message: "{VALUE} is not a valid role",
},
default: "user",
},
metadata: {
type: Map,
of: String,
},
}, {
timestamps: true,
toJSON: { virtuals: true },
toObject: { virtuals: true },
});
// PROBLEMS the agent detects:
FAIL: Schema "User"
Field "email" — type: String, no trim or lowercase
Users can register with " User@EXAMPLE.com " and "user@example.com" as different accounts
FIX: Add lowercase: true, trim: true
FAIL: Schema "Product"
Field "price" — type: Number, no min constraint
Negative prices possible at DB level
FIX: Add min: [0, "Price cannot be negative"]
FAIL: Schema "Order"
Field "status" — type: String, no enum constraint
Any string value accepted — "active", "ACTIVE", "actve" all valid
FIX: Add enum: ["pending", "processing", "shipped", "delivered", "cancelled"]
WARN: Schema "Post"
Field "content" — type: String, no maxlength
RISK: Unbounded string field — single document can grow to 16 MB limit
FIX: Add maxlength: [50000, "Content exceeds maximum length"]
FAIL: Schema "Comment"
Deeply nested subdocument array (3 levels deep)
comments[].replies[].reactions[]
RISK: Unbounded nested arrays can exceed 16 MB document limit
RISK: Cannot index fields inside deeply nested arrays efficiently
FIX: Model replies and reactions as separate collections with references
WARN: Schema "UserProfile"
Using Mixed type for "preferences" field
Mixed type disables change detection — must call markModified()
FIX: Define explicit sub-schema for preferences:
preferences: { theme: String, language: String, notifications: Boolean }
WARN: Schema "Event"
No timestamps option
FIX: Add { timestamps: true } to schema options
Automatically manages createdAt and updatedAt fields
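The email normalization fix above can be shown as a standalone helper — a sketch of what `lowercase: true`, `trim: true`, and the `match` validator do together, using the same regex as the schema example:

```javascript
// Normalize an email the way `lowercase: true` and `trim: true` would
// before validation, then apply the schema's format check.
const EMAIL_RE = /^\S+@\S+\.\S+$/;

function normalizeEmail(raw) {
  const email = raw.trim().toLowerCase();
  if (!EMAIL_RE.test(email)) {
    throw new Error("Invalid email format");
  }
  return email;
}
```

With this normalization in the schema, " User@EXAMPLE.com " and "user@example.com" collapse to the same stored value, so the unique index actually prevents duplicate accounts.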
Schema options audit:
Schema Options Analysis:
FAIL: Schema "User" — no toJSON transform
Password hash, internal IDs, and __v exposed in API responses
FIX: Add transform to toJSON:
toJSON: {
transform(doc, ret) {
delete ret.password;
delete ret.__v;
ret.id = ret._id;
delete ret._id;
}
}
WARN: Schema "Post" — versionKey enabled (__v)
__v field consumes storage but rarely used correctly
Most apps don't implement optimistic concurrency control
FIX: If not using OCC: { versionKey: false }
If using OCC: Implement proper version checking in updates
WARN: Schema "Product" — strict mode not explicitly set
Default: strict = true (ignores fields not in schema)
Silently drops data — hard to debug missing fields
RECOMMEND: strict: "throw" in development, strict: true in production
FAIL: Schema "AuditLog" — no collection name specified
Mongoose pluralizes: "AuditLog" becomes "auditlogs"
FIX: Explicit collection name: { collection: "audit_logs" }
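The toJSON transform recommended above is easiest to reason about as a pure function over the serialized object. A sketch — the field names (`password`, `__v`) match the example fix, but adjust them to your schema:

```javascript
// Pure version of the toJSON transform: strips secrets and Mongoose
// bookkeeping fields, and renames _id to id for API consumers.
function sanitize(ret) {
  const out = { ...ret };
  delete out.password;
  delete out.__v;
  out.id = out._id;
  delete out._id;
  return out;
}
```

Wiring it into the schema is then `toJSON: { transform: (doc, ret) => sanitize(ret) }`, keeping the sanitization logic independently testable.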
Step 3: Analyze Index Strategy
The agent evaluates index coverage:
Index Analysis:
Collection "users" (estimated: 500K documents)
Indexes:
_id (default)
email_1 (unique)
FAIL: No index on "createdAt"
Query pattern: User.find().sort({ createdAt: -1 }).limit(20)
Without index: Full collection scan + in-memory sort
FIX: userSchema.index({ createdAt: -1 })
FAIL: No compound index for filtered + sorted queries
Pattern: User.find({ role: "admin" }).sort({ name: 1 })
FIX: userSchema.index({ role: 1, name: 1 })
Index must match filter fields first, then sort fields
Collection "orders"
Indexes:
_id (default)
userId_1
FAIL: Missing compound index for common dashboard query
Pattern: Order.find({ userId, status: "active" }).sort({ createdAt: -1 })
Current: Only userId indexed — scans all user orders for status
FIX: orderSchema.index({ userId: 1, status: 1, createdAt: -1 })
WARN: No TTL index on completed orders
Old completed orders accumulate indefinitely
FIX: orderSchema.index({ completedAt: 1 }, { expireAfterSeconds: 7776000 })
Auto-deletes completed orders after 90 days
Collection "sessions"
FAIL: No TTL index for session expiry
Expired sessions accumulate — collection grows unbounded
FIX: sessionSchema.index({ expiresAt: 1 }, { expireAfterSeconds: 0 })
MongoDB automatically removes documents when expiresAt passes
Collection "products"
WARN: Text index missing for search functionality
Pattern: Product.find({ $text: { $search: query } })
FIX: productSchema.index({ name: "text", description: "text" })
AND: Set weights: { name: 10, description: 5 } for relevance ranking
Index Count Warning:
Collection "orders" has 8 indexes
WARN: Each additional index slows writes and consumes RAM
Review: Are all 8 indexes actively used?
Check with: db.orders.aggregate([{ $indexStats: {} }])
Remove unused indexes to improve write performance
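The compound-index fixes above follow MongoDB's Equality, Sort, Range (ESR) guideline: equality-filter fields first, then sort fields, then range fields. A hedged sketch of deriving an index spec from a detected query shape — the input format here is this sketch's own convention, not a Mongoose API:

```javascript
// Suggest a compound index spec from a query shape, per the ESR rule.
// { equality: [...fields], sort: { field: dir }, range: [...fields] }
function suggestIndex({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const f of equality) spec[f] = 1;
  for (const [f, dir] of Object.entries(sort)) spec[f] = dir;
  for (const f of range) spec[f] = 1;
  return spec;
}
```

For the dashboard query above, `suggestIndex({ equality: ["userId", "status"], sort: { createdAt: -1 } })` yields exactly the recommended `{ userId: 1, status: 1, createdAt: -1 }`.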
Step 4: Review Population Patterns
The agent audits .populate() usage for performance:
Population Analysis:
FAIL: Deep population chain detected
Order.findById(id)
.populate("user")
.populate("items.product")
.populate("items.product.category")
.populate("items.product.reviews")
.populate("shippingAddress")
This generates 5+ additional MongoDB queries per request
For a list of 20 orders: 100+ database round trips
FIX: Use aggregation pipeline with $lookup for controlled joins:
Order.aggregate([
{ $match: { _id: new mongoose.Types.ObjectId(id) } },
{ $lookup: { from: "users", localField: "user", foreignField: "_id", as: "user" } },
{ $unwind: "$user" },
])
OR: Denormalize frequently accessed fields into the order document
FAIL: Population inside a loop
const orders = await Order.find({ userId });
for (const order of orders) {
await order.populate("items.product"); // N+1 query!
}
FIX: Populate in the initial query:
Order.find({ userId }).populate("items.product")
WARN: Population without field selection
User.findById(id).populate("posts")
Loads ALL fields of ALL posts — potentially megabytes of data
FIX: Select only needed fields:
.populate({ path: "posts", select: "title createdAt -_id" })
AND: Limit results:
.populate({ path: "posts", options: { limit: 10, sort: { createdAt: -1 } } })
WARN: Circular population possible
User -> posts (populate) -> author (populate) -> posts ...
RISK: No built-in depth limit — can cause infinite recursion
FIX: Core populate has no depth limit — use the mongoose-autopopulate plugin's maxDepth option, or explicit field selection, to break the cycle
FAIL: Population on field without ref
commentSchema: { author: { type: String } }
Code: Comment.find().populate("author")
RISK: Silently fails — returns null for author without error
FIX: Add ref: author: { type: Schema.Types.ObjectId, ref: "User" }
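The N+1 fix above comes down to collecting every referenced id across the result set and issuing one $in query. A minimal sketch of the batching half — the order/item shape is assumed from the examples:

```javascript
// Collect every referenced product id across a page of orders so a
// single Product.find({ _id: { $in: ids } }) replaces N populate calls.
function collectProductIds(orders) {
  const ids = new Set();
  for (const order of orders) {
    for (const item of order.items || []) {
      if (item.product != null) ids.add(String(item.product));
    }
  }
  return [...ids];
}
```

Deduplicating through a Set matters: twenty orders referencing the same ten products produce one query for ten ids, not a hundred round trips.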
Step 5: Audit Middleware Hooks
The agent checks middleware for correctness and performance:
Middleware Analysis:
FAIL: Pre-save hook with async operation and no error handling
userSchema.pre("save", async function() {
this.password = await bcrypt.hash(this.password, 10);
});
PROBLEMS:
1. Hashes password on EVERY save, not just when changed
2. No error handling — bcrypt failure silently corrupts data
FIX:
userSchema.pre("save", async function() {
if (!this.isModified("password")) return;
try {
this.password = await bcrypt.hash(this.password, 12);
} catch (err) {
throw new Error("Password hashing failed");
}
});
FAIL: Pre-save hook not triggered by findOneAndUpdate
userSchema.pre("save", function() { this.updatedAt = new Date(); });
But code uses: User.findOneAndUpdate(filter, update)
findOneAndUpdate BYPASSES save middleware
FIX: Add pre-findOneAndUpdate hook:
userSchema.pre("findOneAndUpdate", function() {
this.set({ updatedAt: new Date() });
});
OR: Use timestamps: true in schema options (handles both)
WARN: Post-remove hook for cleanup but using deleteMany
productSchema.post("remove", async function() {
await Review.deleteMany({ productId: this._id });
});
But code uses: Product.deleteMany({ category: "old" })
deleteMany does NOT trigger document middleware (and document remove() itself was removed in Mongoose 7 — prefer query middleware)
FIX: Use query middleware:
productSchema.pre("deleteMany", async function() {
const products = await this.model.find(this.getFilter());
const productIds = products.map(p => p._id);
await Review.deleteMany({ productId: { $in: productIds } });
});
WARN: Heavy computation in pre-find middleware
postSchema.pre("find", function() {
// Runs on EVERY find query — adds latency
this.where({ deletedAt: null }); // soft delete filter
});
PASS: Logic is correct (soft delete pattern)
WARN: Consider using mongoose-delete plugin for consistent soft delete
OR: Use a discriminator pattern for archived documents
FAIL: Middleware execution order dependency
orderSchema.pre("save", calculateTotal); // Depends on items
orderSchema.pre("save", validateInventory); // Modifies items
RISK: calculateTotal runs before validateInventory — stale data
FIX: Ensure correct order or combine into single middleware
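The ordering fix above can be made explicit by composing the steps inside a single hook. A sketch with hypothetical step functions (real hooks are often async; this synchronous version just illustrates the ordering guarantee):

```javascript
// Run dependent pre-save steps in one hook, in an explicit order,
// so inventory validation always precedes total calculation.
// The step functions here are hypothetical placeholders.
function composeSteps(...steps) {
  return function (doc) {
    for (const step of steps) step(doc);
  };
}

// Usage sketch:
// orderSchema.pre("save", function () {
//   composeSteps(validateInventory, calculateTotal)(this);
// });
```

A single composed hook removes the dependency on Mongoose's middleware registration order, which is easy to break during refactoring.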
Step 6: Check Query Patterns
The agent reviews query efficiency:
Query Pattern Analysis:
FAIL: No .lean() on read-only queries
const users = await User.find({ role: "admin" });
Returns full Mongoose documents with change tracking overhead
For API responses, lean() is 2-5x faster:
FIX: const users = await User.find({ role: "admin" }).lean();
CAUTION: lean() documents don't have methods, virtuals, or middleware
FAIL: Fetching all documents without pagination
const allProducts = await Product.find({});
RISK: Collection with 100K+ documents — response is 100+ MB
RISK: MongoDB cursor timeout, Node.js memory exhaustion
FIX: Always paginate:
Product.find({}).skip(page * limit).limit(limit)
BETTER: Cursor-based pagination:
Product.find({ _id: { $gt: lastId } }).limit(limit).sort({ _id: 1 })
FAIL: Using skip() for deep pagination
Product.find().skip(10000).limit(20)
MongoDB must iterate through 10,000 documents to skip them
Performance degrades linearly with page number
FIX: Use cursor-based pagination:
Product.find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(20)
WARN: Select projection missing on large documents
const post = await Post.findById(id);
Post has "content" field (10-50 KB per document)
If only showing title and date in a list, content wastes bandwidth
FIX: Post.findById(id).select("title author createdAt")
FAIL: Using find() + filter in JavaScript instead of MongoDB query
const users = await User.find({});
const admins = users.filter(u => u.role === "admin");
Fetches ALL users, filters in Node.js — wastes bandwidth and memory
FIX: const admins = await User.find({ role: "admin" });
WARN: Aggregation without allowDiskUse for large datasets
Order.aggregate([
{ $group: { _id: "$userId", total: { $sum: "$amount" } } },
]);
If results exceed 100 MB RAM limit, aggregation fails
FIX: Order.aggregate([...]).allowDiskUse(true)
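The cursor-based pagination fix above can be captured as a small query builder — a sketch; the returned object is what you would feed to find()/sort()/limit():

```javascript
// Build the filter/sort/limit triple for _id-based cursor pagination.
// Pass the last _id from the previous page, or null for the first page.
function cursorPage(lastId, limit = 20) {
  return {
    filter: lastId != null ? { _id: { $gt: lastId } } : {},
    sort: { _id: 1 },
    limit,
  };
}
```

Unlike skip(), this stays O(limit) regardless of page depth, because MongoDB seeks directly to the cursor position via the _id index.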
Step 7: Review Connection Management
The agent audits connection configuration:
Connection Analysis:
FAIL: No connection options configured
mongoose.connect("mongodb://localhost:27017/myapp");
Using all defaults — not production-ready
FIX: mongoose.connect(uri, {
maxPoolSize: 10,
minPoolSize: 2,
socketTimeoutMS: 45000,
serverSelectionTimeoutMS: 5000,
heartbeatFrequencyMS: 10000,
retryWrites: true,
w: "majority",
readPreference: "secondaryPreferred", // only if stale reads are acceptable
});
FAIL: No connection error handling
mongoose.connect(uri); // No .catch() or error event handler
FIX: Handle connection events:
mongoose.connection.on("error", (err) => { logger.error(err); });
mongoose.connection.on("disconnected", () => { logger.warn("Disconnected"); });
mongoose.connection.on("reconnected", () => { logger.info("Reconnected"); });
FAIL: Multiple mongoose.connect() calls
Found in: app.ts (line 15), test-setup.ts (line 8), seed.ts (line 5)
RISK: Connection pool exhaustion, race conditions
FIX: Single connection module imported everywhere:
// db.ts
let conn: typeof mongoose | null = null;
export async function getConnection() {
  if (!conn) conn = await mongoose.connect(uri, opts);
  return conn.connection;
}
WARN: No graceful shutdown handler
Application exits without closing MongoDB connection
RISK: Pending operations lost, connection pool leak
FIX: process.on("SIGTERM", async () => {
await mongoose.connection.close();
process.exit(0);
});
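Mongoose retries server selection internally, but a failed initial connect still rejects, so startup code commonly wraps it in a bounded retry. A sketch — the base and cap values are illustrative defaults, not Mongoose settings:

```javascript
// Exponential backoff with a cap, for retrying the initial connect.
function backoffMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// `connect` is any () => Promise, e.g. () => mongoose.connect(uri, opts).
async function connectWithRetry(connect, attempts = 5) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      if (i === attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffMs(i)));
    }
  }
}
```

Letting the final attempt rethrow keeps the failure visible to the process supervisor instead of hanging the app with no connection.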
Step 8: Audit Discriminator Usage
The agent checks schema inheritance patterns:
Discriminator Analysis:
Model "Event" uses discriminators:
ClickEvent (discriminatorKey: "type")
PurchaseEvent (discriminatorKey: "type")
SignupEvent (discriminatorKey: "type")
PASS: Discriminator key matches query pattern
Event.find({ type: "click" }) — uses discriminator for efficient queries
WARN: No index on discriminator key
Queries filtering by event type scan entire collection
FIX: eventSchema.index({ type: 1, createdAt: -1 })
FAIL: Base schema has fields only used by one discriminator
"paymentMethod" field on base Event schema — only used by PurchaseEvent
RISK: Wasted storage on 90% of documents, confusing API
FIX: Move paymentMethod to PurchaseEvent discriminator schema
WARN: Discriminator without base schema validation
PurchaseEvent allows fields from ClickEvent (no strict separation)
FIX: Use strict: "throw" on discriminator schemas during development
Step 9: Check Data Integrity Patterns
The agent reviews data consistency approaches:
Data Integrity Analysis:
FAIL: No unique compound index for business rules
Orders have "userId" + "productId" + "status" uniqueness requirement
(user can't have two active orders for same product)
Database allows duplicates — enforced only in application code
FIX: orderSchema.index(
{ userId: 1, productId: 1, status: 1 },
{ unique: true, partialFilterExpression: { status: "active" } }
)
FAIL: Reference integrity not enforced
Order.userId references User — but no validation that user exists
FIX: Add pre-save validation:
orderSchema.pre("save", async function() {
const userExists = await mongoose.model("User").exists({ _id: this.userId });
if (!userExists) throw new Error("Referenced user does not exist");
});
WARN: No schema validation on update operations
Schema validators only run on save() by default
findOneAndUpdate bypasses required field validation
FIX: Set runValidators globally:
mongoose.set("runValidators", true);
OR: Per-query: { runValidators: true } option
FAIL: Atomic counter without $inc
const user = await User.findById(id);
user.loginCount += 1;
await user.save();
RISK: Race condition — two concurrent logins both read same count
FIX: User.findByIdAndUpdate(id, { $inc: { loginCount: 1 } })
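For counters, $inc is the whole fix; for multi-field updates, the same race is usually closed with a compare-and-set on a version field (the OCC pattern mentioned in the __v discussion). A sketch of building the filter/update pair — field names are illustrative:

```javascript
// Build a compare-and-set update: it only applies if the document still
// carries the version we read, and bumps the version atomically.
function casUpdate(id, expectedVersion, changes) {
  return {
    filter: { _id: id, __v: expectedVersion },
    update: { $set: changes, $inc: { __v: 1 } },
  };
}
```

If `findOneAndUpdate(filter, update)` returns null, another writer won the race — reread and retry rather than overwriting.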
Step 10: Produce the Analysis Report
The agent generates a comprehensive report:
# Mongoose Schema Audit Report
# Models Path: /src/models/ | Date: April 30, 2026
## Overview
Models: 12
Total fields: 94
Indexes defined: 18
Middleware hooks: 9
Population refs: 15
Discriminators: 3
## Overall Health Score: 55/100
## Category Scores
Schema Design: 6/10 (missing validators, Mixed types)
Index Strategy: 4/10 (missing compounds, no TTL indexes)
Population Patterns: 4/10 (deep chains, N+1, no field selection)
Middleware: 5/10 (bypass risks, order dependency)
Query Efficiency: 4/10 (no lean, no pagination, full scans)
Connection Mgmt: 5/10 (no options, no error handling)
Data Integrity: 5/10 (no compound unique, no ref validation)
Discriminators: 7/10 (correct usage, missing index)
Schema Organization: 6/10 (good file structure, missing docs)
## Critical Issues
1. No .lean() on 80% of read queries — 2-5x performance penalty
2. Deep population chain (5 levels) on order detail endpoint
3. find() without pagination on products collection (100K+ docs)
4. skip()-based pagination on high-traffic listing endpoint
5. No connection error handling — silent disconnection
## Recommendations Summary
Estimated effort: 3-5 days for critical + high priority fixes
Expected improvement: 55 -> 82 health score
Risk reduction: Eliminates N+1 queries and memory exhaustion
Output
The agent produces:
- Health score: 0-100 overall schema quality rating
- Category scores: granular ratings for each quality dimension
- Critical issues: problems that pose performance or data risk
- Per-model analysis: field, index, and middleware audit
- Population map: visual representation of reference chains
- Query efficiency review: analysis of query patterns with optimizations
- Index recommendations: specific compound indexes for detected patterns
- Remediation code: exact Mongoose code to fix each issue
- Priority matrix: issues ranked by risk and effort
Scope Options
| Scope | What It Covers |
|---|---|
| Full (default) | All models, queries, and connections |
| Single model | Deep analysis of one schema definition |
| Index audit | Index coverage against detected query patterns |
| Population | Population chain analysis and optimization |
| Middleware | Hook correctness and bypass risk |
| Changed | Only model files changed in current git branch |
Tips for Best Results
- Point the agent at both model files and query/service files for full coverage
- Include controller or route files so the agent can trace query patterns
- Share MongoDB explain() output for query plan verification
- Run db.collection.stats() and share output for size-aware recommendations
- For TypeScript projects, include type definition files for type safety review
- Combine with MongoDB Atlas Performance Advisor data for production-informed optimization