mongodb

MongoDB - Document Database Patterns

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "mongodb" with this command: npx skills add bobmatnyc/claude-mpm-skills/bobmatnyc-claude-mpm-skills-mongodb

MongoDB - Document Database Patterns

Overview

MongoDB is a document-oriented NoSQL database that stores data in flexible, JSON-like documents. It excels at handling unstructured or semi-structured data, hierarchical relationships, and scenarios requiring horizontal scaling.

Key Features:

  • Flexible schema (schemaless documents)

  • Rich query language with secondary indexes

  • Aggregation framework for analytics

  • Horizontal scaling (sharding)

  • Replica sets for high availability

  • Change streams for real-time data

  • Geospatial and full-text search

When to Use MongoDB:

  • Rapidly evolving schemas

  • Hierarchical/nested data (embedded documents)

  • Real-time analytics with aggregation

  • Geospatial applications

  • Content management systems

  • IoT data ingestion

  • Catalog/inventory systems

When NOT to Use MongoDB:

  • Complex multi-table joins (use RDBMS)

  • ACID transactions across many documents (improved in 4.0+, but limited)

  • Strict schema enforcement requirements

Schema Design Fundamentals

Document Structure

// MongoDB document (BSON format) { "_id": ObjectId("507f1f77bcf86cd799439011"), // Primary key (auto-generated) "email": "alice@example.com", "name": "Alice Johnson", "profile": { // Embedded document "bio": "Software developer", "avatar": "https://example.com/avatar.jpg", "social": { "twitter": "@alice", "github": "alice-dev" } }, "tags": ["developer", "python", "mongodb"], // Array field "createdAt": ISODate("2024-01-15T10:30:00Z"), "updatedAt": ISODate("2024-01-20T14:22:00Z") }

Embedded vs Referenced Documents

Embedded (Denormalized) - Store related data in same document:

// Embedded: Good for 1:1 or 1:Few relationships // User with embedded address { "_id": ObjectId("..."), "name": "Alice", "address": { "street": "123 Main St", "city": "San Francisco", "zipCode": "94102" } }

// Embedded: Order with line items (1:Many bounded) { "_id": ObjectId("..."), "orderNumber": "ORD-2024-001", "customer": { "name": "Alice", "email": "alice@example.com" }, "items": [ { "productId": "SKU001", "name": "Widget", "quantity": 2, "price": 29.99 }, { "productId": "SKU002", "name": "Gadget", "quantity": 1, "price": 49.99 } ], "total": 109.97 }

When to Embed:

  • Data is queried together frequently

  • Child data doesn't make sense without parent

  • 1:1 or 1:Few relationships

  • Child data is bounded (won't grow unbounded)

  • Data doesn't need to be accessed independently

Referenced (Normalized) - Store references to other documents:

// Referenced: Good for 1:Many unbounded or Many:Many // User document { "_id": ObjectId("user123"), "name": "Alice", "email": "alice@example.com" }

// Posts collection (references user) { "_id": ObjectId("post456"), "authorId": ObjectId("user123"), // Reference to user "title": "MongoDB Schema Design", "content": "...", "commentCount": 42 }

// Comments collection (references post) { "_id": ObjectId("comment789"), "postId": ObjectId("post456"), // Reference to post "authorId": ObjectId("user999"), // Reference to commenter "text": "Great article!", "createdAt": ISODate("2024-01-20T10:00:00Z") }

When to Reference:

  • Many:Many relationships

  • 1:Many with unbounded growth (comments, logs)

  • Data is accessed independently

  • Document size would exceed 16MB limit

  • Need atomic updates on referenced document

Hybrid Pattern (Extended Reference)

// Post with denormalized author info + reference { "_id": ObjectId("post456"), "title": "MongoDB Best Practices", "content": "...", "author": { "_id": ObjectId("user123"), // Reference for lookups "name": "Alice", // Denormalized for display "avatar": "https://..." // Frequently needed fields }, "commentCount": 42, "lastCommentAt": ISODate("...") }

Query Patterns

Basic CRUD Operations

// Find documents db.users.find({ email: "alice@example.com" }) db.users.find({ age: { $gte: 18, $lte: 65 } }) db.users.find({ tags: { $in: ["developer", "designer"] } })

// Find with projection (select specific fields) db.users.find( { status: "active" }, { name: 1, email: 1, _id: 0 } // Include name, email; exclude _id )

// Find one db.users.findOne({ email: "alice@example.com" })

// Insert db.users.insertOne({ name: "Bob", email: "bob@example.com" }) db.users.insertMany([ { name: "Charlie", email: "charlie@example.com" }, { name: "Diana", email: "diana@example.com" } ])

// Update db.users.updateOne( { email: "alice@example.com" }, { $set: { name: "Alice Updated", updatedAt: new Date() } } )

db.users.updateMany( { status: "inactive" }, { $set: { archived: true } } )

// Upsert (update or insert) db.users.updateOne( { email: "new@example.com" }, { $set: { name: "New User", createdAt: new Date() } }, { upsert: true } )

// Delete db.users.deleteOne({ email: "bob@example.com" }) db.users.deleteMany({ status: "deleted" })

Query Operators

// Comparison db.products.find({ price: { $gt: 100 } }) // Greater than db.products.find({ price: { $gte: 100 } }) // Greater than or equal db.products.find({ price: { $lt: 50 } }) // Less than db.products.find({ price: { $lte: 50 } }) // Less than or equal db.products.find({ price: { $ne: 0 } }) // Not equal db.products.find({ category: { $in: ["A", "B"] } }) // In array db.products.find({ category: { $nin: ["C", "D"] } }) // Not in array

// Logical db.users.find({ $and: [{ age: { $gte: 18 } }, { status: "active" }] }) db.users.find({ $or: [{ role: "admin" }, { role: "moderator" }] }) db.users.find({ age: { $not: { $lt: 18 } } })

// Element db.users.find({ middleName: { $exists: true } }) // Field exists db.users.find({ age: { $type: "number" } }) // Field type

// Array db.posts.find({ tags: "mongodb" }) // Contains element db.posts.find({ tags: { $all: ["mongodb", "database"] } }) // Contains all db.posts.find({ tags: { $size: 3 } }) // Array size db.posts.find({ "tags.0": "featured" }) // First element

// Embedded documents db.users.find({ "address.city": "San Francisco" }) db.users.find({ "profile.social.twitter": { $exists: true } })

// Regex db.users.find({ name: { $regex: /^alice/i } }) db.users.find({ email: { $regex: /@example.com$/ } })

Update Operators

// Field updates db.users.updateOne( { _id: userId }, { $set: { name: "New Name" }, // Set field value $unset: { temporaryField: "" }, // Remove field $rename: { oldName: "newName" }, // Rename field $inc: { loginCount: 1 }, // Increment $mul: { price: 1.1 }, // Multiply $min: { lowestScore: 50 }, // Set if less than current $max: { highestScore: 100 }, // Set if greater than current $currentDate: { updatedAt: true } // Set to current date } )

// Array updates db.posts.updateOne( { _id: postId }, { $push: { tags: "new-tag" }, // Add to array $addToSet: { tags: "unique-tag" }, // Add if not exists $pop: { tags: 1 }, // Remove last (-1 for first) $pull: { tags: "old-tag" }, // Remove specific value $pullAll: { tags: ["a", "b"] } // Remove multiple values } )

// Array with modifiers db.posts.updateOne( { _id: postId }, { $push: { comments: { $each: [comment1, comment2], // Add multiple $sort: { createdAt: -1 }, // Sort after push $slice: -100 // Keep only last 100 } } } )

// Positional update (update matched array element) db.posts.updateOne( { _id: postId, "comments._id": commentId }, { $set: { "comments.$.text": "Updated comment" } } )

// Update all matching array elements db.posts.updateOne( { _id: postId }, { $set: { "comments.$[elem].read": true } }, { arrayFilters: [{ "elem.userId": currentUserId }] } )

Indexing Strategies

Index Types

// Single field index db.users.createIndex({ email: 1 }) // Ascending db.users.createIndex({ createdAt: -1 }) // Descending

// Compound index db.orders.createIndex({ customerId: 1, createdAt: -1 })

// Unique index db.users.createIndex({ email: 1 }, { unique: true })

// Partial index (index subset of documents) db.orders.createIndex( { createdAt: 1 }, { partialFilterExpression: { status: "pending" } } )

// TTL index (auto-delete old documents) db.sessions.createIndex( { createdAt: 1 }, { expireAfterSeconds: 3600 } // Delete after 1 hour )

// Text index (full-text search) db.articles.createIndex({ title: "text", content: "text" }) // Query: db.articles.find({ $text: { $search: "mongodb tutorial" } })

// Geospatial index db.locations.createIndex({ coordinates: "2dsphere" }) // Query: db.locations.find({ // coordinates: { // $near: { // $geometry: { type: "Point", coordinates: [-122.4, 37.8] }, // $maxDistance: 5000 // meters // } // } // })

// Multikey index (for array fields - automatic) db.posts.createIndex({ tags: 1 })

// Hashed index (for sharding) db.users.createIndex({ email: "hashed" })

Index Best Practices

// ESR Rule for compound indexes: Equality, Sort, Range // Query: find users where status = "active", sorted by createdAt, age > 18 // Index: { status: 1, createdAt: -1, age: 1 } // ^Equality ^Sort ^Range

// Covered queries (all fields in index) db.users.createIndex({ email: 1, name: 1, status: 1 }) db.users.find( { email: "alice@example.com" }, { name: 1, status: 1, _id: 0 } // All fields from index ) // Check with explain: "totalDocsExamined": 0

// Analyze query performance db.users.find({ email: "alice@example.com" }).explain("executionStats") // Look for: // - IXSCAN (index scan) vs COLLSCAN (collection scan) // - totalDocsExamined vs totalKeysExamined // - executionTimeMillis

// List indexes db.users.getIndexes()

// Drop index db.users.dropIndex("email_1") db.users.dropIndex({ email: 1, name: 1 })

Aggregation Pipeline

Pipeline Stages

// Basic aggregation structure db.orders.aggregate([ { $match: { status: "completed" } }, // Filter (like WHERE) { $group: { _id: "$customerId", total: { $sum: "$amount" } } }, // Group { $sort: { total: -1 } }, // Sort { $limit: 10 } // Limit results ])

// Common stages db.collection.aggregate([ // $match - Filter documents { $match: { status: "active", createdAt: { $gte: ISODate("2024-01-01") } } },

// $project - Reshape documents (include/exclude/compute fields) { $project: { name: 1, email: 1, fullName: { $concat: ["$firstName", " ", "$lastName"] }, yearCreated: { $year: "$createdAt" } }},

// $addFields - Add new fields (keeps existing) { $addFields: { totalPrice: { $multiply: ["$price", "$quantity"] } }},

// $group - Group and aggregate { $group: { _id: "$category", count: { $sum: 1 }, totalRevenue: { $sum: "$amount" }, avgPrice: { $avg: "$price" }, maxPrice: { $max: "$price" }, products: { $push: "$name" }, // Collect into array uniqueTags: { $addToSet: "$tag" } // Unique values }},

// $sort { $sort: { totalRevenue: -1, count: 1 } },

// $skip and $limit (pagination) { $skip: 20 }, { $limit: 10 },

// $unwind - Deconstruct array field { $unwind: "$tags" }, // { tags: ["a", "b"] } becomes { tags: "a" }, { tags: "b" }

// $lookup - Join collections { $lookup: { from: "users", localField: "authorId", foreignField: "_id", as: "author" }}, { $unwind: "$author" }, // Convert single-element array to object

// $facet - Multiple pipelines in parallel { $facet: { results: [{ $skip: 0 }, { $limit: 10 }], totalCount: [{ $count: "count" }] }} ])

Real-World Aggregation Examples

// Sales analytics by month db.orders.aggregate([ { $match: { status: "completed" } }, { $group: { _id: { year: { $year: "$createdAt" }, month: { $month: "$createdAt" } }, totalSales: { $sum: "$amount" }, orderCount: { $sum: 1 }, avgOrderValue: { $avg: "$amount" } }}, { $sort: { "_id.year": -1, "_id.month": -1 } } ])

// Top customers with order details db.orders.aggregate([ { $match: { createdAt: { $gte: ISODate("2024-01-01") } } }, { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" }, orderCount: { $sum: 1 }, lastOrder: { $max: "$createdAt" } }}, { $sort: { totalSpent: -1 } }, { $limit: 10 }, { $lookup: { from: "customers", localField: "_id", foreignField: "_id", as: "customer" }}, { $unwind: "$customer" }, { $project: { customerName: "$customer.name", customerEmail: "$customer.email", totalSpent: 1, orderCount: 1, lastOrder: 1 }} ])

// Product category performance with nested unwind db.orders.aggregate([ { $unwind: "$items" }, { $lookup: { from: "products", localField: "items.productId", foreignField: "_id", as: "product" }}, { $unwind: "$product" }, { $group: { _id: "$product.category", totalRevenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } }, unitsSold: { $sum: "$items.quantity" }, uniqueProducts: { $addToSet: "$product._id" } }}, { $addFields: { uniqueProductCount: { $size: "$uniqueProducts" } }}, { $sort: { totalRevenue: -1 } } ])

Transactions (MongoDB 4.0+)

// Multi-document transaction const session = client.startSession();

try { session.startTransaction();

// Transfer money between accounts await accounts.updateOne( { _id: fromAccountId }, { $inc: { balance: -amount } }, { session } );

await accounts.updateOne( { _id: toAccountId }, { $inc: { balance: amount } }, { session } );

await transactions.insertOne( { from: fromAccountId, to: toAccountId, amount: amount, createdAt: new Date() }, { session } );

await session.commitTransaction(); } catch (error) { await session.abortTransaction(); throw error; } finally { session.endSession(); }

Change Streams

// Watch for changes in real-time const changeStream = db.orders.watch([ { $match: { "fullDocument.status": "pending" } } ]);

changeStream.on("change", (change) => { console.log("Change detected:", change.operationType); console.log("Document:", change.fullDocument);

if (change.operationType === "insert") { // Process new pending order processNewOrder(change.fullDocument); } });

// Resume from specific point (for fault tolerance) const resumeToken = change._id; const changeStream = db.orders.watch([], { resumeAfter: resumeToken, fullDocument: "updateLookup" // Include full document on updates });

Spring Data MongoDB Integration

Entity Class

@Document(collection = "users") @Data @Builder @NoArgsConstructor @AllArgsConstructor public class User {

@Id
private String id;  // Maps to _id

@Indexed(unique = true)
private String email;

private String name;

@Field("password_hash")  // Custom field name
private String passwordHash;

@DBRef  // Reference to another collection
private List<Role> roles;

private Address address;  // Embedded document

private List<String> tags;

@CreatedDate
private LocalDateTime createdAt;

@LastModifiedDate
private LocalDateTime updatedAt;

@Version
private Long version;  // Optimistic locking

}

@Data public class Address { private String street; private String city; private String zipCode;

@GeoSpatialIndexed(type = GeoSpatialIndexType.GEO_2DSPHERE)
private GeoJsonPoint location;

}

Repository Interface

public interface UserRepository extends MongoRepository<User, String> {

// Derived queries
Optional&#x3C;User> findByEmail(String email);
List&#x3C;User> findByNameContainingIgnoreCase(String name);
List&#x3C;User> findByTagsContaining(String tag);

// Custom query with @Query
@Query("{ 'createdAt': { $gte: ?0 }, 'status': 'active' }")
List&#x3C;User> findActiveUsersCreatedAfter(LocalDateTime date);

// Projection
@Query(value = "{ 'email': ?0 }", fields = "{ 'name': 1, 'email': 1 }")
Optional&#x3C;UserSummary> findSummaryByEmail(String email);

// Aggregation
@Aggregation(pipeline = {
    "{ $match: { 'status': 'active' } }",
    "{ $group: { _id: '$country', count: { $sum: 1 } } }",
    "{ $sort: { count: -1 } }"
})
List&#x3C;CountryStats> getActiveUsersByCountry();

// Geospatial query
List&#x3C;User> findByAddressLocationNear(Point location, Distance distance);

}

MongoTemplate for Complex Queries

@Service @RequiredArgsConstructor public class UserService {

private final MongoTemplate mongoTemplate;

public List&#x3C;User> searchUsers(UserSearchCriteria criteria) {
    Query query = new Query();

    if (criteria.getName() != null) {
        query.addCriteria(Criteria.where("name")
            .regex(criteria.getName(), "i"));
    }

    if (criteria.getTags() != null &#x26;&#x26; !criteria.getTags().isEmpty()) {
        query.addCriteria(Criteria.where("tags")
            .in(criteria.getTags()));
    }

    if (criteria.getCreatedAfter() != null) {
        query.addCriteria(Criteria.where("createdAt")
            .gte(criteria.getCreatedAfter()));
    }

    query.with(Sort.by(Sort.Direction.DESC, "createdAt"));
    query.with(PageRequest.of(criteria.getPage(), criteria.getSize()));

    return mongoTemplate.find(query, User.class);
}

public AggregationResults&#x3C;UserStats> getUserStatsByStatus() {
    Aggregation aggregation = Aggregation.newAggregation(
        Aggregation.match(Criteria.where("createdAt")
            .gte(LocalDateTime.now().minusMonths(1))),
        Aggregation.group("status")
            .count().as("count")
            .avg("loginCount").as("avgLogins"),
        Aggregation.sort(Sort.Direction.DESC, "count")
    );

    return mongoTemplate.aggregate(aggregation, "users", UserStats.class);
}

public void bulkUpdateStatus(List&#x3C;String> userIds, String newStatus) {
    BulkOperations bulkOps = mongoTemplate.bulkOps(
        BulkOperations.BulkMode.UNORDERED, User.class);

    for (String userId : userIds) {
        Query query = Query.query(Criteria.where("_id").is(userId));
        Update update = Update.update("status", newStatus)
            .currentDate("updatedAt");
        bulkOps.updateOne(query, update);
    }

    bulkOps.execute();
}

}

Performance Optimization

Connection Pooling

// Spring Boot configuration spring: data: mongodb: uri: mongodb://localhost:27017/mydb auto-index-creation: false # Create indexes manually in production

Connection pool settings (via URI)

mongodb://localhost:27017/mydb?maxPoolSize=50&minPoolSize=10&maxIdleTimeMS=30000

Query Optimization

// Use projections to limit returned fields db.users.find({ status: "active" }, { name: 1, email: 1 })

// Use hint to force specific index db.users.find({ status: "active" }).hint({ status: 1, createdAt: -1 })

// Limit results for pagination db.users.find().sort({ createdAt: -1 }).skip(20).limit(10)

// Use $exists: false for missing fields (can use index) db.users.createIndex({ optionalField: 1 }, { sparse: true }) db.users.find({ optionalField: { $exists: true } })

Schema Optimization

// Avoid large arrays (cap at reasonable size) // Use bucketing pattern for time-series data { _id: "sensor1_2024-01-15", sensorId: "sensor1", date: ISODate("2024-01-15"), readings: [ { ts: ISODate("..."), value: 23.5 }, { ts: ISODate("..."), value: 24.1 }, // ... up to N readings per bucket ], count: 288 // Track count for full bucket detection }

// Pre-aggregate for reporting { _id: "stats_2024-01", month: "2024-01", totalOrders: 1523, totalRevenue: 152300.50, avgOrderValue: 100.00, topProducts: ["SKU001", "SKU002", "SKU003"] }

Best Practices

  1. Schema Design

// Embed when: data is queried together, bounded arrays // Reference when: unbounded arrays, many-to-many, independent access // Use extended reference pattern for frequently accessed fields

  1. Indexing

// Create indexes for query patterns, not just fields // Use compound indexes following ESR rule // Monitor slow queries: db.setProfilingLevel(1, { slowms: 100 }) // Avoid indexing low-cardinality fields alone

  1. Write Operations

// Use bulk operations for multiple writes // Avoid unbounded array growth // Use write concern appropriate to durability needs db.orders.insertOne(doc, { writeConcern: { w: "majority" } })

  1. Read Operations

// Always use projections to limit returned data // Use explain() to verify index usage // Prefer aggregation over multiple queries // Use read preference for scaling reads db.orders.find().readPref("secondaryPreferred")

  1. Connection Management

// Use connection pooling // Set appropriate pool size (default: 100) // Handle connection errors with retry logic // Close connections properly on shutdown

Common Pitfalls

Unbounded array growth:

// BAD: Comments array grows forever { _id: "post1", comments: [...thousands of comments...] }

// GOOD: Separate collection with references { _id: "comment1", postId: "post1", text: "..." }

Missing indexes:

// Always create indexes for query patterns // Check with explain() - look for COLLSCAN (bad) db.users.find({ email: "..." }).explain()

Over-indexing:

// Each index adds write overhead // Only index fields used in queries // Monitor index usage: db.users.aggregate([{ $indexStats: {} }])

Resources

Related Skills

When using MongoDB, consider these complementary skills:

  • spring-boot: Java framework integration with Spring Data MongoDB

  • docker: Running MongoDB in containers

  • nodejs: MongoDB with Mongoose ODM

  • aggregation-pipelines: Advanced analytics patterns

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

drizzle-orm

No summary provided by upstream source.

Repository SourceNeeds Review
General

pydantic

No summary provided by upstream source.

Repository SourceNeeds Review
General

playwright-e2e-testing

No summary provided by upstream source.

Repository SourceNeeds Review
General

tailwind-css

No summary provided by upstream source.

Repository SourceNeeds Review