MongoDB - Document Database Patterns

Overview

MongoDB is a document-oriented NoSQL database that stores data in flexible, JSON-like documents. It excels at handling unstructured or semi-structured data, hierarchical relationships, and scenarios requiring horizontal scaling.

Key Features:

Flexible schema (schemaless documents)
Rich query language with secondary indexes
Aggregation framework for analytics
Horizontal scaling (sharding)
Replica sets for high availability
Change streams for real-time data
Geospatial and full-text search

When to Use MongoDB:

Rapidly evolving schemas
Hierarchical/nested data (embedded documents)
Real-time analytics with aggregation
Geospatial applications
Content management systems
IoT data ingestion
Catalog/inventory systems

When NOT to Use MongoDB:

Complex multi-table joins (use RDBMS)
ACID transactions across many documents (improved in 4.0+, but limited)
Strict schema enforcement requirements

Schema Design Fundamentals

Document Structure

// MongoDB document (BSON format) { "_id": ObjectId("507f1f77bcf86cd799439011"), // Primary key (auto-generated) "email": "alice@example.com", "name": "Alice Johnson", "profile": { // Embedded document "bio": "Software developer", "avatar": "https://example.com/avatar.jpg", "social": { "twitter": "@alice", "github": "alice-dev" } }, "tags": ["developer", "python", "mongodb"], // Array field "createdAt": ISODate("2024-01-15T10:30:00Z"), "updatedAt": ISODate("2024-01-20T14:22:00Z") }

Embedded vs Referenced Documents

Embedded (Denormalized) - Store related data in same document:

// Embedded: Good for 1:1 or 1:Few relationships // User with embedded address { "_id": ObjectId("..."), "name": "Alice", "address": { "street": "123 Main St", "city": "San Francisco", "zipCode": "94102" } }

// Embedded: Order with line items (1:Many bounded) { "_id": ObjectId("..."), "orderNumber": "ORD-2024-001", "customer": { "name": "Alice", "email": "alice@example.com" }, "items": [ { "productId": "SKU001", "name": "Widget", "quantity": 2, "price": 29.99 }, { "productId": "SKU002", "name": "Gadget", "quantity": 1, "price": 49.99 } ], "total": 109.97 }

When to Embed:

Data is queried together frequently
Child data doesn't make sense without parent
1:1 or 1:Few relationships
Child data is bounded (won't grow unbounded)
Data doesn't need to be accessed independently

Referenced (Normalized) - Store references to other documents:

// Referenced: Good for 1:Many unbounded or Many:Many // User document { "_id": ObjectId("user123"), "name": "Alice", "email": "alice@example.com" }

// Posts collection (references user) { "_id": ObjectId("post456"), "authorId": ObjectId("user123"), // Reference to user "title": "MongoDB Schema Design", "content": "...", "commentCount": 42 }

// Comments collection (references post) { "_id": ObjectId("comment789"), "postId": ObjectId("post456"), // Reference to post "authorId": ObjectId("user999"), // Reference to commenter "text": "Great article!", "createdAt": ISODate("2024-01-20T10:00:00Z") }

When to Reference:

Many:Many relationships
1:Many with unbounded growth (comments, logs)
Data is accessed independently
Document size would exceed 16MB limit
Need atomic updates on referenced document

Hybrid Pattern (Extended Reference)

// Post with denormalized author info + reference { "_id": ObjectId("post456"), "title": "MongoDB Best Practices", "content": "...", "author": { "_id": ObjectId("user123"), // Reference for lookups "name": "Alice", // Denormalized for display "avatar": "https://..." // Frequently needed fields }, "commentCount": 42, "lastCommentAt": ISODate("...") }

Query Patterns

Basic CRUD Operations

// Find documents db.users.find({ email: "alice@example.com" }) db.users.find({ age: { $gte: 18, $lte: 65 } }) db.users.find({ tags: { $in: ["developer", "designer"] } })

// Find with projection (select specific fields) db.users.find( { status: "active" }, { name: 1, email: 1, _id: 0 } // Include name, email; exclude _id )

// Find one db.users.findOne({ email: "alice@example.com" })

// Insert db.users.insertOne({ name: "Bob", email: "bob@example.com" }) db.users.insertMany([ { name: "Charlie", email: "charlie@example.com" }, { name: "Diana", email: "diana@example.com" } ])

// Update db.users.updateOne( { email: "alice@example.com" }, { $set: { name: "Alice Updated", updatedAt: new Date() } } )

db.users.updateMany( { status: "inactive" }, { $set: { archived: true } } )

// Upsert (update or insert) db.users.updateOne( { email: "new@example.com" }, { $set: { name: "New User", createdAt: new Date() } }, { upsert: true } )

// Delete db.users.deleteOne({ email: "bob@example.com" }) db.users.deleteMany({ status: "deleted" })

Query Operators

// Comparison db.products.find({ price: { $gt: 100 } }) // Greater than db.products.find({ price: { $gte: 100 } }) // Greater than or equal db.products.find({ price: { $lt: 50 } }) // Less than db.products.find({ price: { $lte: 50 } }) // Less than or equal db.products.find({ price: { $ne: 0 } }) // Not equal db.products.find({ category: { $in: ["A", "B"] } }) // In array db.products.find({ category: { $nin: ["C", "D"] } }) // Not in array

// Logical db.users.find({ $and: [{ age: { $gte: 18 } }, { status: "active" }] }) db.users.find({ $or: [{ role: "admin" }, { role: "moderator" }] }) db.users.find({ age: { $not: { $lt: 18 } } })

// Element db.users.find({ middleName: { $exists: true } }) // Field exists db.users.find({ age: { $type: "number" } }) // Field type

// Array db.posts.find({ tags: "mongodb" }) // Contains element db.posts.find({ tags: { $all: ["mongodb", "database"] } }) // Contains all db.posts.find({ tags: { $size: 3 } }) // Array size db.posts.find({ "tags.0": "featured" }) // First element

// Embedded documents db.users.find({ "address.city": "San Francisco" }) db.users.find({ "profile.social.twitter": { $exists: true } })

// Regex db.users.find({ name: { $regex: /^alice/i } }) db.users.find({ email: { $regex: /@example.com$/ } })

Update Operators

// Field updates db.users.updateOne( { _id: userId }, { $set: { name: "New Name" }, // Set field value $unset: { temporaryField: "" }, // Remove field $rename: { oldName: "newName" }, // Rename field $inc: { loginCount: 1 }, // Increment $mul: { price: 1.1 }, // Multiply $min: { lowestScore: 50 }, // Set if less than current $max: { highestScore: 100 }, // Set if greater than current $currentDate: { updatedAt: true } // Set to current date } )

// Array updates db.posts.updateOne( { _id: postId }, { $push: { tags: "new-tag" }, // Add to array $addToSet: { tags: "unique-tag" }, // Add if not exists $pop: { tags: 1 }, // Remove last (-1 for first) $pull: { tags: "old-tag" }, // Remove specific value $pullAll: { tags: ["a", "b"] } // Remove multiple values } )

// Array with modifiers db.posts.updateOne( { _id: postId }, { $push: { comments: { $each: [comment1, comment2], // Add multiple $sort: { createdAt: -1 }, // Sort after push $slice: -100 // Keep only last 100 } } } )

// Positional update (update matched array element) db.posts.updateOne( { _id: postId, "comments._id": commentId }, { $set: { "comments.$.text": "Updated comment" } } )

// Update all matching array elements db.posts.updateOne( { _id: postId }, { $set: { "comments.$[elem].read": true } }, { arrayFilters: [{ "elem.userId": currentUserId }] } )

Indexing Strategies

Index Types

// Single field index db.users.createIndex({ email: 1 }) // Ascending db.users.createIndex({ createdAt: -1 }) // Descending

// Compound index db.orders.createIndex({ customerId: 1, createdAt: -1 })

// Unique index db.users.createIndex({ email: 1 }, { unique: true })

// Partial index (index subset of documents) db.orders.createIndex( { createdAt: 1 }, { partialFilterExpression: { status: "pending" } } )

// TTL index (auto-delete old documents) db.sessions.createIndex( { createdAt: 1 }, { expireAfterSeconds: 3600 } // Delete after 1 hour )

// Text index (full-text search) db.articles.createIndex({ title: "text", content: "text" }) // Query: db.articles.find({ $text: { $search: "mongodb tutorial" } })

// Geospatial index db.locations.createIndex({ coordinates: "2dsphere" }) // Query: db.locations.find({ // coordinates: { // $near: { // $geometry: { type: "Point", coordinates: [-122.4, 37.8] }, // $maxDistance: 5000 // meters // } // } // })

// Multikey index (for array fields - automatic) db.posts.createIndex({ tags: 1 })

// Hashed index (for sharding) db.users.createIndex({ email: "hashed" })

Index Best Practices

// ESR Rule for compound indexes: Equality, Sort, Range // Query: find users where status = "active", sorted by createdAt, age > 18 // Index: { status: 1, createdAt: -1, age: 1 } // ^Equality ^Sort ^Range

// Covered queries (all fields in index) db.users.createIndex({ email: 1, name: 1, status: 1 }) db.users.find( { email: "alice@example.com" }, { name: 1, status: 1, _id: 0 } // All fields from index ) // Check with explain: "totalDocsExamined": 0

// Analyze query performance db.users.find({ email: "alice@example.com" }).explain("executionStats") // Look for: // - IXSCAN (index scan) vs COLLSCAN (collection scan) // - totalDocsExamined vs totalKeysExamined // - executionTimeMillis

// List indexes db.users.getIndexes()

// Drop index db.users.dropIndex("email_1") db.users.dropIndex({ email: 1, name: 1 })

Aggregation Pipeline

Pipeline Stages

// Basic aggregation structure db.orders.aggregate([ { $match: { status: "completed" } }, // Filter (like WHERE) { $group: { _id: "$customerId", total: { $sum: "$amount" } } }, // Group { $sort: { total: -1 } }, // Sort { $limit: 10 } // Limit results ])

// Common stages db.collection.aggregate([ // $match - Filter documents { $match: { status: "active", createdAt: { $gte: ISODate("2024-01-01") } } },

// $project - Reshape documents (include/exclude/compute fields) { $project: { name: 1, email: 1, fullName: { $concat: ["$firstName", " ", "$lastName"] }, yearCreated: { $year: "$createdAt" } }},

// $addFields - Add new fields (keeps existing) { $addFields: { totalPrice: { $multiply: ["$price", "$quantity"] } }},

// $group - Group and aggregate { $group: { _id: "$category", count: { $sum: 1 }, totalRevenue: { $sum: "$amount" }, avgPrice: { $avg: "$price" }, maxPrice: { $max: "$price" }, products: { $push: "$name" }, // Collect into array uniqueTags: { $addToSet: "$tag" } // Unique values }},

// $sort { $sort: { totalRevenue: -1, count: 1 } },

// $skip and $limit (pagination) { $skip: 20 }, { $limit: 10 },

// $unwind - Deconstruct array field { $unwind: "$tags" }, // { tags: ["a", "b"] } becomes { tags: "a" }, { tags: "b" }

// $lookup - Join collections { $lookup: { from: "users", localField: "authorId", foreignField: "_id", as: "author" }}, { $unwind: "$author" }, // Convert single-element array to object

// $facet - Multiple pipelines in parallel { $facet: { results: [{ $skip: 0 }, { $limit: 10 }], totalCount: [{ $count: "count" }] }} ])

Real-World Aggregation Examples

// Sales analytics by month db.orders.aggregate([ { $match: { status: "completed" } }, { $group: { _id: { year: { $year: "$createdAt" }, month: { $month: "$createdAt" } }, totalSales: { $sum: "$amount" }, orderCount: { $sum: 1 }, avgOrderValue: { $avg: "$amount" } }}, { $sort: { "_id.year": -1, "_id.month": -1 } } ])

// Top customers with order details db.orders.aggregate([ { $match: { createdAt: { $gte: ISODate("2024-01-01") } } }, { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" }, orderCount: { $sum: 1 }, lastOrder: { $max: "$createdAt" } }}, { $sort: { totalSpent: -1 } }, { $limit: 10 }, { $lookup: { from: "customers", localField: "_id", foreignField: "_id", as: "customer" }}, { $unwind: "$customer" }, { $project: { customerName: "$customer.name", customerEmail: "$customer.email", totalSpent: 1, orderCount: 1, lastOrder: 1 }} ])

// Product category performance with nested unwind db.orders.aggregate([ { $unwind: "$items" }, { $lookup: { from: "products", localField: "items.productId", foreignField: "_id", as: "product" }}, { $unwind: "$product" }, { $group: { _id: "$product.category", totalRevenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } }, unitsSold: { $sum: "$items.quantity" }, uniqueProducts: { $addToSet: "$product._id" } }}, { $addFields: { uniqueProductCount: { $size: "$uniqueProducts" } }}, { $sort: { totalRevenue: -1 } } ])

Transactions (MongoDB 4.0+)

// Multi-document transaction const session = client.startSession();

try { session.startTransaction();

// Transfer money between accounts await accounts.updateOne( { _id: fromAccountId }, { $inc: { balance: -amount } }, { session } );

await accounts.updateOne( { _id: toAccountId }, { $inc: { balance: amount } }, { session } );

await transactions.insertOne( { from: fromAccountId, to: toAccountId, amount: amount, createdAt: new Date() }, { session } );

await session.commitTransaction(); } catch (error) { await session.abortTransaction(); throw error; } finally { session.endSession(); }

Change Streams

// Watch for changes in real-time const changeStream = db.orders.watch([ { $match: { "fullDocument.status": "pending" } } ]);

changeStream.on("change", (change) => { console.log("Change detected:", change.operationType); console.log("Document:", change.fullDocument);

if (change.operationType === "insert") { // Process new pending order processNewOrder(change.fullDocument); } });

// Resume from specific point (for fault tolerance) const resumeToken = change._id; const changeStream = db.orders.watch([], { resumeAfter: resumeToken, fullDocument: "updateLookup" // Include full document on updates });

Spring Data MongoDB Integration

Entity Class

@Document(collection = "users") @Data @Builder @NoArgsConstructor @AllArgsConstructor public class User {

@Id
private String id;  // Maps to _id

@Indexed(unique = true)
private String email;

private String name;

@Field("password_hash")  // Custom field name
private String passwordHash;

@DBRef  // Reference to another collection
private List&#x3C;Role> roles;

private Address address;  // Embedded document

private List&#x3C;String> tags;

@CreatedDate
private LocalDateTime createdAt;

@LastModifiedDate
private LocalDateTime updatedAt;

@Version
private Long version;  // Optimistic locking

}

@Data public class Address { private String street; private String city; private String zipCode;

@GeoSpatialIndexed(type = GeoSpatialIndexType.GEO_2DSPHERE)
private GeoJsonPoint location;

}

Repository Interface

public interface UserRepository extends MongoRepository<User, String> {

// Derived queries
Optional&#x3C;User> findByEmail(String email);
List&#x3C;User> findByNameContainingIgnoreCase(String name);
List&#x3C;User> findByTagsContaining(String tag);

// Custom query with @Query
@Query("{ 'createdAt': { $gte: ?0 }, 'status': 'active' }")
List&#x3C;User> findActiveUsersCreatedAfter(LocalDateTime date);

// Projection
@Query(value = "{ 'email': ?0 }", fields = "{ 'name': 1, 'email': 1 }")
Optional&#x3C;UserSummary> findSummaryByEmail(String email);

// Aggregation
@Aggregation(pipeline = {
    "{ $match: { 'status': 'active' } }",
    "{ $group: { _id: '$country', count: { $sum: 1 } } }",
    "{ $sort: { count: -1 } }"
})
List&#x3C;CountryStats> getActiveUsersByCountry();

// Geospatial query
List&#x3C;User> findByAddressLocationNear(Point location, Distance distance);

}

MongoTemplate for Complex Queries

@Service @RequiredArgsConstructor public class UserService {

private final MongoTemplate mongoTemplate;

public List&#x3C;User> searchUsers(UserSearchCriteria criteria) {
    Query query = new Query();

    if (criteria.getName() != null) {
        query.addCriteria(Criteria.where("name")
            .regex(criteria.getName(), "i"));
    }

    if (criteria.getTags() != null &#x26;&#x26; !criteria.getTags().isEmpty()) {
        query.addCriteria(Criteria.where("tags")
            .in(criteria.getTags()));
    }

    if (criteria.getCreatedAfter() != null) {
        query.addCriteria(Criteria.where("createdAt")
            .gte(criteria.getCreatedAfter()));
    }

    query.with(Sort.by(Sort.Direction.DESC, "createdAt"));
    query.with(PageRequest.of(criteria.getPage(), criteria.getSize()));

    return mongoTemplate.find(query, User.class);
}

public AggregationResults&#x3C;UserStats> getUserStatsByStatus() {
    Aggregation aggregation = Aggregation.newAggregation(
        Aggregation.match(Criteria.where("createdAt")
            .gte(LocalDateTime.now().minusMonths(1))),
        Aggregation.group("status")
            .count().as("count")
            .avg("loginCount").as("avgLogins"),
        Aggregation.sort(Sort.Direction.DESC, "count")
    );

    return mongoTemplate.aggregate(aggregation, "users", UserStats.class);
}

public void bulkUpdateStatus(List&#x3C;String> userIds, String newStatus) {
    BulkOperations bulkOps = mongoTemplate.bulkOps(
        BulkOperations.BulkMode.UNORDERED, User.class);

    for (String userId : userIds) {
        Query query = Query.query(Criteria.where("_id").is(userId));
        Update update = Update.update("status", newStatus)
            .currentDate("updatedAt");
        bulkOps.updateOne(query, update);
    }

    bulkOps.execute();
}

}

Performance Optimization

Connection Pooling

// Spring Boot configuration spring: data: mongodb: uri: mongodb://localhost:27017/mydb auto-index-creation: false # Create indexes manually in production

Connection pool settings (via URI)

mongodb://localhost:27017/mydb?maxPoolSize=50&minPoolSize=10&maxIdleTimeMS=30000

Query Optimization

// Use projections to limit returned fields db.users.find({ status: "active" }, { name: 1, email: 1 })

// Use hint to force specific index db.users.find({ status: "active" }).hint({ status: 1, createdAt: -1 })

// Limit results for pagination db.users.find().sort({ createdAt: -1 }).skip(20).limit(10)

// Use $exists: false for missing fields (can use index) db.users.createIndex({ optionalField: 1 }, { sparse: true }) db.users.find({ optionalField: { $exists: true } })

Schema Optimization

// Avoid large arrays (cap at reasonable size) // Use bucketing pattern for time-series data { _id: "sensor1_2024-01-15", sensorId: "sensor1", date: ISODate("2024-01-15"), readings: [ { ts: ISODate("..."), value: 23.5 }, { ts: ISODate("..."), value: 24.1 }, // ... up to N readings per bucket ], count: 288 // Track count for full bucket detection }

// Pre-aggregate for reporting { _id: "stats_2024-01", month: "2024-01", totalOrders: 1523, totalRevenue: 152300.50, avgOrderValue: 100.00, topProducts: ["SKU001", "SKU002", "SKU003"] }

Best Practices

Schema Design

// Embed when: data is queried together, bounded arrays // Reference when: unbounded arrays, many-to-many, independent access // Use extended reference pattern for frequently accessed fields

Indexing

// Create indexes for query patterns, not just fields // Use compound indexes following ESR rule // Monitor slow queries: db.setProfilingLevel(1, { slowms: 100 }) // Avoid indexing low-cardinality fields alone

Write Operations

// Use bulk operations for multiple writes // Avoid unbounded array growth // Use write concern appropriate to durability needs db.orders.insertOne(doc, { writeConcern: { w: "majority" } })

Read Operations

// Always use projections to limit returned data // Use explain() to verify index usage // Prefer aggregation over multiple queries // Use read preference for scaling reads db.orders.find().readPref("secondaryPreferred")

Connection Management

// Use connection pooling // Set appropriate pool size (default: 100) // Handle connection errors with retry logic // Close connections properly on shutdown

Common Pitfalls

Unbounded array growth:

// BAD: Comments array grows forever { _id: "post1", comments: [...thousands of comments...] }

// GOOD: Separate collection with references { _id: "comment1", postId: "post1", text: "..." }

Missing indexes:

// Always create indexes for query patterns // Check with explain() - look for COLLSCAN (bad) db.users.find({ email: "..." }).explain()

Over-indexing:

// Each index adds write overhead // Only index fields used in queries // Monitor index usage: db.users.aggregate([{ $indexStats: {} }])

Resources

MongoDB Documentation: https://docs.mongodb.com/manual/
MongoDB University: https://university.mongodb.com/
Schema Design Patterns: https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Aggregation Reference: https://docs.mongodb.com/manual/reference/operator/aggregation/
Spring Data MongoDB: https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/

Related Skills

When using MongoDB, consider these complementary skills:

spring-boot: Java framework integration with Spring Data MongoDB
docker: Running MongoDB in containers
nodejs: MongoDB with Mongoose ODM
aggregation-pipelines: Advanced analytics patterns

mongodb

Safety Notice

Copy this and send it to your AI assistant to learn

Connection pool settings (via URI)

mongodb://localhost:27017/mydb?maxPoolSize=50&minPoolSize=10&maxIdleTimeMS=30000

Source Transparency

Related Skills

drizzle-orm

pydantic

playwright-e2e-testing

tailwind-css