MongoDB - Document Database Patterns
Overview
MongoDB is a document-oriented NoSQL database that stores data in flexible, JSON-like documents. It excels at handling unstructured or semi-structured data, hierarchical relationships, and scenarios requiring horizontal scaling.
Key Features:
-
Flexible schema (schemaless documents)
-
Rich query language with secondary indexes
-
Aggregation framework for analytics
-
Horizontal scaling (sharding)
-
Replica sets for high availability
-
Change streams for real-time data
-
Geospatial and full-text search
When to Use MongoDB:
-
Rapidly evolving schemas
-
Hierarchical/nested data (embedded documents)
-
Real-time analytics with aggregation
-
Geospatial applications
-
Content management systems
-
IoT data ingestion
-
Catalog/inventory systems
When NOT to Use MongoDB:
-
Complex multi-table joins (use RDBMS)
-
ACID transactions across many documents (improved in 4.0+, but limited)
-
Strict schema enforcement requirements
Schema Design Fundamentals
Document Structure
// MongoDB document (BSON format) { "_id": ObjectId("507f1f77bcf86cd799439011"), // Primary key (auto-generated) "email": "alice@example.com", "name": "Alice Johnson", "profile": { // Embedded document "bio": "Software developer", "avatar": "https://example.com/avatar.jpg", "social": { "twitter": "@alice", "github": "alice-dev" } }, "tags": ["developer", "python", "mongodb"], // Array field "createdAt": ISODate("2024-01-15T10:30:00Z"), "updatedAt": ISODate("2024-01-20T14:22:00Z") }
Embedded vs Referenced Documents
Embedded (Denormalized) - Store related data in same document:
// Embedded: Good for 1:1 or 1:Few relationships // User with embedded address { "_id": ObjectId("..."), "name": "Alice", "address": { "street": "123 Main St", "city": "San Francisco", "zipCode": "94102" } }
// Embedded: Order with line items (1:Many bounded) { "_id": ObjectId("..."), "orderNumber": "ORD-2024-001", "customer": { "name": "Alice", "email": "alice@example.com" }, "items": [ { "productId": "SKU001", "name": "Widget", "quantity": 2, "price": 29.99 }, { "productId": "SKU002", "name": "Gadget", "quantity": 1, "price": 49.99 } ], "total": 109.97 }
When to Embed:
-
Data is queried together frequently
-
Child data doesn't make sense without parent
-
1:1 or 1:Few relationships
-
Child data is bounded (won't grow unbounded)
-
Data doesn't need to be accessed independently
Referenced (Normalized) - Store references to other documents:
// Referenced: Good for 1:Many unbounded or Many:Many // User document { "_id": ObjectId("user123"), "name": "Alice", "email": "alice@example.com" }
// Posts collection (references user) { "_id": ObjectId("post456"), "authorId": ObjectId("user123"), // Reference to user "title": "MongoDB Schema Design", "content": "...", "commentCount": 42 }
// Comments collection (references post) { "_id": ObjectId("comment789"), "postId": ObjectId("post456"), // Reference to post "authorId": ObjectId("user999"), // Reference to commenter "text": "Great article!", "createdAt": ISODate("2024-01-20T10:00:00Z") }
When to Reference:
-
Many:Many relationships
-
1:Many with unbounded growth (comments, logs)
-
Data is accessed independently
-
Document size would exceed 16MB limit
-
Need atomic updates on referenced document
Hybrid Pattern (Extended Reference)
// Post with denormalized author info + reference { "_id": ObjectId("post456"), "title": "MongoDB Best Practices", "content": "...", "author": { "_id": ObjectId("user123"), // Reference for lookups "name": "Alice", // Denormalized for display "avatar": "https://..." // Frequently needed fields }, "commentCount": 42, "lastCommentAt": ISODate("...") }
Query Patterns
Basic CRUD Operations
// Find documents db.users.find({ email: "alice@example.com" }) db.users.find({ age: { $gte: 18, $lte: 65 } }) db.users.find({ tags: { $in: ["developer", "designer"] } })
// Find with projection (select specific fields) db.users.find( { status: "active" }, { name: 1, email: 1, _id: 0 } // Include name, email; exclude _id )
// Find one db.users.findOne({ email: "alice@example.com" })
// Insert db.users.insertOne({ name: "Bob", email: "bob@example.com" }) db.users.insertMany([ { name: "Charlie", email: "charlie@example.com" }, { name: "Diana", email: "diana@example.com" } ])
// Update db.users.updateOne( { email: "alice@example.com" }, { $set: { name: "Alice Updated", updatedAt: new Date() } } )
db.users.updateMany( { status: "inactive" }, { $set: { archived: true } } )
// Upsert (update or insert) db.users.updateOne( { email: "new@example.com" }, { $set: { name: "New User", createdAt: new Date() } }, { upsert: true } )
// Delete db.users.deleteOne({ email: "bob@example.com" }) db.users.deleteMany({ status: "deleted" })
Query Operators
// Comparison db.products.find({ price: { $gt: 100 } }) // Greater than db.products.find({ price: { $gte: 100 } }) // Greater than or equal db.products.find({ price: { $lt: 50 } }) // Less than db.products.find({ price: { $lte: 50 } }) // Less than or equal db.products.find({ price: { $ne: 0 } }) // Not equal db.products.find({ category: { $in: ["A", "B"] } }) // In array db.products.find({ category: { $nin: ["C", "D"] } }) // Not in array
// Logical db.users.find({ $and: [{ age: { $gte: 18 } }, { status: "active" }] }) db.users.find({ $or: [{ role: "admin" }, { role: "moderator" }] }) db.users.find({ age: { $not: { $lt: 18 } } })
// Element db.users.find({ middleName: { $exists: true } }) // Field exists db.users.find({ age: { $type: "number" } }) // Field type
// Array db.posts.find({ tags: "mongodb" }) // Contains element db.posts.find({ tags: { $all: ["mongodb", "database"] } }) // Contains all db.posts.find({ tags: { $size: 3 } }) // Array size db.posts.find({ "tags.0": "featured" }) // First element
// Embedded documents db.users.find({ "address.city": "San Francisco" }) db.users.find({ "profile.social.twitter": { $exists: true } })
// Regex db.users.find({ name: { $regex: /^alice/i } }) db.users.find({ email: { $regex: /@example.com$/ } })
Update Operators
// Field updates db.users.updateOne( { _id: userId }, { $set: { name: "New Name" }, // Set field value $unset: { temporaryField: "" }, // Remove field $rename: { oldName: "newName" }, // Rename field $inc: { loginCount: 1 }, // Increment $mul: { price: 1.1 }, // Multiply $min: { lowestScore: 50 }, // Set if less than current $max: { highestScore: 100 }, // Set if greater than current $currentDate: { updatedAt: true } // Set to current date } )
// Array updates db.posts.updateOne( { _id: postId }, { $push: { tags: "new-tag" }, // Add to array $addToSet: { tags: "unique-tag" }, // Add if not exists $pop: { tags: 1 }, // Remove last (-1 for first) $pull: { tags: "old-tag" }, // Remove specific value $pullAll: { tags: ["a", "b"] } // Remove multiple values } )
// Array with modifiers db.posts.updateOne( { _id: postId }, { $push: { comments: { $each: [comment1, comment2], // Add multiple $sort: { createdAt: -1 }, // Sort after push $slice: -100 // Keep only last 100 } } } )
// Positional update (update matched array element) db.posts.updateOne( { _id: postId, "comments._id": commentId }, { $set: { "comments.$.text": "Updated comment" } } )
// Update all matching array elements db.posts.updateOne( { _id: postId }, { $set: { "comments.$[elem].read": true } }, { arrayFilters: [{ "elem.userId": currentUserId }] } )
Indexing Strategies
Index Types
// Single field index db.users.createIndex({ email: 1 }) // Ascending db.users.createIndex({ createdAt: -1 }) // Descending
// Compound index db.orders.createIndex({ customerId: 1, createdAt: -1 })
// Unique index db.users.createIndex({ email: 1 }, { unique: true })
// Partial index (index subset of documents) db.orders.createIndex( { createdAt: 1 }, { partialFilterExpression: { status: "pending" } } )
// TTL index (auto-delete old documents) db.sessions.createIndex( { createdAt: 1 }, { expireAfterSeconds: 3600 } // Delete after 1 hour )
// Text index (full-text search) db.articles.createIndex({ title: "text", content: "text" }) // Query: db.articles.find({ $text: { $search: "mongodb tutorial" } })
// Geospatial index db.locations.createIndex({ coordinates: "2dsphere" }) // Query: db.locations.find({ // coordinates: { // $near: { // $geometry: { type: "Point", coordinates: [-122.4, 37.8] }, // $maxDistance: 5000 // meters // } // } // })
// Multikey index (for array fields - automatic) db.posts.createIndex({ tags: 1 })
// Hashed index (for sharding) db.users.createIndex({ email: "hashed" })
Index Best Practices
// ESR Rule for compound indexes: Equality, Sort, Range // Query: find users where status = "active", sorted by createdAt, age > 18 // Index: { status: 1, createdAt: -1, age: 1 } // ^Equality ^Sort ^Range
// Covered queries (all fields in index) db.users.createIndex({ email: 1, name: 1, status: 1 }) db.users.find( { email: "alice@example.com" }, { name: 1, status: 1, _id: 0 } // All fields from index ) // Check with explain: "totalDocsExamined": 0
// Analyze query performance db.users.find({ email: "alice@example.com" }).explain("executionStats") // Look for: // - IXSCAN (index scan) vs COLLSCAN (collection scan) // - totalDocsExamined vs totalKeysExamined // - executionTimeMillis
// List indexes db.users.getIndexes()
// Drop index db.users.dropIndex("email_1") db.users.dropIndex({ email: 1, name: 1 })
Aggregation Pipeline
Pipeline Stages
// Basic aggregation structure db.orders.aggregate([ { $match: { status: "completed" } }, // Filter (like WHERE) { $group: { _id: "$customerId", total: { $sum: "$amount" } } }, // Group { $sort: { total: -1 } }, // Sort { $limit: 10 } // Limit results ])
// Common stages db.collection.aggregate([ // $match - Filter documents { $match: { status: "active", createdAt: { $gte: ISODate("2024-01-01") } } },
// $project - Reshape documents (include/exclude/compute fields) { $project: { name: 1, email: 1, fullName: { $concat: ["$firstName", " ", "$lastName"] }, yearCreated: { $year: "$createdAt" } }},
// $addFields - Add new fields (keeps existing) { $addFields: { totalPrice: { $multiply: ["$price", "$quantity"] } }},
// $group - Group and aggregate { $group: { _id: "$category", count: { $sum: 1 }, totalRevenue: { $sum: "$amount" }, avgPrice: { $avg: "$price" }, maxPrice: { $max: "$price" }, products: { $push: "$name" }, // Collect into array uniqueTags: { $addToSet: "$tag" } // Unique values }},
// $sort { $sort: { totalRevenue: -1, count: 1 } },
// $skip and $limit (pagination) { $skip: 20 }, { $limit: 10 },
// $unwind - Deconstruct array field { $unwind: "$tags" }, // { tags: ["a", "b"] } becomes { tags: "a" }, { tags: "b" }
// $lookup - Join collections { $lookup: { from: "users", localField: "authorId", foreignField: "_id", as: "author" }}, { $unwind: "$author" }, // Convert single-element array to object
// $facet - Multiple pipelines in parallel { $facet: { results: [{ $skip: 0 }, { $limit: 10 }], totalCount: [{ $count: "count" }] }} ])
Real-World Aggregation Examples
// Sales analytics by month db.orders.aggregate([ { $match: { status: "completed" } }, { $group: { _id: { year: { $year: "$createdAt" }, month: { $month: "$createdAt" } }, totalSales: { $sum: "$amount" }, orderCount: { $sum: 1 }, avgOrderValue: { $avg: "$amount" } }}, { $sort: { "_id.year": -1, "_id.month": -1 } } ])
// Top customers with order details db.orders.aggregate([ { $match: { createdAt: { $gte: ISODate("2024-01-01") } } }, { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" }, orderCount: { $sum: 1 }, lastOrder: { $max: "$createdAt" } }}, { $sort: { totalSpent: -1 } }, { $limit: 10 }, { $lookup: { from: "customers", localField: "_id", foreignField: "_id", as: "customer" }}, { $unwind: "$customer" }, { $project: { customerName: "$customer.name", customerEmail: "$customer.email", totalSpent: 1, orderCount: 1, lastOrder: 1 }} ])
// Product category performance with nested unwind db.orders.aggregate([ { $unwind: "$items" }, { $lookup: { from: "products", localField: "items.productId", foreignField: "_id", as: "product" }}, { $unwind: "$product" }, { $group: { _id: "$product.category", totalRevenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } }, unitsSold: { $sum: "$items.quantity" }, uniqueProducts: { $addToSet: "$product._id" } }}, { $addFields: { uniqueProductCount: { $size: "$uniqueProducts" } }}, { $sort: { totalRevenue: -1 } } ])
Transactions (MongoDB 4.0+)
// Multi-document transaction const session = client.startSession();
try { session.startTransaction();
// Transfer money between accounts await accounts.updateOne( { _id: fromAccountId }, { $inc: { balance: -amount } }, { session } );
await accounts.updateOne( { _id: toAccountId }, { $inc: { balance: amount } }, { session } );
await transactions.insertOne( { from: fromAccountId, to: toAccountId, amount: amount, createdAt: new Date() }, { session } );
await session.commitTransaction(); } catch (error) { await session.abortTransaction(); throw error; } finally { session.endSession(); }
Change Streams
// Watch for changes in real-time const changeStream = db.orders.watch([ { $match: { "fullDocument.status": "pending" } } ]);
changeStream.on("change", (change) => { console.log("Change detected:", change.operationType); console.log("Document:", change.fullDocument);
if (change.operationType === "insert") { // Process new pending order processNewOrder(change.fullDocument); } });
// Resume from specific point (for fault tolerance) const resumeToken = change._id; const changeStream = db.orders.watch([], { resumeAfter: resumeToken, fullDocument: "updateLookup" // Include full document on updates });
Spring Data MongoDB Integration
Entity Class
@Document(collection = "users") @Data @Builder @NoArgsConstructor @AllArgsConstructor public class User {
@Id
private String id; // Maps to _id
@Indexed(unique = true)
private String email;
private String name;
@Field("password_hash") // Custom field name
private String passwordHash;
@DBRef // Reference to another collection
private List<Role> roles;
private Address address; // Embedded document
private List<String> tags;
@CreatedDate
private LocalDateTime createdAt;
@LastModifiedDate
private LocalDateTime updatedAt;
@Version
private Long version; // Optimistic locking
}
@Data public class Address { private String street; private String city; private String zipCode;
@GeoSpatialIndexed(type = GeoSpatialIndexType.GEO_2DSPHERE)
private GeoJsonPoint location;
}
Repository Interface
public interface UserRepository extends MongoRepository<User, String> {
// Derived queries
Optional<User> findByEmail(String email);
List<User> findByNameContainingIgnoreCase(String name);
List<User> findByTagsContaining(String tag);
// Custom query with @Query
@Query("{ 'createdAt': { $gte: ?0 }, 'status': 'active' }")
List<User> findActiveUsersCreatedAfter(LocalDateTime date);
// Projection
@Query(value = "{ 'email': ?0 }", fields = "{ 'name': 1, 'email': 1 }")
Optional<UserSummary> findSummaryByEmail(String email);
// Aggregation
@Aggregation(pipeline = {
"{ $match: { 'status': 'active' } }",
"{ $group: { _id: '$country', count: { $sum: 1 } } }",
"{ $sort: { count: -1 } }"
})
List<CountryStats> getActiveUsersByCountry();
// Geospatial query
List<User> findByAddressLocationNear(Point location, Distance distance);
}
MongoTemplate for Complex Queries
@Service @RequiredArgsConstructor public class UserService {
private final MongoTemplate mongoTemplate;
public List<User> searchUsers(UserSearchCriteria criteria) {
Query query = new Query();
if (criteria.getName() != null) {
query.addCriteria(Criteria.where("name")
.regex(criteria.getName(), "i"));
}
if (criteria.getTags() != null && !criteria.getTags().isEmpty()) {
query.addCriteria(Criteria.where("tags")
.in(criteria.getTags()));
}
if (criteria.getCreatedAfter() != null) {
query.addCriteria(Criteria.where("createdAt")
.gte(criteria.getCreatedAfter()));
}
query.with(Sort.by(Sort.Direction.DESC, "createdAt"));
query.with(PageRequest.of(criteria.getPage(), criteria.getSize()));
return mongoTemplate.find(query, User.class);
}
public AggregationResults<UserStats> getUserStatsByStatus() {
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(Criteria.where("createdAt")
.gte(LocalDateTime.now().minusMonths(1))),
Aggregation.group("status")
.count().as("count")
.avg("loginCount").as("avgLogins"),
Aggregation.sort(Sort.Direction.DESC, "count")
);
return mongoTemplate.aggregate(aggregation, "users", UserStats.class);
}
public void bulkUpdateStatus(List<String> userIds, String newStatus) {
BulkOperations bulkOps = mongoTemplate.bulkOps(
BulkOperations.BulkMode.UNORDERED, User.class);
for (String userId : userIds) {
Query query = Query.query(Criteria.where("_id").is(userId));
Update update = Update.update("status", newStatus)
.currentDate("updatedAt");
bulkOps.updateOne(query, update);
}
bulkOps.execute();
}
}
Performance Optimization
Connection Pooling
// Spring Boot configuration spring: data: mongodb: uri: mongodb://localhost:27017/mydb auto-index-creation: false # Create indexes manually in production
Connection pool settings (via URI)
mongodb://localhost:27017/mydb?maxPoolSize=50&minPoolSize=10&maxIdleTimeMS=30000
Query Optimization
// Use projections to limit returned fields db.users.find({ status: "active" }, { name: 1, email: 1 })
// Use hint to force specific index db.users.find({ status: "active" }).hint({ status: 1, createdAt: -1 })
// Limit results for pagination db.users.find().sort({ createdAt: -1 }).skip(20).limit(10)
// Use $exists: false for missing fields (can use index) db.users.createIndex({ optionalField: 1 }, { sparse: true }) db.users.find({ optionalField: { $exists: true } })
Schema Optimization
// Avoid large arrays (cap at reasonable size) // Use bucketing pattern for time-series data { _id: "sensor1_2024-01-15", sensorId: "sensor1", date: ISODate("2024-01-15"), readings: [ { ts: ISODate("..."), value: 23.5 }, { ts: ISODate("..."), value: 24.1 }, // ... up to N readings per bucket ], count: 288 // Track count for full bucket detection }
// Pre-aggregate for reporting { _id: "stats_2024-01", month: "2024-01", totalOrders: 1523, totalRevenue: 152300.50, avgOrderValue: 100.00, topProducts: ["SKU001", "SKU002", "SKU003"] }
Best Practices
- Schema Design
// Embed when: data is queried together, bounded arrays // Reference when: unbounded arrays, many-to-many, independent access // Use extended reference pattern for frequently accessed fields
- Indexing
// Create indexes for query patterns, not just fields // Use compound indexes following ESR rule // Monitor slow queries: db.setProfilingLevel(1, { slowms: 100 }) // Avoid indexing low-cardinality fields alone
- Write Operations
// Use bulk operations for multiple writes // Avoid unbounded array growth // Use write concern appropriate to durability needs db.orders.insertOne(doc, { writeConcern: { w: "majority" } })
- Read Operations
// Always use projections to limit returned data // Use explain() to verify index usage // Prefer aggregation over multiple queries // Use read preference for scaling reads db.orders.find().readPref("secondaryPreferred")
- Connection Management
// Use connection pooling // Set appropriate pool size (default: 100) // Handle connection errors with retry logic // Close connections properly on shutdown
Common Pitfalls
Unbounded array growth:
// BAD: Comments array grows forever { _id: "post1", comments: [...thousands of comments...] }
// GOOD: Separate collection with references { _id: "comment1", postId: "post1", text: "..." }
Missing indexes:
// Always create indexes for query patterns // Check with explain() - look for COLLSCAN (bad) db.users.find({ email: "..." }).explain()
Over-indexing:
// Each index adds write overhead // Only index fields used in queries // Monitor index usage: db.users.aggregate([{ $indexStats: {} }])
Resources
-
MongoDB Documentation: https://docs.mongodb.com/manual/
-
MongoDB University: https://university.mongodb.com/
-
Schema Design Patterns: https://www.mongodb.com/blog/post/building-with-patterns-a-summary
-
Aggregation Reference: https://docs.mongodb.com/manual/reference/operator/aggregation/
-
Spring Data MongoDB: https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/
Related Skills
When using MongoDB, consider these complementary skills:
-
spring-boot: Java framework integration with Spring Data MongoDB
-
docker: Running MongoDB in containers
-
nodejs: MongoDB with Mongoose ODM
-
aggregation-pipelines: Advanced analytics patterns