Database Schema Design

When to use this skill

Lists specific situations where this skill should be triggered:

New Project: Database schema design for a new application
Schema Refactoring: Redesigning an existing schema for performance or scalability
Relationship Definition: Implementing 1:1, 1:N, N:M relationships between tables
Migration: Safely applying schema changes
Performance Issues: Index and schema optimization to resolve slow queries

Input Format

The required and optional input information to collect from the user:

Required Information

Database Type: PostgreSQL, MySQL, MongoDB, SQLite, etc.
Domain Description: What data will be stored (e.g., e-commerce, blog, social media)
Key Entities: Core data objects (e.g., User, Product, Order)

Optional Information

Expected Data Volume: Small (<10K rows), Medium (10K-1M), Large (>1M) (default: Medium)
Read/Write Ratio: Read-heavy, Write-heavy, Balanced (default: Balanced)
Transaction Requirements: Whether ACID is required (default: true)
Sharding/Partitioning: Whether large data distribution is needed (default: false)

Input Example

Design a database for an e-commerce platform:

DB: PostgreSQL
Entities: User, Product, Order, Review
Relationships:
- A User can have multiple Orders
- An Order contains multiple Products (N:M)
- A Review is linked to a User and a Product
Expected data: 100,000 users, 10,000 products
Read-heavy (frequent product lookups)

Instructions

Specifies the step-by-step task sequence to follow precisely.

Step 1: Define Entities and Attributes

Identify core data objects and their attributes.

Tasks:

Extract nouns from business requirements → entities
List each entity's attributes (columns)
Determine data types (VARCHAR, INTEGER, TIMESTAMP, JSON, etc.)
Designate Primary Keys (UUID vs Auto-increment ID)

Example (E-commerce):

Users

id: UUID PRIMARY KEY
email: VARCHAR(255) UNIQUE NOT NULL
username: VARCHAR(50) UNIQUE NOT NULL
password_hash: VARCHAR(255) NOT NULL
created_at: TIMESTAMP DEFAULT NOW()
updated_at: TIMESTAMP DEFAULT NOW()

Products

id: UUID PRIMARY KEY
name: VARCHAR(255) NOT NULL
description: TEXT
price: DECIMAL(10, 2) NOT NULL
stock: INTEGER DEFAULT 0
category_id: UUID REFERENCES Categories(id)
created_at: TIMESTAMP DEFAULT NOW()

Orders

id: UUID PRIMARY KEY
user_id: UUID REFERENCES Users(id)
total_amount: DECIMAL(10, 2) NOT NULL
status: VARCHAR(20) DEFAULT 'pending'
created_at: TIMESTAMP DEFAULT NOW()

OrderItems (Junction table)

id: UUID PRIMARY KEY
order_id: UUID REFERENCES Orders(id) ON DELETE CASCADE
product_id: UUID REFERENCES Products(id)
quantity: INTEGER NOT NULL
price: DECIMAL(10, 2) NOT NULL

Step 2: Design Relationships and Normalization

Define relationships between tables and apply normalization.

Tasks:

1:1 relationship: Foreign Key + UNIQUE constraint
1:N relationship: Foreign Key
N:M relationship: Create junction table
Determine normalization level (1NF ~ 3NF)

Decision Criteria:

OLTP systems → normalize to 3NF (data integrity)
OLAP/analytics systems → denormalization allowed (query performance)
Read-heavy → minimize JOINs with partial denormalization
Write-heavy → full normalization to eliminate redundancy

Example (ERD Mermaid):

erDiagram Users ||--o{ Orders : places Orders ||--|{ OrderItems : contains Products ||--o{ OrderItems : "ordered in" Categories ||--o{ Products : categorizes Users ||--o{ Reviews : writes Products ||--o{ Reviews : "reviewed by"

Users {
    uuid id PK
    string email UK
    string username UK
    string password_hash
    timestamp created_at
}

Products {
    uuid id PK
    string name
    decimal price
    int stock
    uuid category_id FK
}

Orders {
    uuid id PK
    uuid user_id FK
    decimal total_amount
    string status
    timestamp created_at
}

OrderItems {
    uuid id PK
    uuid order_id FK
    uuid product_id FK
    int quantity
    decimal price
}

Step 3: Establish Indexing Strategy

Design indexes for query performance.

Tasks:

Primary Keys automatically create indexes
Columns frequently used in WHERE clauses → add indexes
Foreign Keys used in JOINs → indexes
Consider composite indexes (WHERE col1 = ? AND col2 = ?)
UNIQUE indexes (email, username, etc.)

Checklist:

Indexes on frequently queried columns
Indexes on Foreign Key columns
Composite index order optimized (high selectivity columns first)
Avoid excessive indexes (degrades INSERT/UPDATE performance)

Example (PostgreSQL):

-- Primary Keys (auto-indexed) CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) UNIQUE NOT NULL, -- UNIQUE = auto-indexed username VARCHAR(50) UNIQUE NOT NULL, password_hash VARCHAR(255) NOT NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );

-- Foreign Keys + explicit indexes CREATE TABLE orders ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, total_amount DECIMAL(10, 2) NOT NULL, status VARCHAR(20) DEFAULT 'pending', created_at TIMESTAMP DEFAULT NOW() );

CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_orders_status ON orders(status); CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Composite index (status and created_at frequently queried together) CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);

-- Products table CREATE TABLE products ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) NOT NULL, description TEXT, price DECIMAL(10, 2) NOT NULL CHECK (price >= 0), stock INTEGER DEFAULT 0 CHECK (stock >= 0), category_id UUID REFERENCES categories(id), created_at TIMESTAMP DEFAULT NOW() );

CREATE INDEX idx_products_category ON products(category_id); CREATE INDEX idx_products_price ON products(price); -- price range search CREATE INDEX idx_products_name ON products(name); -- product name search

-- Full-text search (PostgreSQL) CREATE INDEX idx_products_name_fts ON products USING GIN(to_tsvector('english', name)); CREATE INDEX idx_products_description_fts ON products USING GIN(to_tsvector('english', description));

Step 4: Set Up Constraints and Triggers

Add constraints to ensure data integrity.

Tasks:

NOT NULL: required columns
UNIQUE: columns that must be unique
CHECK: value range constraints (e.g., price >= 0)
Foreign Key + CASCADE option
Set default values

Example:

CREATE TABLE products ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) NOT NULL, price DECIMAL(10, 2) NOT NULL CHECK (price >= 0), stock INTEGER DEFAULT 0 CHECK (stock >= 0), discount_percent INTEGER CHECK (discount_percent >= 0 AND discount_percent <= 100), category_id UUID REFERENCES categories(id) ON DELETE SET NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );

-- Trigger: auto-update updated_at CREATE OR REPLACE FUNCTION update_updated_at_column() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = NOW(); RETURN NEW; END; $$ LANGUAGE plpgsql;

CREATE TRIGGER update_products_updated_at BEFORE UPDATE ON products FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

Step 5: Write Migration Scripts

Write migrations that safely apply schema changes.

Tasks:

UP migration: apply changes
DOWN migration: rollback
Wrap in transactions
Prevent data loss (use ALTER TABLE carefully)

Example (SQL migration):

-- migrations/001_create_initial_schema.up.sql BEGIN;

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) UNIQUE NOT NULL, username VARCHAR(50) UNIQUE NOT NULL, password_hash VARCHAR(255) NOT NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );

CREATE TABLE categories ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(100) UNIQUE NOT NULL, parent_id UUID REFERENCES categories(id) );

CREATE TABLE products ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) NOT NULL, description TEXT, price DECIMAL(10, 2) NOT NULL CHECK (price >= 0), stock INTEGER DEFAULT 0 CHECK (stock >= 0), category_id UUID REFERENCES categories(id), created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );

CREATE INDEX idx_products_category ON products(category_id); CREATE INDEX idx_products_price ON products(price);

COMMIT;

-- migrations/001_create_initial_schema.down.sql BEGIN;

DROP TABLE IF EXISTS products CASCADE; DROP TABLE IF EXISTS categories CASCADE; DROP TABLE IF EXISTS users CASCADE;

COMMIT;

Output format

Defines the exact format that deliverables should follow.

Basic Structure

project/ ├── database/ │ ├── schema.sql # full schema │ ├── migrations/ │ │ ├── 001_create_users.up.sql │ │ ├── 001_create_users.down.sql │ │ ├── 002_create_products.up.sql │ │ └── 002_create_products.down.sql │ ├── seeds/ │ │ └── sample_data.sql # test data │ └── docs/ │ ├── ERD.md # Mermaid ERD diagram │ └── SCHEMA.md # schema documentation └── README.md

ERD Diagram (Mermaid Format)

Database Schema

Entity Relationship Diagram

```mermaid erDiagram Users ||--o{ Orders : places Orders ||--|{ OrderItems : contains Products ||--o{ OrderItems : "ordered in"

Users {
    uuid id PK
    string email UK
    string username UK
}

Products {
    uuid id PK
    string name
    decimal price
}

```

Table Descriptions

users

Purpose: Store user account information
Indexes: email, username
Estimated rows: 100,000

products

Purpose: Product catalog
Indexes: category_id, price, name
Estimated rows: 10,000

Constraints

Specifies mandatory rules and prohibited actions.

Mandatory Rules (MUST)

Primary Key Required: Define a Primary Key on every table

Unique record identification
Ensures referential integrity

Explicit Foreign Keys: Tables with relationships must define Foreign Keys

Specify ON DELETE CASCADE/SET NULL options
Prevent orphan records

Use NOT NULL Appropriately: Required columns must be NOT NULL

Clearly specify nullable vs. non-nullable
Providing defaults is recommended

Prohibited Actions (MUST NOT)

Avoid EAV Pattern Abuse: Use the Entity-Attribute-Value pattern only in special cases

Query complexity increases dramatically
Performance degradation

Excessive Denormalization: Be careful when denormalizing for performance

Data consistency issues
Risk of update anomalies

No Plaintext Storage of Sensitive Data: Never store passwords, card numbers, etc. in plaintext

Hashing/encryption is mandatory
Legal liability issues

Security Rules

Principle of Least Privilege: Grant only the necessary permissions to application DB accounts
SQL Injection Prevention: Use Prepared Statements / Parameterized Queries
Encrypt Sensitive Columns: Consider encrypting personally identifiable information at rest

Examples

Demonstrates how to apply the skill through real-world use cases.

Example 1: Blog Platform Schema

Situation: Database design for a Medium-style blog platform

User Request:

Design a PostgreSQL schema for a blog platform:

Users can write multiple posts
Posts can have multiple tags (N:M)
Users can like and bookmark posts
Comment feature (with nested replies)

Final Result:

-- Users CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) UNIQUE NOT NULL, username VARCHAR(50) UNIQUE NOT NULL, bio TEXT, avatar_url VARCHAR(500), created_at TIMESTAMP DEFAULT NOW() );

-- Posts CREATE TABLE posts ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, title VARCHAR(255) NOT NULL, slug VARCHAR(255) UNIQUE NOT NULL, content TEXT NOT NULL, published_at TIMESTAMP, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );

CREATE INDEX idx_posts_author ON posts(author_id); CREATE INDEX idx_posts_published ON posts(published_at); CREATE INDEX idx_posts_slug ON posts(slug);

-- Tags CREATE TABLE tags ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(50) UNIQUE NOT NULL, slug VARCHAR(50) UNIQUE NOT NULL );

-- Post-Tag relationship (N:M) CREATE TABLE post_tags ( post_id UUID REFERENCES posts(id) ON DELETE CASCADE, tag_id UUID REFERENCES tags(id) ON DELETE CASCADE, PRIMARY KEY (post_id, tag_id) );

CREATE INDEX idx_post_tags_post ON post_tags(post_id); CREATE INDEX idx_post_tags_tag ON post_tags(tag_id);

-- Likes CREATE TABLE post_likes ( user_id UUID REFERENCES users(id) ON DELETE CASCADE, post_id UUID REFERENCES posts(id) ON DELETE CASCADE, created_at TIMESTAMP DEFAULT NOW(), PRIMARY KEY (user_id, post_id) );

-- Bookmarks CREATE TABLE post_bookmarks ( user_id UUID REFERENCES users(id) ON DELETE CASCADE, post_id UUID REFERENCES posts(id) ON DELETE CASCADE, created_at TIMESTAMP DEFAULT NOW(), PRIMARY KEY (user_id, post_id) );

-- Comments (self-referencing for nested comments) CREATE TABLE comments ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), post_id UUID NOT NULL REFERENCES posts(id) ON DELETE CASCADE, author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, parent_comment_id UUID REFERENCES comments(id) ON DELETE CASCADE, content TEXT NOT NULL, created_at TIMESTAMP DEFAULT NOW(), updated_at TIMESTAMP DEFAULT NOW() );

CREATE INDEX idx_comments_post ON comments(post_id); CREATE INDEX idx_comments_author ON comments(author_id); CREATE INDEX idx_comments_parent ON comments(parent_comment_id);

Example 2: MongoDB Schema (NoSQL)

Situation: MongoDB schema for a real-time chat app

User Request:

Design a MongoDB schema for a real-time chat app. Reads are very frequent, and message history needs to be retrieved quickly.

Final Result:

// users collection { _id: ObjectId, username: String, // indexed, unique email: String, // indexed, unique avatar_url: String, status: String, // 'online', 'offline', 'away' last_seen: Date, created_at: Date }

// conversations collection (denormalized - read-optimized) { _id: ObjectId, participants: [ // indexed { user_id: ObjectId, username: String, avatar_url: String } ], last_message: { // denormalized for fast recent-message retrieval content: String, sender_id: ObjectId, sent_at: Date }, unread_counts: { // unread message count per participant "user_id_1": 5, "user_id_2": 0 }, created_at: Date, updated_at: Date }

// messages collection { _id: ObjectId, conversation_id: ObjectId, // indexed sender_id: ObjectId, content: String, attachments: [ { type: String, // 'image', 'file', 'video' url: String, filename: String } ], read_by: [ObjectId], // array of user IDs who have read the message sent_at: Date, // indexed edited_at: Date }

// Indexes db.users.createIndex({ username: 1 }, { unique: true }); db.users.createIndex({ email: 1 }, { unique: true });

db.conversations.createIndex({ "participants.user_id": 1 }); db.conversations.createIndex({ updated_at: -1 });

db.messages.createIndex({ conversation_id: 1, sent_at: -1 }); db.messages.createIndex({ sender_id: 1 });

Design Highlights:

Denormalization for read optimization (embedding last_message)
Indexes on frequently accessed fields
Using array fields (participants, read_by)

Best practices

Quality Improvement

Naming Convention Consistency: Use snake_case for table/column names

users, post_tags, created_at
Be consistent with plurals/singulars (tables plural, columns singular, etc.)

Consider Soft Delete: Use logical deletion instead of physical deletion for important data

deleted_at TIMESTAMP (NULL = active, NOT NULL = deleted)
Allows recovery of accidentally deleted data
Audit trail

Timestamps Required: Include created_at and updated_at in most tables

Data tracking and debugging
Time-series analysis

Efficiency Improvements

Partial Indexes: Minimize index size with conditional indexes CREATE INDEX idx_posts_published ON posts(published_at) WHERE published_at IS NOT NULL;
Materialized Views: Cache complex aggregate queries as Materialized Views
Partitioning: Partition large tables by date/range

Common Issues

Issue 1: N+1 Query Problem

Symptom: Multiple DB calls when a single query would suffice

Cause: Individual lookups in a loop without JOINs

Solution:

-- ❌ Bad example: N+1 queries SELECT * FROM posts; -- 1 time -- for each post SELECT * FROM users WHERE id = ?; -- N times

-- ✅ Good example: 1 query SELECT posts.*, users.username, users.avatar_url FROM posts JOIN users ON posts.author_id = users.id;

Issue 2: Slow JOINs Due to Unindexed Foreign Keys

Symptom: JOIN queries are very slow

Cause: Missing index on Foreign Key column

Solution:

CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_order_items_order_id ON order_items(order_id); CREATE INDEX idx_order_items_product_id ON order_items(product_id);

Issue 3: UUID vs Auto-increment Performance

Symptom: Insert performance degradation when using UUID Primary Keys

Cause: UUIDs are random, causing index fragmentation

Solution:

PostgreSQL: Use uuid_generate_v7() (time-ordered UUID)
MySQL: Use UUID_TO_BIN(UUID(), 1)
Or consider using Auto-increment BIGINT

References

Official Documentation

PostgreSQL Documentation
MySQL Documentation
MongoDB Schema Design Best Practices

Tools

dbdiagram.io - ERD diagram creation
PgModeler - PostgreSQL modeling tool
Prisma - ORM + migrations

Learning Resources

Database Design Course (freecodecamp)
Use The Index, Luke - SQL indexing guide

Metadata

Version

Current Version: 1.0.0
Last Updated: 2025-01-01
Compatible Platforms: Claude, ChatGPT, Gemini

Related Skills

api-design: Schema design alongside API design
performance-optimization: Query performance optimization

database-schema-design

Safety Notice

Copy this and send it to your AI assistant to learn

Database Schema

Entity Relationship Diagram

Table Descriptions

users

products

Source Transparency

Related Skills

unity-mcp

performance-optimization

ralph