Data Validation Security Pattern
Ensures all incoming data is validated against specifications before processing, preventing injection attacks, data corruption, and unexpected behavior.
When to Use
Use this pattern when:
- Processing ANY input from external sources (users, APIs, databases)
- Preventing injection attacks (SQLi, XSS, Command Injection)
- Implementing API request validation checklists
- Ensuring data integrity for business logic
- Handling file uploads or complex data structures
Problem Addressed
An entity provides unexpected data. Malicious or malformed input can cause:
- Injection attacks (SQL, XSS, command injection)
- System crashes or unexpected behavior
- Data corruption
- Security bypasses
Core Components
| Role | Type | Responsibility |
|---|---|---|
| Entity | Entity | Sends data to system |
| Enforcer | Enforcement Point | Intercepts all incoming data |
| Validator | Decision Point | Validates data against specification |
| Specification Provider | Information Point | Manages validation rules |
| System | Entity | Processes validated data |
Data Elements
- data: Input from entity (raw)
- canonical_data: Normalized, validated form
- specification: Rules defining valid data
- type: Identifier for applicable specification
- error: Validation failure message
Validation Flow
Entity → [data] → Enforcer
Enforcer → [data] → Validator
Validator → [get_specification(type)] → Specification Provider
Specification Provider → [specification] → Validator
Validator → [validate, transform to canonical] → Validator
Validator → [canonical_data or error] → Enforcer
Enforcer → [canonical_data] → System (if valid)
Enforcer → [error] → Entity (if invalid)
- Enforcer intercepts ALL incoming data
- Validator retrieves appropriate specification
- Validator transforms to canonical form
- Validator checks against specification
- Valid: canonical data forwarded to System
- Invalid: error returned to Entity
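The flow above can be sketched in a few small classes. This is a minimal illustration, not a reference implementation: the class names mirror the roles in the Core Components table, while `age_spec`, the `register` method, and the idea of modeling a specification as a callable are assumptions made for the example.

```python
from typing import Callable, Dict

# Assumption: a specification is modeled as a callable that returns the
# canonical form of the data or raises ValueError.
Specification = Callable[[object], object]


class SpecificationProvider:
    """Information point: manages validation rules, keyed by type."""

    def __init__(self) -> None:
        self._specs: Dict[str, Specification] = {}

    def register(self, type_: str, spec: Specification) -> None:
        self._specs[type_] = spec

    def get_specification(self, type_: str) -> Specification:
        try:
            return self._specs[type_]
        except KeyError:
            raise ValueError(f"no specification for type {type_!r}") from None


class Validator:
    """Decision point: validates data against the specification for its type."""

    def __init__(self, provider: SpecificationProvider) -> None:
        self._provider = provider

    def validate(self, type_: str, data: object) -> object:
        spec = self._provider.get_specification(type_)
        return spec(data)  # canonical_data, or ValueError on failure


class Enforcer:
    """Enforcement point: the only path from the Entity to the System."""

    def __init__(self, validator: Validator, system: Callable[[object], object]) -> None:
        self._validator = validator
        self._system = system

    def handle(self, type_: str, data: object) -> object:
        try:
            canonical = self._validator.validate(type_, data)
        except ValueError:
            return {"error": "invalid input"}  # generic error back to the Entity
        return self._system(canonical)  # only canonical data reaches the System


def age_spec(data: object) -> int:
    """Hypothetical specification: an integer age between 0 and 150."""
    text = str(data).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError("age must be a whole number")
    age = int(text)
    if age > 150:
        raise ValueError("age out of range")
    return age


provider = SpecificationProvider()
provider.register("age", age_spec)
enforcer = Enforcer(Validator(provider), system=lambda age: {"stored": age})
```

Calling `enforcer.handle("age", " 42 ")` hands the integer 42 to the System, while malformed input or an unknown type never gets past the Enforcer.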
Validation Principles
Validate Everything
- All data from uncontrolled sources
- Parameters, headers, cookies, files
- Data from APIs, databases (defense in depth)
Canonical Form
Transform data to standardized form:
- Remove/escape special characters
- Decode encoded values
- Normalize Unicode
- Parse structured data to typed objects
Benefit: the System only processes data in a known format.
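One possible canonicalization step for text input, using only the Python standard library, is sketched below. The function name `canonicalize` and the policy of rejecting multiply-encoded input are assumptions for illustration; the right policy depends on where the data is used.

```python
import unicodedata
from urllib.parse import unquote


def canonicalize(raw: str) -> str:
    """Return a canonical form of a percent-encoded text value."""
    # Decode encoded values once.
    text = unquote(raw)
    # Assumption: input that still decodes further is treated as an
    # evasion attempt (double encoding) and rejected outright.
    if unquote(text) != text:
        raise ValueError("multiply-encoded input")
    # Normalize Unicode so visually equivalent strings compare equal.
    text = unicodedata.normalize("NFKC", text)
    # Remove control characters (Unicode category Cc), then trim whitespace.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    return text.strip()
```

Validation rules then run against the returned value only, never against the raw input.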
Allowlist vs. Blocklist
- Allowlist (preferred): Define what IS allowed
- Blocklist (risky): Define what is NOT allowed
Blocklists fail against unknown attack patterns. Use allowlists.
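The difference can be made concrete with a small comparison. The blocklist entries, the username rule, and the payload below are illustrative assumptions, chosen to show a payload that a plausible blocklist misses.

```python
import re

# Blocklist: reject input containing known-bad substrings.
BLOCKED_SUBSTRINGS = ("<script", "javascript:")


def blocklist_ok(value: str) -> bool:
    lowered = value.lower()
    return not any(bad in lowered for bad in BLOCKED_SUBSTRINGS)


# Allowlist: accept only what is explicitly permitted.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,50}")


def allowlist_ok(value: str) -> bool:
    return USERNAME_RE.fullmatch(value) is not None


# An XSS payload that uses neither blocked substring.
payload = "<img src=x onerror=alert(1)>"
```

Here `blocklist_ok(payload)` accepts the payload because nobody anticipated it, while `allowlist_ok(payload)` rejects it without needing to know the attack exists.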
Validate Early, Validate Often
- Validate at system boundary (earliest point)
- Re-validate near code that relies on data
- Defense in depth
Validation Types
Type Validation
- Ensure data matches expected type
- Integer, string, boolean, date, email, URL
Range/Length Validation
- Numeric bounds
- String length limits
- Array size limits
Format Validation
- Regular expressions (carefully!)
- Structural patterns
- Protocol conformance
Business Logic Validation
- Application-specific rules
- Cross-field validation
- State-dependent validation
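The four validation types can be layered in a single function. The sketch below assumes a hypothetical booking payload with `guests`, `start`, and `end` fields; the field names and limits are invented for the example.

```python
from datetime import date


def validate_booking(payload: dict) -> dict:
    """Validate a hypothetical booking request and return typed values."""
    # Type validation: bool is a subclass of int, so exclude it explicitly.
    guests = payload.get("guests")
    if isinstance(guests, bool) or not isinstance(guests, int):
        raise ValueError("guests must be an integer")

    # Range validation: numeric bounds.
    if not 1 <= guests <= 10:
        raise ValueError("guests must be between 1 and 10")

    # Format validation: ISO 8601 calendar dates, parsed into typed objects.
    try:
        start = date.fromisoformat(payload["start"])
        end = date.fromisoformat(payload["end"])
    except (KeyError, TypeError, ValueError):
        raise ValueError("start and end must be ISO dates") from None

    # Business logic validation: a cross-field rule.
    if end <= start:
        raise ValueError("end must be after start")

    return {"guests": guests, "start": start, "end": end}
```

The cheap structural checks run first, so business rules only ever see values of the expected type.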
Security Considerations
Validation ≠ Authorization
- Validation: Is this data well-formed?
- Authorization: Is the entity allowed to use this data?
Both are required. Valid data doesn't mean authorized access.
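A small sketch of keeping the two checks separate follows. The ownership table `OWNERS` and all function names are hypothetical stand-ins for a real access-control system.

```python
# Hypothetical ownership table standing in for a real access-control store.
OWNERS = {1: "alice", 2: "bob"}


def validate_document_id(raw: object) -> int:
    """Validation: is the identifier well-formed?"""
    text = str(raw)
    if not (text.isascii() and text.isdigit()) or len(text) > 9:
        raise ValueError("malformed document id")
    return int(text)


def can_read(user: str, doc_id: int) -> bool:
    """Authorization: may this user access this document?"""
    return OWNERS.get(doc_id) == user


def read_document(user: str, raw_id: object) -> str:
    doc_id = validate_document_id(raw_id)  # well-formed?
    if not can_read(user, doc_id):         # allowed?
        raise PermissionError("forbidden")
    return f"contents of document {doc_id}"
```

A request from `bob` for document `1` passes validation, since the ID is a well-formed integer, yet is still refused by the authorization check.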
Error Messages
- Don't reveal validation internals to attackers
- Log detailed errors server-side
- Return generic errors to clients
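One way to apply these three points is to log the detail under a random incident ID and return only that ID to the client. The helper name `validation_error_response` and the response shape are assumptions for this sketch.

```python
import logging
import uuid
from typing import Tuple

logger = logging.getLogger("validation")


def validation_error_response(detail: str) -> Tuple[dict, int]:
    """Log the detailed failure server-side and build a generic client error."""
    incident_id = uuid.uuid4().hex
    # Detailed message stays in server logs, keyed by the incident ID.
    logger.warning("validation failed [%s]: %s", incident_id, detail)
    # The client learns that the request was invalid, not why.
    return {"error": "Invalid request", "id": incident_id}, 400
```

Support staff can look up the incident ID in the logs, and the response reveals nothing about which rule failed.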
Encoding Output
Validation alone doesn't prevent all injection:
- Still encode output for context (HTML, SQL, etc.)
- Use parameterized queries
- Use context-appropriate escaping
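The sketch below shows both halves using the standard library: a parameterized query with `sqlite3` and HTML escaping with `html.escape`. The `users` table and the function names are assumptions for the example.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")


def add_user(name: str) -> None:
    # The driver binds the value; it is never spliced into the SQL text.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def find_user(name: str) -> list:
    cursor = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


def render_greeting(name: str) -> str:
    # Escape for the HTML body context at output time.
    return f"<p>Hello, {html.escape(name)}</p>"
```

A legitimate name such as `O'Brien` would fail a naive "no quotes" validation rule, but it is handled correctly here because safety comes from binding and escaping, not from rejecting characters.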
File Uploads
Special validation needed:
- Verify content type (not just extension)
- Scan for malware
- Restrict file sizes
- Store outside web root
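A minimal sketch of the size and content-type checks is given below. It identifies a file by its leading "magic" bytes instead of trusting the extension or the client-supplied header. The 5 MiB limit, the set of accepted types, and the function name are assumptions, and malware scanning and storage location are outside the scope of the example.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumed limit for this sketch

# Allowlist of accepted types, keyed by their leading signature bytes.
MAGIC_BYTES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
    "application/pdf": b"%PDF-",
}


def sniff_content_type(content: bytes) -> str:
    """Return the detected MIME type, or raise ValueError if not acceptable."""
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    for mime, magic in MAGIC_BYTES.items():
        if content.startswith(magic):
            return mime
    raise ValueError("unsupported file type")
```

A PHP script renamed to `avatar.png` fails this check because its first bytes do not match any accepted signature. A signature match alone does not prove the rest of the file is benign, so the other checks in the list still apply.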
Structured Data (JSON, XML)
- Parse with secure parser
- Disable external entity processing (XXE)
- Validate against schema
- Limit nesting depth
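For JSON, size and depth limits can be enforced around the standard `json` module, as sketched below. The limits and function names are assumptions; schema validation and XML (including XXE hardening) are not covered by this example.

```python
import json

MAX_BYTES = 64 * 1024  # assumed size limit
MAX_DEPTH = 10         # assumed nesting limit


def nesting_depth(value: object) -> int:
    """Depth of nested dicts/lists; scalars have depth 0."""
    deepest = 0
    stack = [(value, 0)]
    while stack:  # iterative walk, so deep input cannot exhaust the call stack
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue
        depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in children)
    return deepest


def parse_json(raw: bytes) -> object:
    """Parse untrusted JSON with size and depth limits."""
    if len(raw) > MAX_BYTES:
        raise ValueError("payload too large")
    try:
        document = json.loads(raw)
    except (ValueError, RecursionError):
        # The parser itself may hit the recursion limit on extreme nesting.
        raise ValueError("malformed JSON") from None
    if nesting_depth(document) > MAX_DEPTH:
        raise ValueError("JSON nested too deeply")
    return document
```

The size check runs before parsing so that an oversized payload costs almost nothing to reject.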
Regular Expression Safety
- Avoid ReDoS-vulnerable patterns
- Limit input length before regex
- Test regex performance with malicious input
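A short sketch of the first two points follows, assuming a hypothetical "slug" field. The pattern is written so that each character can match in only one place, and the length cap runs before the regex engine sees the input.

```python
import re

MAX_SLUG_LENGTH = 64  # assumed limit

# Risky shape, shown for contrast only: nested quantifiers such as (a+)+$
# can backtrack exponentially on input like "aaaa...!".

# Safer shape: runs of [a-z0-9] separated by single hyphens. The literal
# hyphen between groups removes ambiguity about where each group ends.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(value: str) -> bool:
    # Cap the length first so regex cost is bounded regardless of the pattern.
    if len(value) > MAX_SLUG_LENGTH:
        return False
    return SLUG_RE.fullmatch(value) is not None
```

Using `fullmatch` avoids the common mistake of forgetting anchors and accepting a valid prefix followed by arbitrary content.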
Common Validation Scenarios
| Input Type | Validations |
|---|---|
| Username | Length, allowed characters, no control chars |
| Email | Format, length, allowlist domains (if applicable) |
| Integer | Type, range, positive/negative |
| URL | Protocol allowlist, format, no javascript: |
| File | Extension, content-type, size, malware scan |
| JSON | Schema validation, depth limits, size limits |
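As an illustration of the URL row, the sketch below applies a protocol allowlist with the standard `urllib.parse` module. The function name, the length limit, and the choice of `http`/`https` as the only accepted schemes are assumptions.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def validate_url(value: str, max_length: int = 2048) -> str:
    """Return the URL if it uses an allowed scheme and names a host."""
    if len(value) > max_length:
        raise ValueError("URL too long")
    parts = urlsplit(value.strip())  # raises ValueError on some malformed URLs
    # Protocol allowlist: this is what excludes javascript:, data:, file:, etc.
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError("scheme not allowed")
    if not parts.hostname:
        raise ValueError("URL must include a host")
    return parts.geturl()
```

Because the check is an allowlist, new or obscure schemes are rejected by default without having to be enumerated.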
Implementation Examples
Python (Pydantic / Flask)
BAD (Vulnerable):
# ❌ VULNERABILITY: Manual, incomplete validation
from flask import Flask, request
app = Flask(__name__)
@app.route("/user", methods=["POST"])
def create_user():
    data = request.get_json()
    if 'email' not in data:  # What about type? Length? Format?
        return "Missing email", 400
    # ... proceeds to use data['age'], which might be a string or negative
GOOD (Secure):
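import logging
from flask import Flask, jsonify, request
from pydantic import BaseModel, EmailStr, ValidationError, conint, constr
# Assumes Pydantic v2; EmailStr needs the optional email-validator package.
app = Flask(__name__)
# ✅ Define strict schema
class UserSchema(BaseModel):
    username: constr(min_length=3, max_length=50, pattern=r'^[a-zA-Z0-9_]+$')
    email: EmailStr
    age: conint(ge=18, le=120)
@app.route("/user", methods=["POST"])
def create_user():
    try:
        # ✅ Validate payload against schema (also rejects missing or non-object JSON)
        user = UserSchema.model_validate(request.get_json(silent=True))
    except ValidationError as e:
        # Log details server-side; return a generic error to the client
        logging.warning("User validation failed: %s", e)
        return jsonify({"error": "Invalid request"}), 400
    save_to_db(user.model_dump())  # save_to_db is assumed to be defined elsewhere
    return jsonify({"status": "created"}), 201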
JavaScript (Zod / Express)
BAD (Vulnerable):
// ❌ VULNERABILITY: Implicit trust
app.post('/api/profile', (req, res) => {
  // trusting req.body.website is a valid URL
  // trusting req.body.role is not "admin"
  updateProfile(req.user.id, req.body);
});
GOOD (Secure):
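const { z } = require('zod');
// ✅ Define strict schema
const ProfileSchema = z.object({
  // url() checks syntax, not scheme, so allowlist http(s) to exclude javascript:
  website: z.string().max(100).url().regex(/^https?:\/\//i),
  bio: z.string().max(500).optional(),
  role: z.enum(['user', 'editor']), // Allowlist: 'admin' is not accepted
});
app.post('/api/profile', (req, res) => {
  const result = ProfileSchema.safeParse(req.body);
  if (!result.success) {
    // Log details server-side; return a generic error to the client
    console.warn('Profile validation failed', result.error.issues);
    return res.status(400).json({ error: 'Invalid request' });
  }
  // ✅ Apply canonical/validated data
  updateProfile(req.user.id, result.data);
  res.sendStatus(204);
});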
Implementation Checklist
- All entry points have validation
- Canonical form transformation
- Allowlist-based rules
- Type checking
- Length/range limits
- Business rule validation
- Secure error handling
- Output encoding (separate from validation)
- File upload validation
- Safe parsing of structured data
- Re-validation near sensitive operations
Related Patterns
- Authorisation (validation doesn't replace authorization)
- Selective encrypted transmission (protect data in transit)
- Log entity actions (log validation failures)
References
- Source: https://securitypatterns.distrinet-research.be/patterns/04_01_001__data_validation/
- OWASP Input Validation Cheat Sheet
- OWASP XSS Prevention Cheat Sheet