Metrics Cardinality
Prevent high-cardinality tags in StatsD/Datadog metrics to avoid cost explosions.
<rule>
name: metrics_cardinality
description: |
  Prevent high-cardinality tags in StatsD/Datadog metrics to avoid cost explosions.
  High-cardinality tags (IDs, UUIDs, timestamps, user-specific values) multiply into a combinatorial number of metric series, leading to significant cost increases.

filters:
  - type: file_extension
    pattern: '\.rb$'

actions:
  - type: reject
    conditions:
      - pattern: 'STATSD\.(increment|gauge|histogram|timing|distribution)\([^)]*(_id|_uuid|user_id|tenant_id|school_id|section_id|student_id|email|ip_address|timestamp)'
        message: "Avoid high-cardinality tags in metrics (IDs, UUIDs, emails, IPs, timestamps). Use low-cardinality dimensions only."

  - type: suggest
    message: |
      When instrumenting with StatsD/Datadog metrics:

      ❌ High-Cardinality Tags (AVOID)
      - IDs: user_id, tenant_id, school_id, section_id, student_id, order_id, etc.
      - UUIDs: request_id, workflow_uuid, sticker_uuid, etc.
      - User-specific: email, username, ip_address
      - Timestamps: created_at, timestamp
      - Free-form text: error_message, description

      ✅ Low-Cardinality Tags (SAFE)
      - Enums/Categories: status:success, status:failed, variant:summary, variant:detailed
      - Types: report_type:stickers, payment_method:credit_card
      - Environments: env:production, env:staging
      - Regions: region:us-east, region:eu-west
      - Boolean flags: feature_enabled:true, is_admin:false
      - Fixed sets: role:teacher, role:parent, role:admin

      Examples

      # ❌ BAD - High cardinality
      STATSD.increment(
        'report.requested',
        tags: [
          "tenant_id:#{tenant.id}",   # ❌ Unbounded - thousands of tenants
          "section_id:#{section.id}", # ❌ Unbounded - millions of sections
          "user_id:#{user.id}"        # ❌ Unbounded - millions of users
        ]
      )

      # ✅ GOOD - Low cardinality
      STATSD.increment(
        'report.requested',
        tags: [
          "variant:#{report_variant}", # ✅ Fixed set: summary, detailed
          "status:#{status}"           # ✅ Fixed set: pending, success, failed
        ]
      )

      When You Need High-Cardinality Data
      If you need to track per-tenant, per-user, or per-resource metrics:

      - Use Logging Instead
        # Assumes a structured logger (e.g. rails_semantic_logger) that accepts a
        # message plus a payload hash; Ruby's stock Logger#info takes a single message.
        Rails.logger.info(
          "Report requested",
          tenant_id: tenant.id,
          section_id: section.id,
          variant: variant
        )

      - Use APM Spans (Datadog APM allows high-cardinality tags)
        Datadog::Tracing.trace('report.generate') do |span|
          span.set_tag('tenant_id', tenant.id) # OK in APM spans
          span.set_tag('variant', variant)
        end

      - Use Database/Analytics for detailed per-entity tracking
        ReportRequest.create!(
          tenant_id: tenant.id,
          section_id: section.id,
          variant: variant
        )

      Cost Impact
      - Each unique combination of tag values creates a separate metric series
      - Example: report.requested with 3 tags:
        - 1,000 tenants × 10,000 sections × 2 variants = up to 20,000,000 series
        - At $0.05 per custom metric per month, this could cost up to $1,000,000/month
      - With low-cardinality tags only:
        - 2 variants × 3 statuses = 6 series
        - Costs $0.30/month

examples:
  - input: |
      # ❌ BAD - Will create millions of metric series
      STATSD.increment(
        'stickers.awarded',
        tags: [
          "student_id:#{student.id}",
          "teacher_id:#{teacher.id}",
          "sticker_uuid:#{sticker.uuid}"
        ]
      )
    output: |
      High-cardinality tags detected. Use low-cardinality dimensions or logging instead.

  - input: |
      # ✅ GOOD - Fixed set of tag values
      STATSD.increment(
        'stickers.awarded',
        tags: [
          "role:#{user.role}",           # teacher, admin, parent
          "sticker_type:#{sticker.type}" # achievement, behavior, custom
        ]
      )
    output: |
      Low-cardinality tags are safe and cost-effective.

metadata:
  priority: high
  version: 1.0
  applies_to: all_ruby_code
</rule>
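
You can run the rule's reject regex directly in Ruby to see what it flags. This sketch copies the pattern verbatim from the reject condition. The sample call strings and the `high_cardinality_call?` helper are hypothetical. The sketch also shows one gap: `[^)]*` stops at the first `)`, so the regex misses a high-cardinality tag that follows a nested call such as `format(...)`.

```ruby
# Regex copied verbatim from the rule's reject condition.
CARDINALITY_PATTERN = /STATSD\.(increment|gauge|histogram|timing|distribution)\([^)]*(_id|_uuid|user_id|tenant_id|school_id|section_id|student_id|email|ip_address|timestamp)/

# Illustrative call sites, held as source text only (single-quoted heredocs, nothing interpolated).
BAD_CALL = <<~'RUBY'
  STATSD.increment(
    'stickers.awarded',
    tags: ["student_id:#{student.id}", "sticker_uuid:#{sticker.uuid}"]
  )
RUBY

GOOD_CALL = <<~'RUBY'
  STATSD.increment(
    'stickers.awarded',
    tags: ["role:#{user.role}", "sticker_type:#{sticker.type}"]
  )
RUBY

# Known gap: [^)]* stops at the first ")", so the user_id tag after format(...) is not seen.
NESTED_CALL = <<~'RUBY'
  STATSD.increment(format('report.%s', kind), tags: ["user_id:#{user.id}"])
RUBY

# Hypothetical helper: true when the rule's regex would reject the snippet.
def high_cardinality_call?(source)
  CARDINALITY_PATTERN.match?(source)
end

puts high_cardinality_call?(BAD_CALL)    # => true
puts high_cardinality_call?(GOOD_CALL)   # => false
puts high_cardinality_call?(NESTED_CALL) # => false (missed)
```

A regex over source text is a cheap first line of defense. The nested-call miss is one reason to pair it with code review or a runtime guard.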
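
One way to enforce the ✅ Low-Cardinality Tags list at runtime is to pass tags through an allowlist before they reach STATSD. This is a sketch under assumptions. `ALLOWED_TAG_VALUES` and `safe_tags` are hypothetical names, and the allowed values are taken from this rule's own examples. Dropping unknown tags, rather than raising, is a design choice.

```ruby
# Hypothetical allowlist: each tag key maps to its fixed set of permitted values.
ALLOWED_TAG_VALUES = {
  'variant' => %w[summary detailed],
  'status'  => %w[pending success failed],
  'role'    => %w[teacher parent admin],
  'env'     => %w[production staging]
}.freeze

# Keep only "key:value" tags whose key is allowlisted and whose value is in that key's set.
# Anything else (IDs, UUIDs, emails, unexpected values) is dropped.
def safe_tags(tags)
  tags.select do |tag|
    key, value = tag.split(':', 2)
    ALLOWED_TAG_VALUES.fetch(key, []).include?(value)
  end
end

tags = ['variant:summary', 'status:success', 'tenant_id:42', 'role:superuser']
p safe_tags(tags) # => ["variant:summary", "status:success"]

# Hypothetical call site:
#   STATSD.increment('report.requested', tags: safe_tags(tags))
```

Dropping tags keeps the metric flowing while capping the number of series. Raising instead would surface mistakes sooner, for example in tests.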
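
For the "Use Logging Instead" option, one dependency-free approach is a JSON formatter on Ruby's standard `Logger`. IDs then land in searchable log fields instead of metric tags. This minimal sketch assumes JSON-per-line logs suit your pipeline. The `log_report_requested` helper and the sample IDs are hypothetical, and a structured-logging gem would replace most of this.

```ruby
require 'json'
require 'logger'
require 'stringio'
require 'time'

# Write to a StringIO so the example is self-contained; use $stdout or a file in practice.
LOG_IO = StringIO.new
LOGGER = Logger.new(LOG_IO)

# Emit one JSON object per line; a Hash message is merged into the record.
LOGGER.formatter = proc do |severity, time, _progname, msg|
  payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
  JSON.generate({ level: severity, time: time.getutc.iso8601 }.merge(payload)) + "\n"
end

# Hypothetical helper: high-cardinality IDs go into log fields, not metric tags.
def log_report_requested(tenant_id:, section_id:, variant:)
  LOGGER.info({ message: 'Report requested', tenant_id: tenant_id,
                section_id: section_id, variant: variant })
end

log_report_requested(tenant_id: 42, section_id: 9001, variant: 'summary')
puts LOG_IO.string
```

Log volume is billed by size rather than by distinct tag combinations. Per-entity fields therefore do not multiply series the way metric tags do.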
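
The Cost Impact figures come from multiplying the number of distinct values per tag. That product is the maximum number of series a metric can produce. This sketch reproduces the arithmetic under the same flat $0.05 per series per month used above; real billing depends on your Datadog plan. The helper names are hypothetical, and money is kept in integer cents to avoid floating-point rounding.

```ruby
# Assumed flat rate from the Cost Impact example: $0.05 = 5 cents per series per month.
CENTS_PER_SERIES = 5

# Upper bound on series for one metric: product of distinct values per tag key.
def max_series(distinct_values_per_tag)
  distinct_values_per_tag.values.reduce(1, :*)
end

# Exact dollar amount as a Rational, computed from integer cents.
def monthly_cost_dollars(series)
  Rational(series * CENTS_PER_SERIES, 100)
end

# Each distinct combination of tag values is its own series.
combos = %w[summary detailed].product(%w[pending success failed])
puts combos.size # => 6

high = max_series({ 'tenant_id' => 1_000, 'section_id' => 10_000, 'variant' => 2 })
low  = max_series({ 'variant' => 2, 'status' => 3 })

puts high                                      # => 20000000
puts monthly_cost_dollars(high).to_i           # => 1000000
puts low                                       # => 6
puts format('%.2f', monthly_cost_dollars(low)) # => 0.30
```

The product is an upper bound. Only combinations that are actually emitted become series. Correlated tags (each section belongs to one tenant) keep the real count lower, but it still grows with every new entity.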