Grafana Mimir Skill
Comprehensive guide for Grafana Mimir - the horizontally scalable, highly available, multi-tenant time series database for long-term Prometheus metrics storage.
What is Mimir?
Mimir is an open-source, horizontally scalable, highly available, multi-tenant long-term storage solution for Prometheus and OpenTelemetry metrics that:
- Overcomes Prometheus limitations - adds scalability and long-term retention
- Multi-tenant by default - built-in tenant isolation via the X-Scope-OrgID header
- Stores data in object storage - S3, GCS, Azure Blob Storage, or Swift
- 100% Prometheus compatible - PromQL queries and the remote write protocol
- Part of the LGTM Stack - Logs, Grafana, Traces, Metrics unified observability
Architecture Overview
Core Components
| Component | Purpose |
| --- | --- |
| Distributor | Validates requests, routes incoming metrics to ingesters via the hash ring |
| Ingester | Stores time-series data in memory, flushes it to object storage |
| Querier | Executes PromQL queries against ingesters and store-gateways |
| Query Frontend | Caches query results, optimizes and splits queries |
| Query Scheduler | Manages per-tenant query queues for fairness |
| Store-Gateway | Provides access to historical metric blocks in object storage |
| Compactor | Consolidates and optimizes stored metric data blocks |
| Ruler | Evaluates recording and alerting rules (optional) |
| Alertmanager | Handles alert routing and deduplication (optional) |
Data Flow
Write Path:
```
Prometheus/OTel → Distributor → Ingester → Object Storage
                      ↓
              Hash Ring (routes by series)
```
Read Path:
```
Query → Query Frontend → Query Scheduler → Querier
                                              ↓ Ingesters (recent)
                                              ↓ Store-Gateway (historical)
```
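The distributor's routing step can be illustrated with a toy consistent-hash ring: each ingester owns several tokens, a series is hashed to a token, and the write is replicated to the next distinct ingesters clockwise. This is a sketch only — Mimir's real ring uses different hash functions and stores membership in a KV store (memberlist) — but it shows the routing idea:

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Toy consistent-hash ring for illustration; not Mimir's implementation."""

    def __init__(self, ingesters, tokens_per_ingester=4, replication_factor=3):
        self.rf = replication_factor
        self.ring = []  # sorted (token, ingester) pairs
        for ing in ingesters:
            for i in range(tokens_per_ingester):
                self.ring.append((self._hash(f"{ing}-{i}"), ing))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        # 32-bit token derived from SHA-1 (an arbitrary, illustrative choice)
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

    def route(self, series_labels: dict) -> list:
        # A series' identity is its sorted label set, as in Prometheus
        key = ",".join(f"{k}={v}" for k, v in sorted(series_labels.items()))
        idx = bisect_right(self.ring, (self._hash(key), ""))
        owners = []
        n_distinct = len({ing for _, ing in self.ring})
        i = idx
        # Walk clockwise until RF distinct ingesters are collected
        while len(owners) < min(self.rf, n_distinct):
            _, ing = self.ring[i % len(self.ring)]
            if ing not in owners:
                owners.append(ing)
            i += 1
        return owners
```

The same series always hashes to the same owners, which is what lets queriers know where recent samples live.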
Deployment Modes
- Monolithic Mode (`-target=all`)
  - All components run in a single process
  - Best for: development, testing, small scale (~1M series)
  - Horizontally scalable by deploying multiple instances
  - Not recommended at large scale (all components scale together)
- Microservices Mode (Distributed) - Recommended for Production
Using the mimir-distributed Helm chart:

```yaml
distributor:
  replicas: 3
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
querier:
  replicas: 3
query_frontend:
  replicas: 2
query_scheduler:
  replicas: 2
store_gateway:
  replicas: 3
compactor:
  replicas: 1
```
Helm Deployment
Add Repository
```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```
Install Distributed Mimir
```shell
helm install mimir grafana/mimir-distributed \
  --namespace monitoring \
  --values values.yaml
```
Pre-Built Values Files
| File | Purpose |
| --- | --- |
| values.yaml | Non-production testing with MinIO |
| small.yaml | ~1 million series (single replicas, not HA) |
| large.yaml | Production scale (~10 million series) |
Production Values Example
```yaml
# Deployment mode
mimir:
  structuredConfig:
    multitenancy_enabled: true

    # Storage configuration
    common:
      storage:
        backend: azure  # or s3, gcs
        azure:
          account_name: ${AZURE_STORAGE_ACCOUNT}
          account_key: ${AZURE_STORAGE_KEY}
          endpoint_suffix: blob.core.windows.net
    blocks_storage:
      azure:
        container_name: mimir-blocks
    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager
    ruler_storage:
      azure:
        container_name: mimir-ruler

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      memory: 16Gi

# Querier
querier:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 8Gi

# Query Frontend
query_frontend:
  replicas: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi

# Query Scheduler
query_scheduler:
  replicas: 2

# Store Gateway
store_gateway:
  replicas: 3
  persistentVolume:
    enabled: true
    size: 20Gi
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 8Gi

# Compactor
compactor:
  replicas: 1
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      memory: 8Gi

# Gateway for external access
gateway:
  enabledNonEnterprise: true
  replicas: 2

# Monitoring
metaMonitoring:
  serviceMonitor:
    enabled: true
```
Storage Configuration
Critical Requirements
- Must create buckets manually - Mimir does not create them for you
- Separate buckets required - blocks_storage, alertmanager_storage, and ruler_storage cannot share the same bucket and prefix
- Azure: hierarchical namespace must be disabled on the storage account
Azure Blob Storage
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: azure
        azure:
          account_name: <storage-account-name>
          # Option 1: Account key (via environment variable)
          account_key: ${AZURE_STORAGE_KEY}
          # Option 2: User-assigned managed identity
          # user_assigned_id: <identity-client-id>
          endpoint_suffix: blob.core.windows.net
    blocks_storage:
      azure:
        container_name: mimir-blocks
    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager
    ruler_storage:
      azure:
        container_name: mimir-ruler
```
AWS S3
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.us-east-1.amazonaws.com
          region: us-east-1
          access_key_id: ${AWS_ACCESS_KEY_ID}
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}
    blocks_storage:
      s3:
        bucket_name: mimir-blocks
    alertmanager_storage:
      s3:
        bucket_name: mimir-alertmanager
    ruler_storage:
      s3:
        bucket_name: mimir-ruler
```
Google Cloud Storage
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: gcs
        gcs:
          service_account: ${GCS_SERVICE_ACCOUNT_JSON}
    blocks_storage:
      gcs:
        bucket_name: mimir-blocks
    alertmanager_storage:
      gcs:
        bucket_name: mimir-alertmanager
    ruler_storage:
      gcs:
        bucket_name: mimir-ruler
```
Limits Configuration
```yaml
mimir:
  structuredConfig:
    limits:
      # Ingestion limits
      ingestion_rate: 25000        # Samples/sec per tenant
      ingestion_burst_size: 50000  # Burst size
      max_series_per_metric: 10000
      max_series_per_user: 1000000
      max_global_series_per_user: 1000000
      max_label_names_per_series: 30
      max_label_name_length: 1024
      max_label_value_length: 2048

      # Query limits
      max_fetched_series_per_query: 100000
      max_fetched_chunks_per_query: 2000000
      max_query_lookback: 0  # No limit
      max_query_parallelism: 32

      # Retention
      compactor_blocks_retention_period: 365d  # 1 year

      # Out-of-order samples
      out_of_order_time_window: 5m
```
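Conceptually, `ingestion_rate` and `ingestion_burst_size` behave like a token bucket: the rate is the steady refill speed and the burst size is the bucket capacity. A minimal sketch of that behavior (illustrative only, not Mimir's limiter code):

```python
class TokenBucket:
    """Token-bucket sketch: `rate` mirrors ingestion_rate (samples/sec refill),
    `burst` mirrors ingestion_burst_size (bucket capacity)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0  # logical clock, in seconds

    def allow(self, now: float, n: int) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False  # over the limit; Mimir rejects such writes with HTTP 429
```

A tenant can momentarily push up to the burst size, but sustained throughput converges to the configured rate.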
Per-Tenant Overrides (Runtime Configuration)
```yaml
# runtime-config.yaml
overrides:
  tenant1:
    ingestion_rate: 50000
    max_series_per_user: 2000000
    compactor_blocks_retention_period: 730d  # 2 years
  tenant2:
    ingestion_rate: 75000
    max_global_series_per_user: 5000000
```
Enable runtime configuration:
```yaml
mimir:
  structuredConfig:
    runtime_config:
      file: /etc/mimir/runtime-config.yaml
      period: 10s
```
High Availability Configuration
HA Tracker for Prometheus Deduplication
```yaml
mimir:
  structuredConfig:
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: memberlist
    limits:
      # Labels identifying the Prometheus HA cluster and replica
      ha_cluster_label: cluster
      ha_replica_label: replica
    memberlist:
      join_members:
        - mimir-gossip-ring.monitoring.svc.cluster.local:7946
```
Prometheus Configuration:
```yaml
global:
  external_labels:
    cluster: prom-team1
    replica: replica1

remote_write:
  - url: http://mimir-gateway:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant
```
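The deduplication logic itself is simple to state: per cluster, one replica is elected, its samples are accepted, and all other replicas' samples are dropped; if the elected replica goes silent past a failover timeout, another replica takes over. A sketch of that behavior (illustrative only — Mimir persists the election in a KV store such as memberlist, and the timeout name below is hypothetical):

```python
class HATracker:
    """Sketch of HA deduplication with leader election per cluster."""

    def __init__(self, failover_timeout: float = 30.0):
        self.failover_timeout = failover_timeout
        self.elected = {}  # cluster -> (replica, last_seen)

    def accept(self, cluster: str, replica: str, now: float) -> bool:
        leader = self.elected.get(cluster)
        if leader is None or now - leader[1] > self.failover_timeout:
            self.elected[cluster] = (replica, now)  # elect (or fail over)
            return True
        if leader[0] == replica:
            self.elected[cluster] = (replica, now)  # refresh leader timestamp
            return True
        return False  # duplicate sample from a non-elected replica
```

This is why both `cluster` and `replica` external labels are required: the cluster label scopes the election, and the replica label identifies the contenders.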
Zone-Aware Replication
```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1c

store_gateway:
  zoneAwareReplication:
    enabled: true
```
Shuffle Sharding
Limits tenant data to a subset of instances for fault isolation:
```yaml
mimir:
  structuredConfig:
    limits:
      # Write path
      ingestion_tenant_shard_size: 3

      # Read path
      max_queriers_per_tenant: 5
      store_gateway_tenant_shard_size: 3
```
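The core idea of shuffle sharding is that each tenant deterministically gets its own pseudo-random subset of instances, so two tenants rarely share the exact same subset and one tenant's outage blast radius stays small. A toy sketch (not Mimir's actual algorithm, which is hash-based and zone-aware):

```python
import hashlib
import random

def tenant_shard(tenant_id: str, instances: list, shard_size: int) -> list:
    """Pick a deterministic pseudo-random subset of instances for a tenant.
    Illustrative only; the seeding scheme here is an arbitrary choice."""
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(instances, min(shard_size, len(instances))))
```

Because the subset is derived from the tenant ID alone, every distributor and querier computes the same shard without coordination.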
OpenTelemetry Integration
OTLP Metrics Ingestion
OpenTelemetry Collector Config:
```yaml
exporters:
  otlphttp:
    endpoint: http://mimir-gateway:8080/otlp
    headers:
      X-Scope-OrgID: "my-tenant"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```
Exponential Histograms (Experimental)
```go
// Go SDK configuration
Aggregation: metric.AggregationBase2ExponentialHistogram{
	MaxSize:  160, // Maximum number of buckets
	MaxScale: 20,  // Maximum scale factor
},
```
Key Benefits:
- Explicit min/max values (no estimation needed)
- Better accuracy at extreme percentiles
- Native OTLP format preservation
Multi-Tenancy
```yaml
mimir:
  structuredConfig:
    multitenancy_enabled: true
    no_auth_tenant: anonymous  # Tenant used when multitenancy is disabled
```
Query with tenant header:
```shell
curl -H "X-Scope-OrgID: tenant-a" \
  "http://mimir:8080/prometheus/api/v1/query?query=up"
```
Tenant ID Constraints:
- Maximum 150 characters
- Allowed characters: alphanumeric plus `! - _ . * ' ( )`
- Prohibited: `.` or `..` alone, `__mimir_cluster`, and slashes
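The constraints above can be collected into a small validator. This is a sketch of the rules as listed here — verify against the Mimir documentation before relying on it:

```python
import re

# Alphanumeric plus the listed punctuation: ! - _ . * ' ( )
_ALLOWED = re.compile(r"^[A-Za-z0-9!\-_.*'()]+$")

def valid_tenant_id(tenant: str) -> bool:
    """Check a tenant ID against the constraints listed above (sketch)."""
    if not tenant or len(tenant) > 150:
        return False
    if tenant in (".", "..", "__mimir_cluster"):
        return False
    if "/" in tenant or "\\" in tenant:
        return False
    return bool(_ALLOWED.fullmatch(tenant))
```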
API Reference
Ingestion Endpoints
```
# Prometheus remote write
POST /api/v1/push

# OTLP metrics
POST /otlp/v1/metrics

# InfluxDB line protocol
POST /api/v1/push/influx/write
```
Query Endpoints
```
# Instant query
GET, POST /prometheus/api/v1/query?query=<promql>&time=<timestamp>

# Range query
GET, POST /prometheus/api/v1/query_range?query=<promql>&start=<start>&end=<end>&step=<step>

# Labels
GET, POST /prometheus/api/v1/labels
GET /prometheus/api/v1/label/{name}/values

# Series
GET, POST /prometheus/api/v1/series

# Exemplars
GET, POST /prometheus/api/v1/query_exemplars

# Cardinality
GET, POST /prometheus/api/v1/cardinality/label_names
GET, POST /prometheus/api/v1/cardinality/active_series
```
Administrative Endpoints
```
# Flush ingester data
GET, POST /ingester/flush

# Prepare shutdown
GET, POST, DELETE /ingester/prepare-shutdown

# Ring status
GET /ingester/ring
GET /distributor/ring
GET /store-gateway/ring
GET /compactor/ring

# Tenant stats
GET /distributor/all_user_stats
GET /api/v1/user_stats
GET /api/v1/user_limits

# Health & config
GET /ready
GET /metrics
GET /config
GET /config?mode=diff
GET /runtime_config
```
Azure Identity Configuration
User-Assigned Managed Identity
- Create Identity:
```shell
az identity create \
  --name mimir-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name mimir-identity --resource-group <rg> --query clientId -o tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show --name mimir-identity --resource-group <rg> --query principalId -o tsv)
```
- Assign to Node Pool:
```shell
az vmss identity assign \
  --resource-group <aks-node-rg> \
  --name <vmss-name> \
  --identities /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mimir-identity
```
- Grant Storage Permission:
```shell
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
```
- Configure Mimir:
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        azure:
          user_assigned_id: <IDENTITY_CLIENT_ID>
```
Workload Identity Federation
- Create Federated Credential:
```shell
az identity federated-credential create \
  --name mimir-federated \
  --identity-name mimir-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:mimir \
  --audiences api://AzureADTokenExchange
```
- Configure Helm Values:
```yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"
```
Troubleshooting
Common Issues
- Container Not Found (Azure)
```shell
# Create required containers
az storage container create --name mimir-blocks --account-name <storage>
az storage container create --name mimir-alertmanager --account-name <storage>
az storage container create --name mimir-ruler --account-name <storage>
```
- Authorization Failure (Azure)
```shell
# Verify RBAC assignment
az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>

# Restart the pod to refresh the token
kubectl delete pod -n monitoring <ingester-pod>
```
- Ingester OOM
```yaml
ingester:
  resources:
    limits:
      memory: 16Gi  # Increase memory
```
- Query Timeout
```yaml
mimir:
  structuredConfig:
    querier:
      timeout: 5m
      max_concurrent: 20
```
- High Cardinality
```yaml
mimir:
  structuredConfig:
    limits:
      max_series_per_user: 5000000
      max_series_per_metric: 50000
```
Diagnostic Commands
```shell
# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=mimir

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Verify readiness
kubectl exec -it <mimir-pod> -n monitoring -- wget -qO- http://localhost:8080/ready

# Check ring status
kubectl port-forward svc/mimir-distributor 8080:8080 -n monitoring
curl http://localhost:8080/distributor/ring

# Check the effective configuration
kubectl exec -it <mimir-pod> -n monitoring -- cat /etc/mimir/mimir.yaml

# Validate configuration before deployment
mimir -modules -config.file <path-to-config-file>
```
Key Metrics to Monitor
```
# Ingestion rate per tenant
sum by (user) (rate(cortex_distributor_received_samples_total[5m]))

# Series count per tenant
sum by (user) (cortex_ingester_memory_series)

# Query latency (p99)
histogram_quantile(0.99, sum by (le) (rate(cortex_request_duration_seconds_bucket{route=~"/api/prom/api/v1/query.*"}[5m])))

# Compactor status
cortex_compactor_runs_completed_total
cortex_compactor_runs_failed_total

# Store-gateway block sync
cortex_bucket_store_blocks_loaded
```
Circuit Breakers (Ingester)
```yaml
mimir:
  structuredConfig:
    ingester:
      push_circuit_breaker:
        enabled: true
        request_timeout: 2s
        failure_threshold_percentage: 10
        cooldown_period: 10s
      read_circuit_breaker:
        enabled: true
        request_timeout: 30s
```
States:
- Closed - normal operation, requests flow through
- Open - stops forwarding requests to the failing instance
- Half-open - allows limited trial requests after the cooldown period
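The three states above form a small state machine; a sketch of the cycle, with thresholds mirroring `failure_threshold_percentage` and `cooldown_period` (illustrative only, not Mimir's implementation — the `window` parameter is an assumption of this sketch):

```python
class CircuitBreaker:
    """Sketch of the closed -> open -> half-open cycle."""

    def __init__(self, failure_threshold_pct=10, cooldown=10.0, window=100):
        self.threshold = failure_threshold_pct
        self.cooldown = cooldown
        self.window = window      # outcomes per evaluation window (assumption)
        self.state = "closed"
        self.results = []         # recent outcomes (True = success)
        self.opened_at = 0.0

    def allow(self, now: float) -> bool:
        if self.state == "open":
            if now - self.opened_at >= self.cooldown:
                self.state = "half-open"  # let a trial request through
                return True
            return False
        return True

    def record(self, now: float, success: bool):
        if self.state == "half-open":
            # The trial request decides: close on success, reopen on failure
            self.state = "closed" if success else "open"
            if not success:
                self.opened_at = now
            self.results = []
            return
        self.results.append(success)
        if len(self.results) >= self.window:
            failures = self.results.count(False)
            if failures * 100 / len(self.results) >= self.threshold:
                self.state, self.opened_at = "open", now
            self.results = []
```

The open state is what protects healthy ingesters: failing instances stop receiving traffic immediately instead of accumulating timed-out requests.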
External Resources
- Official Mimir Documentation
- Mimir Helm Chart
- Configuration Reference
- HTTP API Reference
- Mimir GitHub Repository