Grafana Mimir Skill
Comprehensive guide for Grafana Mimir - the horizontally scalable, highly available, multi-tenant time series database for long-term Prometheus metrics storage.
What is Mimir?
Mimir is an open-source, horizontally scalable, highly available, multi-tenant long-term storage solution for Prometheus and OpenTelemetry metrics that:
- Overcomes Prometheus limitations - adds scalability and long-term retention
- Multi-tenant by default - built-in tenant isolation via the X-Scope-OrgID header
- Stores data in object storage - S3, GCS, Azure Blob Storage, or Swift
- 100% Prometheus compatible - PromQL queries and the remote write protocol
- Part of the LGTM Stack - Logs, Grafana, Traces, Metrics unified observability
Architecture Overview
Core Components
| Component | Purpose |
| --- | --- |
| Distributor | Validates requests, routes incoming metrics to ingesters via the hash ring |
| Ingester | Stores time-series data in memory, flushes it to object storage |
| Querier | Executes PromQL queries against ingesters and store-gateways |
| Query Frontend | Caches query results, optimizes and splits queries |
| Query Scheduler | Manages per-tenant query queues for fairness |
| Store-Gateway | Provides access to historical metric blocks in object storage |
| Compactor | Consolidates and optimizes stored metric data blocks |
| Ruler | Evaluates recording and alerting rules (optional) |
| Alertmanager | Handles alert routing and deduplication (optional) |
Data Flow
Write Path:
```
Prometheus/OTel → Distributor → Ingester → Object Storage
                      ↓
              Hash Ring (routes by series)
```
Read Path:
```
Query → Query Frontend → Query Scheduler → Querier
                                              ↓ Ingesters (recent)
                                              ↓ Store-Gateway (historical)
```
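The distributor's routing step can be illustrated with a toy consistent-hash ring: each ingester owns several tokens, a series is hashed to a token, and the write is replicated to the next distinct ingesters clockwise. This is a sketch only — Mimir's real ring uses different hash functions and stores membership in a KV store (memberlist) — but it shows the routing idea:

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Toy consistent-hash ring for illustration; not Mimir's implementation."""

    def __init__(self, ingesters, tokens_per_ingester=4, replication_factor=3):
        self.rf = replication_factor
        self.ring = []  # sorted (token, ingester) pairs
        for ing in ingesters:
            for i in range(tokens_per_ingester):
                self.ring.append((self._hash(f"{ing}-{i}"), ing))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        # 32-bit token derived from SHA-1 (an arbitrary, illustrative choice)
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

    def route(self, series_labels: dict) -> list:
        # A series' identity is its sorted label set, as in Prometheus
        key = ",".join(f"{k}={v}" for k, v in sorted(series_labels.items()))
        idx = bisect_right(self.ring, (self._hash(key), ""))
        owners = []
        n_distinct = len({ing for _, ing in self.ring})
        i = idx
        # Walk clockwise until RF distinct ingesters are collected
        while len(owners) < min(self.rf, n_distinct):
            _, ing = self.ring[i % len(self.ring)]
            if ing not in owners:
                owners.append(ing)
            i += 1
        return owners
```

The same series always hashes to the same owners, which is what lets queriers know where recent samples live.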
Deployment Modes
- Monolithic Mode (`-target=all`)
  - All components run in a single process
  - Best for: development, testing, small scale (~1M series)
  - Horizontally scalable by deploying multiple instances
  - Not recommended at large scale (all components scale together)
- Microservices Mode (Distributed) - Recommended for Production
Using the mimir-distributed Helm chart:

```yaml
distributor:
  replicas: 3
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
querier:
  replicas: 3
query_frontend:
  replicas: 2
query_scheduler:
  replicas: 2
store_gateway:
  replicas: 3
compactor:
  replicas: 1
```
Helm Deployment
Add Repository
```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```
Install Distributed Mimir
```shell
helm install mimir grafana/mimir-distributed \
  --namespace monitoring \
  --values values.yaml
```
Pre-Built Values Files
| File | Purpose |
| --- | --- |
| values.yaml | Non-production testing with MinIO |
| small.yaml | ~1 million series (single replicas, not HA) |
| large.yaml | Production scale (~10 million series) |
Production Values Example
```yaml
# Deployment mode
mimir:
  structuredConfig:
    multitenancy_enabled: true

    # Storage configuration
    common:
      storage:
        backend: azure  # or s3, gcs
        azure:
          account_name: ${AZURE_STORAGE_ACCOUNT}
          account_key: ${AZURE_STORAGE_KEY}
          endpoint_suffix: blob.core.windows.net
    blocks_storage:
      azure:
        container_name: mimir-blocks
    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager
    ruler_storage:
      azure:
        container_name: mimir-ruler

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      memory: 16Gi

# Querier
querier:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 8Gi

# Query Frontend
query_frontend:
  replicas: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi

# Query Scheduler
query_scheduler:
  replicas: 2

# Store Gateway
store_gateway:
  replicas: 3
  persistentVolume:
    enabled: true
    size: 20Gi
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 8Gi

# Compactor
compactor:
  replicas: 1
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      memory: 8Gi

# Gateway for external access
gateway:
  enabledNonEnterprise: true
  replicas: 2

# Monitoring
metaMonitoring:
  serviceMonitor:
    enabled: true
```
Storage Configuration
Critical Requirements
- Must create buckets manually - Mimir does not create them for you
- Separate buckets required - blocks_storage, alertmanager_storage, and ruler_storage cannot share the same bucket and prefix
- Azure: hierarchical namespace must be disabled on the storage account
Azure Blob Storage
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: azure
        azure:
          account_name: <storage-account-name>
          # Option 1: Account key (via environment variable)
          account_key: ${AZURE_STORAGE_KEY}
          # Option 2: User-assigned managed identity
          # user_assigned_id: <identity-client-id>
          endpoint_suffix: blob.core.windows.net
    blocks_storage:
      azure:
        container_name: mimir-blocks
    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager
    ruler_storage:
      azure:
        container_name: mimir-ruler
```
AWS S3
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.us-east-1.amazonaws.com
          region: us-east-1
          access_key_id: ${AWS_ACCESS_KEY_ID}
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}
    blocks_storage:
      s3:
        bucket_name: mimir-blocks
    alertmanager_storage:
      s3:
        bucket_name: mimir-alertmanager
    ruler_storage:
      s3:
        bucket_name: mimir-ruler
```
Google Cloud Storage
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: gcs
        gcs:
          service_account: ${GCS_SERVICE_ACCOUNT_JSON}
    blocks_storage:
      gcs:
        bucket_name: mimir-blocks
    alertmanager_storage:
      gcs:
        bucket_name: mimir-alertmanager
    ruler_storage:
      gcs:
        bucket_name: mimir-ruler
```
Limits Configuration
```yaml
mimir:
  structuredConfig:
    limits:
      # Ingestion limits
      ingestion_rate: 25000        # Samples/sec per tenant
      ingestion_burst_size: 50000  # Burst size
      max_series_per_metric: 10000
      max_series_per_user: 1000000
      max_global_series_per_user: 1000000
      max_label_names_per_series: 30
      max_label_name_length: 1024
      max_label_value_length: 2048

      # Query limits
      max_fetched_series_per_query: 100000
      max_fetched_chunks_per_query: 2000000
      max_query_lookback: 0  # No limit
      max_query_parallelism: 32

      # Retention
      compactor_blocks_retention_period: 365d  # 1 year

      # Out-of-order samples
      out_of_order_time_window: 5m
```
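Conceptually, `ingestion_rate` and `ingestion_burst_size` behave like a token bucket: the rate is the steady refill speed and the burst size is the bucket capacity. A minimal sketch of that behavior (illustrative only, not Mimir's limiter code):

```python
class TokenBucket:
    """Token-bucket sketch: `rate` mirrors ingestion_rate (samples/sec refill),
    `burst` mirrors ingestion_burst_size (bucket capacity)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0  # logical clock, in seconds

    def allow(self, now: float, n: int) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False  # over the limit; Mimir rejects such writes with HTTP 429
```

A tenant can momentarily push up to the burst size, but sustained throughput converges to the configured rate.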
Per-Tenant Overrides (Runtime Configuration)
```yaml
# runtime-config.yaml
overrides:
  tenant1:
    ingestion_rate: 50000
    max_series_per_user: 2000000
    compactor_blocks_retention_period: 730d  # 2 years
  tenant2:
    ingestion_rate: 75000
    max_global_series_per_user: 5000000
```
Enable runtime configuration:
```yaml
mimir:
  structuredConfig:
    runtime_config:
      file: /etc/mimir/runtime-config.yaml
      period: 10s
```
High Availability Configuration
HA Tracker for Prometheus Deduplication
```yaml
mimir:
  structuredConfig:
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: memberlist
    limits:
      # Labels identifying the Prometheus HA cluster and replica
      ha_cluster_label: cluster
      ha_replica_label: replica
    memberlist:
      join_members:
        - mimir-gossip-ring.monitoring.svc.cluster.local:7946
```
Prometheus Configuration:
```yaml
global:
  external_labels:
    cluster: prom-team1
    replica: replica1

remote_write:
  - url: http://mimir-gateway:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant
```
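The deduplication logic itself is simple to state: per cluster, one replica is elected, its samples are accepted, and all other replicas' samples are dropped; if the elected replica goes silent past a failover timeout, another replica takes over. A sketch of that behavior (illustrative only — Mimir persists the election in a KV store such as memberlist, and the timeout name below is hypothetical):

```python
class HATracker:
    """Sketch of HA deduplication with leader election per cluster."""

    def __init__(self, failover_timeout: float = 30.0):
        self.failover_timeout = failover_timeout
        self.elected = {}  # cluster -> (replica, last_seen)

    def accept(self, cluster: str, replica: str, now: float) -> bool:
        leader = self.elected.get(cluster)
        if leader is None or now - leader[1] > self.failover_timeout:
            self.elected[cluster] = (replica, now)  # elect (or fail over)
            return True
        if leader[0] == replica:
            self.elected[cluster] = (replica, now)  # refresh leader timestamp
            return True
        return False  # duplicate sample from a non-elected replica
```

This is why both `cluster` and `replica` external labels are required: the cluster label scopes the election, and the replica label identifies the contenders.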
Zone-Aware Replication
```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1c

store_gateway:
  zoneAwareReplication:
    enabled: true
```
Shuffle Sharding
Limits tenant data to a subset of instances for fault isolation:
```yaml
mimir:
  structuredConfig:
    limits:
      # Write path
      ingestion_tenant_shard_size: 3

      # Read path
      max_queriers_per_tenant: 5
      store_gateway_tenant_shard_size: 3
```
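The core idea of shuffle sharding is that each tenant deterministically gets its own pseudo-random subset of instances, so two tenants rarely share the exact same subset and one tenant's outage blast radius stays small. A toy sketch (not Mimir's actual algorithm, which is hash-based and zone-aware):

```python
import hashlib
import random

def tenant_shard(tenant_id: str, instances: list, shard_size: int) -> list:
    """Pick a deterministic pseudo-random subset of instances for a tenant.
    Illustrative only; the seeding scheme here is an arbitrary choice."""
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(instances, min(shard_size, len(instances))))
```

Because the subset is derived from the tenant ID alone, every distributor and querier computes the same shard without coordination.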
OpenTelemetry Integration
OTLP Metrics Ingestion
OpenTelemetry Collector Config:
```yaml
exporters:
  otlphttp:
    endpoint: http://mimir-gateway:8080/otlp
    headers:
      X-Scope-OrgID: "my-tenant"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```
Exponential Histograms (Experimental)
```go
// Go SDK configuration
Aggregation: metric.AggregationBase2ExponentialHistogram{
	MaxSize:  160, // Maximum number of buckets
	MaxScale: 20,  // Maximum scale factor
},
```
Key Benefits:
- Explicit min/max values (no estimation needed)
- Better accuracy at extreme percentiles
- Native OTLP format preservation
Multi-Tenancy
```yaml
mimir:
  structuredConfig:
    multitenancy_enabled: true
    no_auth_tenant: anonymous  # Tenant used when multitenancy is disabled
```
Query with tenant header:
```shell
curl -H "X-Scope-OrgID: tenant-a" \
  "http://mimir:8080/prometheus/api/v1/query?query=up"
```
Tenant ID Constraints:
- Maximum 150 characters
- Allowed characters: alphanumeric plus `! - _ . * ' ( )`
- Prohibited: `.` or `..` alone, `__mimir_cluster`, and slashes
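The constraints above can be collected into a small validator. This is a sketch of the rules as listed here — verify against the Mimir documentation before relying on it:

```python
import re

# Alphanumeric plus the listed punctuation: ! - _ . * ' ( )
_ALLOWED = re.compile(r"^[A-Za-z0-9!\-_.*'()]+$")

def valid_tenant_id(tenant: str) -> bool:
    """Check a tenant ID against the constraints listed above (sketch)."""
    if not tenant or len(tenant) > 150:
        return False
    if tenant in (".", "..", "__mimir_cluster"):
        return False
    if "/" in tenant or "\\" in tenant:
        return False
    return bool(_ALLOWED.fullmatch(tenant))
```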
API Reference
Ingestion Endpoints
```
# Prometheus remote write
POST /api/v1/push

# OTLP metrics
POST /otlp/v1/metrics

# InfluxDB line protocol
POST /api/v1/push/influx/write
```
Query Endpoints
```
# Instant query
GET, POST /prometheus/api/v1/query?query=<promql>&time=<timestamp>

# Range query
GET, POST /prometheus/api/v1/query_range?query=<promql>&start=<start>&end=<end>&step=<step>

# Labels
GET, POST /prometheus/api/v1/labels
GET /prometheus/api/v1/label/{name}/values

# Series
GET, POST /prometheus/api/v1/series

# Exemplars
GET, POST /prometheus/api/v1/query_exemplars

# Cardinality
GET, POST /prometheus/api/v1/cardinality/label_names
GET, POST /prometheus/api/v1/cardinality/active_series
```
Administrative Endpoints
```
# Flush ingester data
GET, POST /ingester/flush

# Prepare shutdown
GET, POST, DELETE /ingester/prepare-shutdown

# Ring status
GET /ingester/ring
GET /distributor/ring
GET /store-gateway/ring
GET /compactor/ring

# Tenant stats
GET /distributor/all_user_stats
GET /api/v1/user_stats
GET /api/v1/user_limits

# Health & config
GET /ready
GET /metrics
GET /config
GET /config?mode=diff
GET /runtime_config
```
Azure Identity Configuration
User-Assigned Managed Identity
- Create Identity:
```shell
az identity create \
  --name mimir-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name mimir-identity --resource-group <rg> --query clientId -o tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show --name mimir-identity --resource-group <rg> --query principalId -o tsv)
```
- Assign to Node Pool:
```shell
az vmss identity assign \
  --resource-group <aks-node-rg> \
  --name <vmss-name> \
  --identities /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mimir-identity
```
- Grant Storage Permission:
```shell
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
```
- Configure Mimir:
```yaml
mimir:
  structuredConfig:
    common:
      storage:
        azure:
          user_assigned_id: <IDENTITY_CLIENT_ID>
```
Workload Identity Federation
- Create Federated Credential:
```shell
az identity federated-credential create \
  --name mimir-federated \
  --identity-name mimir-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:mimir \
  --audiences api://AzureADTokenExchange
```
- Configure Helm Values:
```yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"
```
Troubleshooting
Common Issues
- Container Not Found (Azure)
```shell
# Create required containers
az storage container create --name mimir-blocks --account-name <storage>
az storage container create --name mimir-alertmanager --account-name <storage>
az storage container create --name mimir-ruler --account-name <storage>
```
- Authorization Failure (Azure)
```shell
# Verify RBAC assignment
az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>

# Restart the pod to refresh the token
kubectl delete pod -n monitoring <ingester-pod>
```
- Ingester OOM
```yaml
ingester:
  resources:
    limits:
      memory: 16Gi  # Increase memory
```
- Query Timeout
```yaml
mimir:
  structuredConfig:
    querier:
      timeout: 5m
      max_concurrent: 20
```
- High Cardinality
```yaml
mimir:
  structuredConfig:
    limits:
      max_series_per_user: 5000000
      max_series_per_metric: 50000
```
Diagnostic Commands
```shell
# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=mimir

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Verify readiness
kubectl exec -it <mimir-pod> -n monitoring -- wget -qO- http://localhost:8080/ready

# Check ring status
kubectl port-forward svc/mimir-distributor 8080:8080 -n monitoring
curl http://localhost:8080/distributor/ring

# Check the effective configuration
kubectl exec -it <mimir-pod> -n monitoring -- cat /etc/mimir/mimir.yaml

# Validate configuration before deployment
mimir -modules -config.file <path-to-config-file>
```
Key Metrics to Monitor
```
# Ingestion rate per tenant
sum by (user) (rate(cortex_distributor_received_samples_total[5m]))

# Series count per tenant
sum by (user) (cortex_ingester_memory_series)

# Query latency (p99)
histogram_quantile(0.99, sum by (le) (rate(cortex_request_duration_seconds_bucket{route=~"/api/prom/api/v1/query.*"}[5m])))

# Compactor status
cortex_compactor_runs_completed_total
cortex_compactor_runs_failed_total

# Store-gateway block sync
cortex_bucket_store_blocks_loaded
```
Circuit Breakers (Ingester)
```yaml
mimir:
  structuredConfig:
    ingester:
      push_circuit_breaker:
        enabled: true
        request_timeout: 2s
        failure_threshold_percentage: 10
        cooldown_period: 10s
      read_circuit_breaker:
        enabled: true
        request_timeout: 30s
```
States:
- Closed - normal operation, requests flow through
- Open - stops forwarding requests to the failing instance
- Half-open - allows limited trial requests after the cooldown period
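The three states above form a small state machine; a sketch of the cycle, with thresholds mirroring `failure_threshold_percentage` and `cooldown_period` (illustrative only, not Mimir's implementation — the `window` parameter is an assumption of this sketch):

```python
class CircuitBreaker:
    """Sketch of the closed -> open -> half-open cycle."""

    def __init__(self, failure_threshold_pct=10, cooldown=10.0, window=100):
        self.threshold = failure_threshold_pct
        self.cooldown = cooldown
        self.window = window      # outcomes per evaluation window (assumption)
        self.state = "closed"
        self.results = []         # recent outcomes (True = success)
        self.opened_at = 0.0

    def allow(self, now: float) -> bool:
        if self.state == "open":
            if now - self.opened_at >= self.cooldown:
                self.state = "half-open"  # let a trial request through
                return True
            return False
        return True

    def record(self, now: float, success: bool):
        if self.state == "half-open":
            # The trial request decides: close on success, reopen on failure
            self.state = "closed" if success else "open"
            if not success:
                self.opened_at = now
            self.results = []
            return
        self.results.append(success)
        if len(self.results) >= self.window:
            failures = self.results.count(False)
            if failures * 100 / len(self.results) >= self.threshold:
                self.state, self.opened_at = "open", now
            self.results = []
```

The open state is what protects healthy ingesters: failing instances stop receiving traffic immediately instead of accumulating timed-out requests.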
External Resources
- Official Mimir Documentation
- Mimir Helm Chart
- Configuration Reference
- HTTP API Reference
- Mimir GitHub Repository