Databricks Cost Tuning

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "databricks-cost-tuning" with this command: npx skills add jeremylongshore/claude-code-plugins-plus-skills/jeremylongshore-claude-code-plugins-plus-skills-databricks-cost-tuning

Overview

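Reduce Databricks spending by optimizing cluster configurations, using spot instances, right-sizing SQL warehouses, and enforcing cost governance policies. Databricks charges per DBU (Databricks Unit), and the rate depends on workload type as well as cloud, region, and plan tier. Illustrative rates: Jobs Compute ($0.15/DBU), All-Purpose Compute ($0.40/DBU), SQL Compute ($0.22/DBU), and Serverless ($0.07/DBU). Check the current Databricks pricing page before budgeting; serverless rates in particular bundle the underlying cloud infrastructure, so the real figure may differ substantially from the one quoted here.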
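To make the per-DBU arithmetic concrete, here is a minimal sketch that multiplies a DBU count by the illustrative rates from the overview. The function names and the 10,000 DBU monthly figure are hypothetical, and the rates are the ones quoted above rather than current list prices.

```python
# Illustrative $/DBU rates taken from the overview; real rates vary by
# cloud, region, and plan tier, so treat these as placeholders.
RATES_USD_PER_DBU = {
    "jobs": 0.15,
    "all_purpose": 0.40,
    "sql": 0.22,
}


def estimate_cost(dbus: float, workload: str) -> float:
    """Return the estimated DBU charge in USD for one workload type."""
    return round(dbus * RATES_USD_PER_DBU[workload], 2)


def savings_from_moving_to_jobs(dbus: float) -> float:
    """DBU charge saved by running the same DBUs on Jobs instead of All-Purpose compute."""
    return round(estimate_cost(dbus, "all_purpose") - estimate_cost(dbus, "jobs"), 2)


if __name__ == "__main__":
    monthly_dbus = 10_000  # hypothetical monthly usage
    print("All-Purpose:", estimate_cost(monthly_dbus, "all_purpose"))
    print("Jobs:", estimate_cost(monthly_dbus, "jobs"))
    print("Saved by moving to Jobs:", savings_from_moving_to_jobs(monthly_dbus))
```

This covers DBU charges only. The cloud provider bills the underlying instances separately for classic compute.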

Prerequisites

  • Databricks Premium or Enterprise workspace

  • Access to system.billing.usage tables

  • Workspace admin for cluster policy creation

Instructions

Step 1: Identify Top Cost Drivers

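    -- Top 10 most expensive clusters this month.
    -- system.billing.usage keeps the cluster ID in usage_metadata and has no price
    -- column, so list prices are joined in from system.billing.list_prices.
    SELECT
      u.usage_metadata.cluster_id AS cluster_id,
      u.sku_name,
      SUM(u.usage_quantity) AS total_dbus,
      ROUND(SUM(u.usage_quantity * p.pricing.default), 2) AS cost_usd
    FROM system.billing.usage AS u
    JOIN system.billing.list_prices AS p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    WHERE u.usage_date >= date_trunc('month', current_date())
      AND u.usage_metadata.cluster_id IS NOT NULL
    GROUP BY u.usage_metadata.cluster_id, u.sku_name
    ORDER BY cost_usd DESC
    LIMIT 10;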

Step 2: Enforce Cluster Policies

    {
      "name": "cost-optimized-interactive",
      "definition": {
        "autotermination_minutes": {
          "type": "range", "minValue": 10, "maxValue": 60, "defaultValue": 20
        },
        "num_workers": { "type": "range", "minValue": 1, "maxValue": 8 },
        "node_type_id": {
          "type": "allowlist", "values": ["m5.xlarge", "m5.2xlarge"]
        },
        "aws_attributes.availability": {
          "type": "fixed", "value": "SPOT_WITH_FALLBACK"
        }
      }
    }
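The policy rules above can be checked locally before a cluster request is submitted. The following is a rough sketch of such a pre-check, not how Databricks itself enforces policies: the `lookup` and `policy_violations` helpers are hypothetical, and only the `range`, `allowlist`, and `fixed` rule types used in this policy are handled.

```python
from typing import Any

# Rules copied from the "cost-optimized-interactive" policy definition.
POLICY: dict[str, dict[str, Any]] = {
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 60},
    "num_workers": {"type": "range", "minValue": 1, "maxValue": 8},
    "node_type_id": {"type": "allowlist", "values": ["m5.xlarge", "m5.2xlarge"]},
    "aws_attributes.availability": {"type": "fixed", "value": "SPOT_WITH_FALLBACK"},
}


def lookup(config: dict[str, Any], dotted_key: str) -> Any:
    """Resolve a dotted path such as 'aws_attributes.availability'."""
    value: Any = config
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def policy_violations(config: dict[str, Any], policy: dict = POLICY) -> list[str]:
    """Return one human-readable message per rule the config breaks."""
    problems = []
    for key, rule in policy.items():
        value = lookup(config, key)
        if value is None:
            problems.append(f"{key}: missing")
        elif rule["type"] == "range" and not rule["minValue"] <= value <= rule["maxValue"]:
            problems.append(f"{key}: {value} outside {rule['minValue']}-{rule['maxValue']}")
        elif rule["type"] == "allowlist" and value not in rule["values"]:
            problems.append(f"{key}: {value!r} not in allowlist")
        elif rule["type"] == "fixed" and value != rule["value"]:
            problems.append(f"{key}: must be {rule['value']!r}")
    return problems


if __name__ == "__main__":
    request = {  # hypothetical cluster request
        "autotermination_minutes": 20,
        "num_workers": 16,
        "node_type_id": "m5.xlarge",
        "aws_attributes": {"availability": "SPOT_WITH_FALLBACK"},
    }
    print(policy_violations(request))
```

A check like this is useful in CI for job definitions kept in version control, where a rejected cluster request would otherwise surface only at run time.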

Step 3: Use Spot Instances for Jobs

    # jobs/cost_optimized_job.py
    job_config = {
        "name": "nightly-etl",
        "new_cluster": {
            # Also set "spark_version" to a runtime available in your workspace.
            "num_workers": 4,
            "node_type_id": "m5.2xlarge",
            "aws_attributes": {
                "availability": "SPOT_WITH_FALLBACK",
                "first_on_demand": 1,  # Driver on-demand, workers spot
                "spot_bid_price_percent": 100,
            },
            "autotermination_minutes": 0,  # Jobs clusters auto-terminate on completion
        },
    }

    # Spot instances typically save 60-90% on worker instance costs.
    # DBU charges are unchanged; the saving applies to the cloud provider's bill.
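The effect of keeping the driver on-demand while the workers run on spot can be estimated with simple arithmetic. This sketch assumes a hypothetical on-demand price of $0.40 per node-hour and a 70% spot discount (inside the 60-90% range quoted above); `blended_hourly_cost` is an illustrative helper, and DBU charges are not included.

```python
def blended_hourly_cost(
    on_demand_price: float,
    spot_discount: float,
    num_workers: int,
    first_on_demand: int = 1,
) -> float:
    """Hourly instance cost for a cluster with a mix of on-demand and spot nodes.

    The cluster has num_workers + 1 nodes (workers plus the driver); the first
    `first_on_demand` nodes are billed at the on-demand price and the rest at
    the discounted spot price.
    """
    total_nodes = num_workers + 1
    on_demand_nodes = min(first_on_demand, total_nodes)
    spot_nodes = total_nodes - on_demand_nodes
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


if __name__ == "__main__":
    price = 0.40     # hypothetical on-demand $/node-hour
    discount = 0.70  # assumed spot discount
    mixed = blended_hourly_cost(price, discount, num_workers=4, first_on_demand=1)
    all_on_demand = blended_hourly_cost(price, discount, num_workers=4, first_on_demand=5)
    print(f"mixed: ${mixed:.2f}/h, all on-demand: ${all_on_demand:.2f}/h")
    print(f"instance cost saved: {1 - mixed / all_on_demand:.0%}")
```

Because the driver stays on-demand, the cluster-level saving is lower than the per-node spot discount, and the gap narrows as the worker count grows.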

Step 4: Right-Size SQL Warehouses

    -- Check warehouse utilization (query volume and queue time) over the last 7 days.
    -- Column names follow the documented system.query.history schema;
    -- confirm them in your workspace before relying on the results.
    SELECT
      compute.warehouse_id AS warehouse_id,
      COUNT(*) AS query_count,
      AVG(total_duration_ms) AS avg_duration_ms,
      MAX(waiting_at_capacity_duration_ms) AS max_queue_ms
    FROM system.query.history
    WHERE start_time > current_timestamp() - INTERVAL 7 DAYS
    GROUP BY compute.warehouse_id;

    -- If max_queue_ms is near 0, the warehouse may be oversized: try a smaller cluster size.
    -- If max_queue_ms > 30000 (30 seconds), the warehouse needs more capacity or auto-scaling.
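The two rules of thumb for `max_queue_ms` can be turned into a small helper that labels each warehouse. This is only a sketch: the 30,000 ms threshold comes from the guidance above, while the 100 ms cutoff for "near 0" and the `sizing_hint` name are assumptions made for illustration.

```python
def sizing_hint(
    max_queue_ms: float,
    near_zero_ms: float = 100,       # assumed cutoff for "near 0"
    overloaded_ms: float = 30_000,   # 30 seconds, from the guidance above
) -> str:
    """Map a warehouse's worst queue time over the window to a sizing suggestion."""
    if max_queue_ms > overloaded_ms:
        return "under-provisioned: add capacity or enable auto-scaling"
    if max_queue_ms <= near_zero_ms:
        return "possibly oversized: try a smaller cluster size"
    return "no change suggested"


if __name__ == "__main__":
    # Hypothetical query results: warehouse ID -> max queue time in ms.
    results = {"wh-reporting": 0, "wh-adhoc": 4_500, "wh-etl": 62_000}
    for warehouse_id, queue_ms in results.items():
        print(warehouse_id, "->", sizing_hint(queue_ms))
```

A single week of data can miss month-end or quarter-end peaks, so it is worth widening the window before downsizing a warehouse that serves periodic reporting.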

Step 5: Schedule Auto-Stop for Development Clusters

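    # Set aggressive auto-termination for all dev clusters.
    # Note: the clusters edit endpoint expects a full cluster specification
    # (for example spark_version and node_type_id), so a payload containing only
    # autotermination_minutes may be rejected; merge it into the existing spec first.
    databricks clusters list --output JSON |
      jq -r '.[] | select(.cluster_name | test("dev|sandbox|test")) | .cluster_id' |
      xargs -I{} databricks clusters edit {} --json '{"autotermination_minutes": 15}'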
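Before running a bulk edit like the pipeline above, it helps to preview which clusters the name filter would select. The sketch below applies the same `dev|sandbox|test` pattern to a list of cluster records shaped like the CLI's JSON output; the sample records and the `clusters_to_tighten` helper are hypothetical, and no Databricks API is called.

```python
import re


def clusters_to_tighten(
    clusters: list[dict],
    pattern: str = r"dev|sandbox|test",
    max_minutes: int = 15,
) -> list[str]:
    """Return IDs of matching clusters whose auto-termination is disabled or too long."""
    selected = []
    for cluster in clusters:
        name = cluster.get("cluster_name", "")
        # A value of 0 means auto-termination is disabled.
        minutes = cluster.get("autotermination_minutes", 0)
        if re.search(pattern, name) and (minutes == 0 or minutes > max_minutes):
            selected.append(cluster["cluster_id"])
    return selected


if __name__ == "__main__":
    sample = [  # hypothetical cluster records
        {"cluster_id": "a", "cluster_name": "dev-etl", "autotermination_minutes": 120},
        {"cluster_id": "b", "cluster_name": "prod-etl", "autotermination_minutes": 0},
        {"cluster_id": "c", "cluster_name": "sandbox-ml", "autotermination_minutes": 10},
        {"cluster_id": "d", "cluster_name": "latest-prod", "autotermination_minutes": 0},
    ]
    print(clusters_to_tighten(sample))
```

The pattern matches substrings, as it does in the `jq` filter, so a name such as `latest-prod` is selected because it contains `test`. Anchoring the pattern, for example `^(dev|sandbox|test)-`, avoids touching production clusters by accident.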

Error Handling

| Issue | Cause | Solution |
| --- | --- | --- |
| Spot instance interruption | Cloud provider reclaiming capacity | Use SPOT_WITH_FALLBACK, checkpoint long jobs |
| Cluster policy too restrictive | Workers can't handle workload | Increase max_workers or allow larger instance types |
| SQL warehouse idle but running | No auto-stop configured | Enable auto-stop with 10-minute timeout |
| Billing data not available | System tables not enabled | Enable system table access in account settings |

Examples

Basic usage: Run the Step 1 query to find the most expensive clusters, then apply the Step 2 cluster policy and a 15-minute auto-termination to development clusters.

Advanced scenario: For production, move scheduled workloads to job clusters with SPOT_WITH_FALLBACK workers (Step 3) and resize SQL warehouses based on a week of queue-time data (Step 4).

Output

  • Configuration files or code changes applied to the project

  • Validation report confirming correct implementation

  • Summary of changes made and their rationale

Resources

  • Official monitoring documentation

  • Community best practices and patterns

  • Related skills in this plugin pack
