
Databricks Asset Bundles Skill

Overview

Databricks Asset Bundles (DAB) is a modern deployment framework that packages notebooks, DLT pipelines, jobs, and configurations into versioned, environment-aware bundles. It enables Infrastructure as Code for Databricks.

Key Benefits:

  • Infrastructure as Code

  • Multi-environment support (dev, staging, prod)

  • Version control for all artifacts

  • Automated deployment

  • Environment-specific configurations

  • Integrated with CI/CD

When to Use This Skill

Use Databricks Asset Bundles when you need to:

  • Deploy pipelines across multiple environments

  • Implement Infrastructure as Code

  • Automate deployment workflows

  • Manage environment-specific configurations

  • Version control Databricks artifacts

  • Enable collaborative development

  • Standardize deployment processes

Core Concepts

  1. Bundle Structure

Standard Bundle Layout:

my-bundle/
├── databricks.yml                 # Main configuration
├── environments/
│   ├── dev.yml                    # Development overrides
│   ├── staging.yml                # Staging overrides
│   └── prod.yml                   # Production overrides
├── src/
│   ├── notebooks/
│   │   ├── bronze_ingestion.py
│   │   └── silver_transformation.py
│   └── pipelines/
│       └── dlt_pipeline.py
├── resources/
│   ├── jobs.yml
│   ├── pipelines.yml
│   └── clusters.yml
└── tests/
    └── test_transformations.py
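This layout does not have to be written by hand. As a minimal sketch, assuming a recent Databricks CLI where `bundle init` and its built-in `default-python` template are available, a comparable starter project can be scaffolded and then trimmed to the structure above:

# Scaffold a starter bundle from a built-in template
databricks bundle init default-python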

  2. Main Configuration

databricks.yml:

bundle:
  name: data-platform-bundle

  # Optional git configuration
  git:
    branch: main
    origin_url: https://github.com/org/repo.git

workspace:
  host: https://your-workspace.databricks.com
  root_path: /Workspace/bundles/${bundle.name}

# Define variables
variables:
  catalog_name:
    description: "Unity Catalog name"
    default: "dev_catalog"

  storage_path:
    description: "Base storage path"
    default: "/mnt/dev/data"

  cluster_size:
    description: "Cluster size"
    default: "small"

# Include other configuration files
include:
  - resources/*.yml

# Define resources
resources:
  jobs:
    daily_pipeline:
      name: "[${bundle.environment}] Daily Pipeline"

      tasks:
        - task_key: bronze_ingestion
          notebook_task:
            notebook_path: ./src/notebooks/bronze_ingestion
            source: WORKSPACE
            base_parameters:
              catalog: ${var.catalog_name}
              storage: ${var.storage_path}

          new_cluster:
            num_workers: 2
            spark_version: 13.3.x-scala2.12
            node_type_id: i3.xlarge
            spark_conf:
              spark.databricks.delta.preview.enabled: "true"

        - task_key: silver_transformation
          depends_on:
            - task_key: bronze_ingestion
          notebook_task:
            notebook_path: ./src/notebooks/silver_transformation
            source: WORKSPACE

          job_cluster_key: shared_cluster

      job_clusters:
        - job_cluster_key: shared_cluster
          new_cluster:
            # Bundle variable interpolation does not support inline
            # conditionals (ternary expressions); size the cluster per
            # target instead (see the targets section and override files).
            num_workers: 2
            spark_version: 13.3.x-scala2.12
            node_type_id: i3.xlarge

      schedule:
        quartz_cron_expression: "0 0 1 * * ?"  # Daily at 1 AM
        timezone_id: "America/New_York"

      email_notifications:
        on_failure:
          - data-team@company.com

  pipelines:
    bronze_to_gold:
      name: "[${bundle.environment}] Bronze to Gold Pipeline"
      target: ${var.catalog_name}
      storage: ${var.storage_path}/dlt

      libraries:
        - notebook:
            path: ./src/pipelines/dlt_pipeline.py

      clusters:
        - label: default
          num_workers: 4
          node_type_id: i3.xlarge

      configuration:
        source_path: ${var.storage_path}/landing
        checkpoint_path: ${var.storage_path}/checkpoints

      development: false
      continuous: false

targets:
  dev:
    mode: development
    workspace:
      host: https://dev-workspace.databricks.com
      root_path: /Workspace/dev/${bundle.name}
    variables:
      catalog_name: dev_catalog
      storage_path: /mnt/dev/data
      cluster_size: small

  staging:
    mode: production
    workspace:
      host: https://staging-workspace.databricks.com
      root_path: /Workspace/staging/${bundle.name}
    variables:
      catalog_name: staging_catalog
      storage_path: /mnt/staging/data
      cluster_size: medium

  prod:
    mode: production
    workspace:
      host: https://prod-workspace.databricks.com
      root_path: /Workspace/prod/${bundle.name}
    variables:
      catalog_name: prod_catalog
      storage_path: /mnt/prod/data
      cluster_size: large
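Before deploying, it is worth checking how variables resolve for each target. A minimal sketch; the `--var` override flag is assumed to be supported by your CLI version:

# Check how variables resolve per target
databricks bundle validate -t dev
databricks bundle validate -t prod

# One-off override of a declared variable (assumes --var support)
databricks bundle validate -t dev --var="catalog_name=scratch_catalog"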

  3. Environment-Specific Configuration

environments/prod.yml:

# Production-specific overrides
variables:
  catalog_name: prod_catalog
  storage_path: /mnt/prod/data
  cluster_size: large

resources:
  jobs:
    daily_pipeline:
      # Production-specific settings
      max_concurrent_runs: 1
      timeout_seconds: 7200

      job_clusters:
        - job_cluster_key: shared_cluster
          new_cluster:
            node_type_id: i3.2xlarge
            # autoscale replaces a fixed num_workers setting
            autoscale:
              min_workers: 4
              max_workers: 16

      email_notifications:
        on_start:
          - data-team@company.com
        on_success:
          - data-team@company.com
        on_failure:
          - data-team@company.com
          - oncall@company.com

  pipelines:
    bronze_to_gold:
      development: false
      continuous: true  # Continuous processing in prod

      clusters:
        - label: default
          node_type_id: i3.2xlarge
          # autoscale replaces a fixed num_workers setting
          autoscale:
            min_workers: 4
            max_workers: 16

      notifications:
        - email_recipients:
            - data-team@company.com
          on_failure: true
          on_success: false
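Override files merge with, and take precedence over, matching keys in databricks.yml. One way to inspect the merged result before deploying; `bundle summary` is assumed to exist in your CLI release (it is a newer addition):

# Validate the merged base + prod configuration
databricks bundle validate -t prod

# Newer CLI releases can also summarize the resolved bundle
databricks bundle summary -t prod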

  4. Deployment Workflow

CLI Commands:

# Install the Databricks CLI
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

# Authenticate
databricks auth login --host https://your-workspace.databricks.com

# Validate the bundle
databricks bundle validate -t dev

# Deploy to development
databricks bundle deploy -t dev

# Run a job
databricks bundle run -t dev daily_pipeline

# Deploy to production
databricks bundle deploy -t prod

# Destroy bundle resources (cleanup)
databricks bundle destroy -t dev
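The same sequence also works as a single fail-fast script. A minimal sketch; the script name, default target, and job key are illustrative:

#!/usr/bin/env bash
# deploy.sh (hypothetical name): validate, deploy, and run in one pass
set -euo pipefail

TARGET="${1:-dev}"   # default to the dev target

databricks bundle validate -t "$TARGET"
databricks bundle deploy -t "$TARGET"
databricks bundle run -t "$TARGET" daily_pipeline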

Implementation Patterns

Pattern 1: Multi-Environment Pipeline

Complete Bundle with Environment Variations:

databricks.yml

bundle:
  name: customer-analytics

variables:
  environment:
    description: "Deployment environment"
  catalog:
    description: "Unity Catalog"
  min_workers:
    description: "Minimum cluster workers"
    default: 2
  max_workers:
    description: "Maximum cluster workers"
    default: 8

resources:
  jobs:
    customer_pipeline:
      name: "[${var.environment}] Customer Analytics Pipeline"

      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest_customers
          new_cluster:
            num_workers: ${var.min_workers}
            spark_version: 13.3.x-scala2.12
            node_type_id: i3.xlarge

        - task_key: transform
          depends_on:
            - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/transform_customers
          new_cluster:
            autoscale:
              min_workers: ${var.min_workers}
              max_workers: ${var.max_workers}
            spark_version: 13.3.x-scala2.12
            node_type_id: i3.xlarge

        - task_key: aggregate
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ./notebooks/aggregate_metrics
          new_cluster:
            num_workers: ${var.min_workers}
            spark_version: 13.3.x-scala2.12
            node_type_id: i3.xlarge

targets:
  dev:
    variables:
      environment: dev
      catalog: dev_catalog
      min_workers: 2
      max_workers: 4

  prod:
    variables:
      environment: prod
      catalog: prod_catalog
      min_workers: 4
      max_workers: 16
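Deploying the same bundle to each target then picks up the per-target sizing automatically:

# Same definition, different sizing per target
databricks bundle deploy -t dev    # min_workers=2, max_workers=4
databricks bundle deploy -t prod   # min_workers=4, max_workers=16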

Pattern 2: Modular Configuration

Split Configuration Across Files:

databricks.yml

bundle:
  name: data-platform

include:
  - resources/jobs/*.yml
  - resources/pipelines/*.yml
  - resources/clusters/*.yml

resources/jobs/ingestion_jobs.yml

resources:
  jobs:
    ingest_customers:
      name: "[${bundle.environment}] Ingest Customers"
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/ingest_customers

    ingest_orders:
      name: "[${bundle.environment}] Ingest Orders"
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/ingest_orders

resources/pipelines/dlt_pipelines.yml

resources:
  pipelines:
    customer_pipeline:
      name: "[${bundle.environment}] Customer DLT Pipeline"
      target: ${var.catalog}.customer
      libraries:
        - notebook:
            path: ./pipelines/customer_dlt

    order_pipeline:
      name: "[${bundle.environment}] Order DLT Pipeline"
      target: ${var.catalog}.orders
      libraries:
        - notebook:
            path: ./pipelines/order_dlt
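Because include merges every matching file into one configuration, a validate run is the quickest check that the split files still compose; duplicate resource keys or unresolved variables surface here rather than at deploy time:

# Validate the merged view across all included files
databricks bundle validate -t dev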

Pattern 3: Python Deployment Script

Automated Deployment:

""" Automated bundle deployment script. """ import subprocess import sys from typing import Dict, Any

class BundleDeployer: """Deploy Databricks Asset Bundles."""

    def __init__(self, bundle_path: str):
        self.bundle_path = bundle_path

    def validate(self, target: str) -> bool:
        """Validate bundle configuration."""
        print(f"Validating bundle for target: {target}")

        result = subprocess.run(
            ["databricks", "bundle", "validate", "-t", target],
            cwd=self.bundle_path,
            capture_output=True,
            text=True
        )

        if result.returncode != 0:
            print(f"Validation failed: {result.stderr}")
            return False

        print("Validation successful")
        return True

    def deploy(self, target: str, force: bool = False) -> bool:
        """Deploy bundle to target environment."""
        if not self.validate(target):
            return False

        print(f"Deploying bundle to {target}")

        cmd = ["databricks", "bundle", "deploy", "-t", target]
        if force:
            cmd.append("--force")

        result = subprocess.run(
            cmd,
            cwd=self.bundle_path,
            capture_output=True,
            text=True
        )

        if result.returncode != 0:
            print(f"Deployment failed: {result.stderr}")
            return False

        print(f"Deployment successful: {result.stdout}")
        return True

    def run_job(self, target: str, job_key: str) -> bool:
        """Run a specific job from the bundle."""
        print(f"Running job: {job_key} on {target}")

        result = subprocess.run(
            ["databricks", "bundle", "run", "-t", target, job_key],
            cwd=self.bundle_path,
            capture_output=True,
            text=True
        )

        if result.returncode != 0:
            print(f"Job run failed: {result.stderr}")
            return False

        print(f"Job started: {result.stdout}")
        return True

    def destroy(self, target: str, auto_approve: bool = False) -> bool:
        """Destroy bundle resources."""
        print(f"WARNING: Destroying bundle resources in {target}")

        cmd = ["databricks", "bundle", "destroy", "-t", target]
        if auto_approve:
            cmd.append("--auto-approve")

        result = subprocess.run(
            cmd,
            cwd=self.bundle_path,
            capture_output=True,
            text=True
        )

        if result.returncode != 0:
            print(f"Destroy failed: {result.stderr}")
            return False

        print("Bundle resources destroyed")
        return True

# Usage
if __name__ == "__main__":
    deployer = BundleDeployer("./my-bundle")

    # Deploy to development
    if deployer.deploy("dev"):
        deployer.run_job("dev", "daily_pipeline")

    # Deploy to production (requires the --prod flag)
    if len(sys.argv) > 1 and sys.argv[1] == "--prod":
        deployer.deploy("prod")
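Invocation is then one command per environment; the filename deploy_bundle.py is hypothetical:

# Assumes the script above is saved as deploy_bundle.py (hypothetical name)
python deploy_bundle.py          # deploy to dev and run daily_pipeline
python deploy_bundle.py --prod   # additionally deploy to prod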

Pattern 4: GitOps Integration

GitHub Actions Workflow:

.github/workflows/bundle-deploy.yml

name: Deploy Databricks Bundle

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        type: choice
        options:
          - dev
          - staging
          - prod

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Databricks CLI
        run: |
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

      - name: Validate Bundle
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          cd bundle/
          databricks bundle validate -t dev

  deploy-dev:
    needs: validate
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: development
    steps:
      - uses: actions/checkout@v3

      - name: Install Databricks CLI
        run: |
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

      - name: Deploy to Development
        env:
          DATABRICKS_HOST: ${{ secrets.DEV_DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DEV_DATABRICKS_TOKEN }}
        run: |
          cd bundle/
          databricks bundle deploy -t dev

  deploy-prod:
    needs: validate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v3

      - name: Install Databricks CLI
        run: |
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

      - name: Deploy to Production
        env:
          DATABRICKS_HOST: ${{ secrets.PROD_DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.PROD_DATABRICKS_TOKEN }}
        run: |
          cd bundle/
          databricks bundle deploy -t prod
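The workflow authenticates through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, which you can reproduce locally when debugging CI failures. The host URL and token value below are placeholders:

# Reproduce the CI authentication locally (values are placeholders)
export DATABRICKS_HOST="https://dev-workspace.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"

cd bundle/
databricks bundle validate -t dev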

Best Practices

  1. Bundle Organization
  • Keep bundle files under version control

  • Use environment-specific overrides

  • Separate resources into logical files

  • Document variable purposes

  • Include README for bundle usage

  2. Environment Management

# Use consistent naming
targets:
  dev:
    mode: development   # Enables faster iteration
  staging:
    mode: production    # Production-like behavior
  prod:
    mode: production    # Full production settings

  3. Variable Usage

# Define reusable variables
variables:
  project_name:
    description: "Project identifier"
    default: "customer-analytics"

# Use variables consistently
resources:
  jobs:
    ${var.project_name}_job:
      name: "[${bundle.environment}] ${var.project_name}"

  4. Testing Strategy

# Test the bundle locally
databricks bundle validate -t dev

# Deploy to dev for testing
databricks bundle deploy -t dev

# Run integration tests
databricks bundle run -t dev test_job

# Deploy to prod after validation
databricks bundle deploy -t prod

Common Pitfalls to Avoid

Don't:

  • Hard-code environment-specific values

  • Skip validation before deployment

  • Modify resources outside of bundles

  • Use development mode in production

  • Deploy without testing

Do:

  • Use variables for environment differences

  • Always validate before deploying

  • Manage all resources through bundles

  • Use production mode for prod

  • Test in lower environments first

Complete Examples

See the /examples/ directory for:

  • complete_bundle_project/ : Full bundle structure

  • multi_workspace_deployment/ : Cross-workspace deployment

Related Skills

  • delta-live-tables : Deploy DLT pipelines

  • cicd-workflows : Automate deployments

  • testing-patterns : Test before deploy

  • data-products : Deploy data products

References

  • Databricks Asset Bundles Docs

  • Bundle Configuration Reference

  • CLI Reference
