databricks-jobs

Databricks Lakeflow Jobs


Overview

Databricks Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Python SDK, CLI, or Asset Bundles.

Reference Files

| Use Case | Reference File |
| --- | --- |
| Configure task types (notebook, Python, SQL, dbt, etc.) | task-types.md |
| Set up triggers and schedules | triggers-schedules.md |
| Configure notifications and health monitoring | notifications-monitoring.md |
| Complete working examples | examples.md |

Quick Start

Python SDK

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, Source

w = WorkspaceClient()

job = w.jobs.create(
    name="my-etl-job",
    tasks=[
        Task(
            task_key="extract",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Users/user@example.com/extract",
                source=Source.WORKSPACE,
            ),
        )
    ],
)
print(f"Created job: {job.job_id}")
```

CLI

```bash
databricks jobs create --json '{
  "name": "my-etl-job",
  "tasks": [{
    "task_key": "extract",
    "notebook_task": {
      "notebook_path": "/Workspace/Users/user@example.com/extract",
      "source": "WORKSPACE"
    }
  }]
}'
```

Asset Bundles (DABs)

In `resources/jobs.yml`:

```yaml
resources:
  jobs:
    my_etl_job:
      name: "[${bundle.target}] My ETL Job"
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/notebooks/extract.py
```

Core Concepts

Multi-Task Workflows

Jobs support DAG-based task dependencies:

```yaml
tasks:
  - task_key: extract
    notebook_task:
      notebook_path: ../src/extract.py
  - task_key: transform
    depends_on:
      - task_key: extract
    notebook_task:
      notebook_path: ../src/transform.py
  - task_key: load
    depends_on:
      - task_key: transform
    run_if: ALL_SUCCESS  # Only run if all dependencies succeed
    notebook_task:
      notebook_path: ../src/load.py
```

run_if conditions:

  • ALL_SUCCESS (default) - Run when all dependencies succeed

  • ALL_DONE - Run when all dependencies complete (success or failure)

  • AT_LEAST_ONE_SUCCESS - Run when at least one dependency succeeds

  • NONE_FAILED - Run when no dependencies failed

  • ALL_FAILED - Run when all dependencies failed

  • AT_LEAST_ONE_FAILED - Run when at least one dependency failed
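These conditions map onto the per-task `run_if` field of the same JSON payload the CLI example above sends. A minimal Python sketch (plain dicts, with illustrative task names and notebook paths) of the common cleanup pattern, where a final task runs whether or not its upstream succeeded:

```python
import json

# Illustrative Jobs API-style payload: "cleanup" runs even when "load"
# fails, because run_if=ALL_DONE fires on success or failure.
job_spec = {
    "name": "etl-with-cleanup",
    "tasks": [
        {
            "task_key": "load",
            "notebook_task": {"notebook_path": "/Workspace/etl/load"},
        },
        {
            "task_key": "cleanup",
            "depends_on": [{"task_key": "load"}],
            "run_if": "ALL_DONE",  # success or failure of "load"
            "notebook_task": {"notebook_path": "/Workspace/etl/cleanup"},
        },
    ],
}

print(json.dumps(job_spec, indent=2))
```

With the default `ALL_SUCCESS`, the cleanup task would be skipped whenever the upstream task fails, which is usually the opposite of what a cleanup step needs.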

Task Types Summary

| Task Type | Use Case | Reference |
| --- | --- | --- |
| notebook_task | Run notebooks | task-types.md#notebook-task |
| spark_python_task | Run Python scripts | task-types.md#spark-python-task |
| python_wheel_task | Run Python wheels | task-types.md#python-wheel-task |
| sql_task | Run SQL queries/files | task-types.md#sql-task |
| dbt_task | Run dbt projects | task-types.md#dbt-task |
| pipeline_task | Trigger DLT/SDP pipelines | task-types.md#pipeline-task |
| spark_jar_task | Run Spark JARs | task-types.md#spark-jar-task |
| run_job_task | Trigger other jobs | task-types.md#run-job-task |
| for_each_task | Loop over inputs | task-types.md#for-each-task |
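As one example of a composite task type, `for_each_task` wraps an inner task and fans it out over a list of inputs. A hedged sketch as a plain-dict payload (field names follow the Jobs API; the notebook path and dates are made up for illustration):

```python
import json

# for_each_task sketch: the inner "process_one" task runs once per
# element of `inputs`, with at most `concurrency` iterations in parallel.
fan_out = {
    "task_key": "process_all",
    "for_each_task": {
        "inputs": '["2024-01-01", "2024-01-02", "2024-01-03"]',  # JSON-encoded list
        "concurrency": 2,
        "task": {
            "task_key": "process_one",
            "notebook_task": {
                "notebook_path": "/Workspace/etl/process",  # illustrative path
                "base_parameters": {"date": "{{input}}"},  # current iteration value
            },
        },
    },
}

inputs = json.loads(fan_out["for_each_task"]["inputs"])
print(len(inputs))  # 3
```

Note that `inputs` is a JSON-encoded string, not a native list, and the inner task reads the current element through the `{{input}}` dynamic value reference.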

Trigger Types Summary

| Trigger Type | Use Case | Reference |
| --- | --- | --- |
| schedule | Cron-based scheduling | triggers-schedules.md#cron-schedule |
| trigger.periodic | Interval-based | triggers-schedules.md#periodic-trigger |
| trigger.file_arrival | File arrival events | triggers-schedules.md#file-arrival-trigger |
| trigger.table_update | Table change events | triggers-schedules.md#table-update-trigger |
| continuous | Always-running jobs | triggers-schedules.md#continuous-jobs |
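To make the shapes concrete, here are the `schedule` and `trigger.file_arrival` fragments as plain Python dicts (field names follow the Jobs API; the cron expression, timezone, and storage URL are placeholders):

```python
# Cron schedule: Quartz syntax, here daily at 02:30 in the given timezone.
scheduled = {
    "schedule": {
        "quartz_cron_expression": "0 30 2 * * ?",
        "timezone_id": "America/Los_Angeles",
        "pause_status": "UNPAUSED",  # a paused schedule never fires
    }
}

# File arrival trigger: fires when new files land under the URL.
file_triggered = {
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            "url": "s3://my-bucket/landing/",  # illustrative storage location
            "min_time_between_triggers_seconds": 60,
        },
    }
}
```

A job carries at most one of `schedule`, `trigger`, or `continuous`, so these fragments are alternatives, not additive.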

Compute Configuration

Job Clusters (Recommended)

Define reusable cluster configurations:

```yaml
job_clusters:
  - job_cluster_key: shared_cluster
    new_cluster:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      spark_conf:
        spark.speculation: "true"

tasks:
  - task_key: my_task
    job_cluster_key: shared_cluster
    notebook_task:
      notebook_path: ../src/notebook.py
```

Autoscaling Clusters

```yaml
new_cluster:
  spark_version: "15.4.x-scala2.12"
  node_type_id: "i3.xlarge"
  autoscale:
    min_workers: 2
    max_workers: 8
```

Existing Cluster

```yaml
tasks:
  - task_key: my_task
    existing_cluster_id: "0123-456789-abcdef12"
    notebook_task:
      notebook_path: ../src/notebook.py
```

Serverless Compute

For notebook and Python tasks, omit cluster configuration to use serverless:

```yaml
# No cluster config on the task = serverless
tasks:
  - task_key: serverless_task
    notebook_task:
      notebook_path: ../src/notebook.py
```

Job Parameters

Define Parameters

```yaml
parameters:
  - name: env
    default: "dev"
  - name: date
    default: "{{start_date}}"  # Dynamic value reference
```

Access in Notebook

```python
# In a notebook
env = dbutils.widgets.get("env")
date = dbutils.widgets.get("date")
```

Pass to Tasks

```yaml
tasks:
  - task_key: my_task
    notebook_task:
      notebook_path: ../src/notebook.py
      base_parameters:
        env: "{{job.parameters.env}}"
        custom_param: "value"
```

Common Operations

Python SDK Operations

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List jobs
jobs = w.jobs.list()

# Get job details
job = w.jobs.get(job_id=12345)

# Run job now
run = w.jobs.run_now(job_id=12345)

# Run with parameters
run = w.jobs.run_now(
    job_id=12345,
    job_parameters={"env": "prod", "date": "2024-01-15"},
)

# Cancel run
w.jobs.cancel_run(run_id=run.run_id)

# Delete job
w.jobs.delete(job_id=12345)
```
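`run_now` returns before the run finishes, so monitoring means polling the run state until it reaches a terminal value. The polling pattern itself is simple; a generic sketch with the state lookup injected so it runs without a workspace (`poll_run` and `TERMINAL_STATES` are names invented here, not SDK API, and the state list mirrors the documented `life_cycle_state` terminal values):

```python
import time

# Terminal life_cycle_state values for a job run.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def poll_run(get_state, interval_s=1.0, timeout_s=600.0):
    """Poll get_state() until it returns a terminal state or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("run did not reach a terminal state in time")

# Stand-in for lambda: w.jobs.get_run(run.run_id).state.life_cycle_state
states = iter(["PENDING", "RUNNING", "TERMINATED"])
final = poll_run(lambda: next(states), interval_s=0.01)
print(final)  # TERMINATED
```

In practice the SDK's built-in waiter, `w.jobs.run_now(job_id=...).result()`, does this blocking wait for you.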

CLI Operations

```bash
# List jobs
databricks jobs list

# Get job details
databricks jobs get 12345

# Run job
databricks jobs run-now 12345

# Run with parameters
databricks jobs run-now 12345 --job-params '{"env": "prod"}'

# Cancel run
databricks jobs cancel-run 67890

# Delete job
databricks jobs delete 12345
```

Asset Bundle Operations

```bash
# Validate configuration
databricks bundle validate

# Deploy job
databricks bundle deploy

# Run job
databricks bundle run my_job_resource_key

# Deploy to specific target
databricks bundle deploy -t prod

# Destroy resources
databricks bundle destroy
```

Permissions (DABs)

```yaml
resources:
  jobs:
    my_job:
      name: "My Job"
      permissions:
        - level: CAN_VIEW
          group_name: "data-analysts"
        - level: CAN_MANAGE_RUN
          group_name: "data-engineers"
        - level: CAN_MANAGE
          user_name: "admin@example.com"
```

Permission levels:

  • CAN_VIEW - View job and run history

  • CAN_MANAGE_RUN - View, trigger, and cancel runs

  • CAN_MANAGE - Full control including edit and delete
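Outside of bundles, the same grants can be applied through the workspace permissions API (`PUT /api/2.0/permissions/jobs/{job_id}`). A hedged plain-dict sketch of that request body, with illustrative principals:

```python
# Permissions request body sketch for a job; principals are illustrative.
acl = {
    "access_control_list": [
        {"group_name": "data-analysts", "permission_level": "CAN_VIEW"},
        {"group_name": "data-engineers", "permission_level": "CAN_MANAGE_RUN"},
        {"user_name": "admin@example.com", "permission_level": "CAN_MANAGE"},
    ]
}
```

Each entry pairs one principal (`user_name`, `group_name`, or `service_principal_name`) with one permission level, mirroring the DABs `permissions` block above.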

Common Issues

Issue Solution

Job cluster startup slow Use job clusters with job_cluster_key for reuse across tasks

Task dependencies not working Verify task_key references match exactly in depends_on

Schedule not triggering Check pause_status: UNPAUSED and valid timezone

File arrival not detecting Ensure path has proper permissions and uses cloud storage URL

Table update trigger missing events Verify Unity Catalog table and proper grants

Parameter not accessible Use dbutils.widgets.get() in notebooks

"admins" group error Cannot modify admins permissions on jobs

Serverless task fails Ensure task type supports serverless (notebook, Python)

Related Skills

  • databricks-asset-bundles - Deploy jobs via Databricks Asset Bundles

  • databricks-spark-declarative-pipelines - Configure pipelines triggered by jobs

Resources

  • Jobs API Reference

  • Jobs Documentation

  • DABs Job Task Types

  • Bundle Examples Repository
