
Lakebase Setup for Agent Persistence

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install the skill with this command: npx skills add databricks/app-templates/databricks-app-templates-lakebase-setup


Profile reminder: all databricks CLI commands must include the profile from .env: either databricks <command> --profile <profile> or DATABRICKS_CONFIG_PROFILE=<profile> databricks <command>

Autoscaling Lakebase? If the user mentions "autoscaling", "project", or "branch" in the context of Lakebase, they are using an autoscaling Lakebase instance (not provisioned). This skill covers provisioned instances only. For autoscaling, see .claude/skills/add-tools/examples/lakebase-autoscaling.md instead — it uses LAKEBASE_AUTOSCALING_PROJECT and LAKEBASE_AUTOSCALING_BRANCH env vars, deploys the app first, then adds the postgres resource via API for permissions and grants table access.

Overview

Lakebase provides persistent PostgreSQL storage for agents:

  • Short-term memory (LangGraph): conversation history within a thread (AsyncCheckpointSaver)

  • Long-term memory (LangGraph): user facts across sessions (AsyncDatabricksStore)

  • Long-running agent persistence (OpenAI SDK): background task state via custom SQLAlchemy tables (agent_server schema)

Note: For pre-configured memory templates, see:

  • agent-langgraph-short-term-memory: conversation history within a session

  • agent-langgraph-long-term-memory: user facts that persist across sessions

  • agent-openai-agents-sdk-long-running-agent: background tasks with Lakebase persistence

Complete Setup Workflow

1. Add dependency → 2. Get instance → 3. Configure DAB → 4. Configure .env → 5. Initialize tables → 6. Deploy + Run

Step 1: Add Memory Dependency

Add the memory extra to your pyproject.toml :

dependencies = [
    "databricks-langchain[memory]",
    # ... other dependencies
]

Then sync dependencies:

uv sync

Step 2: Create or Get Lakebase Instance

Option A: Create New Instance (via Databricks UI)

  • Go to your Databricks workspace

  • Navigate to Compute → Lakebase

  • Click Create Instance

  • Note the instance name

Option B: Use Existing Instance

If you have an existing instance, note its name for the next step.

Step 3: Configure databricks.yml (Lakebase Resource)

Add the Lakebase database resource to your app in databricks.yml :

resources:
  apps:
    agent_langgraph:
      name: "your-app-name"
      source_code_path: ./

      resources:
        # ... other resources (experiment, UC functions, etc.) ...

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'databricks_postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

Important:

  • The instance_name: '<your-lakebase-instance-name>' must match the actual Lakebase instance name

  • Using the database resource type automatically grants the app's service principal access to Lakebase

Add Environment Variables to databricks.yml config block

Add the Lakebase environment variables to the config.env section of your app in databricks.yml :

  config:
    command: ["uv", "run", "start-app"]
    env:
      # ... other env vars ...

      # Lakebase instance name - resolved from database resource at deploy time
      - name: LAKEBASE_INSTANCE_NAME
        value_from: "database"

      # Static values for embedding configuration
      - name: EMBEDDING_ENDPOINT
        value: "databricks-gte-large-en"
      - name: EMBEDDING_DIMS
        value: "1024"

Important:

  • The LAKEBASE_INSTANCE_NAME uses value_from: "database" which resolves from the database resource at deploy time

  • The database resource handles permissions; the config.env provides the instance name to your code

Step 4: Configure .env (Local Development)

For local development, add to .env :

# Lakebase configuration for long-term memory
LAKEBASE_INSTANCE_NAME=<your-instance-name>
EMBEDDING_ENDPOINT=databricks-gte-large-en
EMBEDDING_DIMS=1024

Important: embedding_dims must match the embedding endpoint:

Endpoint                   Dimensions
databricks-gte-large-en    1024
databricks-bge-large-en    1024

Note: .env is only for local development. When deployed, the app gets LAKEBASE_INSTANCE_NAME from the value_from reference in the databricks.yml config block.
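As a quick sanity check for local development, a small function can validate these variables before the app starts. This is a sketch, not part of the templates; the variable names come from this guide and the endpoint-to-dimension pairs come from the table above:

```python
# Known endpoint → dimension pairs from the table above.
KNOWN_DIMS = {
    "databricks-gte-large-en": 1024,
    "databricks-bge-large-en": 1024,
}

def check_lakebase_env(env: dict) -> list[str]:
    """Return a list of configuration problems (empty list means OK)."""
    problems = []
    if not env.get("LAKEBASE_INSTANCE_NAME"):
        problems.append("LAKEBASE_INSTANCE_NAME is not set")
    endpoint = env.get("EMBEDDING_ENDPOINT")
    dims = env.get("EMBEDDING_DIMS")
    if endpoint in KNOWN_DIMS and dims != str(KNOWN_DIMS[endpoint]):
        problems.append(
            f"EMBEDDING_DIMS={dims!r} does not match {endpoint} "
            f"(expected {KNOWN_DIMS[endpoint]})"
        )
    return problems
```

Call it with dict(os.environ) at startup and fail fast if the list is non-empty.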

Step 5: Initialize Tables

Option A: LangGraph Memory Templates (public schema)

Before deploying, initialize the Lakebase tables. The AsyncDatabricksStore creates tables on first use, but you need to do this locally first:

DATABRICKS_CONFIG_PROFILE=<profile> uv run python -c "$(cat <<'EOF'
import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        await store.setup()
        print("Tables created!")

asyncio.run(setup())
EOF
)"

This creates these tables in the public schema:

  • store: key-value storage for memories

  • store_vectors: vector embeddings for semantic search

  • store_migrations: schema migration tracking

  • vector_migrations: vector schema migration tracking

Option B: Long-Running Agent Templates (agent_server schema)

The long-running agent uses SQLAlchemy with a custom agent_server schema. Tables are created automatically on app startup via CREATE SCHEMA IF NOT EXISTS agent_server and Base.metadata.create_all. No manual table initialization is needed.

Tables created in the agent_server schema:

  • responses: response status tracking for background agent tasks

  • messages: stream events and output items for responses
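The startup-time auto-create pattern described above can be sketched as follows. This uses sqlite3 as a portable stand-in and purely illustrative column lists; the real templates define their own SQLAlchemy models against the Lakebase Postgres instance and call Base.metadata.create_all after creating the agent_server schema:

```python
import sqlite3

def ensure_tables(conn):
    # Create-if-missing on every startup, mirroring what
    # CREATE SCHEMA IF NOT EXISTS agent_server + Base.metadata.create_all
    # do against Postgres. Columns here are illustrative only.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS responses (
            id TEXT PRIMARY KEY,
            status TEXT
        );
        CREATE TABLE IF NOT EXISTS messages (
            id TEXT PRIMARY KEY,
            response_id TEXT,
            payload TEXT
        );
    """)

conn = sqlite3.connect(":memory:")
ensure_tables(conn)
ensure_tables(conn)  # idempotent, like create_all on every restart
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
```

Because the creation is idempotent, restarting the app repeatedly is safe.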

Step 6: Grant SP Permissions (CRITICAL for deployed apps)

After deploying, the app's service principal needs Postgres roles to access Lakebase tables. The DAB database resource with CAN_CONNECT_AND_CREATE grants basic connectivity, but you must also grant Postgres-level schema and table permissions.

First, get the app's service principal client ID:

DATABRICKS_CONFIG_PROFILE=<profile> databricks apps get <app-name> --output json | jq -r '.service_principal_client_id'

Then grant permissions using LakebaseClient:

DATABRICKS_CONFIG_PROFILE=<profile> uv run python -c "
from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege

client = LakebaseClient(instance_name='<your-instance-name>')
sp_id = '<service-principal-client-id>'  # UUID from the command above

# Create role (must do first)
client.create_role(sp_id, 'SERVICE_PRINCIPAL')

# Grant schema privileges
client.grant_schema(
    grantee=sp_id,
    schemas=['<schema-name>'],  # 'public' for LangGraph, 'agent_server' for long-running agent
    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE],
)

# Grant table privileges
client.grant_table(
    grantee=sp_id,
    tables=['<schema>.<table1>', '<schema>.<table2>'],
    privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, TablePrivilege.UPDATE, TablePrivilege.DELETE],
)

print('Done!')
"

LangGraph Memory Templates

Grant on public schema:

client.grant_schema(grantee=sp_id, schemas=['public'],
                    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE])
client.grant_table(grantee=sp_id, tables=['public.store', 'public.store_vectors'],
                   privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, TablePrivilege.UPDATE, TablePrivilege.DELETE])

Long-Running Agent Templates

Grant on agent_server schema:

client.grant_schema(grantee=sp_id, schemas=['agent_server'],
                    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE])
client.grant_table(grantee=sp_id, tables=['agent_server.responses', 'agent_server.messages'],
                   privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, TablePrivilege.UPDATE, TablePrivilege.DELETE])
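The two grant recipes above differ only in which schema and tables they target, so they can be consolidated into a small lookup. This is a hypothetical helper, not part of the templates; the schema and table names come from this guide:

```python
# Schema and fully qualified tables to grant, per template family.
GRANT_TARGETS = {
    "langgraph": ("public", ["public.store", "public.store_vectors"]),
    "long_running": ("agent_server", ["agent_server.responses", "agent_server.messages"]),
}

def grant_targets(template: str) -> tuple[str, list[str]]:
    """Return (schema, tables) for a template family, or raise on an unknown name."""
    try:
        return GRANT_TARGETS[template]
    except KeyError:
        raise ValueError(f"unknown template family: {template!r}")
```

The returned schema would feed client.grant_schema and the table list client.grant_table, keeping the two calls in sync.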

Step 7: Deploy and Run Your App

IMPORTANT: Always run both deploy AND run commands:

# Deploy resources and upload files
DATABRICKS_CONFIG_PROFILE=<profile> databricks bundle deploy

# Start/restart the app with new code (REQUIRED!)
DATABRICKS_CONFIG_PROFILE=<profile> databricks bundle run {{BUNDLE_NAME}}

Note: bundle deploy only uploads files and configures resources. bundle run is required to actually start the app with the new code.

Complete Example: databricks.yml with Lakebase

bundle:
  name: agent_langgraph

resources:
  experiments:
    agent_langgraph_experiment:
      name: /Users/${workspace.current_user.userName}/${bundle.name}-${bundle.target}

  apps:
    agent_langgraph:
      name: "my-agent-app"
      description: "Agent with long-term memory"
      source_code_path: ./
      config:
        command: ["uv", "run", "start-app"]
        env:
          - name: MLFLOW_TRACKING_URI
            value: "databricks"
          - name: MLFLOW_REGISTRY_URI
            value: "databricks-uc"
          - name: API_PROXY
            value: "http://localhost:8000/invocations"
          - name: CHAT_APP_PORT
            value: "3000"
          - name: CHAT_PROXY_TIMEOUT_SECONDS
            value: "300"
          # Reference experiment resource
          - name: MLFLOW_EXPERIMENT_ID
            value_from: "experiment"
          # Lakebase instance name (resolved from database resource)
          - name: LAKEBASE_INSTANCE_NAME
            value_from: "database"
          # Embedding configuration
          - name: EMBEDDING_ENDPOINT
            value: "databricks-gte-large-en"
          - name: EMBEDDING_DIMS
            value: "1024"

      resources:
        - name: 'experiment'
          experiment:
            experiment_id: "${resources.experiments.agent_langgraph_experiment.id}"
            permission: 'CAN_MANAGE'

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'databricks_postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

targets:
  dev:
    mode: development
    default: true

Troubleshooting

  • "embedding_dims is required when embedding_endpoint is specified": missing embedding_dims parameter. Fix: add embedding_dims=1024 to AsyncDatabricksStore.

  • "relation 'store' does not exist": tables not initialized. Fix: run await store.setup() locally first (Step 5).

  • "Unable to resolve Lakebase instance 'None'": missing env var in the deployed app. Fix: add LAKEBASE_INSTANCE_NAME to the databricks.yml config.env block.

  • "Unable to resolve Lakebase instance '...database.cloud.databricks.com'": value_from was used where a literal value is expected. Fix: use value: "<instance-name>" rather than value_from for that variable.

  • "permission denied for table store": missing grants. Fix: run uv run python scripts/grant_lakebase_permissions.py <sp-client-id> to grant permissions.

  • "Failed to connect to Lakebase": wrong instance name. Fix: verify the instance name in databricks.yml and .env.

  • Connection pool errors on exit: Python cleanup race. Fix: ignore PythonFinalizationError; it's harmless.

  • App not updated after deploy: bundle run was skipped. Fix: run databricks bundle run agent_langgraph after deploy.

  • value_from not resolving: resource name mismatch. Fix: ensure the value_from value matches a name in the databricks.yml resources list.

Granting Permissions

Memory templates include a scripts/grant_lakebase_permissions.py script that handles all permission grants.

Get the SP client ID:

databricks apps get <app-name> --output json | jq -r '.service_principal_client_id'

Provisioned:

uv run python scripts/grant_lakebase_permissions.py <sp-client-id> --instance-name <name>

Autoscaling:

uv run python scripts/grant_lakebase_permissions.py <sp-client-id> --project <project> --branch <branch>

The script reads defaults from .env and handles fresh branches gracefully (warns but doesn't fail if tables don't exist yet).
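The actual flags and behavior are defined by the upstream script; as an illustration only, defaults-from-environment resolution for a CLI shaped like the commands above might look like this (argument names are assumptions based on this guide):

```python
import argparse
import os

def parse_grant_args(argv, env=os.environ):
    """Sketch of a grant-script CLI: positional SP client ID, with
    instance/project/branch falling back to environment defaults
    (as would be loaded from .env)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("sp_client_id")
    parser.add_argument("--instance-name", default=env.get("LAKEBASE_INSTANCE_NAME"))
    parser.add_argument("--project", default=env.get("LAKEBASE_AUTOSCALING_PROJECT"))
    parser.add_argument("--branch", default=env.get("LAKEBASE_AUTOSCALING_BRANCH"))
    return parser.parse_args(argv)
```

With this shape, explicit flags win, and omitted flags inherit whatever the environment provides.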

LakebaseClient API (for reference)

from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege

# Provisioned:
client = LakebaseClient(instance_name="...")

# Autoscaling:
client = LakebaseClient(project="...", branch="...")

# Create role (must do first)
client.create_role(identity_name, "SERVICE_PRINCIPAL")

# Grant schema (note: schemas is a list; the parameter is grantee, not role)
client.grant_schema(
    grantee="...",
    schemas=["public"],
    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE],
)

# Grant tables (note: tables includes the schema prefix)
client.grant_table(
    grantee="...",
    tables=["public.store"],
    privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, ...],
)

# Execute raw SQL
client.execute("SELECT * FROM pg_tables WHERE schemaname = 'public'")

Service Principal Identifiers

When granting permissions manually, note that Databricks apps have multiple identifiers:

  • service_principal_id: numeric ID (e.g. 1234567890123456)

  • service_principal_client_id: UUID (e.g. a1b2c3d4-e5f6-7890-abcd-ef1234567890)

  • service_principal_name: string name (e.g. my-app-service-principal)

Get all identifiers:

DATABRICKS_CONFIG_PROFILE=<profile> databricks apps get <app-name> --output json | jq '{
  id: .service_principal_id,
  client_id: .service_principal_client_id,
  name: .service_principal_name
}'
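If jq isn't available, the same fields can be pulled out in Python. The payload below is illustrative, using the field names and example formats from the table above:

```python
import json

# Sample payload shaped like `databricks apps get --output json`
# (field names from the table above; values are illustrative).
raw = '''{
  "service_principal_id": 1234567890123456,
  "service_principal_client_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "service_principal_name": "my-app-service-principal"
}'''

app = json.loads(raw)
identifiers = {
    "id": app["service_principal_id"],
    "client_id": app["service_principal_client_id"],
    "name": app["service_principal_name"],
}
```

In practice, raw would come from running the CLI command (e.g. via subprocess) rather than a literal string.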

Which to use:

  • LakebaseClient.create_role(): use service_principal_client_id (UUID) or service_principal_name

  • Raw SQL grants: use service_principal_client_id (UUID)

Next Steps

  • Add memory to agent code: see agent-memory skill

  • Test locally: see run-locally skill

  • Deploy: see deploy skill


Repository SourceNeeds Review