
Databricks Migration Deep Dive

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "databricks-migration-deep-dive" with this command: npx skills add jeremylongshore/claude-code-plugins-plus-skills/jeremylongshore-claude-code-plugins-plus-skills-databricks-migration-deep-dive


Contents

  • Overview

  • Prerequisites

  • Instructions

  • Output

  • Error Handling

  • Examples

  • Resources

Overview

Comprehensive migration strategies for moving to Databricks from Hadoop, Snowflake, Redshift, Synapse, or legacy data warehouses.

Prerequisites

  • Access to source and target systems

  • Understanding of current data architecture

  • Migration timeline and stakeholder alignment

Migration Patterns

| Source | Pattern | Complexity | Timeline |
| --- | --- | --- | --- |
| On-prem Hadoop | Lift-and-shift + modernize | High | 6-12 months |
| Snowflake | Parallel run + cutover | Medium | 3-6 months |
| AWS Redshift | ETL rewrite + data copy | Medium | 3-6 months |
| Legacy DW (Oracle/Teradata) | Full rebuild | High | 12-18 months |

Instructions

Step 1: Discovery and Assessment

Inventory all source tables with metadata (size, partitions, dependencies, data classification). Generate prioritized migration plan with wave assignments.
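The wave assignment above can be sketched as a ranking over table metadata; the `TableInfo` fields, the sample inventory, and the wave-size heuristic below are illustrative assumptions, not part of the skill itself.

```python
# Sketch: prioritize inventoried source tables into migration waves.
# Assumes metadata (size, dependency counts) was already exported from
# the source catalog; names and numbers here are made up.
from dataclasses import dataclass

@dataclass
class TableInfo:
    name: str
    size_gb: float
    dependency_count: int  # number of downstream consumers

def assign_waves(tables, wave_size=2):
    # Migrate low-dependency, small tables first to de-risk early waves.
    ranked = sorted(tables, key=lambda t: (t.dependency_count, t.size_gb))
    return {t.name: i // wave_size + 1 for i, t in enumerate(ranked)}

inventory = [
    TableInfo("fact_sales", 900.0, 12),
    TableInfo("dim_customer", 4.0, 3),
    TableInfo("staging_raw", 50.0, 0),
    TableInfo("dim_product", 2.0, 3),
]
waves = assign_waves(inventory)
```

A real assessment would also weigh data classification and partition layout; dependency order alone is the minimum needed to avoid migrating a table before its upstreams.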

Step 2: Schema Migration

Convert source schemas to Delta Lake compatible types. Handle type conversions (char->string, tinyint->int). Enable auto-optimize on target tables.
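The type conversions above (char->string, tinyint->int) can be driven by a lookup table; the mapping below is a small illustrative subset and should be extended per your source dialect.

```python
# Sketch: map legacy warehouse column types to Delta Lake compatible types.
# The entries below are examples, not an exhaustive or authoritative mapping.
TYPE_MAP = {
    "char": "string",
    "varchar": "string",
    "tinyint": "int",
    "smallint": "int",
    "datetime2": "timestamp",
}

def convert_column(name, src_type):
    # Strip any length/precision suffix, e.g. CHAR(10) -> char.
    base = src_type.lower().split("(")[0].strip()
    return f"{name} {TYPE_MAP.get(base, src_type.lower())}"

ddl_cols = [convert_column("id", "TINYINT"), convert_column("note", "CHAR(10)")]
```

Unknown types fall through unchanged so the generated DDL fails loudly at create time rather than silently coercing data.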

Step 3: Data Migration

Batch large tables by partition. Validate row counts and schema match after each table migration.
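The batch-and-validate loop above can be sketched as follows; the in-memory dictionaries stand in for real Spark reads and writes, which would replace the lambdas in practice.

```python
# Sketch: copy a table one partition at a time, then validate row counts.
# copy_partition / count_source / count_target are caller-supplied hooks;
# here they operate on plain dicts purely for illustration.
def migrate_table(partitions, copy_partition, count_source, count_target):
    for part in partitions:
        copy_partition(part)
    src, tgt = count_source(), count_target()
    if src != tgt:
        raise ValueError(f"row count mismatch: source={src} target={tgt}")
    return src

# In-memory demo standing in for source and target tables.
source = {"2024-01": ["r1", "r2"], "2024-02": ["r3"]}
target = {}

total = migrate_table(
    partitions=sorted(source),
    copy_partition=lambda p: target.__setitem__(p, list(source[p])),
    count_source=lambda: sum(len(v) for v in source.values()),
    count_target=lambda: sum(len(v) for v in target.values()),
)
```

Validating after every table (rather than once at the end) localizes any truncation to the table and partition batch that caused it.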

Step 4: ETL/Pipeline Migration

Convert spark-submit/Oozie jobs to Databricks jobs. Update paths, remove Hive metastore references, adapt for Unity Catalog.
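Path and metastore updates like those above can be expressed as ordered string rewrites over the job definition; the replacement pairs below (the storage URL and the `main` catalog name) are hypothetical examples, not values the skill prescribes.

```python
# Sketch: rewrite legacy references in a job's SQL for Unity Catalog.
# Both replacement pairs are illustrative; substitute your own storage
# location and catalog name.
REWRITES = [
    ("hdfs://namenode/data/", "abfss://lake@storage.dfs.core.windows.net/"),
    ("hive_metastore.", "main."),  # hypothetical Unity Catalog catalog
]

def rewrite_job(sql_text):
    for old, new in REWRITES:
        sql_text = sql_text.replace(old, new)
    return sql_text

job = "INSERT INTO hive_metastore.db.t SELECT * FROM hive_metastore.db.s"
converted = rewrite_job(job)
```

Blind string replacement is only safe after the assessment step has confirmed no table or column names collide with the prefixes being rewritten.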

Step 5: Cutover Planning

Execute the 6-step cutover: validate -> disable source -> final sync -> enable Databricks -> update apps -> monitor. Each step has a rollback procedure.
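The cutover-with-rollback discipline above can be sketched as an ordered runner that unwinds completed steps in reverse on failure; the step bodies below are illustrative no-ops, not real cutover actions.

```python
# Sketch: run cutover steps in order; if any step fails, invoke the
# rollback of every completed step in reverse order, then re-raise.
def run_cutover(steps):
    completed = []
    try:
        for name, action, rollback in steps:
            action()
            completed.append((name, rollback))
    except Exception:
        for name, rollback in reversed(completed):
            rollback()  # unwind newest-first
        raise
    return [name for name, _ in completed]

log = []
steps = [
    ("validate", lambda: log.append("validated"), lambda: None),
    ("disable source", lambda: log.append("source off"),
     lambda: log.append("source on")),
    ("final sync", lambda: log.append("synced"), lambda: None),
]
done = run_cutover(steps)
```

Pairing every step with its rollback up front, rather than improvising during an incident, is what makes the 6-step plan safely reversible at any point before "monitor".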

See detailed implementation for assessment scripts, schema conversion, data migration with batching, ETL conversion, and cutover plan generation.

Output

  • Migration assessment with prioritized plan

  • Automated schema migration

  • Data migration pipeline with validation

  • Cutover plan with rollback procedures

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| Schema incompatibility | Unsupported types | Use type conversion mappings |
| Data loss | Truncation during migration | Validate counts at each step |
| Performance issues | Large tables | Use partitioned migration |
| Dependency conflicts | Wrong migration order | Analyze dependencies first |

Examples

Quick Validation

SELECT 'source' AS system, COUNT(*) FROM hive_metastore.db.table UNION ALL SELECT 'target' AS system, COUNT(*) FROM migrated.db.table;
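To run this check across every migrated table, the query can be templated; the `migrated` catalog name comes from the example above, while the function name and parameters are illustrative.

```python
# Sketch: generate the source-vs-target row count query for one table.
# Assumes the source lives under hive_metastore and the target under a
# "migrated" catalog, matching the example above.
def validation_sql(db, table):
    return (
        f"SELECT 'source' AS system, COUNT(*) AS n "
        f"FROM hive_metastore.{db}.{table} "
        f"UNION ALL "
        f"SELECT 'target' AS system, COUNT(*) AS n "
        f"FROM migrated.{db}.{table};"
    )

query = validation_sql("sales", "orders")
```

Feeding the generated queries through a loop over the table inventory turns the quick spot check into a full post-migration audit.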

Resources

  • Databricks Migration Guide

  • Delta Lake Migration

  • Unity Catalog Migration

Completion

Covers Databricks platform migrations end to end, from assessment through cutover.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • backtesting-trading-strategies (Coding): no summary provided by upstream source.

  • svg-icon-generator (Coding): no summary provided by upstream source.

  • performance-lighthouse-runner (Coding): no summary provided by upstream source.