Lakehouse pipeline design (Databricks)

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and the repository scripts before running anything.

To install, copy this command and send it to your AI assistant:

npx skills add hubert-dudek/medium/hubert-dudek-medium-lakehouse-pipeline-design

Use this skill when someone asks for a pipeline design, DLT design, ETL plan, CDC ingestion, or a review of an existing pipeline.

Deliverables

When activated, produce at least:

  • A filled design doc based on assets/pipeline-design-doc.md

  • A short, actionable implementation checklist (you can reuse references/pipeline-checklist.md)

Optionally (only if asked): a code skeleton (PySpark / SQL / DLT) that matches the design.
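As a hedged illustration of what such a skeleton might look like, here is a minimal DLT Python sketch. It assumes a Databricks DLT pipeline runtime (where `spark` and the `dlt` module are provided); the source path, column names, and table names are placeholders, not part of this skill.

```python
# Minimal hypothetical DLT skeleton — runs only inside a Databricks DLT pipeline.
# {{catalog}}/{{schema}} and column names are placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw files ingested as-is via Auto Loader")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/{{catalog}}/{{schema}}/landing/events/")
    )

@dlt.table(comment="Silver: validated and deduplicated")
@dlt.expect_or_drop("valid_id", "event_id IS NOT NULL")  # quality enforced in silver
def silver_events():
    return (
        dlt.read_stream("bronze_events")
        .withColumn("ingested_at", F.current_timestamp())
        .dropDuplicates(["event_id"])
    )
```

The bronze table stays raw for replayability; quality expectations are attached where the silver layer is built, matching the "where to enforce quality" decision in the design doc.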

Minimal inputs (ask only what’s missing)

Ask up to 3 questions total. Prefer defaults.

  • Source type: files / DB / API / Kafka / etc.

  • Mode: batch / streaming / CDC

  • Target: tables (catalog.schema.*) and consumers (dashboards, ML, downstream jobs)

  • Volume + SLA: rows/day, latency/freshness SLO, cost constraints

  • Governance: PII? UC catalogs/schemas, access groups

Design guidance (what to include)

  • Architecture: bronze → silver → gold; DLT vs Jobs; where to enforce quality

  • Incremental strategy: watermarking, MERGE for CDC, idempotency

  • Delta table design: partitioning, ZORDER, OPTIMIZE/VACUUM policy

  • Quality checks: schema validation, null/unique, freshness, anomaly checks

  • Observability: metrics, logs, expectations failures, alerts, runbooks

  • Backfills: replay strategy, how to reprocess safely, versioning

  • Security: UC permissions, row/column filtering if needed, secrets management

  • Operational: retries, SLAs, escalation, deployment strategy
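The incremental/CDC strategy above can be sketched with a Delta `MERGE`. This is a sketch under stated assumptions, not a definitive implementation: it assumes a Databricks runtime (global `spark`), a change feed with `op` and `change_ts` columns, and placeholder catalog/schema/table names and business key.

```python
# Hypothetical idempotent CDC upsert into a silver Delta table.
# Assumes Databricks (spark provided); names, key, and op/change_ts columns
# are placeholders for illustration.
from delta.tables import DeltaTable
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Keep only the latest change per key so replays/backfills stay idempotent.
latest = Window.partitionBy("account_id").orderBy(F.col("change_ts").desc())
updates = (
    spark.read.table("{{catalog}}.{{schema}}.bronze_accounts_cdc")
    .withColumn("rn", F.row_number().over(latest))
    .filter("rn = 1")
    .drop("rn")
)

target = DeltaTable.forName(spark, "{{catalog}}.{{schema}}.silver_accounts")
(target.alias("t")
 .merge(updates.alias("s"), "t.account_id = s.account_id")
 .whenMatchedDelete(condition="s.op = 'DELETE'")
 .whenMatchedUpdateAll(condition="s.op != 'DELETE'")
 .whenNotMatchedInsertAll(condition="s.op != 'DELETE'")
 .execute())
```

Collapsing to the latest change per key before merging is what makes reruns safe: applying the same batch twice converges to the same target state.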

Output rules

  • Put concrete decisions in a “Decisions” section and unknowns in “Open questions”.

  • If details are missing, keep placeholders like {{...}} and add an “Info needed” section.

  • Keep the doc concise; link to references/pipeline-checklist.md instead of inlining long checklists.

Examples

User: “Design a DLT pipeline that ingests Salesforce accounts daily and publishes a gold table for dashboards.”

Output: Design doc + checklist + optional DLT skeleton.

User: “Review our existing silver-to-gold job for performance and reliability.”

Output: Review-style design doc: risks, improvements, and prioritized actions.

Edge cases

  • Streaming sources: include checkpointing, schema evolution handling, and late data policy.

  • Regulated data: include classification, retention, and UC policy controls.

  • Multi-tenant tables: call out tenant key, partitioning, and access controls.
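For the streaming edge case, a minimal Structured Streaming sketch covering checkpointing, a late-data policy, and additive schema evolution might look like the following. Paths, table names, and the one-hour lateness bound are assumptions for illustration, and a Databricks runtime (global `spark`) is assumed.

```python
# Hypothetical streaming silver job: checkpointing, watermark-based late-data
# policy, and tolerant schema evolution. All names/paths are placeholders.
stream = (
    spark.readStream.table("{{catalog}}.{{schema}}.bronze_events")
    .withWatermark("event_ts", "1 hour")        # late-data policy: drop events >1h late
    .dropDuplicates(["event_id", "event_ts"])   # bounded-state dedup under the watermark
)

(stream.writeStream
 .option("checkpointLocation",
         "/Volumes/{{catalog}}/{{schema}}/_checkpoints/silver_events")
 .option("mergeSchema", "true")                 # allow additive schema evolution
 .trigger(availableNow=True)                    # incremental batch-style runs
 .toTable("{{catalog}}.{{schema}}.silver_events"))
```

The checkpoint location is what makes restarts and redeploys exactly-once from the source's perspective; changing it silently restarts the stream from scratch, so call it out in the runbook.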
