update-dataset

Update Dataset (PR → snapshot → steps → grapher)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "update-dataset" with this command: npx skills add owid/etl/owid-etl-update-dataset

Use this skill to run a complete dataset update with Claude Code subagents, keep a live progress checklist, and pause for approval at a checkpoint after every numbered workflow step before continuing.

Inputs

  • <namespace>/<old_version>/<short_name>

  • Get <new_version> as today's date by running date -u +"%Y-%m-%d"

Optional trailing args:

  • branch: The working branch name (defaults to current branch)
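
The input resolution above can be sketched as follows. This is a minimal illustration, assuming the dataset argument is a three-part path whose last component is the short name; `UpdateInputs` and `parse_inputs` are hypothetical names, not part of the skill. The new version mirrors `date -u +"%Y-%m-%d"`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UpdateInputs:
    namespace: str
    old_version: str
    short_name: str
    new_version: str
    branch: Optional[str]  # None means "use the current branch"


def parse_inputs(dataset: str, branch: Optional[str] = None) -> UpdateInputs:
    """Split '<namespace>/<old_version>/<short_name>' and derive the new version."""
    parts = dataset.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <namespace>/<old_version>/<short_name>, got {dataset!r}"
        )
    namespace, old_version, short_name = parts
    # Same value as `date -u +"%Y-%m-%d"`: today's date in UTC.
    new_version = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    return UpdateInputs(namespace, old_version, short_name, new_version, branch)
```

Raising on a malformed argument, rather than guessing, keeps a typo from silently resolving to the wrong dataset.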

Assumptions:

  • All artifacts are written to workbench/<short_name>/.

  • Persist progress to workbench/<short_name>/progress.md and update it after each step.

Progress checklist (maintain, tick live, and persist to progress.md)

(Checkpoint rule: After you finish each item below that represents a workflow step, immediately run the CHECKPOINT procedure. Do not batch multiple steps before a checkpoint.)

  • Parse inputs and resolve: channel, namespace, version, short_name, old_version, branch

  • Clean workbench directory: delete workbench/<short_name> unless continuing existing update

  • Run ETL update workflow via etl-update subagent (help → dry run → approval → real run)

  • Create or reuse draft PR and work branch

  • Update snapshot and compare to previous version; capture summary

  • Meadow step: run + fix + diff + summarize

  • Garden step: run + fix + diff + summarize

  • Grapher step: run + verify (skip diffs), or explicitly mark N/A

  • CHECKPOINT — present consolidated summary and request approval

  • If approved, commit, push, and update PR description

  • Optional: run indicator upgrade on staging and persist report

  • Draft Slack announcement and notify user to post it to #data-updates-comms

Persistence:

  • After ticking each item, update workbench/<short_name>/progress.md with the current checklist state and a timestamp.
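
A minimal sketch of the persistence rule, assuming progress.md is stored as a Markdown task list with a UTC timestamp on the first line. The file layout and the names `render_progress` and `persist_progress` are assumptions; the skill only requires the checklist state and a timestamp.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional, Tuple


def render_progress(items: List[Tuple[str, bool]], now: datetime) -> str:
    """Render (label, done) pairs as a task list headed by a timestamp."""
    lines = [f"Last updated: {now.strftime('%Y-%m-%d %H:%M:%S')} UTC", ""]
    for label, done in items:
        lines.append(f"- [{'x' if done else ' '}] {label}")
    return "\n".join(lines) + "\n"


def persist_progress(
    workbench: Path,
    items: List[Tuple[str, bool]],
    now: Optional[datetime] = None,
) -> Path:
    """Overwrite <workbench>/progress.md with the current checklist state."""
    now = now or datetime.now(timezone.utc)
    workbench.mkdir(parents=True, exist_ok=True)
    path = workbench / "progress.md"
    path.write_text(render_progress(items, now), encoding="utf-8")
    return path
```

Rewriting the whole file on every tick keeps progress.md a faithful snapshot, which is what a resumed session reads first.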

CHECKPOINT (mandatory user approval)

Always performed immediately after completing each numbered workflow step (1–6: ETL update, PR, snapshot, meadow, garden, grapher). Never start the next step until approval is granted.

Procedure (each time):

  • Present a concise summary of what just changed, key diffs/issues resolved, and what the next step will do.

  • Ask exactly: "Proceed? reply: yes/no"

  • Only continue if the user replies exactly "yes" (case-insensitive). Any other reply = no; stop and wait.

  • On approval:

      • Update progress checklist (tick the completed item) and write workbench/<short_name>/progress.md with timestamp.

      • Commit related changes (if any), push.

      • Update (or append to) the PR description: add a collapsed section titled with the step name (e.g., "Snapshot Update", "Meadow Update") containing the summary.
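
The collapsed PR section could be built as in the sketch below, assuming the description is GitHub-flavored Markdown, where a `<details>` block with a `<summary>` renders as a collapsed section. The function name `append_collapsed_section` is hypothetical, and how the description is fetched and written back is left out.

```python
def append_collapsed_section(description: str, title: str, summary: str) -> str:
    """Append a collapsed <details> section for one workflow step."""
    section = (
        "<details>\n"
        f"<summary>{title}</summary>\n\n"  # blank line so Markdown inside renders
        f"{summary.strip()}\n\n"
        "</details>"
    )
    if not description.strip():
        return section + "\n"
    return description.rstrip() + "\n\n" + section + "\n"
```

Appending, rather than replacing, keeps the earlier step summaries in the description as the update progresses.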

Mandatory per-step checkpoints (rule)

You MUST:

  • Stop after each workflow step (1–6) and run CHECKPOINT before starting the next (step 7, the indicator upgrade, is optional and still requires a checkpoint if executed).

  • Never chain multiple steps inside a single approval.

  • Treat missing or ambiguous replies as no.
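
The approval rule can be stated in a few lines. This is a sketch under one assumption: surrounding whitespace is ignored before comparing, which the rule does not spell out. `is_approved` is a hypothetical helper name.

```python
from typing import Optional


def is_approved(reply: Optional[str]) -> bool:
    """Return True only for a reply of exactly 'yes', case-insensitive."""
    if reply is None:  # a missing reply counts as "no"
        return False
    # Anything other than a bare "yes" (e.g. "y", "yes please") is "no".
    return reply.strip().lower() == "yes"
```

Defaulting every unexpected reply to "no" means an ambiguous answer can only pause the workflow, never advance it.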

Workflow orchestration

Initial setup

  • Check if workbench/<short_name>/progress.md exists to determine if continuing existing update

  • If starting fresh: delete workbench/<short_name> directory if it exists

  • Create fresh workbench/<short_name> directory for artifacts
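
A minimal sketch of the setup logic above, assuming the presence of progress.md is the only signal that an update is being continued. `prepare_workbench` and its `root` parameter are hypothetical; an agent may also want to ask the user before resuming.

```python
import shutil
from pathlib import Path


def prepare_workbench(root: Path, short_name: str) -> bool:
    """Return True when continuing an existing update, False when starting fresh."""
    workbench = root / "workbench" / short_name
    if (workbench / "progress.md").exists():
        return True  # keep existing artifacts and resume from the checklist
    if workbench.exists():
        shutil.rmtree(workbench)  # stale artifacts without a progress file
    workbench.mkdir(parents=True)
    return False
```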

Run ETL update command (etl-update subagent)

  • Inputs: <namespace>/<old_version>/<short_name> plus any required flags

  • CRITICAL: Run etl update ONCE for the full step URI (e.g., data://garden/namespace/old_version/short_name). Do NOT run it separately per channel (snapshot, meadow, garden, grapher). Running it once updates all cross-step DAG dependencies together; running it per channel leaves stale version references in dag/main.yml (e.g., garden pointing to the old meadow version).

  • Perform help check, dry run, approval, then real execution; capture summary for later PR notes

  • After running, always verify dag/main.yml: grep for the old version and confirm that every dependency between the new steps points to the new version (e.g., garden depends on the new meadow, not the old one).

  • CHECKPOINT (stop → summarize → ask → require yes)

Create PR and integrate update via subagent (etl-pr)

  • Inputs: <namespace>/<old_version>/<short_name>

  • Create or reuse draft PR, set up work branch, and incorporate the ETL update outputs

  • CHECKPOINT

Snapshot run & compare (snapshot-runner subagent)

  • Inputs: <namespace>/<new_version>/<short_name> and <old_version>

  • CHECKPOINT

Meadow step repair/verify (step-fixer subagent, channel=meadow)

  • Run, fix, re-run; produce diffs

  • Save diffs and summaries

  • CHECKPOINT

Garden step repair/verify (step-fixer subagent, channel=garden)

  • Run, fix, re-run; produce diffs

  • Save diffs and summaries

  • CHECKPOINT

Grapher step run/verify (step-fixer subagent, channel=grapher, add --grapher)

  • Skip diff

  • CHECKPOINT

Indicator upgrade (optional, staging only)

  • Use indicator-upgrader subagent with <short_name> <branch>

  • CRITICAL: After the upgrader finishes, always verify that it actually worked by querying staging: make query SQL="SELECT COUNT(*) FROM chart_dimensions cd JOIN variables v ON cd.variableId = v.id WHERE v.catalogPath LIKE '%<namespace>/<new_version>%'". If the count is 0, the upgrade did not run; re-run it.

  • CHECKPOINT (if executed)
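
The verification query above can be filled in programmatically. This sketch only builds the SQL string and interprets the count; running it against staging (via make query) is left out. `build_verification_query` and `upgrade_ran` are hypothetical names, and the quote check is an added precaution, not something the skill prescribes.

```python
def build_verification_query(namespace: str, new_version: str) -> str:
    """Build the chart-count query for indicators under the new version."""
    for value in (namespace, new_version):
        if "'" in value or '"' in value:
            raise ValueError(f"unexpected quote in {value!r}")
    pattern = f"%{namespace}/{new_version}%"
    return (
        "SELECT COUNT(*) FROM chart_dimensions cd "
        "JOIN variables v ON cd.variableId = v.id "
        f"WHERE v.catalogPath LIKE '{pattern}'"
    )


def upgrade_ran(count: int) -> bool:
    """A count of 0 means no chart uses the new indicators yet."""
    return count > 0
```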

Slack announcement

  • Fill out the template at .claude/skills/update-dataset/slack-announcement-template.md using facts gathered during the update (coverage, chart count, key changes, etc.)

  • Ask user if unsure about any details

  • Save the draft to workbench/<short_name>/slack-announcement.md

  • Tell the user: "Slack announcement drafted at workbench/<short_name>/slack-announcement.md. Please review and post it to #data-updates-comms."

Guardrails and tips

  • DAG consistency: After etl update, always verify that all new steps in dag/main.yml reference each other with the new version. A common bug is garden depending on the old meadow or old snapshot, which silently loads stale data.

  • Never return empty tables or comment out logic as a workaround — fix the parsing/transformations instead.

  • Column name changes: update garden processing code and metadata YAMLs (garden/grapher) to match schema changes.

  • Indexing: avoid leaking index columns from reset_index(); format tables with tb.format(["country", "year"]) as appropriate.

  • Metadata validation errors are guidance — update YAML to add/remove variables as indicated.
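
Leaked index columns usually have recognizable names. pandas names the column produced by reset_index() on an unnamed index "index" (or "level_0", "level_1", ... when that name is taken or the index has several unnamed levels), and read_csv labels an unnamed saved index "Unnamed: 0". The sketch below flags such names in a list of columns; `leaked_index_columns` is a hypothetical helper and works on plain column names, so it does not depend on the ETL's table classes.

```python
import re
from typing import Iterable, List

# Default names pandas gives to index columns that leak into the data.
_LEAKED = re.compile(r"^(index|level_\d+|Unnamed: \d+)$")


def leaked_index_columns(columns: Iterable[str]) -> List[str]:
    """Return column names that look like an index leaked by reset_index()."""
    return [name for name in columns if _LEAKED.match(name)]
```

A non-empty result is a hint to drop the column or set the index properly before formatting, not proof of a bug: a dataset could legitimately contain a column called "index".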

Artifacts (expected)

  • workbench/<short_name>/snapshot-runner.md

  • workbench/<short_name>/progress.md

  • workbench/<short_name>/meadow_diff_raw.txt and meadow_diff.md

  • workbench/<short_name>/garden_diff_raw.txt and garden_diff.md

  • workbench/<short_name>/indicator_upgrade.json (if indicator-upgrader was used)
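
The artifact list above can be checked before the final summary. This is a minimal sketch; `missing_artifacts` is a hypothetical helper, and treating every listed file as required (except the indicator upgrade report) is an assumption.

```python
from pathlib import Path
from typing import List

EXPECTED = [
    "snapshot-runner.md",
    "progress.md",
    "meadow_diff_raw.txt",
    "meadow_diff.md",
    "garden_diff_raw.txt",
    "garden_diff.md",
]
UPGRADE_REPORT = "indicator_upgrade.json"  # only when indicator-upgrader was used


def missing_artifacts(workbench: Path, used_indicator_upgrader: bool = False) -> List[str]:
    """Return the expected artifact file names that are absent from the workbench."""
    expected = list(EXPECTED)
    if used_indicator_upgrader:
        expected.append(UPGRADE_REPORT)
    return [name for name in expected if not (workbench / name).is_file()]
```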

Example usage

  • Minimal catalog URI with explicit old version:

      • update-dataset data://snapshot/irena/2024-11-15/renewable_power_generation_costs 2023-11-15 update-irena-costs

Common issues when data structure changes

  • SILENT FAILURES WARNING: Never return empty tables or comment out code as a workaround!

  • Column name changes: If columns are renamed/split (e.g., single cost → local currency + PPP), update:

      • Python code references in the garden step

      • Garden metadata YAML (e.g., food_prices_for_nutrition.meta.yml)

      • Grapher metadata YAML (if it exists)

  • Index issues: Check for unwanted index columns from reset_index(); ensure proper indexing with tb.format(["country", "year"]).

  • Metadata validation: Use error messages as a guide — they show exactly which variables to add/remove from YAML files.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
