Smart Backup System with Skill Integration
Supporting files in this directory:
-
MANIFEST_BACKUPS.md -- MANIFEST-aware intelligent backups
-
FULL_PROJECT_BACKUPS.md -- Full project backups, selective inclusion/exclusion, path verification
-
ADVANCED_USAGE.md -- Custom scripts, multiple file backups, real-world examples
When to Use This Skill
Use this skill when:
-
Working on any project with files that change over time
-
Jupyter notebooks, data files (CSV/TSV), HackMD presentations, or mixed projects
-
Need intelligent cleanup before backup (clear outputs, remove debug code)
-
Want to track what changed when (data provenance)
-
Need professional backup workflow for collaboration or publication
-
Want context-aware backups that use other skills intelligently
The Problem
Long-running data enrichment projects risk:
-
Losing days of work from accidental overwrites
-
Unable to revert to previous data states
-
No documentation of what changed when
-
Running out of disk space from manual backups
-
Confusion about which version is current
Solution: Smart Two-Tier Backup System with Skill Integration
Core Features
-
Intelligent Detection - Automatically detects project type and files to backup
-
Skill Integration - Uses jupyter-notebook, hackmd, and other skills for pre-backup cleanup
-
Daily backups - Rolling 7-day window (auto-cleanup)
-
Milestone backups - Permanent, compressed (gzip ~80% reduction)
-
CHANGELOG - Automatic documentation of all changes
-
Session Integration - Prompts for backup when exiting Claude Code session
Smart Detection & Integration
The backup system automatically detects your project type and applies appropriate cleanup:
Jupyter Notebooks (uses jupyter-notebook skill):
-
Detects: *.ipynb files
-
Pre-backup cleanup: Clear all cell outputs, remove cells tagged 'debug' or 'remove', validate notebooks
HackMD/Presentations (uses hackmd skill):
-
Detects: *.md files with slideOptions: frontmatter
-
Pre-backup cleanup: Validate SVG elements, check slide separators, verify YAML frontmatter
Data Files (native handling):
-
Detects: *.csv , *.tsv , *.xlsx files
-
Pre-backup cleanup: Validate file integrity, check for corruption
Python Projects (uses managing-environments skill):
-
Detects: requirements.txt , environment.yml , venv/ , .venv/
-
Pre-backup cleanup: Remove .pyc , pycache , .pytest_cache , clean build artifacts
Mixed Projects: Detects all of the above and applies appropriate cleanup for each file type.
Directory Structure
For data-only projects:
project/ ├── your_data_file.csv # Main working file ├── backup_project.sh # Smart backup script └── backups/ ├── daily/ # Rolling 7-day backups ├── milestones/ # Permanent compressed backups ├── CHANGELOG.md # Auto-generated change log └── README.md # User documentation
For mixed projects (notebooks + data):
project/ ├── analysis.ipynb # Jupyter notebooks ├── data.csv # Data files ├── backup_project.sh # Smart backup script └── backups/ ├── daily/ # Rolling 7-day backups │ └── backup_2026-01-17/ │ ├── notebooks/ # Cleaned (no outputs) │ └── data/ ├── milestones/ # Permanent compressed backups ├── CHANGELOG.md └── README.md
Storage Efficiency
-
Daily backups: ~5.4 MB (7 days x 770KB)
-
Milestone backups: ~200KB each compressed (80% size reduction with gzip)
-
Total: <10 MB for complete project history
-
Auto-cleanup: Old daily backups delete after 7 days
Implementation
Quick Start with /backup Command
First time - Setup the backup system:
/backup
This will:
-
Detect your project type (notebooks, data files, presentations, etc.)
-
Set up appropriate backup scripts with smart cleanup
-
Create backup directory structure
-
Optionally configure automated backups
Daily usage - Create backups:
/backup # Daily backup with smart cleanup /backup milestone "desc" # Milestone backup /backup list # View all backups /backup restore DATE # Restore from backup
What Happens During Backup
Smart cleanup before backup:
-
Detects file types in your project
-
Applies skill-specific cleanup:
-
Notebooks: Clear outputs, remove debug cells
-
HackMD: Validate SVG, check formatting
-
Python: Remove .pyc , pycache
-
Data: Validate integrity
-
Creates organized backup with cleaned files
-
Updates CHANGELOG with what was backed up
Manual Script Usage (Alternative)
./backup_project.sh # Daily backup ./backup_project.sh milestone "description" # Milestone ./backup_project.sh list # List backups ./backup_project.sh restore 2026-01-23 # Restore
When to Create Milestones
-
After adding new data sources (GenomeScope, karyotypes, external APIs)
-
Before major data transformations or filtering
-
When completing analysis sections
-
Before submitting/publishing
-
Before sharing with collaborators
-
After recovering missing data
Key Features
Safety Features
-
Never overwrites without asking - Prompts before overwriting existing backups
-
Safety backup before restore - Creates backup of current state before any restore
-
Automatic cleanup - Old daily backups auto-delete (configurable)
-
Complete audit trail - CHANGELOG tracks everything
-
Milestone protection - Important versions preserved forever (compressed)
CHANGELOG Tracking
The CHANGELOG.md automatically documents:
-
Date of each backup
-
Type (daily vs milestone)
-
Description of changes (for milestones)
-
Major modifications made to data
Example CHANGELOG:
2026-01-23
- MILESTONE: Recovered VGP accessions (backup created)
- Added columns:
accession_recovered,accession_recovered_all - Recovered 5 VGP accessions from NCBI
- Added columns:
- Daily backup created at 2026-01-23 15:00:00
2026-01-22
- Enriched GenomeScope data for 21 species from AWS repository
- Added column:
genomescope_pathwith direct links to summary files
Using /backup Command
Setup mode (first run): /backup -- Detects project type, sets up scripts, creates directory structure.
Daily backup mode: /backup -- Quick daily backup.
Milestone mode: /backup milestone "description of changes" -- e.g., /backup milestone "added heterozygosity data"
List and restore:
/backup list # Show all available backups /backup restore 2026-01-23 # Restore from specific date
Configuration: Edit backup_project.sh to change retention days (default: 7), backup directory location, or custom cleanup rules.
Benefits for Data Analysis
-
Data Provenance: CHANGELOG documents every modification; clear audit trail for methods sections in papers
-
Confidence to Experiment: Easy rollback encourages trying different approaches safely
-
Professional Workflow: Matches publication standards; reviewers can verify data processing steps
-
Collaboration-Ready: Team members can understand data history and enrichment process
Session Integration with /safe-exit
When you end a Claude Code session with /safe-exit , the system automatically:
-
Detects if backup system exists in the current project
-
Prompts for backup if system is configured (daily, milestone, skip, or cancel)
-
Performs cleanup and backup if requested
-
Prompts for Obsidian session summary (if obsidian skill is available)
-
Exits session cleanly
This ensures you never forget to backup AND document your work at the end of your session!
Example Workflow
Monday Morning
/backup # Daily backup with smart cleanup
Work on notebooks and data enrichment all day
/backup milestone "added karyotype data for 50 new species"
End of session
/safe-exit
Prompted: daily backup -> backup complete -> session summary -> exit
Friday (oops, made a mistake!)
/backup list # Check available backups /backup restore 2026-01-23 # Restore from Wednesday
MANIFEST-Aware Backups
For projects with MANIFEST files, use intelligent backups that include only essential files. See MANIFEST_BACKUPS.md for the full pattern, script templates, inclusion/exclusion rules, and integration with the /backup command.
Full Project Backups
For projects where both code and data change, selective full-project backups capture the complete state without bloat. See FULL_PROJECT_BACKUPS.md for implementation patterns, backup strategy comparison, size benchmarks, and path verification guidance.
Advanced Usage
For custom backup script templates, handling multiple files, viewing compressed milestones, and real-world examples, see ADVANCED_USAGE.md.
Best Practices
-
Create daily backups at session start - Make it a habit
-
Milestone after every major change - Don't rely on memory
-
Use descriptive milestone names - "added genomescope" not "updates"
-
Check CHANGELOG before sharing - Verify data provenance is clear
-
List backups periodically - Ensure auto-cleanup is working
-
Test restore once - Verify you know how to recover
Troubleshooting
Backup script not found
ls -l backup_project.sh # Check if backup system is set up /backup # Set up if needed
Disk space running low
du -sh backups/ # Check backup sizes
Reduce retention: edit DAYS_TO_KEEP=3 in backup_table.sh
Manually clean old milestones if needed
CHANGELOG getting too large
tail -100 backups/CHANGELOG.md > backups/CHANGELOG_recent.md mv backups/CHANGELOG.md backups/CHANGELOG_archive.md mv backups/CHANGELOG_recent.md backups/CHANGELOG.md
Summary
-
Two-tier system: Daily rolling + permanent milestones
-
Storage efficient: Gzip compression (~80% reduction)
-
Auto-cleanup: 7-day rolling window for dailies
-
Complete audit trail: CHANGELOG tracks all changes
-
Safety first: Never overwrites without confirmation
-
Global installer: Use across all projects
-
Professional workflow: Publication-ready data provenance