data-skill

An assistant specialized in high-frequency, complex data analysis and processing for everyday office scenarios. It uses a local code execution model (SQL or Python + SQLite) for data import, cleaning, querying, extraction, merging/splitting, and report generation, supporting large data volumes while keeping data private. Use this Skill when the user needs to process Excel/CSV files, run cross-table queries, generate charts, or produce data analysis reports.

Install skill "data-skill" with this command: npx skills add lgwanai/data-skill

Data Analysis Assistant Workflow

This skill transforms the agent into a powerful local data analysis assistant, strictly adhering to a Local Code Execution paradigm.

Core Architecture & Principles

  1. Local Execution First: NEVER read large datasets directly into the context window. Always generate Python scripts or SQL commands and execute them locally using RunCommand.
  2. SQLite as the Engine: All CSV/Excel files should be imported into a local SQLite database (default: workspace.db). Rely on SQL for robust data manipulation (filtering, joining, grouping).
  3. Non-Destructive Operations (Undo Mechanism): Do not overwrite original tables. When modifying data, create a new table (e.g., CREATE TABLE table_v2 AS SELECT ...) or a View, as sketched after this list. This guarantees the user can always say "undo the last step".
  4. Data Privacy: Keep data local. Only send aggregated statistics or schema info into the context window.
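
For example, principle 3 in practice (a minimal sketch, assuming a hypothetical imported table named orders):

    -- Non-destructive change: the original table stays intact
    CREATE TABLE orders_v2 AS
    SELECT * FROM orders WHERE amount IS NOT NULL;

    -- Or expose the change as a view, which costs no extra storage
    CREATE VIEW orders_clean AS
    SELECT * FROM orders WHERE amount IS NOT NULL;

    -- "Undo the last step" is then just dropping the derived object
    DROP TABLE orders_v2;   -- or: DROP VIEW orders_clean;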

Scenarios & Procedures

Scenario 1: Data Import & Auto-Cleaning

Trigger: User uploads or specifies a CSV/Excel/WPS(.et)/Numbers file. Action:

  1. Run the built-in importer script (supports .csv, .xlsx, .xls, .et, .numbers):
    python scripts/data_importer.py "path/to/file.xlsx" --db workspace.db
    
    Note: This script calculates the MD5 hash of the file. If an identical file was already imported, it skips the import and returns the existing table name. It also automatically handles merged cells, detects the real header row, chunks large CSVs, and sanitizes column names for SQLite.
  2. Once imported, run a quick check to understand the schema and data:
    sqlite3 workspace.db "PRAGMA table_info(table_name);"
    sqlite3 workspace.db "SELECT * FROM table_name LIMIT 3;" -header -column
    
  3. Ask the user if they want to perform standard cleaning (e.g., handling missing values, deduplication). Execute these via SQL.
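
As a rough illustration of step 3, standard cleaning can be expressed as plain SQL (a sketch, assuming a hypothetical table named sales with a nullable region column):

    -- Deduplicate into a new table (non-destructive, per the Undo principle)
    CREATE TABLE sales_dedup AS
    SELECT DISTINCT * FROM sales;

    -- Fill missing values with an explicit placeholder
    UPDATE sales_dedup SET region = 'Unknown' WHERE region IS NULL;

    -- Report only the aggregate back into the context window
    SELECT COUNT(*) AS remaining_rows FROM sales_dedup;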

Scenario 2: Continuous Queries & Manipulation

Trigger: User asks to filter, sort, aggregate, or add columns. Action:

  1. Formulate the SQL query.
  2. Execute it via RunCommand: sqlite3 workspace.db "SELECT ..."
  3. For structural changes, remember the Undo principle: CREATE TABLE table_name_step2 AS SELECT ...
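
A concrete example of steps 2 and 3, assuming a hypothetical sales table:

    # Read-only query: safe to execute directly
    sqlite3 workspace.db "SELECT region, SUM(amount) AS total FROM sales WHERE year = 2024 GROUP BY region ORDER BY total DESC;" -header -column

    # Structural change: materialize a new table instead of overwriting
    sqlite3 workspace.db "CREATE TABLE sales_step2 AS SELECT *, amount * 0.1 AS tax FROM sales;"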

Scenario 3: Semantic Extraction & Fuzzy Join

Trigger: User wants to split addresses, do sentiment analysis, or join tables with mismatched keys (e.g., "Beijing Branch" vs "Beijing Office"). Action:

  1. Generate a Python script using pandas and sqlite3.
  2. For Fuzzy Joins, use libraries like thefuzz or difflib in the Python script to match keys, then write the mapping back to SQLite (see the sketch after this list).
  3. For Semantic extraction, use regex or heuristic rules in Python. If LLM analysis is strictly required, write a script that processes the column locally or prompts the user for permission to send a sample.
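
A minimal fuzzy-join sketch for step 2 (table and column names are illustrative assumptions; requires pandas and thefuzz):

    import sqlite3

    import pandas as pd
    from thefuzz import process

    con = sqlite3.connect("workspace.db")
    left = pd.read_sql_query("SELECT name FROM branches", con)
    choices = pd.read_sql_query("SELECT name FROM offices", con)["name"].tolist()

    def best_match(key):
        # extractOne returns (best_choice, score); keep only confident matches
        match, score = process.extractOne(key, choices)
        return match if score >= 85 else None

    left["matched_name"] = left["name"].apply(best_match)

    # Persist the key mapping so subsequent joins can stay in plain SQL
    left.to_sql("branch_office_map", con, if_exists="replace", index=False)
    con.close()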

Scenario 4: Chart Generation

Trigger: User requests a visualization (bar, pie, line, scatter, map, funnel, 3D charts, etc.). Action:

  1. Do NOT write custom Python scripts from scratch.
  2. Use the built-in template-based rendering engine: scripts/chart_generator.py.
  3. First, identify the required chart type. Look into the references/prompts/ directory to find the corresponding Prompt skeleton for the exact chart type (e.g., references/prompts/line/stacked_area.md). Read the prompt to understand the data structure requirements.
  4. Formulate the SQL query that aggregates the data correctly according to the prompt's requirements.
  5. Generate the custom_js and echarts_option based on the prompt template.
  6. Construct a JSON configuration file (save it in outputs/configs/) matching this structure:
    {
        "db_path": "workspace.db",
        "query": "SELECT category, SUM(value) as val FROM table GROUP BY category",
        "title": "Chart Title",
        "output_path": "/Users/wuliang/workspace/data-skill/outputs/html/output_chart.html",
        "echarts_option": { ... }, // Generated option from prompt
        "custom_js": "..." // Optional JS logic for complex data binding
    }
    
    Note: For map charts requiring coordinates, use the built-in Geocoding capabilities or ECharts native geo coordinate systems. Output files MUST be stored in the isolated outputs/html/ directory.
  7. Execute the command:
    python scripts/chart_generator.py --config outputs/configs/your_config.json
    
  8. The script will automatically start a local HTTP server and return an access URL. Provide this URL to the user to view the interactive chart.
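
The echarts_option field uses standard Apache ECharts option syntax. How query rows are bound into the option depends on the chosen prompt template and custom_js, so treat this minimal bar-chart option purely as an illustrative assumption:

    {
        "xAxis": { "type": "category" },
        "yAxis": { "type": "value" },
        "series": [ { "type": "bar" } ]
    }

Here custom_js would typically map the query's category column into xAxis.data and its aggregated values into series[0].data.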

Scenario 5: File Merging & Splitting

Trigger: User needs to combine multiple identically structured reports or split a master sheet by department. Action:

  • Merge: Iterate over the files and run data_importer.py pointing at the same table name; the script appends automatically if the table exists. Alternatively, write a custom Python script.
  • Split: Generate a Python script that reads the master table from SQLite and exports it into multiple Excel files using pandas.DataFrame.to_excel() inside a loop.
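
A minimal sketch of the split case (table and column names are illustrative assumptions; requires pandas plus an Excel writer such as openpyxl):

    import sqlite3

    import pandas as pd

    con = sqlite3.connect("workspace.db")
    df = pd.read_sql_query("SELECT * FROM master", con)
    con.close()

    # One Excel file per department, written outside the database
    for dept, group in df.groupby("department"):
        group.to_excel(f"outputs/{dept}.xlsx", index=False)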

Scenario 6: Export & Reporting

Trigger: User wants to download the final result or generate a summary report. Action:

  1. Export CSV/Excel: Use the built-in exporter script to dump a table or query result to .csv or .xlsx:
    # Export an entire table
    python scripts/data_exporter.py "outputs/final_result.csv" --table "final_table"
    
    # Export a specific query
    python scripts/data_exporter.py "outputs/final_result.xlsx" --query "SELECT category, SUM(value) FROM sales GROUP BY category"
    
  2. Report Generation: Write a Markdown file summarizing the analysis steps, key metrics (retrieved via SQL), and referencing any generated charts. Provide the user with the path to the report.
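
A minimal report-generation sketch (the metric query and file names are illustrative assumptions):

    import sqlite3

    con = sqlite3.connect("workspace.db")
    total, = con.execute("SELECT SUM(amount) FROM sales").fetchone()
    con.close()

    lines = [
        "# Sales Analysis Report",
        "",
        "## Key Metrics",
        f"- Total amount: {total}",
        "",
        "## Charts",
        "- Interactive chart: outputs/html/output_chart.html",
    ]
    with open("outputs/report.md", "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
    print("Report written to outputs/report.md")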

Scenario 7: Data Cleanup

Trigger: Routine maintenance or user request to clean up old data. Action:

  1. Run the cleaner script to remove tables and metadata not accessed in the last 30 days:
    python scripts/data_cleaner.py --db workspace.db --days 30
    

Scenario 8: Metrics Management

Trigger: User describes or defines a specific metric's calculation logic or business definition. Action:

  1. When the user provides a metric definition, save it to the local markdown file references/metrics.md to build up context for future SQL generation.
  2. Use the built-in script scripts/metrics_manager.py to append the metric:
    python scripts/metrics_manager.py --name "Metric Name" --desc "Metric calculation logic or business description"
    
  3. When generating SQL queries later, ALWAYS read references/metrics.md to ensure the generated SQL aligns with the saved business definitions.
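
The on-disk format of references/metrics.md is determined by metrics_manager.py; as an assumption only, an appended entry might look like:

    ## Repeat Purchase Rate
    Customers with two or more orders in the period, divided by customers with at least one order.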
