Data Toolkit
Complete data processing utilities for OpenClaw agents.
Features
Converters
- JSON ↔ CSV - Bidirectional conversion with schema inference (sketched after this list)
- JSON ↔ YAML - Clean formatting; YAML comments preserved where the output format can carry them
- JSON ↔ XML - Configurable root elements and attributes
- CSV ↔ YAML - Direct conversion without intermediate steps
- Multi-format batch conversion - Process entire directories
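To make schema inference concrete, here is a minimal sketch of the JSON-to-CSV direction (illustrative only, not the toolkit's actual implementation). It derives the CSV header from the union of keys across a list of flat JSON objects, in first-seen order:

import csv
import json

def json_to_csv_sketch(json_path, csv_path):
    # Load a list of flat JSON objects.
    with open(json_path) as f:
        records = json.load(f)
    # Infer the schema: header = union of keys, in first-seen order.
    header = []
    for record in records:
        for key in record:
            if key not in header:
                header.append(key)
    # Write rows, leaving cells blank for missing keys.
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header, restval="")
        writer.writeheader()
        writer.writerows(records)

Nested objects would need flattening first; the sketch assumes flat records.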
Validators
- JSON Schema validation - Validate against JSON Schema specs (see the sketch after this list)
- CSV structure validation - Check headers, columns, data types
- Data type inference - Automatic type detection and validation
- Custom rules - Define business logic validations
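JSON Schema validation typically boils down to a check like the following; this sketch assumes the third-party jsonschema package and says nothing about this toolkit's internals:

import json
from jsonschema import validate, ValidationError  # pip install jsonschema

def check_against_schema(data_path, schema_path):
    with open(data_path) as f:
        data = json.load(f)
    with open(schema_path) as f:
        schema = json.load(f)
    try:
        validate(instance=data, schema=schema)  # raises on the first violation
        return True
    except ValidationError as err:
        print(f"Invalid at {list(err.absolute_path)}: {err.message}")
        return False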
Cleaners
- Duplicate removal - Smart deduplication with configurable keys (sketched after this list)
- Null/empty handling - Remove or replace null values
- Data normalization - Standardize formats (dates, numbers, strings)
- Whitespace cleanup - Trim, collapse multiple spaces
- Column operations - Remove, rename, reorder columns
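Key-based deduplication (the --dedupe --key pair under Usage below) reduces to tracking which key values have already been seen. A minimal sketch over a list of dicts, with the first occurrence winning:

def remove_duplicates_sketch(records, key="id"):
    # Keep the first record seen for each key value; order is preserved.
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique

rows = [{"id": 1, "name": "a"}, {"id": 1, "name": "b"}, {"id": 2, "name": "c"}]
print(remove_duplicates_sketch(rows))  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'c'}]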
Usage
Convert Data
# JSON to CSV
./src/convert.py --input data.json --output data.csv --format csv
# CSV to JSON
./src/convert.py --input data.csv --output data.json --format json
# JSON to YAML
./src/convert.py --input data.json --output data.yaml --format yaml
# XML to JSON
./src/convert.py --input data.xml --output data.json --format json
# Batch conversion
./src/convert.py --input-dir ./raw --output-dir ./processed --format json
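Batch mode is also easy to drive from the Python API documented below; assuming the json_to_csv signature shown there, a directory sweep looks like this:

from pathlib import Path
from data_toolkit import convert

out_dir = Path("./processed")
out_dir.mkdir(exist_ok=True)
for src in Path("./raw").glob("*.json"):
    # Mirror each raw JSON file as a CSV in the output directory.
    convert.json_to_csv(str(src), str(out_dir / f"{src.stem}.csv"))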
Validate Data
# Validate against JSON schema
./src/validate.py --input data.json --schema schema.json
# Validate CSV structure
./src/validate.py --input data.csv --check-headers --check-types
# Custom validation rules
./src/validate.py --input data.json --rules validation-rules.yaml
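The layout of validation-rules.yaml is not specified here, but conceptually each rule pairs a field with a constraint. A purely hypothetical illustration in plain Python (the rules dict below is invented for this example, not the toolkit's format):

# Hypothetical rules: field name -> predicate. Not the toolkit's actual format.
rules = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def apply_rules(record, rules):
    # Return the names of fields that violate their rule.
    return [field for field, ok in rules.items() if not ok(record.get(field))]

print(apply_rules({"age": 200, "email": "x@example.com"}, rules))  # ['age']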
Clean Data
# Remove duplicates
./src/clean.py --input data.json --dedupe --key id
# Handle nulls
./src/clean.py --input data.csv --remove-nulls
./src/clean.py --input data.csv --replace-nulls "N/A"
# Normalize data
./src/clean.py --input data.json --normalize dates,numbers,strings
# Full cleanup pipeline
./src/clean.py --input messy.csv --dedupe --remove-nulls --normalize all --output clean.csv
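Among the normalizers, date handling is the least obvious. A common approach (sketched here with the standard library, not taken from the toolkit's code) is to try a list of known input formats and emit ISO 8601:

from datetime import datetime

INPUT_FORMATS = ["%m/%d/%Y", "%d-%m-%Y", "%Y-%m-%d", "%b %d, %Y"]

def normalize_date(value):
    # Try each candidate format; return an ISO 8601 date on the first match.
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value  # leave unparseable values untouched

print(normalize_date("03/14/2024"))   # 2024-03-14
print(normalize_date("Mar 14, 2024")) # 2024-03-14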
API Usage (Python)
from data_toolkit import convert, validate, clean
# Convert
convert.json_to_csv('input.json', 'output.csv')
convert.csv_to_yaml('input.csv', 'output.yaml')
# Validate
is_valid = validate.json_schema('data.json', 'schema.json')
errors = validate.csv_structure('data.csv')
# Clean
clean.remove_duplicates('data.json', key='id')
clean.normalize_dates('data.csv', format='ISO8601')
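The calls above chain naturally into a validate-then-clean-then-convert pipeline. This sketch assumes the exact signatures shown in this section, and that remove_duplicates rewrites the file in place as the CLI behavior suggests:

from data_toolkit import convert, validate, clean

# Validate first so bad input fails fast.
if validate.json_schema('data.json', 'schema.json'):
    clean.remove_duplicates('data.json', key='id')
    convert.json_to_csv('data.json', 'output.csv')
else:
    print('data.json does not match schema.json; skipping conversion')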
Examples
See examples/ directory for complete workflows:
- examples/etl-pipeline.sh - Full ETL workflow
- examples/api-data-processing.py - API response processing
- examples/batch-conversion.sh - Bulk file conversion
Installation
Dependencies are minimal and common:
- Python 3.8+
- PyYAML
- pandas (optional, for advanced CSV operations)
pip install pyyaml pandas
Requirements
- Python 3.8+ (JSON parsing uses the standard library; YAML uses PyYAML, so no Node.js runtime is needed)
- ~10MB disk space
License
MIT
Support
Issues: https://github.com/forge-agent/data-toolkit
Docs: See docs/ directory