SQLite for Structured Data
Decision Check
Before writing any data analysis code, evaluate:
- Will the data be queried more than once? -> Use SQLite
- Are GROUP BY, COUNT, AVG, or JOIN operations needed? -> Use SQLite
- Is custom Python/jq parsing code about to be written? -> Use SQLite instead
- Is the dataset >100 records? -> Use SQLite
If the answer to any question above is YES, use SQLite. Do not write custom parsing code.
Custom code for every query:
cat data.json | jq '.[] | select(.status=="failed")' | jq -r '.error_type' | sort | uniq -c
SQL does the work:
sqlite3 data.db "SELECT error_type, COUNT(*) FROM errors WHERE status='failed' GROUP BY error_type"
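The SQL one-liner assumes the JSON was already loaded into `data.db` once. A minimal, self-contained sketch of that load step, using sqlite3's `readfile()` and `json_each()` (the file path and sample records here are illustrative):

```shell
# One-time load: JSON array -> SQLite table, then SQL does the aggregation.
cat > /tmp/demo_errors.json <<'EOF'
[{"status":"failed","error_type":"timeout"},
 {"status":"failed","error_type":"timeout"},
 {"status":"passed","error_type":null}]
EOF

sqlite3 /tmp/demo_errors.db <<'EOF'
DROP TABLE IF EXISTS errors;
CREATE TABLE errors (status TEXT, error_type TEXT);
-- json_each() iterates the array; json_extract() pulls fields per element
INSERT INTO errors (status, error_type)
SELECT json_extract(value, '$.status'),
       json_extract(value, '$.error_type')
FROM json_each(readfile('/tmp/demo_errors.json'));
EOF

sqlite3 /tmp/demo_errors.db \
  "SELECT error_type, COUNT(*) FROM errors WHERE status='failed' GROUP BY error_type"
```

After this one load, every follow-up question is a single query against the same file.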
Core Principle
SQLite is just a file -- no server, no setup, zero dependencies. Apply it when custom parsing code would otherwise be written or data would be re-processed for each query.
When to Use SQLite
Apply when ANY of these conditions hold:
- >100 records -- JSON/grep becomes unwieldy
- Multiple aggregations -- GROUP BY, COUNT, AVG needed
- Multiple queries -- Follow-up questions about the same data are expected
- Correlation needed -- Joining data from multiple sources
- State tracking -- Queryable progress/status over time is needed
When NOT to Use SQLite
Skip SQLite when ALL of these are true:
- <50 records total
- Single simple query
- No aggregations needed
- No follow-up questions expected
For tiny datasets with simple access, JSON/grep is fine.
Red Flags -- Use SQLite Instead
STOP and use SQLite when about to:
- Write Python/Node code to parse JSON/CSV for analysis
- Run the same jq/grep command with slight variations
- Write custom aggregation logic (COUNT, AVG, GROUP BY in code)
- Manually correlate data by timestamps or IDs
- Create temp files to store intermediate results
- Process the same data multiple times for different questions
All of these mean: Load into SQLite once, query with SQL.
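For the correlation case in particular, a JOIN replaces manual matching by ID or timestamp. A minimal sketch with two illustrative tables (the schemas and sample rows are assumptions, not from a real project):

```shell
# Correlate two data sources by a shared id with a JOIN instead of manual matching.
sqlite3 /tmp/demo_corr.db <<'EOF'
DROP TABLE IF EXISTS requests;
DROP TABLE IF EXISTS responses;
CREATE TABLE requests (id INTEGER, path TEXT);
CREATE TABLE responses (id INTEGER, code INTEGER);
INSERT INTO requests VALUES (1, '/a'), (2, '/b');
INSERT INTO responses VALUES (1, 200), (2, 500);
-- Which request paths produced server errors?
SELECT r.path, s.code
FROM requests r
JOIN responses s ON r.id = s.id
WHERE s.code >= 500;
EOF
```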
The Threshold
| Scenario | Tool | Why |
| --- | --- | --- |
| 50 test results, one-time summary | Python/jq | Fast, appropriate |
| 200+ test results, find flaky tests | SQLite | GROUP BY simpler than code |
| 3 log files, correlate by time | SQLite | JOIN simpler than manual grep |
| Track 1000+ file processing state | SQLite | Queries beat JSON parsing |
Rule of thumb: If parsing code is being written or data is being re-processed, use SQLite instead.
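The flaky-test row above is a good illustration of "GROUP BY simpler than code": one query finds every test that has both passing and failing runs. A sketch with an assumed schema and toy data:

```shell
# "Flaky" here means: the same test name appears with more than one distinct status.
sqlite3 /tmp/demo_results.db <<'EOF'
DROP TABLE IF EXISTS results;
CREATE TABLE results (name TEXT, status TEXT);
INSERT INTO results VALUES
  ('test_a', 'passed'), ('test_a', 'failed'),
  ('test_b', 'passed'), ('test_b', 'passed');
-- Flaky tests: grouped by name, more than one distinct status
SELECT name
FROM results
GROUP BY name
HAVING COUNT(DISTINCT status) > 1;
EOF
```

In Python this would be a dict-of-sets accumulation loop; in SQL it is three lines.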
Available Tools
sqlite3 (always available)
sqlite3 ~/.claude-logs/project.db
sqlite-utils (optional, simplifies import)
Check availability before use:
command -v sqlite-utils >/dev/null 2>&1 && echo "available" || echo "not installed"
If sqlite-utils is available:
sqlite-utils insert data.db table_name data.json --pk=id
sqlite-utils query data.db "SELECT * FROM table_name"
If sqlite-utils is NOT available, fall back to sqlite3 with manual import:
sqlite3 data.db <<'EOF'
CREATE TABLE IF NOT EXISTS results (status TEXT, error_message TEXT);
INSERT INTO results (status, error_message)
SELECT json_extract(value, '$.status'),
       json_extract(value, '$.error_message')
FROM json_each(readfile('data.json'));
EOF
Alternatively, install sqlite-utils: uv tool install sqlite-utils
Quick Start
Store Location
~/.claude-logs/<project-name>.db # Persists across sessions
Basic Workflow
1. Connect
mkdir -p ~/.claude-logs
PROJECT=$(basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
sqlite3 ~/.claude-logs/$PROJECT.db
2. Create table (first time)
CREATE TABLE results (
  id INTEGER PRIMARY KEY,
  name TEXT,
  status TEXT,
  duration_ms INTEGER
);
3. Load data
INSERT INTO results (name, status, duration_ms)
SELECT json_extract(value, '$.name'),
       json_extract(value, '$.status'),
       json_extract(value, '$.duration_ms')
FROM json_each(readfile('data.json'));
4. Query (SQL does the work)
SELECT status, COUNT(*), AVG(duration_ms) FROM results GROUP BY status;
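The four steps above can be run end to end as one script. This sketch uses a temporary path and made-up records so it is self-contained; in practice the database would live under `~/.claude-logs/` as described in step 1:

```shell
# Steps 1-4 combined: connect, create, load, query (paths and data illustrative).
cat > /tmp/demo_run.json <<'EOF'
[{"name":"build","status":"passed","duration_ms":1200},
 {"name":"lint","status":"failed","duration_ms":300}]
EOF

sqlite3 /tmp/demo_workflow.db <<'EOF'
DROP TABLE IF EXISTS results;
CREATE TABLE results (
  id INTEGER PRIMARY KEY,
  name TEXT,
  status TEXT,
  duration_ms INTEGER
);
INSERT INTO results (name, status, duration_ms)
SELECT json_extract(value, '$.name'),
       json_extract(value, '$.status'),
       json_extract(value, '$.duration_ms')
FROM json_each(readfile('/tmp/demo_run.json'));
SELECT status, COUNT(*), AVG(duration_ms) FROM results GROUP BY status;
EOF
```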
Key Mindset Shift
From: "Process data once and done"
To: "Make data queryable"
Benefits:
- Follow-up questions require no re-processing -- data is already loaded
- Different analyses run instantly against the same dataset
- State persists across sessions
- SQL handles complexity -- less custom code needed
Additional Resources
- ./references/patterns.md -- Python vs SQL side-by-side comparison and real-world examples (test analysis, error correlation, file processing state)