managing-databases

Decision guidance for PostgreSQL, DuckDB, Parquet, and Neo4j in hybrid storage architectures.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "managing-databases" with this command: npx skills add rileyhilliard/claude-essentials/rileyhilliard-claude-essentials-managing-databases

Database Management

Decision guidance for PostgreSQL, DuckDB, Parquet, and Neo4j in hybrid storage architectures.

Contents

  • When to use which database

  • PostgreSQL quick reference

  • DuckDB quick reference

  • Parquet quick reference

  • PGVector quick reference

  • Neo4j quick reference

  • Cross-database conventions

  • Performance debugging checklist

When to use which database

Workload Use Why

Transactional (CRUD, users, sessions) PostgreSQL ACID, row-level locking, indexes

Analytical (aggregations, scans) DuckDB Columnar, vectorized, parallel

Data storage/interchange Parquet Compressed, columnar, portable

Metadata + relationships PostgreSQL Foreign keys, constraints

Ad-hoc exploration DuckDB Fast on Parquet, no ETL needed

Time-series with point lookups PostgreSQL + partitioning Partition pruning + indexes

Time-series analytics DuckDB on Parquet Scan performance

Vector similarity search PostgreSQL + PGVector HNSW/IVFFlat indexes, hybrid search

RAG / semantic search PostgreSQL + PGVector Embeddings + metadata in same DB

Graph traversals / relationships Neo4j Native graph, index-free adjacency

Pattern matching / fraud detection Neo4j Multi-hop traversal, path finding

Knowledge graphs / ontologies Neo4j Flexible schema, relationship-first

Hybrid pattern example:

  • PostgreSQL: transactional data, relationships, users (metadata)

  • DuckDB + Parquet: analytical content, aggregations, time-series

PostgreSQL quick reference

Use for: Metadata, relationships, OLTP workloads, anything needing ACID.

Key decisions:

  • Partition tables >100M rows or with retention requirements

  • Index columns in WHERE/JOIN clauses, not everything

  • Tune autovacuum for high-churn tables

See references/postgres-architecture.md for maintenance patterns. See references/postgres-querying.md for advanced query techniques.

DuckDB quick reference

Use for: Analytics, aggregations, Parquet queries, data exploration.

Key decisions:

  • Prefer Parquet files over CSV (10-100x faster)

  • Let DuckDB auto-parallelize; don't micro-optimize

  • For remote data, increase threads beyond CPU count

See references/duckdb-architecture.md for storage and parallelism. See references/duckdb-querying.md for DuckDB-specific SQL features.

Parquet quick reference

Use for: Storing analytical data, data interchange, columnar compression.

Key decisions:

  • Target 128MB-1GB file sizes

  • Partition by low-to-moderate cardinality columns (date, region)

  • Sort by columns used in filters for better pruning

See references/parquet-architecture.md for file design. See references/parquet-querying.md for query optimization.

PGVector quick reference

Use for: Similarity search, RAG applications, semantic search, recommendations.

Key decisions:

  • HNSW for low-latency, high-recall (default choice)

  • IVFFlat for memory-constrained or batch-updated data

  • Use iterative scan for filtered queries

  • Consider hybrid search (vector + keyword) for 8-15% accuracy boost

See references/pgvector-architecture.md for index configuration. See references/pgvector-querying.md for hybrid search and filtering.

Neo4j quick reference

Use for: Graph traversals, relationship-heavy queries, pattern matching, knowledge graphs.

Key decisions:

  • Model around your queries, not your source data

  • Promote properties to nodes when you need to traverse through shared values

  • Use specific relationship types to avoid supernode bottlenecks

  • Bound all variable-length paths ([1..5] , never [] )

  • Use parameters in Cypher for execution plan caching

See references/neo4j-architecture.md for data modeling, indexing, and maintenance. See references/neo4j-querying.md for Cypher optimization and anti-patterns.

Cross-database conventions

Naming

Convention Example Applies to

snake_case tables dataset_jobs

All

snake_case columns created_at

PG, DuckDB, Parquet

camelCase properties createdAt

Neo4j

PascalCase labels :UserAccount

Neo4j

Singular table names dataset not datasets

PostgreSQL

Plural for collections datasets/ directory Parquet files

Normalization decisions

Pattern When to normalize When to denormalize

Lookup tables PostgreSQL, changes frequently DuckDB/Parquet, static data

Repeated values PostgreSQL, storage matters Parquet, compression handles it

Joins at query time PostgreSQL, complex relationships Parquet, pre-join for analytics

Timestamps

  • Store as UTC always

  • PostgreSQL: TIMESTAMPTZ

  • Parquet: TIMESTAMP with isAdjustedToUTC=true

  • DuckDB: reads both correctly

Performance debugging checklist

PostgreSQL slow query

  • Run EXPLAIN (ANALYZE, BUFFERS) on the query

  • Check for sequential scans on large tables

  • Verify indexes exist on filter/join columns

  • Check pg_stat_user_tables for bloat (dead tuples)

  • Review work_mem if seeing disk sorts

DuckDB slow query

  • Check if reading CSV instead of Parquet

  • Verify not doing SELECT * on remote data

  • Check thread count matches workload

  • Look for unnecessary type conversions

Parquet slow reads

  • Verify predicate pushdown is working (check query plan)

  • Check file sizes (too small = overhead, too large = no parallelism)

  • Confirm data is sorted by filter columns

  • Look for high-cardinality partition keys (too many small files)

PGVector slow search

  • Verify index exists and is being used (EXPLAIN)

  • Check ef_search (HNSW) or probes (IVFFlat) settings

  • Enable iterative scan for filtered queries

  • Check if IVFFlat recall degraded (rebuild index if heavily updated)

  • Consider partial indexes for common filters

Neo4j slow query

  • Run PROFILE on the query, read operators bottom-up

  • Look for AllNodesScan or NodeByLabelScan (missing index)

  • Check for CartesianProduct (disconnected MATCH patterns)

  • Verify parameters are used instead of literals (plan caching)

  • Check for unbounded variable-length paths

  • Monitor page_cache.hit_ratio (below 98% = need more page cache memory)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

writer

No summary provided by upstream source.

Repository SourceNeeds Review
General

strategy-writer

No summary provided by upstream source.

Repository SourceNeeds Review
General

reading-logs

No summary provided by upstream source.

Repository SourceNeeds Review