Use this skill for work involving DataFusion, Arrow schemas, SQL planning/execution, and table providers.

Implementation guidance:
- Use existing schema registry helpers for Arrow schema construction; memoize schemas where supported.
- Implement or extend TableProvider with correct schema, statistics, and scan behavior.
- Keep SQL handling in kalamdb-core/sql/executor and route through handler modules.
- Use DataFusion’s logical plan for validation; avoid manual SQL parsing unless required.
- Keep table/provider creation cheap; cache shared providers if appropriate.
- Ensure column types map correctly to Arrow types and are consistent across writes and reads.
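
The schema and type-mapping points above can be illustrated with a minimal, std-only Rust sketch. It assumes a registry keyed by table name. `KalamType`, `ArrowType`, `Field`, `SchemaRef` and `SchemaRegistry` are hypothetical stand-ins, not kalamdb's helpers or arrow-rs types; the `ArrowType` variant names only mirror Arrow's `DataType`. The sketch shows two things: one function owns the type mapping, and each schema is built once and then shared as an `Arc`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical column types as a storage layer might declare them.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum KalamType {
    Int64,
    Text,
    Bool,
    TimestampMicros,
}

/// Simplified stand-in for Arrow's `DataType` (variant names mirror arrow-rs only).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ArrowType {
    Int64,
    Utf8,
    Boolean,
    TimestampMicrosecond,
}

/// The single place that owns the mapping, so writers and readers agree.
pub fn to_arrow(t: KalamType) -> ArrowType {
    match t {
        KalamType::Int64 => ArrowType::Int64,
        KalamType::Text => ArrowType::Utf8,
        KalamType::Bool => ArrowType::Boolean,
        KalamType::TimestampMicros => ArrowType::TimestampMicrosecond,
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Field {
    pub name: String,
    pub data_type: ArrowType,
    pub nullable: bool,
}

/// Shared, immutable schema handle (hypothetical alias).
pub type SchemaRef = Arc<Vec<Field>>;

/// Hypothetical registry that memoizes one schema per table.
#[derive(Default)]
pub struct SchemaRegistry {
    cache: Mutex<HashMap<String, SchemaRef>>,
}

impl SchemaRegistry {
    /// Returns the cached schema for `table`, building it on first use.
    /// Later calls return the same `Arc` and ignore `columns`.
    pub fn schema_for(&self, table: &str, columns: &[(&str, KalamType, bool)]) -> SchemaRef {
        let mut cache = self.cache.lock().unwrap();
        let schema = cache.entry(table.to_string()).or_insert_with(|| {
            let fields: Vec<Field> = columns
                .iter()
                .map(|(name, ty, nullable)| Field {
                    name: name.to_string(),
                    data_type: to_arrow(*ty),
                    nullable: *nullable,
                })
                .collect();
            Arc::new(fields)
        });
        Arc::clone(schema)
    }
}
```

The cache is keyed only by table name, so a real registry would also need to invalidate an entry when that table's schema changes (for example after an ALTER TABLE).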

Best practices:
- Respect DataFusion’s async execution model; avoid blocking IO in scan/exec paths.
- Prefer predicate pushdown where the provider supports it.
- Align system tables with kalamdb-commons models and constants.
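
The pushdown contract can be sketched with hypothetical std-only types. The sketch models the idea behind DataFusion's `TableProviderFilterPushDown` outcomes (`Unsupported`, `Inexact`, `Exact`), roughly what a provider's `supports_filters_pushdown` reports. Under that contract the provider says, per filter, whether its scan applies it, applies only those filters, and leaves the rest to the engine. `PushDown`, `Filter`, `Row` and `MessagesProvider` are illustrative assumptions. The scan here is synchronous, so the sketch does not speak to the async point.

```rust
/// Local stand-in for the three pushdown outcomes DataFusion distinguishes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PushDown {
    /// The provider ignores the filter; the engine must apply it.
    Unsupported,
    /// The provider may prune with it, but the engine re-checks rows.
    Inexact,
    /// The provider applies it fully; no re-check is needed.
    Exact,
}

/// Hypothetical, deliberately tiny filter language.
#[derive(Clone, Debug)]
pub enum Filter {
    /// `user_id = <value>`, which this provider can evaluate itself.
    UserIdEq(u64),
    /// Any other predicate, kept as opaque text.
    Other(String),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub user_id: u64,
    pub body: String,
}

/// Hypothetical in-memory provider used only to show the contract.
pub struct MessagesProvider {
    rows: Vec<Row>,
}

impl MessagesProvider {
    pub fn new(rows: Vec<Row>) -> Self {
        Self { rows }
    }

    /// Reports, per filter and in the same order, how the scan handles it.
    pub fn supports_filters(&self, filters: &[Filter]) -> Vec<PushDown> {
        filters
            .iter()
            .map(|f| match f {
                Filter::UserIdEq(_) => PushDown::Exact,
                Filter::Other(_) => PushDown::Unsupported,
            })
            .collect()
    }

    /// Applies only the filters reported as `Exact` and returns a lazy
    /// iterator instead of a collected Vec.
    pub fn scan<'a>(&'a self, filters: &'a [Filter]) -> impl Iterator<Item = &'a Row> + 'a {
        self.rows.iter().filter(move |row| {
            filters.iter().all(|f| match f {
                Filter::UserIdEq(id) => row.user_id == *id,
                Filter::Other(_) => true, // left for the engine to evaluate
            })
        })
    }
}
```

Reporting `Exact` only for predicates the scan fully evaluates keeps results correct. Anything marked `Unsupported` (or `Inexact`) is still checked by the engine after the scan.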

Pitfalls:
- Mismatched schema ordering or nullability between writer and provider.
- Unbounded in-memory collection during scans.
- Creating new providers per request when a shared instance is intended.
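
Two of these pitfalls lend themselves to small guards. The std-only sketch below uses hypothetical names (`Column`, `col`, `schema_mismatches`, `Batches`, `batches`). It does two things:

- It compares writer and provider schemas position by position, so a reordered column or a nullability change is flagged even when the same names appear on both sides.
- It wraps a row source so a scan yields bounded batches instead of collecting everything.

In DataFusion the batching role is played by a stream of `RecordBatch`es; this is only an illustration of the idea.

```rust
/// Hypothetical column description shared by writer and provider.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Column {
    pub name: String,
    pub ty: String,
    pub nullable: bool,
}

pub fn col(name: &str, ty: &str, nullable: bool) -> Column {
    Column { name: name.into(), ty: ty.into(), nullable }
}

/// Compares schemas position by position; an empty result means they match.
pub fn schema_mismatches(writer: &[Column], provider: &[Column]) -> Vec<String> {
    let mut problems = Vec::new();
    if writer.len() != provider.len() {
        problems.push(format!("column count {} vs {}", writer.len(), provider.len()));
    }
    for (i, (w, p)) in writer.iter().zip(provider).enumerate() {
        if w.name != p.name || w.ty != p.ty {
            problems.push(format!("#{i}: {} {} vs {} {}", w.name, w.ty, p.name, p.ty));
        } else if w.nullable != p.nullable {
            problems.push(format!("#{i}: {} nullability {} vs {}", w.name, w.nullable, p.nullable));
        }
    }
    problems
}

/// Pulls rows lazily and yields at most `batch_size` at a time, so memory
/// is bounded by one batch rather than the whole table.
pub struct Batches<I> {
    inner: I,
    batch_size: usize,
}

impl<I: Iterator> Iterator for Batches<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        let batch: Vec<I::Item> = self.inner.by_ref().take(self.batch_size).collect();
        if batch.is_empty() { None } else { Some(batch) }
    }
}

pub fn batches<I: IntoIterator>(rows: I, batch_size: usize) -> Batches<I::IntoIter> {
    assert!(batch_size > 0, "batch_size must be positive");
    Batches { inner: rows.into_iter(), batch_size }
}
```

Running a check like `schema_mismatches` when a provider is registered surfaces ordering and nullability drift before any rows are read. The batch size then caps how many rows a single scan step holds in memory.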