polars

Polars Fast DataFrame Library

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "polars" with this command: npx skills add eyadsibai/ltk/eyadsibai-ltk-polars

Lightning-fast DataFrame library with lazy evaluation and parallel execution.

When to Use

  • Pandas is too slow for your dataset

  • Working with 1-100GB datasets that fit in RAM

  • Need lazy evaluation for query optimization

  • Building ETL pipelines

  • Want parallel execution without extra config

Lazy vs Eager Evaluation

| Mode | Function | Executes | Use Case |
|------|----------|----------|----------|
| Eager | read_csv() | Immediately | Small data, exploration |
| Lazy | scan_csv() | On .collect() | Large data, pipelines |

Key concept: Lazy mode builds a query plan that gets optimized before execution. The optimizer applies predicate pushdown (filter early) and projection pushdown (select columns early).

Core Operations

Data Selection

| Operation | Purpose |
|-----------|---------|
| select() | Choose columns |
| filter() | Choose rows by condition |
| with_columns() | Add/modify columns |
| drop() | Remove columns |
| head(n) / tail(n) | First/last n rows |

Aggregation

| Operation | Purpose |
|-----------|---------|
| group_by().agg() | Group and aggregate |
| pivot() | Reshape wide |
| unpivot() (melt() before Polars 1.0) | Reshape long |
| unique() | Distinct values |

Joins

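| Join Type | Description |
|-----------|-------------|
| inner | Matching rows only |
| left | All left + matching right |
| full (named outer in older releases) | All rows from both |
| cross | Cartesian product |
| semi | Left rows with match |
| anti | Left rows without match |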

Expression API

Key concept: Polars uses expressions (pl.col()) instead of indexing. Expressions are lazily evaluated and optimized.

Common Expressions

| Expression | Purpose |
|------------|---------|
| pl.col("name") | Reference column |
| pl.lit(value) | Literal value |
| pl.all() | All columns |
| pl.exclude(...) | All except |

Expression Methods

| Category | Methods |
|----------|---------|
| Aggregation | .sum(), .mean(), .min(), .max(), .count() |
| String | .str.contains(), .str.replace(), .str.to_lowercase() |
| DateTime | .dt.year(), .dt.month(), .dt.day() |
| Conditional | .when().then().otherwise() |
| Window | .over(), .rolling_mean(), .shift() |

Pandas Migration

| Pandas | Polars |
|--------|--------|
| df['col'] | df.select('col') |
| df[df['col'] > 5] | df.filter(pl.col('col') > 5) |
| df['new'] = df['col'] * 2 | df.with_columns((pl.col('col') * 2).alias('new')) |
| df.groupby('col').mean() | df.group_by('col').agg(pl.all().mean()) |
| df.apply(func) | df.map_rows(func) (avoid if possible) |

Key concept: Polars prefers explicit operations over implicit indexing. Use .alias() to name computed columns.

File I/O

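| Format | Read | Write | Notes |
|--------|------|-------|-------|
| CSV | read_csv() / scan_csv() | write_csv() | Human readable |
| Parquet | read_parquet() / scan_parquet() | write_parquet() | Fast, compressed |
| JSON | read_json() / read_ndjson() / scan_ndjson() | write_json() / write_ndjson() | The ndjson functions handle newline-delimited JSON |
| IPC/Arrow | read_ipc() / scan_ipc() | write_ipc() | Zero-copy |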

Key concept: Use Parquet for performance. Use scan_* for large files to enable lazy optimization.

Performance Tips

| Tip | Why |
|-----|-----|
| Use lazy mode | Query optimization |
| Use Parquet | Column-oriented, compressed |
| Select columns early | Projection pushdown |
| Filter early | Predicate pushdown |
| Avoid Python UDFs | Breaks parallelism |
| Use expressions | Vectorized operations |
| Set dtypes on read | Avoid inference overhead |

vs Alternatives

| Tool | Best For | Limitations |
|------|----------|-------------|
| Polars | 1-100GB, speed critical | Must fit in RAM |
| Pandas | Small data, ecosystem | Slow, memory hungry |
| Dask | Larger than RAM | More complex API |
| Spark | Cluster computing | Infrastructure overhead |
| DuckDB | SQL interface | Different API style |
