obstore: High-Performance Rust-Based Storage

obstore (released 2025) provides a minimal, stateless API built on Rust's object_store crate, offering superior performance for concurrent operations (up to 9x faster than Python-based alternatives).

Installation

pip install obstore

Or with conda

conda install -c conda-forge obstore

Core Concepts

obstore uses top-level functions (not methods) and a functional API. All operations are functions like obs.get(store, path) , not store.get(path) .

Creating Stores

import obstore as obs from obstore.store import S3Store, GCSStore, AzureStore, LocalStore

S3 Store

s3 = S3Store( bucket="my-bucket", region="us-east-1", access_key_id="AKIA...", secret_access_key="...", # Or use environment credentials )

GCS Store

gcs = GCSStore( bucket="my-bucket", # Uses GOOGLE_APPLICATION_CREDENTIALS by default )

Azure Store

azure = AzureStore( container="my-container", account_name="myaccount", account_key="...", # Or use DefaultAzureCredential )

Local filesystem

local = LocalStore("/path/to/root")

From environment (picks up standard env vars)

s3 = S3Store.from_env(bucket="my-bucket") gcs = GCSStore.from_env(bucket="my-bucket")

Basic Operations

import obstore as obs

store = S3Store(bucket="my-bucket", region="us-east-1")

Put object (bytes)

obs.put(store, "hello.txt", b"Hello, World!")

Put from file

with open("local-file.csv", "rb") as f: obs.put(store, "data/file.csv", f)

Get object

response = obs.get(store, "hello.txt") print(response.bytes()) # b"Hello, World!" print(response.meta) # Object metadata (size, mtime, etag, etc.)

Get range (efficient partial reads)

partial = obs.get_range(store, "large-file.bin", offset=0, length=1024)

Stream download

stream = obs.get(store, "large-file.bin") for chunk in stream.stream(min_chunk_size=8 * 1024 * 1024): process(chunk)

List objects (streaming, no pagination needed!)

for obj in obs.list(store, prefix="data/2024/"): print(f"{obj['path']}: {obj['size']} bytes")

List with delimiter (like directory listing)

result = obs.list_with_delimiter(store, prefix="data/") print(result["common_prefixes"]) # "directories" print(result["objects"]) # files

Delete

obs.delete(store, "old-file.txt")

Copy within same store

obs.copy(store, "src/file.txt", "dst/file.txt")

Rename/move

obs.rename(store, "old-name.txt", "new-name.txt")

Check existence (via head)

try: meta = obs.head(store, "file.txt") print(f"Exists: {meta['size']} bytes") except obs.NotFoundError: print("File not found")

Async API

import asyncio import obstore as obs from obstore.store import S3Store

async def main(): store = S3Store(bucket="my-bucket", region="us-east-1")

# Concurrent uploads
await asyncio.gather(
    obs.put_async(store, "file1.txt", b"content1"),
    obs.put_async(store, "file2.txt", b"content2"),
    obs.put_async(store, "file3.txt", b"content3"),
)

# Concurrent downloads
responses = await asyncio.gather(
    obs.get_async(store, "file1.txt"),
    obs.get_async(store, "file2.txt"),
    obs.get_async(store, "file3.txt"),
)

for resp in responses:
    print(await resp.bytes_async())

asyncio.run(main())

Streaming Uploads

import asyncio import obstore as obs from obstore.store import S3Store

store = S3Store(bucket="my-bucket")

Upload from generator (streaming, memory-efficient)

def data_generator(): for i in range(1000): yield f"Row {i}\n".encode()

obs.put(store, "output.txt", data_generator())

Upload from async iterator

async def async_data(): for i in range(1000): await asyncio.sleep(0) yield f"Row {i}\n".encode()

async def upload_async(): await obs.put_async(store, "output-async.txt", async_data())

asyncio.run(upload_async())

Automatic multipart upload for large files

(triggered automatically based on size)

with open("huge-file.bin", "rb") as f: obs.put(store, "huge-file.bin", f) # Multi-part automatically

Arrow Integration

import obstore as obs from obstore.store import S3Store

store = S3Store(bucket="my-bucket")

Return list results as Arrow table (faster, more memory-efficient)

arrow_table = obs.list(store, prefix="data/", return_arrow=True) print(arrow_table.schema)

pyarrow.Schema

├── path: string

├── size: int64

├── last_modified: timestamp[ns]

└── etag: string

Process with PyArrow/Polars

import polars as pl df = pl.from_arrow(arrow_table)

fsspec Compatibility

obstore provides an fsspec-compatible wrapper:

from obstore.fsspec import FsspecStore, register import pyarrow.parquet as pq

Method 1: Register as default handler for protocols

Now fsspec uses obstore internally

import fsspec fs = fsspec.filesystem("s3", region="us-east-1")

Method 2: Use FsspecStore directly

fs = FsspecStore("s3", bucket="my-bucket", region="us-east-1")

or

fs = FsspecStore.from_store(s3_store_object)

Use with PyArrow

parquet_file = pq.ParquetFile( "s3://bucket/data/file.parquet", filesystem=fs )

When to Use obstore

Choose obstore when:

✅ Performance is paramount (many small files, high concurrency)
✅ You need async/await for concurrent operations
✅ Minimal dependencies are desired (Rust-based, no Python C extensions)
✅ Streaming uploads from generators/iterators
✅ Large-scale data ingestion/egestion

Performance Comparison

Operation fsspec pyarrow.fs obstore

Concurrent small files Moderate Moderate 9x faster

Async support Yes (aiohttp) Limited Native

Streaming uploads Yes Limited Yes (efficient)

Parquet pushdown Via PyArrow Excellent Via PyArrow

Maturity (2025) Very high High Rapidly growing

Authentication

See @data-engineering-storage-authentication for credential patterns. All S3Store , GCSStore , AzureStore constructors accept explicit credentials or use environment variables via from_env() .

References

obstore Documentation
PyPI: obstore
object_store (Rust)