Rust Systems & Services
Covers modern application-layer Rust (edition 2024): CLIs, web services, libraries. Not no_std/embedded.
Tooling
| Tool | Purpose |
|---|---|
| cargo | Build, dep management, script runner |
| clippy | Lint (cargo clippy --workspace --all-targets -- -D warnings) |
| rustfmt | Formatter (cargo fmt --all) |
| cargo-nextest | Test runner, noticeably faster than cargo test, better isolation |
| cargo-deny | License + advisory + duplicate-dep checks |
| cargo-machete | Find unused dependencies |
- Pin the toolchain in rust-toolchain.toml per repo so every contributor and CI uses the same compiler.
- cargo update -p <crate> for single-package upgrades. A bare cargo update re-resolves every dependency in Cargo.lock; avoid it in PR diffs.
- Cargo.lock goes in version control for binaries and libraries (modern guidance; reproducibility wins).
Workspaces
Multi-crate projects use a workspace with layered crates. Dependencies point inward only.
Cargo.toml          # [workspace] members + [workspace.dependencies]
crates/
  protocol/         # Shared types, no deps on other workspace crates
  storage/          # Persistence, depends on protocol
  service/          # Business logic, depends on protocol + storage
  cli/              # Binary, depends on everything
- Centralize versions in [workspace.dependencies], reference as foo = { workspace = true } in members.
- Keep the leaf-most crate (protocol/ types) dependency-free so every other crate can depend on it without cycles.
- Feature flags belong on the crate that introduces the dependency, not re-exported through the workspace root.
- Library crates expose one stable facade: a thin lib.rs with a //! module doc comment stating purpose, followed by pub use re-exports of the public surface. Consumers learn one import path per concept; internal module layout can be reorganized without breaking callers.
- Feature gates must error, never silently degrade. If runtime config requests a capability the binary wasn't compiled with (e.g. device = "gpu" on a non-CUDA build), fail at startup with a clear error. Silent fallback produces different behavior from what the operator configured, often without anyone noticing.
- Centralize lints at the workspace root with [workspace.lints.*]. Every member crate inherits the same ruleset: no drift between crates, no per-crate #![deny(...)] stacks. Example:

      [workspace.lints.rust]
      unsafe_code = "warn"
      missing_docs = "warn"

      [workspace.lints.clippy]
      all = { level = "warn", priority = -1 }
      pedantic = { level = "warn", priority = -1 }
      nursery = { level = "warn", priority = -1 }
      module_name_repetitions = "allow"
      must_use_candidate = "allow"

  Each member crate opts in with [lints] workspace = true in its own Cargo.toml. Changing a lint in one place updates every crate.
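The fail-at-startup rule for feature gates can be sketched as below. This is a minimal illustration, not a prescribed implementation: the Device and ConfigError types, the resolve_device function, and the cuda feature name are all hypothetical. The compiled-in flag is passed as a parameter so the sketch builds without declaring a Cargo feature; in a real binary it would presumably be cfg!(feature = "cuda").

```rust
use std::fmt;

/// Compute device requested in runtime config (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Gpu,
}

/// Startup-time configuration errors.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    /// The config value is not a known device name.
    UnknownDevice(String),
    /// The device is known, but this binary was built without support for it.
    NotCompiledIn { device: &'static str, feature: &'static str },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UnknownDevice(name) => write!(f, "unknown device {name:?}"),
            Self::NotCompiledIn { device, feature } => write!(
                f,
                "device = {device:?} requested, but this binary was built without the `{feature}` feature"
            ),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Resolve the configured device, refusing to fall back silently.
///
/// `gpu_compiled` stands in for `cfg!(feature = "cuda")` (assumed feature name).
pub fn resolve_device(requested: &str, gpu_compiled: bool) -> Result<Device, ConfigError> {
    match requested {
        "cpu" => Ok(Device::Cpu),
        "gpu" if gpu_compiled => Ok(Device::Gpu),
        // Known device, missing capability: error out instead of using the CPU.
        "gpu" => Err(ConfigError::NotCompiledIn { device: "gpu", feature: "cuda" }),
        other => Err(ConfigError::UnknownDevice(other.to_owned())),
    }
}
```

Calling resolve_device once during startup, before any work is accepted, keeps the failure next to the configuration mistake that caused it.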
Build Profiles
When tuning Cargo build profiles (release LTO, release-dbg symbols, release-min for distributable binaries) or adding dev-machine speedups (mold linker, target-cpu=native, share-generics), load build-profiles.md.
Error Handling
Split by crate role:
- Libraries / lower crates: define typed errors with thiserror. Consumers can pattern-match.
- Binaries / top-level crates: use anyhow::Result with .context("what was being attempted"). Human-readable error chains.
- Never return Box<dyn Error> from library APIs; it erases variant information.
- Use ? liberally. Never .unwrap() or .expect() outside tests and main. An expect("...") is acceptable only when the invariant is provably upheld and the message explains why.
- Convert at boundaries: #[from] on thiserror variants for auto-conversion; .map_err(MyError::from) when explicit.
- bail!("...") / ensure!(cond, "...") in application code for early exits.
- Prefer Result<T, E> over panics for any recoverable error. Panics are for programmer bugs (broken invariants), not runtime failures.
- #[must_use] on fallible APIs: Result itself is already #[must_use], so add the attribute to newtype-wrapped results and other return types that callers frequently ignore. It flags a bare validate(x); statement at compile time (a warning, promoted to an error under -D warnings) instead of shipping a silently-dropped error. An explicit let _ = validate(x); still compiles, so treat it as a deliberate, reviewable discard.
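The library half of this split can be sketched without any dependencies. The example below is hand-written std code that approximates what a thiserror derive with #[from] would generate; the ConfigError type and parse_port function are hypothetical names chosen for illustration.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Typed error for a hypothetical config-parsing library crate.
#[derive(Debug)]
pub enum ConfigError {
    /// A required key was absent.
    MissingKey(&'static str),
    /// The port value was not a valid u16.
    InvalidPort(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingKey(key) => write!(f, "missing required key `{key}`"),
            Self::InvalidPort(_) => write!(f, "port is not a valid u16"),
        }
    }
}

impl std::error::Error for ConfigError {
    // Expose the underlying cause so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::MissingKey(_) => None,
            Self::InvalidPort(err) => Some(err),
        }
    }
}

// Roughly what `#[from]` on the variant would generate: lets `?` convert.
impl From<ParseIntError> for ConfigError {
    fn from(err: ParseIntError) -> Self {
        Self::InvalidPort(err)
    }
}

/// Parse `port=<u16>` from a single config line.
pub fn parse_port(line: &str) -> Result<u16, ConfigError> {
    let value = line
        .strip_prefix("port=")
        .ok_or(ConfigError::MissingKey("port"))?;
    // ParseIntError becomes ConfigError::InvalidPort through the From impl.
    Ok(value.trim().parse::<u16>()?)
}
```

A binary crate would typically wrap the call as parse_port(line).context("reading listener config")?, assuming anyhow is a dependency there, while tests and other library code can still match on the individual variants.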
Ownership Discipline
- Take &str over &String, &[T] over &Vec<T> in function signatures; this accepts more call sites for free.
- Return owned (String, Vec<T>) from constructors and public APIs. Borrow in hot paths where lifetimes are obvious.
- Reach for Arc<T> only when sharing across threads. Single-threaded sharing uses Rc<T> or references.
- Cow<'_, str> when a function sometimes allocates and sometimes borrows (e.g. normalization).
- Lifetime elision handles most cases. If you're writing 'a in more than one signature, reconsider whether that type should own its data instead.
- bytes::Bytes for zero-copy slicing of shared immutable buffers: network parsers, frame decoders, protocol handlers. BytesMut for building buffers: split_to / split_off carve off chunks and freeze() turns them into Bytes without copying. Prefer Bytes over Arc<Vec<u8>> when slicing is the dominant access pattern.
- Reduce hot-path heap allocations with stack-or-inline collections when the typical size is small and known:
  - smallvec::SmallVec<[T; N]>: inline for ≤N items, spills to heap beyond. Good for "usually 1-8 items" cases like parsed tag lists, lookup keys, small event batches.
  - arrayvec::ArrayVec<T, CAP>: fixed capacity, never heap-allocates. try_push returns an error when full (push panics). Good for bounded message buffers or per-request scratch space.
  - String interning for repeatedly-seen strings (enum-like values parsed from config, tenant IDs, route keys): dashmap::DashMap<String, &'static str> with Box::leak on miss gives &'static str comparisons without per-call allocations. Leaked strings are never freed, so use this only when the set of distinct values is bounded.

These are optimizations; profile first. Vec/String on a cold path isn't the bottleneck.
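The borrowed-parameter and Cow guidelines above might look like this in practice. The function names (word_count, total, normalize_header) are illustrative only and not taken from any particular codebase.

```rust
use std::borrow::Cow;

/// Accepts `&String`, `&str`, and string literals alike.
pub fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Accepts `&Vec<u32>`, arrays, and sub-slices alike.
pub fn total(values: &[u32]) -> u32 {
    values.iter().sum()
}

/// Lowercase a header name, allocating only when the input needs changing.
pub fn normalize_header(name: &str) -> Cow<'_, str> {
    if name.bytes().any(|b| b.is_ascii_uppercase()) {
        // At least one byte changes, so a new String is required.
        Cow::Owned(name.to_ascii_lowercase())
    } else {
        // Already normalized: hand the caller's slice straight back.
        Cow::Borrowed(name)
    }
}
```

Callers that only read the result can treat the Cow as a &str through deref; callers that need ownership call into_owned() and pay for the allocation only at that point.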
Async with Tokio
- Default runtime: #[tokio::main] with features = ["full"] for apps; features = ["rt", "macros", "sync"] for libraries that need to stay slim.
- tokio::spawn for independent tasks. JoinSet for a dynamic group you'll await together with cancellation.
- tokio::select! for racing futures (timeouts, cancellation, first-wins).
- Never block the runtime: tokio::task::spawn_blocking for sync CPU work or blocking I/O libs.
- tokio::sync::Mutex only when the guard must be held across .await. Otherwise std::sync::Mutex is faster.
- tokio::sync::RwLock when reads dominate writes (config snapshots, route tables, hot caches). Many readers proceed in parallel; Mutex serializes them. For snapshot-swap semantics (rarely-updated config), arc_swap::ArcSwap (from the arc-swap crate) is faster still: no lock on the read path.
- Cancellation: CancellationToken (from tokio-util) propagates shutdown. Long-running tasks must check it.
- Backpressure via bounded mpsc channels; unbounded channels hide memory growth until OOM.
- Semaphore for hard concurrency limits on spawn paths that don't fit a channel model (e.g. "at most 50 concurrent outbound HTTP calls"). let _permit = sem.acquire().await?; inside the task; dropping the permit releases the slot. Pair with Arc<Semaphore> shared across spawners.
- Don't mix async runtimes. Pick tokio and stick with it; async-std and smol don't interop cleanly.
CLI Tools (clap)
- Use the derive API: #[derive(Parser)] + #[derive(Subcommand)]. Less boilerplate, types drive the help text.
- One enum Commands variant per subcommand; put shared flags in a struct CommonArgs and pull it in with #[command(flatten)] on the field.
- --json flag on query commands for agent/pipe consumption. Emit via serde_json::to_string(&value)?.
- Exit codes: 0 success, 1 for errors main returned, 2 for argparse (clap handles this), reserve 3+ for domain meanings documented in --help.
- Provide --version automatically via #[command(version)].
See cli-tools.md for config layering, logging setup, progress reporting, and shell completions.
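The exit-code contract can be kept in one place by giving the error type a mapping function. The sketch below uses only std; the CliError variants, the lookup command, and the specific codes 3 and 4 are hypothetical examples of "domain meanings", not values taken from this document.

```rust
use std::fmt;
use std::process::ExitCode;

/// Failures for a hypothetical `inventory` CLI.
#[derive(Debug)]
pub enum CliError {
    /// Generic failure: exit code 1.
    Other(String),
    /// The queried item does not exist: exit code 3 (illustrative).
    NotFound(String),
    /// The backing store is locked by another process: exit code 4 (illustrative).
    Locked,
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Other(msg) => write!(f, "{msg}"),
            Self::NotFound(name) => write!(f, "item `{name}` not found"),
            Self::Locked => write!(f, "inventory is locked by another process"),
        }
    }
}

impl std::error::Error for CliError {}

impl CliError {
    /// Single source of truth for exit codes; 2 is left to clap for usage errors.
    pub fn exit_code(&self) -> u8 {
        match self {
            Self::Other(_) => 1,
            Self::NotFound(_) => 3,
            Self::Locked => 4,
        }
    }
}

/// Hypothetical command body.
pub fn lookup(name: &str) -> Result<String, CliError> {
    match name {
        "widget" => Ok("widget: 12 in stock".to_owned()),
        "" => Err(CliError::Other("item name must not be empty".to_owned())),
        other => Err(CliError::NotFound(other.to_owned())),
    }
}

/// Translate a command result into the process exit status.
pub fn exit_code_for<T>(result: &Result<T, CliError>) -> ExitCode {
    match result {
        Ok(_) => ExitCode::SUCCESS,
        Err(err) => ExitCode::from(err.exit_code()),
    }
}
```

A main declared as fn main() -> ExitCode can then report the error on stderr and return exit_code_for(&result), so the codes listed in --help and the codes the binary emits cannot drift apart.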
HTTP Services (axum)
- Framework default: axum (tokio-native, tower middleware, extractor-based handlers). Pick actix-web only if an existing codebase uses it.
- Handlers return Result<impl IntoResponse, AppError>. Implement IntoResponse for AppError to centralize error → status mapping.
- Validate input at the boundary: axum::extract::Json<T> where T: Deserialize + Validate (use validator crate). Json only deserializes, so call validate() in the handler or a wrapper extractor before doing any work. Internal services trust input was validated.
- Share state via State<Arc<AppState>>, not globals, not lazy_static.
- Middleware via tower::ServiceBuilder: tracing → timeout → auth → CORS → handler. Order matters.
- Resilience layer stack (outbound HTTP clients and shared services): ServiceBuilder::new().layer(TimeoutLayer).layer(LoadShedLayer).layer(RateLimitLayer).layer(ConcurrencyLimitLayer).layer(RetryLayer).service(client). Name each layer explicitly: LoadShedLayer sheds excess load, ConcurrencyLimitLayer caps in-flight requests, RateLimitLayer bounds request rate, RetryLayer retries classified transient errors. Layers added first wrap the ones added after them, and LoadShedLayer rejects a request only when the service it wraps is not ready, so it has to sit outside the limit layers. Combining LoadShedLayer + ConcurrencyLimitLayer this way produces proper backpressure instead of unbounded queueing.
See axum-service.md for project layout, extractors, error types, graceful shutdown, and OpenAPI generation.
Concurrency
| Workload | Approach |
|---|---|
| Independent async I/O | tokio::spawn + JoinSet or futures::join! |
| Data-parallel CPU work | rayon with par_iter |
| Shared mutable state across threads | Arc<Mutex<T>> or Arc<RwLock<T>>, smallest scope possible |
| Single-producer pipelines | tokio::sync::mpsc (async) or std::sync::mpsc (sync) |
| Broadcast / fan-out | tokio::sync::broadcast |
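rayon and tokio coexist: use tokio::task::spawn_blocking to call a rayon pool from async code. Never call .block_on() from inside a tokio task; tokio's own block_on panics when called from a runtime thread, and other executors' block_on stalls a worker thread and can deadlock.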
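The "smallest scope possible" rule for shared mutable state can be illustrated with plain std threads, used here only to keep the sketch dependency-free; the same locking shape applies inside tokio tasks or rayon closures. The count_words function is a hypothetical example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Count words across chunks on separate threads, merging under a short lock.
pub fn count_words(chunks: Vec<String>) -> HashMap<String, usize> {
    let totals: Arc<Mutex<HashMap<String, usize>>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let totals = Arc::clone(&totals);
            thread::spawn(move || {
                // The expensive part runs without the lock.
                let mut local: HashMap<String, usize> = HashMap::new();
                for word in chunk.split_whitespace() {
                    *local.entry(word.to_lowercase()).or_insert(0) += 1;
                }
                // Lock only for the merge; the guard drops when the closure ends.
                let mut shared = totals
                    .lock()
                    .expect("a worker panicked while holding the lock");
                for (word, n) in local {
                    *shared.entry(word).or_insert(0) += n;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Every worker has been joined, so this is the last remaining Arc.
    let mutex = Arc::try_unwrap(totals).expect("all workers joined");
    mutex.into_inner().expect("lock poisoned")
}
```

Each worker builds a private map first, so contention is limited to one short merge per chunk instead of one lock acquisition per word.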
Testing
- Built-in #[test]. Prefer cargo nextest run --workspace over cargo test; it runs each test in its own process, in parallel, with proper isolation. nextest does not run doctests, so keep cargo test --doc alongside it.
- Unit tests live in a #[cfg(test)] mod tests { ... } at the bottom of the file (access to private items).
- Integration tests in tests/ directory. One file per public surface area.
- #[tokio::test] for async tests. Add flavor = "multi_thread" when the code under test spawns tasks.
- rstest for parametrized tests and fixtures.
- proptest / quickcheck for property-based tests on pure logic.
- insta for snapshot testing CLI output, serialization, large structs. Review diffs with cargo insta review.
- assert_cmd + predicates for CLI integration tests (invokes the binary, asserts on stdout/stderr/exit code).
- Assert on error variants with matches!: assert!(matches!(result.unwrap_err(), MyError::Validation(_))). Cleaner than match arms when the test only cares whether the error is the right kind, and doesn't force updates when unrelated variants are added.
- Coverage: cargo llvm-cov --workspace --html. Target 70%+ on application code, higher on library crates.
- Fuzzing for parsers: cargo fuzz + libfuzzer-sys on any code that parses untrusted input (file formats, protocols, query languages). A short nightly fuzz run surfaces the panics and UB that unit tests miss.
For generic test discipline (anti-patterns, mock rules, rationalization resistance), see the ia-writing-tests skill.
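A small sketch of the in-file unit test layout with a matches! assertion on the error variant. The SignupError type and validate_username function are hypothetical, as is the rule that "admin" is already taken.

```rust
/// Errors for a hypothetical signup flow.
#[derive(Debug)]
pub enum SignupError {
    /// The input failed a validation rule; the message says which one.
    Validation(String),
    /// The username is already taken.
    Duplicate,
}

/// Validate a username (illustrative rules only).
pub fn validate_username(name: &str) -> Result<(), SignupError> {
    if name.is_empty() {
        return Err(SignupError::Validation("username must not be empty".to_owned()));
    }
    if name.len() > 32 {
        return Err(SignupError::Validation("username is longer than 32 bytes".to_owned()));
    }
    if name == "admin" {
        return Err(SignupError::Duplicate);
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty_username_is_a_validation_error() {
        let result = validate_username("");
        // Only the variant matters here, not the exact message.
        assert!(matches!(result.unwrap_err(), SignupError::Validation(_)));
    }

    #[test]
    fn plain_username_is_accepted() {
        assert!(validate_username("ferris").is_ok());
    }
}
```

Because the assertion names only the variant, rewording the validation message or adding a new SignupError variant leaves these tests untouched.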
Unsafe Discipline
- Default: no unsafe. If clippy flags it, don't #[allow] it; refactor.
- Every unsafe block gets a // SAFETY: comment above it explaining why each invariant holds. No comment = reviewer rejects.
- Keep unsafe blocks minimal: wrap in a safe abstraction at module boundary, mark the module pub(crate).
- Use miri (cargo +nightly miri test) on any crate containing unsafe or raw pointer arithmetic; it catches UB that optimizers mask.
- Prefer bytemuck, zerocopy, bytes over hand-rolled transmutes for zero-copy patterns.
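One way these rules combine is a single unsafe block hidden behind a safe constructor in a pub(crate) module. The sketch below uses a hypothetical AsciiBytes newtype; it is meant to show the shape of the // SAFETY: argument rather than to recommend this particular optimization.

```rust
pub(crate) mod ascii {
    /// Bytes validated to be ASCII at construction.
    pub struct AsciiBytes(Vec<u8>);

    impl AsciiBytes {
        /// The only constructor: rejects anything that is not pure ASCII.
        pub fn new(bytes: Vec<u8>) -> Option<Self> {
            bytes.is_ascii().then(|| Self(bytes))
        }

        /// View the bytes as `&str` without re-validating UTF-8.
        pub fn as_str(&self) -> &str {
            // SAFETY: `new` is the only constructor and rejects non-ASCII input,
            // the field is private and never mutated afterwards, and every
            // ASCII byte sequence is valid UTF-8.
            unsafe { std::str::from_utf8_unchecked(&self.0) }
        }
    }
}
```

The invariant lives entirely inside the module: callers cannot build an AsciiBytes that breaks it, so the unsafe block stays sound no matter how the rest of the crate uses the type.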
Production Resilience
When productionizing a service (config validation, /health + /ready endpoints, graceful shutdown, retries/timeouts/jitter, connection pools, diagnostic secret redaction), load production-resilience.md.
Observability
For logging (tracing + tracing-subscriber with init recipe), #[instrument] spans, correlation IDs, metrics, and distributed tracing patterns, load observability.md. Never use println! or log:: in new code.
CI
General CI design lives with the ia-infrastructure-engineer agent. For Rust-specific callouts (rustsec/audit-check, cargo-llvm-cov, Swatinem/rust-cache, taiki-e/install-action, matrix coverage guidance, doc-test step), load ci-pipeline.md.
Discipline
- Simplicity first: make every change as simple as possible and touch as little code as possible.
- Only touch what's necessary; avoid unrelated changes in a PR.
- No #[allow(clippy::...)] as a shortcut; fix the underlying issue. Document exceptions with a rationale.
- Before adding a trait or generic, verify it's used in 3+ places. Otherwise a concrete type is clearer.
- Verify: see Verify section — pass all checks with zero warnings before declaring done.
Verify
- cargo fmt --all -- --check passes with zero diffs
- cargo clippy --workspace --all-targets --all-features -- -D warnings passes
- cargo nextest run --workspace (or cargo test --workspace) passes with zero failures
- cargo deny check passes (licenses, advisories, duplicates) for any crate going to production
- No new unsafe without // SAFETY: comment
References
- cli-tools.md — clap patterns, config layering, tracing setup, progress, shell completions
- axum-service.md — project layout, extractors, error types, graceful shutdown, testing
- build-profiles.md — release/release-dbg/release-min profiles, mold linker, dev compile speedups
- ci-pipeline.md — Rust-specific CI steps (cargo audit, llvm-cov, rust-cache, matrix strategy, doc tests)
- production-resilience.md — fail-fast config, health/ready endpoints, graceful shutdown, retries, timeouts, connection pools
- observability.md — tracing init recipe, span instrumentation, correlation IDs, metrics, distributed tracing