m10-performance

Performance Optimization

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "m10-performance" with this command: npx skills add rustfs/rustfs/rustfs-rustfs-m10-performance

Layer 2: Design Choices

Core Question

What's the bottleneck, and is optimization worth it?

Before optimizing:

  • Have you measured? (Don't guess)

  • What's the acceptable performance?

  • Will optimization add complexity?

Performance Decision → Implementation

| Goal | Design Choice | Implementation |
|------|---------------|----------------|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache locality | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
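
The "Avoid copies" row can be sketched with std::borrow::Cow. This is a minimal illustration, not part of the upstream skill: the function name expand_tabs is hypothetical, and the point is only that the common case returns the input slice without allocating.

```rust
use std::borrow::Cow;

/// Replaces tabs with single spaces, allocating only when a tab is present.
/// `expand_tabs` is a hypothetical helper used for illustration.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Slow path: the content changes, so a new String is required.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Fast path: hand back the original slice, no allocation or copy.
        Cow::Borrowed(input)
    }
}

fn main() {
    let clean = expand_tabs("no tabs here");
    let fixed = expand_tabs("a\tb");
    println!("{} / {}", clean, fixed);
    println!("clean is borrowed: {}", matches!(clean, Cow::Borrowed(_)));
}
```

Callers that only read the result can treat both variants as &str; the allocation cost is paid only on inputs that actually need rewriting.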

Thinking Prompt

Before optimizing:

Have you measured?

  • Profile first → flamegraph, perf

  • Benchmark → criterion, cargo bench

  • Identify actual hotspots
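
As a rough, std-only stand-in for the measuring step, the sketch below times a closure with std::time::Instant. It is an assumption-laden illustration: time_it and sum_of_squares are hypothetical names, and there is no warm-up or statistical analysis, which is what criterion adds. Run it with --release.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the last result plus the total elapsed time.
/// Hypothetical helper: no warm-up, no outlier handling, no statistics.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    // black_box keeps the optimizer from discarding the computation.
    let mut last = black_box(f());
    for _ in 1..iters {
        last = black_box(f());
    }
    (last, start.elapsed())
}

/// Example workload: sum of x^2 for x in 0..n.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|x| x * x).sum()
}

fn main() {
    let (result, elapsed) = time_it(1_000, || sum_of_squares(black_box(10_000)));
    println!("result = {}, total = {:?}", result, elapsed);
}
```

A single wall-clock number like this is only good for spotting large differences; for anything closer, a statistical harness and a profiler give more trustworthy answers.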

What's the priority?

  • Algorithm (10x-1000x improvement)

  • Data structure (2x-10x)

  • Allocation (2x-5x)

  • Cache (1.5x-3x)

What's the trade-off?

  • Complexity vs speed

  • Memory vs CPU

  • Latency vs throughput

Trace Up ↑

To domain constraints (Layer 3):

"How fast does this need to be?"
    ↑ Ask: What's the performance SLA?
    ↑ Check: domain-* (latency requirements)
    ↑ Check: Business requirements (acceptable response time)

| Question | Trace To | Ask |
|----------|----------|-----|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |

Trace Down ↓

To implementation (Layer 1):

"Need to reduce allocations"
    ↓ m01-ownership: Use references, avoid clone
    ↓ m02-resource: Pre-allocate with_capacity

"Need to parallelize"
    ↓ m07-concurrency: Choose rayon or threads
    ↓ m07-concurrency: Consider async for I/O-bound

"Need cache efficiency"
    ↓ Data layout: Prefer Vec over HashMap when possible
    ↓ Access patterns: Sequential over random access

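For the "Need to parallelize" trace, the threads option can be sketched with std::thread::scope (rayon would be the higher-level alternative). This is an illustrative sketch only: parallel_sum and its workers parameter are hypothetical names. Each worker scans one contiguous chunk, so access within a chunk stays sequential.

```rust
use std::thread;

/// Sums a slice by splitting it into contiguous chunks, one thread per chunk.
/// `parallel_sum` and `workers` are illustrative names, not a library API.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    if data.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk_len = (data.len() + workers - 1) / workers;

    thread::scope(|s| {
        // Scoped threads may borrow `data` because they are joined before scope returns.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    println!("{}", parallel_sum(&data, 4)); // prints 500500
}
```

Spawning threads has a fixed cost, so for small inputs the plain sequential sum is likely faster; measure before adopting this.
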
Quick Reference

| Tool | Purpose |
|------|---------|
| cargo bench | Micro-benchmarks |
| criterion | Statistical benchmarks |
| perf / flamegraph | CPU profiling |
| heaptrack | Allocation tracking |
| valgrind / cachegrind | Cache analysis |

Optimization Priority

  1. Algorithm choice (10x - 1000x)
  2. Data structure (2x - 10x)
  3. Allocation reduction (2x - 5x)
  4. Cache optimization (1.5x - 3x)
  5. SIMD/Parallelism (2x - 8x)
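
To make the first priority concrete, here is a sketch of the same task solved with two algorithms. The function names are hypothetical; the quadratic version compares every pair, while the hashed version makes one pass in expected linear time.

```rust
use std::collections::HashSet;

/// O(n^2): compares every pair of elements.
fn has_duplicates_quadratic(items: &[u32]) -> bool {
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            if items[i] == items[j] {
                return true;
            }
        }
    }
    false
}

/// Expected O(n): one pass, remembering what has been seen.
fn has_duplicates_hashed(items: &[u32]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    // `insert` returns false when the value was already present.
    items.iter().any(|x| !seen.insert(*x))
}

fn main() {
    let unique: Vec<u32> = (0..1_000).collect();
    let mut with_dup = unique.clone();
    with_dup.push(42);
    println!("{}", has_duplicates_quadratic(&unique)); // prints false
    println!("{}", has_duplicates_hashed(&with_dup)); // prints true
}
```

For very small inputs the quadratic loop may still win because it allocates nothing, which is why measuring comes before choosing.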

Common Techniques

| Technique | When | How |
|-----------|------|-----|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
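
The pre-allocation row can be sketched as follows. The function squares is a hypothetical example; the debug assertion checks that none of the pushes had to grow the buffer.

```rust
/// Builds the squares of 0..n with a single up-front allocation.
fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    let cap_before = out.capacity();
    for i in 0..n as u64 {
        out.push(i * i);
    }
    // with_capacity(n) reserves room for at least n elements,
    // so none of the n pushes above reallocated.
    debug_assert_eq!(out.capacity(), cap_before);
    out
}

fn main() {
    let v = squares(5);
    println!("{:?}", v); // prints [0, 1, 4, 9, 16]
}
```

When the final size is only an upper bound, reserving that bound trades some memory for the same guarantee.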

Common Mistakes

| Mistake | Why Wrong | Better |
|---------|-----------|--------|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Meaningless | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
| Hidden .clone() | Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
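
The hidden .clone() mistake often comes from a function signature that demands ownership it does not need. A minimal sketch, with hypothetical function names:

```rust
/// Takes ownership, so a caller that still needs `names` must clone the whole Vec.
fn longest_owned(names: Vec<String>) -> Option<String> {
    names.into_iter().max_by_key(|n| n.len())
}

/// Borrows the slice and returns a borrowed &str: no allocation at all.
fn longest_borrowed(names: &[String]) -> Option<&str> {
    names.iter().max_by_key(|n| n.len()).map(|n| n.as_str())
}

fn main() {
    let names = vec![
        "ann".to_string(),
        "bartholomew".to_string(),
        "cy".to_string(),
    ];
    // The clone here copies every String just so `names` stays usable afterwards.
    let owned = longest_owned(names.clone());
    let borrowed = longest_borrowed(&names);
    println!("{:?} {:?}", owned, borrowed);
}
```

Both calls find the same name; only the first one pays for a full copy of the input.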

Anti-Patterns

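| Anti-Pattern | Why Bad | Better |
|--------------|---------|--------|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | O(n^2) when each step rebuilds the string | String::with_capacity with push_str, or a single format! outside the loop |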
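
The string-building anti-pattern can be sketched as below, assuming the O(n^2) cost refers to rebuilding the whole string on every iteration. Both function names are hypothetical.

```rust
/// Anti-pattern: every iteration allocates a new String and copies the old contents.
fn join_quadratic(parts: &[&str]) -> String {
    let mut out = String::new();
    for p in parts {
        out = format!("{}{}", out, p);
    }
    out
}

/// One allocation sized up front, then in-place appends.
fn join_preallocated(parts: &[&str]) -> String {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut out = String::with_capacity(total);
    for p in parts {
        out.push_str(p);
    }
    out
}

fn main() {
    let parts = ["alpha", "-", "beta", "-", "gamma"];
    println!("{}", join_quadratic(&parts)); // prints alpha-beta-gamma
    println!("{}", join_preallocated(&parts)); // prints alpha-beta-gamma
}
```

For plain string slices, parts.concat() from the standard library does the same job in one call.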

Related Skills

| When | See |
|------|-----|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
