rate-limiting

Rate limiting algorithms, implementation strategies, HTTP conventions, tiered limits, distributed patterns, and client-side handling. Use when protecting APIs from abuse, implementing usage tiers, or configuring gateway-level throttling.

Install skill "rate-limiting" with this command: npx skills add wpank/api-rate-limiting

Rate Limiting Patterns

Algorithms

| Algorithm | Accuracy | Burst Handling | Best For |
|---|---|---|---|
| Token Bucket | High | Allows controlled bursts | API rate limiting, traffic shaping |
| Leaky Bucket | High | Smooths bursts entirely | Steady-rate processing, queues |
| Fixed Window | Low | Allows edge bursts (2x) | Simple use cases, prototyping |
| Sliding Window Log | Very High | Precise control | Strict compliance, billing-critical |
| Sliding Window Counter | High | Good approximation | Production APIs (best tradeoff) |

Fixed window problem: with a window boundary at 12:00, a user can send the full limit at 11:59 and again at 12:01, so twice the limit passes within about two minutes. Sliding window algorithms fix this.
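The boundary burst can be reproduced with a small simulation. This is a minimal sketch, not part of the original skill: the class name `FixedWindowLimiter`, the 60-second window, and the injected `now` argument (used instead of a real clock so the run is deterministic) are all assumptions made for illustration.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Fixed-window counter; `now` is passed in so the demo is deterministic."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = defaultdict(int)  # window index -> accepted requests

    def allow(self, now: float) -> bool:
        window = int(now // self.window_sec)
        if self.counts[window] >= self.limit:
            return False
        self.counts[window] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_sec=60)

# 100 requests in the last second of one window (t = 59 s) ...
late = sum(limiter.allow(59.0) for _ in range(100))
# ... and 100 more in the first second of the next window (t = 60 s).
early = sum(limiter.allow(60.0) for _ in range(100))

# Both batches are accepted even though they arrive about one second apart.
accepted_in_burst = late + early
```

Each window on its own stays within the limit, which is why a per-window counter cannot detect the burst.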

Token Bucket

The bucket holds up to `capacity` tokens and refills at a fixed rate. Each request consumes one token; a request is rejected when less than one token is available.

import time


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        # Refill lazily, based on the time elapsed since the last call.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend one token if available; otherwise reject the request.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
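For contrast with the token bucket, the leaky bucket from the algorithm table can be sketched as a queue drained at a fixed rate. This is an illustrative sketch only: the class name `LeakyBucket`, the `schedule` method, and the injected `now` argument are assumptions, and a real implementation would also hold and release the queued requests.

```python
from typing import Optional


class LeakyBucket:
    """Queue-style leaky bucket: requests leave at a constant rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity          # maximum requests waiting in the queue
        self.interval = 1.0 / leak_rate   # seconds between processed requests
        self.next_free = 0.0              # earliest time the next request can run

    def schedule(self, now: float) -> Optional[float]:
        """Return when the request will be processed, or None if the queue is full."""
        start = max(now, self.next_free)
        queued_ahead = (start - now) / self.interval
        if queued_ahead >= self.capacity:
            return None  # overflow: reject instead of queuing
        self.next_free = start + self.interval
        return start


bucket = LeakyBucket(capacity=3, leak_rate=2.0)  # 2 requests per second
slots = [bucket.schedule(now=0.0) for _ in range(4)]
```

Four simultaneous requests are spread over 0.5-second slots and the fourth is rejected, whereas a token bucket of the same capacity would pass the first three immediately.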

Sliding Window Counter

Hybrid of fixed window and sliding window log — weights the previous window's count by overlap percentage:

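import time

# get_count and increment_count are storage helpers that are not defined here
# (for example, thin wrappers around a dict or Redis keyed by window index).


def sliding_window_allow(key: str, limit: int, window_sec: int) -> bool:
    now = time.time()
    current_window = int(now // window_sec)
    # Fraction of the current window that has already elapsed (0.0 to 1.0).
    position_in_window = (now % window_sec) / window_sec

    prev_count = get_count(key, current_window - 1)
    curr_count = get_count(key, current_window)

    # Weight the previous window by how much of it still overlaps the
    # sliding window that ends now.
    estimated = prev_count * (1 - position_in_window) + curr_count
    if estimated >= limit:
        return False
    increment_count(key, current_window)
    return True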
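To make the weighting concrete, here is a self-contained in-memory variant with a worked example. It is a sketch under stated assumptions: the class name `SlidingWindowCounter`, the dict-based storage, and the injected `now` argument are not from the original, and old windows are never evicted here.

```python
class SlidingWindowCounter:
    """In-memory sliding window counter; `now` is injected for determinism."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = {}  # (key, window index) -> accepted requests

    def estimate(self, key: str, now: float) -> float:
        window = int(now // self.window_sec)
        position = (now % self.window_sec) / self.window_sec
        prev = self.counts.get((key, window - 1), 0)
        curr = self.counts.get((key, window), 0)
        return prev * (1 - position) + curr

    def allow(self, key: str, now: float) -> bool:
        if self.estimate(key, now) >= self.limit:
            return False
        slot = (key, int(now // self.window_sec))
        self.counts[slot] = self.counts.get(slot, 0) + 1
        return True


swc = SlidingWindowCounter(limit=100, window_sec=60)

# 80 requests land in the first window (t = 30 s).
first = sum(swc.allow("user-1", 30.0) for _ in range(80))

# At t = 75 s the new window is 25% elapsed, so 75% of the previous
# window still counts: 80 * 0.75 = 60.
carried = swc.estimate("user-1", 75.0)

# Only 40 more requests fit before the estimate reaches the limit of 100.
second = sum(swc.allow("user-1", 75.0) for _ in range(50))
```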

Implementation Options

| Approach | Scope | Best For |
|---|---|---|
| In-memory | Single server | Zero latency, no dependencies |
| Redis (INCR + EXPIRE) | Distributed | Multi-instance deployments |
| API Gateway | Edge | No code, built-in dashboards |
| Middleware | Per-service | Fine-grained per-user/endpoint control |

Use gateway-level limiting as the outer defense and application-level limiting for fine-grained control.


HTTP Headers

Always return rate limit info, even on successful requests:

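RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 1625097600
Retry-After: 30

| Header | When to Include |
|---|---|
| RateLimit-Limit | Every response |
| RateLimit-Remaining | Every response |
| RateLimit-Reset | Every response |
| Retry-After | 429 responses only |

RateLimit-Reset is shown here as a Unix timestamp, as many X-RateLimit-Reset implementations do. The IETF RateLimit header draft instead defines it as the number of seconds until the window resets, so pick one convention and document it for clients.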
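A helper along these lines could assemble the headers for any response. It is a sketch only: the function name `rate_limit_headers` and its parameters are hypothetical, and it follows the example above in sending the reset time as a Unix timestamp.

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       now: float, throttled: bool) -> Dict[str, str]:
    """Build rate limit headers; Retry-After is added only for 429 responses."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(reset_epoch),
    }
    if throttled:
        # Seconds until the window resets, rounded up and never negative.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers


ok = rate_limit_headers(1000, 742, 1625097600, now=1625097570.0, throttled=False)
blocked = rate_limit_headers(1000, 0, 1625097600, now=1625097570.0, throttled=True)
```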

429 Response Body

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Maximum 1000 requests per hour.",
    "retry_after": 30,
    "limit": 1000,
    "reset_at": "2025-07-01T12:00:00Z"
  }
}

Never return 500 or 503 for rate limiting — 429 is the correct status code.
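The response above could be produced by a small framework-independent helper. This is a minimal sketch: the function name `too_many_requests` and its parameters are assumptions, and the returned tuple would need adapting to whatever web framework is in use.

```python
import json
from datetime import datetime, timezone


def too_many_requests(limit: int, period: str, retry_after: int, reset_epoch: int):
    """Return (status, headers, body) for a throttled request."""
    reset_at = datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
    body = {
        "error": {
            "code": "rate_limit_exceeded",
            "message": f"Rate limit exceeded. Maximum {limit} requests per {period}.",
            "retry_after": retry_after,
            "limit": limit,
            "reset_at": reset_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
    }
    return 429, headers, json.dumps(body)
```

Keeping the `Retry-After` header and the `retry_after` body field in one place avoids the two drifting apart.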


Rate Limit Tiers

Apply limits at multiple granularities:

| Scope | Key | Example Limit | Purpose |
|---|---|---|---|
| Per-IP | Client IP | 100 req/min | Abuse prevention |
| Per-User | User ID | 1000 req/hr | Fair usage |
| Per-API-Key | API key | 5000 req/hr | Service-to-service |
| Per-Endpoint | Route + key | 60 req/min on /search | Protect expensive ops |

Tiered pricing:

| Tier | Rate Limit | Burst | Cost |
|---|---|---|---|
| Free | 100 req/hr | 10 | $0 |
| Pro | 5,000 req/hr | 100 | $49/mo |
| Enterprise | 100,000 req/hr | 2,000 | Custom |

Evaluate from most specific to least specific: per-endpoint > per-user > per-IP.
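The evaluation order can be sketched as an ordered list of rules checked until one fails. Everything named here is an assumption for illustration: `first_violation`, `rules_for`, the key formats, and the `count_for` lookup (a stand-in for whichever counter store is used). The limits are taken from the scope table above.

```python
from typing import Callable, List, Optional, Tuple

# (scope name, counter key, limit, window in seconds)
Rule = Tuple[str, str, int, int]


def rules_for(route: str, api_key: str, user_id: str, ip: str) -> List[Rule]:
    """Rules ordered from most specific to least specific."""
    return [
        ("per-endpoint", f"{route}:{api_key}", 60, 60),
        ("per-user", f"user:{user_id}", 1000, 3600),
        ("per-ip", f"ip:{ip}", 100, 60),
    ]


def first_violation(rules: List[Rule],
                    count_for: Callable[[str, int], int]) -> Optional[str]:
    """Return the first exceeded scope, or None if every rule passes.

    `count_for(key, window_sec)` is a hypothetical lookup returning the
    requests already counted for `key` in the current window.
    """
    for scope, key, limit, window_sec in rules:
        if count_for(key, window_sec) >= limit:
            return scope
    return None
```

Reporting the most specific violated scope lets the 429 response name the limit the client actually hit.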


Distributed Rate Limiting

Redis-based pattern for consistent limiting across instances:

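import time


def redis_rate_limit(redis, key: str, limit: int, window: int) -> bool:
    # `redis` is a connected client instance, not the redis module.
    pipe = redis.pipeline()
    now = time.time()
    # One counter per key per fixed window.
    window_key = f"rl:{key}:{int(now // window)}"
    pipe.incr(window_key)
    # Keep the key for two windows so it expires on its own afterwards.
    pipe.expire(window_key, window * 2)
    results = pipe.execute()
    return results[0] <= limit  # results[0] is the count after INCR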

Atomic Lua script (prevents race conditions):

local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call('INCR', key)
if current == 1 then
    redis.call('EXPIRE', key, window)
end
return current <= limit and 1 or 0
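From Python, a script like this is typically loaded once and then invoked per request. The sketch below assumes `client` is a redis-py client (for example `redis.Redis(...)`), whose `register_script` method returns a callable taking `keys` and `args`. The wrapper name `make_limiter` and the `rl:` key prefix are illustrative choices.

```python
FIXED_WINDOW_LUA = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call('INCR', key)
if current == 1 then
    redis.call('EXPIRE', key, window)
end
return current <= limit and 1 or 0
"""


def make_limiter(client):
    """Build an allow(key, limit, window) function around the Lua script.

    Assumes `client` behaves like a redis-py client: `register_script`
    returns a callable that executes the script on the server.
    """
    script = client.register_script(FIXED_WINDOW_LUA)

    def allow(key: str, limit: int, window: int) -> bool:
        # The script returns 1 when the request is within the limit.
        return script(keys=[f"rl:{key}"], args=[limit, window]) == 1

    return allow
```

Usage would be along the lines of `allow = make_limiter(redis.Redis(host="redis.internal"))` followed by `allow("user-42", 1000, 3600)` on each request.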

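Never do a separate GET followed by a SET. Two instances can read the same value in the gap between the calls, so increments are lost and more requests get through than the limit allows.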


API Gateway Configuration

NGINX:

http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
        }
    }
}

Kong:

plugins:
  - name: rate-limiting
    config:
      minute: 60
      hour: 1000
      policy: redis
      redis_host: redis.internal

Client-Side Handling

Clients must handle 429 gracefully:

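async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    // Retry-After may be a number of seconds or an HTTP-date; handle both.
    const retryAfter = res.headers.get('Retry-After');
    let delay = NaN;
    if (retryAfter !== null) {
      const seconds = Number(retryAfter);
      delay = Number.isNaN(seconds)
        ? Date.parse(retryAfter) - Date.now()
        : seconds * 1000;
    }
    // Otherwise fall back to exponential backoff with full jitter, capped at 30 s.
    if (!Number.isFinite(delay) || delay < 0) {
      delay = Math.random() * Math.min(1000 * 2 ** attempt, 30000);
    }
    await new Promise(r => setTimeout(r, delay));
  }
  throw new Error('Rate limit exceeded after retries');
}

  • Always respect Retry-After when present
  • Use exponential backoff with jitter when absent
  • Implement request queuing for batch operations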
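For Python clients, the first two rules above can be captured in one delay function. This is a sketch under assumptions: the name `retry_delay`, the 1-second base, the 30-second cap, and the "full jitter" strategy (a uniform random delay up to the backoff ceiling) are illustrative choices, and `now` and `rng` are injectable only to make the function testable.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                now: Optional[datetime] = None,
                rng: Callable[[], float] = random.random,
                base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt."""
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # Retry-After given in seconds
        try:
            # Retry-After given as an HTTP-date
            when = parsedate_to_datetime(value)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
        except (TypeError, ValueError):
            pass  # unparseable header: fall through to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return rng() * min(cap, base * 2 ** attempt)
```

Jitter spreads out clients that were throttled at the same moment, so they do not all retry in the same instant.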

Monitoring

Track these metrics:

  • Rate limit hit rate — % of requests returning 429 (alert if >5% sustained)
  • Near-limit warnings — requests where remaining < 10% of limit
  • Top offenders — keys/IPs hitting limits most frequently
  • Limit headroom — how close normal traffic is to the ceiling
  • False positives — legitimate users being rate limited
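The first and third metrics can be derived from access-log records. The sketch below assumes each record is a `(rate limit key, HTTP status)` pair. The function name `summarize` is hypothetical, and the alert flag checks a single batch only, so a "sustained" alert would need to evaluate it over consecutive intervals.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize(events: Iterable[Tuple[str, int]], top_n: int = 3) -> dict:
    """Compute the 429 hit rate and the keys throttled most often."""
    total = 0
    throttled = Counter()
    for key, status in events:
        total += 1
        if status == 429:
            throttled[key] += 1
    hits = sum(throttled.values())
    hit_rate = hits / total if total else 0.0
    return {
        "hit_rate": hit_rate,
        "alert": hit_rate > 0.05,  # the 5% threshold from the list above
        "top_offenders": throttled.most_common(top_n),
    }


events = [("a", 200)] * 90 + [("b", 429)] * 7 + [("c", 429)] * 3
report = summarize(events)
```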

Anti-Patterns

| Anti-Pattern | Fix |
|---|---|
| Application-only limiting | Always combine with infrastructure-level limits |
| No retry guidance | Always include Retry-After header on 429 |
| Inconsistent limits | Same endpoint, same limits across services |
| No burst allowance | Allow controlled bursts for legitimate traffic |
| Silent dropping | Always return 429 so clients can distinguish from errors |
| Global single counter | Per-endpoint counters to protect expensive operations |
| Hard-coded limits | Use configuration, not code constants |

NEVER Do

  1. NEVER rate limit health check endpoints — monitoring systems will false-alarm
  2. NEVER use client-supplied identifiers as sole rate limit key — trivially spoofed
  3. NEVER return 200 OK when rate limiting — clients must know they were throttled
  4. NEVER set limits without measuring actual traffic first — you'll block legitimate users or set limits too high to matter
  5. NEVER share counters across unrelated tenants — noisy neighbor problem
  6. NEVER skip rate limiting on internal APIs — misbehaving internal services can take down shared infrastructure
  7. NEVER implement rate limiting without logging — you need visibility to tune limits and detect abuse
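Rules 1 and 2 often meet in the function that chooses the rate limit key. A minimal sketch, assuming hypothetical health check paths and a caller that supplies the server-verified user identity and the connection's peer IP (rather than values taken from client-controlled headers):

```python
from typing import Optional

# Hypothetical health check paths; adjust to the service's actual routes.
EXEMPT_PATHS = {"/health", "/healthz", "/ready"}


def rate_limit_key(path: str, authenticated_user: Optional[str],
                   peer_ip: str) -> Optional[str]:
    """Pick the counter key for a request, or None if it is exempt."""
    if path in EXEMPT_PATHS:
        return None  # health checks are never throttled
    if authenticated_user is not None:
        # Identity established by the server's own authentication.
        return f"user:{authenticated_user}"
    # Unauthenticated traffic falls back to the connection's peer address.
    return f"ip:{peer_ip}"
```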
