WebSocket Client Resilience
6 resilience patterns for WebSocket clients, designed for unreliable network conditions.
Mobile WebSocket connections fail in ways that local development environments don't surface. Tail latency on cellular networks can reach several seconds, which means aggressive health check timeouts (e.g. 5 seconds) may cause false disconnects on slow connections.
When to use: Implementing WebSocket client reconnection logic, building real-time features with persistent connections, mobile app WebSocket handling, any client that maintains long-lived server connections.
When not to use: Server-side WebSocket handlers, HTTP request/response patterns, Server-Sent Events (SSE).
Rationalizations (Do Not Skip)
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "Our users are on fast networks" | Mobile users exist. Even desktop WiFi has transient blips. | Test with throttled networks |
| "Simple retry is enough" | Without jitter, all clients retry at once after an outage | Add randomized jitter |
| "One missed heartbeat means disconnected" | Network blips last 1-3 seconds. Single miss = false positive. | Use hysteresis (2+ misses) |
| "We'll add resilience later" | Reconnection logic is foundational. Retrofitting it is much harder. | Build it in from the start |
| "5 seconds is plenty of timeout" | Mobile P99 is 5-8s. That "timeout" is normal latency for mobile. | Use 10s+ for mobile |
Included Utilities
// WebSocket resilience pattern implementations (zero dependencies)
import {
getBackoffDelay,
circuitBreakerTransition,
shouldDisconnect,
CommandAckTracker,
detectSequenceGap,
classifyTimeout,
} from './resilience.ts';
getBackoffDelay() accepts an optional RNG function for deterministic tests:
const lowJitter = getBackoffDelay(0, 1000, 30000, () => 0); // 750
const highJitter = getBackoffDelay(0, 1000, 30000, () => 1); // 1250
Quick Reference
| Pattern | Detect | Fix | Severity |
|---|---|---|---|
| Backoff without jitter | Math.pow(2, attempt) without Math.random() | Add +/- 25% jitter | must-fail |
| No circuit breaker | Reconnect without failure counter | Trip after 5 failures, 60s cooldown | must-fail |
| Single heartbeat miss | setTimeout disconnect without miss counter | Require 2+ missed heartbeats | should-fail |
| No command ack | ws.send() without commandId tracking | Track pending commands, timeout at 30s | nice-to-have |
| No sequence tracking | onmessage without sequence check | Track lastReceivedSequence, detect gaps | nice-to-have |
| Short mobile timeout | Health timeout < 10s | Use 10s+ for all health checks | must-fail |
Coverage
| Pattern | Utility | Status |
|---|---|---|
| 1. Backoff with jitter | getBackoffDelay() | Code + tests |
| 2. Circuit breaker | circuitBreakerTransition() | Code + tests |
| 3. Heartbeat hysteresis | shouldDisconnect() | Code + tests |
| 4. Command acknowledgment | CommandAckTracker | Code + tests |
| 5. Sequence gap detection | detectSequenceGap() | Code + tests |
| 6. Mobile-aware timeouts | classifyTimeout() | Code + tests |
All 6 patterns have executable utilities and tests.
Companion Skills
This skill provides client-side resilience patterns, not WebSocket server architecture guidance. For broader methodology:
- Search
websocketon skills.sh for server-side handlers, protocol design, and connection management - The circuit breaker is a state machine (closed/open/half-open) — use model-based-testing for systematic transition matrix coverage of all state pairs
- Backoff, circuit breaker, and retry patterns need fault simulation — use fault-injection-testing for circuit breaker testing utilities and queue preservation assertions
- Connection lifecycle events should log at correct levels — use observability-testing to assert structured log output on connect/disconnect/reconnect
Framework Adaptation
These patterns are framework-agnostic. They work with:
- Browser: Native
WebSocket, Socket.IO, ws library - React/Vue/Svelte: Wrap in composable/hook
- React Native / Flutter: Same patterns, different APIs
- Node.js:
wslibrary for server-to-server WebSocket clients
The core principle: real-world network conditions are more variable than controlled environments. Design for mobile latency, not localhost.
See patterns.md for full before/after code examples and detection commands for each pattern.