Debug Detective — Systematic Debugging Methodology
Find and fix bugs efficiently across the full stack using structured investigation techniques.
1. Debugging Mindset
1.1 The scientific method for debugging
1. OBSERVE — What exactly is happening? (symptoms, error messages, logs)
2. HYPOTHESIZE — What could cause this? (list 3+ possibilities)
3. PREDICT — If hypothesis X is correct, then Y should be true
4. TEST — Design the smallest experiment to test the prediction
5. ANALYZE — Did the test confirm or refute the hypothesis?
6. REPEAT — If refuted, move to next hypothesis; if confirmed, fix and verify
1.2 Key debugging principles
- The bug is often not where you think it is. Widen your search radius before going deep.
- Reproduce first, fix second. A bug you can't reproduce is a bug you can't verify as fixed.
- Change one thing at a time. Multiple simultaneous changes make it impossible to identify the fix.
- Trust nothing. Verify assumptions — check that the code you're reading is the code that's running.
- Read the error message. Fully. Including the stack trace. Including the "caused by" chain.
1.3 Cognitive biases that hinder debugging
| Bias | How it hurts | Counter-strategy |
|---|---|---|
| Confirmation bias | You look for evidence supporting your theory, ignore contradicting evidence | Actively try to disprove your hypothesis |
| Anchoring | First theory dominates even when evidence points elsewhere | Write down 3+ hypotheses before investigating any |
| Recency bias | "I just changed X, so X must be the problem" | Check git log — the bug might predate your change |
| Availability bias | "Last time it was a race condition, so it must be again" | Consider all categories: data, logic, timing, config, environment |
| Sunk cost | "I've spent 2 hours on this theory, it must be right" | Set a timebox: 30 min per hypothesis, then move on |
1.4 Rubber duck debugging
Explain the problem out loud (to a duck, a colleague, or a text file):
- State what the code is supposed to do
- Walk through the code line by line, explaining each step
- The act of articulating often reveals the gap between expectation and reality
1.5 Feynman technique
- Write the bug description as if explaining to a non-programmer
- Identify gaps in your explanation — those are gaps in your understanding
- Go back to the code to fill those gaps
- Simplify your explanation further
2. Systematic Debugging Workflow
2.1 The six-step process
┌─────────────┐
│ 1. REPRODUCE │ ← Can you trigger the bug reliably?
└──────┬──────┘
▼
┌─────────────┐
│ 2. ISOLATE │ ← Narrow down: which component, input, or path?
└──────┬──────┘
▼
┌─────────────┐
│ 3. IDENTIFY │ ← Root cause found
└──────┬──────┘
▼
┌─────────────┐
│ 4. FIX │ ← Minimal, targeted change
└──────┬──────┘
▼
┌─────────────┐
│ 5. VERIFY │ ← Bug no longer reproduces; no regressions
└──────┬──────┘
▼
┌─────────────┐
│ 6. PREVENT │ ← Add test, monitoring, or documentation
└─────────────┘
2.2 Reproducing the bug
Minimal reproduction checklist:
- Start from a clean state (fresh install, empty database, incognito browser)
- List exact steps to trigger the bug
- Note the environment: OS, runtime version, browser, config
- Strip away unrelated code until the bug is isolated
- If intermittent: identify the timing/concurrency pattern
# Create a minimal reproduction project
mkdir bug-repro && cd bug-repro
npm init -y
# Add only the minimum dependencies needed to demonstrate the bug
npm install problematic-library@1.2.3
# Write the smallest possible script that triggers the issue
2.3 Binary search debugging
When you don't know where the bug is, bisect:
Code bisection:
// Add a return/exit at the midpoint of the suspect code
// If the bug disappears → bug is after the midpoint
// If the bug persists → bug is before the midpoint
// Repeat on the narrowed half
Data bisection:
# If a large input causes the bug, split it in half
# (this example assumes input.csv has 1000 data lines and no header row)
head -n 500 input.csv > first_half.csv
tail -n 500 input.csv > second_half.csv
# Test each half to see which one triggers the bug
Config bisection:
# Comment out half the config, test
# Narrow down which config option causes the issue
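The data bisection above can be sketched in Python. This is a minimal sketch, assuming exactly one record is responsible and the bug reproduces deterministically; `find_failing_record` and `parser_fails` are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_failing_record(
    records: Sequence[T],
    triggers_bug: Callable[[Sequence[T]], bool],
) -> T:
    """Bisect `records` down to the single record that triggers the bug.

    Assumes the full input fails and that exactly one record is responsible.
    """
    if not triggers_bug(records):
        raise ValueError("the full input does not reproduce the bug")

    lo, hi = 0, len(records)  # invariant: the culprit lies in records[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if triggers_bug(records[lo:mid]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return records[lo]


# Hypothetical example: a parser that chokes on a non-numeric row
rows = ["1", "2", "3", "oops", "5", "6"]


def parser_fails(chunk: Sequence[str]) -> bool:
    return any(not row.isdigit() for row in chunk)


culprit = find_failing_record(rows, parser_fails)
print(culprit)  # oops
```

Each round halves the suspect range, so a 1000-line input needs about 10 test runs instead of 1000.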
2.4 Reading stack traces
Error: Cannot read properties of undefined (reading 'email')
at getUserEmail (src/services/user.ts:42:18) ← WHERE it crashed
at processOrder (src/services/order.ts:87:24) ← WHO called it
at OrderController.create (src/controllers/order.ts:23:5) ← Entry point
at Layer.handle (node_modules/express/lib/router/layer.js:95:5)
Read bottom-up: The bottom shows where the call originated. The top shows where it failed. The line src/services/user.ts:42 is where to look, but the cause might be in order.ts:87 (passing undefined).
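The same reading applies to chained errors in other runtimes. The sketch below, assuming Python and the hypothetical functions `get_user_email` and `process_order`, walks the "caused by" chain and reports the frame where the root cause was raised. Note that Python prints tracebacks in the opposite order from the JavaScript trace above: the most recent call comes last.

```python
import traceback


def get_user_email(user):
    return user["email"]  # fails when user is None


def process_order(order):
    try:
        return get_user_email(order.get("user"))
    except TypeError as exc:
        # "from exc" keeps the original error reachable as __cause__
        raise RuntimeError(f"could not process order {order['id']}") from exc


def cause_chain(exc):
    """Return the exception followed by each explicit cause, outermost first."""
    chain = []
    while exc is not None:
        chain.append(exc)
        exc = exc.__cause__
    return chain


def crash_function(exc):
    """Return the name of the function in the innermost frame of `exc`."""
    return traceback.extract_tb(exc.__traceback__)[-1].name


try:
    process_order({"id": 42, "user": None})
except RuntimeError as err:
    chain = cause_chain(err)
    root = chain[-1]

print([type(e).__name__ for e in chain])  # ['RuntimeError', 'TypeError']
print(crash_function(root))               # get_user_email
```

The crash site is `get_user_email`, but the fix likely belongs in the caller that passed `None`.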
3. Git Bisect
3.1 Manual bisect
# Start bisecting
git bisect start
# Mark current (broken) commit as bad
git bisect bad
# Mark a known-good commit (e.g., last release tag)
git bisect good v2.0.0
# Git checks out the midpoint — test it
# If this commit is broken:
git bisect bad
# If this commit works:
git bisect good
# Repeat until git identifies the first bad commit
# Git outputs: "abc1234 is the first bad commit"
# Done — reset
git bisect reset
3.2 Automated bisect
# Automated: provide a test command that exits 0 (good), 125 (skip this commit),
# or any other code from 1 to 127 (bad)
git bisect start HEAD v2.0.0
git bisect run npm test
# Or with a custom script ("expected-output" is a placeholder for the known-good result)
git bisect run bash -c '
npm run build 2>/dev/null || exit 125
node -e "
const { buggyFunction } = require(\"./dist\");
const expected = \"expected-output\";
const result = buggyFunction(\"test-input\");
process.exit(result === expected ? 0 : 1);
"
'
# Reset when done
git bisect reset
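A bisect script can also be written in Python. The sketch below shows only the exit-code logic that `git bisect run` relies on: 0 for good, 125 for "cannot test, skip", and 1 for bad. The helper names and the stand-in commands are assumptions for illustration; a real script would pass its own build and test commands.

```python
import subprocess
import sys
from typing import Sequence

GOOD, BAD, SKIP = 0, 1, 125  # exit codes with special meaning to `git bisect run`


def succeeds(cmd: Sequence[str]) -> bool:
    """Run a command quietly and report whether it exited with status 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


def bisect_verdict(build_cmd: Sequence[str], test_cmd: Sequence[str]) -> int:
    """Return the exit code a bisect script should use for the current commit."""
    if not succeeds(build_cmd):
        return SKIP  # untestable commit: ask git to try a nearby one instead
    return GOOD if succeeds(test_cmd) else BAD


# Stand-in commands so the sketch runs anywhere. A real script would pass
# something like ["npm", "run", "build"] and ["npm", "test"], and finish with
# sys.exit(bisect_verdict(...)).
ok = [sys.executable, "-c", "raise SystemExit(0)"]
fail = [sys.executable, "-c", "raise SystemExit(1)"]

print(bisect_verdict(ok, ok))    # 0
print(bisect_verdict(ok, fail))  # 1
print(bisect_verdict(fail, ok))  # 125
```

Returning 125 for broken builds keeps an unrelated build failure from being blamed as the first bad commit.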
3.3 Bisect with skip
# If a commit can't be tested (e.g., build broken for unrelated reason)
git bisect skip
# Skip a range of untestable commits
git bisect skip v2.0.1..v2.0.5
4. Frontend Debugging
4.1 Chrome DevTools — Console power features
// $0 — reference to currently selected element in Elements panel
$0.textContent
// $$() — querySelectorAll shortcut
$$('button.primary').length
// copy() — copy any value to clipboard
copy(JSON.stringify(data, null, 2))
// monitor() — log all calls to a function
monitor(fetch)
// unmonitor(fetch) to stop
// monitorEvents() — log all events on an element
monitorEvents($0, 'click')
// unmonitorEvents($0) to stop
// queryObjects() — find all instances of a constructor
queryObjects(Promise) // Find all live Promises
// table() — display array/object as table
console.table(users, ['name', 'email', 'role'])
// time/timeEnd — measure execution time
console.time('render')
renderComponent()
console.timeEnd('render') // render: 42.3ms
// group — organize related logs
console.group('API Request')
console.log('URL:', url)
console.log('Method:', method)
console.log('Body:', body)
console.groupEnd()
// assert — log only when condition fails
console.assert(user.id, 'User ID is missing', user)
4.2 Sources panel — Advanced breakpoints
| Breakpoint type | How to set | Use case |
|---|---|---|
| Line breakpoint | Click line number | Stop at specific line |
| Conditional | Right-click line → "Add conditional" | Stop only when condition is true |
| Logpoint | Right-click → "Add logpoint" | Log without modifying code |
| DOM breakpoint | Elements panel → right-click → "Break on" | Stop when DOM changes |
| XHR breakpoint | Sources → XHR Breakpoints → add URL pattern | Stop on matching fetch/XHR |
| Event listener | Sources → Event Listener Breakpoints | Stop on click, keypress, etc. |
| Exception | Sources → pause icon → "Pause on exceptions" | Stop on any thrown error |
4.3 Performance panel — Finding slow code
1. Click Record (or Ctrl+E)
2. Perform the slow action in the app
3. Click Stop
4. Analyze the flame chart:
- Wide bars = slow functions
- Look for "Long Task" markers (>50ms)
- Check "Bottom-Up" tab for aggregate time per function
- Check "Call Tree" for the hot path
4.4 Memory panel — Finding leaks
1. Take Heap Snapshot (baseline)
2. Perform the action suspected of leaking
3. Take another Heap Snapshot
4. Select Snapshot 2, change view to "Comparison"
5. Sort by "# Delta" — positive deltas are new allocations
6. Look for:
- Detached DOM trees (elements removed from page but still referenced)
- Growing arrays or maps
- Event listeners not cleaned up
4.5 CSS debugging techniques
/* Outline all elements to see layout issues */
* { outline: 1px solid red !important; }
/* More detailed — color by nesting depth */
* { outline: 1px solid rgba(255, 0, 0, 0.3) !important; }
* * { outline: 1px solid rgba(0, 255, 0, 0.3) !important; }
* * * { outline: 1px solid rgba(0, 0, 255, 0.3) !important; }
/* Debug z-index stacking (note: forcing position: relative can itself shift the layout) */
* { position: relative; }
*::after {
content: attr(class);
position: absolute;
top: 0; left: 0;
font-size: 10px;
background: yellow;
z-index: 99999;
}
4.6 React DevTools profiler
1. Open React DevTools → Profiler tab
2. Click Record
3. Interact with the app
4. Click Stop
5. Analyze:
- Flame chart shows component render times
- Ranked chart shows slowest components
- "Why did this render?" shows trigger reasons
- Look for unnecessary re-renders (grey = didn't render)
5. Node.js / JavaScript Debugging
5.1 Inspect flag
# Start with debugger
node --inspect src/server.js
# Break on first line
node --inspect-brk src/server.js
# Then open chrome://inspect in Chrome and click "inspect"
5.2 VS Code launch.json
{
"version": "0.2.0",
"configurations": [
{
"name": "Debug Server",
"type": "node",
"request": "launch",
"program": "${workspaceFolder}/src/server.ts",
"runtimeExecutable": "tsx",
"console": "integratedTerminal",
"env": { "NODE_ENV": "development" }
},
{
"name": "Debug Tests",
"type": "node",
"request": "launch",
"program": "${workspaceFolder}/node_modules/vitest/vitest.mjs",
"args": ["run", "--reporter=verbose", "${file}"],
"console": "integratedTerminal"
},
{
"name": "Attach to Process",
"type": "node",
"request": "attach",
"port": 9229,
"restart": true
}
]
}
5.3 Memory leak hunting in Node.js
# Take heap snapshots programmatically
node --expose-gc -e "
const v8 = require('v8');
const fs = require('fs');
global.gc(); // Force GC before snapshot
const snapshot = v8.writeHeapSnapshot();
console.log('Snapshot written to:', snapshot);
"
# Monitor memory usage over time
node -e "
setInterval(() => {
const used = process.memoryUsage();
console.log(
'RSS:', (used.rss / 1024 / 1024).toFixed(1), 'MB',
'Heap:', (used.heapUsed / 1024 / 1024).toFixed(1), 'MB'
);
}, 5000);
"
5.4 Debugging async code
// Async stack traces are enabled by default since Node 12
// Errors thrown after an await include the async call chain
// Common async debugging pattern: add context to errors
async function processOrder(orderId: string) {
try {
const order = await fetchOrder(orderId);
const payment = await chargePayment(order);
return await fulfillOrder(order, payment);
} catch (error) {
// Wrap with context; `cause` (Node 16.9+) keeps the original error and its stack
throw new Error(`Failed to process order ${orderId}`, { cause: error });
}
}
5.5 Why is Node.js not exiting?
# Find what's keeping Node alive
npm install why-is-node-running
import why from "why-is-node-running";
// After your work is done, if the process doesn't exit:
setTimeout(() => {
why(); // Prints active handles keeping the process alive
}, 5000);
6. Python Debugging
6.1 Built-in debugger
# Insert breakpoint anywhere in code
def process_data(items):
    result = []
    for item in items:
        breakpoint()  # Drops into pdb (or ipdb if PYTHONBREAKPOINT=ipdb.set_trace)
        transformed = transform(item)
        result.append(transformed)
    return result
# pdb commands:
# n — next line (step over)
# s — step into function
# c — continue to next breakpoint
# p expr — print expression
# pp expr — pretty print
# l — show current code location
# w — show call stack
# u/d — move up/down the call stack
# b 42 — set breakpoint at line 42
# cl — clear all breakpoints
# q — quit debugger
6.2 ipdb (enhanced debugger)
pip install ipdb
# Use ipdb instead of pdb for better UX (tab completion, syntax highlighting)
import ipdb; ipdb.set_trace()
# Or set as default debugger
# In ~/.bashrc or environment:
# export PYTHONBREAKPOINT=ipdb.set_trace
6.3 py-spy — Production profiling
pip install py-spy
# Profile a running process (no restart needed!)
py-spy top --pid 12345
# Generate flame graph
py-spy record -o flamegraph.svg --pid 12345
# Profile a script
py-spy record -o flamegraph.svg -- python my_script.py
# Dump current stack traces
py-spy dump --pid 12345
6.4 Memory profiling
pip install memory-profiler
from memory_profiler import profile
@profile
def memory_intensive():
    data = [i ** 2 for i in range(1_000_000)]  # Watch memory spike here
    filtered = [x for x in data if x % 2 == 0]
    return len(filtered)
# Output shows per-line memory usage:
# Line # Mem usage Increment Line Contents
# 4 45.2 MiB 0.0 MiB @profile
# 5 45.2 MiB 0.0 MiB def memory_intensive():
# 6 83.5 MiB 38.3 MiB data = [i ** 2 for ...]
# 7 99.1 MiB 15.6 MiB filtered = [x for ...]
6.5 tracemalloc — Built-in memory tracking
import tracemalloc
tracemalloc.start()
# ... run your code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
print("Top 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)
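For leak hunting, comparing two snapshots is often more telling than a single one. A minimal sketch, assuming a hypothetical `handle_request` function that appends to a module-level cache without eviction:

```python
import tracemalloc

leaky_cache = []  # hypothetical module-level cache that is never evicted


def handle_request(payload: str) -> int:
    leaky_cache.append(payload * 1000)  # the "leak": grows on every call
    return len(payload)


tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(1000):
    handle_request(f"request-{i}")

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# A positive size_diff means memory allocated since the baseline and still alive
growth = after.compare_to(before, "lineno")
for stat in growth[:3]:
    print(stat)

total_growth = sum(stat.size_diff for stat in growth)
print(f"net growth: {total_growth / 1024 / 1024:.1f} MiB")
```

The top entry should point at the `leaky_cache.append` line, which is the kind of signal to look for in a real service.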
7. System-Level Debugging
7.1 strace — Trace system calls
# Trace a running process
strace -p 12345
# Trace a command from start
strace -f node server.js # -f follows child processes
# Only trace specific calls (modern libc usually opens files via openat)
strace -e trace=open,openat,read,write node server.js
# Trace network calls only
strace -e trace=network node server.js
# Count call statistics
strace -c node server.js
# Output to file with timestamps
strace -tt -o trace.log node server.js
Common findings:
open("/etc/resolv.conf", O_RDONLY) = -1 ENOENT ← DNS config missing
connect(3, {AF_INET, 10.0.0.5:5432}, 16) = -1 ETIMEDOUT ← DB unreachable
write(1, "Error: ENOSPC\n", 14) ← Disk full
7.2 Process inspection
# What files does a process have open?
lsof -p 12345
# What ports is a process listening on?
ss -tlnp | grep node
# Process resource usage
top -p 12345
# Or more detailed:
cat /proc/12345/status | grep -E "VmRSS|VmSize|Threads"
# File descriptors (detect fd leaks)
ls /proc/12345/fd | wc -l
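The same descriptor count can be taken from inside a process, which is handy for asserting "no fd leak" in a test. A minimal sketch, assuming a Unix-like system that exposes `/proc/self/fd` or `/dev/fd`; `count_open_fds` is a hypothetical helper:

```python
import os


def count_open_fds() -> int:
    """Count file descriptors open in the current process.

    Assumes a Unix-like system exposing /proc/self/fd (Linux) or /dev/fd.
    """
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


baseline = count_open_fds()

# Simulate a leak: open descriptors and "forget" to close them
leaked = [os.open(os.devnull, os.O_RDONLY) for _ in range(5)]
during_leak = count_open_fds() - baseline

for fd in leaked:
    os.close(fd)
after_cleanup = count_open_fds() - baseline

print(during_leak, after_cleanup)
```

A count that keeps climbing between identical requests is the signature of a descriptor leak.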
7.3 tcpdump — Network packet capture
# Capture traffic on port 5432 (PostgreSQL)
sudo tcpdump -i any port 5432 -w capture.pcap
# Capture HTTPS traffic to a specific host (payload is encrypted)
sudo tcpdump -i any host api.example.com and port 443
# Read captured packets
tcpdump -r capture.pcap
# Show packet contents as ASCII
sudo tcpdump -A -i any port 8080
# Count packets per source endpoint (IP.port) in a 1000-packet sample
sudo tcpdump -i any -c 1000 -nn 2>/dev/null | awk '{print $3}' | sort | uniq -c | sort -rn
8. Database Debugging
8.1 EXPLAIN ANALYZE
-- Always use ANALYZE to get actual execution times
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, count(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 20;
Reading the output:
Sort (cost=1234..1235 rows=20 width=40) (actual time=45.2..45.3 rows=20 loops=1)
Sort Key: (count(o.id)) DESC
Sort Method: top-N heapsort Memory: 27kB
-> HashAggregate (cost=1200..1220 rows=500 width=40) (actual time=44.1..44.5 rows=500 loops=1)
Group Key: u.id
-> Hash Right Join (cost=100..800 rows=5000 width=40) (actual time=2.1..30.5 rows=5000 loops=1)
-> Seq Scan on orders o (cost=0..500 rows=10000 width=16) (actual time=0.01..10.2 rows=10000 loops=1)
←── PROBLEM: Sequential scan on orders (missing index?)
Key things to look for:
- Seq Scan on large tables → missing index
- actual rows much larger than rows estimate → stale statistics (ANALYZE table)
- loops=1000 → N+1 query pattern
- Sort Method: external merge Disk → not enough work_mem
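The "scan versus index" signal can be reproduced in a few lines. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from the Python standard library as a stand-in for PostgreSQL, so the plan text differs from the output above; the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT id, total FROM orders WHERE user_id = ?"


def plan(sql: str) -> str:
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return "; ".join(row[-1] for row in rows)  # last column holds the plan detail


plan_before = plan(QUERY)
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = plan(QUERY)

print("before:", plan_before)  # a full-table SCAN
print("after: ", plan_after)   # a SEARCH using idx_orders_user_id
```

Comparing the plan before and after a schema change confirms that the planner actually uses the new index.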
8.2 Finding slow queries
-- PostgreSQL: enable slow query log
-- In postgresql.conf:
-- log_min_duration_statement = 100 (log queries > 100ms)
-- Find slow queries with pg_stat_statements
-- (column names below are for PostgreSQL 13+; older versions use total_time, mean_time, max_time)
SELECT
calls,
round(total_exec_time::numeric, 2) as total_ms,
round(mean_exec_time::numeric, 2) as mean_ms,
round(max_exec_time::numeric, 2) as max_ms,
left(query, 80) as query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
8.3 Lock debugging
-- Find blocked queries
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query,
now() - blocked.query_start AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid
JOIN pg_locks kl ON kl.locktype = bl.locktype
AND kl.database IS NOT DISTINCT FROM bl.database
AND kl.relation IS NOT DISTINCT FROM bl.relation
AND kl.pid != bl.pid
AND kl.granted
JOIN pg_stat_activity blocking ON blocking.pid = kl.pid
WHERE NOT bl.granted;
8.4 N+1 query detection
# Django: use django-debug-toolbar or nplusone
# pip install nplusone
INSTALLED_APPS = ['nplusone.ext.django', ...]
MIDDLEWARE = ['nplusone.ext.django.NPlusOneMiddleware', ...]
NPLUSONE_RAISE = True # Raise exception on N+1
# SQLAlchemy: enable echo to see all queries
engine = create_engine("postgresql://...", echo=True)
# Count queries in tests
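Counting queries in a test can be sketched with the standard library alone. This example uses SQLite's trace callback rather than Django or SQLAlchemy, so treat it as an illustration of the idea; `count_queries` and the schema are hypothetical.

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (name) VALUES ('ada'), ('bob'), ('cy');
    INSERT INTO orders (user_id) VALUES (1), (1), (2), (3);
""")


@contextmanager
def count_queries(connection):
    """Record each SELECT statement run on `connection` inside the with-block."""
    seen = []

    def record(sql):
        if sql.lstrip().upper().startswith("SELECT"):
            seen.append(sql)

    connection.set_trace_callback(record)
    try:
        yield seen
    finally:
        connection.set_trace_callback(None)


def order_counts_n_plus_one():
    users = conn.execute("SELECT id, name FROM users").fetchall()
    return {
        name: conn.execute(
            "SELECT count(*) FROM orders WHERE user_id = ?", (uid,)
        ).fetchone()[0]
        for uid, name in users  # one extra query per user
    }


def order_counts_joined():
    rows = conn.execute(
        "SELECT u.name, count(o.id) FROM users u "
        "LEFT JOIN orders o ON o.user_id = u.id GROUP BY u.id"
    )
    return dict(rows.fetchall())


with count_queries(conn) as slow:
    order_counts_n_plus_one()
with count_queries(conn) as fast:
    order_counts_joined()

print(len(slow), len(fast))  # 4 1
```

Asserting an upper bound on the query count in a test catches an N+1 regression before it reaches production.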
9. Network Debugging
9.1 curl deep dive
# Verbose output — see full request/response headers
curl -v https://api.example.com/health
# Show timing breakdown
curl -w "\
DNS: %{time_namelookup}s\n\
Connect: %{time_connect}s\n\
TLS: %{time_appconnect}s\n\
TTFB: %{time_starttransfer}s\n\
Total: %{time_total}s\n\
HTTP Code: %{http_code}\n\
Size: %{size_download} bytes\n" \
-o /dev/null -s https://api.example.com/health
# Test specific HTTP method with headers
curl -X POST https://api.example.com/data \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"key": "value"}' \
-w "\nHTTP %{http_code} in %{time_total}s\n"
# Follow redirects
curl -L -v https://example.com
# Ignore TLS errors (debugging only!)
curl -k https://self-signed.example.com
# Resolve to specific IP (bypass DNS)
curl --resolve api.example.com:443:10.0.0.5 https://api.example.com
9.2 DNS debugging
# Query DNS records
dig example.com A # IPv4 address
dig example.com AAAA # IPv6 address
dig example.com CNAME # Canonical name
dig example.com MX # Mail servers
dig example.com TXT # TXT records (SPF, DKIM, verification)
# Use specific DNS server
dig @8.8.8.8 example.com
# Trace full resolution path
dig +trace example.com
# Check reverse DNS
dig -x 93.184.216.34
# Quick check
nslookup example.com
host example.com
9.3 SSL/TLS debugging
# Check certificate chain
openssl s_client -connect example.com:443 -servername example.com < /dev/null
# Check certificate expiry
openssl s_client -connect example.com:443 -servername example.com < /dev/null 2>/dev/null | openssl x509 -noout -dates
# Check supported TLS versions
openssl s_client -connect example.com:443 -tls1_2 < /dev/null
openssl s_client -connect example.com:443 -tls1_3 < /dev/null
# Verify certificate chain
openssl verify -CAfile chain.pem server.pem
9.4 CORS debugging checklist
1. Check the browser console for the exact CORS error message
2. Verify the response includes:
- Access-Control-Allow-Origin: <your origin or *>
- Access-Control-Allow-Methods: <the method you're using>
- Access-Control-Allow-Headers: <any custom headers>
3. For preflight requests (OPTIONS):
- Is the server handling OPTIONS requests?
- Is it returning 200/204 for OPTIONS?
- Is Access-Control-Max-Age set for caching?
4. Common causes:
- Origin mismatch (http vs https, port difference, www vs non-www)
- Missing Access-Control-Allow-Credentials: true (for cookies)
- Wildcard (*) origin not allowed with credentials
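Parts of this checklist can be automated against captured response headers. The following is a rough sketch, not a full implementation of the CORS rules: `cors_problems` is a hypothetical helper, it skips the Access-Control-Allow-Headers check, and it only checks Access-Control-Allow-Methods for methods outside GET, HEAD, and POST.

```python
from typing import Dict, List

SAFELISTED_METHODS = {"GET", "HEAD", "POST"}  # allowed without being listed


def cors_problems(
    origin: str,
    method: str,
    response_headers: Dict[str, str],
    with_credentials: bool = False,
) -> List[str]:
    """Check response headers against the checklist and return the problems found."""
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    problems = []

    allow_origin = headers.get("access-control-allow-origin")
    if allow_origin is None:
        problems.append("missing Access-Control-Allow-Origin")
    elif allow_origin == "*" and with_credentials:
        problems.append("wildcard origin is not allowed with credentials")
    elif allow_origin not in ("*", origin):
        problems.append(f"origin mismatch: got {allow_origin!r}, expected {origin!r}")

    allowed = {
        m.strip().upper()
        for m in headers.get("access-control-allow-methods", "").split(",")
    }
    wildcard_ok = "*" in allowed and not with_credentials
    if method.upper() not in SAFELISTED_METHODS | allowed and not wildcard_ok:
        problems.append(f"method {method.upper()} not in Access-Control-Allow-Methods")

    if with_credentials and headers.get("access-control-allow-credentials") != "true":
        problems.append("missing Access-Control-Allow-Credentials: true")

    return problems


found = cors_problems(
    origin="https://app.example.com",
    method="PUT",
    response_headers={
        "Access-Control-Allow-Origin": "http://app.example.com",  # http vs https
        "Access-Control-Allow-Methods": "GET, POST",
    },
)
for problem in found:
    print(problem)
```

Feeding it the headers from `curl -v -X OPTIONS ...` gives a quick first pass before digging into server configuration.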
10. Memory Debugging
10.1 Common memory leak patterns
| Language | Common cause | Detection |
|---|---|---|
| JavaScript | Event listeners not removed | Heap snapshot comparison |
| JavaScript | Closures capturing large objects | Heap snapshot retainer tree |
| JavaScript | Detached DOM nodes | DevTools Memory → "Detached" filter |
| JavaScript | Growing Map/Set/Array (cache without eviction) | Monitor process.memoryUsage() |
| Python | Circular references with __del__ (uncollectable before Python 3.4) | gc.get_referrers(), objgraph |
| Python | Global/module-level caches | tracemalloc |
| Go | Goroutine leaks | runtime.NumGoroutine(), pprof |
| Go | Goroutines blocked forever on a channel | pprof goroutine profile, runtime.Stack() |
10.2 JavaScript memory leak debugging workflow
1. Open DevTools → Memory tab
2. Take Heap Snapshot #1 (baseline)
3. Perform the suspected leaking action 5-10 times
4. Force GC (click trash can icon)
5. Take Heap Snapshot #2
6. Select Snapshot #2 → "Objects allocated between Snapshot 1 and Snapshot 2"
7. Sort by "Retained Size" descending
8. Inspect the retainer tree to find what's holding references
10.3 Container OOM debugging
# Check if process was OOM killed
dmesg | grep -i "oom\|killed"
# Check container memory limits
docker stats container_name
# Check Kubernetes pod events
kubectl describe pod my-pod | grep -A5 "Events"
# Set memory limits with monitoring
docker run --memory=512m --memory-swap=512m my-app
# Profile in container
docker exec -it container_name node --expose-gc --max-old-space-size=256 app.js
11. Performance Profiling
11.1 Flame graphs
# Node.js — generate flame graph
node --prof app.js
# Process the log
node --prof-process isolate-*.log > processed.txt
# Better: use 0x for automatic flame graph
npx 0x app.js
# Open the generated HTML flame graph
# Python — py-spy flame graph
py-spy record -o flamegraph.svg -- python app.py
# Open flamegraph.svg in browser
# Linux — perf + flame graph
perf record -g -p $(pgrep my-app)
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
Reading flame graphs:
- X-axis = proportion of time (NOT chronological)
- Y-axis = call stack depth (bottom = entry point, top = leaf functions)
- Wide bars = functions consuming the most CPU time
- Look for "plateaus" — wide, flat tops indicate hot functions
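To make the axes concrete, the sketch below folds sampled call stacks into the "a;b;c count" form that tools such as stackcollapse-perf.pl emit, then computes each function's width and self time. The sample data and helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> Counter:
    """Collapse sampled stacks (entry point first) into 'a;b;c' -> sample count."""
    return Counter(";".join(stack) for stack in samples)


def frame_widths(folded: Counter) -> Dict[str, float]:
    """Fraction of samples in which each function appears anywhere on the stack."""
    total = sum(folded.values())
    seen = Counter()
    for stack, count in folded.items():
        for func in set(stack.split(";")):  # count recursion once per sample
            seen[func] += count
    return {func: n / total for func, n in seen.items()}


def self_time(folded: Counter) -> Dict[str, float]:
    """Fraction of samples in which each function is the leaf (on-CPU) frame."""
    total = sum(folded.values())
    leaves = Counter()
    for stack, count in folded.items():
        leaves[stack.split(";")[-1]] += count
    return {func: n / total for func, n in leaves.items()}


# Hypothetical profiler output: one stack per sample, entry point first
samples = (
    [["main", "handle", "parse_json"]] * 6
    + [["main", "handle", "render"]] * 3
    + [["main", "idle"]] * 1
)
folded = fold_stacks(samples)
print(frame_widths(folded)["handle"])   # 0.9
print(self_time(folded)["parse_json"])  # 0.6
```

Here `handle` is wide because of its children, while `parse_json` is the plateau: it is the leaf in 60% of samples, so that is where optimization pays off.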
11.2 Core Web Vitals debugging
| Metric | Target | How to debug |
|---|---|---|
| LCP (Largest Contentful Paint) | < 2.5s | DevTools → Performance → "LCP" marker; check image loading, font loading, render-blocking resources |
| INP (Interaction to Next Paint) | < 200ms | DevTools → Performance → click "Interactions"; look for long tasks blocking the main thread |
| CLS (Cumulative Layout Shift) | < 0.1 | DevTools → Performance → "Layout Shifts"; add explicit width/height to images and ads |
# Measure from command line
npx lighthouse https://example.com --output=html --output-path=report.html
# Core Web Vitals in JavaScript
import { onLCP, onINP, onCLS } from 'web-vitals';
onLCP(console.log);
onINP(console.log);
onCLS(console.log);
11.3 Load testing for debugging
# k6 — modern load testing
k6 run --vus 50 --duration 30s script.js
# wrk — simple HTTP benchmarking
wrk -t4 -c100 -d30s http://localhost:3000/api/users
# Apache Bench
ab -n 1000 -c 50 http://localhost:3000/api/health
12. Logging Strategies
12.1 Structured logging
// BAD: unstructured
console.log("User " + userId + " failed to login: " + error.message);
// GOOD: structured (JSON)
import pino from "pino";
const logger = pino({ level: "info" });
logger.error({
event: "login_failed",
userId,
error: error.message,
ip: request.ip,
userAgent: request.headers["user-agent"],
}, "Login failed for user");
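The same idea carries over to other stacks. A minimal sketch with Python's standard `logging` module, assuming a hypothetical convention where structured fields are passed as `extra={"fields": {...}}`:

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            # Fields passed via extra={"fields": {...}}; a convention of this sketch
            **getattr(record, "fields", {}),
        }
        return json.dumps(payload)


stream = io.StringIO()  # stand-in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("auth")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.error(
    "Login failed for user",
    extra={"fields": {"event": "login_failed", "userId": 123, "ip": "203.0.113.7"}},
)

entry = json.loads(stream.getvalue())
print(entry["event"], entry["userId"])  # login_failed 123
```

Because every line is valid JSON, a log pipeline can filter on `event` or `userId` without regex parsing.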
12.2 Log levels
| Level | When to use | Example |
|---|---|---|
| fatal | Application cannot continue | Database connection lost permanently |
| error | Operation failed, needs attention | Payment processing failed |
| warn | Unexpected but handled | Rate limit approaching threshold |
| info | Significant business events | User registered, order placed |
| debug | Detailed technical info | SQL query executed, cache hit/miss |
| trace | Very fine-grained | Function entry/exit, variable values |
12.3 Correlation IDs
// Generate a unique ID per request for tracing
import { randomUUID } from "crypto";
app.use((req, res, next) => {
req.requestId = req.headers["x-request-id"]?.toString() ?? randomUUID();
res.setHeader("x-request-id", req.requestId);
// Attach to all logs for this request
req.log = logger.child({ requestId: req.requestId });
next();
});
// Now all logs from this request are correlated
app.get("/api/orders", (req, res) => {
req.log.info({ userId: req.user.id }, "Fetching orders");
// ...
req.log.info({ count: orders.length }, "Orders fetched");
});
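In Python services, the same pattern is commonly built on `contextvars` so the ID follows the request across function calls without being passed explicitly. A minimal sketch, assuming the hypothetical names `request_id_var`, `RequestIdFilter`, and `handle_request`:

```python
import contextvars
import logging
import uuid
from typing import Optional

request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class ListHandler(logging.Handler):
    """Collect formatted lines in memory (stand-in for a real log sink)."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


sink = ListHandler()
sink.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
sink.addFilter(RequestIdFilter())

log = logging.getLogger("orders")
log.setLevel(logging.INFO)
log.addHandler(sink)
log.propagate = False


def handle_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's x-request-id if present, otherwise generate one."""
    request_id = incoming_id or str(uuid.uuid4())
    token = request_id_var.set(request_id)
    try:
        log.info("Fetching orders")
        log.info("Orders fetched")
    finally:
        request_id_var.reset(token)  # do not leak the ID into the next request
    return request_id


handle_request("req-123")
print(sink.lines)  # ['req-123 Fetching orders', 'req-123 Orders fetched']
```

Resetting the context variable in `finally` matters in worker pools, where the next request may run in the same context.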
12.4 OpenTelemetry basics
// Distributed tracing across services
import { trace, context, SpanStatusCode } from "@opentelemetry/api";
const tracer = trace.getTracer("order-service");
async function processOrder(orderId: string) {
return tracer.startActiveSpan("processOrder", async (span) => {
try {
span.setAttribute("order.id", orderId);
const order = await tracer.startActiveSpan("fetchOrder", async (childSpan) => {
try {
return await db.orders.findById(orderId);
} finally {
childSpan.end(); // End the span even if the lookup throws
}
});
await tracer.startActiveSpan("chargePayment", async (childSpan) => {
try {
childSpan.setAttribute("payment.amount", order.total);
await paymentService.charge(order);
} finally {
childSpan.end(); // End the span even if the charge fails
}
});
span.setStatus({ code: SpanStatusCode.OK });
} catch (error) {
span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
span.recordException(error);
throw error;
} finally {
span.end();
}
});
}
13. Debugging in Production
13.1 Debug without redeploying
// Feature flag for debug mode
const debugMode = await featureFlags.isEnabled("debug-orders", {
userId: currentUser.id,
});
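// Use a child logger so the debug level applies to this request only,
// instead of changing the level of the shared logger for every request
const log = debugMode ? logger.child({ debug: true }, { level: "debug" }) : logger;
log.debug({ order, paymentResult }, "Order processing debug info");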
13.2 Safe debug endpoints
// Secure debug endpoint (requires admin role + API key)
app.get("/debug/connections", requireAdmin, requireApiKey, async (req, res) => {
const pool = db.pool;
res.json({
total: pool.totalCount,
idle: pool.idleCount,
waiting: pool.waitingCount,
activeQueries: await getActiveQueries(),
});
});
13.3 Sentry error tracking
import * as Sentry from "@sentry/node";
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV,
tracesSampleRate: 0.1, // 10% of transactions
beforeSend(event) {
// Scrub PII
if (event.request?.headers) {
delete event.request.headers["authorization"];
delete event.request.headers["cookie"];
}
return event;
},
});
// Add context to errors
Sentry.setUser({ id: user.id, email: user.email });
Sentry.setTag("feature", "checkout");
Sentry.addBreadcrumb({
category: "payment",
message: `Charging ${amount} to card ending ${last4}`,
level: "info",
});
14. Common Pitfalls
| Pitfall | Symptom | Investigation Approach |
|---|---|---|
| Debugging the wrong environment | Fix works locally, not in staging | Compare env vars, Node versions, OS; diff the sorted printenv output from each environment |
| Stale code running | Changes seem to have no effect | Hard refresh (Ctrl+Shift+R); restart dev server; check build output timestamps |
| Caching hiding the bug | Bug appears intermittently | Disable all caches (browser, CDN, Redis, ORM query cache); test in incognito |
| Race condition | Bug only happens under load or "randomly" | Add logging with timestamps; use --inspect-brk to slow execution; test with concurrent requests |
| Timezone bug | Dates off by hours; works in some regions | Log new Date().toISOString() at each step; check DB timezone settings; use UTC everywhere |
| Encoding issue | Garbled text, emoji broken, special chars wrong | Check Content-Type headers; verify UTF-8 at every boundary (DB, API, file I/O) |
| Silent error swallowed | Code does nothing; no error visible | Search for empty catch blocks; add .catch(console.error) to all promises |
| Missing await | Function returns Promise instead of value | TypeScript strict mode; enable the @typescript-eslint/no-floating-promises lint rule; search for calls to async functions that lack await |
| Circular dependency | Module is undefined at import time | Check import order; use dynamic imports; restructure to break the cycle |
| DNS resolution failure | "ENOTFOUND" errors in containers | Check /etc/resolv.conf; verify DNS from inside the container with nslookup |
| Connection pool exhaustion | Timeouts after running fine for hours | Monitor active connections; check for uncommitted transactions; add pool max/idle settings |
| Off-by-one error | Wrong count, missing first/last item | Log array lengths and indices; test boundary values: 0, 1, N-1, N |
| Environment variable missing | undefined used as string, silent failures | Log all env vars on startup (redacted); use zod to validate env at boot |
| File descriptor leak | "EMFILE: too many open files" | `lsof -p PID` |
| Wrong dependency version | Code works in one project but not another | Check npm ls package-name; delete node_modules and reinstall; check for hoisting issues |
| Debugging minified code | Stack traces show line 1, column 43827 | Enable source maps; upload them to Sentry; disable minification in the build while debugging |
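For the "Environment variable missing" row, a fail-fast check at startup might look like the sketch below. The table suggests zod for Node projects; this shows the same idea with the Python standard library only. `load_env`, `redact`, and the variable names are hypothetical.

```python
import os
from typing import Dict, Mapping, Sequence


class MissingEnvError(RuntimeError):
    """Raised at startup when required configuration is absent."""


def load_env(
    required: Sequence[str],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return the required variables, failing fast if any is unset or empty."""
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise MissingEnvError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in required}


def redact(value: str, keep: int = 4) -> str:
    """Mask a value for startup logs, keeping only the last few characters."""
    if len(value) <= keep:
        return "*" * len(value)  # too short to reveal any part safely
    return "*" * (len(value) - keep) + value[-keep:]


# A fake environment keeps the example deterministic
config = load_env(["DATABASE_URL"], {"DATABASE_URL": "postgres://localhost/app"})
print({name: redact(value) for name, value in config.items()})
```

Failing at boot with a list of every missing name is easier to diagnose than an `undefined` that surfaces hours later inside a request handler.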