/verify-fix - Verify Incident Fix with Observables
MANDATORY verification step after any production incident fix.
Philosophy
A fix is just a hypothesis until proven by metrics. "That should fix it" is not verification.
When to Use
-
After applying ANY fix to a production incident
-
Before declaring an incident resolved
-
When someone says "I think that fixed it"
Verification Protocol
- Define Observable Success Criteria
Before testing, explicitly state what we expect to see:
SUCCESS CRITERIA:
- Log entry: "[specific log message]"
- Metric change: [metric] goes from [X] to [Y]
- Database state: [field] = [expected value]
- API response: [endpoint] returns [expected response]
- Trigger Test Event
For webhook issues:
stripe events resend [event_id] --webhook-endpoint [endpoint_id]
For API issues:
curl -X POST [endpoint] -d '[test payload]'
For auth issues:
Log in as test user, perform action
- Observe Results
Watch logs in real-time
vercel logs [app] --json | grep [pattern]
Or for Convex:
npx convex logs --prod | grep [pattern]
Check metrics
stripe events retrieve [event_id] | jq '.pending_webhooks'
- Verify Database State
Check the affected record
npx convex run --prod [query] '{"id": "[affected_id]"}'
- Document Evidence
VERIFICATION EVIDENCE:
- Timestamp: [when]
- Test performed: [what we did]
- Log entry observed: [paste relevant log]
- Metric before: [value]
- Metric after: [value]
- Database state confirmed: [yes/no]
VERDICT: [VERIFIED / NOT VERIFIED]
Red Flags (Fix NOT Verified)
-
"The code looks right now"
-
"The config is correct"
-
"It should work"
-
"Let's wait and see"
-
No log entry observed
-
Metrics unchanged
-
Can't reproduce the original symptom
Example: Webhook Fix Verification
1. Resend the failing event
stripe events resend evt_xxx --webhook-endpoint we_xxx
2. Watch logs (expect to see "Webhook received")
timeout 15 vercel logs app --json | grep webhook
3. Check delivery metric (expect decrease)
stripe events retrieve evt_xxx | jq '.pending_webhooks'
Before: 4, After: 3 = DELIVERY SUCCEEDED
4. Check database state
npx convex run --prod users/queries:getUserByClerkId '{"clerkId": "user_xxx"}'
Expect: subscriptionStatus = "active"
VERDICT: VERIFIED - all 4 checks passed
If Verification Fails
-
Don't panic - the fix hypothesis was wrong, that's okay
-
Revert if the fix made things worse
-
Loop back to observation phase (OODA-V)
-
Question assumptions - what did we miss?