Rate Limit Guard
Before Expensive Calls
Check the provider and model, recent failures, current concurrency, and context size.
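A minimal sketch of this pre-flight check. The CallContext fields mirror the four items above; the class name, the preflight function, and every threshold are hypothetical placeholders, not values from any provider.

```python
from dataclasses import dataclass


@dataclass
class CallContext:
    provider: str
    model: str
    recent_failures: int  # failures seen in a recent window (window chosen by the caller)
    concurrency: int      # requests currently in flight
    context_tokens: int   # estimated size of the request context


def preflight(ctx, max_failures=2, max_concurrency=4, max_context_tokens=32_000):
    """Return the list of flagged concerns; an empty list means nothing was flagged.

    All three limits are illustrative assumptions and should be tuned per provider/model.
    """
    concerns = []
    if ctx.recent_failures > max_failures:
        concerns.append("recent failures")
    if ctx.concurrency > max_concurrency:
        concerns.append("concurrency")
    if ctx.context_tokens > max_context_tokens:
        concerns.append("context size")
    return concerns
```

An empty result suggests the call can go ahead; anything else is a reason to slow down or shrink the request first.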
On 429 (Too Many Requests)
- Set concurrency to 1.
- Stop parallel retries.
- Shrink context.
- Send one minimal probe.
- Back off exponentially before the next attempt.
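The recovery steps above can be sketched roughly as follows. RateLimited, backoff_delays, and recover are hypothetical names; the default delays and cap are assumptions. The loop is strictly sequential, which stands in for "concurrency 1, no parallel retries", and send_probe is assumed to be a caller-supplied function that sends one small request with an already-shrunk context.

```python
import time


class RateLimited(Exception):
    """Hypothetical error that a client wrapper raises on HTTP 429."""


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5):
    """Exponential delays: base, base*factor, base*factor**2, ... each capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def recover(send_probe, delays, sleep=time.sleep):
    """Probe one request at a time until one succeeds or the delays run out.

    Returns True if a probe got through, False if every attempt was rate limited.
    """
    for delay in delays:
        sleep(delay)       # wait first: a 429 was just received
        try:
            send_probe()   # one minimal request, never in parallel
            return True
        except RateLimited:
            continue       # still limited: move on to the next, longer delay
    return False
```

Passing sleep as a parameter keeps the sketch testable without real waiting. If the provider returns a Retry-After header, honoring it instead of the computed delay is usually the safer choice.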
Rules
- Never retry a giant request unchanged.
- Never fallback-spam premium providers: do not cycle a failing request through one paid fallback after another.
- Batch corpus work in 10–20 item chunks.
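The batching rule can be illustrated with a small helper. The name chunked is hypothetical, and the default size of 10 is simply the low end of the 10–20 range given above.

```python
def chunked(items, size=10):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Processing one chunk at a time keeps each request small, so a 429 in the middle of a corpus costs at most one chunk of rework.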
Output
Next action: proceed / recovery mode / pause.
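One possible way to produce this output line. The three values come from the text above; the mapping from conditions to values is an assumption, since the note does not say when to pause versus stay in recovery mode. NextAction, next_action, and report are hypothetical names.

```python
from enum import Enum


class NextAction(Enum):
    PROCEED = "proceed"
    RECOVERY_MODE = "recovery mode"
    PAUSE = "pause"


def next_action(saw_429, recovery_exhausted=False):
    """Pick the next action (assumed mapping, not defined by the note).

    - no 429 seen: proceed
    - 429 seen, recovery attempts remain: recovery mode
    - 429 seen, recovery attempts used up: pause
    """
    if not saw_429:
        return NextAction.PROCEED
    if recovery_exhausted:
        return NextAction.PAUSE
    return NextAction.RECOVERY_MODE


def report(action):
    """Format the decision in the shape used by the Output line."""
    return f"Next action: {action.value}"
```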