Token Cost Skill
Use this skill to calculate or review token billing without double-counting cached tokens or mixing incompatible pricing units.
Core Rules
- Identify the billing mode before calculating:
openai-raw: provider price table has input, cached input, and output prices.newapi-quota: gateway billing uses model ratio, completion/output ratio, and group ratio.custom-multiplier: dashboard-specific per-1K/per-1M cost projection.
- Normalize price units first:
/1Kprices use divisor1000./1Mprices use divisor1000000.
- Treat cached input as a subset of input tokens, not an extra token bucket.
- Keep quota formulas separate from money formulas; only convert quota to money after quota is computed.
- Show the expanded formula with the final result so users can audit every multiplier.
OpenAI-Style Raw Pricing
Inputs:
inputTokens: total input tokens reported for the request.cachedInputTokens: cached input tokens insideinputTokens.outputTokens: output tokens.inputPrice: non-cached input price per selected unit.cachedInputPrice: cached input price per selected unit.outputPrice: output price per selected unit.
Formula:
unitDivisor = 1000 if priceUnit = "1K", else 1000000
cachedInputTokens = min(max(cachedInputTokens, 0), max(inputTokens, 0))
nonCachedInputTokens = max(inputTokens, 0) - cachedInputTokens
inputCost = (nonCachedInputTokens / unitDivisor) * inputPrice
cachedInputCost = (cachedInputTokens / unitDivisor) * cachedInputPrice
outputCost = (max(outputTokens, 0) / unitDivisor) * outputPrice
totalCost = inputCost + cachedInputCost + outputCost
totalTokens = max(inputTokens, 0) + max(outputTokens, 0)
Never use:
totalTokens = inputTokens + cachedInputTokens + outputTokens
cachedInputTokens is already included in inputTokens.
NewAPI-Compatible Quota Pricing
Use this mode when the gateway bills quota/credits by ratios.
Standard request quota formula:
quota = (promptTokens + completionTokens * completionRatio) * modelRatio * groupRatio
Common money conversion:
usdEquivalent = quota / 500000
actualCost = usdEquivalent / rechargeRatio
Notes:
completionRatiois also called output ratio, completion multiplier, or completion billing ratio.groupRatiois the group multiplier.modelRatiois the model multiplier.rechargeRatiois not part of the quota formula; it is a post-processing conversion for real paid cost.
Custom Multiplier Dashboards
Use only when a dashboard intentionally projects per-1K or per-1M prices instead of computing gateway quota.
inputCost1K = basePrice1K * modelMultiplier * groupMultiplier / rechargeRatio
outputCost1K = basePrice1K * modelMultiplier * outputMultiplier * groupMultiplier / rechargeRatio
cacheReadCost1K = basePrice1K * modelMultiplier * cacheReadMultiplier * groupMultiplier / rechargeRatio
cacheCreateCost1K = basePrice1K * modelMultiplier * cacheCreateMultiplier * groupMultiplier / rechargeRatio
If basePrice is entered per 1M, convert first:
basePrice1K = basePrice1M / 1000
Verification Checklist
- Confirm whether token counts are request usage counts or price table units.
- Confirm whether
cachedInputTokensis included insideinputTokens. - Confirm whether prices are per
1Kor per1M. - For OpenAI-style pricing, calculate non-cached input before calculating cost.
- For NewAPI-style pricing, calculate quota before converting to USD or local currency.
- Clamp negative token counts to
0. - Clamp cached input to no more than total input.
- Protect against division by zero in recharge ratio.
- Add tests for one normal case, one cached-input case, one unit-conversion case, and one zero-token case.
Common Mistakes
| Mistake | Correct handling |
|---|---|
| Adding cached input to total input | Subtract cached input from input to get non-cached input |
Mixing /1K and /1M prices | Normalize with the correct divisor first |
| Applying recharge ratio inside NewAPI quota | Compute quota first; apply recharge ratio only to money conversion |
| Using output price for cached input | Cached input uses its own price |
| Letting cached input exceed input | Clamp cached input to input tokens |
Reference
Read references/formulas.md when you need source notes, examples, or implementation snippets.