Groq Multi-Environment Setup
Overview
Configure Groq across environments with the right balance of cost, speed, and capability per tier. Groq's key differentiator is inference speed (100-300 tokens/second), but rate limits differ dramatically by plan: free tier is 30 RPM / 14,400 RPD for llama-3.1-70b, while paid tier removes most limits.
Prerequisites
- Groq API key(s) per environment from console.groq.com
- Environment variable management (.env.local, GitHub Secrets, or cloud secret manager)
- Understanding of Groq's model tiers and rate limits
Environment Strategy
| Environment | Model | Rate Limit Risk | Config Source |
| --- | --- | --- | --- |
| Development | llama-3.1-8b-instant | Low (small model) | .env.local |
| Staging | llama-3.1-70b-versatile | Medium | CI/CD secrets |
| Production | llama-3.1-70b-versatile or llama-3.3-70b-specdec | Managed with retry | Secret manager |
Instructions
Step 1: Configuration Structure
```
config/
  groq/
    base.ts          # Shared Groq client setup
    development.ts   # Dev: fast small models, verbose logging
    staging.ts       # Staging: production models, test rate limits
    production.ts    # Prod: hardened retry, error handling
    index.ts         # Environment resolver
```
Step 2: Base Configuration with Groq SDK
```typescript
// config/groq/base.ts
export const BASE_GROQ_CONFIG = {
  maxRetries: 3,
  timeout: 30000, // 30 seconds in ms
};
```
Step 3: Environment-Specific Configs
```typescript
// config/groq/development.ts
import { BASE_GROQ_CONFIG } from "./base";

export const devConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY,
  model: "llama-3.1-8b-instant", // fastest, cheapest for dev iteration
  maxTokens: 1024,
  temperature: 0.7,
  logRequests: true, // verbose logging in dev
};
```

```typescript
// config/groq/staging.ts
import { BASE_GROQ_CONFIG } from "./base";

export const stagingConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY_STAGING,
  model: "llama-3.1-70b-versatile", // match production model
  maxTokens: 4096,
  temperature: 0.3,
  logRequests: false,
};
```

```typescript
// config/groq/production.ts
import { BASE_GROQ_CONFIG } from "./base";

export const productionConfig = {
  ...BASE_GROQ_CONFIG,
  apiKey: process.env.GROQ_API_KEY_PROD,
  model: "llama-3.1-70b-versatile", // or llama-3.3-70b-specdec for speed
  maxTokens: 4096,
  temperature: 0.3,
  maxRetries: 5, // more retries for production reliability
  logRequests: false,
};
```
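The three configs above share the same shape. A small interface makes that explicit so the resolver in the next step can index them uniformly, and lets you fail fast on bad values at startup. This is a sketch: the `GroqEnvConfig` and `validateConfig` names are this guide's own, not part of the Groq SDK.

```typescript
// Hypothetical shared shape for the per-environment configs above.
// Adjust field names to match your project; none come from groq-sdk.
export interface GroqEnvConfig {
  apiKey: string | undefined; // undefined until the env var is set
  model: string;
  maxTokens: number;
  temperature: number;
  maxRetries: number;
  timeout: number;
  logRequests: boolean;
}

// Fail fast at boot instead of on the first API call.
export function validateConfig(cfg: GroqEnvConfig, env: string): void {
  if (!cfg.apiKey) {
    throw new Error(`GROQ_API_KEY not configured for ${env} environment`);
  }
  if (cfg.maxTokens <= 0 || cfg.temperature < 0 || cfg.temperature > 2) {
    throw new Error(`Invalid model parameters for ${env} environment`);
  }
}
```

Calling `validateConfig` once at process start surfaces a missing secret immediately, which is easier to debug than a 401 deep inside request handling.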
Step 4: Environment Resolver with Groq Client
```typescript
// config/groq/index.ts
import Groq from "groq-sdk";
import { devConfig } from "./development";
import { stagingConfig } from "./staging";
import { productionConfig } from "./production";

type Env = "development" | "staging" | "production";

function detectEnvironment(): Env {
  const env = process.env.NODE_ENV || "development";
  if (env === "production") return "production";
  if (env === "staging") return "staging";
  return "development";
}

let _client: Groq | null = null;

export function getGroqClient(): Groq {
  if (_client) return _client;

  const env = detectEnvironment();
  const configs = {
    development: devConfig,
    staging: stagingConfig,
    production: productionConfig,
  };
  const config = configs[env];

  if (!config.apiKey) {
    throw new Error(`GROQ_API_KEY not configured for ${env} environment`);
  }

  _client = new Groq({
    apiKey: config.apiKey,
    maxRetries: config.maxRetries,
    timeout: config.timeout,
  });
  return _client;
}

export function getModelConfig() {
  const env = detectEnvironment();
  const configs = {
    development: devConfig,
    staging: stagingConfig,
    production: productionConfig,
  };
  return configs[env];
}
```
Step 5: Usage with Rate Limit Handling
```typescript
// lib/groq-service.ts
import { getGroqClient, getModelConfig } from "../config/groq";

export async function complete(prompt: string): Promise<string> {
  const groq = getGroqClient();
  const { model, maxTokens, temperature } = getModelConfig();

  try {
    const completion = await groq.chat.completions.create({
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: maxTokens,
      temperature,
    });
    return completion.choices[0].message.content || "";
  } catch (err: any) {
    if (err.status === 429) { // HTTP 429 Too Many Requests
      const retryAfter = parseInt(err.headers?.["retry-after"] || "10", 10);
      console.warn(`Groq rate limited. Retry after ${retryAfter}s`);
      throw new Error(`Rate limited on model ${model}. Retry after ${retryAfter}s`);
    }
    throw err;
  }
}
```
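The service above surfaces 429s to the caller. In production you usually want to retry with backoff instead of failing outright. A minimal sketch of a generic wrapper follows; the `withRetry` name is this guide's own, and it assumes the same `err.status` error shape checked above.

```typescript
// Hypothetical retry helper: retries an async call on HTTP 429 with
// exponential backoff (500ms, 1000ms, 2000ms, ...). Other errors rethrow.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      lastErr = err;
      // Only retry rate limits, and only while attempts remain.
      if (err?.status !== 429 || attempt === maxAttempts - 1) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}
```

Usage would look like `withRetry(() => complete(prompt))`. A refinement worth considering: prefer the server's `retry-after` header over the computed delay when it is present.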
Error Handling
| Issue | Cause | Solution |
| --- | --- | --- |
| 401 Unauthorized | Invalid API key for environment | Verify GROQ_API_KEY in secret manager |
| 429 rate_limit_exceeded | Free tier limit hit | Switch to a paid plan or implement request queuing |
| Model not found | Deprecated model ID | Check console.groq.com/docs/models for current list |
| Slow responses in dev | Using 70b model for iteration | Switch dev config to llama-3.1-8b-instant |
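For the free-tier 429 case, "request queuing" can be as simple as serializing calls with a minimum gap: 30 RPM allows roughly one request every two seconds. A hypothetical sketch (the `RequestQueue` class is this guide's own, not a Groq SDK feature):

```typescript
// Hypothetical serial queue: at most one request in flight, with a
// minimum gap between requests to stay under a per-minute limit.
class RequestQueue {
  private tail: Promise<unknown> = Promise.resolve();

  constructor(private minGapMs = 2000) {} // 30 RPM => ~1 request per 2s

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Each task starts only after the previous one (plus its gap) settles.
    const result = this.tail.then(task);
    this.tail = result
      .catch(() => undefined) // a failed task must not stall the queue
      .then(() => new Promise((resolve) => setTimeout(resolve, this.minGapMs)));
    return result;
  }
}
```

Usage: create one queue per API key and route every `complete()` call through `queue.enqueue(() => complete(prompt))`. A token-bucket limiter is a better fit if you also need to cap tokens per minute.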
Examples
Check Which Config Is Active
```typescript
import { getModelConfig } from "./config/groq";

const cfg = getModelConfig();
console.log(`Model: ${cfg.model}, max_tokens: ${cfg.maxTokens}`);
```
Test Rate Limits Per Environment
```bash
set -euo pipefail

# Quick sanity check: does this key authenticate, and which model IDs can it see?
curl -s "https://api.groq.com/openai/v1/models" \
  -H "Authorization: Bearer $GROQ_API_KEY" | jq '.data[].id'
```
Resources
- Groq API Documentation
- Groq Models Reference
- Groq Rate Limits by Tier
Next Steps
For deployment configuration, see groq-deploy-integration.
Output
- Configuration files or code changes applied to the project
- Validation report confirming correct implementation
- Summary of changes made and their rationale