<quick_start> Basic chat with GROQ:

```python
import os
from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # Best all-around
    messages=[{"role": "user", "content": prompt}],
)
```
Model selection:

| Use Case     | Model                                      |
|--------------|--------------------------------------------|
| General chat | llama-3.3-70b-versatile                    |
| Vision/OCR   | meta-llama/llama-4-scout-17b-16e-instruct  |
| STT          | whisper-large-v3 (GROQ-hosted, NOT OpenAI) |
| TTS          | playai-tts                                 |
</quick_start>
<success_criteria> GROQ integration is successful when:

- Correct model selected for use case (see model table)
- API key in environment variable (GROQ_API_KEY)
- Retry logic with tenacity for rate limits
- Streaming enabled for real-time applications
- Async patterns used for parallel queries
- NOT using OpenAI (constraint: NO OPENAI)

</success_criteria>
<core_content> Ultra-fast LLM inference for real-time applications. GROQ delivers 10-100x faster inference than standard providers.
Quick Reference: Model Selection
| Use Case     | Model ID                                      | Context | Notes                          |
|--------------|-----------------------------------------------|---------|--------------------------------|
| General Chat | llama-3.3-70b-versatile                       | 128K    | Best all-around                |
| Fast Chat    | llama-3.1-8b-instant                          | 128K    | Simple tasks, fastest          |
| Vision/OCR   | meta-llama/llama-4-scout-17b-16e-instruct     | 128K    | Up to 5 images                 |
| STT          | whisper-large-v3                              | 448     | GROQ-hosted (NOT OpenAI API)   |
| TTS          | playai-tts                                    | N/A     | Fritz-PlayAI voice             |
| Reasoning    | meta-llama/llama-4-maverick-17b-128e-instruct | 128K    | Thinking models                |
| Tool Use     | compound-beta                                 | N/A     | Built-in web search, code exec |
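If you select models programmatically, a small lookup keeps the table above in one place. This is a minimal sketch; the `MODELS` dict and `pick_model` helper are illustrative names, not part of the GROQ SDK:

```python
# Map use cases to the model IDs from the table above.
MODELS = {
    "chat": "llama-3.3-70b-versatile",
    "fast_chat": "llama-3.1-8b-instant",
    "vision": "meta-llama/llama-4-scout-17b-16e-instruct",
    "stt": "whisper-large-v3",
    "tts": "playai-tts",
    "reasoning": "meta-llama/llama-4-maverick-17b-128e-instruct",
    "tools": "compound-beta",
}

def pick_model(use_case: str) -> str:
    """Return the model ID for a use case, defaulting to general chat."""
    return MODELS.get(use_case, MODELS["chat"])
```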
Core Patterns
1. Chat Completion (Basic + Streaming)
```python
import os
from groq import Groq, AsyncGroq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

def chat(prompt: str, system: str = "You are helpful.") -> str:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
        max_completion_tokens=1024,
    )
    return response.choices[0].message.content
```
Streaming
```python
def stream_chat(prompt: str):
    stream = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```
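A typical consumer prints tokens as they arrive; the prompt below is just an example:

```python
# Print tokens incrementally for a real-time feel.
for token in stream_chat("Explain LPU inference in one paragraph."):
    print(token, end="", flush=True)
print()
```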
2. Vision / Multimodal
```python
import base64

def analyze_image(image_path: str, prompt: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
URL-based: pass {"url": "https://..."} directly instead of a base64 data URL, as in the sketch below.
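A minimal sketch of the URL-based variant, identical to the function above except for the image source (`analyze_image_url` is an illustrative name):

```python
def analyze_image_url(image_url: str, prompt: str) -> str:
    # Same request shape as above, but the URL points at a hosted image
    # instead of an embedded base64 data URL.
    response = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```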
3. Audio: Speech-to-Text (GROQ-Hosted Whisper)
Note: Whisper on GROQ runs on GROQ hardware - it is NOT a call to OpenAI's API. Whisper is an open-source model that GROQ hosts for fast inference.
```python
def transcribe(audio_path: str, language: str = "en") -> str:
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=f,
            model="whisper-large-v3",  # GROQ-hosted, not OpenAI API
            language=language,
            response_format="verbose_json",  # Includes timestamps
        )
    return result.text

def translate_to_english(audio_path: str) -> str:
    with open(audio_path, "rb") as f:
        result = client.audio.translations.create(file=f, model="whisper-large-v3")
    return result.text
```
Alternative STT Providers (if you prefer non-Whisper options):

- Deepgram - Real-time streaming, lowest latency (pip install deepgram-sdk)
- AssemblyAI - High accuracy, speaker diarization (pip install assemblyai)
- See voice-ai-skill for Deepgram/AssemblyAI integration patterns
4. Audio: Text-to-Speech (PlayAI)
```python
def text_to_speech(text: str, output_path: str = "output.wav"):
    response = client.audio.speech.create(
        model="playai-tts",
        voice="Fritz-PlayAI",  # Also: Arista-PlayAI
        input=text,
        response_format="wav",
    )
    response.write_to_file(output_path)
```
Streaming TTS
```python
def stream_tts(text: str):
    with client.audio.speech.with_streaming_response.create(
        model="playai-tts",
        voice="Fritz-PlayAI",
        input=text,
        response_format="wav",
    ) as response:
        for chunk in response.iter_bytes(1024):
            yield chunk
```
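A consumer can write the streamed bytes to disk (or hand them to an audio pipeline) as they arrive; a minimal sketch with a placeholder filename:

```python
# Append streamed audio chunks to a file as they arrive.
with open("streamed.wav", "wb") as out:
    for chunk in stream_tts("Hello from GROQ."):
        out.write(chunk)
```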
Alternative TTS Providers (beyond GROQ's PlayAI):

- Cartesia - Ultra-low latency, emotional control (pip install cartesia)
- ElevenLabs - Most natural voices, voice cloning (pip install elevenlabs)
- Deepgram - Fast, cost-effective (pip install deepgram-sdk)
- See voice-ai-skill for Cartesia/ElevenLabs/Deepgram TTS integration patterns
5. Tool Use / Function Calling
```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

def chat_with_tools(prompt: str):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = response.choices[0].message

    if msg.tool_calls:
        messages.append(msg)  # append the assistant turn once, then one tool result per call
        for tc in msg.tool_calls:
            # execute_function is your own dispatcher; see the sketch below.
            result = execute_function(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)})
        return client.chat.completions.create(
            model="llama-3.3-70b-versatile", messages=messages, tools=tools
        ).choices[0].message.content
    return msg.content
```
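`execute_function` above is not part of the GROQ SDK; it is your own dispatcher from tool name to local implementation. A minimal sketch, with a hypothetical `get_weather` stub:

```python
def get_weather(location: str) -> dict:
    # Hypothetical stub; replace with a real weather lookup.
    return {"location": location, "temp_c": 21, "condition": "sunny"}

FUNCTIONS = {"get_weather": get_weather}

def execute_function(name: str, args: dict):
    """Dispatch a tool call to the matching local function."""
    return FUNCTIONS[name](**args)
```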
6. Compound Beta (Built-in Web Search + Code Exec)
```python
def compound_query(prompt: str):
    """Built-in tools: web_search, code_execution."""
    response = client.chat.completions.create(
        model="compound-beta",
        messages=[{"role": "user", "content": prompt}],
    )
    msg = response.choices[0].message
    # Access msg.executed_tools for tool results
    return msg.content
```
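To see which built-in tools actually ran, inspect `executed_tools` on the message, per the comment above. A sketch, assuming the attribute is present on compound-beta responses (guarded with `getattr` in case it is absent):

```python
response = client.chat.completions.create(
    model="compound-beta",
    messages=[{"role": "user", "content": "What is the latest GROQ model lineup?"}],
)
msg = response.choices[0].message
# executed_tools lists the built-in tool invocations (web search, code exec).
for tool in getattr(msg, "executed_tools", None) or []:
    print(tool)
print(msg.content)
```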
7. Reasoning Models
```python
def reasoning_query(prompt: str, format: str = "parsed"):
    """format: 'parsed' (structured), 'raw' (visible), 'hidden' (no thinking)"""
    response = client.chat.completions.create(
        model="meta-llama/llama-4-maverick-17b-128e-instruct",
        messages=[{"role": "user", "content": prompt}],
        reasoning_format=format,
    )
    msg = response.choices[0].message
    if format == "parsed" and hasattr(msg, "reasoning"):
        return {"thinking": msg.reasoning, "answer": msg.content}
    return msg.content
```
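Typical usage keeps the parsed thinking separate from the final answer; note the function returns a dict only in parsed mode:

```python
result = reasoning_query("A bat and ball cost $1.10 total; the bat costs $1 more than the ball. What does the ball cost?")
if isinstance(result, dict):
    print("Thinking:", result["thinking"])
    print("Answer:", result["answer"])
else:
    print(result)
```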
8. Async Patterns
```python
import asyncio

async_client = AsyncGroq(api_key=os.environ.get("GROQ_API_KEY"))

async def async_chat(prompt: str) -> str:
    response = await async_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def parallel_queries(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*[async_chat(p) for p in prompts])
```
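Driving this from synchronous code is one `asyncio.run` away; the prompts are placeholders:

```python
# Fan out two queries concurrently and collect the answers in order.
answers = asyncio.run(parallel_queries([
    "Summarize LPU inference in one sentence.",
    "Name three use cases for llama-3.3-70b-versatile.",
]))
for a in answers:
    print(a)
```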
Rate Limits
| Tier | Requests/min | Tokens/min | Tokens/day |
|------|--------------|------------|------------|
| Free | 30           | 15,000     | 500,000    |
| Paid | 100+         | 100,000+   | Unlimited  |
```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def reliable_chat(prompt: str) -> str:
    return chat(prompt)
```
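To avoid retrying genuine failures (bad requests, auth errors), you can restrict retries to rate-limit responses. A sketch, assuming the SDK exports `RateLimitError` as recent groq versions do:

```python
import groq
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(groq.RateLimitError),  # only retry 429s
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=10),
)
def rate_limited_chat(prompt: str) -> str:
    return chat(prompt)
```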
Integration Notes
- Pairs with: voice-ai-skill (Whisper STT + PlayAI TTS), langgraph-agents-skill
- Complements: trading-signals-skill (fast analysis), data-analysis-skill
- Projects: VozLux (voice agents), FieldVault-AI (document processing)
- Constraint: NO OPENAI - GROQ is the fast inference layer
Environment Variables
```bash
GROQ_API_KEY=gsk_...   # Required - get from console.groq.com

# Optional multi-provider
ANTHROPIC_API_KEY=     # Claude for complex reasoning
GOOGLE_API_KEY=        # Gemini fallback
```
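A fail-fast check at startup saves debugging opaque auth errors later; a minimal sketch:

```python
import os

# Fail fast if the required key is missing.
if not os.environ.get("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; get a key from console.groq.com")
```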
Reference Files
- reference/models-catalog.md - Complete model catalog with specs
- reference/audio-speech.md - Whisper STT and PlayAI TTS deep dive
- reference/vision-multimodal.md - Multimodal and image processing
- reference/tool-use-patterns.md - Function calling and Compound Beta
- reference/reasoning-models.md - Thinking models and reasoning_format
- reference/cost-optimization.md - Batch API, caching, provider routing