groq-inference

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "groq-inference" with this command: npx skills add scientiacapital/skills/scientiacapital-skills-groq-inference

<quick_start> Basic chat with GROQ:

import os
from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# prompt is the user's question as a str
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # Best all-around
    messages=[{"role": "user", "content": prompt}],
)

Model selection:

| Use Case | Model |
|----------|-------|
| General chat | llama-3.3-70b-versatile |
| Vision/OCR | meta-llama/llama-4-scout-17b-16e-instruct |
| STT | whisper-large-v3 (GROQ-hosted, NOT OpenAI) |
| TTS | playai-tts |

</quick_start>

<success_criteria> GROQ integration is successful when:

  • Correct model selected for use case (see model table)

  • API key in environment variable (GROQ_API_KEY)

  • Retry logic with tenacity for rate limits

  • Streaming enabled for real-time applications

  • Async patterns used for parallel queries

  • NOT using OpenAI (constraint: NO OPENAI) </success_criteria>

<core_content> Ultra-fast LLM inference for real-time applications. GROQ delivers 10-100x faster inference than standard providers.

Quick Reference: Model Selection

| Use Case | Model ID | Context | Notes |
|----------|----------|---------|-------|
| General Chat | llama-3.3-70b-versatile | 128K | Best all-around |
| Fast Chat | llama-3.1-8b-instant | 128K | Simple tasks, fastest |
| Vision/OCR | meta-llama/llama-4-scout-17b-16e-instruct | 128K | Up to 5 images |
| STT | whisper-large-v3 | 448 | GROQ-hosted (NOT OpenAI API) |
| TTS | playai-tts | - | Fritz-PlayAI voice |
| Reasoning | meta-llama/llama-4-maverick-17b-128e-instruct | 128K | Thinking models |
| Tool Use | compound-beta | - | Built-in web search, code exec |
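The table can also live in code as a small lookup, so call sites name a use case rather than a raw model ID. This is a minimal sketch: `MODELS` and `pick_model` are hypothetical names, not part of the groq SDK, and the IDs are copied from the table as-is.

```python
# Model IDs copied from the table above; the keys are illustrative only.
MODELS = {
    "general_chat": "llama-3.3-70b-versatile",
    "fast_chat": "llama-3.1-8b-instant",
    "vision": "meta-llama/llama-4-scout-17b-16e-instruct",
    "stt": "whisper-large-v3",
    "tts": "playai-tts",
    "reasoning": "meta-llama/llama-4-maverick-17b-128e-instruct",
    "tool_use": "compound-beta",
}


def pick_model(use_case: str) -> str:
    """Return the model ID for a use case, failing loudly on unknown keys."""
    try:
        return MODELS[use_case]
    except KeyError:
        known = ", ".join(sorted(MODELS))
        raise ValueError(
            f"Unknown use case {use_case!r}; expected one of: {known}"
        ) from None
```

Usage would look like `model=pick_model("vision")` in any of the calls below, which keeps model changes in one place.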

Core Patterns

  1. Chat Completion (Basic + Streaming)

import os
from groq import Groq, AsyncGroq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

def chat(prompt: str, system: str = "You are helpful.") -> str:
    """Send one prompt and return the full reply text."""
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
        max_completion_tokens=1024,
    )
    return response.choices[0].message.content

# Streaming
def stream_chat(prompt: str):
    """Yield reply text piece by piece as it arrives."""
    stream = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Skip chunks whose delta carries no text
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
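A generator like `stream_chat` is usually consumed in a loop that shows each piece immediately while also keeping the full reply. The sketch below assumes only that the stream yields strings. `consume_stream` is a hypothetical helper, and `fake_stream` stands in for `stream_chat(prompt)` so the example runs without an API key.

```python
import io
import sys
from typing import Iterable, TextIO


def consume_stream(pieces: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each piece as soon as it arrives and return the joined text."""
    collected = []
    for piece in pieces:
        out.write(piece)
        out.flush()  # push partial output to the terminal right away
        collected.append(piece)
    return "".join(collected)


def fake_stream():
    """Stand-in for stream_chat(prompt); yields text in small pieces."""
    yield from ["Hel", "lo", ", ", "world"]


buffer = io.StringIO()
full_text = consume_stream(fake_stream(), out=buffer)
```

In a real application the call would be `consume_stream(stream_chat(prompt))`, which prints to the terminal by default.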

  2. Vision / Multimodal

import base64

def analyze_image(image_path: str, prompt: str) -> str:
    """Send a local image plus a text prompt to the vision model."""
    with open(image_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# URL-based: just pass {"url": "https://..."} instead of base64
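The base64 example hardcodes `image/jpeg` in the data URL. For PNG or WebP inputs, the media type can be derived from the file extension instead. This sketch uses only the standard library. `to_data_url` is a hypothetical helper, and falling back to `image/jpeg` for unrecognized extensions is an assumption, not documented GROQ behavior.

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Encode a local image as a data URL, guessing the media type from the extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"  # assumption: fall back to the type used in the original example
    with open(image_path, "rb") as f:
        payload = base64.standard_b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{payload}"
```

The result can be passed as the `url` value of the `image_url` content part.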

  3. Audio: Speech-to-Text (GROQ-Hosted Whisper)

Note: Whisper on GROQ runs on GROQ hardware - NOT calling OpenAI's API. Whisper is an open-source model that GROQ hosts for fast inference.

def transcribe(audio_path: str, language: str = "en") -> str:
    """Transcribe an audio file in the given language."""
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=f,
            model="whisper-large-v3",  # GROQ-hosted, not OpenAI API
            language=language,
            response_format="verbose_json",  # Includes timestamps
        )
    return result.text

def translate_to_english(audio_path: str) -> str:
    """Translate speech in any supported language into English text."""
    with open(audio_path, "rb") as f:
        result = client.audio.translations.create(file=f, model="whisper-large-v3")
    return result.text
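With `response_format="verbose_json"`, Whisper-style responses usually carry a list of segments with start and end times in seconds. The sketch below formats such segments as timestamped lines. It works on plain dicts, and the `start`/`end`/`text` keys follow Whisper's verbose JSON layout. That is an assumption about what the GROQ response exposes, so inspect the actual response object before relying on it. `format_timestamp` and `format_segments` are hypothetical helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Render a duration in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: list[dict]) -> list[str]:
    """Turn Whisper-style segment dicts into '[start - end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]
```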

Alternative STT Providers (if you prefer non-Whisper options):

  • Deepgram - Real-time streaming, lowest latency (pip install deepgram-sdk)

  • AssemblyAI - High accuracy, speaker diarization (pip install assemblyai)

  • See voice-ai-skill for Deepgram/AssemblyAI integration patterns

  4. Audio: Text-to-Speech (PlayAI)

def text_to_speech(text: str, output_path: str = "output.wav"):
    """Synthesize speech and save it as a WAV file."""
    response = client.audio.speech.create(
        model="playai-tts",
        voice="Fritz-PlayAI",  # Also: Arista-PlayAI
        input=text,
        response_format="wav",
    )
    response.write_to_file(output_path)

# Streaming TTS
def stream_tts(text: str):
    """Yield audio bytes in 1024-byte chunks as they arrive."""
    with client.audio.speech.with_streaming_response.create(
        model="playai-tts",
        voice="Fritz-PlayAI",
        input=text,
        response_format="wav",
    ) as response:
        for chunk in response.iter_bytes(1024):
            yield chunk

Alternative TTS Providers (beyond GROQ's PlayAI):

  • Cartesia - Ultra-low latency, emotional control (pip install cartesia)

  • ElevenLabs - Most natural voices, voice cloning (pip install elevenlabs)

  • Deepgram - Fast, cost-effective (pip install deepgram-sdk)

  • See voice-ai-skill for Cartesia/ElevenLabs/Deepgram TTS integration patterns

  5. Tool Use / Function Calling

import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

def chat_with_tools(prompt: str):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = response.choices[0].message

    if msg.tool_calls:
        # Add the assistant turn once, then one "tool" message per call
        messages.append(msg)
        for tc in msg.tool_calls:
            # execute_function is your own dispatcher, not part of the groq SDK
            result = execute_function(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)})
        followup = client.chat.completions.create(
            model="llama-3.3-70b-versatile", messages=messages, tools=tools
        )
        return followup.choices[0].message.content
    return msg.content
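`chat_with_tools` calls `execute_function`, which the snippet leaves undefined. One way to supply it is a registry that maps tool names to Python callables. This is a minimal sketch: the registry, the `tool` decorator, and the stubbed `get_weather` body are all hypothetical, and a real handler would query a weather service.

```python
from typing import Any, Callable

# Maps tool names (as declared in the `tools` schema) to local handlers.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the model can call it."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@tool
def get_weather(location: str) -> dict:
    # Stub: returns placeholder data instead of calling a real service.
    return {"location": location, "forecast": "unknown"}


def execute_function(name: str, arguments: dict) -> Any:
    """Dispatch a tool call; return an error dict rather than raising."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**arguments)
    except TypeError as exc:
        # Typically raised when the model sends missing or unexpected arguments.
        return {"error": f"bad arguments for {name}: {exc}"}
```

Returning an error dict instead of raising lets the error text flow back to the model as the tool result, so it can correct its arguments on the next turn.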

  6. Compound Beta (Built-in Web Search + Code Exec)

def compound_query(prompt: str):
    """Built-in tools: web_search, code_execution."""
    response = client.chat.completions.create(
        model="compound-beta",
        messages=[{"role": "user", "content": prompt}],
    )
    msg = response.choices[0].message
    # Access msg.executed_tools for tool results
    return msg.content

  7. Reasoning Models

def reasoning_query(prompt: str, format: str = "parsed"):
    """format: 'parsed' (structured), 'raw' (visible), 'hidden' (no thinking)"""
    response = client.chat.completions.create(
        model="meta-llama/llama-4-maverick-17b-128e-instruct",
        messages=[{"role": "user", "content": prompt}],
        reasoning_format=format,
    )
    msg = response.choices[0].message
    if format == "parsed" and hasattr(msg, "reasoning"):
        return {"thinking": msg.reasoning, "answer": msg.content}
    return msg.content

  8. Async Patterns

import asyncio

async_client = AsyncGroq(api_key=os.environ.get("GROQ_API_KEY"))

async def async_chat(prompt: str) -> str:
    response = await async_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def parallel_queries(prompts: list[str]) -> list[str]:
    # gather runs all requests concurrently and preserves input order
    return await asyncio.gather(*[async_chat(p) for p in prompts])
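`asyncio.gather` starts every request at once, which can run into per-minute request limits on long prompt lists. A semaphore caps how many are in flight. This is a sketch under assumptions: `bounded_gather` and `fake_chat` are hypothetical names, and `fake_chat` stands in for `async_chat` so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable


async def bounded_gather(
    worker: Callable[[str], Awaitable[str]],
    prompts: list[str],
    limit: int = 5,
) -> list[str]:
    """Run worker over prompts with at most `limit` calls in flight; keeps input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt: str) -> str:
        async with semaphore:  # waits here while `limit` calls are already running
            return await worker(prompt)

    return await asyncio.gather(*(run_one(p) for p in prompts))


async def fake_chat(prompt: str) -> str:
    """Stand-in for async_chat; simulates a short network delay."""
    await asyncio.sleep(0.01)
    return prompt.upper()


results = asyncio.run(bounded_gather(fake_chat, ["a", "b", "c"], limit=2))
```

With the real client, the call would be `await bounded_gather(async_chat, prompts, limit=5)`. The right limit depends on the account tier.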

Rate Limits

| Tier | Requests/min | Tokens/min | Tokens/day |
|------|--------------|------------|------------|
| Free | 30 | 15,000 | 500,000 |
| Paid | 100+ | 100,000+ | Unlimited |
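The free-tier numbers translate into simple pacing arithmetic. 30 requests per minute is one request every 2 seconds, and 15,000 tokens per minute spread over 30 requests averages 500 tokens per request. The sketch below works through that calculation. The function names are hypothetical, and the limits are copied from the table, so they may not match current account limits.

```python
# Free-tier limits copied from the table above.
FREE_TIER = {"rpm": 30, "tpm": 15_000, "tpd": 500_000}


def min_interval_seconds(requests_per_minute: int) -> float:
    """Smallest even spacing between requests that stays within the limit."""
    return 60.0 / requests_per_minute


def avg_token_budget(tokens_per_minute: int, requests_per_minute: int) -> int:
    """Average tokens available per request when sending at the full request rate."""
    return tokens_per_minute // requests_per_minute


interval = min_interval_seconds(FREE_TIER["rpm"])
budget = avg_token_budget(FREE_TIER["tpm"], FREE_TIER["rpm"])
# Minutes of sustained full-rate token use before the daily cap is reached.
minutes_to_daily_cap = FREE_TIER["tpd"] / FREE_TIER["tpm"]
```

At the full per-minute token rate, the daily cap is reached in roughly 33 minutes, so on the free tier the daily limit is often the binding one.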

from tenacity import retry, stop_after_attempt, wait_exponential

# Up to 3 attempts, with exponentially growing waits bounded to 1-10 seconds
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def reliable_chat(prompt: str) -> str:
    return chat(prompt)
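The tenacity decorator retries on any exception. Writing the same idea out by hand makes the backoff schedule explicit and lets the caller choose which errors are retryable. This is a standard-library sketch, not tenacity's implementation. `retry_with_backoff`, `RateLimited`, and `flaky` are hypothetical names, and `RateLimited` stands in for whatever rate-limit error the client raises.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def retry_with_backoff(fn, *, attempts=3, base=1.0, cap=10.0,
                       retry_on=(RateLimited,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base * 2**(attempt - 1) seconds, capped at `cap`."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(min(cap, base * 2 ** (attempt - 1)))


calls = {"n": 0}


def flaky():
    """Fails twice with RateLimited, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("slow down")
    return "ok"


waits = []
result = retry_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
```

Passing `sleep` as a parameter keeps the example fast and makes the schedule easy to inspect. Here the recorded waits are 1 and 2 seconds before the third attempt succeeds.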

Integration Notes

  • Pairs with: voice-ai-skill (Whisper STT + PlayAI TTS), langgraph-agents-skill

  • Complements: trading-signals-skill (fast analysis), data-analysis-skill

  • Projects: VozLux (voice agents), FieldVault-AI (document processing)

  • Constraint: NO OPENAI - GROQ is the fast inference layer

Environment Variables

GROQ_API_KEY=gsk_...   # Required - get from console.groq.com

# Optional multi-provider
ANTHROPIC_API_KEY=     # Claude for complex reasoning
GOOGLE_API_KEY=        # Gemini fallback
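`os.environ.get` returns `None` when a variable is unset, so checking explicitly at startup gives a clearer error message. This is a minimal sketch. `require_env` is a hypothetical helper, and the optional `env` parameter exists only so the check can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-empty environment variable or raise a descriptive error."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return value
```

A typical call site would be `Groq(api_key=require_env("GROQ_API_KEY"))`.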

Reference Files

  • reference/models-catalog.md - Complete model catalog with specs

  • reference/audio-speech.md - Whisper STT and PlayAI TTS deep dive

  • reference/vision-multimodal.md - Multimodal and image processing

  • reference/tool-use-patterns.md - Function calling and Compound Beta

  • reference/reasoning-models.md - Thinking models and reasoning_format

  • reference/cost-optimization.md - Batch API, caching, provider routing

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
