together-api

Together AI API integration for building AI-powered applications with open-source models. Use when working with Together's Chat Completions API, Python SDK (together), TypeScript SDK (together-ai), CLI tool, tool use/function calling, vision/image understanding, image generation (FLUX, Stable Diffusion), video generation (Veo, Sora, Kling), audio transcription (Whisper), text-to-speech, streaming responses, embeddings, reranking, fine-tuning, or any other Together AI API integration task. Triggers on mentions of Together AI, Together API, open-source LLM inference, FLUX image generation, or Whisper transcription via Together.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "together-api" with this command: npx skills add diskd-ai/together-api/diskd-ai-together-api-together-api

Together AI API

Build applications with Together AI's open-source model inference platform (200+ models).

Quick Start

Installation

# Python SDK + CLI
pip install --upgrade together

# TypeScript/JavaScript SDK
npm install together-ai

Environment Setup

export TOGETHER_API_KEY=<your-api-key>

Get your API key at https://api.together.xyz/settings/api-keys
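
A minimal sketch of an early check for the key, assuming you would rather fail with a clear message at startup than with an authentication error on the first request. `require_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return TOGETHER_API_KEY from the given mapping, or raise a clear error."""
    key = env.get("TOGETHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TOGETHER_API_KEY is not set; export it before creating a client"
        )
    return key
```

The returned value could then be passed explicitly, for example `Together(api_key=require_api_key())`, instead of relying on the implicit environment lookup.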

Basic Chat Completion

Python:

from together import Together

client = Together()  # Uses TOGETHER_API_KEY env var

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

TypeScript:

import Together from "together-ai";

const client = new Together();

const response = await client.chat.completions.create({
    model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);

CLI:

together chat.completions \
  --message "system" "You are a helpful assistant" \
  --message "user" "Hello" \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo

CLI Reference

The Together CLI provides command-line access to all API features. Streaming is enabled by default.

Chat Completions

together chat.completions \
  --message "system" "You are a helpful assistant" \
  --message "user" "What is the capital of France?" \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo

# Disable streaming
together chat.completions \
  --message "user" "Hello" \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --no-stream

Text Completions

together completions \
  "Large language models are " \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --max-tokens 512 \
  --stop "."

Image Generation

together images generate \
  "A futuristic cityscape at sunset" \
  --model black-forest-labs/FLUX.1-schnell \
  --n 4

# Skip opening image viewer
together images generate "space robots" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --no-show

Models

together models list          # List all available models
together models --help        # Show model commands

Files Management

together files check example.jsonl              # Validate file format
together files upload example.jsonl             # Upload file
together files list                             # List uploaded files
together files retrieve <file-id>               # Get file metadata
together files retrieve-content <file-id>       # Get file content
together files delete <file-id>                 # Delete file
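
Before running `together files check`, it can help to generate and sanity-check the JSONL locally. The sketch below assumes a conversational layout with one `{"messages": [...]}` object per line; that layout is an assumption here, so confirm the expected format against the fine-tuning documentation. `write_jsonl` and `read_jsonl` are hypothetical helpers.

```python
import json


def write_jsonl(path: str, conversations) -> int:
    """Write one JSON object per line; returns the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for messages in conversations:
            # Assumed layout: each line wraps one conversation in a "messages" key.
            f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: str) -> list:
    """Parse a JSONL file, failing loudly on the first malformed line."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc
    return rows
```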

Fine-Tuning

# Create fine-tuning job
together fine-tuning create \
  --model togethercomputer/llama-2-7b-chat \
  --training-file <file-id>

together fine-tuning list                       # List jobs
together fine-tuning retrieve <job-id>          # Get job details
together fine-tuning list-events <job-id>       # Get job events
together fine-tuning cancel <job-id>            # Cancel job
together fine-tuning download <job-id>          # Download model

See references/cli.md for complete CLI reference.

Model Selection

together models list
| Use Case | Model | Notes |
|---|---|---|
| Fast + cheap | meta-llama/Llama-3.2-3B-Instruct-Turbo | $0.06/1M, 131K context |
| Balanced | meta-llama/Llama-3.3-70B-Instruct-Turbo | Quality/cost balance |
| Highest quality | deepseek-ai/DeepSeek-V3 | 131K context |
| Reasoning | deepseek-ai/DeepSeek-R1 | Chain-of-thought |
| Long context | meta-llama/Llama-4-Scout-17B-16E-Instruct | 1M context |
| Vision | meta-llama/Llama-4-Scout-17B-16E-Instruct | Multimodal |
| Code | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | Specialized |
| Audio STT | openai/whisper-large-v3 | Transcription |
| TTS | canopylabs/orpheus-3b | Natural voices |
| Image Gen | black-forest-labs/FLUX.1-schnell | Fast generation |
| Video Gen | google/veo-3.0 | Video generation |

See references/models.md for full model list and pricing.

Common Patterns

Streaming Responses

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
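
If the full text is needed after streaming (for logging or for appending to the conversation), the deltas can be accumulated as they arrive. This is a sketch only: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks merely mimic the `chunk.choices[0].delta.content` shape used above so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for an SDK chunk, used only for this demo."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


demo = [fake_chunk("Once "), fake_chunk(None), fake_chunk("upon a time")]
print(collect_stream(demo))  # prints "Once upon a time"
```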

Async Client

import asyncio
from together import AsyncTogether

async def main():
    client = AsyncTogether()
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response.choices[0].message.content

print(asyncio.run(main()))

JSON Mode

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "List 3 colors as JSON array"}],
    response_format={"type": "json_object"}
)

Structured Outputs (JSON Schema)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Extract: John is 30. Respond in JSON."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }
)
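
The reply still arrives as a string in `response.choices[0].message.content`, so it is worth parsing and checking it before use. The sketch below is a deliberately small validator for the `person` schema above, not a general JSON Schema implementation; `parse_person` is a hypothetical helper, and a dedicated validation library may be a better fit for larger schemas.

```python
import json

PERSON_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# Maps the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "integer": int}


def parse_person(content: str, schema: dict = PERSON_SCHEMA) -> dict:
    """Parse model output and check required keys and primitive types."""
    data = json.loads(content)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in schema["required"] if k not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, spec in schema["properties"].items():
        if key not in data:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[spec["type"]]):
            raise ValueError(f"field {key!r} should be of type {spec['type']}")
    return data


print(parse_person('{"name": "John", "age": 30}'))  # {'name': 'John', 'age': 30}
```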

Vision

Process images with vision-language models. Two options: meta-llama/Llama-4-Scout-17B-16E-Instruct (faster) and meta-llama/Llama-4-Maverick-17B-128E-Instruct (higher quality).

Image from URL

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

Local Image (Base64)

import base64

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}}
        ]
    }]
)

See references/vision.md for multi-image, video, and OCR patterns.
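
The Base64 example above hardcodes `image/jpeg` in the data URL. A possible variation, sketched here with a hypothetical `image_data_url` helper, guesses the MIME type from the file name so that PNG or WebP files are labeled correctly.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(path: str) -> str:
    """Encode a local image as a data URL, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The result can be used as the `url` value inside the `image_url` content part, in place of the f-string shown earlier.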

Image Generation

response = client.images.generate(
    prompt="A serene mountain landscape at sunset",
    model="black-forest-labs/FLUX.1-schnell",
    steps=4,
    width=1024,
    height=1024
)
print(f"Image URL: {response.data[0].url}")

See references/images.md for all parameters and models.

Audio

Transcription (Speech-to-Text)

response = client.audio.transcriptions.create(
    file="meeting.mp3",
    model="openai/whisper-large-v3",
    language="en",
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"]
)
print(response.text)

Text-to-Speech

response = client.audio.speech.create(
    model="canopylabs/orpheus-3b",
    input="Hello, world!",
    voice="tara"
)
with open("output.wav", "wb") as f:
    f.write(response.content)

See references/audio.md for streaming, WebSocket, and voice options.

Tool Use / Function Calling

See references/tool-use.md for complete patterns.

import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tc in response.choices[0].message.tool_calls:
        args = json.loads(tc.function.arguments)
        # Execute function and continue conversation
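
One way the "execute function and continue conversation" step might look is sketched below. It assumes the OpenAI-style convention of answering each tool call with a `role: "tool"` message that carries the matching `tool_call_id`; `get_weather`, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names, and the fake tool call only imitates the attribute shape used above.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> dict:
    """Hypothetical local implementation; a real one would call a weather service."""
    return {"location": location, "forecast": "sunny"}


# Registry mapping tool names (as declared in `tools`) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list:
    """Execute each requested tool and return one `tool` message per call."""
    messages = []
    for tc in tool_calls:
        func = TOOL_REGISTRY.get(tc.function.name)
        if func is None:
            result = {"error": f"unknown tool {tc.function.name}"}
        else:
            result = func(**json.loads(tc.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "name": tc.function.name,
            "content": json.dumps(result),
        })
    return messages


# Hypothetical stand-in for response.choices[0].message.tool_calls.
demo_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
print(run_tool_calls([demo_call]))
```

To continue the conversation, the assistant message that requested the tools and these tool messages would be appended to `messages` before calling `client.chat.completions.create` again.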

Embeddings

response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input="Hello, world!"
)
print(response.data[0].embedding)
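
Embeddings are usually compared rather than printed. A common choice is cosine similarity, sketched here in plain Python on toy vectors that stand in for `response.data[i].embedding`; `cosine_similarity` is a hypothetical helper, and a numerical library would be preferable for large batches.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, not real model output.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```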

Reranking

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="What is the capital of France?",
    documents=["Paris is the capital.", "London is in England.", "Berlin is in Germany."],
    top_n=2
)
for result in response.results:
    print(f"Index {result.index}: {result.relevance_score}")
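
Each result refers to its document by position, so the text has to be looked up in the original list. The sketch below shows that mapping with a hypothetical `ranked_documents` helper; the result objects and scores are made up and only mirror the `index` and `relevance_score` attributes used above.

```python
from types import SimpleNamespace


def ranked_documents(documents, results):
    """Pair each rerank result with its source text, best score first."""
    ordered = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    return [(documents[r.index], r.relevance_score) for r in ordered]


docs = ["Paris is the capital.", "London is in England.", "Berlin is in Germany."]
# Hypothetical results with made-up scores, shaped like response.results.
fake_results = [
    SimpleNamespace(index=2, relevance_score=0.10),
    SimpleNamespace(index=0, relevance_score=0.95),
]
for text, score in ranked_documents(docs, fake_results):
    print(f"{score:.2f}  {text}")
```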

Error Handling

from together import Together
from together._exceptions import RateLimitError, APIConnectionError, APIStatusError

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    # Wait and retry with exponential backoff
    pass
except APIConnectionError:
    # Network issue
    pass
except APIStatusError as e:
    if e.status_code == 402:
        # Insufficient credits
        pass
    elif e.status_code == 400:
        # Invalid parameters
        pass
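
The "wait and retry with exponential backoff" branch could be filled in along these lines. This is a generic sketch, not SDK behavior: `with_backoff` and `TransientError` are hypothetical, the delay parameters are arbitrary defaults, and the `sleep` argument exists only so the helper can be exercised without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as RateLimitError."""


def with_backoff(call, retry_on=(TransientError,), max_attempts=5,
                 base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `call()` and retry on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # The delay doubles each attempt, is capped at max_delay, and gets
            # random jitter so concurrent clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

With the imports from the example above, a request might be wrapped as `with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`.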

Resources

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
