Together AI API

Build applications on Together AI's inference platform, which serves 200+ open-source models.

Quick Start

Installation

# Python SDK + CLI
pip install --upgrade together

# TypeScript/JavaScript SDK
npm install together-ai

Environment Setup

export TOGETHER_API_KEY=<your-api-key>

Get your API key at https://api.together.xyz/settings/api-keys

Basic Chat Completion

Python:

from together import Together

client = Together()  # Uses TOGETHER_API_KEY env var

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

TypeScript:

import Together from "together-ai";

const client = new Together();

const response = await client.chat.completions.create({
    model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);

CLI:

together chat.completions \
  --message "system" "You are a helpful assistant" \
  --message "user" "Hello" \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo

CLI Reference

The Together CLI provides command-line access to all API features. Streaming is enabled by default.

Chat Completions

together chat.completions \
  --message "system" "You are a helpful assistant" \
  --message "user" "What is the capital of France?" \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo

# Disable streaming
together chat.completions \
  --message "user" "Hello" \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --no-stream

Text Completions

together completions \
  "Large language models are " \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --max-tokens 512 \
  --stop "."

Image Generation

together images generate \
  "A futuristic cityscape at sunset" \
  --model black-forest-labs/FLUX.1-schnell \
  --n 4

# Skip opening the image viewer
together images generate "space robots" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --no-show

Models

together models list          # List all available models
together models --help        # Show model commands

Files Management

together files check example.jsonl              # Validate file format
together files upload example.jsonl             # Upload file
together files list                             # List uploaded files
together files retrieve <file-id>               # Get file metadata
together files retrieve-content <file-id>       # Get file content
together files delete <file-id>                 # Delete file
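
Before running `together files check`, a quick local pass can catch malformed lines early. The sketch below only verifies that each non-empty line parses as a JSON object. It does not know Together's dataset schemas, and `find_bad_jsonl_lines` is a hypothetical helper name, not part of the SDK or CLI.

```python
import json
from pathlib import Path


def find_bad_jsonl_lines(path: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for every line that is not a JSON object."""
    problems = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # this sketch tolerates blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "not a JSON object"))
    return problems
```

An empty result means every line parsed; the CLI check remains the authority on whether the file is acceptable for upload.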

Fine-Tuning

# Create fine-tuning job
together fine-tuning create \
  --model togethercomputer/llama-2-7b-chat \
  --training-file <file-id>

together fine-tuning list                       # List jobs
together fine-tuning retrieve <job-id>          # Get job details
together fine-tuning list-events <job-id>       # Get job events
together fine-tuning cancel <job-id>            # Cancel job
together fine-tuning download <job-id>          # Download model

See references/cli.md for complete CLI reference.

Model Selection

together models list

Use Case	Model	Notes
Fast + cheap	`meta-llama/Llama-3.2-3B-Instruct-Turbo`	$0.06 per 1M tokens, 131K context
Balanced	`meta-llama/Llama-3.3-70B-Instruct-Turbo`	Quality/cost balance
Highest quality	`deepseek-ai/DeepSeek-V3`	131K context
Reasoning	`deepseek-ai/DeepSeek-R1`	Chain-of-thought
Long context	`meta-llama/Llama-4-Scout-17B-16E-Instruct`	1M context
Vision	`meta-llama/Llama-4-Scout-17B-16E-Instruct`	Multimodal
Code	`Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8`	Specialized
Audio STT	`openai/whisper-large-v3`	Transcription
TTS	`canopylabs/orpheus-3b`	Natural voices
Image Gen	`black-forest-labs/FLUX.1-schnell`	Fast generation
Video Gen	`google/veo-3.0`	Video generation

See references/models.md for full model list and pricing.

Common Patterns

Streaming Responses

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    # Skip chunks that carry no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
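
When the full text is needed after streaming (for logging or post-processing), the deltas can be accumulated while they are printed. This is a minimal sketch, assuming each chunk has the `choices[0].delta.content` shape used above. `collect_stream` is a hypothetical helper, and the stand-in chunks exist only so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print streamed deltas as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


def _fake_chunk(text):
    """Stand-in for an SDK chunk object (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


full_text = collect_stream([_fake_chunk("Once "), _fake_chunk(None), _fake_chunk("upon a time")])
```

With a real client, the `stream` object from the example above would be passed in directly: `collect_stream(stream)`.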

Async Client

import asyncio
from together import AsyncTogether

async def main():
    client = AsyncTogether()
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response.choices[0].message.content

print(asyncio.run(main()))
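
The async client pays off when several requests run concurrently. The sketch below caps concurrency with a semaphore. `run_limited` and `fake_completion` are hypothetical names, and `fake_completion` stands in for an awaited call such as `client.chat.completions.create(...)` so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable


async def run_limited(jobs: list[Callable[[], Awaitable[str]]], limit: int = 4) -> list[str]:
    """Run coroutine factories concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather returns results in the same order as `jobs`.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_completion(prompt: str) -> str:
    """Stand-in for an awaited API call (illustration only)."""
    await asyncio.sleep(0.01)
    return f"reply to {prompt}"


prompts = ["Hello", "Bonjour", "Hola"]
# Bind each prompt via a default argument so every lambda keeps its own value.
jobs = [lambda p=p: fake_completion(p) for p in prompts]
replies = asyncio.run(run_limited(jobs, limit=2))
print(replies)
```

A suitable `limit` depends on the account's rate limits, which this sketch does not know about.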

JSON Mode

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "List 3 colors as JSON array"}],
    response_format={"type": "json_object"}
)
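
The reply still arrives as a string in `message.content`, so the caller has to parse it. This is a minimal sketch of defensive parsing. `parse_json_content` is a hypothetical helper, and the sample string stands in for `response.choices[0].message.content`.

```python
import json
from typing import Any, Optional


def parse_json_content(content: Optional[str]) -> Any:
    """Parse a model reply as JSON, raising ValueError with context on failure."""
    if not content:
        raise ValueError("empty model reply")
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        # Truncated output (for example, hitting a token limit) is one possible cause.
        raise ValueError(f"reply is not valid JSON: {exc.msg}") from exc


colors = parse_json_content('{"colors": ["red", "green", "blue"]}')
print(colors["colors"])
```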

Structured Outputs (JSON Schema)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Extract: John is 30. Respond in JSON."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }
)
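
Even with a schema, it is worth checking the parsed reply before using it. The sketch below loads the reply into a dataclass that mirrors the `person` schema above. `Person` and `person_from_reply` are hypothetical names, and the sample string stands in for `response.choices[0].message.content`.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def person_from_reply(content: str) -> Person:
    """Parse a reply and check it against the fields the schema requires."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("'age' must be an integer")
    return Person(name=name, age=age)


person = person_from_reply('{"name": "John", "age": 30}')
print(person)
```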

Vision

Process images with vision-language models. Available models: meta-llama/Llama-4-Scout-17B-16E-Instruct (faster) and meta-llama/Llama-4-Maverick-17B-128E-Instruct (higher quality).

Image from URL

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

Local Image (Base64)

import base64

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"}}
        ]
    }]
)
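
The example above hardcodes `image/jpeg` in the data URL. Here is a small sketch that infers the MIME type from the file extension using only the standard library. `image_data_url` is a hypothetical helper name.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(path: str) -> str:
    """Build a data URL for a local image, inferring the MIME type from its extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value of an `image_url` content part, in place of the f-string shown above.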

See references/vision.md for multi-image, video, and OCR patterns.

Image Generation

response = client.images.generate(
    prompt="A serene mountain landscape at sunset",
    model="black-forest-labs/FLUX.1-schnell",
    steps=4,
    width=1024,
    height=1024
)
print(f"Image URL: {response.data[0].url}")

See references/images.md for all parameters and models.

Audio

Transcription (Speech-to-Text)

response = client.audio.transcriptions.create(
    file="meeting.mp3",
    model="openai/whisper-large-v3",
    language="en",
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"]
)
print(response.text)

Text-to-Speech

response = client.audio.speech.create(
    model="canopylabs/orpheus-3b",
    input="Hello, world!",
    voice="tara"
)
with open("output.wav", "wb") as f:
    f.write(response.content)

See references/audio.md for streaming, WebSocket, and voice options.

Tool Use / Function Calling

See references/tool-use.md for complete patterns.

import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tc in response.choices[0].message.tool_calls:
        args = json.loads(tc.function.arguments)
        # Execute function and continue conversation
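
The "execute function and continue conversation" step can be sketched as a dispatcher that maps the tool name to a local function and wraps the result in a tool message. This assumes the OpenAI-style `role: "tool"` and `tool_call_id` message format. `TOOL_REGISTRY`, `run_tool_call`, and this `get_weather` implementation are hypothetical, and the weather data is canned.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> dict:
    """Hypothetical local implementation; returns canned data for illustration."""
    return {"location": location, "forecast": "sunny", "temp_c": 21}


# Maps tool names declared in `tools` to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and return the message to append to the conversation."""
    func = TOOL_REGISTRY.get(tool_call.function.name)
    if func is None:
        result = {"error": f"unknown tool {tool_call.function.name}"}
    else:
        args = json.loads(tool_call.function.arguments)
        result = func(**args)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)}


# Stand-in for one entry of response.choices[0].message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
tool_message = run_tool_call(fake_call)
print(tool_message)
```

To continue the conversation under these assumptions, append the assistant message that contained the tool calls, then each tool message, to `messages`, and call `client.chat.completions.create` again.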

Embeddings

response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input="Hello, world!"
)
print(response.data[0].embedding)
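
Embedding vectors are usually compared with cosine similarity. This is a dependency-free sketch; `cosine_similarity` is a hypothetical helper, and the short vectors stand in for values of `response.data[i].embedding`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

For large collections, a vectorized library or a vector database would be the more practical choice; the loop here is only meant to show the formula.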

Reranking

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="What is the capital of France?",
    documents=["Paris is the capital.", "London is in England.", "Berlin is in Germany."],
    top_n=2
)
for result in response.results:
    print(f"Index {result.index}: {result.relevance_score}")
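
Each result carries an `index` into the original `documents` list, so the texts can be recovered with a lookup. This is a minimal sketch: `ranked_documents` is a hypothetical helper, and the stand-in results (with made-up scores) mimic the `index` and `relevance_score` attributes used above.

```python
from types import SimpleNamespace


def ranked_documents(documents, results):
    """Pair each rerank result with its original document text, best score first."""
    ordered = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    return [(documents[r.index], r.relevance_score) for r in ordered]


documents = ["Paris is the capital.", "London is in England.", "Berlin is in Germany."]
# Stand-in for response.results; the scores are invented for illustration.
fake_results = [
    SimpleNamespace(index=2, relevance_score=0.10),
    SimpleNamespace(index=0, relevance_score=0.95),
]
for text, score in ranked_documents(documents, fake_results):
    print(f"{score:.2f}  {text}")
```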

Error Handling

from together import Together
from together._exceptions import RateLimitError, APIConnectionError, APIStatusError

client = Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    # Wait and retry with exponential backoff
    pass
except APIConnectionError:
    # Network issue
    pass
except APIStatusError as e:
    if e.status_code == 402:
        # Insufficient credits
        pass
    elif e.status_code == 400:
        # Invalid parameters
        pass
    else:
        # Do not silently swallow other HTTP errors
        raise
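
The "wait and retry with exponential backoff" comment can be sketched as a generic wrapper. `with_backoff` is a hypothetical helper, not part of the SDK. It takes the exception types to retry on as a parameter, so it makes no assumption about the SDK's error classes, and the default values are illustrative.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on `retry_on` with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With the client above, usage might look like `with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`. The `sleep` parameter exists so tests can inject a fake instead of actually waiting.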

Resources

CLI reference: references/cli.md
Models & pricing: references/models.md
Tool use guide: references/tool-use.md
Vision guide: references/vision.md
Audio guide: references/audio.md
Image generation: references/images.md
Official docs: https://docs.together.ai
Python SDK: https://github.com/togethercomputer/together-python
TypeScript SDK: https://github.com/togethercomputer/together-typescript
