gemini-api

Google Gemini API integration for building AI-powered applications. Use when working with Google's Gemini API, Python SDK (google-genai), TypeScript SDK (@google/genai), multimodal inputs (image, video, audio, PDF), thinking/reasoning features, streaming responses, structured outputs with JSON schemas, multi-turn chat, system instructions, image generation (Nano Banana), video generation (Veo), music generation (Lyria), embeddings, document/PDF processing, or any Gemini API integration task. Triggers on mentions of Gemini, Gemini 3, Gemini 2.5, Google AI, Nano Banana, Veo, Lyria, google-genai, or @google/genai SDK usage.


Install skill "gemini-api" with this command: npx skills add diskd-ai/gemini-api/diskd-ai-gemini-api-gemini-api

Gemini API

Generate text from text, images, video, and audio using Google's Gemini API.

Models

| Model | Code | Input -> Output | Context (input/output tokens) | Thinking |
|---|---|---|---|---|
| Gemini 3 Pro | gemini-3-pro-preview | Text/Image/Video/Audio/PDF -> Text | 1M / 64K | Yes |
| Gemini 3 Flash | gemini-3-flash-preview | Text/Image/Video/Audio/PDF -> Text | 1M / 64K | Yes |
| Gemini 2.5 Pro | gemini-2.5-pro | Text/Image/Video/Audio/PDF -> Text | 1M / 65K | Yes |
| Gemini 2.5 Flash | gemini-2.5-flash | Text/Image/Video/Audio -> Text | 1M / 65K | Yes |
| Nano Banana | gemini-2.5-flash-image | Text/Image -> Image | - | No |
| Nano Banana Pro | gemini-3-pro-image-preview | Text/Image -> Image (up to 4K) | 65K / 32K | Yes |
| Veo 3.1 | veo-3.1-generate-preview | Text/Image/Video -> Video+Audio | - | - |
| Veo 3 | veo-3-generate-preview | Text/Image -> Video+Audio | - | - |
| Veo 2 | veo-2.0-generate-001 | Text/Image -> Video (silent) | - | - |
| Lyria RealTime | lyria-realtime-exp | Text -> Music (streaming) | - | - |
| Embeddings | gemini-embedding-001 | Text -> Embeddings | 2K | No |

Free tier: Flash models only (the API has no free tier for gemini-3-pro-preview). Default temperature: 1.0. For Gemini 3, keep the default; lowering it can cause looping or degraded performance.

Pricing (USD, input/output per 1M tokens):

  • Gemini 3 Pro: $2/$12 for prompts up to 200k tokens, $4/$18 for prompts above 200k
  • Gemini 3 Flash: $0.50/$3
  • Nano Banana Pro: $2 per 1M text input tokens; $0.134 per generated image
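The tiered Gemini 3 Pro rates are easiest to read as arithmetic. A minimal sketch, assuming the tier is selected by prompt size and that the higher rate then applies to both input and output tokens (output tokens include thinking tokens). `gemini3_pro_cost` is a hypothetical helper, not part of any SDK:

```python
# Hypothetical helper: estimate the USD cost of one Gemini 3 Pro request
# from token counts, using the per-1M-token prices listed above.
# Assumption: the tier depends on prompt (input) size only.

TIER_THRESHOLD = 200_000  # prompt tokens

# (input $ per 1M tokens, output $ per 1M tokens)
PRO_STANDARD = (2.00, 12.00)
PRO_LONG = (4.00, 18.00)


def gemini3_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of a single request."""
    in_price, out_price = PRO_LONG if input_tokens > TIER_THRESHOLD else PRO_STANDARD
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # 10k prompt + 2k output tokens: 0.020 + 0.024
    print(f"${gemini3_pro_cost(10_000, 2_000):.3f}")   # $0.044
    # 300k prompt + 2k output tokens: long-context tier, 1.200 + 0.036
    print(f"${gemini3_pro_cost(300_000, 2_000):.3f}")  # $1.236
```

Real token counts come back in each response's usage metadata, so the same function can be applied after the fact.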

Basic Text Generation

Python

from google import genai

client = genai.Client()  # reads the API key from the GEMINI_API_KEY environment variable
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="How does AI work?"
)
print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "How does AI work?",
});
console.log(response.text);

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "How does AI work?"}]}]}'
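Without an SDK, the curl call above maps directly onto a plain HTTP POST. A minimal Python sketch using only the standard library. `build_generate_request` is a hypothetical helper that builds the request without sending it, so it can be inspected offline:

```python
import json
import os
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same generateContent call as the curl example."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request(
        "gemini-3-flash-preview", "How does AI work?", os.environ.get("GEMINI_API_KEY", "")
    )
    # Sending needs a valid key and network access:
    # with urllib.request.urlopen(req) as resp:
    #     data = json.load(resp)
    #     print(data["candidates"][0]["content"]["parts"][0]["text"])
    print(req.full_url)
```

The JSON response carries the generated text at `candidates[0].content.parts[0].text`.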

System Instructions

Python

from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant."
    ),
    contents="Hello"
)

JavaScript

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Hello",
  config: { systemInstruction: "You are a helpful assistant." },
});

Streaming

Python

for chunk in client.models.generate_content_stream(
    model="gemini-3-flash-preview",
    contents="Tell me a story"
):
    print(chunk.text or "", end="", flush=True)  # a chunk may carry no text

JavaScript

const response = await ai.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Tell me a story",
});
for await (const chunk of response) {
  process.stdout.write(chunk.text ?? ""); // print chunks inline as they arrive
}

Multi-turn Chat

Python

chat = client.chats.create(model="gemini-3-flash-preview")
response = chat.send_message("I have 2 dogs.")
print(response.text)
response = chat.send_message("How many paws total?")
print(response.text)

JavaScript

const chat = ai.chats.create({ model: "gemini-3-flash-preview" });
const response = await chat.sendMessage({ message: "I have 2 dogs." });
console.log(response.text);
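A chat session keeps the conversation as a list of turns, each with a role (`"user"` or `"model"`) and parts, and resends that history with every message. A minimal sketch of that bookkeeping with plain dicts; `add_turn` is a hypothetical helper and the model reply is a placeholder:

```python
def add_turn(history: list[dict], role: str, text: str) -> list[dict]:
    """Return a new history with one more turn appended."""
    if role not in ("user", "model"):
        raise ValueError(f"unexpected role: {role!r}")
    return history + [{"role": role, "parts": [{"text": text}]}]


history: list[dict] = []
history = add_turn(history, "user", "I have 2 dogs.")
history = add_turn(history, "model", "<model reply from turn 1>")  # placeholder text
history = add_turn(history, "user", "How many paws total?")

# With the SDK, this list can be passed as contents=history to
# client.models.generate_content(...) to answer the last question in context.
```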

Multimodal (Image)

Python

from PIL import Image

image = Image.open("/path/to/image.png")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "Describe this image"]
)

JavaScript

import { createPartFromUri, createUserContent } from "@google/genai";

const image = await ai.files.upload({ file: "/path/to/image.png" });
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [
    createUserContent([
      "Describe this image",
      createPartFromUri(image.uri, image.mimeType),
    ]),
  ],
});

Document Processing (PDF)

Process PDFs with native vision understanding (up to 1000 pages).

Python

from google.genai import types
import pathlib

filepath = pathlib.Path('document.pdf')
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part.from_bytes(data=filepath.read_bytes(), mime_type='application/pdf'),
        "Summarize this document"
    ]
)

JavaScript

import * as fs from "fs";

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [
    { text: "Summarize this document" },
    {
      inlineData: {
        mimeType: "application/pdf",
        data: fs.readFileSync("document.pdf").toString("base64"),
      },
    },
  ],
});
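On the wire, an inline PDF is a base64 string inside an `inline_data` part of the REST request body. A minimal sketch that builds that body with the standard library. `pdf_request_body` is hypothetical, and the 20 MB inline limit is an assumption to check against the current docs:

```python
import base64
import json
from pathlib import Path

# Assumption: inline requests are capped at about 20 MB in total; above that,
# upload the file with the Files API instead.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def pdf_request_body(pdf_bytes: bytes, prompt: str) -> dict:
    """Build a REST generateContent body that sends the PDF as inline base64 data."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    # Rough check: counts only the encoded PDF, not the rest of the request.
    if len(encoded) > MAX_INLINE_BYTES:
        raise ValueError("PDF too large for an inline request; use the Files API")
    return {
        "contents": [{
            "parts": [
                {"inline_data": {"mime_type": "application/pdf", "data": encoded}},
                {"text": prompt},
            ]
        }]
    }


if __name__ == "__main__":
    body = pdf_request_body(Path("document.pdf").read_bytes(), "Summarize this document")
    print(json.dumps(body)[:120])
```

Base64 grows the payload by about a third, which is why the check runs on the encoded string.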

For large PDFs, upload them with the Files API instead (uploaded files are stored for 48 hours):

uploaded_file = client.files.upload(file=pathlib.Path('large.pdf'))
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[uploaded_file, "Summarize this document"]
)

See references/documents.md for Files API, multiple PDFs, and best practices.


Image Generation (Nano Banana)

Generate and edit images conversationally.

Python

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Create a picture of a sunset over mountains",
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("generated.png")

JavaScript

import * as fs from "fs";

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: "Create a picture of a sunset over mountains",
});

for (const part of response.candidates[0].content.parts) {
  if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated.png", buffer);
  }
}
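A response can mix text parts with one or more image parts, and each image part states its own MIME type. A minimal sketch that saves every image from a REST-shaped JSON response (camelCase `inlineData`/`mimeType`), choosing the file extension from the MIME type instead of hard-coding `.png`. `save_inline_images` and its naming scheme are hypothetical:

```python
import base64
import mimetypes
from pathlib import Path


def save_inline_images(response: dict, stem: str = "generated") -> list[Path]:
    """Write each inline image part of candidates[0] to disk and return the paths."""
    saved = []
    parts = response["candidates"][0]["content"]["parts"]
    for i, part in enumerate(parts):
        inline = part.get("inlineData")
        if inline is None:
            continue  # skip text parts such as captions
        ext = mimetypes.guess_extension(inline["mimeType"]) or ".bin"
        path = Path(f"{stem}_{i}{ext}")
        path.write_bytes(base64.b64decode(inline["data"]))
        saved.append(path)
    return saved
```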

Nano Banana Pro (gemini-3-pro-image-preview): 4K output, Google Search grounding, up to 14 reference images, conversational editing with thought signatures.

See references/image-generation.md for editing, multi-turn, and advanced features. See references/gemini-3.md for Gemini 3 image capabilities.


Video Generation (Veo)

Generate 8-second 720p, 1080p, or 4K videos with native audio using Veo.

Python

import time
from google import genai

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A cinematic shot of a majestic lion in the savannah at golden hour",
)

# Poll until complete (video generation is async)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the video
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("lion.mp4")

JavaScript

let operation = await ai.models.generateVideos({
  model: "veo-3.1-generate-preview",
  prompt: "A cinematic shot of a majestic lion in the savannah at golden hour",
});

// Poll until complete (video generation is async)
while (!operation.done) {
  await new Promise((resolve) => setTimeout(resolve, 10000));
  operation = await ai.operations.getVideosOperation({ operation });
}

await ai.files.download({
  file: operation.response.generatedVideos[0].video,
  downloadPath: "lion.mp4",
});
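The polling loops above run forever if an operation never finishes. A minimal sketch of the same loop with a timeout; `poll_until_done` is a hypothetical helper, and `sleep` is injectable so it can be tested without waiting:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    refresh: Callable[[T], T],
    op: T,
    is_done: Callable[[T], bool],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Refresh `op` every `interval` seconds until is_done(op); give up after `timeout`."""
    waited = 0.0  # counts sleep time only, not request latency
    while not is_done(op):
        if waited >= timeout:
            raise TimeoutError(f"operation not done after {waited:.0f}s")
        sleep(interval)
        waited += interval
        op = refresh(op)
    return op
```

With the Python SDK this would be called as `operation = poll_until_done(client.operations.get, operation, lambda op: bool(op.done))`.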

Veo 3.1 features: Portrait (9:16), video extension (up to 148s), 4K resolution, native audio with dialogue/SFX.

See references/veo.md for image-to-video, reference images, video extension, and prompting guide.


Music Generation (Lyria RealTime)

Generate continuous instrumental music in real-time with dynamic steering.

Python

import asyncio
from google import genai
from google.genai import types

# Lyria RealTime is experimental and is served from the v1alpha API version
client = genai.Client(http_options={"api_version": "v1alpha"})

async def main():
    async with client.aio.live.music.connect(model='models/lyria-realtime-exp') as session:
        # Set prompts and config
        await session.set_weighted_prompts(
            prompts=[types.WeightedPrompt(text='minimal techno', weight=1.0)]
        )
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
        )

        # Start streaming
        await session.play()

        # Receive audio chunks (runs until the session is closed)
        async for message in session.receive():
            if message.server_content and message.server_content.audio_chunks:
                audio_data = message.server_content.audio_chunks[0].data
                # Process audio...

asyncio.run(main())

JavaScript

// `ai` must be created with the v1alpha API version:
// const ai = new GoogleGenAI({ apiVersion: "v1alpha" });
const session = await ai.live.music.connect({
  model: "models/lyria-realtime-exp",
  callbacks: {
    onmessage: (message) => {
      if (message.serverContent?.audioChunks) {
        for (const chunk of message.serverContent.audioChunks) {
          const audioBuffer = Buffer.from(chunk.data, "base64");
          // Process audio...
        }
      }
    },
  },
});

await session.setWeightedPrompts({
  weightedPrompts: [{ text: "minimal techno", weight: 1.0 }],
});

await session.setMusicGenerationConfig({
  musicGenerationConfig: { bpm: 90, temperature: 1.0 },
});

await session.play();

Output: 48kHz stereo 16-bit PCM. Instrumental only. Configurable BPM, scale, density, brightness.
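Raw PCM chunks have no header, so most players cannot open them directly. A minimal sketch that wraps collected audio in a WAV container with the standard `wave` module, assuming the chunks are already decoded to bytes (the Python SDK's `.data`) and hold little-endian interleaved samples, as WAV expects. The helper names are hypothetical:

```python
import io
import wave

SAMPLE_RATE = 48_000  # Hz, per the output format above
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit stereo PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def seconds_of_audio(num_bytes: int) -> float:
    """48,000 frames/s * 2 channels * 2 bytes = 192,000 bytes per second."""
    return num_bytes / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)
```

Collect chunks with `b"".join(chunks)` before calling `pcm_to_wav`, then write the result to a `.wav` file.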

See references/lyria.md for steering music, configuration, and prompting guide.


Embeddings

Generate text embeddings for semantic similarity, search, and classification.

Python

result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="What is the meaning of life?"
)
print(result.embeddings)

JavaScript

const response = await ai.models.embedContent({
  model: "gemini-embedding-001",
  contents: "What is the meaning of life?",
});
console.log(response.embeddings);

Task types: SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY

Output dimensions: set with output_dimensionality; 768, 1536, and 3072 (default) are the recommended sizes.

See references/embeddings.md for batch processing, task types, and normalization.
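Semantic similarity between two embeddings is usually measured with cosine similarity. Only the default 3072-dimension output is normalized, so vectors requested at a smaller `output_dimensionality` should be L2-normalized first. A minimal pure-Python sketch; the helper names are hypothetical:

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 = same direction, 0.0 = orthogonal (unrelated)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))
```

With the Python SDK the raw vectors are `[e.values for e in result.embeddings]`.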


Thinking (Gemini 3)

Control reasoning depth with thinking_level: minimal (Flash only), low, medium (Flash only), high (default).

Python

from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Solve this math problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)

JavaScript

import { ThinkingLevel } from "@google/genai";

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Solve this math problem...",
  config: { thinkingConfig: { thinkingLevel: ThinkingLevel.HIGH } },
});

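Note: thinking_level and the legacy thinking_budget cannot be set in the same request; doing so returns a 400 error.

For Gemini 2.5, use thinking_budget (a token budget) instead: 128-32768 on 2.5 Pro, 0-24576 on 2.5 Flash, or -1 for dynamic thinking. See references/thinking.md.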
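Because the two fields are mutually exclusive, code that targets both model families benefits from picking exactly one. A minimal sketch of a hypothetical `thinking_config` helper that returns a dict the Python SDK accepts as `config=`. It is deliberately stricter than the API (it rejects budgets for Gemini 3 and does not check the Flash-only levels):

```python
from typing import Optional

GEMINI3_LEVELS = {"minimal", "low", "medium", "high"}


def thinking_config(model: str, level: Optional[str] = None,
                    budget: Optional[int] = None) -> dict:
    """Return {"thinking_config": {...}} with exactly one of level or budget."""
    if level is not None and budget is not None:
        raise ValueError("set thinking_level or thinking_budget, not both")
    if model.startswith("gemini-3"):
        if budget is not None:
            raise ValueError("use thinking_level for Gemini 3 models")
        level = level or "high"  # the Gemini 3 default
        if level not in GEMINI3_LEVELS:
            raise ValueError(f"unknown thinking level: {level!r}")
        return {"thinking_config": {"thinking_level": level}}
    if level is not None:
        raise ValueError("use thinking_budget for Gemini 2.5 models")
    # -1 requests dynamic thinking on Gemini 2.5
    return {"thinking_config": {"thinking_budget": -1 if budget is None else budget}}
```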

For complete Gemini 3 features (thought signatures, media resolution, etc.), see references/gemini-3.md.


Structured Outputs

Generate JSON responses adhering to a schema.

Python

from pydantic import BaseModel
from typing import List

class Recipe(BaseModel):
    name: str
    ingredients: List[str]

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract: chocolate chip cookies need flour, sugar, chips",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": Recipe.model_json_schema(),
    },
)
recipe = Recipe.model_validate_json(response.text)

JavaScript

import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const recipeSchema = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
});

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Extract: chocolate chip cookies need flour, sugar, chips",
  config: {
    responseMimeType: "application/json",
    responseJsonSchema: zodToJsonSchema(recipeSchema),
  },
});
const recipe = recipeSchema.parse(JSON.parse(response.text));

See references/structured-outputs.md for advanced patterns.
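Pydantic is not required: the schema can be written by hand as a JSON Schema dict and passed as `response_json_schema`. A minimal standard-library sketch; `RECIPE_SCHEMA` is a hand-written approximation of what `Recipe.model_json_schema()` produces (without Pydantic's titles), and `parse_recipe` is a hypothetical checker for `response.text`:

```python
import json

RECIPE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "ingredients"],
}


def parse_recipe(text: str) -> dict:
    """Parse the model's JSON text and check the fields the schema promises."""
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    items = data.get("ingredients")
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("'ingredients' must be a list of strings")
    return data
```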


Built-in Tools (Gemini 3)

Available: Google Search, File Search, Code Execution, URL Context, Function Calling

Not supported: Google Maps grounding, Computer Use (use Gemini 2.5 for these)

Python

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What's the latest news on AI?",
    config={"tools": [{"google_search": {}}]},
)

JavaScript

const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview",
  contents: "What's the latest news on AI?",
  config: { tools: [{ googleSearch: {} }] },
});

Structured outputs + tools: Gemini 3 supports combining JSON schemas with built-in tools (Google Search, URL Context, Code Execution). See references/gemini-3.md.

See references/tools.md for all tool patterns.
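Grounded answers come with grounding metadata that lists the web sources used. A minimal sketch that extracts (title, URI) pairs from a REST-shaped JSON response (`groundingMetadata.groundingChunks[*].web`). `grounding_sources` is hypothetical; in the Python SDK the same data sits under `response.candidates[0].grounding_metadata`:

```python
def grounding_sources(response: dict) -> list[tuple[str, str]]:
    """Return (title, uri) pairs for the web sources behind a grounded answer."""
    candidates = response.get("candidates") or [{}]
    metadata = candidates[0].get("groundingMetadata") or {}
    sources = []
    for chunk in metadata.get("groundingChunks", []):
        web = chunk.get("web")
        if web:
            sources.append((web.get("title", ""), web.get("uri", "")))
    return sources
```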


Function Calling

Connect models to external tools and APIs. The model determines when to call functions and provides parameters.

Python

from google.genai import types

# Define function
get_weather = {
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
}

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What's the weather in Tokyo?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])]
    ),
)

# Check for function call
if response.function_calls:
    fc = response.function_calls[0]
    print(f"Call {fc.name} with {fc.args}")

JavaScript

import { Type } from "@google/genai";

// Same declaration as the Python example
const getWeather = {
  name: "get_weather",
  description: "Get weather for a location",
  parameters: {
    type: Type.OBJECT,
    properties: {
      location: { type: Type.STRING, description: "City name" },
    },
    required: ["location"],
  },
};

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "What's the weather in Tokyo?",
  config: {
    tools: [{ functionDeclarations: [getWeather] }],
  },
});

if (response.functionCalls) {
  const { name, args } = response.functionCalls[0];
  // Execute function and send result back
}

Automatic function calling (Python): pass Python functions directly as tools; the SDK executes the calls the model requests and returns the final response.

See references/function-calling.md for execution modes, compositional calling, multimodal responses, MCP integration, and best practices.
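With manual function calling, your code runs the requested function and sends the result back as a function response part. A minimal sketch of that dispatch step, producing a REST-shaped `functionResponse` part. `fetch_weather` is a hypothetical stub and `REGISTRY` a hypothetical name-to-function map; in the Python SDK the equivalent part is built with `types.Part.from_function_response(name=..., response=...)`:

```python
from typing import Any, Callable


def fetch_weather(location: str) -> dict:
    # Hypothetical stub standing in for a real weather service.
    return {"location": location, "forecast": "sunny"}


# Maps declared function names to local implementations.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": fetch_weather}


def run_function_call(name: str, args: dict) -> dict:
    """Execute one model-requested call and wrap the result as a functionResponse part."""
    fn = REGISTRY.get(name)
    if fn is None:
        result = {"error": f"unknown function: {name}"}
    else:
        try:
            result = {"result": fn(**args)}
        except Exception as exc:  # report failures to the model instead of crashing
            result = {"error": str(exc)}
    return {"functionResponse": {"name": name, "response": result}}
```

The follow-up request then contains the original prompt, the model's function-call turn, and this part, so the model can write its final answer.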


Quick Reference

| Feature | Python | JavaScript |
|---|---|---|
| Generate | generate_content() | generateContent() |
| Stream | generate_content_stream() | generateContentStream() |
| Chat | chats.create() | chats.create() |
| Structured | response_json_schema= | responseJsonSchema: |
| Image Gen | gemini-2.5-flash-image | gemini-2.5-flash-image |
| Video Gen | generate_videos() | generateVideos() |
| Music Gen | aio.live.music.connect() | live.music.connect() |
| Function Call | function_declarations | functionDeclarations |
| Embeddings | embed_content() | embedContent() |
| Files API | files.upload() | files.upload() |

Gemini 3 Specific Features

For advanced Gemini 3 features, see references/gemini-3.md:

  • Thinking levels: Control reasoning depth (minimal, low, medium, high)
  • Media resolution: Fine-grained multimodal processing (media_resolution_low to ultra_high)
  • Thought signatures: Required for function calling and image editing context
  • Structured outputs + tools: Combine JSON schemas with Google Search, URL Context
  • Multimodal function responses: Return images in tool responses
