nebius-dedicated-endpoint

Create and manage dedicated inference endpoints on Nebius Token Factory. Use this skill whenever the user wants to deploy a model to a dedicated GPU instance, configure autoscaling, run inference against a private endpoint, update endpoint settings, or tear down a deployment. Trigger for phrases like "create a dedicated endpoint", "deploy a model on dedicated GPU", "set up autoscaling for my Nebius endpoint", "run inference on my dedicated model", "scale my endpoint", or any question about isolated/private inference deployments on Token Factory.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy the following and send it to your AI assistant:

Install the skill "nebius-dedicated-endpoint" with this command: npx skills add arindam200/nebius-skills/arindam200-nebius-skills-nebius-dedicated-endpoint

Nebius Dedicated Endpoints

Dedicated endpoints give you an isolated, GPU-backed deployment of a supported model template with per-region data residency, configurable autoscaling, and OpenAI-compatible inference.

Prerequisites

pip install requests openai
export NEBIUS_API_KEY="your-key"

Control plane (manage endpoints): https://api.tokenfactory.nebius.com

Data plane (inference) — pick the base URL by region:

Region         Inference base URL
eu-north1      https://api.tokenfactory.nebius.com/v1/
eu-west1       https://api.tokenfactory.eu-west1.nebius.com/v1/
us-central1    https://api.tokenfactory.us-central1.nebius.com/v1/
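Since the control plane is global but inference is regional, it can help to resolve the data-plane base URL from the endpoint's region. A minimal sketch using the mapping from the table above (verify the URLs against the current docs):

```python
# Region -> data-plane base URL, per the table above.
INFERENCE_BASE_URLS = {
    "eu-north1":   "https://api.tokenfactory.nebius.com/v1/",
    "eu-west1":    "https://api.tokenfactory.eu-west1.nebius.com/v1/",
    "us-central1": "https://api.tokenfactory.us-central1.nebius.com/v1/",
}

def inference_base_url(region: str) -> str:
    """Return the inference base URL for a region, failing loudly on typos."""
    try:
        return INFERENCE_BASE_URLS[region]
    except KeyError:
        raise ValueError(f"unknown region {region!r}") from None
```

Pass the result as `base_url` when constructing the OpenAI client for inference.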

Key concepts

  • Template — deployable blueprint (model + supported GPU types/regions)
  • Flavor — base (throughput-optimized) or fast (low-latency, speculative decoding)
  • Endpoint — your live deployment, identified by endpoint_id
  • routing_key — the model name to pass in inference calls

Operations

List available templates

import os
import requests

API_KEY = os.environ["NEBIUS_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

r = requests.get("https://api.tokenfactory.nebius.com/v0/dedicated_endpoints/templates",
                 headers=HEADERS)
r.raise_for_status()
templates = r.json().get("templates", [])
for t in templates:
    print(t["template_name"], [f["flavor_name"] for f in t.get("flavors", [])])

Create an endpoint

payload = {
    "name":     "my-endpoint",
    "template": "openai/gpt-oss-20b",      # template_name from the listing step
    "flavor":   "base",
    "region":   "eu-north1",
    "scaling":  {"min_replicas": 1, "max_replicas": 2},
}
r = requests.post("https://api.tokenfactory.nebius.com/v0/dedicated_endpoints",
                  headers=HEADERS, json=payload)   # HEADERS as in the listing step
r.raise_for_status()
endpoint = r.json()
endpoint_id = endpoint["endpoint_id"]
routing_key = endpoint["routing_key"]

Poll GET /v0/dedicated_endpoints/{endpoint_id} until status == "ready".
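The poll step can be sketched as a small helper. The "ready" status comes from the docs; the failure states and the polling interval here are assumptions, so adjust to what the API actually returns:

```python
import time

def wait_until_ready(get_status, timeout=600, interval=10):
    """Poll until get_status() returns "ready".

    get_status: callable returning the endpoint's current status string,
    e.g. one that GETs /v0/dedicated_endpoints/{endpoint_id} with HEADERS
    and reads the "status" field from the JSON body.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status == "ready":
            return
        if status in ("failed", "error"):   # assumed failure states
            raise RuntimeError(f"endpoint entered state {status!r}")
        time.sleep(interval)
    raise TimeoutError("endpoint did not become ready in time")
```

Injecting the fetch as a callable keeps the helper testable without a live endpoint.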

Run inference

import os
from openai import OpenAI

# Use the inference base URL for the endpoint's region (eu-north1 here).
client = OpenAI(base_url="https://api.tokenfactory.nebius.com/v1/",
                api_key=os.environ["NEBIUS_API_KEY"])

resp = client.chat.completions.create(
    model=routing_key,          # the routing_key from endpoint creation
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Update autoscaling (live, no downtime)

r = requests.patch(
    f"https://api.tokenfactory.nebius.com/v0/dedicated_endpoints/{endpoint_id}",
    headers=HEADERS,
    json={"scaling": {"min_replicas": 2, "max_replicas": 8}},
)
r.raise_for_status()

Delete endpoint

r = requests.delete(
    f"https://api.tokenfactory.nebius.com/v0/dedicated_endpoints/{endpoint_id}",
    headers=HEADERS,
)
r.raise_for_status()

Choosing flavor

Need                               Use
High throughput, cost-efficient    base
Low latency, real-time UX          fast (speculative decoding + smaller batches)
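Flavor names come back from the template listing, so the choice can be made programmatically. A hypothetical helper (field names follow the list-templates response shown earlier; the fallback behavior is an assumption, not part of the API):

```python
def pick_flavor(template: dict, prefer_low_latency: bool) -> str:
    """Pick "fast" for latency-sensitive workloads, else "base",
    falling back to whatever flavors the template actually offers."""
    names = [f["flavor_name"] for f in template.get("flavors", [])]
    want = "fast" if prefer_low_latency else "base"
    if want in names:
        return want
    if names:
        return names[0]   # assumed fallback: first listed flavor
    raise ValueError(f"template {template.get('template_name')!r} has no flavors")
```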

Data residency

Choose region to control where inference runs. Metrics are collected locally but stored in eu-north1.

Bundled reference

Read references/templates-regions.md when the user asks about available templates, GPU types, regions, or flavor differences.

Reference script

Full working script: scripts/02_dedicated_endpoints.py

Docs: https://docs.tokenfactory.nebius.com/ai-models-inference/dedicated-endpoints

