llm-ops
Purpose
This skill automates the deployment, scaling, and monitoring of large language models (LLMs) in AI/ML operations, handling infrastructure for models like GPT or BERT variants to ensure efficient runtime management.
When to Use
Use this skill when deploying LLMs in production environments, such as scaling a chatbot backend during peak traffic, monitoring model performance in real-time, or updating models in Kubernetes-based ML ops setups. Apply it in scenarios involving resource-constrained environments or when integrating LLMs with CI/CD pipelines for automated deployments.
Key Capabilities
- Deploy LLMs to cloud providers (e.g., AWS, GCP) with automatic containerization.
- Scale instances dynamically based on metrics like CPU usage or request volume.
- Monitor key metrics including latency, throughput, and error rates via integrated dashboards.
- Handle model versioning and rollbacks for safe updates.
- Integrate with logging tools like ELK stack for detailed tracing.
Usage Patterns
To deploy an LLM, first set the authentication environment variable with export OPENCLAW_API_KEY=your_api_key, then use the CLI to start the deployment with the flags listed under Common Commands/API. For scaling, monitor metrics and trigger adjustments programmatically. Always specify the model ID and target environment in commands to avoid conflicts. For API-based usage, send the API key in the Authorization header and handle responses for asynchronous operations.
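The header step above can be sketched in Python as follows. This is a minimal sketch using only the standard library; the helper name build_auth_headers is hypothetical, and only the OPENCLAW_API_KEY variable and the Bearer header format come from this document.

```python
import os


def build_auth_headers(env=None):
    """Return the Authorization header built from OPENCLAW_API_KEY.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENCLAW_API_KEY", "").strip()
    if not api_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("OPENCLAW_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing before the request is sent keeps a missing key from surfacing later as a 401 response.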
Common Commands/API
Use the OpenClaw CLI for quick operations; every command in this skill starts with openclaw llm. For API calls, target the base endpoint https://api.openclaw.ai/llm and include the header Authorization: Bearer $OPENCLAW_API_KEY.
Deploy Command: openclaw llm deploy --model-id my-llm-123 --env production --replicas 3 --config-path ./config.json
- Example config.json: {"image": "my-llm-image:v1", "resources": {"cpu": "2", "memory": "4Gi"}}
- Code snippet (Python):
    import os
    import requests

    # Read the API key from the environment rather than hard-coding it.
    response = requests.post(
        'https://api.openclaw.ai/llm/deploy',
        json={'model_id': 'my-llm-123', 'replicas': 3},
        headers={'Authorization': f'Bearer {os.environ["OPENCLAW_API_KEY"]}'},
    )
    print(response.json())
Scale Command: openclaw llm scale --model-id my-llm-123 --scale-to 5 --metric cpu_utilization
- Adjusts the replica count based on the specified metric threshold (e.g., >80% CPU).
- API Endpoint: POST /api/llm/scale with body: {"model_id": "my-llm-123", "scale_to": 5}
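This document does not say how the target replica count is derived from the metric threshold. One common approach is the proportional rule used by the Kubernetes Horizontal Pod Autoscaler, sketched below as a client-side helper. Whether OpenClaw applies this rule internally is an assumption; the function names, the 0.1 tolerance, and the max_replicas cap are all hypothetical. Only the request body fields come from the endpoint description above.

```python
import math


def proportional_replicas(current, observed, target, max_replicas, tolerance=0.1):
    """Replica count from ceil(current * observed / target).

    `observed` and `target` are utilization fractions (0.80 means 80%).
    If the ratio is within `tolerance` of 1.0 the count is left unchanged,
    which avoids flapping around the threshold.
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    ratio = observed / target
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    # Never drop below one replica or exceed the configured cap.
    return max(1, min(desired, max_replicas))


def scale_request_body(model_id, scale_to):
    """Body for the scale endpoint, using the field names shown above."""
    return {"model_id": model_id, "scale_to": scale_to}


# Three replicas at 95% CPU against an 80% target, capped at five replicas.
print(scale_request_body("my-llm-123", proportional_replicas(3, 0.95, 0.80, 5)))
```

With these inputs the ratio is 1.1875, so the helper asks for four replicas.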
Monitor Command: openclaw llm monitor --model-id my-llm-123 --duration 60 --output json
- Writes metrics to stdout or a file; use --alert-threshold 0.9 for CPU alerts.
- API Endpoint: GET /api/llm/metrics?model_id=my-llm-123&duration=60
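A minimal sketch of building the metrics query and applying a 0.9 alert threshold on the client side. Joining the path to the host api.openclaw.ai is an assumption, and the shape of the metrics response is not documented here, so the flat dictionary with a cpu_utilization key is hypothetical, as are both function names.

```python
from urllib.parse import urlencode


def metrics_url(model_id, duration, host="https://api.openclaw.ai"):
    """Build the metrics URL; joining host and path this way is an assumption."""
    query = urlencode({"model_id": model_id, "duration": duration})
    return f"{host}/api/llm/metrics?{query}"


def threshold_breached(metrics, threshold=0.9, metric="cpu_utilization"):
    """True when the named metric is present and above the threshold.

    The flat {"cpu_utilization": 0.93} shape is hypothetical; adapt the
    lookup to whatever the real response contains.
    """
    value = metrics.get(metric)
    return value is not None and value > threshold


print(metrics_url("my-llm-123", 60))
```

Using urlencode rather than string concatenation keeps model IDs with reserved characters from corrupting the query string.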
Rollback Command: openclaw llm rollback --model-id my-llm-123 --version v1.0
- Reverts to a previous model version; requires versioning enabled in config.
Config files are JSON, for example:
{ "model_id": "my-llm-123", "deployment": { "type": "kubernetes", "namespace": "aiml" } }
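A config like the one above can be checked before it is passed to the CLI. The sketch below assumes that model_id, deployment.type, and deployment.namespace are required, which is inferred from the example only and not from a published schema; the function name parse_config is hypothetical.

```python
import json

# Assumption: these keys are taken from the example above, not from a schema.
REQUIRED_DEPLOYMENT_KEYS = ("type", "namespace")


def parse_config(text):
    """Parse a JSON config string and check the fields the example uses.

    Raises ValueError on malformed JSON (json.JSONDecodeError is a
    ValueError subclass) or on a missing field.
    """
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    model_id = config.get("model_id")
    if not isinstance(model_id, str) or not model_id:
        raise ValueError("config needs a non-empty string model_id")
    deployment = config.get("deployment")
    if not isinstance(deployment, dict):
        raise ValueError("config needs a deployment object")
    missing = [key for key in REQUIRED_DEPLOYMENT_KEYS if key not in deployment]
    if missing:
        raise ValueError(f"deployment is missing: {', '.join(missing)}")
    return config


example = '{"model_id": "my-llm-123", "deployment": {"type": "kubernetes", "namespace": "aiml"}}'
print(parse_config(example)["deployment"]["namespace"])
```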
Integration Notes
Integrate this skill with existing ML ops tools by exporting metrics to Prometheus or using webhooks for CI/CD. For Kubernetes, apply the manifests generated by openclaw llm generate-k8s --model-id my-llm-123. When combining with other OpenClaw skills, chain commands, for example openclaw llm deploy && openclaw monitoring setup. Use environment variables for secrets: define OPENCLAW_API_KEY in your .env file and load it via dotenv in Python scripts. Ensure the API endpoints are reachable over the network, and configure firewalls to allow traffic to api.openclaw.ai.
Error Handling
Check command exit codes; for example, if openclaw llm deploy exits with code 1, parse the error message for details such as "Model not found". In API responses, handle HTTP status codes: 401 for authentication issues (set a valid key with export OPENCLAW_API_KEY=new_key and retry), 404 for missing models, and 500 for server errors (wait and retry with exponential backoff). Wrap API calls in try-except blocks:
    import sys
    import requests

    try:
        response = requests.post('https://api.openclaw.ai/llm/deploy', ...)
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # Report the status code and body, then exit with a non-zero code.
        print(f"Error: {e.response.status_code} - {e.response.text}")
        sys.exit(1)
Log errors to files using --log-file errors.log in CLI commands and monitor for common issues like resource limits.
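The "wait and retry with exponential backoff" advice for server errors can be sketched as below. This is one possible shape, not a prescribed client: the function names, the default of four retries, the 30-second cap, and the jitter amount are all assumptions, and the sketch treats every 5xx status as retryable, although this section only names 500.

```python
import random
import time


def backoff_delays(retries, base=1.0, cap=30.0):
    """Delays base * 2**attempt, capped at `cap`, for attempts 0..retries-1."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_backoff(send, retries=4, base=1.0, cap=30.0, sleep=time.sleep):
    """Call send() until it returns a status below 500 or retries run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, for example a lambda wrapping requests.post.
    At most retries + 1 calls are made; the last response is returned.
    """
    response = send()
    for delay in backoff_delays(retries, base, cap):
        if response.status_code < 500:
            break
        # A little jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, delay / 10))
        response = send()
    return response


print(backoff_delays(6))
```

Client errors such as 401 and 404 are returned immediately, since retrying them without changing the request would not help.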
Concrete Usage Examples
- Deploy and Scale an LLM: First, export your API key with export OPENCLAW_API_KEY=abc123. Deploy a model with openclaw llm deploy --model-id gpt-finetuned --env staging --replicas 2, then scale it based on load with openclaw llm scale --model-id gpt-finetuned --scale-to 10 --metric request_rate.
- Monitor and Rollback: Run monitoring with openclaw llm monitor --model-id gpt-finetuned --duration 300. If issues arise, roll back with openclaw llm rollback --model-id gpt-finetuned --version v2.1.
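The deploy-then-scale sequence above can also be scripted. The sketch below runs the same two commands through subprocess and stops at the first non-zero exit code, mirroring shell && chaining. The helper name run_steps and its injectable runner parameter are hypothetical; the command lines are copied from the first example.

```python
import subprocess


def run_steps(steps, runner=subprocess.run):
    """Run commands in order and stop at the first non-zero exit code.

    Returns (index, returncode) for the failing step, or None if every
    step succeeds. `runner` is injectable so the flow can be tested
    without the openclaw CLI installed.
    """
    for index, argv in enumerate(steps):
        result = runner(argv)
        if result.returncode != 0:
            return index, result.returncode
    return None


STEPS = [
    ["openclaw", "llm", "deploy", "--model-id", "gpt-finetuned",
     "--env", "staging", "--replicas", "2"],
    ["openclaw", "llm", "scale", "--model-id", "gpt-finetuned",
     "--scale-to", "10", "--metric", "request_rate"],
]
```

Calling run_steps(STEPS) requires the openclaw CLI on PATH and OPENCLAW_API_KEY in the environment. Passing each command as an argument list avoids shell quoting issues.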
Graph Relationships
- Related to: aimlops (cluster), llm (tag), mlops (tag)
- Depends on: authentication services for API access
- Integrates with: monitoring tools, deployment orchestrators like Kubernetes
- Conflicts with: none specified; ensure no overlapping model IDs in multi-skill environments