
Senior ML Engineer

Production ML engineering patterns for model deployment, MLOps infrastructure, and LLM integration.

Table of Contents

  • Model Deployment Workflow

  • MLOps Pipeline Setup

  • LLM Integration Workflow

  • RAG System Implementation

  • Model Monitoring

  • Reference Documentation

  • Tools

Model Deployment Workflow

Deploy a trained model to production with monitoring:

  • Export model to standardized format (ONNX, TorchScript, SavedModel)

  • Package model with dependencies in Docker container

  • Deploy to staging environment

  • Run integration tests against staging

  • Deploy canary (5% traffic) to production

  • Monitor latency and error rates for 1 hour

  • Promote to full production if metrics pass

  • Validation: p95 latency < 100ms, error rate < 0.1%
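The canary gate in the last two steps can be sketched as a simple check. This is illustrative only: the function name and the nearest-rank p95 method are assumptions, not part of any particular deployment tool.

```python
import math

# Illustrative promotion gate for the canary step: promote only if the
# monitored window meets the validation criteria above
# (p95 latency < 100ms, error rate < 0.1%).
def canary_passes(latencies_ms, error_count, request_count,
                  p95_budget_ms=100.0, error_budget=0.001):
    # p95 via the nearest-rank method over the observed window
    ordered = sorted(latencies_ms)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    p95 = ordered[idx]
    error_rate = error_count / request_count
    return p95 < p95_budget_ms and error_rate < error_budget
```

In practice the inputs would come from the metrics backend (e.g. a Prometheus query over the 1-hour canary window) rather than raw lists.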

Container Template

FROM python:3.11-slim

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model/ /app/model/
COPY src/ /app/src/

HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1

EXPOSE 8080
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]

Serving Options

| Option | Latency | Throughput | Use Case |
| --- | --- | --- | --- |
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |

MLOps Pipeline Setup

Establish automated training and deployment:

  • Configure feature store (Feast, Tecton) for training data

  • Set up experiment tracking (MLflow, Weights & Biases)

  • Create training pipeline with hyperparameter logging

  • Register model in model registry with version metadata

  • Configure staging deployment triggered by registry events

  • Set up A/B testing infrastructure for model comparison

  • Enable drift monitoring with alerting

  • Validation: New models automatically evaluated against baseline
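The final validation step — candidate models automatically evaluated against the baseline — can be sketched as a metric gate. The function name and per-metric tolerance scheme are assumptions, independent of any registry product:

```python
# Hypothetical evaluation gate: a candidate is promoted only if no tracked
# metric regresses beyond its allowed tolerance versus the baseline.
def evaluate_against_baseline(candidate: dict, baseline: dict,
                              tolerances: dict) -> dict:
    """Return per-metric pass/fail plus an overall promote decision."""
    results = {}
    for metric, tol in tolerances.items():
        delta = candidate[metric] - baseline[metric]
        results[metric] = delta >= -tol  # small regressions tolerated
    results["promote"] = all(results.values())
    return results
```

Wired into the registry-triggered staging deployment, a failed gate would simply leave the baseline model serving traffic.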

Feature Store Pattern

from datetime import timedelta

from feast import Entity, Feature, FeatureView, FileSource, ValueType

user = Entity(name="user_id", value_type=ValueType.INT64)

user_features = FeatureView(
    name="user_features",
    entities=["user_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="purchase_count_30d", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.FLOAT),
    ],
    online=True,
    source=FileSource(path="data/user_features.parquet"),
)

Retraining Triggers

| Trigger | Detection | Action |
| --- | --- | --- |
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
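A minimal PSI (Population Stability Index) implementation for the data-drift trigger can be sketched as below; the decile binning and epsilon smoothing are common conventions, not a standard.

```python
import numpy as np

# Population Stability Index between reference and current samples, using
# decile bins taken from the reference window. PSI > 0.2 is the retrain
# trigger used in the table above.
def psi(reference, current, bins=10, eps=1e-6):
    ref = np.asarray(reference, dtype=float)
    cur = np.asarray(current, dtype=float)
    # interior bin edges at reference quantiles
    edges = np.quantile(ref, np.linspace(0, 1, bins + 1))[1:-1]
    ref_frac = np.bincount(np.digitize(ref, edges), minlength=bins) / len(ref) + eps
    cur_frac = np.bincount(np.digitize(cur, edges), minlength=bins) / len(cur) + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```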

LLM Integration Workflow

Integrate LLM APIs into production applications:

  • Create provider abstraction layer for vendor flexibility

  • Implement retry logic with exponential backoff

  • Configure fallback to secondary provider

  • Set up token counting and context truncation

  • Add response caching for repeated queries

  • Implement cost tracking per request

  • Add structured output validation with Pydantic

  • Validation: Response parses correctly, cost within budget

Provider Abstraction

from abc import ABC, abstractmethod

from tenacity import retry, stop_after_attempt, wait_exponential

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        pass

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(provider: LLMProvider, prompt: str) -> str:
    return provider.complete(prompt)
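The structured-output validation step can be sketched with Pydantic (v2 API assumed); the `Sentiment` schema and function name are made-up examples:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError

# Example schema for the expected LLM output; fields are illustrative.
class Sentiment(BaseModel):
    label: str
    confidence: float

def parse_llm_output(raw: str) -> Optional[Sentiment]:
    """Validate raw completion text; return None so callers can retry or fall back."""
    try:
        return Sentiment.model_validate_json(raw)
    except ValidationError:
        return None
```

Returning `None` (rather than raising) lets the caller decide between a retry with a corrective prompt and a fallback provider.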

Cost Management

| Provider | Input Cost (per 1K tokens) | Output Cost (per 1K tokens) |
| --- | --- | --- |
| GPT-4 | $0.03 | $0.06 |
| GPT-3.5 | $0.0005 | $0.0015 |
| Claude 3 Opus | $0.015 | $0.075 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
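Per-request cost tracking can be as simple as a rate-table lookup. The rates below mirror the table above; in practice they would live in configuration, since provider pricing changes.

```python
# USD per 1K tokens as (input_rate, output_rate); mirror of the table above.
PRICES = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5": (0.0005, 0.0015),
    "claude-3-opus": (0.015, 0.075),
    "claude-3-haiku": (0.00025, 0.00125),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in USD."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1000
```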

RAG System Implementation

Build retrieval-augmented generation pipeline:

  • Choose vector database (Pinecone, Qdrant, Weaviate)

  • Select embedding model based on quality/cost tradeoff

  • Implement document chunking strategy

  • Create ingestion pipeline with metadata extraction

  • Build retrieval with query embedding

  • Add reranking for relevance improvement

  • Format context and send to LLM

  • Validation: Response references retrieved context, no hallucinations
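The retrieval step above can be sketched as a brute-force cosine-similarity search over precomputed embeddings; a vector database replaces this linear scan at scale. Names and shapes here are illustrative.

```python
import numpy as np

# Toy nearest-neighbour retrieval: rank documents by cosine similarity
# between the query embedding and each document embedding.
def retrieve(query_vec, doc_vecs, doc_ids, k=3):
    docs = np.asarray(doc_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]  # indices of the k most similar documents
    return [doc_ids[i] for i in top]
```

The returned IDs would then be reranked and their text formatted into the LLM context window.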

Vector Database Selection

| Database | Hosting | Scale | Latency | Best For |
| --- | --- | --- | --- | --- |
| Pinecone | Managed | High | Low | Production, managed |
| Qdrant | Both | High | Very Low | Performance-critical |
| Weaviate | Both | High | Low | Hybrid search |
| Chroma | Self-hosted | Medium | Low | Prototyping |
| pgvector | Self-hosted | Medium | Medium | Existing Postgres |

Chunking Strategies

| Strategy | Chunk Size | Overlap | Best For |
| --- | --- | --- | --- |
| Fixed | 500-1000 tokens | 50-100 | General text |
| Sentence | 3-5 sentences | 1 sentence | Structured text |
| Semantic | Variable | Based on meaning | Research papers |
| Recursive | Hierarchical | Parent-child | Long documents |
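The "Fixed" strategy can be sketched as follows, with whitespace-split words standing in for tokens; a real pipeline would count tokens with the embedding model's own tokenizer.

```python
# Fixed-size chunking with overlap; words stand in for tokens here.
def chunk_fixed(text: str, size: int = 500, overlap: int = 50):
    words = text.split()
    chunks, i = [], 0
    while i < len(words):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):
            break
        i += size - overlap  # each new chunk repeats `overlap` words
    return chunks
```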

Model Monitoring

Monitor production models for drift and degradation:

  • Set up latency tracking (p50, p95, p99)

  • Configure error rate alerting

  • Implement input data drift detection

  • Track prediction distribution shifts

  • Log ground truth when available

  • Compare model versions with A/B metrics

  • Set up automated retraining triggers

  • Validation: Alerts fire before user-visible degradation

Drift Detection

from scipy.stats import ks_2samp

def detect_drift(reference, current, threshold=0.05):
    statistic, p_value = ks_2samp(reference, current)
    return {
        "drift_detected": p_value < threshold,
        "ks_statistic": statistic,
        "p_value": p_value,
    }

Alert Thresholds

| Metric | Warning | Critical |
| --- | --- | --- |
| p95 latency | > 100ms | > 200ms |
| Error rate | > 0.1% | > 1% |
| PSI (drift) | > 0.1 | > 0.2 |
| Accuracy drop | > 2% | > 5% |

Reference Documentation

MLOps Production Patterns

references/mlops_production_patterns.md contains:

  • Model deployment pipeline with Kubernetes manifests

  • Feature store architecture with Feast examples

  • Model monitoring with drift detection code

  • A/B testing infrastructure with traffic splitting

  • Automated retraining pipeline with MLflow

LLM Integration Guide

references/llm_integration_guide.md contains:

  • Provider abstraction layer pattern

  • Retry and fallback strategies with tenacity

  • Prompt engineering templates (few-shot, CoT)

  • Token optimization with tiktoken

  • Cost calculation and tracking

RAG System Architecture

references/rag_system_architecture.md contains:

  • RAG pipeline implementation with code

  • Vector database comparison and integration

  • Chunking strategies (fixed, semantic, recursive)

  • Embedding model selection guide

  • Hybrid search and reranking patterns

Tools

Model Deployment Pipeline

python scripts/model_deployment_pipeline.py --model model.pkl --target staging

Generates deployment artifacts: Dockerfile, Kubernetes manifests, health checks.

RAG System Builder

python scripts/rag_system_builder.py --config rag_config.yaml --analyze

Scaffolds RAG pipeline with vector store integration and retrieval logic.

ML Monitoring Suite

python scripts/ml_monitoring_suite.py --config monitoring.yaml --deploy

Sets up drift detection, alerting, and performance dashboards.

Tech Stack

| Category | Tools |
| --- | --- |
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| MLOps | MLflow, Weights & Biases, Kubeflow |
| Data | Spark, Airflow, dbt, Kafka |
| Deployment | Docker, Kubernetes, Triton |
| Databases | PostgreSQL, BigQuery, Pinecone, Redis |

