ML Systems
Building production-ready machine learning systems.
Overview
This skill category covers the complete ML system lifecycle:
- Foundations - Core concepts, architectures, paradigms
- Data Engineering - Data collection, quality, feature engineering
- Model Development - Training, evaluation, frameworks
- Performance - Optimization, acceleration, efficiency
- Deployment - Serving, edge deployment, scaling
- Operations - MLOps, monitoring, reliability
Categories
Foundations
- ml-systems-fundamentals - Core ML systems concepts
- deep-learning-primer - Deep learning foundations
- dnn-architectures - Neural network architectures
- deployment-paradigms - Deployment patterns
Data Engineering
- data-engineering - Data pipelines and quality
- training-data - Training data management
- feature-engineering - Feature creation and stores
Model Development
- ml-workflow - ML development workflow
- model-development - Model training and selection
- ml-frameworks - Framework best practices
Performance
- efficient-ai - Efficiency techniques
- model-optimization - Quantization, pruning, distillation
- ai-accelerators - Hardware acceleration
Deployment
- model-deployment - Production deployment
- inference-optimization - Inference optimization
- edge-deployment - Edge and mobile deployment
Operations
- mlops - ML operations and lifecycle
- robust-ai - Reliability and robustness
Key Principles
- Data-Centric AI - Prioritize data quality over model complexity
- Iterative Development - Start simple, iterate based on metrics
- Production-First - Design for deployment from the start
- Monitoring - Monitor models and data continuously in production, and improve based on what you observe
- Reproducibility - Version everything (data, code, models)
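
As an illustration of the Reproducibility principle, the sketch below records content hashes of the data, hyperparameters, and model artifact next to a code version, so a training run can be identified and compared later. It is a minimal sketch, not a prescribed tool: the function names (`sha256_bytes`, `build_manifest`, `manifest_id`) and the manifest fields are hypothetical, and a real project would more likely rely on a dedicated data and model versioning system.

```python
import hashlib
import json


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(data: bytes, code_version: str, params: dict, model: bytes) -> dict:
    """Collect the identifiers for one training run (hypothetical schema)."""
    # Serialize params with sorted keys so that key order does not change the hash.
    params_blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return {
        "data_sha256": sha256_bytes(data),
        "code_version": code_version,  # assumed to be e.g. a Git commit hash
        "params_sha256": sha256_bytes(params_blob),
        "model_sha256": sha256_bytes(model),
    }


def manifest_id(manifest: dict) -> str:
    """Derive a short, deterministic run identifier from the manifest."""
    blob = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return sha256_bytes(blob)[:12]


if __name__ == "__main__":
    # Toy inputs standing in for a dataset file and a serialized model.
    manifest = build_manifest(
        data=b"feature,label\n1.0,0\n2.0,1\n",
        code_version="abc123",
        params={"lr": 0.1, "epochs": 3},
        model=b"serialized-weights",
    )
    print(manifest_id(manifest), json.dumps(manifest, indent=2))
```

Because every field is derived from content rather than from file names or timestamps, two runs share an identifier only if their data, code version, parameters, and model bytes all match; changing any one of them yields a different identifier.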
References
- Stanford CS 329S: Machine Learning Systems Design
- Designing Machine Learning Systems by Chip Huyen
- MLOps: Continuous Delivery and Automation Pipelines in Machine Learning (Google Cloud)