iOS On-Device AI Models
Production-ready guide for implementing on-device AI models in iOS apps using Apple's Foundation Models framework and MLX Swift.
When to Use This Skill
- Implementing local LLM inference in iOS apps
- Building chat interfaces with Foundation Models
- Integrating Vision Language Models (VLMs)
- Adding text embeddings or image generation
- Implementing tool/function calling with LLMs
- Managing multi-turn conversations
- Optimizing memory usage for on-device models
- Supporting internationalization in AI features
Core Principles
- Availability First: Always check model availability before initialization
- Stream Responses: Provide progressive UI updates for better UX
- Session Persistence: Reuse LanguageModelSession for multi-turn conversations (Foundation Models)
- Memory Awareness: Use quantized models and monitor memory usage
- Async Everything: Load models asynchronously, never block the main thread
- Locale Support: Use supportsLocale(_:) and locale instructions for Foundation Models
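A minimal sketch of the availability, session persistence, and locale principles with Foundation Models, assuming an SDK that ships the framework (iOS 26 or macOS 26). The helper names `makeSessionIfAvailable` and `ask` and the instruction string are illustrative and not part of the framework. The `#if canImport` guard only exists so the file still compiles on toolchains without the framework.

```swift
#if canImport(FoundationModels)
import Foundation
import FoundationModels

/// Returns a session only when the system model is usable on this device.
/// Hypothetical helper: the name and instruction text are illustrative.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
func makeSessionIfAvailable() -> LanguageModelSession? {
    let model = SystemLanguageModel.default

    // Availability first: bail out unless the model reports `.available`.
    guard case .available = model.availability else { return nil }

    // Locale support: bail out when the user's current locale is unsupported.
    guard model.supportsLocale(Locale.current) else { return nil }

    return LanguageModelSession(instructions: "You are a concise assistant.")
}

/// Sends one turn. Passing the same session on every call keeps the
/// earlier turns in context.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
func ask(_ session: LanguageModelSession, _ prompt: String) async throws -> String {
    let response = try await session.respond(to: prompt)
    return response.content
}
#endif
```

A caller would typically create the session once (for example in a view model), show a fallback UI when the helper returns `nil`, and call `ask` from a `Task` so the main thread is never blocked.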
Quick Reference
Framework Comparison
| Topic | Guide |
|---|---|
| Framework comparison and selection | framework-selection.md |
Foundation Models (Apple's Framework)
| Topic | Guide |
|---|---|
| Setup and configuration | foundation-models/setup.md |
| Chat patterns and conversations | foundation-models/chat-patterns.md |
MLX Swift (Advanced Features)
| Topic | Guide |
|---|---|
| Setup and configuration | mlx-swift/setup.md |
| Chat patterns with custom models | mlx-swift/chat-patterns.md |
| Vision Language Models (VLMs) | mlx-swift/vision-patterns.md |
| Tool calling, embeddings, structured generation | mlx-swift/advanced-patterns.md |
| Model quantization with MLX-LM | mlx-swift/quantization.md |
Shared (Both Frameworks)
| Topic | Guide |
|---|---|
| Best practices and optimization | shared/best-practices.md |
| Error handling and recovery | shared/error-handling.md |
| Testing strategies | shared/testing.md |
Quick Decision Trees
Which framework should I use?
Do you need advanced features like:
- Vision Language Models (VLMs)
- Image generation
- Custom models beyond the system model
├── Yes → MLX Swift (references/mlx-swift/)
└── No → Is this a standard chat interface?
    ├── Yes → Foundation Models (simpler, recommended)
    └── No → Check framework-selection.md for guidance
Where should I start?
New to on-device AI?
└── Start with Foundation Models:
    1. Read framework-selection.md
    2. Follow foundation-models/setup.md
    3. Implement foundation-models/chat-patterns.md

Need advanced features?
└── Use MLX Swift:
    1. Read framework-selection.md
    2. Follow mlx-swift/setup.md
    3. Choose pattern:
       - Chat: mlx-swift/chat-patterns.md
       - Vision: mlx-swift/vision-patterns.md
       - Advanced: mlx-swift/advanced-patterns.md
Where should my model loading code live?
Is this model shared across features?
├── Yes → Create @Observable service in app/services/
└── No → Is it feature-specific?
    ├── Yes → Create @Observable class in feature/
    └── No → Load inline with @State (simple cases only)
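The shared-service branch could look roughly like the sketch below. `ModelLoading`, `ModelService`, and the `State` cases are hypothetical names introduced for illustration. The protocol stands in for whichever framework call actually loads the model, so the service itself stays independent of Foundation Models or MLX Swift.

```swift
import Observation

/// Hypothetical abstraction over whichever framework loads the model.
protocol ModelLoading: Sendable {
    /// Returns a name or handle for the loaded model.
    func load() async throws -> String
}

/// Shared, observable owner of the model's loading state.
@available(iOS 17.0, macOS 14.0, tvOS 17.0, watchOS 10.0, *)
@MainActor
@Observable
final class ModelService {
    enum State: Equatable {
        case idle
        case loading
        case ready(String)
        case failed(String)
    }

    private(set) var state: State = .idle
    private let loader: any ModelLoading

    init(loader: any ModelLoading) {
        self.loader = loader
    }

    /// Starts loading only from the idle state; calls made while loading,
    /// after success, or after a failure are ignored.
    func loadIfNeeded() async {
        guard state == .idle else { return }
        state = .loading
        do {
            // The await suspends instead of blocking the main actor.
            state = .ready(try await loader.load())
        } catch {
            state = .failed(String(describing: error))
        }
    }
}
```

A SwiftUI view can hold the service in the environment and switch on `state` to show a progress indicator, the chat UI, or an error message. A retry path (resetting `state` to `.idle`) is left out to keep the sketch short.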
How should I handle conversations?
Foundation Models:
└── Reuse LanguageModelSession for context (references/foundation-models/chat-patterns.md #multi-turn)

MLX Swift:
└── Implement custom context management (references/mlx-swift/chat-patterns.md)
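For the MLX Swift branch, "custom context management" usually means keeping the message history yourself and trimming it to fit the model's context window. The sketch below is framework-independent and uses hypothetical types (`ChatMessage`, `ConversationContext`). It budgets by character count as a rough stand-in for tokens; a real implementation would count tokens with the model's tokenizer.

```swift
/// One turn in a conversation. Hypothetical type for illustration.
struct ChatMessage: Equatable {
    enum Role: String { case system, user, assistant }
    let role: Role
    let text: String
}

/// Rolling history that stays within a rough size budget.
struct ConversationContext {
    private(set) var messages: [ChatMessage] = []
    let maxCharacters: Int

    init(systemPrompt: String, maxCharacters: Int) {
        self.maxCharacters = maxCharacters
        messages = [ChatMessage(role: .system, text: systemPrompt)]
    }

    /// Total characters across all stored messages (a proxy for tokens).
    var totalCharacters: Int {
        messages.reduce(0) { $0 + $1.text.count }
    }

    /// Adds a message, then drops old turns if the budget is exceeded.
    mutating func append(_ message: ChatMessage) {
        messages.append(message)
        trim()
    }

    /// Removes the oldest non-system messages until the history fits.
    /// The system prompt and the newest message are always kept, even if
    /// together they exceed the budget.
    private mutating func trim() {
        while totalCharacters > maxCharacters, messages.count > 2 {
            messages.remove(at: 1) // index 0 is the system prompt
        }
    }
}
```

Before each generation call, the app would render `messages` into the prompt format the chosen model expects, then append the model's reply as an `.assistant` message.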
What generation parameters should I use?
What's the use case?
Factual answers (summaries, facts)
└── temperature: 0.1-0.3

Balanced (chat, Q&A)
└── temperature: 0.6-0.8

Creative (storytelling, ideas)
└── temperature: 0.9-1.2
See references/shared/best-practices.md for details
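One way to keep these ranges in a single place is a small preset type. `GenerationUseCase` and `clampedTemperature(_:)` are hypothetical names; the numeric ranges are the ones listed above.

```swift
/// Temperature presets matching the ranges listed above.
/// Hypothetical type for illustration.
enum GenerationUseCase {
    case factual   // summaries, facts
    case balanced  // chat, Q&A
    case creative  // storytelling, ideas

    /// Recommended temperature range for this use case.
    var temperatureRange: ClosedRange<Double> {
        switch self {
        case .factual:  return 0.1...0.3
        case .balanced: return 0.6...0.8
        case .creative: return 0.9...1.2
        }
    }

    /// Clamps a requested temperature into the recommended range.
    func clampedTemperature(_ requested: Double) -> Double {
        min(max(requested, temperatureRange.lowerBound), temperatureRange.upperBound)
    }
}
```

The clamped value can then be passed to whichever generation API is in use, for example as the `temperature` argument of `GenerationOptions` in Foundation Models, assuming the SDK in use exposes that parameter.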
Resources
- MLX Swift Examples
- Foundation Models Docs
- Hugging Face Model Hub
- MLX-LM Quantization
- MLX Community Models