# iOS Computer Vision Router

You MUST use this skill for ANY computer vision work using the Vision framework.
## When to Use

Use this router when:

- Analyzing images or video
- Detecting objects, faces, or people
- Tracking hand or body pose
- Segmenting people or subjects
- Lifting subjects from backgrounds
- Recognizing text in images (OCR)
- Detecting barcodes or QR codes
- Scanning documents
- Using VisionKit or DataScannerViewController
- Integrating with Visual Intelligence (iOS 26+ system camera feature)
## Routing Logic

### Vision Work

Implementation patterns → /skill axiom-vision

- Subject segmentation (VisionKit)
- Hand pose detection (21 landmarks; see the sketch after this list)
- Body pose detection (2D/3D)
- Person segmentation
- Face detection
- Isolating objects while excluding hands
- Text recognition (VNRecognizeTextRequest)
- Barcode/QR detection (VNDetectBarcodesRequest)
- Document scanning (VNDocumentCameraViewController)
- Live scanning (DataScannerViewController)
- Structured document extraction (RecognizeDocumentsRequest, iOS 26+)
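As a quick orientation before invoking the skill, here is a minimal sketch of the hand pose request pattern. It assumes the caller supplies a CGImage; the function name is illustrative, not part of the skill.

```swift
import Vision

// Minimal sketch: detect up to two hands and read their 21 joints each.
func detectHandPose(in image: CGImage) throws -> [VNHumanHandPoseObservation] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let observations = request.results ?? []
    for observation in observations {
        // Each observation exposes its joints as normalized VNRecognizedPoints.
        let joints = try observation.recognizedPoints(.all)
        if let wrist = joints[.wrist] {
            print("Wrist at \(wrist.location), confidence \(wrist.confidence)")
        }
    }
    return observations
}
```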
API reference → /skill axiom-vision-ref

- Complete Vision framework API
- VNDetectHumanHandPoseRequest
- VNDetectHumanBodyPoseRequest
- VNGenerateForegroundInstanceMaskRequest
- VNRecognizeTextRequest (fast/accurate modes; see the sketch after this list)
- VNDetectBarcodesRequest (symbologies)
- DataScannerViewController delegates
- RecognizeDocumentsRequest (iOS 26+)
- Coordinate conversion patterns
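For a taste of the reference material, a minimal VNRecognizeTextRequest sketch; the fast/accurate trade-off is usually the first setting to check. The function name and CGImage input are illustrative assumptions.

```swift
import Vision

// Minimal sketch: OCR a CGImage, returning the top candidate per observation.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate      // .fast is quicker but less precise
    request.usesLanguageCorrection = true
    request.recognitionLanguages = ["en-US"]  // adjust for your content

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
```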
Visual Intelligence integration → /skill axiom-vision-ref (see Visual Intelligence Integration section)

- Making app content discoverable to the Visual Intelligence camera
- IntentValueQuery and SemanticContentDescriptor
- Deep linking from Visual Intelligence results
Diagnostics → /skill axiom-vision-diag

- Subject not detected
- Hand pose missing landmarks
- Low confidence observations
- Performance issues
- Coordinate conversion bugs (see the helper after this list)
- Text not recognized or wrong characters
- Barcodes not detected
- DataScanner showing blank or no items
- Document edges not detected
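The most common coordinate bug routed to vision-diag: Vision returns normalized bounding boxes with a lower-left origin, while UIKit uses points with an upper-left origin. A minimal helper, assuming you are drawing over the original image (the function name is illustrative):

```swift
import Vision

// Convert a normalized Vision rect (lower-left origin) to pixel coordinates,
// then flip the y-axis for UIKit's top-left origin.
func uiKitRect(for normalizedBox: CGRect, imageSize: CGSize) -> CGRect {
    var rect = VNImageRectForNormalizedRect(normalizedBox,
                                            Int(imageSize.width),
                                            Int(imageSize.height))
    rect.origin.y = imageSize.height - rect.origin.y - rect.height
    return rect
}
```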
## Decision Tree

- Implementing (pose, segmentation, OCR, barcodes, documents, live scanning)? → vision
- Visual Intelligence system integration (camera feature, iOS 26+)? → vision-ref (Visual Intelligence section)
- Need API reference / code examples? → vision-ref
- Debugging issues (detection failures, confidence, coordinates)? → vision-diag
## Anti-Rationalization

| Thought | Reality |
|---------|---------|
| "Vision framework is just a request/handler pattern" | Vision has coordinate conversion, confidence thresholds, and performance gotchas. vision covers them. |
| "I'll handle text recognition without the skill" | VNRecognizeTextRequest has fast/accurate modes and language-specific settings. vision has the patterns. |
| "Subject segmentation is straightforward" | Instance masks have HDR compositing and hand-exclusion patterns. vision covers complex scenarios. |
| "Visual Intelligence is just the camera API" | Visual Intelligence is a system-level feature requiring IntentValueQuery and SemanticContentDescriptor. vision-ref has the integration section. |
## Critical Patterns

vision:

- Subject segmentation with VisionKit
- Hand pose detection (21 landmarks)
- Body pose detection (2D/3D, up to 4 people)
- Isolating objects while excluding hands
- CoreImage HDR compositing
- Text recognition (fast vs accurate modes)
- Barcode detection (symbology selection)
- Document scanning with perspective correction
- Live scanning with DataScannerViewController (see the sketch after this list)
- Structured document extraction (iOS 26+)
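A minimal sketch of the live-scanning setup; configuration values are illustrative, and the skill covers delegates, item handling, and error cases in full.

```swift
import Vision
import VisionKit

// Minimal sketch: configure a live scanner for text and QR codes.
// Gate on hardware support and runtime availability before presenting.
@MainActor
func makeScanner() -> DataScannerViewController? {
    guard DataScannerViewController.isSupported,
          DataScannerViewController.isAvailable else { return nil }

    return DataScannerViewController(
        recognizedDataTypes: [.text(), .barcode(symbologies: [.qr])],
        qualityLevel: .balanced,
        isHighlightingEnabled: true
    )
}
// After presenting the controller, call `try scanner.startScanning()`.
```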
vision-diag:

- Subject detection failures
- Landmark tracking issues
- Performance optimization
- Observation confidence thresholds
- Text recognition failures (language, contrast)
- Barcode detection issues (symbology, distance; see the sketch after this list)
- DataScanner troubleshooting
- Document edge detection problems
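A first diagnostic pass for "barcode not detected" is sketched below: name the expected symbologies explicitly and inspect confidence. The threshold and symbology list are illustrative assumptions.

```swift
import Vision

// Minimal sketch: restrict symbologies to the codes you actually expect,
// then filter weak results. Low confidence often points to blur, glare,
// or the code being too small in frame.
func detectBarcodes(in image: CGImage) throws -> [VNBarcodeObservation] {
    let request = VNDetectBarcodesRequest()
    request.symbologies = [.qr, .ean13, .code128]

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    return (request.results ?? []).filter { $0.confidence > 0.5 }
}
```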
Example Invocations
User: "How do I detect hand pose in an image?" → Invoke: /skill axiom-vision
User: "Isolate a subject but exclude the user's hands" → Invoke: /skill axiom-vision
User: "How do I read text from an image?" → Invoke: /skill axiom-vision
User: "Scan QR codes with the camera" → Invoke: /skill axiom-vision
User: "How do I implement document scanning?" → Invoke: /skill axiom-vision
User: "Use DataScannerViewController for live text" → Invoke: /skill axiom-vision
User: "Subject detection isn't working" → Invoke: /skill axiom-vision-diag
User: "Text recognition returns wrong characters" → Invoke: /skill axiom-vision-diag
User: "Barcode not being detected" → Invoke: /skill axiom-vision-diag
User: "Show me VNDetectHumanBodyPoseRequest examples" → Invoke: /skill axiom-vision-ref
User: "What symbologies does VNDetectBarcodesRequest support?" → Invoke: /skill axiom-vision-ref
User: "RecognizeDocumentsRequest API reference" → Invoke: /skill axiom-vision-ref
User: "How do I make my app work with Visual Intelligence?" → Invoke: /skill axiom-vision-ref
User: "How do users discover my app content through the camera?" → Invoke: /skill axiom-vision-ref