CareMax Upload & OCR
Requires
caremax-authas a sibling directory (../caremax-auth/). If missing, tell the user to install caremax-auth first (e.g.npx skills add KittenYang/caremax-skills).
Upload medical report files (PDF, JPG, PNG, HEIC) and extract structured data via AI-powered OCR.
Session-based workflow: upload → OCR → review → confirm. All operations are on a single session.
Checkpoint & resume: Every pipeline step saves progress to the database. If OCR fails mid-way (LLM timeout, worker crash, network error), retrying automatically resumes from the last checkpoint — no work is lost.
Agent default behavior (MANDATORY)
- Upload and OCR are one continuous workflow. When the user uploads report files (or asks you to upload/扫描/识别体检报告等), after
$UPLOADreturns successfully you must in the same turn run$OCRSTREAM <session_id>using the returnedsession_id. Do not end the task afterupload.shalone. - Upload-only exception: Skip immediate OCR only if the user explicitly asked to upload without recognition (e.g. 只上传、不要识别、别跑 OCR、只存文件). If unclear, default to running OCR after upload.
- Progress: Stream each SSE line to the user as it arrives (normalize / ocr / structure / …).
- After
step=done: Always continue to Step 3 (review). Do not auto-call confirm — wait for user approval before Step 4.
Prerequisites — Auto-Auth (MANDATORY)
APICALL="bash ../caremax-auth/scripts/api-call.sh"
UPLOAD="bash ../caremax-auth/scripts/upload.sh"
OCRSTREAM="bash ../caremax-auth/scripts/ocr-stream.sh"
If any script returns no_credentials → run bash ../caremax-auth/scripts/auth-flow.sh [base_url] (from this skill’s root, sibling of caremax-auth/).
Step 1: Upload (creates session)
$UPLOAD /path/to/report1.jpg /path/to/report2.jpg /path/to/report.pdf
Returns:
{
"session_id": "uuid-xxx",
"member_id": "uuid-yyy",
"files": [
{ "id": "file-1", "original_name": "report1.jpg" },
{ "id": "file-2", "original_name": "report2.jpg" },
{ "id": "file-3", "original_name": "report.pdf" }
]
}
Save the session_id.
Step 2: OCR with real-time progress
$OCRSTREAM <session_id>
Outputs one JSON per line:
{"step":"resume","progress":1,"message":"Resuming from checkpoint (last completed: ocr)..."}
{"step":"normalize","progress":5,"message":"Loading file 1/3..."}
{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}
{"step":"ocr_retry","progress":35,"message":"Retrying OCR page 1/1: report1.jpg"}
{"step":"structure","progress":62,"message":"Detecting report groups..."}
{"step":"structure","progress":75,"message":"Structuring report 2/2..."}
{"step":"normalize_indicators","progress":88,"message":"Standardizing..."}
{"step":"done","progress":100,"data":{"session_id":"...","reports":[...],"resumed":true}}
Display progress to the user as each line arrives.
Key progress events
| step | meaning |
|---|---|
resume | Pipeline is resuming from a saved checkpoint (not starting from zero) |
info | Informational message (e.g. which step was resumed from) |
normalize | Loading and preprocessing files |
ocr | OCR text extraction per page |
ocr_retry | Retrying previously failed pages only |
structure | AI analyzing and grouping reports |
normalize_indicators | Standardizing indicator names |
done | Complete — data field contains the full results |
error | Pipeline failed — check message for details |
If step=resume appears, tell the user: "正在从上次的进度继续处理(不需要重新开始)"
Error responses from $OCRSTREAM
| code | meaning | action |
|---|---|---|
processing_in_progress | Another OCR run is still active | Wait and retry, or poll /status |
ocr_limit_exceeded | Free OCR quota exhausted | Tell user to upgrade |
| (no code) | Pipeline error (LLM timeout etc.) | Retry — will auto-resume from checkpoint |
Step 2b: Poll status (when SSE disconnects)
If the SSE stream disconnects (network timeout, terminal closed), use the status endpoint to check progress:
$APICALL GET "/api/skill/sessions/<session_id>/status"
Returns:
{
"session_id": "uuid",
"status": "processing",
"pipeline": {
"completedStep": "ocr",
"pageCount": 5,
"ocrCompleted": 4,
"ocrFailed": 1,
"reportCount": 0,
"errors": [{"step":"ocr","pageIndex":2,"message":"PaddleOCR timeout"}]
},
"error": null,
"is_stale": false
}
Field guide:
status = processing+is_stale = false→ OCR is still running normallystatus = processing+is_stale = true→ Worker crashed/timed out, safe to retry OCRstatus = awaiting_confirm→ OCR completed! Fetch session detail for resultsstatus = uploading+errorpresent → Last OCR attempt failed, retry will resume from checkpointpipeline.completedStep→ How far the pipeline got (normalize → ocr → structure → done)pipeline.ocrFailed→ Number of pages that failed OCR (will be retried on next attempt)
Polling workflow:
1. Call $OCRSTREAM → SSE disconnects mid-way
2. Poll GET /sessions/<id>/status every 5-10 seconds
3. When status = "awaiting_confirm" → fetch full results with GET /sessions/<id>
4. If status = "uploading" (failed) → retry with $OCRSTREAM (auto-resumes)
5. If is_stale = true → retry with $OCRSTREAM (auto-resumes from checkpoint)
Step 3: Review results (MANDATORY)
Parse the step=done data. Show formatted summary. Do NOT auto-confirm.
Each report has a reportType field: lab, genetic, imaging, pathology, or other.
Lab reports (reportType = "lab")
Show indicators table:
📋 报告 1: [lab] 尿生化 (编号: 114431194)
日期: 2025-02-05 医生: 俞海瑾
指标: 12 个 (3 个异常)
┌──────────────────────┬────────┬──────────┬────────────┬──────┐
│ 指标 │ 结果 │ 单位 │ 参考范围 │ 异常 │
├──────────────────────┼────────┼──────────┼────────────┼──────┤
│ 24H尿钠 │ 130.0 │ mmol/24h │ 137-257 │ ⬇ │
└──────────────────────┴────────┴──────────┴────────────┴──────┘
Non-lab reports (reportType = "genetic" / "imaging" / etc.)
Show summary + sections:
📋 报告 1: [genetic] 基因检测报告
日期: 2025-09-12 检测机构: 南京申友医学检验所
摘要: 心血管18项基因检测...高血压、冠心病风险一般...
段落: 18 sections
[gene_variant] 高血压 — 风险: 正常
[gene_variant] 冠心病 — 风险: 一般
[medication] ACEI类降压药 — 正常代谢型
...
Supported file types
- Images (JPG/PNG/HEIC): PaddleOCR → structure
- PDF (any size): Azure Mistral Document AI page-split → structure
- Large PDFs (e.g. 23-page gene report, 9.6MB) are fully supported
Step 4: Confirm and save
After user confirms:
$APICALL POST "/api/skill/sessions/<session_id>/confirm" '{"reports":[<reports from step 2>]}'
Returns: {"success":true,"message":"2 report(s) saved","recordIds":[...]}
Resuming incomplete sessions
When the user asks to continue/resume a previous upload, or when checking for unfinished work:
Step A: Find pending sessions
# List sessions that need OCR (uploaded but not processed)
$APICALL GET "/api/skill/sessions?status=uploading"
# List sessions stuck in processing (user exited mid-OCR)
$APICALL GET "/api/skill/sessions?status=processing"
# List sessions with OCR done but not yet confirmed
$APICALL GET "/api/skill/sessions?status=awaiting_confirm"
Show a summary of pending sessions to the user (file names, dates, status).
Step B: Resume based on status
uploading: Start OCR directly → go to Step 2 ($OCRSTREAM <session_id>)- If there's a saved checkpoint (previous failed attempt), OCR auto-resumes from it
processing: Check with status endpoint first:$APICALL GET "/api/skill/sessions/<session_id>/status"is_stale = false→ still running, wait or pollis_stale = true→ worker died, safe to retry:$OCRSTREAM <session_id>(auto-resumes from checkpoint)
awaiting_confirm: Get session detail → show results → go to Step 3 (review & confirm)
# Get full detail of a pending session (includes OCR results if awaiting_confirm)
$APICALL GET "/api/skill/sessions/<session_id>"
If the session is awaiting_confirm, the response includes ocr_result with the previously parsed reports — display them for review and proceed to Step 3 (confirm).
Resume-aware response handling
When $OCRSTREAM outputs step=done:
resumed = truein the data → tell user: "已从上次的进度恢复,OCR 结果已就绪"resumed = false(or absent) → normal fresh run
When $OCRSTREAM outputs step=error:
code = processing_in_progress→ tell user OCR is still running, poll/statusinsteadcode = ocr_limit_exceeded→ tell user to upgrade- No code → LLM/network error, safe to retry (will auto-resume from checkpoint)
Step C: Delete individual reports or stale sessions
Delete a single report (does NOT affect other reports in the same session):
$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"
Delete an entire session (cascade deletes ALL files + reports):
$APICALL DELETE "/api/skill/sessions/<session_id>"
Other session operations
# List all sessions (all statuses)
$APICALL GET /api/skill/sessions
# List sessions filtered by status: uploading | processing | awaiting_confirm | completed
$APICALL GET "/api/skill/sessions?status=<status>"
# Get session detail (includes OCR results if awaiting_confirm, saved reports if completed)
$APICALL GET "/api/skill/sessions/<session_id>"
# Poll OCR progress (lightweight, use when SSE disconnects)
$APICALL GET "/api/skill/sessions/<session_id>/status"
# Delete single report (keeps session and other reports intact)
$APICALL DELETE "/api/skill/sessions/<session_id>/records/<record_id>"
# Delete entire session (undo everything: files + reports)
$APICALL DELETE "/api/skill/sessions/<session_id>"