# Local Document AI with OpenVINO

Use this skill as a local document-to-action pipeline:

- Parse the document into a canonical structured representation.
- Optionally continue into `to-data` or `to-code`.
- Save outputs into a predictable artifact folder with traceability.
## Read only if needed

Load these references when you need the schema or output contracts:

- `{baseDir}/references/schema.md`
- `{baseDir}/references/mode_guide.md`
- `{baseDir}/references/output_contracts.md`
## Primary entrypoints

Use exactly one of these entrypoints:

- CLI orchestrator: `{baseDir}/scripts/run_skill.py`
- Optional local demo UI: `{baseDir}/scripts/serve_skill_ui.py`

Do not call these implementation scripts directly from the skill: `parse_document.py`, `transform_doc_to_data.py`, `transform_doc_to_code.py`.
## Local readiness

Check the environment before processing real documents:

```sh
python "{baseDir}/scripts/check_env.py"
```

Install the base dependencies in a virtual environment:

```sh
python -m pip install -r "{baseDir}/requirements.txt"
```

Install the third-party `paddleocr_vl_openvino` package only after reviewing the source or wheel, and only when you intend to run the real OCR pipeline. Prefer installing from a reviewed local wheel path inside a virtual environment.

Run a quick orchestration smoke test:

```sh
python "{baseDir}/scripts/smoke_test.py"
```

Model assets are discovered from:

- `PADDLEOCR_VL_OPENVINO_MODEL_DIR`
- `PADDLEOCR_VL_LAYOUT_MODEL_DIR` plus `PADDLEOCR_VL_VLM_MODEL_DIR`
- `{baseDir}/models/paddleocr-vl-1.5-openvino/`
- `{baseDir}/models/paddleocr-vl-openvino/`

Allow model auto-download only when the user explicitly approves it.
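The discovery order above can be sketched as a small resolver. This is a minimal illustration, assuming the environment variable takes precedence over the bundled folders; the split `PADDLEOCR_VL_LAYOUT_MODEL_DIR` / `PADDLEOCR_VL_VLM_MODEL_DIR` pair is omitted for brevity, and the skill's own lookup logic may differ.

```python
import os
from pathlib import Path


def resolve_model_dir(base_dir):
    """Locate a model directory, preferring the env var over bundled folders.

    Assumed precedence for illustration; check the skill's actual
    discovery logic before relying on this order.
    """
    env_value = os.environ.get("PADDLEOCR_VL_OPENVINO_MODEL_DIR")
    if env_value and Path(env_value).is_dir():
        return Path(env_value)
    # Fall back to the bundled model locations under {baseDir}/models/.
    for name in ("paddleocr-vl-1.5-openvino", "paddleocr-vl-openvino"):
        candidate = Path(base_dir) / "models" / name
        if candidate.is_dir():
            return candidate
    return None  # caller should ask the user before auto-downloading
```

Returning `None` rather than downloading keeps the auto-download decision with the user, matching the approval rule above.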
## Supported modes

### parse

Use when the user wants the structured parse only.

Outputs:

- `parsed.json`
- `parsed.md`
- `result_report.html`
- extracted layout, tables, or figures when available

### to-data

Use when the user wants structured extraction, normalization, or document classification.

Typical outputs under `task_output/`:

- `entities.json`
- `kv_pairs.json`
- `table_index.json`
- `normalized.json`
- `structured_record.json`
- `traceability.json`

### to-code

Use when the user wants implementation-oriented output from the parse result.

Supported targets:

- `react`
- `html-css`
- `json-schema`
- `jupyter-notebook`

Typical outputs under `task_output/`:

- `component_map.json`
- `field_schema.json`
- `ui_blueprint.json`
- `notes.md`
- `traceability.json`
- target-specific artifacts such as `app.jsx`, `index.html`, `styles.css`, `schema.json`, `notebook.ipynb`, or `notebook_plan.json`

Treat all generated code and notebooks as drafts. Review them before running, publishing, or connecting them to real systems.
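For intuition, a single traceability entry might link a generated field back to its source block. The key names below are assumptions for illustration only, not the skill's actual `traceability.json` schema; see `references/output_contracts.md` for the real contract.

```python
import json

# Hypothetical traceability record: one generated field, its source
# location in the parse, and an explicitly flagged assumption.
record = {
    "output_field": "ui_blueprint.header.title",
    "source": {"page": 1, "block_type": "title", "reading_order": 0},
    "confidence": "low",  # low-confidence regions are marked explicitly
    "assumption": "title inferred from the largest text block on page 1",
}
print(json.dumps(record, indent=2))
```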
## Pipeline rules

Always follow these rules:

- Prefer local execution.
- Always parse first into `parsed.json`.
- Generate downstream artifacts from `parsed.json`, not raw OCR text alone.
- Preserve page numbers, reading order, block types, and source anchors when possible.
- Write traceability for downstream outputs.
- Mark low-confidence regions or assumptions explicitly.
- Do not silently drop tables, figures, formulas, charts, or key-value regions.
- Save outputs into one artifact folder per run.
- For confidential documents, prefer an explicit private `--out` directory and remove artifacts after review.
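The "generate from `parsed.json`, preserving anchors" rule can be sketched as a reader that carries page, order, type, and confidence through to downstream steps. The block layout assumed here (pages containing ordered blocks) is illustrative; the authoritative shape is defined in `references/schema.md`.

```python
import json
from pathlib import Path


def blocks_with_anchors(parsed_path):
    """Yield blocks from parsed.json with their source anchors intact.

    Assumed parsed.json shape: {"pages": [{"page_number": ...,
    "blocks": [{"reading_order": ..., "type": ..., "text": ...,
    "confidence": ...}]}]} -- verify against the real schema.
    """
    parsed = json.loads(Path(parsed_path).read_text(encoding="utf-8"))
    for page in parsed.get("pages", []):
        for block in page.get("blocks", []):
            yield {
                "page": page.get("page_number"),
                "order": block.get("reading_order"),
                "type": block.get("type"),
                "text": block.get("text", ""),
                # kept so low-confidence regions can be flagged downstream
                "confidence": block.get("confidence"),
            }
```

Downstream generators that consume these dicts, rather than raw OCR text, keep every output attributable to a page and block.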
## Output contract

Default output folder: `./artifacts/<document_stem>/`

Expected top-level outputs:

- `effective_config.json`
- `run_report.json`
- `parsed.json`
- `parsed.md`
- `result_report.html`
- `task_output/`

to-code runs may also emit `code_preview.html`.
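A quick post-run check against this contract can catch a partially failed run before results are reported. The file names come directly from the list above; the helper itself is a sketch, not part of the skill.

```python
from pathlib import Path

# Top-level files named by the documented output contract.
EXPECTED_FILES = ("effective_config.json", "run_report.json",
                  "parsed.json", "parsed.md", "result_report.html")


def missing_outputs(run_dir):
    """Return the expected top-level artifacts absent from a run folder."""
    run = Path(run_dir)
    missing = [name for name in EXPECTED_FILES if not (run / name).is_file()]
    if not (run / "task_output").is_dir():
        missing.append("task_output/")
    return missing
```

An empty return value means the contract is satisfied at the top level; it says nothing about the contents of `task_output/`.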
## CLI examples

### Parse

```sh
python "{baseDir}/scripts/run_skill.py" \
  --mode parse \
  --file "/absolute/path/to/report.pdf" \
  --out "/absolute/path/to/artifacts/report_parse"
```

### To-data

```sh
python "{baseDir}/scripts/run_skill.py" \
  --mode to-data \
  --file "/absolute/path/to/invoice.pdf" \
  --out "/absolute/path/to/artifacts/invoice_data" \
  --extract "tables,entities,kv_pairs"
```

### To-code

```sh
python "{baseDir}/scripts/run_skill.py" \
  --mode to-code \
  --file "/absolute/path/to/ui_mockup.png" \
  --out "/absolute/path/to/artifacts/ui_code" \
  --target "react" \
  --title "Generated App"
```

### To-code notebook target

```sh
python "{baseDir}/scripts/run_skill.py" \
  --mode to-code \
  --file "/absolute/path/to/architecture_diagram.png" \
  --out "/absolute/path/to/artifacts/notebook_code" \
  --target "jupyter-notebook" \
  --title "OpenVINO Notebook"
```
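When driving the orchestrator programmatically, the same invocations can be built with the current interpreter, as the wrapper requires. The flags (`--mode`, `--file`, `--out`, `--extract`, `--target`, `--title`) come from the examples above; the helper functions themselves are illustrative.

```python
import subprocess
import sys


def build_skill_command(base_dir, mode, file, out, **extra):
    """Assemble a run_skill.py invocation using the current interpreter.

    `extra` maps optional flags, e.g. extract="tables,entities" or
    target="react", onto --extract / --target and so on.
    """
    cmd = [sys.executable, f"{base_dir}/scripts/run_skill.py",
           "--mode", mode, "--file", file, "--out", out]
    for flag, value in extra.items():
        cmd += [f"--{flag}", value]
    return cmd


def run_skill(base_dir, mode, file, out, **extra):
    """Run the orchestrator and return the completed process."""
    return subprocess.run(build_skill_command(base_dir, mode, file, out, **extra),
                          capture_output=True, text=True)
```

Using `sys.executable` keeps the invocation consistent with the wrapper's rule of always using the current Python interpreter.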
## Slash-command examples

```text
/skill local-document-ai-openvino parse file=./docs/report.pdf
/skill local-document-ai-openvino to-data file=./docs/invoice.pdf extract=tables,entities,kv_pairs
/skill local-document-ai-openvino to-code file=./mockups/architecture.png target=jupyter-notebook
```
## Optional local demo UI

Start the local UI when the user wants an interactive demo page:

```sh
python "{baseDir}/scripts/serve_skill_ui.py"
```

The UI lets the user:

- preview a local file
- choose `parse`, `to-data`, or `to-code`
- choose the `to-code` target
- run the pipeline and inspect the generated local HTML reports

The bundled UI only allows preview and run access for local files under the skill directory and common user content folders such as Downloads, Documents, Desktop, and Pictures.
## Failure behavior

If a run fails:

- state which stage failed
- do not claim outputs were created if they were not
- prefer writing `error.json` with failure details
- recommend `parse` first when the downstream request is ambiguous
- surface stderr or a concise failure summary when available
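The steps above can be sketched as a small reporter. The key names inside `error.json` are assumptions chosen to mirror the bullet points, not the skill's actual error format; consult `references/output_contracts.md` for the authoritative shape.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_error_report(out_dir, stage, message, stderr=None):
    """Record a failed run as error.json in the artifact folder.

    Key names are illustrative, mirroring the documented failure rules.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    report = {
        "failed_stage": stage,       # which stage failed
        "message": message,          # concise failure summary
        "stderr": stderr,            # surfaced when available
        "outputs_created": False,    # never claim outputs that don't exist
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    (out / "error.json").write_text(json.dumps(report, indent=2),
                                    encoding="utf-8")
    return report
```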
## Safety notes

- Use a virtual environment for dependency installation.
- Approve model downloads only after reviewing them and only when you explicitly intend to use them.
- Keep outputs in a private local folder when documents are sensitive.
- Review generated code and notebooks before execution.
- Delete artifacts when they are no longer needed.
- The wrapper always uses the bundled local scripts and the current Python interpreter; it does not allow custom interpreter or script-directory overrides.
## Short reminder

Present this skill as a local document-understanding workflow with downstream actions, not as a plain OCR wrapper.