pdf-filler
Operate on PDF AcroForms: list every field with its type and current value, then fill the PDF with values supplied as JSON. The skill calls a small Python package (oc-pdf-filler) that wraps a fallback chain of PDF libraries, so a single recalcitrant PDF doesn't block the workflow.
Workspace rules (read this first)
Many agent hosts run inside a sandbox that only allows reads/writes inside a specific workspace folder. Files written outside that folder show up as "Unavailable / Outside allowed folders" and the user can't download them.
The CLI enforces this for you:
- The "workspace" is resolved from the first set environment variable in this list,
falling back to the current working directory:
OC_PDF_FILLER_WORKSPACE,OPENCLAW_WORKSPACE,CLAWHUB_WORKSPACE,AGENT_WORKSPACE,SKILL_WORKSPACE,WORKSPACE. - All output paths (schema JSON and filled PDF) are resolved relative to the workspace.
- If you pass an absolute
--outputthat points outside the workspace, the CLI rewrites it to the same basename inside the workspace (and prints a warning to stderr). - You can override the workspace explicitly with
--workspace DIR.
In practice: pass relative paths (e.g. -o form_done.pdf), or omit --output
entirely. The default is <input-stem>_done.pdf inside the workspace.
When to use
Trigger this skill when the user:
- Asks to inspect, list, or extract the form fields of a PDF
- Wants to fill out / populate / complete a PDF form programmatically
- Mentions AcroForm, checkbox, radio button, or dropdown handling in a PDF
- Has a batch of PDFs to fill from structured data (JSON)
Setup (once per workspace)
The skill scripts call the oc-pdf-filler Python package. Install it first:
pip install "oc-pdf-filler[all]"
# or, if working from the source repo: pip install -e ".[all]"
The [all] extra pulls in pdfrw and PyMuPDF for the full fallback chain. Install pdftk from your package manager for the last-resort backend (optional, but useful for stubborn PDFs).
Verify which backends are active:
python scripts/list_backends.py
Step 1: Extract the field schema
Always extract first so you know the exact field names and types before constructing the JSON values file. Use a workspace-relative path for the output (the CLI confines it to the workspace automatically).
python scripts/extract.py /path/to/form.pdf --output schema.json --include-values
Each entry in the resulting JSON has:
name: the AcroForm field name (use this verbatim as the key when filling)type: one oftext,checkbox,radio,choice,signature,pushbutton,unknownoptions: for radios and checkboxes, the accepted export values; for choices, the dropdown optionsvalue: current value if the form is partially filled (only when--include-valuesis set)max_length,multiline,required,read_only: hints for validation
See references/FIELD_TYPES.md for the value contract per field type.
Step 2: Build a values JSON file
The fill input is a flat JSON object { "FieldName": value }. Example:
{
"Name Verantwortlicher": "ACME GmbH",
"Postleitzahl Verantwortlicher": "10115",
"Beschäftigte": true,
"Verarbeitungstyp": "Automatisiert"
}
A starter template is included at assets/values.example.json.
Step 3: Fill the PDF
Omit --output to get the recommended default <input-stem>_done.pdf inside the workspace, or pass a workspace-relative filename. Absolute paths outside the workspace are automatically rewritten into it (the host's sandbox would reject them otherwise).
# preferred: keep the original name with _done suffix, in the workspace
python scripts/fill.py /path/to/form.pdf values.json
# or pass an explicit relative path
python scripts/fill.py /path/to/form.pdf values.json --output form_done.pdf
By default the orchestrator uses --backend auto, walking the chain pypdf -> pdfrw -> PyMuPDF -> pdftk and stopping at the first backend that fills every field.
Useful flags:
--backend pymupdf-- force a specific backend (e.g. when the auto winner produces a PDF that doesn't render correctly in your viewer)--best-effort-- chain backends so partial fills accumulate (use when no single backend handles every field)--flatten-- bake values into the PDF so they can't be edited (best support: PyMuPDF, pdftk)--strict-- exit non-zero if any requested field is missing or unfillable
The script prints a JSON summary including winning_backend, workspace, output_path (absolute path of the resulting PDF, always inside the workspace), filled, missing, failed, and per-attempt details. If filling fails, see references/BACKENDS.md for backend-specific troubleshooting tips.
Delivering the result to the user
After a successful fill, the user expects to receive the PDF as an attachment. To make that work reliably across hosts:
- Read
output_pathfrom the fill summary. It is guaranteed to be inside the workspace, so the host can attach it. - Surface that path in your final message and use whatever attachment / file-return mechanism the host provides so the PDF arrives as a downloadable attachment.
- Never write the PDF outside the workspace (e.g.
/tmp,/var, your home directory). Sandboxed hosts will mark it as "Unavailable / Outside allowed folders".
End-to-end example
python scripts/extract.py form.pdf -o schema.json
# ... agent inspects schema.json, builds values.json based on user input ...
python scripts/fill.py form.pdf values.json
# writes <workspace>/form_done.pdf
After filling, re-run extract.py --include-values form_done.pdf and confirm the values stuck before delivering the PDF to the user.
Notes and edge cases
- Field names may contain spaces, German umlauts, or punctuation. Always copy them verbatim from
extract.pyoutput. - For radio groups, set the value to the export name of the chosen option (one of the strings in
options), not a boolean. - Signature fields (
type: signature) are reported but not auto-filled. - Encrypted PDFs are out of scope; the tool will surface the underlying library error.
- Some PDF viewers cache appearance streams; if a viewer shows blank fields after filling, try opening with a different viewer or use
--flatten.