Nutrient Document Processing

Use Nutrient DWS for managed document workflows where fidelity, compliance, or multi-step processing matters more than local-tool convenience.

Setup

Get a Nutrient DWS API key at https://dashboard.nutrient.io/sign_up/?product=processor.
Direct API calls use Authorization: Bearer $NUTRIENT_API_KEY.
```
export NUTRIENT_API_KEY="nutr_sk_..."
```
MCP setups commonly use @nutrient-sdk/dws-mcp-server with NUTRIENT_DWS_API_KEY.
Scripts live in scripts/ relative to this SKILL.md. Use the directory containing this SKILL.md as the working directory:
```
cd <directory containing this SKILL.md> && uv run scripts/<script>.py --help
```
Page ranges use start:end with 0-based indexes and end-exclusive semantics. Negative indexes count from the end.

Generate PDFs from HTML templates, uploaded assets, or remote URLs.
Convert Office, HTML, image, and PDF files between supported formats.
OCR scans and extract text, tables, or key-value pairs.
Redact PII, watermark, sign, fill forms, merge, split, rotate, flatten, or encrypt PDFs.
Produce delivery targets like PDF/A, PDF/UA, optimized PDFs, or linearized PDFs.
Check credits before large, batch, or AI-heavy runs.

Prefer scripts/*.py for covered single-operation workflows.
Use assets/templates/custom-workflow-template.py for multi-step jobs that should still run through the Python client.
Use the modular references/ docs and direct API payloads for capabilities that do not yet have a dedicated helper script, especially HTML/URL generation and compliance tuning.
Use local PDF utilities only for lightweight inspection. Use Nutrient when output fidelity or compliance matters.

convert.py -> convert between pdf, pdfa, pdfua, docx, xlsx, pptx, png, jpeg, webp, html, and markdown
merge.py -> merge multiple files into one PDF
split.py -> split one PDF into multiple PDFs by page ranges
add-pages.py -> append blank pages
delete-pages.py -> remove specific pages
duplicate-pages.py -> reorder or duplicate pages into a new PDF
rotate.py -> rotate selected pages
ocr.py -> OCR scanned PDFs or images
extract-text.py -> extract text to JSON
extract-table.py -> extract tables
extract-key-value-pairs.py -> extract key-value pairs
watermark-text.py -> apply a text watermark
redact-ai.py -> detect and apply AI-powered redactions
sign.py -> digitally sign a local PDF
password-protect.py -> write encrypted output PDFs
optimize.py -> apply optimization and linearization-style options via JSON

Do not add new committed pipeline scripts under scripts/.

When the user asks for multiple operations in one run:

Copy assets/templates/custom-workflow-template.py to a temporary location such as /tmp/ndp-workflow-<task>.py.
Implement the combined workflow in that temporary script.
Run it with uv run /tmp/ndp-workflow-<task>.py ....
Return generated output files.
Delete the temporary script unless the user explicitly asks to keep it.

split.py requires a multi-page PDF and cannot extract ranges from a single-page document.
delete-pages.py must retain at least one page and cannot delete the entire document.
sign.py only accepts local file paths for the main PDF.

Prefer a helper script when one already covers the requested operation cleanly.
If you control the source markup, prefer HTML generation over browser print workflows.
Use remote file.url inputs when the source already lives at a stable URL and you want to avoid local uploads.
Use output.type for conversion and finalization targets. Use actions for transformations when building direct API payloads.
OCR before text extraction, key-value extraction, or semantic redaction on scans.
Prefer preset or regex redaction when the target is explicit. Use AI redaction only for contextual or natural-language requests.
Use the PDF manipulation reference for merge, split, rotate, flatten, and page-range workflows instead of inferring those payloads from conversion examples.
Treat PDF/A and PDF/UA as compliance targets, not cosmetic export formats. Choose the target up front and validate final artifacts when requirements are contractual.
For PDF/UA, clean born-digital inputs and structured HTML usually tag better than rasterized or flattened source PDFs.
For delivery optimization, linearize or optimize unsigned output artifacts instead of mutating already signed files.
When the user asks for multiple steps, keep destructive or final steps late in the sequence. Use the workflow recipes when ordering is ambiguous.

Do not OCR born-digital PDFs just because the task mentions extraction. Extract first and OCR only if the text layer is missing.
Do not flatten forms or annotations until the user confirms the artifact no longer needs to stay editable.
Do not sign, archive, or linearize intermediate working files. Keep those as final-delivery steps.
Do not promise PDF/A or PDF/UA compliance without a validation step when the requirement is contractual.
Do not commit temporary workflow scripts under scripts/.

Read only what you need:

references/request-basics.md -> endpoint model, auth, multipart vs JSON, credits, limits, and errors
references/generation-and-conversion.md -> HTML/URL generation and format conversion
references/pdf-manipulation.md -> merge, split, page-range, rotate, and flatten workflows
references/extraction-and-ocr.md -> OCR, text extraction, tables, and key-value workflows
references/security-signing-and-forms.md -> redaction, watermarking, signatures, forms, and passwords
references/compliance-and-optimization.md -> PDF/A, PDF/UA, optimization, and linearization
references/workflow-recipes.md -> end-to-end sequencing patterns for common business document workflows