Nutrient Document Processing
Use Nutrient DWS for managed document workflows where fidelity, compliance, or multi-step processing matters more than local-tool convenience.
Setup
- Get a Nutrient DWS API key at https://dashboard.nutrient.io/sign_up/?product=processor.
- Direct API calls use `Authorization: Bearer $NUTRIENT_API_KEY`. Export the key first: `export NUTRIENT_API_KEY="nutr_sk_..."`.
- MCP setups commonly use `@nutrient-sdk/dws-mcp-server` with `NUTRIENT_DWS_API_KEY`.
- Scripts live in `scripts/` relative to this SKILL.md. Use the directory containing this SKILL.md as the working directory: `cd <directory containing this SKILL.md> && uv run scripts/<script>.py --help`
- Page ranges use `start:end` with 0-based indexes and end-exclusive semantics. Negative indexes count from the end.
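The page-range convention is easy to get wrong by one, so here is a minimal local sketch that expands a `start:end` spec into page indexes for a document of known length. `resolve_page_range` is a hypothetical helper. It is not part of the bundled scripts or the DWS API, and it rejects empty or out-of-bounds ranges instead of guessing how the service treats them.

```python
def resolve_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a 'start:end' page range into 0-based page indexes.

    start is inclusive and end is exclusive; negative values count from the
    end of the document (-1 is the last page), as with Python slices.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected 'start:end', got {spec!r}")
    start, end = int(start_text), int(end_text)
    # Turn negative indexes into absolute ones for this document length.
    if start < 0:
        start += page_count
    if end < 0:
        end += page_count
    # Reject empty or out-of-bounds ranges instead of silently clamping them.
    if not 0 <= start < end <= page_count:
        raise ValueError(f"range {spec!r} is empty or out of bounds for {page_count} pages")
    return list(range(start, end))


if __name__ == "__main__":
    print(resolve_page_range("0:2", 5))   # first two pages
    print(resolve_page_range("-2:5", 5))  # last two pages
    print(resolve_page_range("1:-1", 5))  # everything except the first and last page
```

Taken literally, these rules mean that `-1` as an end value stops before the last page. To include the last page, use the page count itself as the end.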
When to use
- Generate PDFs from HTML templates, uploaded assets, or remote URLs.
- Convert Office, HTML, image, and PDF files between supported formats.
- OCR scans and extract text, tables, or key-value pairs.
- Redact PII, watermark, sign, fill forms, merge, split, rotate, flatten, or encrypt PDFs.
- Produce delivery targets like PDF/A, PDF/UA, optimized PDFs, or linearized PDFs.
- Check credits before large, batch, or AI-heavy runs.
Tool preference
- Prefer `scripts/*.py` for covered single-operation workflows.
- Use `assets/templates/custom-workflow-template.py` for multi-step jobs that should still run through the Python client.
- Use the modular `references/` docs and direct API payloads for capabilities that do not yet have a dedicated helper script, especially HTML/URL generation and compliance tuning.
- Use local PDF utilities only for lightweight inspection. Use Nutrient when output fidelity or compliance matters.
Single-operation scripts
- `convert.py` -> convert between `pdf`, `pdfa`, `pdfua`, `docx`, `xlsx`, `pptx`, `png`, `jpeg`, `webp`, `html`, and `markdown`
- `merge.py` -> merge multiple files into one PDF
- `split.py` -> split one PDF into multiple PDFs by page ranges
- `add-pages.py` -> append blank pages
- `delete-pages.py` -> remove specific pages
- `duplicate-pages.py` -> reorder or duplicate pages into a new PDF
- `rotate.py` -> rotate selected pages
- `ocr.py` -> OCR scanned PDFs or images
- `extract-text.py` -> extract text to JSON
- `extract-table.py` -> extract tables
- `extract-key-value-pairs.py` -> extract key-value pairs
- `watermark-text.py` -> apply a text watermark
- `redact-ai.py` -> detect and apply AI-powered redactions
- `sign.py` -> digitally sign a local PDF
- `password-protect.py` -> write encrypted output PDFs
- `optimize.py` -> apply optimization and linearization-style options via JSON
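This small pre-flight sketch checks a requested conversion target against the formats listed for `convert.py`. It also builds the documented `--help` invocation for a helper script. The constant and function names are hypothetical. Each script's real argument names come from its own `--help` output, which this sketch does not reproduce.

```python
# Output formats listed for convert.py in this skill.
CONVERT_TARGETS = frozenset(
    {"pdf", "pdfa", "pdfua", "docx", "xlsx", "pptx", "png", "jpeg", "webp", "html", "markdown"}
)

# Single-operation helpers listed above (hypothetical constant name).
HELPER_SCRIPTS = frozenset(
    {
        "convert.py", "merge.py", "split.py", "add-pages.py", "delete-pages.py",
        "duplicate-pages.py", "rotate.py", "ocr.py", "extract-text.py",
        "extract-table.py", "extract-key-value-pairs.py", "watermark-text.py",
        "redact-ai.py", "sign.py", "password-protect.py", "optimize.py",
    }
)


def check_convert_target(fmt: str) -> str:
    """Normalize a target format and fail fast if convert.py does not list it."""
    normalized = fmt.strip().lower()
    if normalized not in CONVERT_TARGETS:
        choices = ", ".join(sorted(CONVERT_TARGETS))
        raise ValueError(f"unsupported convert target {fmt!r}; expected one of: {choices}")
    return normalized


def help_command(script: str) -> list[str]:
    """Build the documented --help invocation for a bundled helper script."""
    if script not in HELPER_SCRIPTS:
        raise ValueError(f"no helper named {script!r}; consider a temporary workflow script")
    return ["uv", "run", f"scripts/{script}", "--help"]


if __name__ == "__main__":
    print(check_convert_target("PDFA"))
    print(" ".join(help_command("convert.py")))
```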
Multi-Step Workflow Rule
Do not add new committed pipeline scripts under `scripts/`.
When the user asks for multiple operations in one run:
- Copy `assets/templates/custom-workflow-template.py` to a temporary location such as `/tmp/ndp-workflow-<task>.py`.
- Implement the combined workflow in that temporary script.
- Run it with `uv run /tmp/ndp-workflow-<task>.py ...`.
- Return generated output files.
- Delete the temporary script unless the user explicitly asks to keep it.
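The copy, run, and delete lifecycle can be sketched in Python as follows. The sketch assumes it runs from the directory containing this SKILL.md, so that the template path resolves. `prepare_workflow` and `run_workflow` are hypothetical names. The temporary directory defaults to the platform's temp dir, which is usually `/tmp` on Linux. `runner` is injectable only so the cleanup path can be exercised without calling `uv`.

```python
from __future__ import annotations

import re
import shutil
import subprocess
import tempfile
from collections.abc import Callable, Sequence
from pathlib import Path

TEMPLATE = Path("assets/templates/custom-workflow-template.py")


def prepare_workflow(task: str, template: Path = TEMPLATE, tmp_dir: Path | None = None) -> Path:
    """Copy the workflow template to <tmp>/ndp-workflow-<task>.py and return the copy."""
    # Keep the task slug filesystem-safe so the copy cannot escape tmp_dir.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", task):
        raise ValueError(f"task slug must use letters, digits, '-' or '_': {task!r}")
    tmp_dir = Path(tempfile.gettempdir()) if tmp_dir is None else tmp_dir
    script = tmp_dir / f"ndp-workflow-{task}.py"
    shutil.copyfile(template, script)
    return script


def run_workflow(
    script: Path,
    args: Sequence[str],
    keep: bool = False,
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run the temporary script with uv, then delete it unless keep is True."""
    try:
        runner(["uv", "run", str(script), *args], check=True)
    finally:
        # Clean up even if the run fails, unless the user asked to keep the script.
        if not keep:
            script.unlink(missing_ok=True)
```

Between the two calls, edit the copied file to implement the combined operations. Pass `keep=True` only when the user explicitly asks to keep the script.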
PDF Requirements
- `split.py` requires a multi-page PDF and cannot extract ranges from a single-page document.
- `delete-pages.py` must retain at least one page and cannot delete the entire document.
- `sign.py` only accepts local file paths for the main PDF.
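These constraints are cheap to check locally before spending credits. The sketch below makes two assumptions. First, you already know the page count, for example from a lightweight local inspection tool. Second, deletion indexes follow the same 0-based, negative-from-end convention as page ranges. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def check_split_input(page_count: int) -> None:
    """split.py needs a multi-page PDF."""
    if page_count < 2:
        raise ValueError(f"split.py requires a multi-page PDF; got {page_count} page(s)")


def check_delete_pages(page_count: int, pages: set[int]) -> None:
    """delete-pages.py must leave at least one page behind."""
    # Assumed convention: 0-based indexes, negatives count from the end.
    bad = sorted(p for p in pages if not -page_count <= p < page_count)
    if bad:
        raise ValueError(f"page indexes out of range for {page_count} pages: {bad}")
    # p % page_count maps -1 to the last page, -page_count to 0, and leaves 0..n-1 unchanged.
    absolute = {p % page_count for p in pages}
    if len(absolute) >= page_count:
        raise ValueError("delete-pages.py must retain at least one page")


def check_sign_input(path: str) -> Path:
    """sign.py only accepts a local file path for the main PDF."""
    if "://" in path:
        raise ValueError("sign.py only accepts a local file path for the main PDF")
    local = Path(path)
    if not local.is_file():
        raise FileNotFoundError(f"PDF to sign not found: {local}")
    return local
```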
Decision rules
- Prefer a helper script when one already covers the requested operation cleanly.
- If you control the source markup, prefer HTML generation over browser print workflows.
- Use remote `file.url` inputs when the source already lives at a stable URL and you want to avoid local uploads.
- Use `output.type` for conversion and finalization targets. Use `actions` for transformations when building direct API payloads.
- OCR before text extraction, key-value extraction, or semantic redaction on scans.
- Prefer preset or regex redaction when the target is explicit. Use AI redaction only for contextual or natural-language requests.
- Use the PDF manipulation reference for merge, split, rotate, flatten, and page-range workflows instead of inferring those payloads from conversion examples.
- Treat PDF/A and PDF/UA as compliance targets, not cosmetic export formats. Choose the target up front and validate final artifacts when requirements are contractual.
- For PDF/UA, clean born-digital inputs and structured HTML usually tag better than rasterized or flattened source PDFs.
- For delivery optimization, linearize or optimize unsigned output artifacts instead of mutating already signed files.
- When the user asks for multiple steps, keep destructive or final steps late in the sequence. Use the workflow recipes when ordering is ambiguous.
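For direct API payloads, the `file.url`, `actions`, and `output.type` rules can be sketched as a small instruction builder. The surrounding `parts` list and the example `ocr` action fields are assumptions based on the Build API instruction format. Confirm both against `references/request-basics.md` before relying on them. Nothing is sent over the network here.

```python
from __future__ import annotations

import json
from typing import Any


def file_part(source: str) -> dict[str, Any]:
    """Describe one input: a remote URL via file.url, else a multipart part name."""
    if source.startswith(("https://", "http://")):
        return {"file": {"url": source}}
    return {"file": source}


def build_instructions(
    sources: list[str],
    output_type: str,
    actions: list[dict[str, Any]] | None = None,
) -> dict[str, Any]:
    """Assemble a direct-API instruction payload (assumed parts/actions/output shape)."""
    if not sources:
        raise ValueError("at least one input is required")
    payload: dict[str, Any] = {"parts": [file_part(s) for s in sources]}
    if actions:
        # Transformations go in actions...
        payload["actions"] = list(actions)
    # ...while the conversion or finalization target goes in output.type.
    payload["output"] = {"type": output_type}
    return payload


if __name__ == "__main__":
    # Assumed OCR action fields; check references/extraction-and-ocr.md for the real ones.
    ocr = {"type": "ocr", "language": "english"}
    print(json.dumps(build_instructions(["https://example.com/scan.pdf"], "pdf", [ocr]), indent=2))
```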
Anti-patterns
- Do not OCR born-digital PDFs just because the task mentions extraction. Extract first and OCR only if the text layer is missing.
- Do not flatten forms or annotations until the user confirms the artifact no longer needs to stay editable.
- Do not sign, archive, or linearize intermediate working files. Keep those as final-delivery steps.
- Do not promise PDF/A or PDF/UA compliance without a validation step when the requirement is contractual.
- Do not commit temporary workflow scripts under `scripts/`.
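The ordering guidance in the anti-patterns and decision rules can be checked mechanically before a multi-step run. In this sketch, step names such as `flatten`, `pdfa`, and `sign` are hypothetical labels for your own workflow steps, not API action names. The set of final steps is an assumption drawn from the guidance above.

```python
# Hypothetical labels for steps that should only touch the final deliverable.
FINAL_STEPS = frozenset({"flatten", "pdfa", "pdfua", "optimize", "linearize", "sign"})
# Steps that would mutate a file after it has been signed.
NOT_AFTER_SIGN = frozenset({"optimize", "linearize"})


def check_step_order(steps: list[str]) -> None:
    """Fail fast if a working step follows a final step, or if a signed file is optimized."""
    first_final = None
    signed = False
    for step in steps:
        if step in FINAL_STEPS:
            first_final = first_final or step
        elif first_final is not None:
            raise ValueError(
                f"{step!r} runs after final step {first_final!r}; move final steps to the end"
            )
        if signed and step in NOT_AFTER_SIGN:
            raise ValueError(f"{step!r} would mutate an already signed file; do it before 'sign'")
        if step == "sign":
            signed = True
```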
Reference map
Read only what you need:
- `references/request-basics.md` -> endpoint model, auth, multipart vs JSON, credits, limits, and errors
- `references/generation-and-conversion.md` -> HTML/URL generation and format conversion
- `references/pdf-manipulation.md` -> merge, split, page-range, rotate, and flatten workflows
- `references/extraction-and-ocr.md` -> OCR, text extraction, tables, and key-value workflows
- `references/security-signing-and-forms.md` -> redaction, watermarking, signatures, forms, and passwords
- `references/compliance-and-optimization.md` -> PDF/A, PDF/UA, optimization, and linearization
- `references/workflow-recipes.md` -> end-to-end sequencing patterns for common business document workflows
Rules
- Fail fast when required arguments are missing.
- Write outputs to explicit paths and print created files.
- Do not log secrets.
- All client methods are async and should run via `asyncio.run(main())`.
- If the import fails, install the dependency with `uv add nutrient-dws`.
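Here is a skeleton that follows these rules. It fails fast on missing arguments or a missing key, never prints the key, writes to an explicit `--out` path, prints what it created, and runs under `asyncio.run(main())`. The client call itself is a placeholder because its method names are not covered here. The import guard assumes the package imports as `nutrient_dws`, and the script name and arguments are hypothetical.

```python
from __future__ import annotations

import argparse
import asyncio
import os
import sys
from collections.abc import Mapping
from pathlib import Path


def auth_headers(api_key: str) -> dict[str, str]:
    """Bearer auth header for direct API calls; never print the result."""
    return {"Authorization": f"Bearer {api_key}"}


def require_api_key(env: Mapping[str, str] | None = None) -> str:
    """Read NUTRIENT_API_KEY or exit immediately, without echoing any secret."""
    env = os.environ if env is None else env
    key = env.get("NUTRIENT_API_KEY", "")
    if not key:
        sys.exit("error: NUTRIENT_API_KEY is not set")
    return key


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical single-operation helper")
    parser.add_argument("input", type=Path, help="local input file")
    # argparse exits with a usage error when --out is missing (fail fast).
    parser.add_argument("--out", type=Path, required=True, help="explicit output path")
    return parser.parse_args(argv)


async def process(input_path: Path, out_path: Path, api_key: str) -> Path:
    """Placeholder for the real async nutrient-dws client call (not shown here)."""
    try:
        import nutrient_dws  # noqa: F401  (assumed import name for the package)
    except ImportError:
        sys.exit("error: nutrient-dws is not installed; run `uv add nutrient-dws`")
    raise NotImplementedError("call the async nutrient-dws client here and write out_path")


async def main(argv: list[str] | None = None) -> None:
    args = parse_args(argv)
    if not args.input.is_file():
        sys.exit(f"error: input file not found: {args.input}")
    api_key = require_api_key()
    created = await process(args.input, args.out, api_key)
    print(f"created: {created}")


if __name__ == "__main__":
    asyncio.run(main())
```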