affinda

Integrate with Affinda's document AI API to extract structured data from documents (invoices, resumes, receipts, contracts, and custom types). Covers authentication, client libraries (Python, TypeScript), structured outputs with Pydantic models and TypeScript interfaces, webhooks, upload patterns, and the full documentation map. Use when building integrations that parse, classify, or extract data from documents using Affinda.

Install skill "affinda" with this command: npx skills add affinda/skills/affinda-skills-affinda

Affinda — AI Document Processing Platform

Affinda extracts structured data from documents (invoices, resumes, receipts, contracts, and any custom document type) using machine learning. The API turns uploaded files into clean JSON. Affinda has processed over 250 million documents for 500+ organisations in 40 countries.

Full documentation: https://docs.affinda.com
OpenAPI spec: https://api.affinda.com/static/v3/api_spec.yaml
Support: support@affinda.com


Core Concepts

| Concept | Description |
| --- | --- |
| Organization | Top-level account. Contains users, billing, document types, and workspaces. |
| Workspace | Logical container for documents. Scopes permissions, webhooks, and processing settings. |
| Document Type | A model configuration defining how a specific kind of document is parsed (invoice, resume, custom). |
| Document | An uploaded file (PDF, image, DOCX, etc.) plus its extracted data and metadata. |

The workflow is: Upload -> Pre-process -> Split -> Classify -> Extract -> Validate -> Export.


API Basics

Base URLs

| Region | API Base URL | App URL |
| --- | --- | --- |
| Australia (Global) | https://api.affinda.com | https://app.affinda.com |
| United States | https://api.us1.affinda.com | https://app.us1.affinda.com |
| European Union | https://api.eu1.affinda.com | https://app.eu1.affinda.com |

Use the base URL matching the region where the user's account was created.
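
One way to keep the region choice in a single place is a small lookup. This is only a sketch: the region labels (`au`, `us`, `eu`) and the helper name `api_base_url` are illustrative, and only the URLs themselves come from the table above.

```python
# Region labels ("au", "us", "eu") are illustrative, not Affinda identifiers.
API_BASE_URLS = {
    "au": "https://api.affinda.com",      # Australia (Global)
    "us": "https://api.us1.affinda.com",  # United States
    "eu": "https://api.eu1.affinda.com",  # European Union
}


def api_base_url(region: str) -> str:
    """Return the API base URL for a region label, failing loudly on unknown labels."""
    try:
        return API_BASE_URLS[region.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(API_BASE_URLS)}"
        ) from None
```

Failing on an unknown label, rather than silently defaulting to the global URL, avoids sending requests to a region where the account does not exist.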

Authentication

All requests require a Bearer token:

Authorization: Bearer <API_KEY>

API keys are per-user, managed at Settings -> API Keys in the Affinda dashboard. Up to 3 keys per user. Keys can have custom names and expiry dates. A key is only visible once at creation -- store it securely.
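
As a rough illustration of the header format, the sketch below builds (but does not send) an authenticated request using only the standard library. The helper name `build_request` is hypothetical; the `AFFINDA_API_KEY` environment variable and the `/v3/workspaces` path are the ones used elsewhere in this guide.

```python
import os
import urllib.request


def build_request(path: str, api_key: str,
                  base_url: str = "https://api.affinda.com") -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Bearer token."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Read the key from the environment so it never lands in source control.
api_key = os.environ.get("AFFINDA_API_KEY", "YOUR_API_KEY")
request = build_request("/v3/workspaces", api_key)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```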

Rate Limits and File Constraints

  • High-priority queue: 30 documents/minute (exceeding returns 429)
  • Low-priority queue: No submission limit (set lowPriority: true)
  • Max file size: 20 MB (5 MB for resumes)
  • Default page limit: 20 pages per document (can be increased on request)
  • Supported formats: PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG
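
Checking a file locally before upload can save a round trip. The sketch below encodes the limits listed above; it assumes 1 MB means 1024 × 1024 bytes and that the file extension reflects the real format, neither of which the list states. The function name is hypothetical.

```python
from pathlib import Path

# Limits and formats copied from the list above; adjust if your account has custom limits.
MAX_BYTES = 20 * 1024 * 1024        # assumption: MB interpreted as MiB
MAX_RESUME_BYTES = 5 * 1024 * 1024
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xlsx", ".odt", ".rtf",
    ".txt", ".html", ".png", ".jpg", ".jpeg", ".tiff",
}


def preflight_problems(filename: str, size_bytes: int, is_resume: bool = False) -> list[str]:
    """Return human-readable problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    limit = MAX_RESUME_BYTES if is_resume else MAX_BYTES
    if size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes; limit is {limit}")
    return problems
```

The page limit is not checked here because counting pages depends on the file format.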

Client Libraries

Python (recommended)

pip install affinda

from pathlib import Path
from affinda import AffindaAPI, TokenCredential

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(file=f, workspace="YOUR_WORKSPACE_ID")

print(doc.data)  # Extracted JSON

GitHub: https://github.com/affinda/affinda-python
PyPI: https://pypi.org/project/affinda/

TypeScript / JavaScript (recommended)

npm install @affinda/affinda

import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

console.log(doc.data); // Extracted JSON

GitHub: https://github.com/affinda/affinda-typescript
npm: https://www.npmjs.com/package/@affinda/affinda

Other Libraries

Note: The .NET and Java libraries may lag behind the Python and TypeScript libraries in feature parity.

Direct HTTP (cURL)

curl -X POST https://api.affinda.com/v3/documents \
  -H "Authorization: Bearer $AFFINDA_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "workspace=YOUR_WORKSPACE_ID"

Structured Outputs (Type-Safe Responses)

This is the recommended approach for building robust integrations. Affinda can generate typed models from your document type configuration, giving you auto-completion, validation, and type safety.

Python -- Pydantic Models

Generate Pydantic v2 models that match your document type's field schema:

# Set your API key (or export AFFINDA_API_KEY)
python -m affinda generate_models --workspace-id=YOUR_WORKSPACE_ID

This creates a ./affinda_models/ directory with one .py file per document type. Each file contains Pydantic BaseModel classes with all your configured fields as typed, optional attributes.

Use the generated models when calling the API:

from pathlib import Path
from affinda import AffindaAPI, TokenCredential
from affinda_models.invoice import Invoice  # Generated model

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,  # Enables Pydantic validation
    )

# doc.parsed is a typed Invoice instance
print(doc.parsed.invoice_number)
print(doc.parsed.total_amount)

# doc.data is still available as raw JSON
print(doc.data)

Handling validation errors gracefully:

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,
        ignore_validation_errors=True,  # Don't raise on schema mismatch
    )

if doc.parsed:
    print(doc.parsed.invoice_number)  # Type-safe access
else:
    print("Validation failed, falling back to raw data")
    print(doc.data)

CLI options:

python -m affinda generate_models --workspace-id=ID        # All types in a workspace
python -m affinda generate_models --document-type-id=ID    # Single document type
python -m affinda generate_models --organization-id=ID     # All types in an org
python -m affinda generate_models --output-dir=./my_models # Custom output path
python -m affinda generate_models --help                   # All options

TypeScript -- Generated Interfaces

Generate TypeScript interfaces that match your document type's field schema:

# Set your API key (or export AFFINDA_API_KEY)
npm exec affinda-generate-interfaces -- --workspace-id=YOUR_WORKSPACE_ID

This creates an ./affinda-interfaces/ directory with one .ts file per document type. Each file contains TypeScript interfaces with all your configured fields.

Use the generated interfaces for type-safe access:

import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";
import { Invoice } from "./affinda-interfaces/Invoice";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

const parsed = doc.data as Invoice;
console.log(parsed.invoiceNumber);  // Type-safe access
console.log(parsed.totalAmount);

CLI options:

npm exec affinda-generate-interfaces -- --workspace-id=ID       # All types in workspace
npm exec affinda-generate-interfaces -- --document-type-id=ID   # Single document type
npm exec affinda-generate-interfaces -- --output-dir=./types    # Custom output path
npm exec affinda-generate-interfaces -- --help                  # All options

Why Use Structured Outputs?

  • Type safety: Catch field name typos and type mismatches at compile/lint time
  • Auto-completion: IDE support for all extracted fields
  • Validation: Pydantic automatically validates the API response structure
  • Schema-driven: Models stay in sync with your document type configuration -- regenerate after schema changes
  • Documentation as code: The generated models serve as living documentation of your extraction schema

Document Upload Options

There are three patterns for submitting documents and retrieving results:

1. Synchronous (simplest)

Upload and block until parsing completes. The response contains the extracted data.

doc = client.create_document(file=f, workspace="WORKSPACE_ID")
# wait defaults to True -- blocks until ready
print(doc.data)

Best for: Interactive apps, low volume, quick prototyping.
Limitation: Can time out on large or complex documents.

2. Asynchronous with Polling

Upload with wait=false, receive a document ID, then poll GET /documents/{id} until ready is true.

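import time
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)
# doc.data is empty until parsing finishes -- poll until ready (or failed)
while not (doc.meta.ready or doc.meta.failed):
    time.sleep(2)
    doc = client.get_document(doc.meta.identifier)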

Best for: Batch processing, large documents, high volume.
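
For longer-running jobs, a polling loop usually wants a deadline as well. The helper below is a generic sketch rather than part of the Affinda client: `poll_until_ready` and its parameters are hypothetical, and `fetch` is any callable you supply that returns the document metadata as a dict containing `ready` and `failed`.

```python
import time
from typing import Callable


def poll_until_ready(
    fetch: Callable[[], dict],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the metadata reports ready or failed, or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        meta = fetch()
        if meta.get("ready") or meta.get("failed"):
            return meta
        if time.monotonic() >= deadline:
            raise TimeoutError("document was not ready before the timeout")
        sleep(interval_seconds)
```

The `sleep` parameter exists so tests can replace the real delay with a no-op. At high volume, keep the interval generous so that polling traffic stays modest.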

3. Asynchronous with Webhooks (recommended for production)

Upload the document, then receive a webhook notification when processing completes. This is the most efficient pattern for production systems.

# 1. Upload
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)

# 2. Receive webhook at your endpoint when ready
# 3. Fetch full data
doc = client.get_document(identifier_from_webhook)

Best for: Real-time workflows, event-driven architectures, production systems.

See the Webhooks section below for setup details.

Upload Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| file | binary | The document file. Mutually exclusive with url. |
| url | string | URL to download and process. Mutually exclusive with file. |
| workspace | string | Workspace identifier (required). |
| documentType | string | Document type identifier (optional -- enables skip-classification). |
| wait | boolean | true (default): block until done. false: return immediately. |
| customIdentifier | string | Your internal ID for the document. |
| expiryTime | ISO-8601 | Auto-delete the document at this time. |
| rejectDuplicates | boolean | Reject if duplicate of existing document. |
| lowPriority | boolean | Route to low-priority queue (no rate limit). |
| compact | boolean | Return compact response (with wait=true). |
| deleteAfterParse | boolean | Delete data after parsing (requires wait=true). |
| enableValidationTool | boolean | Make document viewable in validation UI. Set false for speed. |
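
Several of these parameters constrain each other, so it can help to check a request before sending it. The sketch below covers only the constraints stated in the table above; `validate_upload_params` is a hypothetical helper, not a client-library function.

```python
def validate_upload_params(params: dict) -> list[str]:
    """Check an upload parameter dict against the constraints stated in the table."""
    errors = []
    has_file = params.get("file") is not None
    has_url = params.get("url") is not None
    if has_file == has_url:
        # Either both were given or neither was.
        errors.append("provide exactly one of 'file' or 'url'")
    if not params.get("workspace"):
        errors.append("'workspace' is required")
    wait = params.get("wait", True)  # wait defaults to true when omitted
    if params.get("deleteAfterParse") and not wait:
        errors.append("'deleteAfterParse' requires wait=true")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in one pass.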

Response Structure

Each extracted field in the response includes metadata:

| Field | Description |
| --- | --- |
| raw | Raw extracted text before processing |
| parsed | Processed value after formatting and mapping |
| confidence | Overall confidence score (0-1) |
| classificationConfidence | Confidence the field was correctly classified |
| textExtractionConfidence | Confidence text was correctly extracted |
| isVerified | Whether the value has been validated (any means) |
| isClientVerified | Whether validated by a human |
| isAutoVerified | Whether auto-validated by rules |
| rectangle | Bounding box coordinates on the page |
| pageIndex | Which page the data appears on |

Document-level metadata includes ready, failed, language, pages, isOcrd, ocrConfidence, reviewUrl, isConfirmed, isRejected, isArchived, errorCode, and errorDetail.
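
A common use of the per-field metadata is routing low-confidence values to review. The sketch below assumes the extracted data has been flattened into a dict of field name to field metadata; that layout, the sample field names, and the 0.8 threshold are illustrative choices, not something the API guarantees.

```python
def fields_needing_review(fields: dict, min_confidence: float = 0.8) -> list[str]:
    """Return names of unverified fields whose confidence is missing or below the threshold."""
    flagged = []
    for name, field in fields.items():
        if field.get("isVerified"):
            continue  # already validated by a human or by rules
        confidence = field.get("confidence")
        if confidence is None or confidence < min_confidence:
            flagged.append(name)
    return flagged


# Hypothetical sample: field names and the flat name -> metadata layout are made up.
sample = {
    "invoiceNumber": {"raw": "INV-001", "parsed": "INV-001", "confidence": 0.97, "isVerified": False},
    "totalAmount": {"raw": "1,200.00", "parsed": 1200.0, "confidence": 0.55, "isVerified": False},
    "supplierName": {"raw": "Acme", "parsed": "Acme", "confidence": 0.40, "isVerified": True},
}
print(fields_needing_review(sample))  # ['totalAmount']
```

`supplierName` is skipped despite its low score because it is already verified.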

Full metadata reference: https://docs.affinda.com/reference/metadata


Webhooks

Affinda uses RESTHooks -- webhook subscriptions managed via REST API. Webhooks can be scoped to an organization or workspace.

Available Events

| Event | Description |
| --- | --- |
| document.parse.completed | Parsing finished (succeeded or failed) |
| document.parse.succeeded | Parsing succeeded |
| document.parse.failed | Parsing failed |
| document.validate.completed | Document confirmed (manually or auto) |
| document.classify.completed | Classification finished |
| document.classify.succeeded | Classification succeeded |
| document.classify.failed | Classification failed |
| document.rejected | Document rejected |

Setup Flow

  1. Subscribe -- POST /v3/resthook_subscriptions with targetUrl, event, and organization (or workspace).
  2. Confirm -- Affinda sends a POST to your targetUrl with an X-Hook-Secret header. Respond with 200, then call POST /v3/resthook_subscriptions/activate with that secret.
  3. Receive -- Affinda sends webhook payloads to your endpoint. Respond 200 to acknowledge.
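
The subscribe and confirm steps can be sketched as two small helpers. Both names are hypothetical; the body keys (`targetUrl`, `event`, `organization`, `workspace`) and the `X-Hook-Secret` header are the ones named in the steps above. Sending the requests is left to whatever HTTP client you already use.

```python
from typing import Optional


def build_subscription(target_url: str, event: str,
                       organization: Optional[str] = None,
                       workspace: Optional[str] = None) -> dict:
    """Body for POST /v3/resthook_subscriptions, scoped to an organization or a workspace."""
    if organization is None and workspace is None:
        raise ValueError("pass an organization or a workspace identifier")
    body = {"targetUrl": target_url, "event": event}
    if organization is not None:
        body["organization"] = organization
    if workspace is not None:
        body["workspace"] = workspace
    return body


def hook_secret_from(headers: dict) -> Optional[str]:
    """Return the X-Hook-Secret value if this is a confirmation request, else None."""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "x-hook-secret":
            return value
    return None
```

An endpoint would call `hook_secret_from` on each incoming request: a non-None result means this is the confirmation request, so respond 200 and then pass the secret to the activate call.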

Signature Verification

Enable payload signing via Organization Settings -> Webhook Signature Key. Incoming webhooks include an X-Hook-Signature header (<timestamp>.<signature>). Verify using HMAC-SHA256:

import hmac, hashlib, json, time

def verify_webhook(request, sig_key: bytes) -> bool:
    sig_header = request.headers["X-Hook-Signature"]
    timestamp, sig_received = sig_header.split(".")
    sig_calculated = hmac.new(sig_key, msg=request.body, digestmod=hashlib.sha256).hexdigest()

    sig_ok = hmac.compare_digest(sig_received, sig_calculated)
    body = json.loads(request.body)
    time_ok = (time.time() - body["timestamp"]) < 600  # 10 min window
    return sig_ok and time_ok
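
For local testing, a framework-independent variant that takes the header value and raw body directly can be easier to exercise. The sketch below follows the same scheme as the snippet above (HMAC-SHA256 over the raw body, 10-minute window on the body's `timestamp`), but returns False on malformed input instead of raising. The function name, signing key, and payload are made up.

```python
import hashlib
import hmac
import json
import time


def verify_signature(sig_header: str, raw_body: bytes, sig_key: bytes,
                     max_age_seconds: int = 600) -> bool:
    """Check the HMAC-SHA256 signature and reject payloads older than max_age_seconds."""
    try:
        _timestamp, sig_received = sig_header.split(".", 1)
        sent_at = json.loads(raw_body)["timestamp"]
    except (ValueError, KeyError, TypeError):
        return False  # malformed header or body
    if not isinstance(sent_at, (int, float)):
        return False
    sig_expected = hmac.new(sig_key, msg=raw_body, digestmod=hashlib.sha256).hexdigest()
    # Compare as bytes so unexpected non-ASCII input cannot raise.
    sig_ok = hmac.compare_digest(sig_received.encode(), sig_expected.encode())
    time_ok = (time.time() - sent_at) < max_age_seconds
    return sig_ok and time_ok


# Simulate a signed delivery locally (the key and payload are made up).
key = b"test-signing-key"
body = json.dumps({"id": "demo", "timestamp": int(time.time())}).encode()
header = f"{int(time.time())}." + hmac.new(key, body, hashlib.sha256).hexdigest()
```

Always verify against the raw request bytes; re-serialising parsed JSON can change whitespace or key order and break the signature.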

Webhook Payload

The payload contains document metadata (not the full parsed data). Use the identifier to fetch full results:

{
  "id": "e3bd1942-...",
  "event": "document.parse.completed",
  "timestamp": 1665637107,
  "payload": {
    "identifier": "abcdXYZ",
    "ready": true,
    "failed": false,
    "fileName": "invoice.pdf",
    "workspace": { "identifier": "...", "name": "..." }
  }
}
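
A handler typically pulls the identifier out of the delivery and fetches the full document only when parsing succeeded. The sketch below assumes the payload shape shown in the example above; `identifier_to_fetch` is a hypothetical helper name.

```python
import json
from typing import Optional


def identifier_to_fetch(raw_body: bytes) -> Optional[str]:
    """Return the document identifier if the delivery reports a finished, successful parse."""
    delivery = json.loads(raw_body)
    document = delivery.get("payload", {})
    if document.get("ready") and not document.get("failed"):
        return document.get("identifier")
    return None
```

A non-None result can be passed to `client.get_document(...)` to retrieve the extracted data.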

Retry Behavior

  • 200 -- Success, delivery confirmed
  • 410 -- Subscription auto-deleted (endpoint "gone")
  • Other 4xx/5xx -- Retried with exponential backoff for ~1 day
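
Because failed deliveries are retried, a handler may see the same event more than once, so it helps to make processing idempotent. The sketch below deduplicates on the delivery `id`, assuming that value stays the same across retries (the source does not confirm this). The function name and the in-memory set are illustrative; a real system would use durable storage.

```python
from typing import Callable


def handle_delivery(delivery: dict, seen_ids: set, process: Callable[[dict], None]) -> int:
    """Return the HTTP status to send back for one webhook delivery."""
    delivery_id = delivery["id"]
    if delivery_id in seen_ids:
        return 200  # a retry of something already handled; acknowledge and skip
    try:
        process(delivery)
    except Exception:
        return 500  # a non-200 response asks Affinda to retry later
    seen_ids.add(delivery_id)  # record only after successful processing
    return 200
```

Recording the id only after `process` succeeds means a failed attempt is processed again when the retry arrives.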

Full webhook docs: https://docs.affinda.com/reference/webhooks


Embedded Validation UI

Affinda provides a human-in-the-loop validation interface that can be embedded in your application via iframe. Each document response includes a reviewUrl -- a signed URL valid for 60 minutes.

Implementation pattern:

  1. Store only the Affinda document identifier in your system
  2. When a user needs to review, fetch a fresh reviewUrl via GET /documents/{id}
  3. Embed the URL in an iframe
  4. Do not persist the URL -- treat it as ephemeral
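
The embedding step can be sketched as a function that renders the iframe tag at request time from a freshly fetched URL. The helper name, the default dimensions, and the https check are assumptions for illustration.

```python
from html import escape


def review_iframe(review_url: str, width: str = "100%", height: str = "800") -> str:
    """Render an iframe tag for a freshly fetched reviewUrl."""
    if not review_url.startswith("https://"):
        raise ValueError("expected an https reviewUrl")  # assumption: review URLs are https
    return (
        f'<iframe src="{escape(review_url, quote=True)}" '
        f'width="{escape(width, quote=True)}" height="{escape(height, quote=True)}"></iframe>'
    )
```

Escaping the URL matters because signed URLs usually contain `&` in their query strings, which must be written as `&amp;` inside an HTML attribute.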

The UI supports custom theming (colors, fonts, border radius) in embedded mode. Contact Affinda to configure.

Full embedded docs: https://docs.affinda.com/reference/embedded


Key API Methods

Documents

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v3/documents | Upload and parse a document |
| GET | /v3/documents/{id} | Retrieve a document and its data |
| PATCH | /v3/documents/{id} | Update document fields/status |
| DELETE | /v3/documents/{id} | Delete a document |
| GET | /v3/documents | List documents (with filtering) |
| GET | /v3/documents/{id}/redacted | Download redacted PDF |

Workspaces

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /v3/workspaces | List workspaces |
| POST | /v3/workspaces | Create a workspace |
| GET | /v3/workspaces/{id} | Get workspace details |
| PATCH | /v3/workspaces/{id} | Update workspace |
| DELETE | /v3/workspaces/{id} | Delete workspace |

Annotations

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /v3/annotations | List annotations for a document |
| POST | /v3/annotations | Create an annotation |
| PATCH | /v3/annotations/{id} | Update an annotation |
| POST | /v3/annotations/batch_create | Batch create annotations |
| POST | /v3/annotations/batch_update | Batch update annotations |
| POST | /v3/annotations/batch_delete | Batch delete annotations |

Webhooks

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v3/resthook_subscriptions | Create subscription |
| POST | /v3/resthook_subscriptions/activate | Activate with X-Hook-Secret |
| GET | /v3/resthook_subscriptions | List subscriptions |
| PATCH | /v3/resthook_subscriptions/{id} | Update subscription |
| DELETE | /v3/resthook_subscriptions/{id} | Delete subscription |

Full API reference: https://docs.affinda.com/reference/getting-started
OpenAPI spec: https://api.affinda.com/static/v3/api_spec.yaml


Common Integration Patterns

Affinda supports six integration workflow patterns depending on where validation logic lives and where exceptions are handled:

| Pattern | Description | Webhook Event |
| --- | --- | --- |
| W1 -- No validation | Upload -> get JSON. No rules, no human review. | document.parse.completed |
| W2 -- Client-side validation | Same as W1; your system applies rules after export. | document.parse.completed |
| W3 -- Affinda validation logic | Affinda validates automatically; no human review. | document.validate.completed |
| W4 -- Review all in Affinda | Humans review every document in Affinda UI. | document.validate.completed |
| W5 -- Client rules + Affinda review | Your rules, pushed back as warnings; flagged docs reviewed in Affinda. | document.parse.completed then document.validate.completed |
| W6 -- Full Affinda validation | Affinda validates; exceptions reviewed in Affinda UI. | document.validate.completed |

For most new integrations, W1 or W2 is the simplest starting point. W6 provides the most automation with human-in-the-loop for exceptions.
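
When provisioning webhooks in code, the pattern-to-event mapping from the table above can be kept as data. This is a sketch; the short keys (`W1` to `W6`) and the helper name are illustrative.

```python
# Webhook events per workflow pattern, taken from the table above.
# W5 listens for parse completion first, then for validation completion.
EVENTS_BY_PATTERN = {
    "W1": ["document.parse.completed"],
    "W2": ["document.parse.completed"],
    "W3": ["document.validate.completed"],
    "W4": ["document.validate.completed"],
    "W5": ["document.parse.completed", "document.validate.completed"],
    "W6": ["document.validate.completed"],
}


def events_to_subscribe(pattern: str) -> list[str]:
    """Return the webhook events to subscribe to for a workflow pattern such as 'W5'."""
    key = pattern.strip().upper()
    if key not in EVENTS_BY_PATTERN:
        raise ValueError(f"unknown pattern {pattern!r}; expected W1 to W6")
    return list(EVENTS_BY_PATTERN[key])  # copy so callers cannot mutate the table
```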

Full solution design guide: https://docs.affinda.com/academy/solution-design


Common Errors

| Error Code | Meaning | Resolution |
| --- | --- | --- |
| duplicate_document_error | Document rejected as duplicate | Disable "Reject duplicates" or upload unique files |
| no_text_found | No extractable text | Check file is not a photo of an object; try OCR |
| file_corrupted | File is corrupted | Re-upload a valid file |
| file_too_large | Exceeds 20 MB limit | Reduce file size |
| invalid_file_type | Unsupported format | Use PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG |
| no_parsing_credits | Out of credits | Purchase more credits and reparse |
| password_protected | File is password-protected | Remove password and re-upload |
| document_classification_failed | No matching document type | Check document type configuration or disable "Reject Documents" |
| capacity_exceeded | System capacity exceeded | Wait and retry |
| parse_terminated | Exceeded timeout | Contact Affinda for custom limits |

Full error reference: https://docs.affinda.com/error-glossary
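
In an automated pipeline it can be useful to turn an error code into a next step. The grouping below is one possible reading of the resolutions in the table above, not an official classification; the action labels and the function name are hypothetical.

```python
# Grouping of error codes by the kind of follow-up they need (an interpretation, not an API contract).
ACTION_BY_ERROR = {
    "capacity_exceeded": "retry",
    "file_corrupted": "fix_file",
    "file_too_large": "fix_file",
    "invalid_file_type": "fix_file",
    "password_protected": "fix_file",
    "no_text_found": "fix_file",
    "duplicate_document_error": "fix_config",
    "document_classification_failed": "fix_config",
    "no_parsing_credits": "fix_account",
    "parse_terminated": "contact_support",
}


def next_action(error_code: str) -> str:
    """Map an errorCode value to a coarse follow-up action; unknown codes return 'unknown'."""
    return ACTION_BY_ERROR.get(error_code, "unknown")
```

Only `capacity_exceeded` is treated as safe to retry unchanged; every other code needs a different file, a configuration or account change, or a support request first.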


Documentation Map

Use this index to find detailed information on specific topics. Each link goes to the full documentation page.

Affinda Academy (Tutorials)

Configuration Guide

Overview & Workflow:

  • Workflow -- End-to-end document processing pipeline stages.
  • Glossary -- Platform terminology definitions.
  • Document Status -- For Review, Confirmed, Archived, Rejected states.

Ingestion & Pre-Processing:

  • Ingestion -- Upload methods: manual, email, API.
  • Email Upload -- Email-to-workspace document ingestion.
  • Pre-Processing -- Automated cleaning before extraction.
  • OCR -- OCR modes: Skip, Auto-detect, Partial, Full.
  • Duplicates -- Duplicate detection and rejection.

Splitting, Classification & Extraction:

Validation & Export:

API Reference

Resume Parsing Guide

Additional Resources
