Dify Knowledge Base Upload

Use this skill when

The target Dify knowledge base is backed by a published pipeline datasource.
UI uploads succeed but document/create-by-file or create-by-text yields completed with 0 chunks.
You need to upload one local file and optionally write existing metadata fields afterward.

Provide a local file path to upload.
For pure text, save it as .txt or .md first and upload that file.
Set caller env values:
- DIFY_API_BASE_URL
- DIFY_DATASET_ID
- DIFY_API_KEY
Optionally provide pipeline inputs JSON if the published pipeline exposes user input variables.
Optionally provide metadata JSON:
- a flat object mapping existing Dify metadata field names to values, or
- a list of { "name": ..., "value": ... } / { "id": ..., "value": ... } items.
- The skill does not hardcode your metadata schema. Pass the current field names or ids in the JSON you provide at runtime.
Read references/env.md only if you need env or debug details.

python3 scripts/upload_to_dataset.py \
  --file /path/to/document.pdf

python3 scripts/upload_to_dataset.py \
  --file /path/to/document.pdf \
  --inputs-json assets/example-pipeline-inputs.json

python3 scripts/upload_to_dataset.py \
  --file /path/to/document.pdf \
  --metadata-json assets/example-metadata.json

Success prints JSON with batch, document_id, file_upload_response, pipeline_run_response, indexing_status_response, document_response, optional metadata_response, and validation.
validation.ok is only true when:
- indexing_status = "completed"
- total_segments > 0
- segment_count > 0
If tokens is returned by the document detail API, it should also be greater than 0. Some pipeline deployments leave tokens = null; the script reports that as a warning instead of a hard failure.
The script exits non-zero if the required checks fail.
If --metadata-json is omitted, the script uploads the file only and skips metadata API calls.
Metadata keys must already exist in the target dataset. Unknown names fail fast before the metadata write request.
assets/example-metadata.json is only a template. Replace its keys with your own existing Dify metadata field names before live use.
Use --dry-run to validate local files and request shape without calling Dify.

If auth fails, verify DIFY_API_KEY and Authorization header format.
If datasource discovery fails, verify the dataset has a published pipeline datasource and the key belongs to that dataset.
If the pipeline run fails with missing input errors, inspect user_input_variables from discovery_response and pass the missing keys in --inputs-json.
If the published datasource is not local_file, this skill is not the right uploader.
If metadata resolution fails, rename keys in the metadata JSON to the exact field names defined in Dify.
If you only have raw text, save it to .txt or .md and upload that file.
In the current tested deployment, .md uploaded successfully while .txt hit a Dify-side indexing 400. If plain text fails as .txt, retry as .md.