# Feishu Knowledge Ingest

Use this skill to turn a Feishu folder or a single shared attachment into structured, reviewable knowledge outputs.

## What this skill does
- Accept a Feishu folder link/token or a single shared attachment.
- Classify files into direct-read, download-and-parse, manual-review, or permission-blocked.
- Parse `.docx` and `.pdf` in v0.1.
- Produce report-first outputs instead of writing `MEMORY.md` directly.
- Preserve failures and uncertainty instead of guessing content.
## Supported v0.1 scope

### Inputs

- Feishu folder link or `folder_token`
- Single shared attachment link or token

### Parsing

- `.docx`
- `.pdf`

### Outputs

- `ingest-report.md`
- `kb-items.jsonl`
- `failed-items.jsonl`
- `MEMORY.candidate.md`
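For orientation: a `.docx` file is a ZIP archive whose body text lives in `word/document.xml`, so a rough text pull needs only the standard library. The sketch below is illustrative and is not the bundled `parsers/parse_docx.py`; a real parser should walk the XML tree instead of stripping tags with a regex.

```python
import re
import zipfile


def extract_docx_text(path):
    """Pull plain paragraph text out of a .docx (a ZIP of XML parts).

    Accepts a filesystem path or a file-like object. Illustrative
    sketch only: tag stripping via regex loses tables, headers, and
    footnotes.
    """
    with zipfile.ZipFile(path) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")
    # Paragraphs close with </w:p>; turn them into newlines, then drop tags.
    xml = xml.replace("</w:p>", "\n")
    text = re.sub(r"<[^>]+>", "", xml)
    return text.strip()
```

For `.pdf`, there is no comparably simple stdlib route, which is one reason failed extractions must be logged rather than papered over.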
## Required behavior

- Distinguish Feishu native docs from uploaded attachments.
  - Native docs: `doc`, `sheet`, `wiki`, `bitable`
  - Uploaded attachments: `.docx`, `.pdf`, `.pptx`, other files
- Do not claim attachment content was learned unless text was actually extracted.
- Default to report-first. Do not write `MEMORY.md` in v0.1.
- Record every failed file with a concrete reason.
- Prefer plain-text summaries over complex Feishu cards when reporting progress.
## File routing rules

### Direct-read

Treat these as direct-read only when the runtime has a reliable native-reader path: `doc`, `sheet`, `wiki`, `bitable`.

### Download-and-parse

Treat these as download-and-parse: `.docx`, `.pdf`.

### Manual-review

Route here when the file is out of scope or low-confidence in v0.1:

- `.pptx`
- images
- scans with no extractable text
- archives
- unusual file types

### Permission-blocked

Route here when listing is possible but the file cannot be downloaded or read.
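The four routes above can be collapsed into one decision function. A minimal sketch, assuming a per-file dict with `type`, `name`, and `can_download` fields; those field names are hypothetical illustrations, not Feishu's actual API fields, and the sketch also assumes the native-reader path is available.

```python
from pathlib import PurePosixPath

NATIVE_TYPES = {"doc", "sheet", "wiki", "bitable"}
PARSEABLE_EXTS = {".docx", ".pdf"}


def route(entry):
    """Map one listed file to direct_read / download_and_parse /
    manual_review / permission_blocked. Field names are illustrative.
    """
    if entry.get("type") in NATIVE_TYPES:
        return "direct_read"
    if not entry.get("can_download", False):
        return "permission_blocked"
    ext = PurePosixPath(entry.get("name", "")).suffix.lower()
    if ext in PARSEABLE_EXTS:
        return "download_and_parse"
    # .pptx, images, scans, archives, and anything unusual fall through.
    return "manual_review"
```

Lowercasing the suffix keeps `Spec.DOCX` on the download-and-parse path rather than accidentally in manual review.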
## Standard workflow

1. Resolve input type.
   - Folder link/token -> enumerate files.
   - Single file link/token -> build a one-file manifest.
2. Create a batch record.
   - Generate `batch_id`.
   - Record `started_at`.
3. Build a manifest recording, for each file:
   - file name
   - file token/link
   - file type
   - route decision
4. Attempt extraction.
   - `.docx` -> use `parsers/parse_docx.py`
   - `.pdf` -> use `parsers/parse_pdf.py`
5. Produce structured outputs.
   - Success -> append to `kb-items.jsonl`.
   - Failure -> append to `failed-items.jsonl`.
6. Summarize the batch.
   - Write `ingest-report.md`.
   - Write `MEMORY.candidate.md`.
7. Finish the batch.
   - Record `finished_at`.
   - Never auto-write `MEMORY.md`.
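The workflow above can be sketched end to end. Every name here is illustrative (the real runner is `run.py`): the manifest is assumed prebuilt, and `extract` stands in for the dispatch to `parsers/parse_docx.py` or `parsers/parse_pdf.py`.

```python
import uuid
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


def run_batch(manifest, extract):
    """Run one ingest batch over a prebuilt manifest (sketch only).

    `manifest` is a list of {"name": ..., "route": ...} dicts;
    `extract` is a callable(item) -> text that raises on failure.
    Returns (batch, kb_items, failed_items) instead of writing files,
    to keep the sketch side-effect free.
    """
    batch = {"batch_id": uuid.uuid4().hex, "started_at": _now()}
    kb_items, failed_items = [], []
    for item in manifest:
        if item["route"] != "download_and_parse":
            continue  # other routes are only counted/reported in v0.1
        try:
            text = extract(item)
            if not text.strip():
                raise ValueError("no extractable text")
            kb_items.append({"batch_id": batch["batch_id"],
                             "source_file": item["name"],
                             "summary": text[:200],
                             "extracted_at": _now()})
        except Exception as exc:
            # Preserve the failure instead of guessing at the content.
            failed_items.append({"batch_id": batch["batch_id"],
                                 "source_file": item["name"],
                                 "failure_reason": type(exc).__name__,
                                 "error_detail": str(exc),
                                 "failed_at": _now()})
    batch["finished_at"] = _now()  # MEMORY.md is never auto-written here
    return batch, kb_items, failed_items
```

Returning the records rather than writing files keeps the report-writing step (6) a separate, inspectable stage.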
## Output contracts

### kb-items.jsonl

Write one JSON object per successfully extracted knowledge item with at least:
`batch_id`, `source_file`, `source_token`, `file_type`, `topic`, `content_type`, `summary`, `extracted_at`, `confidence`.

### failed-items.jsonl

Write one JSON object per failed or blocked file with at least:
`batch_id`, `source_file`, `source_token`, `file_type`, `failure_reason`, `error_detail`, `suggested_action`, `failed_at`.
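Both contracts are plain JSON Lines: one object per line, appended as the batch runs. A minimal append helper might look like this; the record shown is a shape example with made-up values, not real data.

```python
import json


def append_jsonl(path, record):
    """Append one record as a single JSON line (UTF-8, no ASCII escaping)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Example failed-items.jsonl record (illustrative values only).
failed_example = {
    "batch_id": "b-001",
    "source_file": "roadmap.pdf",
    "source_token": "example-token",
    "file_type": ".pdf",
    "failure_reason": "download_denied",
    "error_detail": "403 from drive download endpoint",
    "suggested_action": "request file-level permission, then re-run the batch",
    "failed_at": "2025-01-01T00:00:00Z",
}
```

Appending (mode `"a"`) rather than rewriting means a crashed batch still leaves every record written so far on disk.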
### MEMORY.candidate.md

Include:

- batch header (`batch_id`, `started_at`, `finished_at`, `source_directory` or `source_file`)
- grouped knowledge summaries
- source references
- confidence notes
- items needing review
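The candidate file can be rendered straight from the batch record and extracted items. A sketch, reusing the field names from the contracts above; the function name and exact layout are assumptions, not the skill's fixed format.

```python
def render_memory_candidate(batch, kb_items, review_items=()):
    """Render MEMORY.candidate.md as one string (layout illustrative)."""
    lines = [
        f"## Batch {batch['batch_id']}",
        f"- started_at: {batch['started_at']}",
        f"- finished_at: {batch['finished_at']}",
        f"- source: {batch.get('source_directory') or batch.get('source_file')}",
        "",
        "## Knowledge summaries",
    ]
    for item in kb_items:
        # Each summary keeps its source reference and confidence note.
        lines.append(f"- {item['summary']} (source: {item['source_file']}, "
                     f"confidence: {item['confidence']})")
    if review_items:
        lines += ["", "## Needs review"]
        lines.extend(f"- {name}" for name in review_items)
    return "\n".join(lines) + "\n"
```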
### ingest-report.md

Include:
- Batch summary
- Input scope
- File counts and routing counts
- Successful extraction summary
- Failures and risks
- Recommended next actions
## Safety rules

- Never invent text that was not extracted.
- If parsing fails, say so plainly and log it.
- Treat filenames as hints only, never as proof of document contents.
- Keep sensitive data out of `MEMORY.candidate.md` unless the workflow explicitly allows it.
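The first and third rules can be enforced mechanically: gate every parse result before any knowledge claim is made, so an empty extraction becomes a logged failure rather than a summary guessed from the filename. A minimal sketch with an assumed record shape:

```python
def guard_extraction(filename, text):
    """Gate a parse result before any knowledge claim is made.

    The filename is treated as a hint only: with no extractable text,
    the item is logged as a failure and is never summarized from its
    name. Record fields follow the failed-items.jsonl contract.
    """
    if text is None or not text.strip():
        return {"status": "failed",
                "source_file": filename,
                "failure_reason": "no_extractable_text",
                "suggested_action": "route to manual-review"}
    return {"status": "extracted",
            "source_file": filename,
            "chars": len(text)}
```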
## Included files

- `run.py`: minimal batch runner for local testing
- `parsers/parse_docx.py`: docx text extraction helper
- `parsers/parse_pdf.py`: pdf text extraction helper
- `references/output_examples.md`: sample output shapes and field guidance
- `README.md`: setup and usage notes