# Customs Tax Bill Batch Parser (海关税单批量解析) - yx-tax-batch-parser

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "海关税单批量解析" with this command: npx skills add jjjstar/yx-tax-batch-parser

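Uses Alibaba Cloud Bailian (阿里云百炼) multilingual OCR (Qwen-VL Plus) to parse customs tax bill PDFs, extract the customs duty and the import VAT, aggregate them by tax type, and output JSON.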

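Tax bill structure

A customs tax bill (税费单) usually covers two tax types for the same customs declaration number and the same shipment:

  • Type A (import customs duty): ad valorem duty rate × dutiable price = duty amount
  • Type L (import VAT): calculated from the VAT rate

A single tax bill PDF may have 1 or 2 pages, depending on the number of tax types.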
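The two formulas can be checked against the sample figures used in the output example further down. The sketch below is illustrative only: it assumes the usual import VAT base (dutiable price plus customs duty, plus consumption tax where it applies) and a 13% VAT rate, which is inferred from the sample amounts rather than stated in this document. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def customs_duty(dutiable_price, tariff_rate):
    """Type A: ad valorem duty rate x dutiable price, rounded to cents."""
    return (dutiable_price * tariff_rate).quantize(CENT, rounding=ROUND_HALF_UP)

def import_vat(dutiable_price, duty, vat_rate, consumption_tax=Decimal("0")):
    """Type L: VAT on (dutiable price + duty + consumption tax); base is an assumption."""
    base = dutiable_price + duty + consumption_tax
    return (base * vat_rate).quantize(CENT, rounding=ROUND_HALF_UP)

price = Decimal("512358.66")                      # dutiablePrice from the sample
duty = customs_duty(price, Decimal("0.065"))      # tariffRate from the sample
vat = import_vat(price, duty, Decimal("0.13"))    # 13% is an assumed rate
print(duty, vat)
```

Under these assumptions the results match the sample `tariffAmount` and `vatAmount`, which makes the calculation a useful cross-check on OCR output.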

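Trigger conditions

Use this Skill when the user mentions any of the following:

  • "解析税单" (parse the tax bill)
  • "解析海关税单" (parse the customs tax bill)
  • "提取关税和增值税" (extract customs duty and VAT)
  • "税金汇总" (tax summary)
  • "海关税单JSON" (customs tax bill JSON)
  • "税单已收到" (tax bill received)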

Workflow

1. Convert the PDF to images

Use pymupdf to render each PDF page as a high-resolution image:

import pymupdf

pdf_path = "path/to/tax.pdf"
zoom = 2.0  # 2x the PDF default of 72 dpi, i.e. about 144 dpi
mat = pymupdf.Matrix(zoom, zoom)
images = []

# Render every page to a PNG and collect the file paths for the OCR step
with pymupdf.open(pdf_path) as doc:
    for page_num, page in enumerate(doc, start=1):
        pix = page.get_pixmap(matrix=mat)
        img_path = f"tax_page_{page_num}.png"
        pix.save(img_path)
        images.append(img_path)

2. Call OCR (Alibaba Cloud Bailian, Qwen-VL Plus)

import base64
import os

import dashscope
from dashscope import MultiModalConversation

# Read the API key from the environment instead of hard-coding it in the script
dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]

def ocr_image(image_path):
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")
    
    messages = [{
        "role": "user",
        "content": [
            {"image": f"data:image/png;base64,{img_base64}"},
            {"text": """请提取这份海关税单的完整信息,严格按以下JSON格式输出,不要包含任何其他内容:
{
  "contractNo": "合同号(从合同号字段提取)",
  "billOfLading": "提单号(从提单号字段提取)",
  "customsDeclarationNo": "报关单号",
  "taxType": "税种(A=关税,L=增值税)",
  "taxAmount": 税款金额(数字,从税款金额字段提取),
  "detailList": [
    {
      "hsCode": "税号",
      "productName": "货名/品名",
      "quantity": 数量(数字),
      "unit": "单位",
      "currency": "币制",
      "exchangeRate": 外汇折算率(数字),
      "dutiablePrice": 完税价格(数字,若无则填0),
      "tariffRate": 从价税率(数字,若无则填0),
      "taxAmount": 税额(数字,从税额列提取,若无则用页眉税款金额)
    }
  ]
}
只输出JSON,不要有其他文字。"""}
        ]
    }]
    
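    response = MultiModalConversation.call(model="qwen-vl-plus", messages=messages)
    # Fail loudly on an API error instead of hitting an opaque error when indexing below
    if response.status_code != 200:
        raise RuntimeError(f"OCR request failed: {response.code} {response.message}")
    return response.output.choices[0].message.content[0]["text"]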
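`ocr_image` returns the model's reply as text, so it still has to be turned into a Python object before aggregation. A minimal sketch, assuming the reply may be wrapped in a code fence or stray prose despite the prompt; `parse_ocr_json` is a hypothetical helper, not part of the original script:

```python
import json
from decimal import Decimal

def parse_ocr_json(reply):
    """Extract the JSON object from the model reply and parse it."""
    # Keep only the outermost {...} span; this also drops any fence or extra prose
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in OCR reply")
    # parse_float=Decimal keeps amounts such as 33303.31 exact instead of binary floats
    return json.loads(reply[start:end + 1], parse_float=Decimal)

sample_reply = 'Result:\n{"taxType": "A", "taxAmount": 33303.31, "detailList": []}\nDone.'
page = parse_ocr_json(sample_reply)
print(page["taxType"], page["taxAmount"])
```

Parsing amounts as `Decimal` is a design choice for monetary values; plain `json.loads(reply)` would also work if float precision is acceptable.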

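3. Parsed field reference

| Field | Source | Example |
| --- | --- | --- |
| contractNo | Contract number (合同号) | NZ2072/25 |
| billOfLading | Bill of lading number (提单号) | HKGHUA25090055 |
| customsDeclarationNo | Customs declaration number (报关单号) | 520220251025055499 |
| taxType | Tax type (税种; A = customs duty, L = VAT) | A, L |
| taxAmount | Tax payable (税款金额, page header) | 33303.31 |
| hsCode | Tariff code (税号) | 3901200099 |
| productName | Goods name (货名) | 高密度聚乙烯 (high-density polyethylene) |
| quantity | Quantity (数量) | 49.5 |
| unit | Unit (单位) | |
| currency | Currency (币制) | USD |
| exchangeRate | Exchange rate (外汇折算率) | 7.1384 |
| dutiablePrice | Dutiable price (完税价格) | 512358.66 |
| tariffRate | Ad valorem duty rate (从价税率) | 0.065 |
| detailTaxAmount | Tax amount (税额, in-table column) | 33303.31 |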

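4. Aggregation logic

  • Exchange rate: take the value from the first record
  • Quantity total: sum the quantity of all detail lines under the same batch number (customs declaration number)
  • Aggregate by tax type
    • tariffSummary: sum of taxAmount over all type-A (customs duty) records
    • vatSummary: sum of taxAmount over all type-L (VAT) records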
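The rules above can be sketched as a single function over the per-page OCR results. This is an illustration under stated assumptions, not the author's `parse_tax.py`: the name `summarize` is hypothetical, the output keys follow the sample JSON in step 5 (`taxSummary.tariffAmount` / `vatAmount`), and quantity is counted for one tax type only, because the type-A and type-L pages list the same goods and summing both would double the total.

```python
from collections import defaultdict
from decimal import Decimal

def summarize(pages):
    """Combine per-page OCR results (one dict per tax type) into one summary."""
    totals = defaultdict(Decimal)   # tax type -> summed taxAmount, starts at 0
    details = []
    for page in pages:
        for item in page["detailList"]:
            totals[page["taxType"]] += Decimal(str(item["taxAmount"]))
            details.append({**item, "taxType": page["taxType"]})
    if not details:
        raise ValueError("no detail lines to summarize")
    first = details[0]
    # Assumption: count quantity for the first tax type only to avoid double counting
    total_quantity = sum(
        Decimal(str(d["quantity"])) for d in details if d["taxType"] == first["taxType"]
    )
    return {
        "contractNo": pages[0]["contractNo"],
        "billOfLading": pages[0]["billOfLading"],
        "exchangeRate": first["exchangeRate"],   # rule: value from the first record
        "totalQuantity": total_quantity,
        "unit": first["unit"],
        "taxSummary": {"tariffAmount": totals["A"], "vatAmount": totals["L"]},
        "detailList": details,
    }
```

`Decimal` values are not serializable by `json.dumps` out of the box; passing a `default=` hook such as `default=float` is one way to write the summary to JSON.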

5. Output JSON structure

{
  "contractNo": "NZ2072/25",
  "billOfLading": "HKGHUA25090055",
  "exchangeRate": 7.1384,
  "totalQuantity": 49.5,
  "unit": "吨",
  "taxSummary": {
    "tariffAmount": 33303.31,
    "vatAmount": 70936.06
  },
  "detailList": [
    {
      "taxType": "A",
      "taxTypeName": "进口关税",
      "hsCode": "3901200099",
      "productName": "高密度聚乙烯",
      "quantity": 49.5,
      "unit": "吨",
      "currency": "USD",
      "exchangeRate": 7.1384,
      "dutiablePrice": 512358.66,
      "tariffRate": 0.065,
      "taxAmount": 33303.31
    },
    {
      "taxType": "L",
      "taxTypeName": "进口增值税",
      "hsCode": "3901200099",
      "productName": "高密度聚乙烯",
      "quantity": 49.5,
      "unit": "吨",
      "currency": "USD",
      "exchangeRate": 7.1384,
      "dutiablePrice": 0,
      "tariffRate": 0,
      "taxAmount": 70936.06
    }
  ]
}

6. Save the result

  • File name: {contract number}_tax.json (e.g. NZ2072_25_tax.json)
  • Path: same directory as the source PDF
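A possible way to derive the output path is sketched below. The rule of replacing characters that are unsafe in file names with an underscore is an assumption inferred from the example (contract number NZ2072/25 becoming NZ2072_25_tax.json); `output_path` and `save_result` are hypothetical names.

```python
import json
import re
from pathlib import Path

def output_path(pdf_path, contract_no):
    """Build '<contract number>_tax.json' next to the source PDF."""
    # Replace path separators and other file-name-unsafe characters with '_'
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", contract_no.strip())
    return Path(pdf_path).with_name(f"{safe}_tax.json")

def save_result(result, pdf_path):
    """Write the summary as UTF-8 JSON, keeping Chinese text readable."""
    path = output_path(pdf_path, result["contractNo"])
    # default=float converts Decimal values, if any, into JSON numbers
    text = json.dumps(result, ensure_ascii=False, indent=2, default=float)
    path.write_text(text, encoding="utf-8")
    return path
```

`ensure_ascii=False` keeps values such as product names in their original script instead of `\u` escapes.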

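Notes

  1. API key: supply your own Alibaba Cloud Bailian (DashScope) key, for example through the DASHSCOPE_API_KEY environment variable; do not hard-code it in the script or publish it
  2. PDF conversion: scanned PDFs must be converted to images first; text-based PDFs can be extracted directly
  3. Tax type: A = customs duty (tariff), L = VAT
  4. Tax amount precedence: in-table tax amount column > header tax payable amount
  5. Encoding: for Python output, set sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
  6. Quantity unit: quantities on customs tax bills are usually in "吨" (tonnes); take care to distinguish them from the "KG" used on batch invoices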
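Notes 4 and 6 translate into two small helpers. The sketch below is illustrative: treating a missing, empty, or zero in-table amount as "not present" is an assumption, and both function names are hypothetical.

```python
from decimal import Decimal

def effective_tax_amount(detail, header_amount):
    """Prefer the in-table tax amount; fall back to the header amount."""
    value = detail.get("taxAmount")
    # Assumption: None, an empty string, or 0 means the table column was absent
    if value in (None, "", 0):
        return Decimal(str(header_amount))
    return Decimal(str(value))

def tonnes_to_kg(quantity_tonnes):
    """Convert a tax bill quantity in tonnes to KG for comparison with invoices."""
    return Decimal(str(quantity_tonnes)) * 1000

print(effective_tax_amount({"taxAmount": 0}, 33303.31))   # falls back to the header
print(tonnes_to_kg(49.5))
```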

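Usage example

User input (解析税单 is the "parse tax bill" trigger phrase):

解析税单:C:\Users\Administrator.openclaw\workspace\ineos_attachments\NZ2072-25税单.pdf

Steps:

  1. Run the scripts/parse_tax.py script
  2. Extract contractNo, billOfLading, taxType, taxAmount, and detailList
  3. Take the exchange rate from the first record
  4. Aggregate type A (customs duty) and type L (VAT) separately
  5. Write the JSON file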

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
