glm-v-model

智谱 GLM-4V/4.6V 视觉模型调用技能。用于图像/视频理解、多模态对话、图表分析等任务。 当用户提到:图片理解、图像识别、视觉模型、GLM-4V、GLM-4.6V、多模态分析、看图说话、图表分析、视频理解时使用此技能。

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "glm-v-model" with this command: npx skills add baokui/glm-v-model

GLM 视觉模型调用

本技能提供调用智谱 AI 的 GLM-4V 和 GLM-4.6V 视觉模型的能力,支持图像理解、视频分析、图表解读等功能。

支持的模型

模型说明特点
glm-4vGLM-4 视觉模型基础视觉理解
glm-4.6vGLM-4.6V 视觉模型更强的视觉理解能力,支持更长上下文

快速使用

基本图像理解

from zai import ZhipuAiClient
import base64

client = ZhipuAiClient(api_key="YOUR_API_KEY")

# 读取本地图片并转为 base64
with open("image.jpg", "rb") as f:
    img_base = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_base}"}},
            {"type": "text", "content": "描述这张图片"}
        ]
    }],
    thinking={"type": "enabled"}
)
print(response.choices[0].message.content)

使用图片URL

response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            {"type": "text", "content": "这张图片里有什么?"}
        ]
    }]
)

多图理解

response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "图片1 base64 或 URL"}},
            {"type": "image_url", "image_url": {"url": "图片2 base64 或 URL"}},
            {"type": "text", "content": "比较这两张图片的异同"}
        ]
    }]
)

视频理解(GLM-4.6V)

# 支持理解视频内容
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "video_url", "video_url": {"url": "视频URL"}},
            {"type": "text", "content": "描述这个视频的内容"}
        ]
    }]
)

使用脚本

项目中已包含脚本 script/infer_glmv.py,可直接调用:

import sys
sys.path.append('/Users/guobaokui/.openclaw/workspace_multmodal/skills/glm-v-model/script')
from infer_glmv import glm_v

# 使用方式
# glm_v(['image.jpg'], '描述图片', 'glm-4.6v')

常用场景

场景Prompt 示例
图片描述"详细描述这张图片的内容"
图表分析"分析这张图表数据"
文字识别(OCR)"提取图片中的文字"
物体识别"图片中有哪些物体"
场景理解"这是什么地方"
多图对比"比较这两张图片的异同"
视频理解"总结这个视频的内容"

注意事项

  1. API Key: 需要智谱 AI 的 API Key,可从 https://open.bigmodel.cn 获取
  2. 图片格式: 支持 JPEG、PNG、WebP 等常见格式
  3. 图片大小: 单张图片建议不超过 10MB
  4. thinking: 可启用深度思考模式 thinking={"type": "enabled"}
  5. 计费: 按 token 计费,图片会转换为 token 消耗

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Cclaw

Open-source comedy AI + video editing + poster generation. Create standup/sketch/manzai/scripts, edit videos via FFmpeg, and generate comedy posters via canv...

Registry SourceRecently Updated
General

Bird Recognition Tool | 鸟类识别工具

Identifies bird species in images/videos of target areas. Supports recognition of no less than 500 common bird species, supports customized model training, s...

Registry SourceRecently Updated
General

Image Amazon Product Image Suite

A professional product image generation skill purpose-built for the Amazon e-commerce platform. Outputs comply with Amazon's image guidelines while optimizin...

Registry SourceRecently Updated
General

SearchOnlineAssets

Online asset search tool: queries public stock libraries (Pixabay) for high-quality photos, illustrations, vectors and videos, returning result metadata and...

Registry SourceRecently Updated