Zhipu Web Page Reader
Fetch and parse web page content via Zhipu AI's Reader API (/paas/v4/reader), using lightweight cURL. Returns parsed page content in Markdown or plain text format, along with metadata like title and description.
Quick Start
Basic cURL Usage
curl --request POST \
--url https://open.bigmodel.cn/api/paas/v4/reader \
--header "Authorization: Bearer $ZHIPU_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"url": "https://www.example.com"
}'
Script Usage
A wrapper shell script is provided for convenience.
# Basic Fetch (returns Markdown by default)
bash scripts/zhipu_fetch.sh --url "https://www.example.com"
# Fetch as plain text, no cache
bash scripts/zhipu_fetch.sh \
--url "https://docs.python.org/3/" \
--format text \
--no-cache
# Fetch with image and link summaries
bash scripts/zhipu_fetch.sh \
--url "https://news.example.com/article" \
--images-summary \
--links-summary
# Fetch without images, disable GFM
bash scripts/zhipu_fetch.sh \
--url "https://blog.example.com/post" \
--no-images \
--no-gfm
API Parameter Reference
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | ✅ | - | URL of the web page to fetch |
timeout | integer | - | 20 | Request timeout in seconds |
no_cache | boolean | - | false | Disable caching (true/false) |
return_format | string | - | markdown | Return format: markdown or text |
retain_images | boolean | - | true | Retain images in output (true/false) |
no_gfm | boolean | - | false | Disable GitHub Flavored Markdown (true/false) |
keep_img_data_url | boolean | - | false | Keep image data URLs (true/false) |
with_images_summary | boolean | - | false | Include images summary (true/false) |
with_links_summary | boolean | - | false | Include links summary (true/false) |
Response Structure
The API returns JSON with the parsed page content.
{
"id": "task-id",
"created": 1704067200,
"request_id": "request-id",
"model": "model-name",
"reader_result": {
"title": "Page Title",
"description": "Brief page description",
"url": "https://www.example.com",
"content": "Parsed page content (Markdown or text)",
"external": {
"stylesheet": {}
},
"metadata": {
"keywords": "page, keywords",
"viewport": "width=device-width",
"description": "Meta description",
"format-detection": "telephone=no"
}
}
}
Key Response Fields
| Field | Description |
|---|---|
reader_result.content | Main parsed content (body text, images, links) |
reader_result.title | Page title |
reader_result.description | Brief page description |
reader_result.url | Original page URL |
reader_result.metadata | Page metadata (keywords, viewport, etc.) |
Common Use Cases
| Scenario | Command |
|---|---|
| Read a documentation page | --url <doc_url> |
| Extract text only (no images) | --url <url> --no-images --format text |
| Force fresh fetch (bypass cache) | --url <url> --no-cache |
| Get content with all summaries | --url <url> --images-summary --links-summary |
| Long page with extended timeout | --url <url> --timeout 60 |
Environment Requirements
- Environment variable
ZHIPU_API_KEYmust be configured. curlcommand must be available in your system path.