Search Cluster (Industrial Standard v3.1)
A multi-provider search aggregator designed for high-availability and security.
Installation
The scrapling provider requires a dedicated virtual environment.
- Create a venv: python3 -m venv venv/scrapling
- Install scrapling: venv/scrapling/bin/pip install scrapling
- Provide the path to the venv binary in SCRAPLING_PYTHON_PATH.
Security Posture
- Subprocess Isolation: Query inputs are passed as arguments to stealth_fetch.py.
- Strict TLS: Mandatory SSL verification on all providers.
- Sanitization: Integrated native internal scrubber (Path Neutral).
Requirements and Environment
Declare these variables in your environment or vault:
| Variable | Requirement | Description |
|---|---|---|
| GOOGLE_API_KEY | Optional | API Key for Google Custom Search. |
| GOOGLE_CSE_ID | Optional | Search Engine ID for Google CSE. |
| SCRAPLING_PYTHON_PATH | Optional | Path to the scrapling venv python binary. |
| REDIS_HOST | Optional | Host for result caching. |
| REDIS_PORT | Optional | Port for result caching (Default: 6379). |
| SEARCH_USER_AGENT | Optional | Custom User-Agent string. |
Providers
- google: Official Google Custom Search.
- wiki: Wikipedia OpenSearch API.
- reddit: Reddit JSON search API.
- gnews: Google News RSS aggregator.
- scrapling: Headless stealth scraping (via DuckDuckGo).
Included Scripts
- scripts/search-cluster.py: Main entry point.
- scripts/stealth_fetch.py: Scrapling fetcher (REQUIRED for scrapling provider).
Workflow
- Execute: scripts/search-cluster.py all "<query>"
- Output is structured JSON with source, title, link, and sanitized snippet.