Scry Skill
Scry gives you read-only SQL access to the ExoPriors public corpus (229M+ entities)
via a single HTTP endpoint. You write Postgres SQL against a curated scry.* schema
and get JSON rows back. There is no ORM, no GraphQL, no pagination token -- just SQL.
Skill generation: 2026031601
A) When to use / not use
Use this skill when:
- Searching, filtering, or aggregating content across the ExoPriors corpus
- Running lexical (BM25) or hybrid searches
- Exploring author networks, cross-platform identities, or publication patterns
- Navigating the OpenAlex academic graph (authors, citations, institutions, concepts)
- Creating shareable artifacts from query results
- Emitting structured agent judgements about entities or external references
Do NOT use this skill when:
- The user wants semantic/vector search composition or embedding algebra (use the scry-vectors skill)
- The user wants LLM-based reranking (use the scry-rerank skill)
- The user is querying their own local database
B) Golden Rules
-
Context handshake first. At session start, call
GET /v1/scry/context?skill_generation=2026031601. This endpoint is public; you do not need a key for the handshake itself. Use the returnedofferingsblock for the current product summary budgets, canonical env var, default skill, and specialized skill catalog. If you need a concise shareable bootstrap prompt for another agent, useofferings.public_agent_prompt.copy_textinstead of paraphrasing your own. If you need deeper docs, useofferings.canonical_doc_path, each skill'srepo_path, andreference_pathsinstead of guessing where the maintained docs live. If you cache descriptive bootstrap context across turns or sessions, also tracksurface_context_generationand refresh when it changes. Readlexical_search.statusas well: if it is nothealthy, stop assuming globalscry.search*is reliable and pivot to source-localscry.*/mv_*surfaces or semantic retrieval while the canonical BM25 index recovers. Ifshould_update_skill=true, tell the user to runnpx skills update. -
Schema first. ALWAYS call
GET /v1/scry/schemabefore writing SQL. Never guess column names or types. The schema endpoint returns live column metadata and row-count estimates for every view. -
Check operational status when search looks wrong. If lexical search, materialized-view freshness, or corpus behavior seems off, call
GET /v1/scry/index-view-statusbefore assuming the query or schema is wrong. -
Clarify ambiguous intent before heavy queries. If the request is vague ("search Reddit for X", "find things about Y"), ask one short clarification question about the goal/output format before running expensive SQL.
-
Start with a cheap probe. Before any query likely to run >5s, run
/v1/scry/estimateand/or a tight exploratory query (LIMIT 20plus scoped source/window filters), then scale only after confirming relevance. -
Choose lexical vs semantic explicitly. Use lexical (
scry.search*) for exact terms and named entities. For conceptual intent ("themes", "things like", "similar to"), route to scry-vectors first, then optionally hybridize. -
LIMIT always. Every query MUST include a LIMIT clause. Max 10,000 rows. Queries without LIMIT are rejected by the SQL validator.
-
Prefer materialized views.
scry.entitieshas 229M+ rows. Scanning it without filters is slow. Usescry.mv_lesswrong_posts,scry.mv_arxiv_papers,scry.mv_hackernews_posts, etc. for targeted access. They are pre-filtered and often have embeddings pre-joined. -
Filter dangerous content. Always include
WHERE content_risk IS DISTINCT FROM 'dangerous'unless the user explicitly asks for unfiltered results. Dangerous content contains adversarial prompt-injection content. -
Raw SQL, not JSON.
POST /v1/scry/querytakesContent-Type: text/plainwith raw SQL in the body. Not JSON-wrapped SQL. -
File rough edges promptly. If Scry blocks the task, misses an obvious result set, or exposes a rough edge, submit a brief note to
POST /v1/feedback?feedback_type=suggestion|bug|other&channel=scry_skillusingContent-Type: text/plainby default (text/markdownalso works). Do not silently work around it. Logged-in users can review their submissions withGET /v1/feedback.
For full tier limits, timeout policies, and degradation strategies, see Shared Guardrails.
B.1 API Key Setup (Canonical)
Recommended default for less-technical users: in the directory where you launch the agent, store SCRY_API_KEY in .env so skills and copied prompts use the same place.
Canonical key naming for this skill:
- Env var:
SCRY_API_KEY - Anonymous bootstrap key format:
scry_anon_*fromPOST /v1/scry/anonymous-key - Personal key format: personal Scry API key with Scry access
- Recommended anonymous client header:
X-Scry-Client-Tag: <short-stable-tag>
printf '%s\n' 'SCRY_API_KEY=<your key>' >> .env
set -a && source .env && set +a
Verify:
echo "$SCRY_API_KEY"
Anonymous bootstrap flow when the user wants immediate public access without signup:
CLIENT_TAG="${SCRY_CLIENT_TAG:-dev-laptop}"
ANON_KEY="$(curl -s https://api.scry.io/v1/scry/anonymous-key -X POST -H "X-Scry-Client-Tag: $CLIENT_TAG" | python3 -c 'import json,sys; print(json.load(sys.stdin)[\"api_key\"])')"
curl -s https://api.scry.io/v1/scry/schema \
-H "Authorization: Bearer $ANON_KEY" \
-H "X-Scry-Client-Tag: $CLIENT_TAG"
curl -s https://api.scry.io/v1/scry/query \
-H "Authorization: Bearer $ANON_KEY" \
-H "X-Scry-Client-Tag: $CLIENT_TAG" \
-H "Content-Type: text/plain" \
--data "SELECT 1 LIMIT 1"
Use this for fast trial access only. The anonymous bootstrap lane is intentionally generous for the first few queries and then degrades. For sustained usage, prefer a personal Scry API key.
Keep the same X-Scry-Client-Tag value on the same device when staying anonymous so the backend can distinguish a real first-use session from abuse behind shared IPs.
If using packaged skills, keep them current:
npx skills add exopriors/skills
npx skills update
B.1b x402 Query-Only Access
POST /v1/scry/query still supports standard x402, but it is now an explicit
paid path rather than the default no-auth bootstrap path. Use x402 when the
user already has an x402-capable wallet/client and only needs direct paid query
execution. For public trial use, use POST /v1/scry/anonymous-key. For
schema/context, shares, judgements, feedback, or repeated multi-endpoint usage,
prefer a personal Scry API key.
If the user wants wallet-native durable identity plus a reusable key, use
POST /v1/auth/agent/signup first. That binds the wallet to a user and returns
a session token plus API key in one flow.
Minimal client shape:
import { wrapFetchWithPayment } from 'x402-fetch';
const paidFetch = wrapFetchWithPayment(fetch, walletClient);
const resp = await paidFetch('https://api.scry.io/v1/scry/query', {
method: 'POST',
headers: { 'content-type': 'text/plain' },
body: 'SELECT 1 LIMIT 1',
});
C) Quickstart
One end-to-end example: find recent high-scoring LessWrong posts about RLHF.
Step 1: Get dynamic context + update advisory
GET https://api.scry.io/v1/scry/context?skill_generation=2026031601
Authorization: Bearer $SCRY_API_KEY
Step 2: Get schema
GET https://api.scry.io/v1/scry/schema
Authorization: Bearer $SCRY_API_KEY
Step 3: Run query
POST https://api.scry.io/v1/scry/query
Authorization: Bearer $SCRY_API_KEY
Content-Type: text/plain
WITH hits AS (
SELECT id FROM scry.search('RLHF reinforcement learning human feedback',
mode=>'mv_lesswrong_posts', kinds=>ARRAY['post'], limit_n=>100)
)
SELECT mv.uri, mv.title, mv.original_author, mv.original_timestamp, mv.base_score
FROM hits h
JOIN scry.mv_lesswrong_posts mv ON mv.entity_id = h.id
ORDER BY mv.base_score DESC NULLS LAST, mv.original_timestamp DESC
LIMIT 20
Response shape:
{
"columns": ["uri", "title", "original_author", "original_timestamp", "base_score"],
"rows": [["https://...", "My RLHF Post", "author", "2025-01-15T...", 142], ...],
"row_count": 20,
"duration_ms": 312,
"truncated": false
}
D) Decision Tree
User wants to search the ExoPriors corpus?
|
+-- Ambiguous / conceptual ask? --> Clarify intent first, then use
| scry-vectors for semantic search (optionally hybridize with lexical)
|
+-- By keywords/phrases? --> scry.search() (BM25 lexical over canonical content_text)
| +-- Specific forum? --> pass mode='mv_lesswrong_posts' or kinds filter
| +-- Reddit? --> START with scry.reddit_subreddit_stats /
| scry.reddit_clusters() / scry.reddit_embeddings
| and trust /v1/scry/schema status before
| using direct retrieval helpers
| +-- Large result? --> scry.search_ids() (up to 2000 lexical IDs; join for fields)
|
+-- By structured filters (source, date, author)? --> Direct SQL on MVs
|
+-- By semantic similarity? --> (scry-vectors skill, not this one)
|
+-- Hybrid (keywords + semantic rerank)? --> scry.hybrid_search() or
| lexical CTE + JOIN scry.embeddings
|
+-- Author/people lookup? --> scry.actors, scry.people, scry.person_accounts
|
+-- Academic graph (OpenAlex)? --> scry.openalex_find_authors(),
| scry.openalex_find_works(), etc. (see schema-guide.md)
|
+-- Need to share results? --> POST /v1/scry/shares
|
+-- Need to emit a structured observation? --> POST /v1/scry/judgements
|
+-- Scry blocked / missing obvious results? --> POST /v1/feedback
E) Recipes
E0. Context handshake + skill update advisory
curl -s "https://api.scry.io/v1/scry/context?skill_generation=2026031601" \
-H "Authorization: Bearer $SCRY_API_KEY"
If response includes "should_update_skill": true, ask the user to run:
npx skills update.
If response includes "lexical_search": {"status": "rebuilding"|"degraded"|"stale"|...},
prefer source-local scry.* / mv_* surfaces and use
/v1/scry/index-view-status for detailed rebuild timing before blaming the query.
E0b. Submit feedback when Scry blocks the task
curl -s "https://api.scry.io/v1/feedback?feedback_type=bug&channel=scry_skill" \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: text/plain" \
--data $'## What happened\n- Query: ...\n- Problem: ...\n\n## Why it matters\n- ...\n\n## Suggested fix\n- ...'
Success response includes a receipt id. Logged-in users can review their own
submissions with:
curl -s "https://api.scry.io/v1/feedback?limit=10" \
-H "Authorization: Bearer $SCRY_API_KEY"
E1. Lexical search (BM25)
WITH c AS (
SELECT id FROM scry.search('your query here',
kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp
FROM c JOIN scry.entities e ON e.id = c.id
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
LIMIT 50
Default kinds if omitted: ['post','paper','document','webpage','twitter_thread','grant'].
scry.search() broadens once to kinds=>ARRAY['comment'] if that default returns 0 rows.
Pass explicit kinds for strict scope (for example comment-only or tweet-only).
Pass mode=>'mv_lesswrong_posts' to scope to LessWrong posts.
When a source-specific MV exists, join that MV and use its native score field
(base_score for LessWrong / EA Forum) instead of sorting the full scry.entities row.
E2. Reddit-specific discovery
SELECT subreddit, total_count, latest
FROM scry.reddit_subreddit_stats
WHERE subreddit IN ('MachineLearning', 'LocalLLaMA')
ORDER BY total_count DESC
For semantic Reddit retrieval over the embedding-covered subset, use
scry.reddit_embeddings or scry.search_reddit_posts_semantic(...).
Direct retrieval helpers (scry.reddit_posts, scry.reddit_comments,
scry.mv_reddit_*, scry.search_reddit_posts(...),
scry.search_reddit_comments(...)) are currently degraded on the public
instance. Check /v1/scry/schema status before using them.
E3. Source-filtered materialized view query
SELECT entity_id, uri, title, original_author, score, original_timestamp
FROM scry.mv_arxiv_papers
WHERE original_timestamp >= '2025-01-01'
ORDER BY original_timestamp DESC
LIMIT 50
score is NULL for arXiv papers on the public surface. Sort by
original_timestamp, category, or downstream citation proxies instead.
E4. Author activity across sources
SELECT e.source::text, COUNT(*) AS docs, MAX(e.original_timestamp) AS latest
FROM scry.entities e
WHERE e.original_author ILIKE '%yudkowsky%'
AND e.content_risk IS DISTINCT FROM 'dangerous'
GROUP BY e.source::text
ORDER BY docs DESC
LIMIT 20
E5. Recent entity kind distribution for a source
SELECT kind::text, COUNT(*)
FROM scry.entities
WHERE source = 'hackernews'
AND original_timestamp >= '2025-01-01'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20
Removing the date bound turns this into a large base-table aggregation. Run
/v1/scry/estimate first or prefer source-specific MVs when they already cover
the question.
E6. Hybrid search (lexical + semantic rerank in SQL)
WITH c AS (
SELECT id FROM scry.search('deceptive alignment',
kinds=>ARRAY['post'], limit_n=>200)
)
SELECT e.uri, e.title, e.original_author,
emb.embedding_voyage4 <=> @p_deadbeef_topic AS distance
FROM c
JOIN scry.entities e ON e.id = c.id
JOIN scry.embeddings emb ON emb.entity_id = c.id AND emb.chunk_index = 0
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY distance
LIMIT 50
Requires a stored embedding handle (@p_deadbeef_topic). See scry-vectors
skill for creating handles.
E7. Cost estimation before execution
curl -s -X POST https://api.scry.io/v1/scry/estimate \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT id, title FROM scry.mv_arxiv_papers LIMIT 1000"}'
Returns EXPLAIN (FORMAT JSON) output. Use this for expensive queries before committing.
It does not prove BM25 helper health: if scry.search* fails, check
/v1/scry/index-view-status and /v1/scry/schema status as well.
The /v1/scry/context handshake now also exposes lexical_search.status for
cheap degraded-mode detection before you start issuing lexical helpers.
E8. Create a shareable artifact
# 1. Run query and capture results
# 2. POST share
curl -s -X POST https://api.scry.io/v1/scry/shares \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"kind": "query",
"title": "Top RLHF posts on LessWrong",
"summary": "20 highest-scored LW posts mentioning RLHF.",
"payload": {
"sql": "...",
"result": {"columns": [...], "rows": [...]}
}
}'
Kinds: query, rerank, insight, chat, markdown.
Progressive update: create stub immediately, then PATCH /v1/scry/shares/{slug}.
Rendered at: https://scry.io/scry/share/{slug}.
E9. Emit a structured agent judgement
curl -s -X POST https://api.scry.io/v1/scry/judgements \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"emitter": "my-agent",
"judgement_kind": "topic_classification",
"target_external_ref": "arxiv:2401.12345",
"summary": "Paper primarily about mechanistic interpretability.",
"payload": {"primary_topic": "mech_interp", "confidence_detail": "title+abstract match"},
"confidence": 0.88,
"tags": ["arxiv", "mech_interp"],
"privacy_level": "public"
}'
Exactly one target required: target_entity_id, target_actor_id,
target_judgement_id, or target_external_ref.
Judgement-on-judgement: use target_judgement_id to chain observations.
E10. People / author lookup
-- Per-source author grouping
SELECT a.handle, a.display_name, a.source::text, COUNT(*) AS docs
FROM scry.entities e
JOIN scry.actors a ON a.id = e.author_actor_id
WHERE e.source = 'twitter'
GROUP BY a.handle, a.display_name, a.source::text
ORDER BY docs DESC
LIMIT 50
E11. Thread navigation (replies)
-- Find all replies to a root post
SELECT id, uri, title, original_author, original_timestamp
FROM scry.entities
WHERE anchor_entity_id = 'ROOT_ENTITY_UUID'
ORDER BY original_timestamp
LIMIT 100
anchor_entity_id is the root subject; parent_entity_id is the direct parent.
E12. Count estimation (safe pattern)
Avoid COUNT(*) on large tables. Instead, use schema endpoint row estimates or:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mv_lesswrong_posts'
LIMIT 1
Note: pg_class access is blocked on the public Scry SQL surface. Use /v1/scry/schema instead.
F) Error Handling
See references/error-reference.md for the full catalogue. Key patterns:
| HTTP | Code | Meaning | Action |
|---|---|---|---|
| 400 | invalid_request | SQL parse error, missing LIMIT, bad params | Fix query |
| 401 | unauthorized | Missing or invalid API key | Check key |
| 402 | insufficient_credits | Token budget exhausted | Notify user |
| 429 | rate_limited | Too many requests | Respect Retry-After header |
| 503 | service_unavailable | Scry pool down or overloaded | Wait and retry |
Auth + timeout diagnostics for CLI users:
- If curl shows HTTP
000, that is client-side timeout/network abort, not a server HTTP status. Check--max-timeand retry with/v1/scry/estimatefirst. - If you see
401with"Invalid authorization format", check for whitespace/newlines in the key:KEY_CLEAN="$(printf '%s' \"$SCRY_API_KEY\" | tr -d '\\r\\n')"then useAuthorization: Bearer $KEY_CLEAN.
Quota fallback strategy:
- If 429: wait
Retry-Afterseconds, retry once. - If 402: tell the user their token budget is exhausted.
- If 503: retry after 30s with exponential backoff (max 3 attempts).
- If query times out: simplify (use MV instead of full table, reduce LIMIT, add tighter WHERE filters).
G) Output Contract
When this skill completes a query task, return a consistent structure:
## Scry Result
**Query**: <natural language description>
**SQL**: ```sql <the SQL that ran> ```
**Rows returned**: <N> (truncated: <yes/no>)
**Duration**: <N>ms
<formatted results table or summary>
**Share**: <share URL if created>
**Caveats**: <any data quality notes, e.g., "score is NULL for arXiv">
Handoff Contract
Produces: JSON with columns, rows, row_count, duration_ms, truncated
Feeds into:
rerank: ensure SQL returnsidandcontent_textcolumns for candidate setsscry-vectors: save entity IDs for embedding lookup and semantic reranking Receives from: none (entry point for SQL-based corpus access)
Related Skills
- scry-vectors -- embed concepts as @handles, search by cosine distance, debias with vector algebra
- scry-rerank -- LLM-powered multi-attribute reranking of candidate sets via pairwise comparison
For detailed schema documentation, see references/schema-guide.md.
For the full pattern library, see references/query-patterns.md.
For error codes and quota details, see references/error-reference.md.