Searching with Exa (Search API)

This skill is a recipe for consistent web research using Exa’s Search API: you choose the right search mode, apply the right filters, request only the content you need (highlights/text/summary), and return clean citations.

Quick start (default path)

Ensure an API key is available as an environment variable:

EXA_API_KEY (preferred)
or pass --api-key to the scripts.

Run a search (JSON response to stdout):

python {baseDir}/scripts/exa_search.py   --query "latest research in LLMs"   --type auto   --category "research paper"   --num-results 5   --highlights   --highlights-per-url 3   --num-sentences 2

Operating principles (always follow)

Always return URLs. Prefer also returning title + a 1–2 sentence “why this source” note.
Prefer highlights first for agentic workflows. Escalate to full text only when necessary.
Use filters aggressively: domain allowlists, date windows, and category improve relevance and reduce noise.
Freshness is explicit: if the user asks for “latest”, “today”, “current”, etc., set a freshness policy (see below).
Don’t overfetch: cap content length with maxCharacters when requesting text.
If the user needs hard evidence (numbers, quotes), fetch full text for the top 1–3 pages and verify.

Workflow

Step 1 — Translate the user task into a search plan

Decide:

Search type
- auto (default): best general quality.
- instant: lowest latency, for autocomplete / live suggestions.
- deep: more comprehensive; can use additionalQueries.
- fast / neural: streamlined alternatives.
Category (when appropriate)
- news, research paper, company, people, tweet, personal site, financial report, etc.
Freshness
- If “real-time / latest”: consider live crawling via maxAgeHours (see Freshness).
- If “historical/static”: use cache only (e.g., maxAgeHours: -1).
Content mode
- highlights: token-efficient evidence snippets.
- text: deep reading (cap via maxCharacters).
- summary: quick structured overviews (optionally with a guiding query).

Step 2 — Build the request payload

Start from this template and fill only what you need:

{
  "query": "...",
  "type": "auto",
  "category": "news",
  "numResults": 10,
  "includeDomains": ["..."],
  "excludeDomains": ["..."],
  "startPublishedDate": "2025-01-01T00:00:00.000Z",
  "endPublishedDate": "2025-12-31T23:59:59.999Z",
  "includeText": ["must contain phrase"],
  "excludeText": ["must not contain phrase"],
  "contents": {
    "highlights": true,
    "text": { "maxCharacters": 8000, "includeHtmlTags": false },
    "summary": { "query": "..." },
    "subpages": 0,
    "extras": { "links": 0, "imageLinks": 0 },
    "maxAgeHours": 24
  }
}

Notes:

contents is optional. If omitted, you’ll only get metadata (title, url, etc.).
maxAgeHours controls when Exa should live-crawl vs use cached content (see below).
context is deprecated; use highlights or text instead.

Step 3 — Execute the request

Option A (recommended): use the bundled script so requests are consistent and validated.

python {baseDir}/scripts/exa_search.py --query "..." --highlights --num-results 10

Option B: call the HTTP endpoint directly.

curl --request POST   --url https://api.exa.ai/search   --header "content-type: application/json"   --header "x-api-key: $EXA_API_KEY"   --data '{"query":"...","type":"auto","numResults":5}'

Step 4 — Post-process results into an answer with citations

De-duplicate near-identical domains/pages when the user wants breadth.
Select the top sources (usually 3–7) that jointly cover the claim space.
For each selected result, extract:
- title, url
- key highlight(s) or a short quote from text
- published date (if available)
Write the response with inline citations (URLs) and clear uncertainty where needed.
If the user wants a deliverable (report, memo), preserve a “Sources” section listing all URLs.

Freshness policy (use this when “latest/current/today” appears)

Use contents.maxAgeHours (or the maxAgeHours top-level alias if the API accepts it):

24: daily-fresh content (use cache if <24h else livecrawl)
1: near-real-time (cache if <1h else livecrawl)
0: always livecrawl (slowest, most current)
-1: never livecrawl (fastest; cache only)
omit: default behaviour (livecrawl only when cache missing)

Common patterns

Pattern A — “Give me sources for X” (fast + token efficient)

type: auto, numResults: 5–10
contents.highlights: true
Optional: category and includeDomains

Pattern B — “Do deep research on X” (read a few pages thoroughly)

Start with highlights on 10–20 results.
Then fetch full text for the top 3–5 URLs with a maxCharacters cap.
Summarise with citations.

Pattern C — “Latest news about X”

category: news
Apply a date window (startPublishedDate) if the question is time-bound.
Use a freshness setting (often maxAgeHours: 1–24).

Pattern D — “Find a company / person page”

category: company or people
If using people, allowlist LinkedIn domains when needed.
IMPORTANT: some filters are unsupported for company/people; see troubleshooting.

Troubleshooting

401 / 403 (auth)

Confirm x-api-key header is present and valid.
Confirm you aren’t accidentally using a placeholder like YOUR-EXA-API-KEY.

400 (invalid parameters)

company and people categories support a limited set of filters; unsupported parameters can trigger 400 errors.
If in doubt, remove date and text filters first, then re-add one-by-one.

Too much content / token blow-ups

Prefer highlights over text.
Cap text.maxCharacters.
Reduce numResults.

Bundled references

API + parameter cheat sheet: references/exa-search-api.md
Best-practice recipes: references/exa-search-best-practices.md
Quickstart snippets (SDK + curl): references/exa-search-quickstart.md

exa-search

Safety Notice

Copy this and send it to your AI assistant to learn