tavily-best-practices

Build production-ready Tavily integrations with best practices baked in. Reference documentation for implementing web search, content extraction, crawling, and research in agentic workflows, RAG systems, or autonomous agents.


Install skill "tavily-best-practices" with this command: npx skills add matthew77/liang-tavily-best-practices

Tavily Best Practices

Tavily is a search API designed for LLMs, enabling AI applications to access real-time web data.

Installation

Python:

pip install tavily-python

JavaScript:

npm install @tavily/core

Client Initialization

from tavily import TavilyClient

# Option 1: Uses TAVILY_API_KEY env var (recommended)
client = TavilyClient()

# Option 2: Explicit API key
client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Async client for parallel queries
from tavily import AsyncTavilyClient
async_client = AsyncTavilyClient()
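The async client pairs naturally with asyncio.gather for fan-out. A minimal sketch; the parallel_search helper is our own name, not part of the SDK, and it only assumes the client's search() is awaitable (as AsyncTavilyClient's is):

```python
import asyncio

async def parallel_search(client, queries, **kwargs):
    """Run several searches concurrently; results come back in query order."""
    tasks = [client.search(query=q, **kwargs) for q in queries]
    return await asyncio.gather(*tasks)

# Usage (requires TAVILY_API_KEY):
# responses = asyncio.run(parallel_search(AsyncTavilyClient(), ["topic A", "topic B"]))
```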

Choosing the Right Method

For custom agents/workflows:

Need                          Method
Web search results            search()
Content from specific URLs    extract()
Content from an entire site   crawl()
URL discovery from a site     map()

For out-of-the-box research:

Need                                     Method
End-to-end research with AI synthesis    research()

Quick Reference

search() - Web Search

response = client.search(
    query="quantum computing breakthroughs",  # Keep under 400 chars
    max_results=10,
    search_depth="advanced",  # highest relevance
    topic="general"  # or "news"
)

for result in response["results"]:
    print(f"{result['title']}: {result['score']}")

Key parameters:

  • query - Keep under 400 characters
  • max_results - 1-20
  • search_depth - ultra-fast, fast, basic, advanced
  • topic - general or news
  • include_domains, exclude_domains - Filter sources
  • time_range - day, week, month, year
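To show how these parameters combine, here is a hypothetical wrapper (search_recent_news is our name, not a Tavily API) that pins the news vertical, the past week, and an optional list of trusted domains:

```python
def search_recent_news(client, query, domains=None):
    """Recent-news search: news topic, past week, optional source whitelist."""
    kwargs = {
        "query": query[:400],  # respect the 400-character guideline
        "topic": "news",
        "time_range": "week",
        "max_results": 5,
    }
    if domains:
        kwargs["include_domains"] = domains
    return client.search(**kwargs)
```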

extract() - URL Content Extraction

# Two-step pattern (recommended for control)
search_results = client.search(query="Python async best practices")
urls = [r["url"] for r in search_results["results"] if r["score"] > 0.5]
extracted = client.extract(
    urls=urls[:20],
    query="async patterns",  # Reranks chunks by relevance
    chunks_per_source=3  # Prevents context explosion
)

Key parameters:

  • urls - Max 20 URLs
  • extract_depth - basic or advanced
  • query - Reranks chunks by relevance
  • chunks_per_source - 1-5 (prevents context explosion)

crawl() - Site-Wide Extraction

response = client.crawl(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find API documentation pages",  # Semantic focus
    chunks_per_source=3,  # Token optimization
    select_paths=["/docs/.*", "/api/.*"]
)

Key parameters:

  • url - Root URL to crawl
  • max_depth - 1-5 (start with 1)
  • max_breadth - Links per page
  • limit - Total pages cap
  • instructions - Natural language guidance
  • chunks_per_source - 1-5 (for agentic use)
  • select_paths, exclude_paths - Regex patterns

map() - URL Discovery

response = client.map(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find all API and guide pages"
)
api_docs = [url for url in response["results"] if "/api/" in url]

Use map() when you only need URLs, not content (faster than crawl).

research() - AI-Powered Research

import time

# For comprehensive multi-topic research
result = client.research(
    input="Analyze competitive landscape for X in SMB market",
    model="pro"  # or "mini" for focused queries, "auto" when unsure
)
request_id = result["request_id"]

# Poll until completed (bounded, so a stuck job can't spin forever)
deadline = time.time() + 600  # give up after 10 minutes
response = client.get_research(request_id)
while response["status"] not in ("completed", "failed"):
    if time.time() > deadline:
        raise TimeoutError("research request did not finish in time")
    time.sleep(10)
    response = client.get_research(request_id)

print(response["content"])  # The research report

Key parameters:

  • input - Research topic or question
  • model - mini (quick), pro (comprehensive), auto
  • stream - Stream results as they arrive
  • output_schema - Structured JSON output
  • citation_format - Citation style

Search Depth Selection

Depth        Latency   Relevance   Use Case
ultra-fast   Lowest    Lower       Real-time chat, autocomplete
fast         Low       Good        Need chunks but latency matters
basic        Medium    High        General-purpose, balanced
advanced     Higher    Highest     Precision matters, research

Rule of thumb: Start with basic, escalate to advanced for complex topics.
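This rule of thumb can be automated: run basic first and re-query with advanced only when the best score looks weak. A sketch under our own assumptions (the 0.5 threshold and the helper name are illustrative, not part of the SDK):

```python
def search_with_escalation(client, query, threshold=0.5):
    """Try basic depth first; escalate to advanced if the top score is weak."""
    response = client.search(query=query, search_depth="basic")
    results = response.get("results", [])
    if not results or max(r["score"] for r in results) < threshold:
        response = client.search(query=query, search_depth="advanced")
    return response
```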

Model Selection for Research

Rule of thumb: "what does X do?" → mini. "X vs Y vs Z" or "best way to..." → pro.

Model   Use Case                             Speed
mini    Single topic, targeted research      ~30s
pro     Comprehensive multi-angle analysis   ~60-120s
auto    API chooses based on complexity      Varies

Crawl for Context vs Data Collection

For agentic use (feeding results into context): Always use instructions + chunks_per_source. This returns only relevant chunks instead of full pages, preventing context window explosion.

For data collection (saving to files): Omit chunks_per_source to get full page content.
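The two modes differ by a single argument. A sketch with hypothetical helper names (crawl_for_context and crawl_for_dataset are ours); the conservative max_depth=1 and limit=20 follow the performance tips below:

```python
def crawl_for_context(client, url, instructions):
    """Agentic use: relevant chunks only, tight bounds on depth and pages."""
    return client.crawl(url=url, max_depth=1, limit=20,
                        instructions=instructions, chunks_per_source=3)

def crawl_for_dataset(client, url):
    """Data collection: same bounds, but full page content (no chunking)."""
    return client.crawl(url=url, max_depth=1, limit=20)
```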

Common Patterns

Pattern 1: Search + Extract

# Find relevant URLs first
search_results = client.search(query="React hooks documentation")
high_quality_urls = [r["url"] for r in search_results["results"] if r["score"] > 0.7]

# Extract content from best results
extracted = client.extract(
    urls=high_quality_urls[:10],
    query="useState and useEffect",
    chunks_per_source=3
)

Pattern 2: Map + Crawl

# Discover structure first
map_results = client.map(
    url="https://docs.example.com",
    max_depth=2,
    instructions="Find API documentation pages"
)

# Crawl only relevant sections
api_urls = [url for url in map_results["results"] if "/api/" in url]
crawl_results = client.crawl(
    url="https://docs.example.com/api",
    max_depth=1,
    limit=len(api_urls)
)

Pattern 3: Research with Citations

result = client.research(
    input="Compare LangGraph vs CrewAI for multi-agent systems",
    model="pro"
)

# Poll for completion as in the research() section above
response = client.get_research(result["request_id"])
while response["status"] not in ("completed", "failed"):
    time.sleep(10)
    response = client.get_research(result["request_id"])

# The response includes citations
print(response["content"])    # AI-synthesized report
print(response["citations"])  # Source references

Performance Tips

  • Keep queries under 400 characters - Think search query, not prompt
  • Break complex queries into sub-queries - Better results than one massive query
  • Use include_domains to focus on trusted sources
  • Use time_range for recent information
  • Start conservative with crawl (max_depth=1, limit=20)
  • Always set a limit to prevent runaway crawls
  • Use chunks_per_source for agentic workflows - prevents context explosion
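As a toy illustration of breaking a compound question into sub-queries (real agents usually delegate this decomposition to an LLM), a naive splitter on "and"/commas that also enforces the 400-character guideline; split_query is our name, not a Tavily API:

```python
import re

def split_query(complex_query):
    """Naively split a compound question into focused sub-queries,
    each capped at 400 characters."""
    parts = re.split(r"\band\b|,", complex_query)
    return [p.strip()[:400] for p in parts if p.strip()]
```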

Cost Optimization

  • Use basic depth as default (cheaper than advanced)
  • Limit max_results to what you'll actually use
  • Disable include_raw_content unless needed
  • Use chunks_per_source instead of full content for context
  • Cache results locally for repeated queries
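Local caching can be a thin wrapper keyed on the exact search arguments. A minimal in-memory sketch (CachedSearch is our own class; a production version would add expiry and persistence):

```python
import hashlib
import json

class CachedSearch:
    """In-memory cache keyed on a hash of the exact search arguments."""
    def __init__(self, client):
        self._client = client
        self._cache = {}

    def search(self, **kwargs):
        key = hashlib.sha256(
            json.dumps(kwargs, sort_keys=True).encode()
        ).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._client.search(**kwargs)
        return self._cache[key]
```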

Error Handling

from tavily import TavilyClient
from tavily.errors import TavilyError

client = TavilyClient()

try:
    result = client.search(query="example")
except TavilyError as e:
    print(f"Tavily API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
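For transient failures (rate limits, network blips), retrying with exponential backoff is a common complement to the handler above. A sketch; the helper name is ours, and production code would catch TavilyError specifically rather than the broad Exception used here:

```python
import time

def search_with_retry(client, retries=3, backoff=1.0, **kwargs):
    """Retry transient failures with exponential backoff (1x, 2x, 4x, ...)."""
    for attempt in range(retries):
        try:
            return client.search(**kwargs)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts; surface the error
            time.sleep(backoff * (2 ** attempt))
```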

Framework Integrations

Tavily integrates with popular frameworks:

  • LangChain - TavilySearch tool
  • LlamaIndex - TavilySearch tool
  • CrewAI - Built-in Tavily tools
  • Vercel AI SDK - Direct API calls

See the official documentation for integration examples.
