hyde-retrieval

HyDE (Hypothetical Document Embeddings)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn, or install the "hyde-retrieval" skill directly with:

npx skills add yonatangross/orchestkit/yonatangross-orchestkit-hyde-retrieval


Generate hypothetical answer documents to bridge vocabulary gaps in semantic search.

The Problem

Direct query embedding often fails due to vocabulary mismatch:

Query: "scaling async data pipelines"
Docs use: "event-driven messaging", "Apache Kafka", "message brokers"
→ Low similarity scores despite high relevance

The Solution

Instead of embedding the query, generate a hypothetical answer document:

Query: "scaling async data pipelines"
→ LLM generates: "To scale asynchronous data pipelines, use event-driven messaging with Apache Kafka. Message brokers provide backpressure..."
→ Embed the hypothetical document
→ Now matches docs that use the same terminology
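The vocabulary-gap effect can be illustrated with a toy lexical "embedding". This is only a sketch: real systems use learned dense embeddings, but the overlap pattern is the same.

```python
# Toy bag-of-words "embedding": a set of lowercased tokens.
def embed(text: str) -> set[str]:
    return set(text.lower().replace(",", "").replace(".", "").split())

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)

doc = embed("event-driven messaging with Apache Kafka and message brokers")
query = embed("scaling async data pipelines")
hypo = embed("To scale async pipelines use event-driven messaging with Apache Kafka")

# The hypothetical document shares far more vocabulary with the corpus
# document than the raw query does.
assert jaccard(hypo, doc) > jaccard(query, doc)
```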

Implementation

```python
from collections.abc import Awaitable, Callable

from openai import AsyncOpenAI
from pydantic import BaseModel


class HyDEResult(BaseModel):
    """Result of HyDE generation."""

    original_query: str
    hypothetical_doc: str
    embedding: list[float]


async def generate_hyde(
    query: str,
    llm: AsyncOpenAI,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    max_tokens: int = 150,
) -> HyDEResult:
    """Generate a hypothetical document and embed it."""
    # Generate hypothetical answer
    response = await llm.chat.completions.create(
        model="gpt-5.2-mini",  # Fast, cheap model
        messages=[
            {"role": "system", "content":
                "Write a short paragraph that would answer this query. "
                "Use technical terminology that documentation would use."},
            {"role": "user", "content": query},
        ],
        max_tokens=max_tokens,
        temperature=0.3,  # Low temp for consistency
    )

    hypothetical_doc = response.choices[0].message.content

    # Embed the hypothetical document (not the query!)
    embedding = await embed_fn(hypothetical_doc)

    return HyDEResult(
        original_query=query,
        hypothetical_doc=hypothetical_doc,
        embedding=embedding,
    )
```

With Caching

```python
import hashlib


class HyDEService:
    def __init__(self, llm, embed_fn):
        self.llm = llm
        self.embed_fn = embed_fn
        self._cache: dict[str, HyDEResult] = {}

    def _cache_key(self, query: str) -> str:
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> HyDEResult:
        key = self._cache_key(query)

        if key in self._cache:
            return self._cache[key]

        result = await generate_hyde(query, self.llm, self.embed_fn)
        self._cache[key] = result
        return result
```
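Because the cache key lowercases and strips the query, cosmetically different spellings of the same question hit the same cache entry. A minimal sketch of that md5-of-normalized-text scheme:

```python
import hashlib

def cache_key(query: str) -> str:
    # Lowercase and strip so cosmetically different queries share one entry
    return hashlib.md5(query.lower().strip().encode()).hexdigest()

k1 = cache_key("Scaling Async Data Pipelines")
k2 = cache_key("  scaling async data pipelines  ")
assert k1 == k2  # same cache entry
```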

Per-Concept HyDE (Advanced)

For multi-concept queries, generate HyDE for each concept:

```python
import asyncio


async def batch_hyde(
    concepts: list[str],
    hyde_service: HyDEService,
) -> list[HyDEResult]:
    """Generate HyDE embeddings for multiple concepts in parallel."""
    tasks = [hyde_service.generate(concept) for concept in concepts]
    return await asyncio.gather(*tasks)
```
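asyncio.gather preserves input order, so results line up with the concepts list. A runnable sketch with a stub coroutine standing in for the real LLM-plus-embedding call:

```python
import asyncio

async def fake_hyde(concept: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for LLM + embedding latency
    return f"hypothetical doc about {concept}"

async def batch(concepts: list[str]) -> list[str]:
    # The stubs run concurrently; gather returns results in input order
    return await asyncio.gather(*(fake_hyde(c) for c in concepts))

docs = asyncio.run(batch(["Apache Kafka", "backpressure"]))
```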

When to Use HyDE

| Scenario | Use HyDE? |
| --- | --- |
| Abstract/conceptual queries | Yes |
| Exact term searches | No (use keyword search) |
| Code snippet searches | No |
| Natural language questions | Yes |
| Vocabulary mismatch suspected | Yes |
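The scenarios above can be wired into a simple router. The heuristic below is hypothetical (not part of this skill): it skips HyDE for code-like or quoted exact-term queries and uses it for plain natural language.

```python
def should_use_hyde(query: str) -> bool:
    # Hypothetical heuristic: code-like punctuation or a fully quoted
    # phrase suggests exact/keyword search; otherwise prefer HyDE.
    looks_like_code = any(tok in query for tok in ("(", "::", "->", "{", "=="))
    exact_phrase = query.startswith('"') and query.endswith('"')
    return not (looks_like_code or exact_phrase)

should_use_hyde("scaling async data pipelines")  # natural language -> HyDE
should_use_hyde('"ECONNRESET"')                  # exact term -> keyword search
```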

Fallback Strategy

```python
import asyncio
from collections.abc import Awaitable, Callable


async def hyde_with_fallback(
    query: str,
    hyde_service: HyDEService,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    timeout: float = 3.0,
) -> list[float]:
    """HyDE with fallback to direct embedding on timeout."""
    try:
        async with asyncio.timeout(timeout):
            result = await hyde_service.generate(query)
            return result.embedding
    except TimeoutError:
        # Fallback to direct query embedding
        return await embed_fn(query)
```

Performance Tips

  • Use a fast model (gpt-5.2-mini, claude-haiku-4-5) for generation

  • Cache aggressively (queries often repeat)

  • Set tight timeouts (2-3s) with fallback

  • Keep hypothetical docs concise (100-200 tokens)

  • Combine with query decomposition for best results

Related Skills

  • rag-retrieval: Core RAG patterns that HyDE enhances for better retrieval

  • embeddings: Embedding models used to embed hypothetical documents

  • query-decomposition: Complementary technique for multi-concept queries

  • semantic-caching: Cache HyDE results to avoid repeated LLM calls

Key Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Generation model | gpt-5.2-mini / claude-haiku-4-5 | Fast and cheap for hypothetical doc generation |
| Temperature | 0.3 | Low temperature for consistent, factual hypothetical docs |
| Max tokens | 100-200 | Concise docs match the embedding sweet spot |
| Timeout with fallback | 2-3 seconds | Graceful degradation to direct query embedding |

References

  • Gao et al. (2022), "Precise Zero-Shot Dense Retrieval without Relevance Labels" (the HyDE paper)

  • LangChain HyDE documentation

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
