Query Decomposition
Break complex queries into independent concepts for parallel retrieval and fusion.
Overview
Use this pattern when:

- Complex queries span multiple topics or concepts
- Multi-hop questions require chained reasoning
- A single retrieval pass misses relevant documents
- Recall on compound questions needs improvement
The Problem
Complex queries span multiple topics that may not co-occur in single documents:
Query: "How do chunking strategies affect reranking in RAG?"
→ A single search may miss documents about chunking OR reranking
→ Poor coverage across all concepts
The Solution
Decompose into independent concepts, retrieve separately, then fuse:
Query: "How do chunking strategies affect reranking in RAG?"
→ Concepts: ["chunking strategies", "reranking methods", "RAG pipeline"]
→ Search each concept independently
→ Fuse results with Reciprocal Rank Fusion (RRF)
→ Full coverage across all topics
Implementation
1. Heuristic Detection (Fast Path)

```python
MULTI_CONCEPT_INDICATORS = [
    " vs ", " versus ", " compared to ",
    " or ", " and ", " with ",
    " affect ", " impact ",
    "difference between", "relationship between",
]


def is_multi_concept_heuristic(query: str) -> bool:
    """Fast check for multi-concept indicators (<1ms)."""
    query_lower = query.lower()
    return any(ind in query_lower for ind in MULTI_CONCEPT_INDICATORS)
```
2. LLM Decomposition

```python
from openai import AsyncOpenAI
from pydantic import BaseModel, Field


class ConceptExtraction(BaseModel):
    """LLM output schema for concept extraction."""

    concepts: list[str] = Field(
        ...,
        min_length=1,
        max_length=5,
        description="Distinct concepts from the query",
    )
    reasoning: str | None = None


async def decompose_query(
    query: str,
    llm: AsyncOpenAI,
) -> list[str]:
    """Extract independent concepts using an LLM."""
    response = await llm.chat.completions.create(
        model="gpt-5.2-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract 2-4 independent concepts from this query. "
                    "Each concept should be searchable on its own. "
                    'Output JSON: {"concepts": ["concept1", "concept2"], '
                    '"reasoning": "..."}'
                ),
            },
            {"role": "user", "content": query},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    result = ConceptExtraction.model_validate_json(
        response.choices[0].message.content
    )
    return result.concepts
```
3. Parallel Retrieval + RRF Fusion

```python
import asyncio
from collections import defaultdict


async def decomposed_search(
    query: str,
    search_fn: callable,
    llm: AsyncOpenAI,
    top_k: int = 10,
) -> list[dict]:
    """Search with query decomposition and RRF fusion."""
    # Check whether decomposition is needed
    if not is_multi_concept_heuristic(query):
        return await search_fn(query, limit=top_k)

    # Decompose into concepts
    concepts = await decompose_query(query, llm)
    if len(concepts) <= 1:
        return await search_fn(query, limit=top_k)

    # Parallel retrieval for each concept
    tasks = [search_fn(concept, limit=top_k) for concept in concepts]
    results_per_concept = await asyncio.gather(*tasks)

    # RRF fusion
    return reciprocal_rank_fusion(results_per_concept, k=60)


def reciprocal_rank_fusion(
    result_lists: list[list[dict]],
    k: int = 60,
) -> list[dict]:
    """Combine ranked lists using RRF."""
    scores: defaultdict[str, float] = defaultdict(float)
    docs: dict[str, dict] = {}

    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            doc_id = doc["id"]
            scores[doc_id] += 1.0 / (k + rank)
            docs[doc_id] = doc

    # Sort by RRF score, highest first
    ranked_ids = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return [docs[doc_id] for doc_id in ranked_ids]
```
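To make the fusion arithmetic concrete, here is the RRF function applied to two tiny result lists (the document ids are invented). A document that appears in both lists accumulates score from each, so it outranks documents found by only one concept search:

```python
from collections import defaultdict


# Standalone copy of reciprocal_rank_fusion for this worked example.
def reciprocal_rank_fusion(result_lists, k=60):
    scores = defaultdict(float)
    docs = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc["id"]] += 1.0 / (k + rank)
            docs[doc["id"]] = doc
    ranked_ids = sorted(scores, key=scores.get, reverse=True)
    return [docs[doc_id] for doc_id in ranked_ids]


chunking_hits = [{"id": "doc_a"}, {"id": "doc_b"}]   # concept 1 results
reranking_hits = [{"id": "doc_b"}, {"id": "doc_c"}]  # concept 2 results

fused = reciprocal_rank_fusion([chunking_hits, reranking_hits])
# doc_b: 1/62 + 1/61 ≈ 0.0325  (in both lists, wins)
# doc_a: 1/61 ≈ 0.0164, doc_c: 1/62 ≈ 0.0161
print([d["id"] for d in fused])  # ['doc_b', 'doc_a', 'doc_c']
```

With k = 60 the per-rank scores are small and flat, which keeps any single list from dominating the fusion.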
Complete Service
```python
class QueryDecomposer:
    def __init__(self, llm, search_fn):
        self.llm = llm
        self.search_fn = search_fn
        self._cache: dict[str, list[str]] = {}

    async def get_concepts(self, query: str) -> list[str]:
        """Return cached concepts, decomposing on a cache miss."""
        cache_key = query.lower().strip()
        if cache_key not in self._cache:
            self._cache[cache_key] = await decompose_query(query, self.llm)
        return self._cache[cache_key]

    async def search(
        self,
        query: str,
        top_k: int = 10,
    ) -> list[dict]:
        """Search with automatic decomposition."""
        # Fast path: single concept
        if not is_multi_concept_heuristic(query):
            return await self.search_fn(query, limit=top_k)

        # Decompose (cached per normalized query)
        concepts = await self.get_concepts(query)

        # Single concept after decomposition
        if len(concepts) <= 1:
            return await self.search_fn(query, limit=top_k)

        # Parallel retrieval
        tasks = [self.search_fn(c, limit=top_k) for c in concepts]
        results_per_concept = await asyncio.gather(*tasks)

        # Fuse with RRF
        return reciprocal_rank_fusion(results_per_concept)[:top_k]
```
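A minimal end-to-end sketch of the same flow, with a keyword-overlap stub standing in for vector search and a stubbed decomposer in place of the LLM call (the corpus, ids, and concepts below are all invented for illustration):

```python
import asyncio
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    scores, docs = defaultdict(float), {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc["id"]] += 1.0 / (k + rank)
            docs[doc["id"]] = doc
    return [docs[i] for i in sorted(scores, key=scores.get, reverse=True)]


# Toy in-memory corpus standing in for a vector index.
CORPUS = [
    {"id": "d1", "text": "chunking strategies for rag"},
    {"id": "d2", "text": "cross-encoder reranking methods"},
    {"id": "d3", "text": "rag pipeline design"},
]


async def search_fn(query: str, limit: int = 10) -> list[dict]:
    # Naive keyword overlap, standing in for vector search.
    terms = set(query.lower().split())
    return [d for d in CORPUS if terms & set(d["text"].split())][:limit]


async def stub_decompose(query: str) -> list[str]:
    # Stands in for the LLM call in decompose_query.
    return ["chunking strategies", "reranking methods", "rag pipeline"]


async def main() -> list[dict]:
    concepts = await stub_decompose(
        "How do chunking strategies affect reranking in RAG?"
    )
    results = await asyncio.gather(*[search_fn(c) for c in concepts])
    return reciprocal_rank_fusion(results)


fused = asyncio.run(main())
print([d["id"] for d in fused])  # ['d1', 'd2', 'd3']
```

`d1` wins because two concept searches retrieve it; swapping the stubs for a real embedding search and the LLM decomposer leaves the control flow unchanged.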
Combining with HyDE
```python
async def decomposed_hyde_search(
    query: str,
    decomposer: QueryDecomposer,
    hyde_service: HyDEService,
    vector_search: callable,
    top_k: int = 10,
) -> list[dict]:
    """Best of both: decomposition + HyDE for each concept."""
    # Decompose query
    concepts = await decomposer.get_concepts(query)

    # Generate a HyDE document for each concept in parallel
    hyde_results = await asyncio.gather(*[
        hyde_service.generate(concept) for concept in concepts
    ])

    # Search with the HyDE embeddings
    search_tasks = [
        vector_search(embedding=hr.embedding, limit=top_k)
        for hr in hyde_results
    ]
    results_per_concept = await asyncio.gather(*search_tasks)

    # Fuse results
    return reciprocal_rank_fusion(results_per_concept)[:top_k]
```
When to Decompose
| Query Type | Decompose? |
|---|---|
| "What is X?" | No |
| "X vs Y" | Yes |
| "How does X affect Y?" | Yes |
| "Best practices for X" | No |
| "X and Y in Z" | Yes |
| "Difference between X, Y, Z" | Yes |
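The fast-path heuristic alone already agrees with this table on simple instantiations of each pattern; the concrete queries below are illustrative stand-ins for X, Y, and Z:

```python
MULTI_CONCEPT_INDICATORS = [
    " vs ", " versus ", " compared to ",
    " or ", " and ", " with ",
    " affect ", " impact ",
    "difference between", "relationship between",
]


def is_multi_concept_heuristic(query: str) -> bool:
    query_lower = query.lower()
    return any(ind in query_lower for ind in MULTI_CONCEPT_INDICATORS)


# One concrete query per table row, with the table's verdict.
EXPECTED = {
    "What is vector search?": False,
    "HNSW vs IVF": True,
    "How does chunk size affect recall?": True,
    "Best practices for chunking": False,
    "chunking and reranking in RAG": True,
    "Difference between HNSW, IVF, ScaNN": True,
}

for query, should_decompose in EXPECTED.items():
    assert is_multi_concept_heuristic(query) == should_decompose
print("all table rows match")
```

Broad indicators such as " and " do produce false positives on conjunctions within a single concept, which is why the LLM pass still runs before committing to parallel retrieval.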
Performance Tips
- Use heuristics first (sub-millisecond)
- Cache decomposition results
- Limit to 2-4 concepts max
- Set a timeout with fallback to the original query
- Combine with HyDE for vocabulary bridging
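The timeout-with-fallback tip can be sketched with `asyncio.wait_for`; the 50 ms budget and stub coroutines below are arbitrary illustrations, not recommendations:

```python
import asyncio


async def decompose_with_timeout(query, decompose_fn, timeout_s=0.3):
    """Fall back to treating the query as a single concept on timeout or error."""
    try:
        return await asyncio.wait_for(decompose_fn(query), timeout=timeout_s)
    except Exception:  # includes asyncio.TimeoutError
        return [query]


async def slow_decompose(query):
    await asyncio.sleep(1.0)  # simulates a slow LLM round-trip
    return ["never", "returned"]


async def fast_decompose(query):
    return ["chunking strategies", "reranking methods"]


slow = asyncio.run(decompose_with_timeout("q", slow_decompose, timeout_s=0.05))
fast = asyncio.run(decompose_with_timeout("chunking vs reranking", fast_decompose))
print(slow)  # ['q']  (fallback: the original query as the only concept)
print(fast)  # ['chunking strategies', 'reranking methods']
```

Because the fallback returns a single-element concept list, the caller's existing `len(concepts) <= 1` branch degrades gracefully to a plain search.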
Related Skills
- `rag-retrieval`: Core RAG patterns enhanced by query decomposition
- `hyde-retrieval`: Combine with HyDE for vocabulary bridging per concept
- `reranking-patterns`: Rerank fused results for final precision
- `embeddings`: Embedding strategies for parallel concept retrieval
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Decomposition detection | Heuristic first, LLM second | Sub-millisecond fast path for simple queries |
| Max concepts | 2-4 | More concepts increase latency without proportional benefit |
| Fusion algorithm | Reciprocal Rank Fusion (RRF) | Robust rank combination; its single constant k needs little tuning |
| LLM for decomposition | gpt-5.2-mini | Fast, cheap, good at concept extraction |
References
- Multi-hop Question Answering
- Query2Doc