Cross-Verified Research
Systematic research engine with anti-hallucination safeguards and source quality tiering.
Rules (Absolute)
1. Never fabricate sources. No fake URLs, no invented papers, no hallucinated statistics.
2. Source-traceability gate. Every factual claim must be traceable to a specific, citable source. If a claim cannot be traced to any source, mark it as Unverified (internal knowledge only) and state what verification would be needed. Never present untraced claims as findings.
3. No speculation as fact. Do not present unverified claims using hedging language as if they were findings. Banned patterns: "아마도", "~인 것 같습니다", "~로 보입니다", "~수도 있습니다" (Korean for "perhaps", "it seems to be", "it appears to be", "it might be"), "probably", "I think", "seems like", "appears to be", "likely". If a claim is not verified, label it explicitly as Unverified or Contested; do not soften it with hedging.
4. BLUF (bottom line up front) output. Lead with the conclusion, follow with evidence. Never bury the answer.
5. Scaled effort. Match research depth to question scope:
   - Narrow factual (single claim, date, specification): 2-3 queries, 2+ sources
   - Technology comparison (A vs B): 5+ queries, 5+ sources
   - Broad landscape (market analysis, state-of-art): 8+ queries, 8+ sources

   Default to the higher tier when scope is ambiguous.
6. Cross-verify. Every key claim must appear in 2+ independent sources before it is presented as fact. "Independent" means the sources conducted their own analysis or reporting: two articles that both cite the same original source (press release, blog post, study) count as ONE source, not two. Trace claims back to their origin.
7. Scope before search. If the research question is ambiguous or overly broad, decompose it into specific sub-questions in Stage 1 and present them to the user for confirmation before proceeding to Stage 2. Do not research a vague question; sharpen it first.
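As a rough illustration of Rule 5, the effort tiers can be written down as data so the minimums are checked rather than remembered. This is a minimal sketch: the names `EffortTier`, `pick_tier`, and `meets_minimum` are hypothetical and not part of this skill, and only the numeric minimums come from the rule itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffortTier:
    name: str
    min_queries: int
    min_sources: int


# Minimums copied from Rule 5; the identifiers are illustrative only.
NARROW = EffortTier("narrow_factual", min_queries=2, min_sources=2)
COMPARISON = EffortTier("technology_comparison", min_queries=5, min_sources=5)
LANDSCAPE = EffortTier("broad_landscape", min_queries=8, min_sources=8)


def pick_tier(candidates):
    """Return the most demanding tier among the plausible candidates.

    Passing more than one candidate models an ambiguous scope, where
    Rule 5 says to default to the higher tier.
    """
    if not candidates:
        raise ValueError("at least one candidate tier is required")
    return max(candidates, key=lambda tier: tier.min_queries)


def meets_minimum(tier, queries_run, independent_sources):
    """Check whether the work done so far satisfies the tier's minimums."""
    return (queries_run >= tier.min_queries
            and independent_sources >= tier.min_sources)
```

For a question that could be read as either a narrow fact check or a comparison, `pick_tier([NARROW, COMPARISON])` returns the comparison tier.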
Pipeline
Execute these 4 stages sequentially. Do NOT skip stages.
Stage 1: Deconstruct
Break the research question into atomic sub-questions.
Input: "Should we use Bun or Node.js for our backend?"
Decomposed:
  - Runtime performance benchmarks (CPU, memory, startup)
  - Ecosystem maturity (npm compatibility, native modules)
  - Production stability (known issues, enterprise adoption)
  - Developer experience (tooling, debugging, testing)
  - Long-term viability (funding, community, roadmap)
- Identify what requires external verification vs. internal knowledge
- If the original question is vague or overly broad, present the decomposed sub-questions to the user for confirmation before proceeding (Rule 7)
- For each sub-question, note what a traceable source would look like
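One possible way to record the Stage 1 output is a small record per sub-question, plus a helper that formats the list for user confirmation under Rule 7. The sketch below is an assumption about how this could be organized: `SubQuestion`, its fields, and `confirmation_prompt` are hypothetical names, and the example `expected_source` text is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SubQuestion:
    text: str
    needs_external_verification: bool
    expected_source: str  # what a traceable source would look like


def confirmation_prompt(question, sub_questions):
    """Format decomposed sub-questions for the user to confirm (Rule 7)."""
    lines = [f"Research question: {question}", "Proposed sub-questions:"]
    for number, sq in enumerate(sub_questions, start=1):
        mode = ("external verification"
                if sq.needs_external_verification
                else "internal knowledge")
        lines.append(
            f"{number}. {sq.text} [{mode}; expected source: {sq.expected_source}]"
        )
    lines.append("Confirm or adjust before Stage 2.")
    return "\n".join(lines)


# Illustrative input only; the expected_source wording is an assumption.
example = [
    SubQuestion("Runtime performance benchmarks", True,
                "independent benchmark with disclosed methodology"),
    SubQuestion("Ecosystem maturity", True,
                "official compatibility documentation"),
]
print(confirmation_prompt("Should we use Bun or Node.js for our backend?", example))
```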
Stage 2: Search & Collect
For each sub-question requiring verification:
- Formulate diverse queries: vary keywords, include year filters, try both English and Korean
- Use WebSearch for broad discovery, WebFetch for specific page analysis
- Classify every source by tier immediately (see Source Tiers below)
- Extract specific data points: numbers, dates, versions, quotes with attribution
- Record contradictions: when sources disagree, note both positions
- Trace origin: when multiple sources cite the same underlying source, identify the original
Search pattern (scale per Rule 5):
Query 1: [topic] + "benchmark" or "comparison"
Query 2: [topic] + "production" or "enterprise"
Query 3: [topic] + [current year] + "review"
Query 4: [topic] + "issues" or "problems" or "limitations"
Query 5: [topic] + site:github.com (issues, discussions)
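The search pattern above can be expanded mechanically from a topic string. The following is a minimal sketch under stated assumptions: `build_queries` is a hypothetical helper, and joining alternatives with `OR` is a common search-engine convention rather than something this skill specifies.

```python
from datetime import date
from typing import List, Optional


def build_queries(topic: str, year: Optional[int] = None) -> List[str]:
    """Expand a topic into the five query shapes from the search pattern."""
    if year is None:
        year = date.today().year  # "[current year]" in Query 3
    return [
        f"{topic} benchmark OR comparison",
        f"{topic} production OR enterprise",
        f"{topic} {year} review",
        f"{topic} issues OR problems OR limitations",
        f"{topic} site:github.com",
    ]


for query in build_queries("Bun vs Node.js", year=2024):
    print(query)
```

For a narrow factual question, the first two or three queries may be enough; a broad landscape question would need additional query shapes beyond these five to reach the 8+ minimum in Rule 5.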
Fallback when WebSearch is unavailable or returns no results:
- Use WebFetch to directly access known authoritative URLs (official docs, GitHub repos, Wikipedia)
- Rely on internal knowledge but label all claims as Unverified (no external search available)
- Ask the user to provide source URLs or documents for verification
- Reduce the minimum source requirement but maintain cross-verification where possible
Stage 3: Cross-Verify
For each key finding:
- Does it appear in 2+ independent Tier S/A sources? → Verified
- Does it appear in only 1 source? → Unverified (label it)
- Do sources contradict? → Contested (present both sides with tier labels)
Remember: "independent" means each source did its own analysis. Two articles both citing the same benchmark study = 1 source.
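The decision rules above, including the independence check, can be sketched as a small function. This is an illustration, not a definitive implementation: `Evidence` and `verification_status` are hypothetical names, and the `origin` field is an assumed way of recording which underlying source each item relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str     # where the claim was found
    tier: str       # "S", "A", "B", or "C"
    origin: str     # the original source this one relies on (itself if primary)
    supports: bool  # False if this source contradicts the claim


def verification_status(evidence):
    """Map collected evidence for one claim to Verified/Unverified/Contested."""
    supporting = [e for e in evidence if e.supports]
    opposing = [e for e in evidence if not e.supports]
    if supporting and opposing:
        return "Contested"
    # Sources sharing an origin count once, so compare distinct origins.
    strong_origins = {e.origin for e in supporting if e.tier in ("S", "A")}
    return "Verified" if len(strong_origins) >= 2 else "Unverified"


# Two articles citing the same press release collapse to one origin.
same_origin = [
    Evidence("news-a", "A", "press-release", True),
    Evidence("news-b", "A", "press-release", True),
]
print(verification_status(same_origin))
```

Counting distinct origins rather than distinct sources is what keeps two write-ups of one benchmark study from being treated as cross-verification.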
Build a verification matrix:
| Claim | Source 1 (Tier) | Source 2 (Tier) | Status |
|---|---|---|---|
| Bun 3x faster startup | benchmarks.dev (A) | bun.sh/blog (B) | Unverified (only 1 Tier S/A source; Bun's own blog is a biased vendor source) |
Stage 4: Synthesize
Produce the final report in BLUF format.
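Before a report is finalized, the banned hedging patterns from Rule 3 lend themselves to a mechanical check. The sketch below is one possible approach: `find_hedges` is a hypothetical helper, the Korean patterns are matched as plain substrings with the leading "~" dropped, and the English ones are matched on word boundaries so that "unlikely" does not trigger "likely".

```python
import re

# Phrases taken from Rule 3; the "~" placeholder is removed from the
# Korean endings so they can be matched as substrings.
BANNED_ENGLISH = ["probably", "I think", "seems like", "appears to be", "likely"]
BANNED_KOREAN = ["아마도", "인 것 같습니다", "로 보입니다", "수도 있습니다"]


def find_hedges(text):
    """Return the banned hedging phrases that occur in the text."""
    found = []
    for phrase in BANNED_ENGLISH:
        pattern = rf"\b{re.escape(phrase)}\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    for phrase in BANNED_KOREAN:
        if phrase in text:
            found.append(phrase)
    return found


print(find_hedges("Rust is probably faster and likely cheaper."))
```

A check like this cannot tell a hedged finding from a quoted source, so a hit is a prompt to relabel the claim as Unverified or Contested, not an automatic failure.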
Output Format
Research: [Topic]
Conclusion (BLUF)
[1-3 sentence definitive answer or recommendation]
Key Findings
[Numbered findings, each with inline source tier labels]
1. [Finding] — [evidence summary] Sources: 🏛️ [source1], 🛡️ [source2]
2. [Finding] — [evidence summary] Sources: 🛡️ [source1], 🛡️ [source2]
Contested / Uncertain
[Any claims that couldn't be cross-verified or where sources conflict]
- ⚠️ [claim] — Source A says X, Source B says Y
Verification Matrix
| Claim | Sources | Tier | Status |
|---|---|---|---|
| ... | ... | ... | Verified/Unverified/Contested |
Sources
[All sources, grouped by tier]
🏛️ Tier S — Academic & Primary Research
- Title — Journal/Org (Year)
🛡️ Tier A — Trusted Official
- Title — Source (Year)
⚠️ Tier B — Community / Caution
- Title — Platform (Year)
Tier C — General
Quality Calibration
BAD Example — What to Avoid
Research: Is Rust faster than Go for web servers?
Conclusion (BLUF)
Rust is generally faster than Go for web servers due to zero-cost abstractions.
Key Findings
- Rust is 2-5x faster than Go — Rust's ownership model eliminates GC pauses. Sources: 🛡️ https://rust-performance-comparison.example.com
- Rust uses less memory — Typically 50% less memory in production. Sources: 🛡️ https://memory-benchmarks.example.com
- Go is easier to learn — Most developers pick up Go in a week. Sources: 🏛️ https://developer-survey.example.com
Verification Matrix
| Claim | Sources | Tier | Status |
|---|---|---|---|
| 2-5x faster | 1 benchmark site | A | Verified |
| 50% less memory | 1 benchmark site | A | Verified |
Why this is bad:
- Source URLs are fabricated (nonexistent domains)
- "2-5x faster" and "50% less memory" are presented as Verified with only 1 source each
- No contested claims section despite this being a nuanced topic
- Claims are restated internal knowledge dressed up with fake citations
- No origin tracing: where did "2-5x" come from?
- The "Verified" labels are false; nothing was actually cross-verified
GOOD Example — What to Aim For
Research: Is Rust faster than Go for web servers?
Conclusion (BLUF)
Rust outperforms Go in raw throughput benchmarks (typically 1.5-3x in TechEmpower), but the gap narrows significantly with real-world I/O workloads. Go's GC pauses (sub-millisecond since Go 1.19) are rarely a bottleneck for typical web services. Choose based on your latency tail requirements, not averages.
Key Findings
- Rust frameworks lead TechEmpower benchmarks — Actix-web and Axum consistently rank in the top 10; Go's stdlib and Gin rank 20-40 range in plaintext/JSON tests. Sources: 🏛️ TechEmpower Round 22 (2024), 🛡️ Axum GitHub benchmarks
- Go's GC latency is sub-millisecond since 1.19 — p99 GC pause < 500μs confirmed by the Go team. Sources: 🛡️ Go Blog "Getting to Go" (2022), 🛡️ Go 1.19 Release Notes
- Real-world gap is smaller than microbenchmarks suggest — Discord's 2020 migration (Go→Rust) showed tail latency improvements, but their workload (millions of concurrent connections) is atypical. Sources: 🛡️ Discord Engineering Blog (2020), ⚠️ HN discussion with Discord engineer comments
Contested / Uncertain
- ⚠️ "Rust uses 50% less memory than Go" — Frequently repeated on Reddit/HN but no independent benchmark reproduces a consistent figure. Memory usage depends heavily on allocator choice (jemalloc vs system) and workload. Unverified.
- ⚠️ Developer productivity trade-off — Go advocates claim 2-3x faster development time. No peer-reviewed study supports a specific multiplier. Unverified (internal knowledge only) — would need controlled study to verify.
Verification Matrix
| Claim | Sources | Tier | Status |
|---|---|---|---|
| Rust 1.5-3x faster (synthetic) | TechEmpower R22 (S), Axum bench (A) | S+A | Verified |
| Go GC < 500μs p99 | Go Blog (A), Release Notes (A) | A+A | Verified |
| Discord latency improvement | Discord Blog (A), HN thread (B) | A+B | Verified (single case study) |
| Rust 50% less memory | Reddit threads (B) only | B | Unverified |
| Go 2-3x dev speed | No source found | — | Unverified (internal knowledge only) |
Sources
🏛️ Tier S — Academic & Primary Research
- TechEmpower Framework Benchmarks Round 22 — TechEmpower (2024)
🛡️ Tier A — Trusted Official
- Getting to Go: The Journey of Go's Garbage Collector — Go Blog (2022)
- Go 1.19 Release Notes — Go Team (2022)
- Why Discord is Switching from Go to Rust — Discord Engineering (2020)
- Axum Benchmarks — Tokio Project
⚠️ Tier B — Community / Caution
- HN Discussion on Discord migration — Hacker News (2020)
Why this is good:
- Every cited source is a real, verifiable page
- Claims that lack sources are explicitly labeled Unverified
- The "50% less memory" myth is called out rather than repeated
- The verification matrix honestly shows what is verified and what is not
- Sources are independent (TechEmpower ran its own benchmarks rather than citing another source)
- Nuance is preserved: "the gap narrows with real-world I/O"
Source Tiers
Classify every source on discovery.
| Tier | Label | Trust Level | Examples |
|---|---|---|---|
| S | 🏛️ | Academic, peer-reviewed, primary research, official specs | Google Scholar, arXiv, PubMed, W3C/IETF RFCs, language specs (ECMAScript, PEPs) |
| A | 🛡️ | Government, .edu, major press, official docs | .gov/.edu, Reuters/AP/BBC, official framework docs, company engineering blogs (Google AI, Netflix Tech) |
| B | ⚠️ | Social media, forums, personal blogs, wikis (flag to user) | Twitter/X, Reddit, StackOverflow, Medium, dev.to, Wikipedia, 나무위키 (Namuwiki) |
| C | (none) | General websites not fitting above categories | Corporate marketing, press releases, SEO content, news aggregators |
Tier Classification Rules
- Company's own content about their product:
  - Official docs → Tier A
  - Feature announcements → Tier A (existence), Tier B (performance claims)
  - Marketing pages → Tier C
- GitHub:
  - Official repos (e.g., facebook/react) → Tier A
  - Issues/Discussions with reproduction → Tier A (for bug existence)
  - Random user repos → Tier B
- Benchmarks:
  - Independent, reproducible, methodology disclosed → Tier S
  - Official by neutral party → Tier A
  - Vendor's own benchmarks → Tier B (note bias)
- StackOverflow: Accepted answers with high votes = borderline Tier A; non-accepted = Tier B
- Tier B sources must never be cited alone; corroborate with Tier S or A
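The classification rules above can be expressed as a lookup table with one special case for feature announcements. This is a sketch under assumptions: the category and kind labels, `classify`, and `citation_ok` are hypothetical names, and falling back to Tier C for unlisted combinations is an interpretation of "not fitting above categories".

```python
# (category, kind) -> tier, transcribed from the classification rules.
TIER_RULES = {
    ("company", "official_docs"): "A",
    ("company", "marketing_page"): "C",
    ("github", "official_repo"): "A",
    ("github", "issue_with_reproduction"): "A",
    ("github", "user_repo"): "B",
    ("benchmark", "independent_reproducible"): "S",
    ("benchmark", "neutral_official"): "A",
    ("benchmark", "vendor"): "B",
}


def classify(category, kind, claim="existence"):
    """Return the tier for a source; unlisted combinations fall back to C."""
    if (category, kind) == ("company", "feature_announcement"):
        # Tier A for the existence of a feature, Tier B for performance claims.
        return "A" if claim == "existence" else "B"
    return TIER_RULES.get((category, kind), "C")


def citation_ok(tiers):
    """Tier B may be cited only alongside at least one Tier S or A source."""
    if "B" in tiers:
        return any(tier in ("S", "A") for tier in tiers)
    return True


print(classify("benchmark", "vendor"), citation_ok(["B"]))
```

The StackOverflow rule is left out of the table because "borderline Tier A" calls for judgment rather than a fixed mapping.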
When to Use
- Technology evaluation or comparison
- Fact-checking specific claims
- Architecture decision research
- Market/competitor analysis
- "Is X true?" verification tasks
- Any question where accuracy matters more than speed
When NOT to Use
- Creative writing or brainstorming (use creativity-sampler)
- Code implementation (use search-first for library discovery)
- Simple questions answerable from internal knowledge with high confidence
- Opinion-based questions with no verifiable answer
Integration Notes
- With brainstorming: Can be invoked during brainstorming's "Explore context" phase for fact-based inputs
- With search-first: search-first finds tools/libraries to USE; this skill VERIFIES factual claims. Different purposes.
- With adversarial-review: Research findings can feed into adversarial review for stress-testing conclusions