Intelligence Collection Expert Knowledge
OSINT Methodology
Collection Cycle
- Planning: Define target, scope, and collection requirements
- Collection: Gather raw data from open sources
- Processing: Extract entities, relationships, and data points
- Analysis: Synthesize findings, identify patterns, detect changes
- Dissemination: Generate reports, alerts, and updates
- Feedback: Refine queries based on what worked and what didn't
Source Categories (by reliability)
| Tier | Source Type | Reliability | Examples |
|---|---|---|---|
| 1 | Official/Primary | Very High | Company filings, government data, press releases |
| 2 | Institutional | High | News agencies (Reuters, AP), research institutions |
| 3 | Professional | Medium-High | Industry publications, analyst reports, expert blogs |
| 4 | Community | Medium | Forums, social media, review sites |
| 5 | Anonymous/Unverified | Low | Anonymous posts, rumors, unattributed claims |
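The tier table above can be encoded as a lookup for weighting sources during analysis. A minimal sketch; the numeric weights are illustrative assumptions, not part of the methodology:

```python
# Reliability tiers from the table above, as a lookup table.
# The float weights are illustrative assumptions for scoring.
SOURCE_TIERS = {
    1: ("Official/Primary", "very_high", 1.0),
    2: ("Institutional", "high", 0.8),
    3: ("Professional", "medium_high", 0.6),
    4: ("Community", "medium", 0.4),
    5: ("Anonymous/Unverified", "low", 0.2),
}

def source_weight(tier: int) -> float:
    """Return the analysis weight for a tier; unknown tiers get the lowest weight."""
    return SOURCE_TIERS.get(tier, SOURCE_TIERS[5])[2]
```

Defaulting unknown tiers to the lowest weight keeps unverified material from silently inflating confidence.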
Search Query Construction by Focus Area
Market Intelligence:
"[target] market share"
"[target] industry report [year]"
"[target] TAM SAM SOM"
"[target] growth rate"
"[target] market analysis"
"[target industry] trends [year]"
Business Intelligence:
"[company] revenue" OR "[company] earnings"
"[company] CEO" OR "[company] leadership team"
"[company] strategy" OR "[company] roadmap"
"[company] partnerships" OR "[company] acquisition"
"[company] annual report" OR "[company] 10-K"
site:sec.gov "[company]"
Competitor Analysis:
"[company] vs [competitor]"
"[company] alternative"
"[company] review" OR "[company] comparison"
"[company] pricing" site:g2.com OR site:capterra.com
"[company] customer reviews" site:trustpilot.com
"switch from [company] to"
Person Tracking:
"[person name]" "[company]"
"[person name]" interview OR podcast OR keynote
"[person name]" site:linkedin.com
"[person name]" publication OR paper
"[person name]" conference OR summit
Technology Monitoring:
"[technology] release" OR "[technology] update"
"[technology] benchmark [year]"
"[technology] adoption" OR "[technology] usage statistics"
"[technology] vs [alternative]"
"[technology]" site:github.com
"[technology] roadmap" OR "[technology] changelog"
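The bracketed placeholders in the templates above can be filled programmatically before queries are executed. A minimal sketch, assuming simple string substitution is sufficient:

```python
# Sketch: expand [target]/[year] placeholders in query templates.
# The template strings mirror the patterns above.
def build_queries(target: str, year: int, templates: list[str]) -> list[str]:
    """Fill [target] and [year] placeholders in each query template."""
    return [
        t.replace("[target]", target).replace("[year]", str(year))
        for t in templates
    ]

market_templates = [
    '"[target] market share"',
    '"[target] industry report [year]"',
    '"[target] growth rate"',
]

queries = build_queries("Acme", 2024, market_templates)
# e.g. '"Acme market share"', '"Acme industry report 2024"', ...
```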
Entity Extraction Patterns
Named Entity Types
- Person: Name, title, organization, role
- Organization: Company name, type, industry, location, size
- Product: Product name, company, category, version
- Event: Type, date, participants, location, significance
- Financial: Amount, currency, type (funding, revenue, valuation)
- Technology: Name, version, category, vendor
- Location: City, state, country, region
- Date/Time: Specific dates, time ranges, deadlines
Extraction Heuristics
- Person detection: Title + Name pattern ("CEO John Smith"), bylines, quoted speakers
- Organization detection: Legal suffixes (Inc, LLC), "at [Company]", domain names
- Financial detection: Currency symbols, "raised $X", "valued at", "revenue of"
- Event detection: Date + verb ("launched on", "announced at", "acquired")
- Technology detection: CamelCase names, version numbers, "built with", "powered by"
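The heuristics above translate naturally into regular expressions. A minimal sketch; the patterns are illustrative starting points, not exhaustive matchers:

```python
import re

# Illustrative regexes for the extraction heuristics above.
PERSON_RE = re.compile(r"\b(?:CEO|CTO|CFO|founder)\s+([A-Z][a-z]+\s[A-Z][a-z]+)")
ORG_RE = re.compile(r"\b([A-Z][A-Za-z]+)\s(?:Inc|LLC|Corp|Ltd)\b")
MONEY_RE = re.compile(r"(?:raised|valued at|revenue of)\s+(\$[\d.]+[MBK]?)")

text = "CEO John Smith said Acme Inc raised $12M last quarter."
print(PERSON_RE.findall(text))  # ['John Smith']
print(ORG_RE.findall(text))     # ['Acme']
print(MONEY_RE.findall(text))   # ['$12M']
```

In practice these serve as a cheap first pass; ambiguous hits should be resolved against the knowledge graph or a proper NER model.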
Knowledge Graph Best Practices
Entity Schema
{
  "entity_id": "unique_id",
  "name": "Entity Name",
  "type": "person|company|product|event|technology",
  "attributes": { "key": "value" },
  "sources": ["url1", "url2"],
  "first_seen": "timestamp",
  "last_seen": "timestamp",
  "confidence": "high|medium|low"
}
Relation Schema
{
  "source_entity": "entity_id_1",
  "relation": "works_at|founded|competes_with|...",
  "target_entity": "entity_id_2",
  "attributes": { "since": "date", "context": "description" },
  "source": "url",
  "confidence": "high|medium|low"
}
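Records matching the entity schema need merge logic when the same entity is seen again across cycles. A minimal sketch, assuming an in-memory graph keyed by `entity_id`; the merge rules (union sources, keep earliest `first_seen`) are assumptions:

```python
from datetime import datetime, timezone

# Sketch: upsert an entity record (per the schema above) into an
# in-memory graph dict keyed by entity_id.
def upsert_entity(graph: dict, entity: dict) -> dict:
    now = datetime.now(timezone.utc).isoformat()
    existing = graph.get(entity["entity_id"])
    if existing is None:
        entity.setdefault("first_seen", now)  # keep earliest sighting
        entity["last_seen"] = now
        graph[entity["entity_id"]] = entity
        return entity
    # Merge attributes and union sources; refresh last_seen.
    existing["attributes"].update(entity.get("attributes", {}))
    existing["sources"] = sorted(set(existing["sources"]) | set(entity.get("sources", [])))
    existing["last_seen"] = now
    return existing
```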
Common Relations
| Relation | Between | Example |
|---|---|---|
| works_at | Person → Company | "Jane Smith works at Acme" |
| founded | Person → Company | "John Doe founded StartupX" |
| invested_in | Company → Company | "VC Fund invested in StartupX" |
| competes_with | Company → Company | "Acme competes with BetaCo" |
| partnered_with | Company → Company | "Acme partnered with CloudY" |
| launched | Company → Product | "Acme launched ProductZ" |
| acquired | Company → Company | "BigCorp acquired StartupX" |
| uses | Company → Technology | "Acme uses Kubernetes" |
| mentioned_in | Entity → Source | "Acme mentioned in TechCrunch" |
Change Detection Methodology
Snapshot Comparison
- Store the current state of all entities as a JSON snapshot
- On the next collection cycle, compare the new state against the previous snapshot
- Classify changes:
| Change Type | Significance | Example |
|---|---|---|
| Entity appeared | Varies | New competitor enters market |
| Entity disappeared | Important | Company goes quiet, product deprecated |
| Attribute changed | Critical-Minor | CEO changed (critical), address changed (minor) |
| New relation | Important | New partnership, acquisition, hiring |
| Relation removed | Important | Person left company, partnership ended |
| Sentiment shift | Important | Positive → Negative media coverage |
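The snapshot comparison can be sketched as a set-based diff over entity IDs and their attributes. A minimal illustration of the first three change types in the table; relation diffs would follow the same pattern:

```python
# Sketch: compare two entity snapshots (dicts keyed by entity_id,
# each value holding an "attributes" dict) and classify changes.
def diff_snapshots(previous: dict, current: dict) -> list[dict]:
    changes = []
    for eid in current.keys() - previous.keys():
        changes.append({"type": "entity_appeared", "entity": eid})
    for eid in previous.keys() - current.keys():
        changes.append({"type": "entity_disappeared", "entity": eid})
    for eid in previous.keys() & current.keys():
        old, new = previous[eid]["attributes"], current[eid]["attributes"]
        for key in old.keys() | new.keys():
            if old.get(key) != new.get(key):
                changes.append({
                    "type": "attribute_changed", "entity": eid,
                    "attribute": key,
                    "previous": old.get(key), "current": new.get(key),
                })
    return changes
```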
Significance Scoring
CRITICAL (immediate alert):
- Leadership change (CEO, CTO, board)
- Acquisition or merger
- Major funding round (>$10M)
- Product discontinuation
- Legal action or regulatory issue
IMPORTANT (include in next report):
- New product launch
- New partnership or integration
- Hiring surge (>5 roles)
- Pricing change
- Competitor move
- Major customer win/loss
MINOR (note in report):
- Blog post or press mention
- Minor update or patch
- Social media activity spike
- Conference appearance
- Job posting (individual)
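The three tiers above can be applied automatically with keyword triggers on change descriptions. A minimal sketch; the trigger lists are illustrative assumptions drawn from the examples above, and a real classifier would also use the structured change type:

```python
# Illustrative keyword triggers for the significance tiers above.
CRITICAL = ("ceo", "cto", "board", "acquisition", "merger",
            "lawsuit", "regulatory", "discontinu")
IMPORTANT = ("launch", "partnership", "integration", "pricing",
             "hiring", "customer")

def significance(description: str) -> str:
    """Map a change description to CRITICAL / IMPORTANT / MINOR."""
    d = description.lower()
    if any(k in d for k in CRITICAL):
        return "CRITICAL"
    if any(k in d for k in IMPORTANT):
        return "IMPORTANT"
    return "MINOR"
```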
Sentiment Analysis Heuristics
When `track_sentiment` is enabled, classify each source's tone:
Classification Rules
- Positive indicators: "growth", "innovation", "breakthrough", "success", "award", "expansion", "praise", "recommend"
- Negative indicators: "lawsuit", "layoffs", "decline", "controversy", "failure", "breach", "criticism", "warning"
- Neutral indicators: factual reporting without strong adjectives, data-only articles, announcements
Sentiment Scoring
- Strong positive: +2 (e.g., "Company wins major award")
- Mild positive: +1 (e.g., "Steady growth continues")
- Neutral: 0 (e.g., "Company releases Q3 report")
- Mild negative: -1 (e.g., "Faces increased competition")
- Strong negative: -2 (e.g., "Major data breach disclosed")
Track rolling average over last 5 collection cycles to detect trends.
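The keyword indicators and rolling average can be sketched together. A minimal illustration, assuming bag-of-words matching against the indicator lists above and clamping to the -2..+2 scale; a production system would use a real sentiment model:

```python
from collections import deque

# Subset of the indicator lists above, used for bag-of-words scoring.
POSITIVE = {"growth", "innovation", "breakthrough", "success", "award"}
NEGATIVE = {"lawsuit", "layoffs", "decline", "controversy", "breach"}

def score_text(text: str) -> int:
    """Keyword-hit score, clamped to the -2..+2 sentiment scale."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return max(-2, min(2, score))

history: deque = deque(maxlen=5)  # last 5 collection cycles

def record_cycle(scores: list) -> float:
    """Record a cycle's mean score and return the rolling average."""
    history.append(sum(scores) / len(scores))
    return sum(history) / len(history)
```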
Report Templates
Intelligence Brief (Markdown)
Intelligence Report: [Target]
Date: YYYY-MM-DD HH:MM UTC
Collection Cycle: #N
Sources Processed: X
New Data Points: Y
Priority Changes
- [CRITICAL] [Description + source]
- [IMPORTANT] [Description + source]
Executive Summary
[2-3 paragraph synthesis of new intelligence]
Detailed Findings
[Category 1]
- Finding with source
- Data point with confidence: high/medium/low
[Category 2]
- ...
Entity Updates
| Entity | Change | Previous | Current | Source |
|---|---|---|---|---|
Sentiment Trend
| Period | Score | Direction | Notable |
|---|---|---|---|
Collection Metadata
- Queries executed: N
- Sources fetched: N
- New entities: N
- Updated entities: N
- Next scheduled collection: [datetime]
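The Collection Metadata section of the brief can be filled from cycle statistics. A minimal sketch; the `stats` keys are hypothetical names, not part of the template:

```python
# Sketch: render the Collection Metadata section of the brief.
# The stats dict keys (queries, sources, ...) are assumed names.
def render_metadata(stats: dict) -> str:
    return "\n".join([
        "Collection Metadata",
        f"- Queries executed: {stats['queries']}",
        f"- Sources fetched: {stats['sources']}",
        f"- New entities: {stats['new_entities']}",
        f"- Updated entities: {stats['updated_entities']}",
        f"- Next scheduled collection: {stats['next_run']}",
    ])
```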
Source Evaluation Checklist
Before including data in the knowledge graph, evaluate:
- Recency: Published within the relevant timeframe? Stale data can mislead.
- Primary vs Secondary: Is this the original source, or is it citing someone else?
- Corroboration: Do other independent sources confirm this?
- Bias check: Does the source have a financial or political interest in this claim?
- Specificity: Does it provide concrete data, or only vague assertions?
- Track record: Has this source been reliable in the past?
If a claim fails 3+ checks, downgrade its confidence to "low".
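The checklist and the downgrade rule can be sketched as a small scoring function. Only the 3+ failure rule comes from the methodology above; treating one or two failures as "medium" is an assumption:

```python
# Sketch: apply the source evaluation checklist. Each value is True
# when the check passed. The 3+ failure -> "low" rule is from the
# methodology; the "medium" tier for 1-2 failures is an assumption.
def evaluate_source(checks: dict) -> str:
    failures = sum(1 for passed in checks.values() if not passed)
    if failures >= 3:
        return "low"
    return "medium" if failures > 0 else "high"

confidence = evaluate_source({
    "recency": True, "primary": False, "corroboration": False,
    "bias": False, "specificity": True, "track_record": True,
})
# 3 failed checks -> "low"
```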