Lead Enrichment — Multi-Source Data Completion
Enrich CRM contact records by filling missing fields from multiple sources. Works with DuckDB workspace entries or standalone JSON data.
Sources (Priority Order)
- LinkedIn (via linkedin-scraper skill) — name, title, company, education, connections
- Web Search (via web_search tool) — email patterns, company info, social profiles
- Company Website (via web_fetch) — team pages, about pages, contact info
- Email Pattern Discovery — derive email from name + company domain
Enrichment Pipeline
Step 1: Assess What's Missing
-- Query the target object to find gaps
SELECT "Name", "Email", "LinkedIn URL", "Company", "Title", "Location"
FROM v_leads
WHERE "Email" IS NULL OR "LinkedIn URL" IS NULL OR "Title" IS NULL;
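For standalone JSON data, the same gap check can be sketched in Python. This is a minimal illustration, not part of the skill itself: `REQUIRED_FIELDS` and `find_gaps` are hypothetical names, and unlike the SQL query (which tests only for NULL) this version also treats blank strings as missing.

```python
from typing import Any

# Fields checked in Step 1; names mirror the columns in the query above.
REQUIRED_FIELDS = ["Email", "LinkedIn URL", "Title"]


def find_gaps(record: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent, None, or blank."""
    gaps = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            gaps.append(field)
    return gaps


# Illustrative sample data (assumed, not from a real workspace).
leads = [
    {"Name": "Ada Park", "Email": None,
     "LinkedIn URL": "https://www.linkedin.com/in/ada-park", "Title": "CTO"},
    {"Name": "Sam Ortiz", "Email": "sam@example.com",
     "LinkedIn URL": "", "Title": None},
    {"Name": "Lee Chen", "Email": "lee@example.com",
     "LinkedIn URL": "https://www.linkedin.com/in/lee-chen", "Title": "VP Sales"},
]

# Keep only the leads that have at least one gap.
incomplete = {lead["Name"]: find_gaps(lead) for lead in leads if find_gaps(lead)}
print(incomplete)
```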
Step 2: Prioritize by Value
- High priority: Missing email (needed for outreach)
- Medium priority: Missing title/company (needed for personalization)
- Low priority: Missing education, connections count, about text
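One way to turn these tiers into a work queue is sketched below. The mapping and function names are hypothetical, and the field names "Connections" and "About" are assumed labels for the connections count and about text.

```python
# Priority tier per missing field, following the three tiers above.
PRIORITY_BY_FIELD = {
    "Email": "high",
    "Title": "medium",
    "Company": "medium",
    "Education": "low",
    "Connections": "low",  # assumed field name
    "About": "low",        # assumed field name
}
RANK = {"high": 0, "medium": 1, "low": 2}


def record_priority(missing_fields):
    """Return the most urgent tier among the missing fields, or None."""
    levels = [PRIORITY_BY_FIELD[f] for f in missing_fields if f in PRIORITY_BY_FIELD]
    return min(levels, key=RANK.__getitem__) if levels else None


def sort_by_priority(queue):
    """Order (name, missing_fields) pairs so high-priority records come first."""
    return sorted(queue, key=lambda item: RANK.get(record_priority(item[1]), len(RANK)))


queue = [
    ("Lee Chen", ["Education"]),
    ("Ada Park", ["Title", "Email"]),
    ("Sam Ortiz", ["Company"]),
]
print([name for name, _ in sort_by_priority(queue)])
```

A record missing several fields takes the tier of its most urgent gap, so a lead without an email is always handled before one that only lacks education details.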
Step 3: Enrich Per Record
For each record with gaps:
If LinkedIn URL is known but other fields missing:
- Use linkedin-scraper to visit profile
- Extract: title, company, location, education, about
- Update DuckDB record
If LinkedIn URL is missing:
- Search LinkedIn: `{name} {company}` or `{name} {title}`
- Verify match (name + company alignment)
- Store LinkedIn URL, then scrape full profile
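The "name + company alignment" check could look roughly like the sketch below. It assumes the scraped profile is available as a dict with `name` and `company` keys (a hypothetical shape; the actual linkedin-scraper output may differ), and it uses a deliberately simple substring comparison that can over-match very short company names.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())


def is_probable_match(lead: dict, profile: dict) -> bool:
    """Require every lead name token in the profile name, plus company overlap."""
    lead_name = set(normalize(lead.get("Name") or "").split())
    profile_name = set(normalize(profile.get("name") or "").split())
    if not lead_name or not lead_name <= profile_name:
        return False

    lead_company = normalize(lead.get("Company") or "")
    profile_company = normalize(profile.get("company") or "")
    if not lead_company or not profile_company:
        # Without a company on both sides the match cannot be confirmed.
        return False
    return lead_company in profile_company or profile_company in lead_company


lead = {"Name": "Ada Park", "Company": "Acme"}
print(is_probable_match(lead, {"name": "Ada M. Park", "company": "Acme, Inc."}))
print(is_probable_match(lead, {"name": "Ada Park", "company": "Globex"}))
```

Candidates that fail this check are better flagged for review than stored, since common names often produce several plausible profiles.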
If Email is missing:
- Find company domain (web search or LinkedIn company page)
- Try common patterns: `first@domain.com`, `first.last@domain.com`, `flast@domain.com`, `firstl@domain.com`
- Optionally verify with web search: `"email" "{name}" site:{domain}`
- Check company team/about page for email format clues
If Company info is missing:
- Web search: `"{name}" "{title}"` or check LinkedIn
- Fetch company website for: industry, size, description, funding
Step 4: Update Records
-- Update via DuckDB pivot view
UPDATE v_leads SET
"Email" = ?,
"LinkedIn URL" = ?,
"Title" = ?,
"Company" = ?,
"Location" = ?
WHERE id = ?;
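Binding a NULL to any placeholder in the statement above would blank out an existing value. A hedged sketch of one way around that is to build the statement from only the columns that were actually enriched. `build_update` and `ALLOWED_COLUMNS` are hypothetical names; the column whitelist keeps arbitrary keys out of the SQL text.

```python
# Columns the update statement above is allowed to touch.
ALLOWED_COLUMNS = ("Email", "LinkedIn URL", "Title", "Company", "Location")


def build_update(record_id, enriched):
    """Return (sql, params) that sets only columns with a new value, or None."""
    columns = [c for c in ALLOWED_COLUMNS if enriched.get(c) not in (None, "")]
    if not columns:
        return None  # nothing to write for this record
    assignments = ", ".join(f'"{c}" = ?' for c in columns)
    sql = f"UPDATE v_leads SET {assignments} WHERE id = ?;"
    params = [enriched[c] for c in columns] + [record_id]
    return sql, params


print(build_update(7, {"Email": "ada.park@example.com", "Title": None}))
```

With the `duckdb` Python package, the resulting pair could be passed to a connection's `execute(sql, params)`; whether the pivot view accepts the update this way depends on the workspace setup.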
Bulk Enrichment Mode
For enriching many records at once:
- Query all incomplete records from DuckDB
- Group by company (scrape company once, apply to all employees)
- Process in batches of 10-20 records
- Report progress after each batch:
Enrichment Progress: 45/120 leads (38%)
├── Emails found: 32/45 (71%)
├── LinkedIn matched: 41/45 (91%)
├── Titles updated: 38/45 (84%)
└── ETA: ~15 min remaining
- Save checkpoint after each batch (in case of interruption)
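The batching, progress, and checkpoint steps above can be sketched as follows. All function names and the checkpoint file layout are assumptions for illustration; the actual enrichment call is left as a comment, and sorting by company is used here as a simple stand-in for grouping.

```python
import json
import os
import tempfile


def batches_by_company(records, batch_size=10):
    """Yield fixed-size batches, sorted so colleagues land next to each other."""
    ordered = sorted(records, key=lambda r: r.get("Company") or "")
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


def progress_line(done, total):
    """Format the headline of the progress report."""
    pct = round(100 * done / total) if total else 0
    return f"Enrichment Progress: {done}/{total} leads ({pct}%)"


def save_checkpoint(path, done_ids):
    """Write to a temp file, then rename, so an interruption leaves no partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"done_ids": done_ids}, fh)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the ids already processed, or an empty list on a fresh run."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["done_ids"]


# Illustrative run over 25 made-up records.
records = [{"id": i, "Company": "Acme" if i % 2 else "Globex"} for i in range(1, 26)]
checkpoint = os.path.join(tempfile.mkdtemp(), "enrichment_checkpoint.json")
done_ids = load_checkpoint(checkpoint)

for batch in batches_by_company(records, batch_size=10):
    # The per-record enrichment from Step 3 would run here (not shown).
    done_ids.extend(r["id"] for r in batch)
    save_checkpoint(checkpoint, done_ids)
    print(progress_line(len(done_ids), len(records)))
```

On restart, records whose ids appear in the checkpoint can be filtered out before batching, so interrupted runs resume instead of repeating work.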
Enrichment Quality Rules
- Confidence scoring: Mark each enriched field with confidence (high/medium/low)
- High: Direct match from LinkedIn profile or company website
- Medium: Inferred from patterns (email format) or partial match
- Low: Best guess from web search results
- Never overwrite existing data unless explicitly asked
- Flag conflicts: If enriched data contradicts existing data, flag for review
- Dedup check: Before inserting LinkedIn URL, check it's not already assigned to another contact
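A minimal sketch of how these rules might be enforced when merging an enriched value into a record. The `_confidence` key, the conflict entry layout, and both function names are assumptions made for illustration, not part of the CRM schema.

```python
def merge_field(record, field, new_value, confidence, conflicts):
    """Fill an empty field; never overwrite, and log disagreements for review."""
    current = record.get(field)
    if current in (None, ""):
        record[field] = new_value
        # Per-field confidence is kept under a hypothetical "_confidence" key.
        record.setdefault("_confidence", {})[field] = confidence
    elif current != new_value:
        conflicts.append({
            "id": record.get("id"),
            "field": field,
            "existing": current,
            "enriched": new_value,
        })


def linkedin_url_taken(records, url, own_id):
    """Return True if another contact already holds this LinkedIn URL."""
    return any(
        r.get("LinkedIn URL") == url and r.get("id") != own_id for r in records
    )


contacts = [
    {"id": 1, "Email": "ada@example.com",
     "LinkedIn URL": "https://www.linkedin.com/in/ada-park"},
    {"id": 2, "Email": None},
]
conflicts = []
merge_field(contacts[0], "Email", "ada.park@example.com", "medium", conflicts)
merge_field(contacts[1], "Email", "sam.ortiz@example.com", "medium", conflicts)
print(contacts[1]["Email"], conflicts)
print(linkedin_url_taken(contacts, "https://www.linkedin.com/in/ada-park", own_id=2))
```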
Email Pattern Discovery
Common corporate email formats by frequency:
- first.last@domain.com (most common, ~45%)
- first@domain.com (~20%)
- flast@domain.com (~15%)
- firstl@domain.com (~10%)
- first_last@domain.com (~5%)
- last.first@domain.com (~3%)
- first.l@domain.com (~2%)
Strategy:
- If you know one person's email at the company, derive the pattern
- Search web for `"@{domain}" email format`
- Check company team page source code for mailto: links
- Use the most common pattern as fallback
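The strategy above can be sketched in Python as follows: infer the format from one known colleague's address, apply it to a new name, and fall back to `first.last` when nothing is known. The pattern labels and function names are hypothetical, and the code assumes non-empty first and last names. Generated addresses are guesses and would carry medium confidence at best until verified.

```python
# Local-part builders, listed in the frequency order given above.
PATTERNS = {
    "first.last": lambda f, l: f"{f}.{l}",
    "first": lambda f, l: f,
    "flast": lambda f, l: f"{f[0]}{l}",
    "firstl": lambda f, l: f"{f}{l[0]}",
    "first_last": lambda f, l: f"{f}_{l}",
    "last.first": lambda f, l: f"{l}.{f}",
    "first.l": lambda f, l: f"{f}.{l[0]}",
}


def infer_pattern(known_email, first, last):
    """Return the pattern label that reproduces a known address, or None."""
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l or "@" not in known_email:
        return None
    local = known_email.split("@", 1)[0].lower()
    for name, build in PATTERNS.items():
        if build(f, l) == local:
            return name
    return None


def apply_pattern(pattern, first, last, domain):
    """Build an address with the given pattern, defaulting to first.last."""
    build = PATTERNS[pattern or "first.last"]
    return f"{build(first.strip().lower(), last.strip().lower())}@{domain.lower()}"


def candidate_emails(first, last, domain):
    """List every candidate address, most common format first."""
    return [apply_pattern(name, first, last, domain) for name in PATTERNS]


pattern = infer_pattern("apark@example.com", "Ada", "Park")
print(pattern)
print(apply_pattern(pattern, "Sam", "Ortiz", "example.com"))
print(apply_pattern(None, "Sam", "Ortiz", "example.com"))
```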
Output
After enrichment, provide a summary:
Enrichment Complete: 120 leads processed
├── Emails: 94 found (78%), 26 still missing
├── LinkedIn: 108 matched (90%), 12 not found
├── Titles: 115 updated (96%)
├── Companies: 118 confirmed (98%)
├── Locations: 89 found (74%)
└── Avg confidence: High (82%), Medium (14%), Low (4%)
Top gaps remaining:
- 26 leads missing email (mostly small/stealth companies)
- 12 leads missing LinkedIn (common names, ambiguous matches)
DuckDB Field Mapping
Standard field names for Ironclaw CRM objects:
| Enrichment Data | DuckDB Field | Type |
|---|---|---|
| Full name | Name | text |
| Email address | Email | |
| LinkedIn URL | LinkedIn URL | url |
| Job title | Title | text |
| Company name | Company | text / relation |
| Location | Location | text |
| Education | Education | text |
| Phone | Phone | phone |
| Company size | Company Size | text |
| Industry | Industry | text |
| Enrichment date | Enriched At | date |
| Confidence | Enrichment Confidence | enum (high/medium/low) |