rfp-ingest

This skill helps implement multi-source RFP data ingestion with canonical schema normalization and deduplication.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "rfp-ingest" with this command: npx skills add atemndobs/nebula-rfp/atemndobs-nebula-rfp-rfp-ingest

RFP Ingestion Skill

Overview

This skill helps implement multi-source RFP data ingestion with canonical schema normalization and deduplication.

Supported Data Sources

Source Priority API Type Rate Limits

SAM.gov P1 REST API 10 req/sec, 10k/day

Maryland eMMA P1 Web scraping Respectful crawling

RFPMart API Current REST API As documented

RFPMart CSV Current Manual upload N/A

GovTribe P2 REST API (paid) Per subscription

CSV Upload (RFPMart Email Alerts)

RFPMart sends periodic email alerts with CSV attachments. These can be manually uploaded through the Admin UI.

CSV Format (No Header Row)

Column Index Content Example

ID 0 RFP identifier SW-82097

Country 1 Country code USA

State 2 State name Idaho

Title 3 Full title with location SW-82097 - USA (Idaho) - Data Concealment...

Deadline 4 Due date March 25,2026

URL 5 RFPMart link https://www.rfpmart.com/...

ID Prefix → Category Mapping

const categoryMap: Record<string, string> = { SW: "Software Development", ITES: "IT Services", NET: "Networking", TELCOM: "Telecommunications", DRA: "Data & Research", CSE: "Security Services", HR: "Human Resources", PM: "Project Management", MRB: "Marketing & Branding", // ... other prefixes default to "Other" };

IT-Relevant Prefixes

When filtering for IT-relevant RFPs only, these prefixes are included:

  • SW

  • Software Development

  • ITES

  • IT Services

  • NET

  • Networking

  • TELCOM

  • Telecommunications

  • DRA

  • Data & Research

  • CSE

  • Security Services

Key Files

File Purpose

convex/ingestion/rfpmartCsv.ts

CSV parser and Convex action

components/admin/CsvUpload.tsx

Drag-and-drop upload UI

Usage

  • Navigate to Admin → Data Sources tab

  • Scroll to RFPMart CSV Upload section

  • Drop a CSV file or click to browse

  • Toggle "Only import IT-relevant RFPs" if desired

  • View results summary (new/updated/skipped/errors)

Implementation Example

// Parsing CSV with quoted fields function parseCSVLine(line: string): string[] { const fields: string[] = []; let current = ""; let inQuotes = false;

for (let i = 0; i < line.length; i++) { const char = line[i]; if (char === '"') { if (inQuotes && line[i + 1] === '"') { current += '"'; i++; } else { inQuotes = !inQuotes; } } else if (char === "," && !inQuotes) { fields.push(current); current = ""; } else { current += char; } } fields.push(current); return fields; }

Canonical Schema

All sources must normalize to this schema:

interface Opportunity { externalId: string; // Source-specific ID source: "sam.gov" | "emma" | "rfpmart" | "govtribe"; title: string; description: string; summary?: string; location: string; category: string; naicsCode?: string; setAside?: string; // "Small Business", "8(a)", etc. postedDate: number; // Unix timestamp expiryDate: number; // Unix timestamp url: string; attachments?: Attachment[]; eligibilityFlags?: string[]; rawData: Record<string, unknown>; ingestedAt: number; }

SAM.gov Integration

API Endpoint

https://api.sam.gov/opportunities/v2/search

Required Headers

{ "Accept": "application/json", "X-Api-Key": process.env.SAM_GOV_API_KEY }

Example Query

const params = new URLSearchParams({ postedFrom: "2024-01-01", postedTo: "2024-12-31", limit: "100", offset: "0", ptype: "o", // Opportunities only });

Field Mapping

SAM.gov Field Canonical Field

noticeId

externalId

title

title

description

description

postedDate

postedDate (parse to timestamp)

responseDeadLine

expiryDate (parse to timestamp)

placeOfPerformance.state

location

naicsCode

naicsCode

setAsideDescription

setAside

Convex Implementation

Ingestion Action

// convex/ingestion.ts import { action, internalMutation } from "./_generated/server"; import { v } from "convex/values"; import { internal } from "./_generated/api";

export const ingestFromSam = action({ args: { daysBack: v.optional(v.number()) }, handler: async (ctx, args) => { const apiKey = process.env.SAM_GOV_API_KEY; if (!apiKey) throw new Error("SAM_GOV_API_KEY not configured");

const fromDate = new Date();
fromDate.setDate(fromDate.getDate() - (args.daysBack ?? 7));

const response = await fetch(
  `https://api.sam.gov/opportunities/v2/search?` +
  `api_key=${apiKey}&#x26;postedFrom=${fromDate.toISOString().split("T")[0]}&#x26;limit=100`,
  { headers: { Accept: "application/json" } }
);

if (!response.ok) {
  throw new Error(`SAM.gov API error: ${response.status}`);
}

const data = await response.json();
let ingested = 0;
let updated = 0;

for (const opp of data.opportunitiesData ?? []) {
  const result = await ctx.runMutation(internal.rfps.upsert, {
    externalId: opp.noticeId,
    source: "sam.gov",
    title: opp.title ?? "Untitled",
    description: opp.description ?? "",
    location: opp.placeOfPerformance?.state ?? "USA",
    category: opp.naicsCode ?? "Unknown",
    postedDate: new Date(opp.postedDate).getTime(),
    expiryDate: new Date(opp.responseDeadLine).getTime(),
    url: `https://sam.gov/opp/${opp.noticeId}/view`,
    rawData: opp,
  });

  if (result.action === "inserted") ingested++;
  else updated++;
}

// Log ingestion
await ctx.runMutation(internal.ingestion.logIngestion, {
  source: "sam.gov",
  status: "completed",
  recordsProcessed: data.opportunitiesData?.length ?? 0,
  recordsInserted: ingested,
  recordsUpdated: updated,
});

return { ingested, updated, source: "sam.gov" };

}, });

Upsert Mutation

// convex/rfps.ts (internal mutation) export const upsert = internalMutation({ args: { externalId: v.string(), source: v.string(), title: v.string(), description: v.string(), location: v.string(), category: v.string(), postedDate: v.number(), expiryDate: v.number(), url: v.string(), rawData: v.optional(v.any()), }, handler: async (ctx, args) => { const existing = await ctx.db .query("rfps") .withIndex("by_external_id", (q) => q.eq("externalId", args.externalId).eq("source", args.source) ) .first();

const now = Date.now();

if (existing) {
  await ctx.db.patch(existing._id, { ...args, updatedAt: now });
  return { id: existing._id, action: "updated" as const };
}

const id = await ctx.db.insert("rfps", {
  ...args,
  ingestedAt: now,
  updatedAt: now,
});
return { id, action: "inserted" as const };

}, });

Deduplication Strategy

  • Exact match: externalId
  • source combination
  • Title similarity: Fuzzy match titles within same deadline window

  • URL canonicalization: Normalize URLs before comparison

Eligibility Pre-Filtering

Detect disqualifiers during ingestion:

const DISQUALIFIER_PATTERNS = [ { pattern: /u.?s.?\s*(citizen|company|organization)\sonly/i, flag: "us-org-only" }, { pattern: /onshore\s(only|required)/i, flag: "onshore-required" }, { pattern: /on-?site\s*(required|mandatory)/i, flag: "onsite-required" }, { pattern: /security\sclearance\srequired/i, flag: "clearance-required" }, { pattern: /small\sbusiness\sset[- ]aside/i, flag: "small-business-set-aside" }, ];

function detectEligibilityFlags(text: string): string[] { return DISQUALIFIER_PATTERNS .filter(({ pattern }) => pattern.test(text)) .map(({ flag }) => flag); }

Scheduled Ingestion

// convex/crons.ts import { cronJobs } from "convex/server"; import { internal } from "./_generated/api";

const crons = cronJobs();

crons.interval( "ingest-sam-gov", { hours: 6 }, internal.ingestion.ingestFromSam, { daysBack: 3 } );

export default crons;

Error Handling

Error Type Action

Rate limit (429) Exponential backoff, retry after delay

Auth error (401/403) Log error, alert admin

Server error (5xx) Retry up to 3 times

Parse error Log raw data, skip record

Testing Approach

  • Mock API responses for unit tests

  • Use sandbox/test endpoints when available

  • Validate schema transformation

  • Test deduplication logic

  • Verify eligibility flag detection

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

csv-export

No summary provided by upstream source.

Repository SourceNeeds Review
General

pursuit-brief

No summary provided by upstream source.

Repository SourceNeeds Review
General

proposal-builder

No summary provided by upstream source.

Repository SourceNeeds Review
General

rfp-evaluate

No summary provided by upstream source.

Repository SourceNeeds Review