
Background Job Orchestrator


Expert in designing and implementing production-grade background job systems that handle long-running tasks without blocking API responses.

When to Use

✅ Use for:

  • Long-running tasks (email sends, report generation, image processing)

  • Batch operations (bulk imports, exports, data migrations)

  • Scheduled tasks (daily digests, cleanup jobs, recurring reports)

  • Tasks requiring retry logic (external API calls, flaky operations)

  • Priority-based processing (premium users first, critical alerts)

  • Rate-limited operations (API quotas, third-party service limits)

❌ NOT for:

  • Real-time bidirectional communication (use WebSockets)

  • Sub-second latency requirements (use in-memory caching)

  • Simple delays (setTimeout is fine for <5 seconds)

  • Synchronous API responses (keep logic in request handler)

Quick Decision Tree

Does this task:
├── Take >5 seconds? → Background job
├── Need to retry on failure? → Background job
├── Run on a schedule? → Background job (cron pattern)
├── Block user interaction? → Background job
├── Process in batches? → Background job
└── Return immediately? → Keep synchronous

Technology Selection

Node.js: BullMQ (Recommended 2024+)

When to use:

  • TypeScript project

  • Redis already in stack

  • Need advanced features (rate limiting, priorities, repeatable jobs)

Why BullMQ over Bull:

  • BullMQ is a complete rewrite of Bull in TypeScript

  • Better Redis connection handling

  • Improved concurrency and performance

  • Active maintenance (Bull is in maintenance mode)

Python: Celery

When to use:

  • Python/Django project

  • Need distributed task execution

  • Complex workflows (chains, groups, chords)

Alternatives:

  • RQ (Redis Queue): Simpler, fewer features

  • Dramatiq: Modern, less ecosystem

  • Huey: Lightweight, good for small projects

Cloud-Native: AWS SQS, Google Cloud Tasks

When to use:

  • Serverless architecture

  • Don't want to manage Redis/RabbitMQ

  • Need guaranteed delivery and dead-letter queues

Common Anti-Patterns

Anti-Pattern 1: No Dead Letter Queue

Novice thinking: "Retry 3 times, then fail silently"

Problem: Failed jobs disappear with no visibility or recovery path.

Correct approach:

// BullMQ with dead letter queue
const queue = new Queue('email-queue', {
  connection: redis,
  defaultJobOptions: {
    attempts: 3,
    backoff: { type: 'exponential', delay: 2000 },
    removeOnComplete: 100, // Keep last 100 successful
    removeOnFail: false    // Keep all failed for inspection
  }
});

// Monitor failed jobs
const failedJobs = await queue.getFailed();

Timeline:

  • Pre-2020: Retry and forget

  • 2020+: Dead letter queues standard

  • 2024+: Observability for job failures required
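As a sketch of how the exponential option above spaces retries: BullMQ's built-in exponential backoff waits baseDelay * 2^(n - 1) before the n-th retry, so `attempts: 3` with `delay: 2000` retries after 2 s and then 4 s.

```typescript
// Sketch of BullMQ-style exponential backoff:
// delay before the n-th retry = baseDelay * 2^(n - 1).
function exponentialBackoff(baseDelayMs: number, retryNumber: number): number {
  return baseDelayMs * 2 ** (retryNumber - 1);
}

// attempts: 3 with delay: 2000 → the two retries wait 2000 ms, then 4000 ms
const delays = [1, 2].map(n => exponentialBackoff(2000, n)); // [2000, 4000]
```

Capping the result (or using a custom backoff strategy) is worth considering for high attempt counts, since the delay doubles every retry.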

Anti-Pattern 2: Synchronous Job Processing

Symptom: API endpoint waits for job completion

Problem:

// ❌ WRONG - Blocks API response
app.post('/send-email', async (req, res) => {
  await sendEmail(req.body.to, req.body.subject);
  res.json({ success: true });
});

Why it's wrong: risks request timeouts, degrades UX, and ties up server resources

Correct approach:

// ✅ RIGHT - Queue and return immediately
app.post('/send-email', async (req, res) => {
  const job = await emailQueue.add('send', {
    to: req.body.to,
    subject: req.body.subject
  });

  res.json({ success: true, jobId: job.id, status: 'queued' });
});

// Separate worker processes the job
const worker = new Worker('email-queue', async (job) => {
  await sendEmail(job.data.to, job.data.subject);
}, { connection: redis });

Anti-Pattern 3: No Idempotency

Problem: Job runs twice → duplicate charges, double emails

Why it happens:

  • Redis connection drops mid-processing

  • Worker crashes before job completion

  • Job timeout triggers retry while still running

Correct approach:

// ✅ Idempotent job with deduplication key
await queue.add('charge-payment', {
  userId: 123,
  orderId,
  amount: 50.00
}, {
  jobId: `payment-${orderId}`, // Prevents duplicates
  attempts: 3
});

// In worker: check if already processed
const worker = new Worker('charge-payment', async (job) => {
  const { userId, amount } = job.data;

  // Check idempotency
  const existing = await db.payments.findOne({ jobId: job.id });
  if (existing) {
    return existing; // Already processed
  }

  // Process payment
  const result = await stripe.charges.create({...});

  // Store idempotency record
  await db.payments.create({ jobId: job.id, result });

  return result;
}, { connection: redis });
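When no natural key like an order ID exists, one common convention (a hypothetical helper, not a BullMQ API) is to hash the payload into a deterministic jobId, so a double-submit collapses into a single queued job:

```typescript
import { createHash } from 'node:crypto';

// Derive a deterministic jobId from the job name and payload.
// Identical payloads produce identical IDs, so re-submitting the
// same work maps to the same job instead of enqueuing a duplicate.
// Note: assumes payload objects are built with a stable key order,
// since JSON.stringify preserves insertion order.
function idempotencyKey(name: string, payload: Record<string, unknown>): string {
  const digest = createHash('sha256')
    .update(JSON.stringify(payload))
    .digest('hex')
    .slice(0, 16);
  return `${name}-${digest}`;
}

// Usage sketch:
// await queue.add('charge-payment', data, { jobId: idempotencyKey('charge-payment', data) });
```

Deduplication by jobId only holds while a job with that ID still exists in the queue, so retention settings (removeOnComplete/removeOnFail) affect the dedup window.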

Anti-Pattern 4: No Rate Limiting

Problem: Overwhelm third-party APIs or exhaust quotas

Symptom: "Rate limit exceeded" errors from Sendgrid, Stripe, etc.

Correct approach:

// BullMQ rate limiting is configured on the Worker
const worker = new Worker('api-calls', processApiCall, {
  connection: redis,
  limiter: {
    max: 100,       // Max 100 jobs
    duration: 60000 // Per 60 seconds
  }
});

// Priorities still apply within the limit (lower number = higher priority)
await queue.add('send-email', data, {
  priority: user.isPremium ? 1 : 10
});

// Per-tier limits (e.g. separate premium vs. free quotas) require job
// groups with group rate limiting, available in BullMQ Pro.

Anti-Pattern 5: Forgetting Worker Scaling

Problem: Single worker can't keep up with queue depth

Symptom: Queue backs up, jobs delayed hours/days

Correct approach:

// Horizontal scaling with multiple workers
const worker = new Worker('email-queue', async (job) => {
  await processEmail(job.data);
}, {
  connection: redis,
  concurrency: 5 // Process 5 jobs concurrently per worker
});

// Run multiple worker processes (PM2, Kubernetes, etc.)
// Total throughput = concurrency * number of worker processes

Monitoring:

// Set up alerts for queue depth
setInterval(async () => {
  const waiting = await queue.getWaitingCount();
  if (waiting > 1000) {
    // alert() stands in for your paging/alerting integration
    alert('Queue depth exceeds 1000, scale workers!');
  }
}, 60000);

Implementation Patterns

Pattern 1: Email Campaigns

// Queue setup
const emailQueue = new Queue('email-campaign', { connection: redis });

// Enqueue batch
async function sendCampaign(userIds: number[], template: string) {
  const jobs = userIds.map(userId => ({
    name: 'send',
    data: { userId, template },
    opts: {
      attempts: 3,
      backoff: { type: 'exponential', delay: 5000 }
    }
  }));

  await emailQueue.addBulk(jobs);
}

// Worker with retry logic
const worker = new Worker('email-campaign', async (job) => {
  const { userId, template } = job.data;

  const user = await db.users.findById(userId);
  const email = renderTemplate(template, user);

  try {
    await sendgrid.send({
      to: user.email,
      subject: email.subject,
      html: email.body
    });
  } catch (error) {
    if (error.code === 'ECONNREFUSED') {
      throw error; // Retry
    }
    // Invalid email, don't retry
    console.error(`Invalid email for user ${userId}`);
  }
}, { connection: redis, concurrency: 10 });

Pattern 2: Scheduled Reports

// Daily report at 9 AM
await queue.add('daily-report', {
  type: 'sales',
  recipients: ['admin@company.com']
}, {
  repeat: {
    pattern: '0 9 * * *', // Cron syntax
    tz: 'America/New_York'
  }
});

// Worker generates and emails the report
const worker = new Worker('daily-report', async (job) => {
  const { type, recipients } = job.data;

  const data = await generateReport(type);
  const pdf = await createPDF(data);

  await emailQueue.add('send', {
    to: recipients,
    subject: `Daily ${type} Report`,
    attachments: [{ filename: 'report.pdf', content: pdf }]
  });
}, { connection: redis });

Pattern 3: Video Transcoding Pipeline

// Multi-stage job with progress tracking
await videoQueue.add('transcode', {
  videoId: 123,
  formats: ['720p', '1080p', '4k']
}, {
  attempts: 2
  // BullMQ has no per-job timeout option; for hour-long jobs,
  // tune the worker's lockDuration instead
});

const worker = new Worker('transcode', async (job) => {
  const { videoId, formats } = job.data;

  for (let i = 0; i < formats.length; i++) {
    const format = formats[i];

    // Update progress
    await job.updateProgress((i / formats.length) * 100);

    // Transcode
    await ffmpeg.transcode(videoId, format);
  }

  await job.updateProgress(100);
}, { connection: redis });

// Client polls for progress
app.get('/videos/:id/status', async (req, res) => {
  const job = await videoQueue.getJob(req.params.id);
  res.json({ state: await job.getState(), progress: job.progress });
});

Monitoring & Observability

Essential Metrics

// Queue health dashboard
async function getQueueMetrics() {
  const [waiting, active, completed, failed, delayed] = await Promise.all([
    queue.getWaitingCount(),
    queue.getActiveCount(),
    queue.getCompletedCount(),
    queue.getFailedCount(),
    queue.getDelayedCount()
  ]);

  return {
    waiting,   // Jobs waiting to be processed
    active,    // Jobs currently processing
    completed, // Successfully completed
    failed,    // Failed after retries
    delayed,   // Scheduled for the future
    health: waiting < 1000 && failed < 100 ? 'healthy' : 'degraded'
  };
}

BullMQ Board (UI)

// Development: monitor jobs visually
import { createBullBoard } from '@bull-board/api';
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter';
import { ExpressAdapter } from '@bull-board/express';

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath('/admin/queues');

createBullBoard({
  queues: [
    new BullMQAdapter(emailQueue),
    new BullMQAdapter(videoQueue)
  ],
  serverAdapter
});

app.use('/admin/queues', serverAdapter.getRouter());
// Visit http://localhost:3000/admin/queues

Production Checklist

□ Dead letter queue configured
□ Retry strategy with exponential backoff
□ Job timeout limits set
□ Rate limiting for third-party APIs
□ Idempotency keys for critical operations
□ Worker concurrency tuned (CPU cores * 2)
□ Horizontal scaling configured (multiple workers)
□ Queue depth monitoring with alerts
□ Failed job inspection workflow
□ Job data doesn't contain PII in logs
□ Redis persistence enabled (AOF or RDB)
□ Graceful shutdown handling (SIGTERM)
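The last checklist item can be sketched without any queue library: stop accepting new jobs on shutdown, then wait for in-flight jobs to drain. With BullMQ, `await worker.close()` in a SIGTERM handler performs the same drain.

```typescript
// Dependency-free sketch of graceful shutdown: refuse new work,
// then block until in-flight jobs finish.
class DrainableWorker {
  private inFlight = 0;
  private draining = false;

  async run(job: () => Promise<void>): Promise<boolean> {
    if (this.draining) return false; // refuse new work during shutdown
    this.inFlight++;
    try {
      await job();
    } finally {
      this.inFlight--;
    }
    return true;
  }

  async close(pollMs = 10): Promise<void> {
    this.draining = true;
    while (this.inFlight > 0) {
      await new Promise(resolve => setTimeout(resolve, pollMs));
    }
  }
}

// Wiring sketch (assumes a long-lived `worker` instance):
// process.on('SIGTERM', async () => { await worker.close(); process.exit(0); });
```

In production, pair this with a hard deadline (e.g. Kubernetes' terminationGracePeriodSeconds) so a stuck job cannot block shutdown forever.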

When to Use vs Avoid

| Scenario | Use Background Jobs? |
|---|---|
| Send welcome email on signup | ✅ Yes - can take 2-5 seconds |
| Charge credit card | ⚠️ Maybe - depends on payment provider latency |
| Generate PDF report (30 seconds) | ✅ Yes - definitely background |
| Fetch user profile from DB | ❌ No - milliseconds, keep synchronous |
| Process video upload (5 minutes) | ✅ Yes - always background |
| Validate form input | ❌ No - synchronous validation |
| Daily cron job | ✅ Yes - use repeatable jobs |
| Real-time chat message | ❌ No - use WebSockets |

Technology Comparison

| Feature | BullMQ | Celery | AWS SQS |
|---|---|---|---|
| Language | Node.js | Python | Any (HTTP API) |
| Backend | Redis | Redis/RabbitMQ/SQS | Managed |
| Priorities | ✅ | ✅ | ❌ (use separate queues) |
| Rate Limiting | ✅ | ✅ (per-worker task `rate_limit`) | ✅ (via attributes) |
| Repeat/Cron | ✅ | ✅ (celery-beat) | ❌ (use EventBridge) |
| UI Dashboard | Bull Board | Flower | CloudWatch |
| Workflows | ❌ | ✅ (chains, groups) | ❌ |
| Learning Curve | Medium | Medium | Low |
| Cost | Redis hosting | Redis hosting | $0.40/million requests |

References

  • /references/bullmq-patterns.md - Advanced BullMQ patterns and examples

  • /references/celery-workflows.md - Celery chains, groups, and chords

  • /references/job-observability.md - Monitoring, alerting, and debugging

Scripts

  • scripts/setup_bullmq.sh - Initialize BullMQ with Redis

  • scripts/queue_health_check.ts - Queue metrics dashboard

  • scripts/retry_failed_jobs.ts - Bulk retry failed jobs

This skill guides: Background job implementation | Queue architecture | Retry strategies | Worker scaling | Job observability
