What is a background job?

A background job is a unit of work that runs outside the main request/response cycle of your application. Instead of making the user wait for a slow operation to complete, you hand the work off to a separate process that runs independently. Common examples include sending emails, generating reports, processing file uploads, and syncing data with third-party APIs.

When should I use a task queue instead of processing inline?

Use a task queue when the operation takes more than a couple of seconds, when it can fail and needs retries, when it involves a third-party service with rate limits, or when you need to schedule work for a specific time. If the user does not need the result immediately, it is almost always better to offload it to a queue.

What is the difference between a message queue and a task queue?

A message queue is a general-purpose system for sending messages between services. A task queue is built on top of a message queue and adds features specifically for job processing, such as retries, scheduling, prioritisation, and progress tracking. Tools like BullMQ and Celery are task queues; RabbitMQ and Redis Streams are message queues.

How do I handle failed background jobs?

Implement exponential backoff for retries, set a maximum retry count, and route permanently failed jobs to a dead letter queue for manual inspection. Log enough context with each job that you can debug failures without reproducing them. Monitoring and alerting on failure rates is essential for catching problems early.

Can I use background jobs with serverless functions?

Yes. Cloud providers offer managed queue services that integrate with serverless. AWS SQS with Lambda, Google Cloud Tasks with Cloud Functions, and Cloudflare Queues with Workers all support this pattern. The trade-off is that you have less control over concurrency and execution environment compared to running your own workers.

The Developer's Guide to Background Jobs and Task Queues

Why Your Request Handler Should Not Do Everything

A user clicks “Generate Report.” Your server starts crunching numbers, querying the database, formatting a PDF, and attaching it to an email. Thirty seconds later, the request times out. The user refreshes, triggers the whole process again, and now you have two reports being generated simultaneously.

This is what happens when you try to do too much inside a single HTTP request. Background jobs solve this problem by moving slow, unreliable, or resource-heavy work out of the request/response cycle entirely.

If you have ever built resilient APIs, you already understand the importance of keeping request handlers fast and predictable. Background jobs are the other half of that equation.

When You Need Background Jobs

Not every operation belongs in a queue. Here is a practical framework for deciding.

Move to a background job when:

The operation takes more than 1-2 seconds
It involves a third-party API that might be slow or rate-limited
It can fail and needs automatic retries
The user does not need the result immediately
It is resource-intensive (CPU, memory, or I/O)
It needs to run on a schedule

Keep it inline when:

The user needs the result before the page renders
The operation is fast and reliable (under 200ms)
Failure means the user action should fail too (like a payment charge)

Operation	Inline or Background?	Why
Sending a welcome email	Background	User does not need to wait; email services can be slow
Resizing an uploaded image	Background	CPU-intensive; user can see a placeholder while it processes
Validating a form	Inline	User needs immediate feedback
Generating a monthly invoice PDF	Background	Can take several seconds; deliver via email or dashboard notification
Checking stock availability	Inline	User needs the answer before proceeding to checkout
Syncing data to a CRM	Background	Third-party API; can retry if it fails

The Architecture of a Task Queue

Every task queue system has three core components.

Producers

The part of your application that creates jobs. When a user signs up, your registration handler produces a “send welcome email” job and places it on the queue.

// Producer: enqueue a job from your API handler
await emailQueue.add('send-welcome', {
  userId: user.id,
  email: user.email,
  template: 'welcome'
});

The Queue (Broker)

The queue stores jobs until a worker is ready to process them. Redis ↗ is the most popular backing store for task queues in the Node.js and Python ecosystems. RabbitMQ ↗ is a dedicated message broker that offers more advanced routing and delivery guarantees.

Workers (Consumers)

Separate processes that pull jobs from the queue and execute them. Workers run independently of your web server, which means a slow job does not block your API.

// Worker: process jobs from the queue
const worker = new Worker('send-welcome', async (job) => {
  const { userId, email, template } = job.data;
  await sendEmail(email, template);
  await markEmailSent(userId, 'welcome');
});

Choosing the Right Tool

The best tool depends on your language, infrastructure, and how much you want to manage yourself.

Self-hosted options

Tool	Language	Broker	Best For
BullMQ ↗	Node.js / TypeScript	Redis	Most Node.js applications; great DX, strong community
Celery ↗	Python	Redis or RabbitMQ	Django and Flask apps; mature and battle-tested
Sidekiq	Ruby	Redis	Rails applications; known for performance
Hangfire	C#	SQL Server or Redis	.NET applications

Managed cloud options

Service	Provider	Best For
Amazon SQS ↗	AWS	Serverless architectures; massive scale
Google Cloud Tasks ↗	GCP	HTTP-based task execution
Cloudflare Queues	Cloudflare	Edge-first architectures with Workers

If you are already running Redis for caching or sessions, BullMQ or Celery with a Redis broker is the lowest-friction option. If you need advanced routing, fan-out, or strict message ordering, RabbitMQ is worth the operational overhead.

Common Patterns

Fire and forget

The simplest pattern. Produce a job and move on. The worker processes it whenever it gets to it. Good for non-critical operations like analytics events, audit logs, or cache warming.

await analyticsQueue.add('page-view', {
  page: '/pricing',
  userId: session.userId,
  timestamp: Date.now()
});
// Response sent immediately; analytics processed later

Request, then poll

For operations where the user eventually needs the result, return a job ID and let the client poll for completion. This keeps the initial response fast while still delivering the result.

// API endpoint: start report generation
const job = await reportQueue.add('monthly-report', {
  accountId: req.params.id,
  month: '2026-03'
});
res.json({ jobId: job.id, status: 'processing' });

// Separate endpoint: check job status
const job = await reportQueue.getJob(req.params.jobId);
res.json({
  status: await job.getState(),
  result: job.returnvalue
});

Scheduled jobs (cron-style)

Run jobs on a recurring schedule. Useful for daily report generation, data cleanup, subscription renewals, and similar tasks. Most task queue libraries support this natively.

await reportQueue.add('daily-digest', { type: 'daily' }, {
  repeat: { cron: '0 7 * * *' } // Every day at 07:00
});

Priority queues

Not all jobs are equally urgent. A password reset email should process before a weekly newsletter. Priority queues let you assign urgency levels so important jobs jump ahead.

// High priority: password reset
await emailQueue.add('password-reset', data, { priority: 1 });

// Low priority: marketing newsletter
await emailQueue.add('newsletter', data, { priority: 10 });

Handling Failures Gracefully

Jobs fail. APIs time out, databases go down, and third-party services return errors. A good task queue strategy assumes failure and plans for it.

Retry with exponential backoff

Instead of retrying immediately (which hammers an already struggling service), increase the delay between each attempt.

const worker = new Worker('sync-crm', processCrmSync, {
  settings: {
    backoffStrategy: (attemptsMade) => {
      return Math.min(1000 * Math.pow(2, attemptsMade), 30000);
    }
  }
});

This gives the failing service time to recover. First retry after 2 seconds, then 4, then 8, capping at 30 seconds.

Dead letter queues

After a job exhausts all its retries, it moves to a dead letter queue (DLQ). This prevents failed jobs from blocking the main queue while preserving them for inspection and manual replay.

Think of the DLQ as a holding area for jobs that need human attention. Good logging and observability make debugging these failures far easier.

Idempotency

If a job might run more than once (due to retries or infrastructure hiccups), make sure running it twice produces the same result as running it once. This is idempotency, and it is essential for background jobs.

async function processPaymentReminder(job) {
  const { invoiceId } = job.data;

  // Check if reminder already sent (idempotency guard)
  const existing = await db.reminders.findOne({
    invoiceId,
    type: 'payment',
    sentAt: { $exists: true }
  });
  if (existing) return { skipped: true };

  await sendReminder(invoiceId);
  await db.reminders.updateOne(
    { invoiceId, type: 'payment' },
    { $set: { sentAt: new Date() } },
    { upsert: true }
  );
}

Scaling Workers

One of the biggest advantages of background jobs is that you can scale workers independently of your web server. If your queue is backing up, add more workers. If traffic is quiet, scale down.

Concurrency

Most task queue libraries let you control how many jobs a single worker processes simultaneously.

const worker = new Worker('image-resize', processImage, {
  concurrency: 5 // Process 5 images at once
});

Be careful with concurrency. CPU-bound tasks (image processing, PDF generation) benefit from fewer concurrent jobs per worker, with more worker instances instead. I/O-bound tasks (API calls, email sending) can safely run higher concurrency on a single worker.

Separate queues for different workloads

Do not put everything in one queue. A slow report generation job should not block time-sensitive email delivery. Use separate queues with dedicated workers.

Queue	Concurrency	Workers	SLA
`email-critical`	10	2	Under 30 seconds
`email-marketing`	5	1	Under 5 minutes
`report-generation`	2	1	Under 1 hour
`data-sync`	3	1	Best effort

This maps naturally to microservice boundaries if your architecture is heading in that direction.

Monitoring and Observability

A queue you cannot see into is a queue that will surprise you. At minimum, track these metrics:

Queue depth: how many jobs are waiting. A growing queue means workers cannot keep up.
Processing time: how long each job type takes. Sudden increases signal a problem.
Failure rate: what percentage of jobs fail. Set alerts for spikes.
Worker utilisation: are workers idle or maxed out? This guides scaling decisions.

BullMQ provides a built-in dashboard through Bull Board. Celery has Flower. For custom setups, export metrics to your existing monitoring stack.

If you are running background jobs in a Docker environment, make sure your container orchestration handles worker health checks and restarts. A crashed worker that nobody notices is worse than no worker at all.

Deploying Workers Alongside Your Application

Workers need to be part of your CI/CD pipeline, just like your web server. A common mistake is deploying application code without restarting workers, which leaves them running stale job handlers.

Graceful shutdown

When deploying new code, workers should finish their current jobs before shutting down. Killing a worker mid-job can leave data in an inconsistent state.

process.on('SIGTERM', async () => {
  await worker.close(); // Finishes current jobs, stops accepting new ones
  process.exit(0);
});

Infrastructure as code

Define your workers in the same infrastructure configuration as the rest of your stack. This ensures they scale together and share the same deployment lifecycle.

Getting Started: A Practical Checklist

If you are adding background jobs to an existing application, here is a step-by-step approach:

Identify the bottleneck. Find the slowest operations in your request handlers. These are your first candidates for background processing.
Pick a tool. If you are using Node.js and already have Redis, start with BullMQ. Python with Django or Flask? Celery is the standard choice.
Start with one queue. Do not over-engineer. Move one operation to a background job, deploy it, and observe how it behaves in production.
Add retries and a dead letter queue. Make sure failed jobs do not disappear silently.
Add monitoring. Even a simple dashboard showing queue depth and failure rate is better than nothing.
Scale when you need to. Add concurrency, then add workers, then add separate queues. In that order.

Background jobs are one of those patterns that, once you start using them, you wonder how you managed without them. They make your applications faster for users, more resilient to failures, and easier to scale. Start small, keep your jobs idempotent, and make sure you can see what is happening in your queues.

The Developer's Guide to Background Jobs and Task Queues

Why Your Request Handler Should Not Do Everything

When You Need Background Jobs

The Architecture of a Task Queue

Producers

The Queue (Broker)

Workers (Consumers)

Choosing the Right Tool

Self-hosted options

Managed cloud options

Common Patterns

Fire and forget

Request, then poll

Scheduled jobs (cron-style)

Priority queues

Handling Failures Gracefully

Retry with exponential backoff

Dead letter queues

Idempotency

Scaling Workers

Concurrency

Separate queues for different workloads

Monitoring and Observability

Deploying Workers Alongside Your Application

Graceful shutdown

Infrastructure as code

Getting Started: A Practical Checklist

Frequently asked questions

Comments

The Developer's Guide to Background Jobs and Task Queues

Why Your Request Handler Should Not Do Everything

When You Need Background Jobs

The Architecture of a Task Queue

Producers

The Queue (Broker)

Workers (Consumers)

Choosing the Right Tool

Self-hosted options

Managed cloud options

Common Patterns

Fire and forget

Request, then poll

Scheduled jobs (cron-style)

Priority queues

Handling Failures Gracefully

Retry with exponential backoff

Dead letter queues

Idempotency

Scaling Workers

Concurrency

Separate queues for different workloads

Monitoring and Observability

Deploying Workers Alongside Your Application

Graceful shutdown

Infrastructure as code

Getting Started: A Practical Checklist

Frequently asked questions

Comments

Related articles

How to Design Idempotent API Endpoints

How to Design a Database Schema That Scales

The Developer's Guide to API Rate Limiting