The Developer's Guide to WebSockets

Most web applications are built on a request/response cycle: the client asks, the server answers. That model breaks down the moment you need data to flow in both directions without the client constantly polling. WebSockets solve this by establishing a persistent, full-duplex connection between client and server.

This guide covers when WebSockets are the right tool, how to implement them properly, and the pitfalls that catch teams out in production.

What WebSockets Actually Are

The WebSocket protocol (defined in RFC 6455) begins as an ordinary HTTP request that the server upgrades, after which the underlying TCP connection stays open and carries WebSocket traffic. After the initial handshake, both client and server can send messages at any time without the overhead of new HTTP requests.

The key differences from HTTP:

  • Persistent connection: No repeated handshakes or connection setup
  • Full duplex: Both sides send and receive simultaneously
  • Low overhead: Message framing is minimal compared to HTTP headers
  • Event driven: Messages arrive as they happen, not when polled

A standard HTTP request includes headers that can easily reach 1 to 2 KB per request. A WebSocket frame adds just 2 to 14 bytes of overhead. When you are sending hundreds of small messages per second, that difference matters.
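To make that difference concrete, here is a rough back-of-the-envelope comparison. The figures are illustrative averages (roughly 1.5 KB of HTTP headers per request versus a few bytes of WebSocket framing), not measurements:

```javascript
// Bandwidth spent on protocol overhead alone, ignoring payloads.
function overheadPerSecond(messagesPerSecond, bytesPerMessage) {
  return messagesPerSecond * bytesPerMessage;
}

// At 500 messages per second:
const httpOverhead = overheadPerSecond(500, 1500); // ~750,000 bytes/s of headers
const wsOverhead = overheadPerSecond(500, 6);      // ~3,000 bytes/s of framing

console.log(`HTTP polling: ${httpOverhead} B/s, WebSocket: ${wsOverhead} B/s`);
```

Even with generous assumptions, the framing overhead is two orders of magnitude smaller.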

When to Use WebSockets

WebSockets are not a replacement for REST. They solve a specific set of problems:

Use case              | Why WebSockets                                 | Alternative
Live chat             | Both sides send messages in real time          | Long polling (higher latency)
Collaborative editing | Changes must propagate instantly to all users  | Periodic sync (conflicts likely)
Live dashboards       | Server pushes metric updates as they happen    | SSE (if one-way is sufficient)
Multiplayer games     | Low latency, bidirectional state updates       | None practical at scale
Financial tickers     | High-frequency, server-pushed price updates    | SSE or polling (higher latency)
Notifications         | Server pushes alerts without client polling    | SSE (simpler for one-way)

If your use case only needs server-to-client updates, consider Server-Sent Events (SSE) first. SSE is simpler, works over standard HTTP, handles reconnection automatically, and is sufficient for dashboards, notifications, and live feeds.
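For comparison, a minimal SSE client is only a few lines in the browser. This is a sketch: the `/events` endpoint is a placeholder for your own stream URL:

```javascript
// Browser-side SSE: a one-way server-to-client stream over plain HTTP.
// EventSource reconnects automatically if the connection drops.
function subscribe(url) {
  const source = new EventSource(url);

  source.addEventListener('message', (event) => {
    const update = JSON.parse(event.data);
    console.log('Server pushed:', update);
  });

  source.addEventListener('error', () => {
    // EventSource retries on its own; this fires while it reconnects.
    console.log('Connection lost, retrying...');
  });

  return source;
}

// Usage: subscribe('/events');
```

Note how much of what this guide covers later (heartbeats, reconnection, backoff) the browser handles for you with SSE.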

A Basic WebSocket Server in Node.js

The ws library is the most widely used WebSocket implementation for Node.js. Here is a minimal server:

import { WebSocket, WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws, req) => {
  console.log(`Client connected from ${req.socket.remoteAddress}`);

  ws.on('message', (data) => {
    let message;
    try {
      message = JSON.parse(data);
    } catch {
      return; // Ignore malformed frames rather than crashing the server
    }

    // Broadcast to all other connected clients
    wss.clients.forEach((client) => {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        client.send(JSON.stringify(message));
      }
    });
  });

  ws.on('close', () => {
    console.log('Client disconnected');
  });
});

On the client side, the browser’s built-in WebSocket API handles the connection:

const ws = new WebSocket('ws://localhost:8080');

ws.addEventListener('open', () => {
  ws.send(JSON.stringify({ type: 'chat', text: 'Hello' }));
});

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  console.log('Received:', data);
});

This gets you running, but production usage requires considerably more thought.

Connection Management Patterns

The biggest source of WebSocket bugs is poor connection lifecycle management. Connections drop, servers restart, and networks change. Your code needs to handle all of it.

Heartbeats

TCP connections can silently die without either side knowing. A proxy, firewall, or load balancer might close an idle connection. Heartbeats (ping/pong frames) detect dead connections early:

const HEARTBEAT_INTERVAL = 30000;

wss.on('connection', (ws) => {
  ws.isAlive = true;

  ws.on('pong', () => {
    ws.isAlive = true;
  });
});

const interval = setInterval(() => {
  wss.clients.forEach((ws) => {
    if (!ws.isAlive) {
      return ws.terminate();
    }
    ws.isAlive = false;
    ws.ping();
  });
}, HEARTBEAT_INTERVAL);

// Stop pinging when the server shuts down
wss.on('close', () => clearInterval(interval));

Without heartbeats, you accumulate zombie connections that consume memory and file descriptors. This is one of the most common issues teams hit when moving from development to production. If you have not already read it, the guide to effective error handling covers patterns that complement this approach well.

Client-Side Reconnection

Clients must reconnect automatically when connections drop. Exponential backoff prevents thundering herd problems when a server restarts and thousands of clients try to reconnect simultaneously:

class WebSocketClient {
  constructor(url) {
    this.url = url;
    this.retryCount = 0;
    this.maxRetries = 10;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.addEventListener('open', () => {
      this.retryCount = 0;
    });

    this.ws.addEventListener('close', () => {
      this.reconnect();
    });
  }

  reconnect() {
    if (this.retryCount >= this.maxRetries) return;

    const delay = Math.min(1000 * Math.pow(2, this.retryCount), 30000);
    const jitter = delay * 0.2 * Math.random();

    setTimeout(() => {
      this.retryCount++;
      this.connect();
    }, delay + jitter);
  }
}

The jitter is important. Without it, all clients back off to the same intervals and create periodic traffic spikes.
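To see how the schedule plays out, the delay formula can be pulled out and inspected on its own (ignoring jitter for readability):

```javascript
// Exponential backoff, capped at 30 seconds — the same formula as above,
// minus the random jitter.
function backoffDelay(retryCount) {
  return Math.min(1000 * Math.pow(2, retryCount), 30000);
}

// Attempts 0..5 wait 1s, 2s, 4s, 8s, 16s, then cap at 30s.
for (let i = 0; i <= 5; i++) {
  console.log(`attempt ${i}: ${backoffDelay(i)} ms`);
}
```

The cap matters as much as the doubling: without it, a client offline for an hour would wait longer than the outage itself before trying again.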

Message Queuing

Messages sent while reconnecting are lost by default. If message delivery matters, queue outbound messages and flush them after reconnection:

class WebSocketClient {
  constructor(url) {
    this.queue = [];
    // ... connection setup as in the reconnection example;
    // call this.flushQueue() from the 'open' event handler
  }

  send(data) {
    if (this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify(data));
    } else {
      this.queue.push(data);
    }
  }

  flushQueue() {
    while (this.queue.length > 0) {
      const message = this.queue.shift();
      this.ws.send(JSON.stringify(message));
    }
  }
}

For critical applications, pair this with server-side message acknowledgements so the client knows which messages were actually received.
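One way to structure acknowledgements is to tag each outbound message with a sequence id and re-queue anything the server never confirms. This is a sketch of a convention you would define yourself, not part of the ws API: the `id` field, the `ack` message type, and the pending map are all assumptions:

```javascript
// Client-side acknowledgement tracking: every message gets an id, and the
// server is expected to reply with { type: 'ack', id } once it has processed it.
class AckTracker {
  constructor() {
    this.nextId = 1;
    this.pending = new Map(); // id -> original message
  }

  // Wrap an outbound message with a sequence id and remember it.
  track(message) {
    const id = this.nextId++;
    this.pending.set(id, message);
    return { ...message, id };
  }

  // Called when an ack arrives; returns true if the id was outstanding.
  acknowledge(id) {
    return this.pending.delete(id);
  }

  // Messages to resend after a reconnect.
  unacknowledged() {
    return [...this.pending.values()];
  }
}
```

After a reconnect, the client resends everything in `unacknowledged()` through the queued `send` path above. The server should deduplicate by id, in case it was the ack that got lost rather than the message.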

Structuring WebSocket Messages

Raw strings are fragile. Define a message protocol early to avoid a tangled mess of if statements:

// Define message types
const MessageType = {
  CHAT: 'chat',
  TYPING: 'typing',
  PRESENCE: 'presence',
  ERROR: 'error',
};

// Server-side message handler
function handleMessage(ws, raw) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch {
    return; // Drop malformed JSON rather than throwing
  }
  const handlers = {
    [MessageType.CHAT]: handleChat,
    [MessageType.TYPING]: handleTyping,
    [MessageType.PRESENCE]: handlePresence,
  };

  const handler = handlers[message.type];

  if (!handler) {
    ws.send(JSON.stringify({
      type: MessageType.ERROR,
      payload: { message: 'Unknown message type' },
    }));
    return;
  }

  handler(ws, message.payload);
}

This pattern scales cleanly as you add message types. It also makes it straightforward to add validation, logging, and rate limiting per message type.
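For example, per-type validation can be layered onto the same dispatch table without touching the handlers. The validator map and its rules here are illustrative, not a fixed schema:

```javascript
// Illustrative per-type validators, keyed the same way as the handlers.
const validators = {
  chat: (p) => typeof p.text === 'string' && p.text.length <= 2000,
  typing: (p) => typeof p.room === 'string',
};

// Validate, then dispatch; returns the handler's result or an error object.
function dispatch(handlers, message) {
  const validate = validators[message.type];
  if (validate && !validate(message.payload)) {
    return { type: 'error', payload: { message: 'Invalid payload' } };
  }
  const handler = handlers[message.type];
  if (!handler) {
    return { type: 'error', payload: { message: 'Unknown message type' } };
  }
  return handler(message.payload);
}
```

The same wrapper is a natural place to hang per-type logging or rate limits later.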

Scaling Beyond a Single Server

A single WebSocket server works fine for prototyping, but production systems need horizontal scaling. The challenge is that a connection lives on one specific server. When user A sends a message that needs to reach user B, and they are connected to different servers, you need a way to route that message.

The Pub/Sub Pattern

The standard solution is an external message broker. Redis Pub/Sub is a common choice:

(Diagram: Clients A and B connect to WS Server 1, Client C to WS Server 2; both servers publish to and subscribe through Redis Pub/Sub.)

Each server subscribes to a Redis channel. When a message arrives on any server, that server publishes it to Redis. Every server receives the message and forwards it to its locally connected clients.

import { createClient } from 'redis';
import { WebSocket, WebSocketServer } from 'ws';

const pub = createClient();
const sub = createClient();
await pub.connect();
await sub.connect();

const wss = new WebSocketServer({ port: 8080 });

// Subscribe to the broadcast channel
await sub.subscribe('chat', (message) => {
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(message);
    }
  });
});

// When a client sends a message, publish to Redis
wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    pub.publish('chat', data.toString());
  });
});

This approach scales horizontally. Add more WebSocket servers behind a load balancer, and Redis ensures messages reach every connected client. For a deeper look at building APIs that handle failure gracefully, see the guide to retry and circuit breaker patterns.

Security Considerations

WebSocket connections bypass many of the protections built into HTTP frameworks. You need to handle security explicitly.

Authentication

Authenticate during the HTTP upgrade handshake, not after the connection is established:

import { WebSocketServer } from 'ws';
import { verifyToken } from './auth.js';

const wss = new WebSocketServer({
  port: 8080,
  verifyClient: async ({ req }, done) => {
    const token = req.headers['authorization']?.split(' ')[1];

    if (!token) {
      done(false, 401, 'Unauthorized');
      return;
    }

    try {
      const user = await verifyToken(token);
      req.user = user;
      done(true);
    } catch {
      done(false, 403, 'Forbidden');
    }
  },
});
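One caveat with the header approach above: the browser WebSocket API cannot set custom request headers, so `Authorization` only works for non-browser clients. Browser clients typically pass the token in the URL query string instead. A sketch, where `extractToken` is a helper defined here rather than part of ws:

```javascript
// Pull a token out of the upgrade request URL, e.g. ws://host/ws?token=abc.
// req.url is a bare path like '/ws?token=abc', so a base URL is needed to parse it.
function extractToken(requestUrl) {
  const url = new URL(requestUrl, 'http://placeholder');
  return url.searchParams.get('token');
}
```

On the server, call `extractToken(req.url)` inside `verifyClient`; on the client, connect with the token appended to the URL. Tokens in URLs can end up in proxy and access logs, so prefer short-lived, single-use tokens for this.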

Rate Limiting

Without rate limiting, a single client can flood your server. Track message counts per connection and disconnect abusive clients:

const RATE_LIMIT = 50; // messages per window
const WINDOW_MS = 10000;

wss.on('connection', (ws) => {
  ws.messageCount = 0;
  ws.windowStart = Date.now();

  ws.on('message', (data) => {
    const now = Date.now();

    if (now - ws.windowStart > WINDOW_MS) {
      ws.messageCount = 0;
      ws.windowStart = now;
    }

    ws.messageCount++;

    if (ws.messageCount > RATE_LIMIT) {
      ws.close(1008, 'Rate limit exceeded');
      return;
    }

    // Process message
  });
});

The API rate limiting guide covers broader rate limiting strategies if you want to go deeper.

Input Validation

Never trust incoming WebSocket messages. Validate and sanitise everything:

function validateChatMessage(data) {
  if (typeof data.text !== 'string') return false;
  if (data.text.length === 0 || data.text.length > 2000) return false;
  if (typeof data.room !== 'string') return false;
  return true;
}
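Sanitisation is a separate step from validation. For chat text that will be rendered into HTML, a minimal escape looks like the following; this is an illustrative helper, not a substitute for a vetted sanitisation library:

```javascript
// Escape the characters HTML cares about so user text cannot inject markup.
// The '&' replacement must run first so it does not re-escape the others.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```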

This is especially important because WebSocket messages often bypass middleware stacks that handle validation for HTTP routes. For a broader view of authentication approaches that work well with WebSockets, see the authentication patterns guide.

Common Pitfalls

Not Handling Backpressure

If the server sends messages faster than the client can process them, the server's send buffer grows until the process runs out of memory. Both the browser WebSocket and the ws library expose a bufferedAmount property; check it before sending and skip, queue, or drop when it grows too large:

function safeSend(ws, data) {
  if (ws.bufferedAmount > 1024 * 1024) {
    // 1 MB buffer, skip or queue
    return false;
  }
  ws.send(data);
  return true;
}

Ignoring Close Codes

WebSocket close codes tell you why a connection ended. Use them for debugging and for deciding whether to reconnect:

Code | Meaning                           | Reconnect?
1000 | Normal closure                    | No
1001 | Going away (page navigation)      | No
1006 | Abnormal closure (no close frame) | Yes
1008 | Policy violation                  | No
1011 | Server error                      | Yes, with backoff
1012 | Server restarting                 | Yes, with backoff
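The table translates directly into a reconnect decision on the client; the codes below come straight from it:

```javascript
// Decide whether to reconnect based on the WebSocket close code.
// 1006 = abnormal closure, 1011 = server error, 1012 = server restarting.
function shouldReconnect(code) {
  return code === 1006 || code === 1011 || code === 1012;
}

// Wired into the reconnecting client from earlier, this would look like:
// ws.addEventListener('close', (event) => {
//   if (shouldReconnect(event.code)) this.reconnect();
// });
```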

Forgetting About Proxies

Nginx, Cloudflare, and AWS ALB all require specific configuration to proxy WebSocket connections. Nginx, for example, needs explicit upgrade headers:

location /ws {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400s;
}

The proxy_read_timeout is critical. Without it, Nginx closes idle connections after 60 seconds by default, which kills long-lived WebSocket connections.

Libraries Worth Knowing

Rather than building everything from scratch, consider these options:

  • Socket.IO ↗: Adds automatic reconnection, rooms, namespaces, and fallback to long polling. Heavier than raw WebSockets, but handles many edge cases out of the box
  • ws (Node.js): Lightweight, fast, production-ready. Use this when you want full control
  • Cloudflare Durable Objects: If you are already on Cloudflare, Durable Objects provide WebSocket support with built-in state, no Redis needed

The right choice depends on your requirements. If you need rooms, presence, and reconnection logic immediately, Socket.IO saves significant development time. If you need minimal overhead and full control, raw ws is the better foundation.

Monitoring WebSocket Connections

WebSocket issues are harder to debug than HTTP issues because connections are long-lived and stateful. Track these metrics:

  • Active connections: Total count across all servers
  • Connection duration: How long connections typically live
  • Message throughput: Messages per second, inbound and outbound
  • Error rate: Close codes, failed upgrades, authentication failures
  • Latency: Time between send and acknowledgement

If you are building out your observability stack, the observability vs monitoring guide covers how these metrics fit into a broader strategy.
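A minimal in-process counter for the first three metrics might look like this. It is a sketch: a production system would export these to a metrics backend such as Prometheus rather than keep them in process memory:

```javascript
// In-memory counters for a single server process.
const metrics = {
  activeConnections: 0,
  messagesIn: 0,
  messagesOut: 0,
};

function onConnect() { metrics.activeConnections++; }
function onDisconnect() { metrics.activeConnections--; }
function onMessageIn() { metrics.messagesIn++; }

// Hooked into the ws server, this would look like:
// wss.on('connection', (ws) => {
//   onConnect();
//   ws.on('message', onMessageIn);
//   ws.on('close', onDisconnect);
// });
```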

When Not to Use WebSockets

WebSockets add complexity. Before reaching for them, ask whether a simpler approach works:

  • Polling every 30 seconds is fine? Use REST with setInterval
  • Only need server-to-client updates? Use Server-Sent Events
  • Low frequency updates? HTTP is simpler and more debuggable
  • Users only need fresh data on page load? Standard REST with cache headers

WebSockets shine when latency matters and data flows in both directions. For everything else, HTTP is simpler to build, test, deploy, and debug.

Getting Started

If you want to add real-time features to an existing application:

  1. Start with the simplest transport that solves your problem (often SSE, not WebSockets)
  2. Add heartbeats and reconnection from day one, not after your first production outage
  3. Plan for horizontal scaling early, even if you only run one server today
  4. Define your message protocol before writing any handler code
  5. Monitor connection counts and message throughput so you know your baseline before problems arise

WebSockets are a powerful tool when used for the right problems. The key is knowing when they are genuinely needed and building in resilience from the start.

Frequently asked questions

When should I use WebSockets instead of REST?

Use WebSockets when you need the server to push data to the client without the client requesting it first. Common examples include live chat, multiplayer games, collaborative editing, real-time dashboards, and live notifications. If your data only changes when the user takes an action (submitting a form, clicking a button), REST is simpler and more appropriate.

What is the difference between WebSockets and Server-Sent Events?

Server-Sent Events (SSE) provide a one-way channel from server to client over a standard HTTP connection. WebSockets provide a full-duplex, two-way channel. SSE is simpler to implement and works well for notifications, live feeds, and dashboards where the client only needs to receive updates. WebSockets are better when both sides need to send messages, such as in chat applications or collaborative tools.

Can WebSockets work behind a load balancer?

Yes, but you need sticky sessions or an external message broker. Because a WebSocket connection is persistent, all messages for a given client must reach the same server process. Sticky sessions route a client to the same backend. Alternatively, a pub/sub system like Redis lets any server broadcast to any connected client regardless of which server holds the connection.

How many concurrent WebSocket connections can a server handle?

A single modern server can handle tens of thousands of concurrent WebSocket connections. Each idle connection uses very little memory (roughly 10 to 50 KB depending on the library and buffering). The practical limit depends on your message throughput, payload size, server memory, and the work each message triggers. Load testing your specific workload is the only reliable way to find your ceiling.

Do WebSockets work on mobile networks?

Yes, but mobile connections are less reliable. Cellular networks frequently drop connections, switch between Wi-Fi and mobile data, and introduce higher latency. You need robust reconnection logic with exponential backoff, and your protocol should handle missed messages gracefully. A message queue or sequence numbering approach helps ensure clients catch up after a reconnect.
