Handling Disconnects Gracefully: Reconnection and Message Replay

Every real-time application eventually has to answer a simple but uncomfortable question: what happens when the connection drops?

The honest answer, for most applications, is: nothing good. The client reconnects, but it has no idea what happened while it was offline. It either shows stale data, re-fetches everything from the REST API (expensive and slow), or just leaves the user staring at whatever was on screen when the disconnect happened. None of these is a good user experience.

Reconnection and message replay solve this properly. This post explains why disconnects happen, what the naive recovery approaches get wrong, and how a well-designed replay system makes disconnects invisible to users.

Why Disconnects Happen

Disconnects are not edge cases. They are a normal part of operating a real-time application in the real world.

Network switching. A user walks from their desk to a conference room. Their laptop drops Wi-Fi and picks up a different access point. That transition takes a few seconds and the WebSocket connection dies mid-handoff. The OS may not even surface this to the application immediately.

Mobile sleep. On iOS and Android, backgrounding an app causes the OS to aggressively suspend network activity to preserve battery. The WebSocket connection is severed. When the user brings the app to the foreground, it needs to reconnect and catch up on what it missed.

Train tunnels and dead zones. Users move through the physical world. Cellular coverage has gaps. A user reading a chat app on a commute will pass through areas with no signal. The connection drops and comes back, potentially multiple times in a single session.

Server deploys. Rolling out a new version of your backend closes existing WebSocket connections. If you deploy daily, every user who is connected at deploy time gets disconnected once per day. If your reconnection story is poor, that is a daily UX degradation for your entire active user base.

Idle timeouts. Load balancers, reverse proxies, and NAT gateways close idle connections. A user who has a chat window open but has not sent a message in 10 minutes may find their connection silently dropped by an intermediate piece of network infrastructure.

The Reconnection Problem

Reconnecting the WebSocket is the easy part. Any decent client SDK handles exponential backoff and reconnection automatically. The hard part is what comes after: the gap.
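For reference, the backoff an SDK applies usually looks something like the sketch below. The function name, the base delay, the cap, and the choice of "full jitter" are illustrative assumptions, not values taken from any particular SDK.

```typescript
// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
// Base delay, cap, and jitter strategy are assumptions for illustration.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30000,
  random: () => number = Math.random,
): number {
  // Exponential growth: base * 2^attempt, capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt)
  // Full jitter: pick uniformly in [0, ceiling) so that clients dropped by
  // the same event do not all retry at the same instant.
  return Math.floor(random() * ceiling)
}
```

The random source is injectable only so the function can be tested deterministically; in production the default `Math.random` is enough.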

The client was connected, received messages up to sequence number 1,042, then lost the connection for 90 seconds. During those 90 seconds, messages 1,043 through 1,081 were published to the topic. The client reconnects and resumes receiving from message 1,082 onward. Messages 1,043 through 1,081 are gone forever. The client's state is now inconsistent with reality.

In a chat application, those 39 messages might be a whole conversation thread the user missed. In a collaborative document, they are edits from other users that the local copy does not reflect. In a live trading dashboard, they are price updates that left the displayed prices stale.
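The arithmetic of the gap can be made explicit with a small sketch. The `findGap` helper and the `Gap` type are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: the range of messages a client missed, given the last
// sequence number it saw and the first one it received after reconnecting.
interface Gap {
  from: number
  to: number
  count: number
}

function findGap(lastSeen: number, firstAfterReconnect: number): Gap | null {
  // Consecutive sequence numbers mean nothing was missed.
  if (firstAfterReconnect <= lastSeen + 1) return null
  const from = lastSeen + 1
  const to = firstAfterReconnect - 1
  return { from, to, count: to - from + 1 }
}

const gap = findGap(1042, 1082)
// gap is { from: 1043, to: 1081, count: 39 }
```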

Naive Approaches and Why They Fall Short

Re-fetching Everything

The simplest recovery is to call your REST API on reconnect and reload the full state. This works, but it is slow and expensive. A chat room with 500 messages means fetching all 500 on every reconnect. A dashboard with 200 data points means 200 API calls or one large query. At scale, every server deploy triggers a stampede of re-fetch requests from every connected client simultaneously.

Ignoring the Gap

Some applications simply pick up from where they reconnect and show nothing for the gap period. This is the worst option. The user sees a discontinuity in their data without any indication of what they missed. In a chat app, conversation threads jump forward inexplicably. In a notification feed, important alerts are silently lost.

Client-Side Timestamps

A slightly better approach is for the client to send its last-received timestamp on reconnect and ask the server to resend anything newer. This is closer to correct but has problems. Client clocks drift. Timestamps can collide or reorder under high throughput. And it requires custom server-side logic per application to query and replay by timestamp.
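The collision problem is easy to reproduce. In this sketch (the message shape and the `newerThan` query are assumptions made for illustration), two messages share a millisecond, and a "newer than my last timestamp" query silently skips one of them.

```typescript
interface StampedMessage {
  id: string
  timestampMs: number
}

// Hypothetical server-side query: everything strictly newer than `since`.
function newerThan(log: StampedMessage[], since: number): StampedMessage[] {
  return log.filter((m) => m.timestampMs > since)
}

const log: StampedMessage[] = [
  { id: 'a', timestampMs: 1000 },
  { id: 'b', timestampMs: 1000 }, // same millisecond as 'a'
  { id: 'c', timestampMs: 1001 },
]

// The client received 'a', then disconnected before 'b' arrived.
const replayed = newerThan(log, 1000).map((m) => m.id)
// replayed is ['c']: 'b' shares a timestamp with 'a' and is never resent.
```

Switching the comparison to `>=` would resend 'a' as well, which pushes deduplication onto every client.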

Proper Replay: Sequence-Based Delivery

The right solution is sequence-based message delivery with server-side retention.

Every message published to a topic gets a monotonically increasing sequence number, assigned by the broker, not the client. The broker retains messages for a configurable window, say 24 hours or 7 days. When a client reconnects, it sends its last acknowledged sequence number. The broker replays every retained message with a higher sequence number, in order.

import { NoLag } from '@nolag/js-sdk'

const client = NoLag('YOUR_TOKEN')

// Connect to your app and room
const room = client.setApp('my-app').setRoom('room_abc')

// The SDK tracks the last received sequence number automatically.
// On reconnect, it sends this number and the broker replays missed messages.
room.subscribe('chat')

room.on('chat', (message) => {
  if (message.isReplay) {
    // This message was missed during a disconnect.
    // Add it to the UI silently, do not trigger a notification.
    appendMessageSilently(message.payload)
  } else {
    // This is a live message, trigger normal notification behavior.
    appendMessage(message.payload)
    showNotificationIfNeeded(message.payload)
  }
})

The sequence number is managed by the SDK, not your application code. You do not need to store it, persist it, or send it manually. The SDK tracks the last acknowledged sequence number per subscription and handles the replay request on reconnect transparently.
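On the broker side, the mechanism can be pictured as an append-only log per topic. The sketch below is an in-memory model under that assumption; `TopicLog` and its methods are hypothetical names, and a real broker would persist the log durably rather than hold it in an array.

```typescript
// Illustrative in-memory model of one topic on the broker.
interface StoredMessage<T> {
  seq: number
  publishedAtMs: number
  payload: T
}

class TopicLog<T> {
  private messages: StoredMessage<T>[] = []
  private nextSeq = 1
  private readonly retentionMs: number

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs
  }

  // The broker, not the client, assigns the next sequence number.
  publish(payload: T, nowMs: number): number {
    const seq = this.nextSeq++
    this.messages.push({ seq, publishedAtMs: nowMs, payload })
    return seq
  }

  // Drop messages that have aged out of the retention window.
  prune(nowMs: number): void {
    this.messages = this.messages.filter(
      (m) => nowMs - m.publishedAtMs <= this.retentionMs,
    )
  }

  // Everything the client has not seen yet, in sequence order.
  replayAfter(lastSeq: number): StoredMessage<T>[] {
    return this.messages.filter((m) => m.seq > lastSeq)
  }
}

const chat = new TopicLog<string>(24 * 60 * 60 * 1000) // 24-hour window
chat.publish('hello', 0)
chat.publish('anyone there?', 1000)
chat.publish('back now', 2000)

// A client that last saw sequence 1 reconnects.
const missed = chat.replayAfter(1).map((m) => m.seq)
// missed is [2, 3]
```

Because sequence numbers come from a single counter on the broker, they cannot collide or reorder the way client timestamps can.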

The isReplay Flag

The isReplay flag is a small but important detail that most teams get wrong the first time.

When a user reconnects and the broker replays 50 missed messages, those messages need to be added to the UI to restore consistency. But they should not trigger notifications. If a chat app shows a desktop notification for every replayed message, a user who lost connectivity for an hour might come back to 200 notification popups for messages they already saw on another device, or messages from a conversation they have muted. That is a horrible experience.

The isReplay flag lets your message handler distinguish between live messages (trigger notifications, animate in) and replayed messages (restore state silently). Live messages deserve the full treatment: badge counts increment, notifications fire, scroll behavior updates. Replayed messages should just appear in their correct position in the timeline, quietly.

function handleChatMessage(message: NoLagMessage) {
  // Check the scroll position before the list grows, so the new message
  // itself does not make it look like the user has scrolled up.
  const wasAtBottom = isScrolledToBottom()

  // Always add to the message list
  chatStore.addMessage(message.payload)

  // Only do notification side effects for live messages
  if (!message.isReplay) {
    badgeCount.increment()
    if (document.hidden) {
      showDesktopNotification(message.payload)
    }
    if (wasAtBottom) {
      scrollToLatestMessage()
    }
  }
}

Configurable Retention Windows

Retention windows are a tradeoff between coverage and storage cost. A longer window means clients that were offline for longer can catch up. A shorter window is cheaper to operate.

The right window depends on your use case. A chat application where users check in daily probably wants at least 24 hours of retention. A live trading dashboard where data older than a few minutes is worthless might want 5 minutes. An IoT alert stream might want 7 days to handle devices that lose connectivity for extended periods.

If a client reconnects after the retention window has expired, full replay is not possible. The SDK detects this, and your application can respond, typically by triggering a fresh data load from your REST API for the affected subscriptions. This is the same as the re-fetch approach, but it only fires when the gap is too large to replay, not on every reconnect.
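The decision between replaying and reloading comes down to one comparison. This is a sketch of that logic under assumed names (`planResume`, `ResumePlan`, `oldestRetainedSeq`), not a description of how any specific SDK exposes it.

```typescript
// Hypothetical decision made on reconnect. `oldestRetainedSeq` is the lowest
// sequence number still inside the broker's retention window.
type ResumePlan =
  | { kind: 'replay'; fromSeq: number }
  | { kind: 'reload' }

function planResume(lastSeq: number, oldestRetainedSeq: number): ResumePlan {
  // The next message the client needs is lastSeq + 1. If that one has
  // already been pruned, a replay would leave a hole, so reload instead.
  if (lastSeq + 1 < oldestRetainedSeq) return { kind: 'reload' }
  return { kind: 'replay', fromSeq: lastSeq + 1 }
}

const shortOutage = planResume(1042, 1000) // { kind: 'replay', fromSeq: 1043 }
const longOutage = planResume(1042, 5000) // { kind: 'reload' }
```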

How This Works in NoLag SDKs

In the NoLag client SDK, all of this is automatic. The SDK maintains the last sequence number per subscription in memory. On reconnect, it sends the sequence number to the broker and receives replayed messages before live messages resume. Messages are delivered to your message handler in order, with the isReplay flag set correctly.
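To make that bookkeeping concrete, here is a simplified model of what per-subscription tracking could look like. Every name in it (`SubscriptionCursor`, `Delivery`, `resumeFrom`, `receive`) is hypothetical; it is a sketch of the idea, not the NoLag SDK's internal code.

```typescript
// Simplified model of per-subscription sequence tracking.
interface Delivery<T> {
  seq: number
  payload: T
  isReplay: boolean
}

class SubscriptionCursor<T> {
  private lastSeq = 0
  private readonly handler: (d: Delivery<T>) => void

  constructor(handler: (d: Delivery<T>) => void) {
    this.handler = handler
  }

  // Sequence number to send to the broker on reconnect.
  resumeFrom(): number {
    return this.lastSeq
  }

  // Called for every message off the wire, replayed or live.
  receive(d: Delivery<T>): void {
    // Drop anything already delivered, for example a message that shows up
    // both in the replay batch and on the live stream.
    if (d.seq <= this.lastSeq) return
    this.lastSeq = d.seq
    this.handler(d)
  }
}

const seen: string[] = []
const cursor = new SubscriptionCursor<string>((d) => {
  seen.push(`${d.seq}:${d.isReplay ? 'replay' : 'live'}`)
})

cursor.receive({ seq: 1, payload: 'a', isReplay: false })
// Disconnect. On reconnect the client would send cursor.resumeFrom(), here 1.
cursor.receive({ seq: 2, payload: 'b', isReplay: true })
cursor.receive({ seq: 3, payload: 'c', isReplay: false })
cursor.receive({ seq: 2, payload: 'b', isReplay: true }) // duplicate, dropped
// seen is ['1:live', '2:replay', '3:live']
```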

You do not write reconnection logic. You do not manage sequence numbers. You do not implement retry backoff. The SDK handles the transport layer and gives you a clean, ordered stream of messages with enough context to handle replayed vs live messages differently. Your application code only sees the message stream.