ARCHITECTURE · 9 min read

IoT Fleet Tracking Over WebSockets: Architecture for 100K Devices

Henco Burger
March 10, 2026

Fleet tracking is one of the most demanding real-time workloads you can build. You have a large number of devices, each publishing position updates at a high rate, and a much smaller number of dashboards that need to display a subset of those devices in real time. The architecture has to handle constant inbound telemetry, selective outbound delivery, variable device connectivity, and occasional high-priority alert messages, all at the same time.

This post walks through a practical architecture for fleet tracking at the 100,000-device scale, including topic design, QoS choices, reconnection handling, and how to keep your infrastructure from becoming a bottleneck.

The Core Challenge

Fleet tracking has a few properties that make it harder than most real-time workloads.

Volume. A fleet of 100,000 vehicles publishing GPS positions every 5 seconds generates 20,000 messages per second. That is constant, not bursty. Your infrastructure needs to sustain that throughput indefinitely, not just survive a spike.
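This arithmetic is worth keeping as a small function so it can be rerun for other fleet sizes and intervals. Below is a minimal sketch. `estimateIngest` is a hypothetical helper, the 100-byte payload is an assumption borrowed from the payload-size guidance later in this post, and protocol framing is ignored.

```typescript
// Back-of-envelope ingest sizing for a fleet that publishes on a fixed interval.
interface IngestEstimate {
  messagesPerSecond: number
  bytesPerSecond: number
  messagesPerDay: number
}

function estimateIngest(
  devices: number,
  intervalSeconds: number,
  payloadBytes: number, // assumed average payload size, excluding framing
): IngestEstimate {
  const messagesPerSecond = devices / intervalSeconds
  return {
    messagesPerSecond,
    bytesPerSecond: messagesPerSecond * payloadBytes,
    messagesPerDay: messagesPerSecond * 86_400, // seconds per day
  }
}

// 100,000 devices every 5 s with ~100-byte payloads:
// 20,000 messages/s, 2,000,000 payload bytes/s, 1.728 billion messages/day
const fleet = estimateIngest(100_000, 5, 100)
```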

Fanout asymmetry. A single dispatcher dashboard might be watching 50 vehicles. Another might be watching 200. A fleet manager wants an overview of all 100,000. The delivery requirements vary enormously between subscribers, but the ingest pipeline is the same for all devices.

Variable connectivity. Fleet devices are not sitting on a stable office Wi-Fi connection. They are in tunnels, switching between cellular towers, parked in areas with poor signal, and sometimes simply powered off for hours at a time. Your architecture needs to handle reconnections gracefully without losing critical events.

Mixed message criticality. A GPS position update that is 10 seconds old is almost worthless. A geofence breach alert from 10 seconds ago is still very much worth delivering. These two message types need different handling.
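One way to make that difference explicit is a per-kind policy that both publishers and dashboards consult. This is a hypothetical sketch, not an SDK feature: the `MessageKind` names and the 10-second cutoff for positions are assumptions based on the numbers in this section.

```typescript
// Per-kind delivery policy: which QoS to publish with, and how old a
// message may be before it is no longer worth showing.
type MessageKind = 'position' | 'alert'

interface DeliveryPolicy {
  qos: 0 | 1
  maxAgeMs: number // Infinity means "never too old to deliver"
}

const POLICIES: Record<MessageKind, DeliveryPolicy> = {
  position: { qos: 0, maxAgeMs: 10_000 }, // a 10 s old position is nearly worthless
  alert: { qos: 1, maxAgeMs: Infinity }, // a breach stays relevant until handled
}

function isWorthDelivering(kind: MessageKind, eventTimestamp: number, now: number): boolean {
  return now - eventTimestamp <= POLICIES[kind].maxAgeMs
}
```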

Topic Design: Fleet vs Vehicle

The first architecture decision is how to organize topics. There are two main patterns.

One topic per vehicle: Each device publishes to its own topic, for example vehicle.v_abc123. Dashboards subscribe to individual vehicle topics for the vehicles they are tracking.

One topic per fleet with filtered subscriptions: All vehicles in a fleet publish to a shared topic, for example fleet.f_xyz. Dashboards subscribe to the fleet topic with a filter list specifying which vehicle IDs they want updates for.

For most workloads, the filtered subscription approach is better. It keeps the subscription count manageable. A dashboard watching 200 vehicles opens one subscription with 200 filters, rather than 200 individual subscriptions. Subscription overhead adds up: per-subscription state, subscribe and unsubscribe traffic, and routing table size all scale with subscription count.

import { NoLag } from '@nolag/js-sdk'

const client = NoLag('YOUR_TOKEN')
const room = client.setApp('fleet-tracker').setRoom('fleet-f_xyz')

// Device publishes its position
room.publish('positions', {
  key: 'vehicle.v_abc123',
  payload: {
    vehicleId: 'v_abc123',
    lat: -33.8688,
    lng: 151.2093,
    speed: 62,
    heading: 287,
    timestamp: Date.now(),
  },
  qos: 0, // fire and forget for position updates
})

// Dashboard subscribes to positions for visible vehicles only
// (visibleVehicleIds and updateVehicleMarker are defined by your application)
room.subscribe('positions', {
  filters: visibleVehicleIds,
})

room.on('positions', (msg) => updateVehicleMarker(msg.payload))

The exception is very large fleets where different parts of the fleet are managed by different teams. In that case, per-sub-fleet topics can make access control simpler, with each team only having permission to subscribe to their own fleet's topic.
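Wherever your application decides who may subscribe to what, the per-team rule reduces to a lookup like the one below. It is a hypothetical sketch: the team IDs, room names, and `canSubscribe` helper are illustrative, and how the check is actually enforced depends on your auth setup, which is not shown here.

```typescript
// Hypothetical access policy: each team may only subscribe to its own sub-fleet rooms.
const TEAM_FLEETS = new Map<string, ReadonlySet<string>>([
  ['team-north', new Set(['fleet-f_north'])],
  ['team-south', new Set(['fleet-f_south', 'fleet-f_south_reefer'])],
])

function canSubscribe(teamId: string, roomName: string): boolean {
  // Unknown teams get no access by default
  return TEAM_FLEETS.get(teamId)?.has(roomName) ?? false
}
```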

QoS Choices for Fleet Data

Not all fleet messages have the same delivery requirements. Getting QoS right is important both for reliability and for not overloading your infrastructure with unnecessary acknowledgment traffic.

QoS 0 (fire and forget) for position telemetry. GPS positions are high-frequency and time-sensitive. A position from 30 seconds ago has almost no value. If a packet drops, the next position update will arrive within seconds anyway. Using QoS 1 for position updates would mean storing and retrying thousands of messages per second, most of which would be stale by the time they are retried. QoS 0 is the right choice.
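Because each position supersedes the previous one, a dashboard only needs the newest position per vehicle and can ignore anything older that arrives late. A minimal sketch; the `VehiclePosition` shape mirrors the payload in the earlier example, and `LatestPositions` is a hypothetical helper rather than an SDK class.

```typescript
// Dashboard-side latest-value store: a newer position always wins, so a dropped
// or late QoS 0 update never needs to be retried.
interface VehiclePosition {
  vehicleId: string
  lat: number
  lng: number
  speed: number
  heading: number
  timestamp: number
}

class LatestPositions {
  private latest = new Map<string, VehiclePosition>()

  // Returns true if the update was applied, false if it is not newer than what we hold.
  apply(update: VehiclePosition): boolean {
    const current = this.latest.get(update.vehicleId)
    if (current && current.timestamp >= update.timestamp) return false
    this.latest.set(update.vehicleId, update)
    return true
  }

  get(vehicleId: string): VehiclePosition | undefined {
    return this.latest.get(vehicleId)
  }
}
```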

QoS 1 (at least once) for geofence alerts. When a vehicle breaches a geofence, that event needs to be delivered. Missing a geofence breach can mean a compliance violation or a missed theft alert. QoS 1 ensures the message is acknowledged and retried if the initial delivery fails.

// Geofence breach: use QoS 1
room.publish('alerts', {
  key: 'vehicle.v_abc123',
  payload: {
    type: 'geofence_breach',
    vehicleId: 'v_abc123',
    geofenceId: 'zone_restricted_01',
    breachType: 'exit',
    timestamp: Date.now(),
  },
  qos: 1, // at least once delivery
})

Separating alerts onto their own topic also lets dashboards manage two subscriptions with different characteristics: a high-volume QoS 0 feed for position updates, and a low-volume QoS 1 feed for alerts that need immediate attention.
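At-least-once delivery means a dashboard can occasionally receive the same alert twice, for example when an acknowledgment is lost and the message is retried. A bounded de-duplicator on the alert feed keeps those repeats out of the UI. This is a minimal sketch: `AlertDeduper` and the `alertId` scheme (vehicle, geofence, breach type, original timestamp) are assumptions; if your messages already carry a unique ID, use that instead.

```typescript
// Remembers recently seen alert IDs so at-least-once duplicates are shown only once.
class AlertDeduper {
  private seen = new Set<string>()
  private order: string[] = [] // insertion order, oldest first
  private readonly capacity: number

  constructor(capacity = 10_000) {
    this.capacity = capacity
  }

  // True the first time an ID is seen, false for duplicates.
  firstSeen(id: string): boolean {
    if (this.seen.has(id)) return false
    this.seen.add(id)
    this.order.push(id)
    if (this.order.length > this.capacity) {
      // Forget the oldest ID to bound memory use
      const evicted = this.order.shift()
      if (evicted !== undefined) this.seen.delete(evicted)
    }
    return true
  }
}

// Hypothetical ID scheme built from fields in the geofence alert payload.
function alertId(a: { vehicleId: string; geofenceId: string; breachType: string; timestamp: number }): string {
  return `${a.vehicleId}:${a.geofenceId}:${a.breachType}:${a.timestamp}`
}
```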

Reconnection and Replay for Spotty Devices

Fleet devices lose connectivity constantly: a delivery truck pulls into a basement loading dock, a train enters a tunnel, a vehicle switches between cellular providers in a rural area. The connection drops and eventually comes back.

For position data, reconnection is simple: the device reconnects and starts publishing again. No replay needed. The dashboard gap-fills naturally as new positions arrive.
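While a device is offline its marker simply stops moving, so it helps to show how fresh each position is. A hypothetical sketch: `vehicleStatus` and its thresholds of 2 and 12 missed publish intervals are illustrative assumptions, not values from any SDK.

```typescript
type VehicleStatus = 'live' | 'delayed' | 'offline'

// Classify a vehicle by how many publish intervals have passed since its last position.
function vehicleStatus(lastSeenMs: number, nowMs: number, intervalMs = 5_000): VehicleStatus {
  const missedIntervals = (nowMs - lastSeenMs) / intervalMs
  if (missedIntervals <= 2) return 'live'
  if (missedIntervals <= 12) return 'delayed' // up to about a minute at a 5 s interval
  return 'offline'
}
```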

For alert data, it is different. If a geofence breach happened while the device was offline, the device needs to publish that event when it reconnects. The device should buffer critical events locally and flush them on reconnect, publishing them with their original timestamps so the backend can process them in the correct order.

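// Device-side: buffer critical events during an outage
// (`client` is the NoLag client created in the first example)
interface AlertEvent {
  type: string
  vehicleId: string
  timestamp: number // original event time, kept when the event is replayed
  [field: string]: unknown
}

// Room type derived from the client instead of spelled out by hand
type Room = ReturnType<ReturnType<typeof client.setApp>['setRoom']>

class EventBuffer {
  private queue: AlertEvent[] = []

  enqueue(event: AlertEvent) {
    this.queue.push(event)
    this.persistToStorage() // survive process restarts
  }

  async flush(room: Room) {
    // Oldest first, so events are replayed in the order they happened
    while (this.queue.length > 0) {
      const event = this.queue[0]
      // Awaiting covers the case where publish returns a promise: a rejection
      // throws here and leaves the event at the head of the queue
      await room.publish('alerts', {
        key: `vehicle.${event.vehicleId}`,
        payload: event,
        qos: 1,
      })
      this.queue.shift() // remove only after publish returned without throwing
      this.persistToStorage()
    }
  }

  private persistToStorage() {
    // Write this.queue to durable local storage (file, SQLite, flash, ...)
  }
}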
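On the receiving side, a flushed alert can arrive minutes after alerts that actually happened later, so ordering by the original `timestamp` rather than by arrival time matters. A minimal sketch that keeps a timeline (for example, one per vehicle) sorted with a binary-search insert; `insertByEventTime` and `TimedEvent` are illustrative names, not part of any SDK.

```typescript
// Backend-side: order late-arriving events by their original event time.
interface TimedEvent {
  vehicleId: string
  timestamp: number // original event time set on the device
}

// Insert into an array kept sorted by timestamp. Events with equal timestamps
// keep their arrival order (the new one goes after existing equal entries).
function insertByEventTime<T extends TimedEvent>(timeline: T[], event: T): void {
  let lo = 0
  let hi = timeline.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (timeline[mid].timestamp <= event.timestamp) lo = mid + 1
    else hi = mid
  }
  timeline.splice(lo, 0, event)
}
```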

On the dashboard side, replay of missed alert messages can be handled by the message broker's retention window. If alerts are published with QoS 1 and a retention window configured, a dashboard that reconnects can request messages from the last known sequence number and receive anything it missed.
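The dashboard's half of that contract is remembering the last sequence number it processed. A minimal sketch, assuming the broker assigns increasing per-topic sequence numbers; `SequenceTracker` is a hypothetical helper, and how the resume point is passed back on reconnect depends on the SDK and is not shown. Because already-processed numbers are reported as duplicates, resuming from the last seen number itself (inclusive) is harmless.

```typescript
// Tracks the highest sequence number processed on one topic.
class SequenceTracker {
  private highest: number

  constructor(startAfter = 0) {
    this.highest = startAfter
  }

  // 'duplicate' for messages already processed (e.g. replay overlap),
  // 'gap' when one or more earlier sequence numbers were skipped, 'ok' otherwise.
  accept(seq: number): 'ok' | 'gap' | 'duplicate' {
    if (seq <= this.highest) return 'duplicate'
    const result = seq === this.highest + 1 ? 'ok' : 'gap'
    this.highest = seq
    return result
  }

  // Last sequence number processed; the starting point for a replay request.
  get lastSeen(): number {
    return this.highest
  }
}
```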

Scaling Considerations

At 100,000 devices publishing every 5 seconds, the numbers look like this: 20,000 inbound messages per second, each potentially fanned out to hundreds of dashboard subscriptions. If a dashboard is watching 50 vehicles out of 100,000, its filter match rate is 0.05%. From any one subscription's point of view, 99.95% of messages must be evaluated against its filters and dropped before they reach it. The infrastructure needs to do that evaluation quickly and cheaply.
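A common way to make that evaluation cheap is an inverted index from vehicle key to subscribers, so each inbound message costs one hash lookup rather than a scan of every filter list. This is a generic sketch of the technique, not a description of how NoLag implements filtering; `FilterIndex` is a hypothetical name. At the numbers above, a dashboard watching 50 vehicles that each publish every 5 seconds receives about 10 messages per second out of the 20,000 arriving.

```typescript
// Inverted index: vehicle key -> IDs of subscriptions that filter on it.
const EMPTY: ReadonlySet<string> = new Set()

class FilterIndex {
  private byKey = new Map<string, Set<string>>()

  subscribe(subscriberId: string, keys: string[]): void {
    for (const key of keys) {
      let subs = this.byKey.get(key)
      if (!subs) {
        subs = new Set()
        this.byKey.set(key, subs)
      }
      subs.add(subscriberId)
    }
  }

  unsubscribe(subscriberId: string, keys: string[]): void {
    for (const key of keys) {
      const subs = this.byKey.get(key)
      if (!subs) continue
      subs.delete(subscriberId)
      if (subs.size === 0) this.byKey.delete(key) // keep the index small
    }
  }

  // Subscribers for a message with this key; empty for most messages.
  match(key: string): ReadonlySet<string> {
    return this.byKey.get(key) ?? EMPTY
  }
}
```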

A few design choices help at this scale:

  • Keep payloads small. A GPS position needs only lat, lng, speed, heading, and a timestamp. That is under 100 bytes. Do not attach full vehicle records or lookup data to telemetry messages.
  • Use binary encoding for telemetry. MessagePack or a custom binary format typically cuts payload size by 30 to 50% compared to JSON, because field names and numbers are no longer sent as text. At 20,000 messages per second, that is meaningful bandwidth savings on your ingest pipeline.
  • Separate telemetry from metadata. Position updates and geofence alerts should be on different topics with different retention and QoS settings. Mixing them complicates scaling and retention policy management.
  • Limit dashboard subscription scope. A dashboard watching all 100,000 vehicles is the worst case for fanout. Design the UX to encourage focused views, for example by region or by job, so that each subscription filter list stays manageable.
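To make the binary-encoding point concrete, here is a fixed-layout encoding of the five position fields into 20 bytes using the standard `DataView` API. The layout (degrees scaled by 1e7 into 32-bit integers, little-endian, vehicle ID carried in the message key) is an assumption chosen for illustration; the same five fields serialized as JSON take roughly 80 bytes. How binary payloads are handed to `publish` is not covered here.

```typescript
// Fixed 20-byte layout for one position update:
//   int32   lat * 1e7   (about 1 cm resolution)
//   int32   lng * 1e7
//   uint16  speed       (whole units, as in the JSON example)
//   uint16  heading     (degrees, 0-359)
//   float64 timestamp   (ms since epoch; exact for integers below 2^53)
const POSITION_BYTES = 20

interface PositionFix {
  lat: number
  lng: number
  speed: number
  heading: number
  timestamp: number
}

function encodePosition(p: PositionFix): ArrayBuffer {
  const buf = new ArrayBuffer(POSITION_BYTES)
  const view = new DataView(buf)
  view.setInt32(0, Math.round(p.lat * 1e7), true) // true = little-endian
  view.setInt32(4, Math.round(p.lng * 1e7), true)
  view.setUint16(8, p.speed, true)
  view.setUint16(10, p.heading, true)
  view.setFloat64(12, p.timestamp, true)
  return buf
}

function decodePosition(buf: ArrayBuffer): PositionFix {
  const view = new DataView(buf)
  return {
    lat: view.getInt32(0, true) / 1e7,
    lng: view.getInt32(4, true) / 1e7,
    speed: view.getUint16(8, true),
    heading: view.getUint16(10, true),
    timestamp: view.getFloat64(12, true),
  }
}
```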

How NoLag Approaches IoT Workloads

NoLag's infrastructure is designed for asymmetric fanout workloads like fleet tracking. The filter evaluation happens at the connection layer, before messages are transmitted, so subscribers only receive what they asked for. QoS levels are configurable per message, so you can mix fire-and-forget telemetry and at-least-once alerts on the same topic without any special configuration, although separate topics still make retention policies simpler to manage.

The reconnection and replay system handles dashboard reconnects transparently. A fleet manager who closes their laptop and comes back in the morning can pick up any alerts they missed during the retention window, without any custom replay logic in the application.

For device-side connectivity, the SDK handles exponential backoff reconnection automatically. Devices that lose signal and regain it will reconnect and resume publishing without any application-level retry code.