Modern web applications increasingly rely on real-time communication features, such as chat systems, live notifications, collaborative editing, and telemetry dashboards. WebSocket technology is the primary standard for achieving low-latency, full-duplex communication between client and server. However, building a scalable and resilient WebSocket-based system presents architectural challenges — particularly when it comes to isolating business logic from connection management.
This document outlines a reference architecture designed to separate WebSocket handling from business logic, ensuring that deployments, crashes, or logic updates do not disrupt client connections. The design supports scalability, maintainability, and fault isolation.
Why design a dedicated WebSocket gateway architecture instead of embedding real-time logic directly into your monolith or core services? The primary motivation is to solve the most common and costly issues encountered in production WebSocket systems:
Connection Fragility: If business logic resides in the same service that handles WebSocket connections, a crash or deployment can instantly sever all active connections.
Memory Leaks and Stability: A memory leak in the business logic layer can bring down a WebSocket server handling hundreds or thousands of concurrent clients.
Deployment Agility: Developers need to deploy new versions of business logic frequently. If connections are coupled to this layer, every deployment risks affecting end-user experience.
Scalability: Applications grow in usage. A tightly coupled WebSocket and logic layer cannot scale independently, limiting flexibility in resource allocation.
By clearly separating responsibilities — connection handling, messaging, and business logic — this architecture enables safer deployments, horizontal scalability, and more resilient systems.
A good real-time architecture must consider the entire lifecycle of a message: from client connection to delivery, processing, and response. Each of the components below plays a critical role in this flow.
The load balancer serves as the public entry point, routing traffic from clients to the WebSocket gateway layer. It must support WebSocket upgrades and, if necessary, provide session stickiness.
Why it matters: Without a load balancer, connections cannot be distributed across gateway replicas. In setups without external state, sticky sessions ensure clients are routed to the same gateway each time.
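As a concrete illustration, here is a minimal NGINX sketch of WebSocket-aware load balancing. The upstream name, server addresses, and path are hypothetical; the `Upgrade`/`Connection` headers and HTTP/1.1 requirement are what actually make WebSocket upgrades work through a proxy.

```nginx
# Hypothetical upstream of WebSocket gateway replicas.
upstream ws_gateway {
    # ip_hash gives session stickiness; omit it when connection
    # state lives in a shared store such as Redis.
    ip_hash;
    server gateway-1:8080;
    server gateway-2:8080;
}

server {
    listen 443 ssl;

    location /ws {
        proxy_pass http://ws_gateway;
        # WebSocket upgrades require HTTP/1.1 plus the Upgrade headers.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Long-lived connections need a generous read timeout.
        proxy_read_timeout 3600s;
    }
}
```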
The WebSocket gateway is the heart of the real-time communication system. It manages WebSocket connections, authentication, message routing, and client presence. Importantly, it performs no business-specific computation.
Why decouple logic: This separation means you can restart or scale business logic independently of connected users. Your WebSocket connections remain alive even when you deploy new backend features.
Gateways need to track which users are connected and what channels they're subscribed to. Storing this data in a shared, fast-access store like Redis allows the gateways to remain stateless and enables load-balanced setups.
Why Redis: It allows you to failover between replicas, reassign clients, and perform presence tracking efficiently. This is crucial when operating multiple gateway replicas.
The message bus decouples the gateway from backend services, enabling asynchronous processing and scalability. It acts as a real-time message router that handles fan-out, ordering, and retries (if needed).
Why not direct HTTP only: While HTTP works for basic commands or low-scale setups, a message bus is critical for queueing, throughput control, and decoupling services under high concurrency.
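The fan-out role of the bus can be shown with a tiny in-process stand-in built on `asyncio` queues. A real deployment would use NATS, Kafka, or Redis Pub/Sub; the subject names and API shape here are illustrative only.

```python
import asyncio
from collections import defaultdict

# Minimal in-process pub/sub standing in for a real message bus.
class MessageBus:
    def __init__(self):
        self._subscribers = defaultdict(list)  # subject -> list of queues

    def subscribe(self, subject: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._subscribers[subject].append(queue)
        return queue

    async def publish(self, subject: str, message: dict) -> None:
        # Fan-out: every subscriber on the subject receives its own copy.
        for queue in self._subscribers[subject]:
            await queue.put(dict(message))

async def demo():
    bus = MessageBus()
    inbox_a = bus.subscribe("chat.room42")   # e.g. a chat worker
    inbox_b = bus.subscribe("chat.room42")   # e.g. an analytics worker
    await bus.publish("chat.room42", {"from": "alice", "text": "hi"})
    return await inbox_a.get(), await inbox_b.get()

print(asyncio.run(demo()))
```

The key property is that the publisher (the gateway) never knows which, or how many, consumers exist; adding an analytics worker requires no gateway change.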
Stateless microservices or functions handle specific business operations like chat processing, analytics, alerts, or notifications. These services subscribe to relevant events and emit responses without managing connection state.
Why stateless: Stateless services simplify scaling, resilience, and observability. If a worker crashes, another can take over seamlessly.
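"Stateless" here means each worker is effectively a pure function of the incoming event. A hypothetical chat handler might look like this (the event shape is invented for illustration):

```python
# A stateless worker holds no connection state, so any replica can
# process any message; a crashed worker's events are simply retried
# elsewhere. The event/response shapes below are illustrative.
def handle_chat_event(event: dict) -> dict:
    text = event["text"].strip()
    if not text:
        return {"type": "error", "reason": "empty message"}
    return {
        "type": "chat.message",
        "channel": event["channel"],
        "from": event["from"],
        "text": text,
    }

# Two "replicas" of the same code produce identical results,
# which is what makes crash-failover seamless.
event = {"channel": "room:42", "from": "alice", "text": "  hello  "}
print(handle_chat_event(event))
```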
Below is a visual representation of the architecture's communication path:
Client ↔ Load Balancer ↔ WebSocket Gateway ↔ Message Bus/HTTP ↔ Business Logic
                                 ↕
                  Redis (ephemeral connection mapping)
Clients connect via WebSocket and authenticate at the gateway.
Gateway stores connection metadata in Redis.
Messages are published to a message bus or sent over HTTP.
Backend services consume messages, process them, and respond via the same channel.
Gateway delivers response back to the correct client.
This flow ensures flexibility and robust fault isolation at each step.
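The five steps above can be sketched end-to-end in miniature. Plain dicts and lists stand in for Redis, the bus, and the business service, and all function names are illustrative; the point is only to show how the pieces hand a message along.

```python
# End-to-end sketch of the message flow, with in-memory stand-ins.
connections = {}        # step 2: Redis-style map of user_id -> gateway_id
bus = []                # step 3: message bus as a simple FIFO queue
outboxes = {}           # step 5: per-client delivery queues at the gateway

def gateway_accept(user_id, gateway_id):
    # Steps 1-2: client connects and authenticates; gateway records it.
    connections[user_id] = gateway_id
    outboxes[user_id] = []

def gateway_publish(user_id, text):
    # Step 3: gateway forwards the client's message onto the bus.
    bus.append({"from": user_id, "text": text})

def worker_drain():
    # Step 4: a stateless backend service consumes and responds.
    while bus:
        event = bus.pop(0)
        reply = {"to": event["from"], "text": event["text"].upper()}
        # Step 5: the gateway owning the connection delivers the reply.
        if reply["to"] in connections:
            outboxes[reply["to"]].append(reply)

gateway_accept("alice", "gw-1")
gateway_publish("alice", "hello")
worker_drain()
print(outboxes["alice"])   # [{'to': 'alice', 'text': 'HELLO'}]
```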
Scaling out each layer independently allows for better resource utilization and operational control. Below are guidelines for deploying each component:
Load Balancer:
Must support connection upgrades.
Consider session stickiness if skipping Redis.
Gateway:
Stateless by design.
Horizontally scalable based on connection count.
Redis:
Clustered setup preferred for high availability.
Monitor memory usage, TTLs, and eviction policies.
Message Bus:
Choose based on delivery guarantees and performance needs.
NATS or Redis Pub/Sub for lightweight needs; Kafka when durability matters.
Backend Services:
Autoscale based on queue depth or system load.
Stateless containers/functions simplify orchestration.
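Autoscaling on queue depth reduces to a small calculation, in the style of a horizontal autoscaler: provision enough replicas that each handles roughly a target number of queued messages, clamped to sane bounds. The target and bounds below are illustrative.

```python
import math

# Desired-replica calculation: scale workers so each replica handles
# roughly `target_per_replica` queued messages, within [min, max].
def desired_replicas(queue_depth: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(queue_depth=2500, target_per_replica=500))  # 5
```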
No system design is perfect. Here's a breakdown of the main trade-offs this architecture introduces:
| Trade-off | Benefit | Cost |
| --- | --- | --- |
| Gateway separate from logic | Resilience and deployability | Added network hop, requires bus integration |
| Stateless gateways + Redis | Load-balanced scaling and failover | Redis becomes a critical dependency |
| Message bus | Loose coupling and async processing | Slight increase in latency |
| Decoupled business services | Independent deployments | Harder to trace full message lifecycle |
| Horizontal scaling | Performance under load | Requires good observability setup |
Choosing where to invest depends on your traffic patterns, developer velocity, and fault tolerance requirements.
This architecture is designed to degrade gracefully. If a backend worker crashes, another takes its place. If a gateway crashes, another replica can serve the client. Redis maintains the state needed to enable this smooth recovery.
No single point of failure: All core components can run in high-availability configurations.
Zero-downtime deployments: Push new code to backend services without interrupting live connections.
Client reconnections: If clients reconnect, Redis provides the context for restoring sessions or resubscribing to channels.
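Client reconnection deserves one concrete detail: clients should back off exponentially with jitter, so that after a gateway failover thousands of clients do not reconnect in the same instant. A minimal sketch of the delay schedule (base, cap, and "full jitter" strategy are illustrative choices):

```python
import random

# Reconnect delays: exponential backoff capped at a maximum, with
# jitter so reconnecting clients don't stampede the gateways.
def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    delays = []
    for attempt in range(attempts):
        upper = min(cap, base * (2 ** attempt))   # 0.5, 1, 2, 4, ... capped
        delays.append(random.uniform(0, upper))   # "full jitter"
    return delays

print(backoff_delays(5))
```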
WebSocket communication requires additional care:
Use TLS end-to-end to prevent message sniffing or tampering.
Employ JWT-based auth to validate user identity at connection time.
Apply rate limits at the gateway level to prevent abuse or DDoS.
Optionally, sign messages exchanged between services to protect against spoofing.
| Layer | Options |
| --- | --- |
| Load Balancer | AWS ALB, NGINX, HAProxy |
| Gateway | FastAPI (Python), Centrifugo |
| State Store | Redis |
| Message Bus | NATS, Kafka, Redis Pub/Sub |
| Business Logic | Microservices (any language) |
This architecture can support a wide range of real-time systems:
Chat/Messaging: Room-based messaging, typing indicators, message delivery status.
Collaborative Editing: Broadcasting changes in shared documents (e.g., text, whiteboards).
Live Dashboards: Real-time event data visualizations and alerting.
IoT Telemetry: Aggregating data from connected devices and forwarding it for processing.
This reference architecture provides a blueprint for building robust, scalable, and decoupled WebSocket systems suitable for modern real-time applications. By separating connection handling from business logic and introducing a message bus as an intermediary, the architecture supports zero-downtime deployments, fault isolation, and dynamic scaling. Redis further reinforces resilience by externalizing state.
Although initial setup may appear more complex than a monolithic WebSocket app, the operational benefits become clear at scale. Teams can iterate faster on business logic without fear of impacting connections, and infrastructure teams can scale WebSocket handling independently of application development.
This architecture is flexible enough to accommodate different technologies and can be adjusted based on performance, security, or simplicity requirements. Whether starting with a basic stack (FastAPI + Redis + HTTP) or evolving into a more distributed model (Centrifugo + NATS + microservices), the separation of concerns remains a key principle that ensures longevity and maintainability of the system.
                       ┌────────────────────────────┐
                       │        Load Balancer       │
                       └────────────┬───────────────┘
                                    │
             ┌──────────────────────┴───────────────────────┐
             │                                              │
     ┌───────▼────────┐                            ┌────────▼────────┐
     │ WebSocket GW #1│                            │ WebSocket GW #N │
     └───────┬────────┘                            └────────┬────────┘
             │                                              │
      ┌──────▼───────┐                               ┌──────▼───────┐
      │    Redis     │◄─────────────────────────────►│    Redis     │
      └──────┬───────┘                               └──────────────┘
             │
     ┌───────▼────────────────────┐
     │        Message Bus         │
     └───────┬──────────┬─────────┘
             │          │
   ┌─────────▼──┐    ┌──▼────────┐
   │ Business A │    │ Business B│
   └────────────┘    └───────────┘
Each component can be scaled and deployed independently, creating a system that is resilient to failures and designed for modern real-time needs!