How can I use multiple load balancers in a system design architecture diagram?

Using multiple load balancers (LBs) in a system design isn't just about handling more traffic; it’s about removing Single Points of Failure (SPOF) and organizing your architecture into manageable layers.

Here is how you can effectively integrate multiple load balancers into your design.

1. Multi-Tier Load Balancing (Layered Architecture)

The most common way to use multiple LBs is to place them at different layers of the "stack." This separates external traffic management from internal service communication.

Tier 1: Global/External LB: This sits at the edge of your network (often a DNS-based LB like AWS Route 53 or a Content Delivery Network). It routes traffic to the nearest geographic data center.
Tier 2: Regional/External LB: Once traffic hits a data center, a Layer 4 (Transport) or Layer 7 (Application) LB distributes it across web servers.
Tier 3: Internal LB: Used for communication between internal microservices. For example, your Web Server layer might talk to a "Payment Service" through an internal load balancer to ensure the payment service itself is highly available.

2. High Availability Pair (Redundancy)

Placing a single load balancer in front of your cluster creates a single point of failure: if the LB crashes, the whole system goes down. To solve this, use an active-passive (primary-secondary) or active-active setup.

Active-Passive: One LB handles traffic while the other stays on standby. They share a "Floating IP." If the primary fails, the secondary takes over the IP.
Active-Active: Both LBs handle traffic simultaneously. This is often managed via DNS (Round Robin) or BGP (Border Gateway Protocol), so that traffic is spread across both and the survivor keeps serving if one fails.
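
The active-passive takeover described above can be sketched in a few lines of Python. This is only a simulation under stated assumptions: the LoadBalancer and FloatingIP classes, the names lb-a and lb-b, and the address 203.0.113.10 are all hypothetical, and real setups usually rely on a protocol such as VRRP or a cloud provider's failover mechanism rather than application code.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    name: str
    healthy: bool = True


class FloatingIP:
    """Hypothetical virtual IP that is held by exactly one LB at a time."""

    def __init__(self, address: str, primary: LoadBalancer, secondary: LoadBalancer):
        self.address = address
        self.primary = primary
        self.secondary = secondary
        self.owner = primary  # the primary holds the IP at startup

    def heartbeat(self) -> LoadBalancer:
        """Run one health-check cycle and move the IP if the owner has failed."""
        if not self.owner.healthy:
            standby = self.secondary if self.owner is self.primary else self.primary
            if standby.healthy:
                self.owner = standby
        return self.owner


lb_a = LoadBalancer("lb-a")
lb_b = LoadBalancer("lb-b")
vip = FloatingIP("203.0.113.10", lb_a, lb_b)

print(vip.heartbeat().name)  # lb-a while the primary is healthy
lb_a.healthy = False
print(vip.heartbeat().name)  # lb-b after failover
```

In this sketch the IP does not move back automatically when the old primary recovers; whether to fail back is a design choice, and avoiding it prevents the IP from flapping between the two LBs.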

3. Specialized Load Balancing Types

When drawing your diagram, distinguish between a Network Load Balancer (NLB) and an Application Load Balancer (ALB) based on where they sit in the OSI model. Choosing between them depends on whether you need raw speed and protocol flexibility (NLB, Layer 4) or intelligent, content-aware routing (ALB, Layer 7).

Quick Decision Guide

Choose a Network Load Balancer (Layer 4) if you need to handle millions of requests per second with ultra-low latency, or if your traffic is not HTTP, such as raw TCP, UDP, or TLS passthrough. It is ideal for gaming, IoT, and real-time streaming.
Choose an Application Load Balancer (Layer 7) if you are running modern web applications or microservices that require routing based on URL paths, host headers, or cookies. It is the standard choice for HTTP/HTTPS traffic.

If using multiple load balancers, how do you decide which request goes to which load balancer?

When you have multiple load balancers, the decision of "who goes where" happens through a layered routing hierarchy. Think of it like a funnel: traffic starts broad and gets more specific as it moves deeper into your system.

1. The Entry Point: DNS & GSLB

The very first decision is made before the request even hits your servers. When a user types your URL, Global Server Load Balancing (GSLB) via DNS decides which regional load balancer to use.

Proximity/Latency: Routing the user to the data center closest to them, physically or in network latency (e.g., a London user goes to eu-west-2, the AWS London region).
Health: If the US-East data center is down, DNS sends everyone to US-West.
Weighted Round Robin: Sending 10% of traffic to a new "Canary" deployment to test features.
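
The three GSLB rules above can be combined in a small sketch. It assumes hypothetical Region records with a pre-measured latency and a health flag, and made-up deployment names "stable" and "canary"; a real DNS-based service makes these decisions from its own health checks and routing policies.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    latency_ms: float  # assumed: latency already measured from the user's location
    healthy: bool = True


def pick_region(regions: list[Region]) -> Region:
    """Return the healthy region with the lowest latency."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda r: r.latency_ms)


def pick_weighted(weights: dict[str, int], rng: random.Random) -> str:
    """Pick a deployment name in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


regions = [Region("us-east", 80.0), Region("us-west", 140.0)]
print(pick_region(regions).name)  # us-east: lowest latency among healthy regions
regions[0].healthy = False
print(pick_region(regions).name)  # us-west: us-east is skipped while unhealthy

rng = random.Random(0)
picks = [pick_weighted({"stable": 90, "canary": 10}, rng) for _ in range(10_000)]
print(picks.count("canary") / len(picks))  # close to 0.10
```

The weighted pick here is random rather than a strict round-robin rotation; over many requests both approaches send about the same share of traffic to the canary.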

2. The Network Layer: L4 Load Balancing

Once traffic hits a specific data center, a Network Load Balancer (NLB) often acts as the "front door." It doesn't look at the data inside the packet; it just looks at the IP address and Port.

Hash-based Routing: The NLB takes connection fields such as the Source IP and Port and runs a math function (hash) to pick a target. This keeps every packet of a given connection on the same backend. Because the source port changes between connections, keeping a user on one backend across connections (Sticky Sessions) requires hashing on the Source IP alone.
High Throughput: Since it doesn't "open" the packets, it can forward millions of requests per second to a fleet of internal Application Load Balancers.
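
As a minimal sketch of the hash-based routing described above, the function below maps a source IP and port to one target. The function name pick_target, the target names, and the choice of SHA-256 with a modulo are assumptions for illustration; real load balancers use their own hash functions and usually include more connection fields.

```python
import hashlib


def pick_target(src_ip: str, src_port: int, targets: list[str]) -> str:
    """Map a (source IP, source port) pair to one target deterministically."""
    key = f"{src_ip}:{src_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the list size.
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]


albs = ["alb-1", "alb-2", "alb-3"]
first = pick_target("198.51.100.7", 54321, albs)
second = pick_target("198.51.100.7", 54321, albs)
print(first == second)  # True: the same flow always maps to the same target
```

A plain modulo like this remaps most flows whenever a target is added or removed. Consistent hashing is the usual technique for limiting that reshuffling.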

3. The Application Layer: L7 Load Balancing

This is where the "smart" decisions happen. An Application Load Balancer (ALB) parses the HTTP request (host, path, headers, cookies) and makes decisions based on its content.

Path-based Routing: example.com/api goes to the API Cluster, while example.com/images goes to the Image Bucket.
Host-based Routing: app.example.com goes to one fleet, while blog.example.com goes to another.
Header/Cookie Routing: If a request has a cookie user_tier=premium, the ALB can route it to high-priority, faster servers.
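
The three rule types above can be sketched as one routing function. The target names (premium-fleet, app-fleet, and so on) and the rule order (cookie first, then host, then path) are assumptions made for this example; a real ALB evaluates rules in whatever priority order you configure.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


def route(url: str, headers: dict[str, str]) -> str:
    """Return the name of a hypothetical target group for one request."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"

    # Cookie rule: checked first here so premium users always reach the fast pool.
    cookies = SimpleCookie(headers.get("Cookie", ""))
    if "user_tier" in cookies and cookies["user_tier"].value == "premium":
        return "premium-fleet"

    # Host-based rules.
    if host == "app.example.com":
        return "app-fleet"
    if host == "blog.example.com":
        return "blog-fleet"

    # Path-based rules: match the exact prefix or anything beneath it.
    if path == "/api" or path.startswith("/api/"):
        return "api-cluster"
    if path == "/images" or path.startswith("/images/"):
        return "image-bucket"

    return "default-fleet"


print(route("https://example.com/api/users", {}))       # api-cluster
print(route("https://blog.example.com/post/1", {}))     # blog-fleet
print(route("https://example.com/", {"Cookie": "user_tier=premium"}))  # premium-fleet
```

Matching on "/api/" rather than just "/api" keeps an unrelated path such as /apiary from landing on the API cluster.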

My Tech Learnings

Saturday, March 7, 2026

System Design : Load Balancers