Load Balancing Fundamentals
A load balancer sits in front of a server pool and distributes client requests across the pool members. The client sees only a single virtual IP (VIP); the load balancer transparently forwards each request to a back-end server. When a server fails, the load balancer detects the failure through health checks and removes the server from rotation, and the remaining servers absorb its traffic.
Layer 4 load balancing: distributes traffic based on network- and transport-layer information (IP addresses and TCP/UDP ports) without inspecting application content. Fast and efficient, but cannot make content-based decisions. Layer 7 load balancing: inspects HTTP content and can route requests based on URL path, cookies, headers, or host name. For HTTPS traffic, the load balancer must decrypt (terminate) TLS before it can see the path, cookies, or headers. Enables content switching: /images/* to image servers, /api/* to API servers.
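The content-switching rule above can be sketched as a prefix lookup on the request path. This is a minimal illustration, not how any particular product implements it; the pool names, the ROUTES table, and the choose_pool function are hypothetical.

```python
# Hypothetical routing table: (URL path prefix, pool name).
# Pool names are illustrative and not taken from any real product.
ROUTES = [
    ("/images/", "image-pool"),
    ("/api/", "api-pool"),
]
DEFAULT_POOL = "web-pool"  # assumed fallback for unmatched paths


def choose_pool(path: str) -> str:
    """Return the pool of the first prefix that matches the request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

A Layer 4 load balancer could not apply this rule, because the URL path is carried in the application payload, which it does not inspect.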
Load Balancing Algorithms
Round-robin: requests are distributed sequentially across servers, returning to the first server after the last. Simple, but the load is even only if servers have equal capacity and requests take roughly equal time. Weighted round-robin: same as round-robin, but servers with a higher weight receive proportionally more requests, which accommodates servers with different capacities.
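Both rotation schemes can be sketched with a repeating iterator. The function names and the weights dictionary are assumptions made for illustration. The weighted version uses the simplest possible approach (repeat each server as many times as its weight), which sends bursts of consecutive requests to a heavy server; real load balancers often interleave the picks more smoothly.

```python
from itertools import cycle


def round_robin(servers):
    """Yield servers in order, wrapping around indefinitely."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to their integer weights.

    weights: dict mapping server name -> positive integer weight.
    Naive expansion: a server with weight 3 appears 3 times per cycle.
    """
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return cycle(expanded)
```

For example, with weights {"big": 3, "small": 1}, every cycle of four requests sends three to "big" and one to "small".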
Least connections: new requests are sent to the server with the fewest active connections. Better suited than round-robin to sessions of variable duration. Weighted least connections also accounts for differences in server capacity. IP hash: a hash of the client's source IP determines which server receives the request, so the same client goes to the same server as long as the pool is unchanged (provides simple persistence without tracking state). Clients that share one public IP, such as users behind the same NAT gateway, all land on the same server. Random: requests are assigned randomly, which is simple but potentially uneven.
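The selection rules above can be sketched as small pure functions. All names here are hypothetical, and the connection counts are assumed to be maintained elsewhere by the load balancer. The IP hash uses SHA-256 only because it is stable across processes (Python's built-in hash() is randomized per run for strings); a real implementation would likely use a cheaper hash.

```python
import hashlib


def least_connections(active):
    """Pick the server with the fewest active connections.

    active: dict mapping server name -> current connection count.
    """
    return min(active, key=active.get)


def weighted_least_connections(active, weights):
    """Pick the server with the lowest connections-to-weight ratio."""
    return min(active, key=lambda s: active[s] / weights[s])


def ip_hash(client_ip, servers):
    """Map a client IP to a server deterministically, with no stored state."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Because ip_hash takes the hash modulo the pool size, adding or removing a server changes the mapping for most clients. That is the price of keeping no per-client state.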
Session Persistence and Health Checks
Session persistence (sticky sessions): ensures a client's requests always reach the same back-end server during a session. Important for stateful applications that store session data locally on the server. Methods include source IP affinity and cookie-based persistence (the load balancer inserts a cookie identifying the server). Without persistence, a user could be routed to a different server mid-session and lose their state.
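Cookie-based persistence can be sketched as follows. This is a simplified model, not a real product's behavior: the cookie name LB_SERVER, the StickyBalancer class, and the use of the plain server name as the cookie value are all assumptions. Production load balancers usually store an opaque or encoded identifier instead.

```python
from itertools import cycle

COOKIE_NAME = "LB_SERVER"  # hypothetical cookie name


class StickyBalancer:
    """Round-robin for new clients; cookie pins returning clients."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = cycle(self.servers)

    def route(self, cookies):
        """Return (server, cookies_to_set) for one request.

        cookies: dict of cookie name -> value sent by the client.
        """
        server = cookies.get(COOKIE_NAME)
        if server in self.servers:
            return server, {}  # known client: keep it on its server
        # New client, or its server left the pool: pick one and pin it.
        server = next(self._next)
        return server, {COOKIE_NAME: server}
```

The first request from a client carries no cookie, so it is balanced normally and the response sets the cookie; every later request that presents the cookie is sent to the same server.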
Health checks: the load balancer periodically tests each server's availability. Types: ICMP ping (basic: is the host reachable?), TCP connection check (is the port open?), HTTP/HTTPS GET request (is the application responding correctly?). Servers that fail health checks are removed from rotation, and servers that recover are added back. Active-passive load balancing: one server is primary, and a standby activates only when the primary fails. This provides failover, not load distribution.
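A TCP connection check and the remove/re-add logic can be sketched as below. The text does not say how many failed checks trigger removal, so the fall and rise thresholds (consecutive failures before removal, consecutive successes before re-adding) are assumptions, as are the names tcp_check and HealthTracker.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthTracker:
    """Track which servers are in rotation from check results."""

    def __init__(self, servers, fall=3, rise=2):
        self.fall = fall  # assumed: failures in a row before removal
        self.rise = rise  # assumed: successes in a row before re-adding
        self.healthy = {s: True for s in servers}
        # Consecutive results that contradict the current state.
        self.streak = {s: 0 for s in servers}

    def record(self, server, ok):
        """Record one check result and flip state if a threshold is met."""
        if ok == self.healthy[server]:
            self.streak[server] = 0
            return
        self.streak[server] += 1
        threshold = self.rise if ok else self.fall
        if self.streak[server] >= threshold:
            self.healthy[server] = ok
            self.streak[server] = 0

    def in_rotation(self):
        """Return the servers currently eligible for traffic."""
        return [s for s, up in self.healthy.items() if up]
```

In use, a periodic loop would call tracker.record(server, tcp_check(host, port)) for each pool member and pass tracker.in_rotation() to the balancing algorithm. Requiring several consecutive results before changing state keeps a single dropped probe from removing a healthy server.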