The four load balancer types
Application Load Balancer (ALB) operates at Layer 7, meaning it understands HTTP and HTTPS. This lets it make routing decisions based on the content of the request: path-based routing sends /api/* to one target group and /static/* to another, while host-based routing sends api.example.com and app.example.com to different backends. ALB also supports WebSocket connections and can authenticate users with Cognito or any OIDC-compliant identity provider. ALB is the default choice for web applications.
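The rule evaluation described above can be sketched in a few lines of Python. This is a simplified model, not the ALB implementation: the rule list, the priorities, and the target group names are hypothetical, and fnmatch wildcards only approximate ALB pattern matching.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules, evaluated in priority order (lowest number first).
# Each rule may constrain the host, the path, or both; None means "match anything".
RULES = [
    {"priority": 10, "host": "api.example.com", "path": None, "target_group": "api-hosts"},
    {"priority": 20, "host": None, "path": "/api/*", "target_group": "api-paths"},
    {"priority": 30, "host": None, "path": "/static/*", "target_group": "static-assets"},
]
DEFAULT_TARGET_GROUP = "web-default"  # stands in for the listener's default action


def route(host: str, path: str) -> str:
    """Return the target group of the first rule whose conditions all match."""
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        host_ok = rule["host"] is None or fnmatchcase(host.lower(), rule["host"])
        path_ok = rule["path"] is None or fnmatchcase(path, rule["path"])
        if host_ok and path_ok:
            return rule["target_group"]
    return DEFAULT_TARGET_GROUP


if __name__ == "__main__":
    print(route("app.example.com", "/api/users"))
    print(route("app.example.com", "/index.html"))
```

The first matching rule wins, so more specific rules need lower priority numbers than broader ones.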
Network Load Balancer (NLB) operates at Layer 4, handling TCP, UDP, and TLS traffic. It does not inspect the content of packets; it routes based on protocol and port only. NLB handles millions of requests per second with ultra-low latency and preserves the source IP address of the client. Use NLB when you need extreme performance, static IP addresses per AZ, or support for non-HTTP protocols.
Gateway Load Balancer (GWLB) is a specialized type that operates at Layer 3 and routes traffic through third-party virtual appliances such as firewalls, intrusion detection systems, and deep packet inspection tools. Traffic goes to GWLB, gets forwarded to the appliance fleet, comes back through GWLB, then continues to the destination. Gateway Load Balancer Endpoints connect VPCs to the appliance VPC.
Classic Load Balancer (CLB) is the original, legacy type. Recognize it by name, but avoid it in new designs.
Auto Scaling Groups and scaling policies
An Auto Scaling Group (ASG) defines a fleet of EC2 instances: a minimum count, a maximum count, and a desired count. The group uses a launch template to know what kind of instance to create and how to configure it. Instances are spread across Availability Zones automatically. When an instance fails a health check, the ASG terminates it and launches a replacement without manual intervention.
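As a rough illustration of the self-healing behavior, the sketch below models one reconciliation pass: clamp the desired count to the group's bounds, mark unhealthy instances for termination, and count the replacements needed. The function name and the id-to-health mapping are hypothetical, and the model ignores AZ balancing and scale-in of surplus healthy instances.

```python
def reconcile(instances: dict[str, bool], desired: int,
              min_size: int, max_size: int) -> tuple[list[str], int]:
    """Model one ASG reconciliation pass.

    instances maps an instance id to True (healthy) or False (unhealthy).
    Returns (ids to terminate, number of instances to launch).
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")

    # Desired capacity is always kept within [min_size, max_size].
    target = max(min_size, min(max_size, desired))

    # Instances that failed a health check are terminated...
    unhealthy = sorted(iid for iid, healthy in instances.items() if not healthy)
    healthy_count = len(instances) - len(unhealthy)

    # ...and enough new instances are launched to get back to the target.
    to_launch = max(0, target - healthy_count)
    return unhealthy, to_launch


if __name__ == "__main__":
    fleet = {"i-1": True, "i-2": False, "i-3": True}
    print(reconcile(fleet, desired=3, min_size=2, max_size=5))
```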
Scaling policies define when and how the ASG adjusts capacity. Target tracking scaling is the simplest: you define a target metric value, such as average CPU utilization at 50%, and the ASG adds or removes instances to stay near that target. Step scaling adjusts capacity by different amounts depending on how far the metric has moved past a CloudWatch alarm threshold. Scheduled scaling changes capacity at set times ahead of a known traffic event, like scaling up before a sale that starts at noon.
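The difference between target tracking and step scaling is easier to see with numbers. The sketch below is a simplified model and not the algorithm AWS runs: the proportional formula for target tracking is an assumption, and the step table, threshold, and function names are hypothetical.

```python
import math


def target_tracking_desired(current: int, metric: float, target: float,
                            min_size: int, max_size: int) -> int:
    """Estimate the capacity that brings the average metric back to the target.

    Assumes load is spread evenly, so the metric scales inversely with capacity.
    """
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


# Hypothetical scale-out steps: (lower bound, upper bound, instances to add).
# Bounds are offsets above the alarm threshold; lower is inclusive, upper exclusive.
STEPS = [
    (0, 10, 1),
    (10, 20, 2),
    (20, None, 4),  # None means "no upper bound"
]


def step_adjustment(metric: float, threshold: float, steps=STEPS) -> int:
    """Return how many instances to add for a given breach of the threshold."""
    breach = metric - threshold
    for lower, upper, change in steps:
        if breach >= lower and (upper is None or breach < upper):
            return change
    return 0  # metric is below the threshold, so the alarm is not in breach


if __name__ == "__main__":
    print(target_tracking_desired(current=4, metric=75, target=50, min_size=2, max_size=10))
    print(step_adjustment(metric=75, threshold=60))
```

Target tracking needs only the target value, while step scaling makes you spell out every band, which is why target tracking is usually the simpler choice.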
Predictive scaling uses machine learning to analyze historical traffic patterns and pre-scale before demand arrives. Instead of reacting after CPU spikes, it provisions instances before the spike based on past behavior. Cooldown periods prevent the ASG from adding or removing instances too rapidly in response to transient spikes. The termination policy controls which instance the ASG removes first when scaling in: the default policy first picks the Availability Zone with the most instances, then the instance using the oldest launch template or launch configuration, then the instance closest to the next billing hour.
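The default termination order can be read as a chain of filters. The sketch below is a simplified model under stated assumptions: the Instance fields are hypothetical, scale-in protection and Spot allocation strategies are ignored, and remaining ties are broken by instance id so the result is deterministic (AWS picks at random). It also includes the first step of the default policy, which favors the Availability Zone holding the most instances.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    template_version: int          # lower number = older launch template version
    minutes_to_billing_hour: int   # time left until the next billing hour


def pick_instance_to_terminate(instances: list[Instance]) -> Instance:
    """Apply the default scale-in filters in order and return one instance."""
    if not instances:
        raise ValueError("no instances to choose from")

    # 1. Keep only instances in the AZ(s) with the most instances.
    az_counts = Counter(i.az for i in instances)
    busiest = max(az_counts.values())
    candidates = [i for i in instances if az_counts[i.az] == busiest]

    # 2. Keep only instances on the oldest launch template or configuration.
    oldest = min(i.template_version for i in candidates)
    candidates = [i for i in candidates if i.template_version == oldest]

    # 3. Prefer the instance closest to the next billing hour.
    return min(candidates, key=lambda i: (i.minutes_to_billing_hour, i.instance_id))


if __name__ == "__main__":
    fleet = [
        Instance("i-a", "us-east-1a", 1, 30),
        Instance("i-b", "us-east-1a", 2, 5),
        Instance("i-c", "us-east-1a", 1, 10),
        Instance("i-d", "us-east-1b", 1, 1),
    ]
    print(pick_instance_to_terminate(fleet).instance_id)
```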
How to choose the correct answer
ALB: HTTP/HTTPS, Layer 7, path-based and host-based routing, WebSocket, user authentication. Default for web apps.
NLB: TCP/UDP/TLS, Layer 4, ultra-low latency, static IPs per AZ, preserves source IP. Choose for high performance or non-HTTP workloads.
GWLB: routes traffic through third-party network appliances. Use for inline inspection with firewall or IDS fleets.
Target tracking: simplest scaling policy, maintain a target metric value automatically.
Step scaling: scale by different increments based on severity of the metric breach.
Scheduled scaling: scale for predictable events.
Predictive scaling: scale based on ML forecast.
ALB target groups can be: EC2 instances, IP addresses, or Lambda functions. An ALB can itself be registered as a target of an NLB, but not of another ALB.
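The load balancer rules in this checklist can be condensed into a small decision helper. This is a study aid under stated assumptions, not an AWS API: the function name and parameters are hypothetical, and real designs often combine types (for example, an NLB in front of an ALB).

```python
def choose_load_balancer(protocol: str, needs_static_ip: bool = False,
                         inline_appliances: bool = False) -> str:
    """Map a requirement summary to ALB, NLB, or GWLB using the checklist above."""
    # Inline inspection by firewall or IDS fleets always points to GWLB.
    if inline_appliances:
        return "GWLB"

    protocol = protocol.upper()

    # Non-HTTP protocols or static IPs per AZ point to NLB.
    if protocol in {"TCP", "UDP", "TLS"} or needs_static_ip:
        return "NLB"

    # HTTP-level features (path/host routing, WebSocket, auth) point to ALB.
    if protocol in {"HTTP", "HTTPS", "WEBSOCKET"}:
        return "ALB"

    raise ValueError(f"unrecognized protocol: {protocol}")


if __name__ == "__main__":
    print(choose_load_balancer("https"))
    print(choose_load_balancer("udp"))
    print(choose_load_balancer("tcp", inline_appliances=True))
```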