
AWS Elastic Load Balancing and Auto Scaling Explained for AWS SAA-C03

Traffic to a web application behaves like highway traffic. Most of the time the road is clear. At predictable peaks, it gets busy. Occasionally an unexpected event floods it. Running enough servers to handle the worst-case peak all the time is expensive and wasteful during quiet periods. Running only enough for the average load means users suffer when traffic spikes. Elastic Load Balancing distributes incoming requests across however many healthy instances are running. Auto Scaling adjusts how many instances are running to match actual demand. Together, they let you pay for roughly the capacity you use while keeping the application responsive as traffic rises and falls. The SAA-C03 exam tests which load balancer type fits which scenario and how scaling policies work.


The four load balancer types

Application Load Balancer (ALB) operates at Layer 7, meaning it understands HTTP and HTTPS. This lets it route on the content of each request: path-based routing sends /api/* to one target group and /static/* to another, and host-based routing sends requests for api.example.com and app.example.com to different backends. ALB also supports WebSocket connections and can authenticate users through Amazon Cognito or any OpenID Connect (OIDC) identity provider. ALB is the default choice for web applications.
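The path and host rules can be modeled in a few lines. This is a toy Python model, not the ELB API: the `Rule` class and `route` function are hypothetical, and `fnmatch` wildcards stand in for ALB's `*` and `?` path-pattern syntax (fnmatch also accepts `[...]` character classes, which ALB does not). As on an ALB listener, rules are checked in priority order, host matching ignores case, path matching does not, and unmatched requests fall through to the listener's default action.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Rule:
    priority: int               # lower number is evaluated first
    target_group: str           # hypothetical target group name
    host: Optional[str] = None  # host-header condition, e.g. "api.example.com"
    path: Optional[str] = None  # path-pattern condition, e.g. "/api/*"

    def matches(self, host: str, path: str) -> bool:
        # Every condition that is set must match; unset conditions are ignored.
        # Host comparison is case-insensitive, path comparison is case-sensitive.
        if self.host is not None and not fnmatchcase(host.lower(), self.host.lower()):
            return False
        if self.path is not None and not fnmatchcase(path, self.path):
            return False
        return True


def route(rules, default_tg: str, host: str, path: str) -> str:
    # Evaluate rules in priority order; fall back to the default action.
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(host, path):
            return rule.target_group
    return default_tg


rules = [
    Rule(10, "api-tg", path="/api/*"),
    Rule(20, "static-tg", path="/static/*"),
    Rule(30, "app-tg", host="app.example.com"),
    Rule(40, "api-host-tg", host="api.example.com"),
]
```

With these rules, `route(rules, "default-tg", "www.example.com", "/api/users")` returns `"api-tg"`, and a request for `app.example.com/home` goes to `"app-tg"`.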

Network Load Balancer (NLB) operates at Layer 4, handling TCP, UDP, and TLS traffic. It does not inspect request content; it picks a target using connection information such as protocol, IP addresses, and ports. NLB handles millions of requests per second with ultra-low latency and preserves the client's source IP address. Use NLB when you need extreme performance, a static IP address per AZ, or support for non-HTTP protocols.
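A rough sketch of Layer 4 target selection, assuming a simple hash over the connection's protocol, addresses, and ports. AWS describes NLB routing as a flow hash but does not publish the function, so SHA-256 here is an arbitrary stand-in, and `Flow`, `pick_target`, and `forward` are hypothetical names. The point is that the payload is never read, every packet of one connection reaches the same target, and the target sees the client's own IP address.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def pick_target(flow: Flow, targets: Sequence[str]) -> str:
    # Hash the connection tuple so every packet of one flow lands on the same target.
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]


def forward(flow: Flow, targets: Sequence[str]) -> dict:
    # Layer 4: only addresses and ports are consulted; the payload is never read.
    # The client's source IP is passed through unchanged to the chosen target.
    return {
        "target": pick_target(flow, targets),
        "client_ip_seen_by_target": flow.src_ip,
    }
```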

Gateway Load Balancer (GWLB) is a specialized type that routes traffic through third-party virtual appliances like firewalls, intrusion detection systems, and deep packet inspection tools. Traffic goes to GWLB, gets forwarded to the appliance fleet, comes back through GWLB, then continues to the destination. Gateway Load Balancer Endpoints connect VPCs to the appliance VPC. Classic Load Balancer (CLB) is the original, now-legacy type: recognize it by name, but avoid it in new designs.

Auto Scaling Groups and scaling policies

An Auto Scaling Group (ASG) manages a fleet of EC2 instances defined by a minimum count, a maximum count, and a desired count. The group uses a launch template to know what kind of instance to create and how to configure it. The ASG spreads instances across the Availability Zones of the subnets you select and rebalances them if the zones become uneven. When an instance fails a health check, the ASG terminates it and launches a replacement without manual intervention.
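The group's bookkeeping can be sketched as a small reconcile loop. This is a simplified Python model, not how EC2 Auto Scaling is implemented, and every class and field name is hypothetical. It shows three behaviors: desired capacity stays between minimum and maximum, unhealthy instances are dropped and replaced, and new instances go to the least-populated Availability Zone. The optional `"ELB"` health check type mirrors the ASG setting that adds load balancer health checks on top of EC2 status checks.

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)  # hypothetical instance-id generator


@dataclass
class Instance:
    id: str
    az: str
    ec2_status_ok: bool = True
    elb_health_ok: bool = True


@dataclass
class AutoScalingGroup:
    min_size: int
    max_size: int
    desired: int
    azs: list
    health_check_type: str = "EC2"  # "EC2" or "ELB"
    instances: list = field(default_factory=list)

    def __post_init__(self):
        if not self.min_size <= self.desired <= self.max_size:
            raise ValueError("require min <= desired <= max")
        self.reconcile()

    def set_desired(self, n: int) -> None:
        # Simplification: clamp to [min, max]. The real SetDesiredCapacity API
        # rejects out-of-range values instead of clamping them.
        self.desired = max(self.min_size, min(self.max_size, n))
        self.reconcile()

    def is_healthy(self, inst: Instance) -> bool:
        # EC2 status checks always count; the ELB check counts only when enabled.
        if not inst.ec2_status_ok:
            return False
        return self.health_check_type != "ELB" or inst.elb_health_ok

    def reconcile(self) -> None:
        # 1. Terminate unhealthy instances (step 3 replaces them).
        self.instances = [i for i in self.instances if self.is_healthy(i)]
        # 2. Scale in if above desired (a real ASG applies its termination policy).
        while len(self.instances) > self.desired:
            self.instances.pop()
        # 3. Launch into the AZ with the fewest instances to keep zones balanced.
        while len(self.instances) < self.desired:
            counts = {az: 0 for az in self.azs}
            for inst in self.instances:
                counts[inst.az] += 1
            az = min(self.azs, key=lambda a: counts[a])
            self.instances.append(Instance(f"i-{next(_ids):04d}", az))
```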

Scaling policies define when and how the ASG adjusts capacity. Target tracking scaling is the simplest: you define a target value for a metric, such as 50% average CPU utilization, and the ASG adds or removes instances to stay near that target. Step scaling is triggered by a CloudWatch alarm and adjusts capacity by different amounts depending on how far the metric has breached the alarm threshold. Scheduled scaling changes capacity at set times ahead of known traffic events, such as scaling out before a sale that starts at noon.
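The two reactive policies can be approximated with simple arithmetic. AWS does not publish the exact target tracking algorithm, so `target_tracking` below uses a common proportional approximation (capacity scaled by metric over target); real target tracking also scales in more cautiously than this. `step_scaling` follows the step-adjustment idea: each step's bounds are measured as the distance above the alarm threshold, and the matching step's change is added to current capacity. Function names and step values are hypothetical.

```python
import math


def target_tracking(current, metric, target, min_size, max_size):
    # Proportional approximation: 4 instances at 80% CPU with a 50% target
    # need about 4 * 80 / 50 = 6.4, rounded up to 7 instances.
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


# Hypothetical step adjustments: (lower, upper, change), where the bounds are
# the distance above the alarm threshold and None means "no upper bound".
STEPS = [
    (0, 10, 1),     # breach of 0-10 points  -> add 1 instance
    (10, 20, 2),    # breach of 10-20 points -> add 2 instances
    (20, None, 4),  # breach of 20+ points   -> add 4 instances
]


def step_scaling(current, metric, threshold, steps, min_size, max_size):
    breach = metric - threshold
    if breach < 0:
        return current  # alarm is not breached; this policy does nothing
    for lower, upper, change in steps:
        # Lower bound inclusive, upper bound exclusive.
        if breach >= lower and (upper is None or breach < upper):
            return max(min_size, min(max_size, current + change))
    return current
```

The step values reward a larger breach with a larger jump, which is why step scaling suits workloads where a sudden, severe spike needs more than one instance at a time.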

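Predictive scaling uses machine learning to analyze historical traffic patterns and scale out before demand arrives. Instead of reacting after CPU spikes, it provisions instances ahead of the spike based on past behavior. Cooldown periods stop the ASG from launching or terminating more instances before the effects of the previous scaling activity show up in the metrics, so it does not overreact to transient spikes. Cooldowns apply to simple scaling policies; target tracking and step scaling rely on instance warmup instead.

The termination policy controls which instance the ASG removes first when scaling in. The default policy first selects the Availability Zone with the most instances, then prefers instances launched from the oldest launch template or launch configuration, then the instance closest to the next billing hour.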
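A sketch of the default termination order, simplified to its main tie-breakers: the Availability Zone with the most instances, then the oldest launch template or configuration, then the instance closest to the next billing hour, with a random pick for any remaining tie. It ignores scale-in protection and the allocation-strategy step that applies to mixed On-Demand and Spot groups, and it uses a hypothetical `template_version` number as a stand-in for "oldest launch template". All names are illustrative.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Inst:
    id: str
    az: str
    template_version: int  # hypothetical: lower number = older launch template
    seconds_running: int


def seconds_to_next_billing_hour(inst: Inst) -> int:
    return 3600 - inst.seconds_running % 3600


def pick_for_termination(instances, rng=random):
    # 1. Restrict to the AZ(s) holding the most instances, to keep zones balanced.
    counts = Counter(i.az for i in instances)
    top = max(counts.values())
    pool = [i for i in instances if counts[i.az] == top]
    # 2. Prefer instances launched from the oldest template.
    oldest = min(i.template_version for i in pool)
    pool = [i for i in pool if i.template_version == oldest]
    # 3. Prefer the instance closest to the next billing hour.
    nearest = min(seconds_to_next_billing_hour(i) for i in pool)
    pool = [i for i in pool if seconds_to_next_billing_hour(i) == nearest]
    # 4. Break any remaining tie at random.
    return rng.choice(pool)
```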

How to choose the correct answer

ALB: HTTP/HTTPS, Layer 7, path-based and host-based routing, WebSocket, user authentication. Default for web apps.

NLB: TCP/UDP/TLS, Layer 4, ultra-low latency, static IPs per AZ, preserves source IP. Choose for high performance or non-HTTP workloads.

GWLB: routes traffic through third-party network appliances. Use for inline inspection with firewall or IDS fleets.

Target tracking: simplest scaling policy, maintain a target metric value automatically.

Step scaling: scale by different increments based on severity of the metric breach.

Scheduled scaling: scale for predictable events. Predictive scaling: scale based on ML forecast.

ALB target groups can contain EC2 instances, IP addresses, or a Lambda function. An ALB cannot target another ALB, but an ALB can itself be registered as the target of an NLB.
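The rules above can be condensed into a small decision helper for self-testing. This is a study aid under simplifying assumptions, not an AWS tool; `choose_load_balancer` and its flags are hypothetical. The `"NLB -> ALB"` result reflects the pattern of registering an ALB as the target of an NLB when a design needs both static IP addresses and HTTP content routing.

```python
def choose_load_balancer(protocol: str, *, inline_appliances: bool = False,
                         static_ip: bool = False,
                         content_routing: bool = False) -> str:
    p = protocol.upper()
    if inline_appliances:
        return "GWLB"  # firewall / IDS / IPS fleets inserted inline
    if p in {"TCP", "UDP", "TLS"}:
        return "NLB"  # non-HTTP traffic: ALB listeners cannot accept it
    if p in {"HTTP", "HTTPS"}:
        if static_ip and content_routing:
            return "NLB -> ALB"  # NLB gives static IPs, forwards to an ALB target
        if static_ip:
            return "NLB"
        return "ALB"  # default choice for web applications
    raise ValueError(f"unrecognised protocol: {protocol}")
```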

AWS load balancer comparison

| Type | Layer | Protocols | Key feature | Best for |
|---|---|---|---|---|
| ALB | 7 (Application) | HTTP, HTTPS, WebSocket | Path/host-based routing, authentication | Web apps, microservices, APIs |
| NLB | 4 (Transport) | TCP, UDP, TLS | Ultra-low latency, static IPs | High performance, non-HTTP protocols |
| GWLB | 3 (Network) | IP (all protocols) | Inline third-party appliances | Firewalls, IDS/IPS, packet inspection |
| CLB | 4 and 7 | HTTP, HTTPS, TCP, SSL | Legacy only | Not recommended for new deployments |

Key exam facts — AWS SAA-C03

  • ALB: Layer 7, HTTP/HTTPS, path-based routing, host-based routing, Lambda targets, authentication.
  • NLB: Layer 4, TCP/UDP/TLS, ultra-low latency, one static IP per AZ (optionally an Elastic IP), source IP preservation.
  • GWLB: inline appliance insertion using GENEVE protocol and Gateway Load Balancer Endpoints.
  • ASG minimum, desired, and maximum: floor, current target, ceiling for instance count.
  • Target tracking scaling: simplest, AWS manages scaling actions to hit the target metric.
  • Cooldown period: prevents rapid back-to-back scaling actions during transient spikes.
  • Health checks: by default an ASG uses only EC2 status checks; enable ELB health checks on the ASG so that instances failing the load balancer's health check are also marked unhealthy and replaced.

Common exam traps

You can use an ALB to load balance any TCP application.

ALB listeners accept only HTTP and HTTPS; WebSocket connections work because they begin as HTTP requests. For raw TCP, UDP, or other non-HTTP workloads, you need an NLB. An ALB cannot front a MySQL connection or a custom TCP application because it has no listener for those protocols.

Setting the ASG desired capacity to the maximum prevents unexpected scaling.

Setting desired to maximum removes the ability to scale out, which defeats the purpose of an Auto Scaling Group. The desired capacity is the current target, not a cap. The maximum is the upper boundary the ASG will never exceed during scale-out. Proper design sets minimum to the lowest acceptable instance count, maximum to the highest you can afford, and lets scaling policies manage desired dynamically.

Cross-zone load balancing is always enabled and distributes traffic evenly by default.

Cross-zone load balancing behavior differs by load balancer type. For ALB it is enabled by default (it can be turned off at the target group level). For NLB and GWLB it is disabled by default and can be enabled. Without cross-zone load balancing, each load balancer node distributes traffic only among the targets in its own Availability Zone, which causes uneven per-instance load when zones have different instance counts.
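The uneven distribution is easiest to see with numbers. The sketch below assumes one load balancer node per AZ, clients spread evenly across nodes, and at least one instance in every AZ; `per_instance_share` is a hypothetical helper. With 2 instances in one zone and 8 in the other, disabling cross-zone load balancing gives each instance in the smaller zone 25% of all traffic and each instance in the larger zone 6.25%, while enabling it gives every instance 10%.

```python
def per_instance_share(instances_per_az: dict, cross_zone: bool) -> dict:
    # Returns the fraction of total traffic each instance receives, keyed by AZ.
    total = sum(instances_per_az.values())
    n_azs = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every node spreads traffic over all instances in all AZs.
            shares[az] = 1 / total
        else:
            # Each AZ's node receives 1/n_azs of traffic and keeps it in-zone.
            shares[az] = (1 / n_azs) / count
    return shares
```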
