Scaling Policies and Decision Strategies
Auto Scaling behavior is defined by scaling policies.
There are three main strategies:
1. Target Tracking Scaling
You define a metric target (e.g., average CPU utilization at 50%). AWS automatically adjusts capacity to maintain that target.
This is the recommended approach for most systems.
Architecturally, this creates a self-balancing system that maintains steady performance.
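As a concrete sketch, a target-tracking policy is just a metric specification plus a target value. The dictionary below mirrors the shape accepted by the EC2 Auto Scaling PutScalingPolicy API (for example via boto3's autoscaling client); the group and policy names are placeholders.

```python
# Illustrative target-tracking policy definition. The group and policy
# names ("web-asg", "cpu-at-50") are placeholders, not real resources.
target_tracking_policy = {
    "AutoScalingGroupName": "web-asg",
    "PolicyName": "cpu-at-50",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        # Track the group's average CPU and hold it near 50%.
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
}
```

Note that you declare only the target; AWS derives the scale-out and scale-in actions needed to hold it.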
2. Step Scaling
Scaling actions occur in steps, sized according to how far the metric breaches its threshold.
Example:
- CPU > 60% → add 1 instance
- CPU > 80% → add 3 instances
This gives fine-grained control but requires careful tuning.
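The decision logic of the step policy above can be sketched as a small function: steps are evaluated from the largest breach down, so only one step fires per evaluation.

```python
def step_scaling_delta(cpu_percent: float) -> int:
    """Return how many instances to add for the step policy in the text.

    Steps (illustrative):
      CPU > 80%  -> add 3 instances
      CPU > 60%  -> add 1 instance
      otherwise  -> no change
    """
    if cpu_percent > 80:
        return 3
    if cpu_percent > 60:
        return 1
    return 0
```

For example, a CPU reading of 85% adds 3 instances, 65% adds 1, and 40% triggers no action. Tuning means choosing thresholds and step sizes so adjacent steps do not fight each other.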
3. Scheduled Scaling
Used for predictable traffic patterns.
Example:
- Scale to 10 instances at 8 AM
- Scale to 3 instances at midnight
Useful for business-hour workloads.
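The schedule above amounts to a simple mapping from time of day to desired capacity, sketched here as a function (the hours and capacities are the illustrative values from the text):

```python
def scheduled_capacity(hour: int) -> int:
    """Desired capacity for a business-hours schedule (illustrative).

    Scale to 10 instances at 8 AM; scale back to 3 at midnight.
    `hour` is on a 24-hour clock (0-23).
    """
    return 10 if 8 <= hour <= 23 else 3
```

In practice, you would register these as scheduled actions on the Auto Scaling group rather than compute them yourself, but the mapping is the same.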
Cooldown and Warmup
Scaling is not instant. New instances need:
- Boot time
- Application startup
- Health check validation
If scaling decisions ignore warmup time, you risk oscillation (rapid scaling in and out).
Production insight: Always configure instance warmup and use load balancer health checks.
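One way to see why warmup matters: a cooldown gate suppresses further scaling decisions until the last action has had time to take effect. A minimal sketch, assuming a 300-second cooldown:

```python
from datetime import datetime, timedelta


def may_scale(last_action: datetime, now: datetime,
              cooldown: timedelta = timedelta(seconds=300)) -> bool:
    """Cooldown gate (illustrative): allow a new scaling action only
    after `cooldown` has elapsed since the last one. Without this gate,
    metrics measured before new instances finish warming up can trigger
    a second, unnecessary action and cause oscillation."""
    return now - last_action >= cooldown
```

For example, 120 seconds after a scale-out the gate still blocks; at 300 seconds it opens again.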
Architectural Implications
Improper scaling policies cause:
- Thrashing (repeated scale-out/scale-in cycles)
- Increased costs
- Latency spikes
Well-designed policies:
- Protect user experience
- Maintain steady state
- Optimize cost
Scaling policy design is part of system reliability engineering.