Deployment and Provisioning
SysOps covers deploying and updating AWS infrastructure reliably. CloudFormation is the primary IaC tool. Stacks group related resources; Change Sets preview the impact of a stack update before it is applied; drift detection identifies resources whose live configuration was changed manually and no longer matches the template. Nested stacks modularise large templates, and stack policies prevent accidental updates to critical resources during stack updates. AWS Systems Manager (SSM) handles fleet operations. Patch Manager automates OS patching across EC2 fleets using patch baselines (which patches are approved), patch groups (instances selected by tag) and maintenance windows (when patching runs). SSM Run Command executes scripts on managed instances without SSH or RDP. It is not agentless: each instance needs the SSM Agent installed and an IAM instance profile that grants Systems Manager permissions (e.g., the AmazonSSMManagedInstanceCore managed policy). Session Manager provides shell access through SSM without opening inbound ports, which makes it more secure than a bastion host. EC2 Image Builder automates the creation, testing, and distribution of hardened AMIs on a schedule.
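To make the idea of drift concrete, here is a minimal sketch that compares the properties a template declares with the properties a resource actually has. It does not call the CloudFormation API; the function name `detect_drift` and the dictionary shapes are hypothetical. The status strings IN_SYNC, MODIFIED and DELETED mirror the names CloudFormation uses for resource drift status, and, as a simplification, only properties declared in the template are compared.

```python
from typing import Any

# Hypothetical data shape: logical resource ID -> {property name: value}.
Properties = dict[str, dict[str, Any]]


def detect_drift(expected: Properties, actual: Properties) -> dict[str, str]:
    """Return a drift status per logical resource ID.

    Only properties declared in the template (``expected``) are compared,
    so extra properties on the live resource are ignored.
    """
    report: dict[str, str] = {}
    for logical_id, declared in expected.items():
        live = actual.get(logical_id)
        if live is None:
            # The resource no longer exists: deleted outside CloudFormation.
            report[logical_id] = "DELETED"
        elif any(live.get(name) != value for name, value in declared.items()):
            # At least one declared property was changed by hand.
            report[logical_id] = "MODIFIED"
        else:
            report[logical_id] = "IN_SYNC"
    return report


template = {
    "WebSecurityGroup": {"IngressPort": 443},
    "LogBucket": {"Versioning": "Enabled"},
    "JobQueue": {"VisibilityTimeout": 60},
}
live_state = {
    "WebSecurityGroup": {"IngressPort": 22},  # edited manually in the console
    "LogBucket": {"Versioning": "Enabled", "Tags": []},
    # JobQueue is absent: deleted by hand
}
print(detect_drift(template, live_state))
```

In practice, a drift report like this is the signal to either update the template to match reality or re-deploy the stack to undo the manual change.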
High Availability and Auto Scaling
Building HA architectures is a core SysOps skill. An Auto Scaling Group (ASG) uses a launch template (AMI, instance type, security groups, IAM instance profile), has min/max/desired capacity settings, and can be attached to a target group so that new instances register with the load balancer. Scaling policies: Target Tracking (keeps a metric at a target value, e.g., 60% average CPU; the simplest option), Step Scaling (scales by different amounts at different alarm thresholds) and Scheduled Scaling (for predictable load patterns, e.g., scaling up before business hours). Lifecycle hooks pause an instance before it enters InService or before it terminates, giving custom scripts time to run (draining connections, updating a service registry). Health checks: EC2 health checks (instance status checks, lower level; the ASG default) vs ELB health checks (an HTTP request to the target, so application-aware). The ASG replaces instances that fail ELB health checks only when the ELB health check type is enabled on the group. Multi-AZ deployments spread instances across AZs behind an ALB or NLB as the single entry point. RDS Multi-AZ replicates synchronously to a standby in another AZ, with automatic failover that typically completes in 60-120 seconds.
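Target tracking can be understood with a rough proportional estimate: if load stays constant, the capacity needed to bring a utilisation metric back to its target scales with the ratio of current to target. The sketch below illustrates that arithmetic; it is an assumption-laden simplification (the real policy also applies instance warm-up and scales in more conservatively), and `target_tracking_desired` is a hypothetical helper, not an AWS API.

```python
import math


def target_tracking_desired(
    current: int, metric: float, target: float, min_size: int, max_size: int
) -> int:
    """Simplified proportional estimate of a new desired capacity.

    4 instances averaging 90% CPU against a 60% target carry load that needs
    about 4 * 90 / 60 = 6 instances to average 60%. Rounding up errs on the
    side of capacity; the result is clamped to the group's min/max size.
    """
    if current < 1 or target <= 0:
        raise ValueError("need at least one instance and a positive target")
    proposed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, proposed))


print(target_tracking_desired(4, 90.0, 60.0, min_size=2, max_size=10))  # 6
print(target_tracking_desired(4, 15.0, 60.0, min_size=2, max_size=10))  # 2 (held at min)
```

The clamp is why min and max capacity matter: they bound what any scaling policy can request.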
CloudWatch: Monitoring, Alarms, and Dashboards
CloudWatch is the central monitoring service for AWS. Metrics: EC2 publishes default metrics (CPUUtilization, NetworkIn/Out, DiskReadOps; no memory or disk-space usage by default), while custom metrics are pushed, for example by the CloudWatch agent, to cover memory, disk and application data. Resolution: EC2 basic monitoring publishes at 5-minute intervals at no charge and detailed monitoring at 1-minute intervals at extra cost; custom metrics can be standard resolution (1-minute) or high resolution (down to 1 second, additional cost). Alarms trigger on metric thresholds and have three states: OK, ALARM and INSUFFICIENT_DATA. Alarm actions include invoking an Auto Scaling policy, sending an SNS notification, or an EC2 action (stop, terminate, reboot or recover). Composite alarms reduce alarm noise by combining other alarms with AND/OR/NOT logic. CloudWatch Logs ingests application and service logs: log groups contain log streams, and CloudWatch Logs Insights queries provide fast interactive analysis. Metric filters extract numeric data from log events to create custom CloudWatch metrics. CloudWatch agent: installed on EC2 to collect OS-level metrics and ship logs; its configuration can be stored in SSM Parameter Store. CloudWatch Synthetics: canary scripts test API endpoints and web workflows on a schedule, detecting failures before users do.
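The noise-reduction benefit of composite alarms comes from how child states are combined. In a real composite alarm rule (written like `ALARM("cpu-high") AND ALARM("latency-high")`), `ALARM(...)` is true only when that child is in the ALARM state. The sketch below models that behaviour in plain Python; the helper names are hypothetical, and the metric alarm is deliberately simplified to look only at the latest datapoint (real alarms evaluate several periods).

```python
from enum import Enum


class AlarmState(Enum):
    OK = "OK"
    ALARM = "ALARM"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"


def threshold_state(datapoints: list[float], threshold: float) -> AlarmState:
    """Simplified metric alarm: compares only the latest datapoint."""
    if not datapoints:
        return AlarmState.INSUFFICIENT_DATA
    return AlarmState.ALARM if datapoints[-1] > threshold else AlarmState.OK


def in_alarm(state: AlarmState) -> bool:
    # Mirrors ALARM("name") in a composite rule: true only in the ALARM state.
    return state is AlarmState.ALARM


def composite_and(*children: AlarmState) -> AlarmState:
    return AlarmState.ALARM if all(in_alarm(s) for s in children) else AlarmState.OK


def composite_or(*children: AlarmState) -> AlarmState:
    return AlarmState.ALARM if any(in_alarm(s) for s in children) else AlarmState.OK


cpu = threshold_state([55.0, 85.0], threshold=80.0)   # ALARM
latency = threshold_state([0.2, 0.3], threshold=0.5)  # OK
print(composite_and(cpu, latency))  # OK: only notify when both are firing
print(composite_or(cpu, latency))   # ALARM
```

Wiring the SNS notification to the AND composite rather than to each child is what suppresses pages for a CPU spike that is not hurting latency.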
Cost Optimisation and Billing Management
SysOps administrators are responsible for cost. Cost Explorer visualises spending by service, region, account and tag, helps identify anomalies, and forecasts future costs. AWS Budgets sets cost or usage thresholds and sends alerts when actual or forecasted spend crosses them, so you are warned before limits are hit. Savings Plans commit to a consistent amount of usage ($/hour) for 1 or 3 years in exchange for discounts of up to 72% vs On-Demand. Reserved Instances commit to a specific instance type in a region: Standard RIs give the highest discount but are the least flexible; Convertible RIs can be exchanged for a different instance family in return for a smaller discount. Spot Instances use spare EC2 capacity at up to 90% off On-Demand, but can be interrupted with a 2-minute warning when AWS reclaims the capacity, so they suit batch processing, CI/CD and other fault-tolerant workloads. Cost allocation tags must be activated in the Billing console and applied to resources; they enable per-project, per-team or per-environment cost reporting. Trusted Advisor checks for idle or underutilised resources (e.g., low-utilisation EC2 instances), unassociated Elastic IP addresses, and S3 buckets with open access permissions.
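Commitments (Savings Plans or RIs) are billed for every hour of the term, whereas On-Demand is billed only for hours actually used, so a commitment only pays off above a break-even utilisation equal to the committed rate divided by the On-Demand rate. The sketch below works through that arithmetic for a single always-available instance. The rates are hypothetical placeholders, not AWS prices, and the model ignores how a Savings Plan commitment is spread across mixed usage.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def breakeven_utilisation(on_demand_rate: float, committed_rate: float) -> float:
    """Fraction of the year a workload must run for a commitment to beat On-Demand."""
    if not 0 < committed_rate <= on_demand_rate:
        raise ValueError("committed rate must be positive and at most the On-Demand rate")
    return committed_rate / on_demand_rate


def annual_costs(on_demand_rate: float, committed_rate: float,
                 utilisation: float) -> tuple[float, float]:
    """Return (On-Demand cost, committed cost) for one year at a given utilisation."""
    on_demand = on_demand_rate * HOURS_PER_YEAR * utilisation  # pay only for hours run
    committed = committed_rate * HOURS_PER_YEAR                # pay for every hour
    return on_demand, committed


# Hypothetical rates: $0.10/hour On-Demand vs $0.06/hour committed (40% discount).
print(breakeven_utilisation(0.10, 0.06))  # roughly 0.6
print(annual_costs(0.10, 0.06, 0.5))      # On-Demand is cheaper at 50% utilisation
print(annual_costs(0.10, 0.06, 0.9))      # the commitment is cheaper at 90%
```

With a 40% discount, a workload needs to run more than about 60% of the time before committing saves money; intermittent workloads are better candidates for Spot or On-Demand.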
Storage and Database Administration
S3 operations for SysOps: bucket versioning retains every version of an object, protecting against accidental overwrites and deletions. MFA Delete requires MFA to permanently delete an object version or to change the bucket's versioning state. Cross-Region Replication (CRR) copies objects to a bucket in another region for DR or lower latency; versioning must be enabled on both the source and destination buckets. Lifecycle policies transition objects to cheaper storage classes as they age (e.g., S3 Standard > Standard-IA > Glacier Flexible Retrieval > Glacier Deep Archive). Intelligent-Tiering is an alternative for unpredictable access patterns, moving objects between access tiers automatically. S3 Storage Lens provides organisation-wide visibility into storage usage and activity patterns. EBS volume types: gp3 (general-purpose SSD, baseline 3,000 IOPS, provisionable up to 16,000), io2 Block Express (up to 256,000 IOPS for demanding database workloads), st1 (throughput-optimised HDD for big data and streaming workloads) and sc1 (cold HDD for infrequently accessed data). EBS snapshots are incremental and stored in S3 (in AWS-managed storage, not in your own buckets); Data Lifecycle Manager (DLM) automates snapshot schedules and retention. RDS: automated backups (retention of 1 to 35 days, enabling point-in-time recovery; a retention of 0 disables them), manual snapshots (kept until you delete them) and Read Replicas (asynchronous replication for read scaling; a replica can be promoted to a standalone instance for DR).
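A lifecycle rule is essentially an age-ordered list of transitions: the oldest threshold an object has passed decides its storage class. The sketch below models that lookup locally. The storage class strings are the S3 API identifiers (GLACIER is Glacier Flexible Retrieval), but the day thresholds and the helper `storage_class_for_age` are hypothetical and chosen for illustration.

```python
# Hypothetical lifecycle rule: (minimum object age in days, target storage class).
# Class names are S3 API identifiers; the day thresholds are illustrative.
TRANSITIONS: list[tuple[int, str]] = [
    (30, "STANDARD_IA"),
    (90, "GLACIER"),        # S3 Glacier Flexible Retrieval
    (365, "DEEP_ARCHIVE"),  # S3 Glacier Deep Archive
]


def storage_class_for_age(age_days: int,
                          transitions: list[tuple[int, str]] = TRANSITIONS) -> str:
    """Return the class an object would be in after age_days under the rule."""
    current = "STANDARD"
    # Walk transitions in age order; the oldest threshold reached wins.
    for min_age, target in sorted(transitions):
        if age_days >= min_age:
            current = target
    return current


for age in (10, 45, 120, 400):
    print(age, storage_class_for_age(age))
```

Keeping transitions in one ordered rule like this makes it easy to check that objects only ever move toward colder, cheaper classes.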