RPO and RTO
RPO (Recovery Point Objective): the maximum acceptable amount of data loss measured in time — how far back in time recovery must go. RPO = 4 hours means the organization can tolerate losing up to 4 hours of data. Drives backup frequency: to achieve 4-hour RPO, backups must run at least every 4 hours. Lower RPO requires more frequent backups and replication.
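The link between RPO and backup frequency can be sketched in a few lines of Python. This is only an illustration: the function names and the sample timestamps are assumptions made for the example, and it treats each backup as an instantaneous snapshot.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Return how much data (measured in time) is lost if a failure
    happens at failure_time: the gap back to the most recent backup."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(earlier)


def schedule_meets_rpo(backup_interval, rpo):
    """Worst case, a failure strikes just before the next backup runs,
    so the loss can approach one full interval."""
    return backup_interval <= rpo


# Hypothetical schedule: backups at 00:00, 04:00 and 08:00, failure at 11:30.
rpo = timedelta(hours=4)
backups = [datetime(2024, 1, 1, hour) for hour in (0, 4, 8)]
failure = datetime(2024, 1, 1, 11, 30)

loss = data_loss_window(backups, failure)
print(loss, loss <= rpo)
```

With backups every 4 hours the schedule meets a 4-hour RPO; stretching the interval to 6 hours would not, because a failure late in the interval could lose more than 4 hours of data.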
RTO (Recovery Time Objective): the maximum acceptable time to restore service after a disaster. RTO = 2 hours means services must be restored within 2 hours of a disaster declaration. Drives site and technology choices — a 2-hour RTO might allow a warm site; a 15-minute RTO requires a hot site or clustering. Lower RTO requires more investment in redundancy and automation.
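One way to sanity-check a recovery plan against its RTO is to add up the estimated duration of each recovery step. The sketch below assumes a hypothetical four-step plan; the step names and durations are invented for illustration and are not taken from any real runbook.

```python
from datetime import timedelta


def fits_rto(step_estimates, rto):
    """Sum the estimated step durations and compare the total to the RTO."""
    total = sum(step_estimates.values(), timedelta())
    return total, total <= rto


# Hypothetical steps performed after the disaster is declared.
plan = {
    "restore data from latest backup": timedelta(minutes=60),
    "update configurations": timedelta(minutes=20),
    "redirect traffic to DR site": timedelta(minutes=10),
    "validate services": timedelta(minutes=15),
}

total, ok = fits_rto(plan, timedelta(hours=2))
print(total, ok)
```

The same plan fails a 15-minute RTO by a wide margin, which is why very low RTOs push toward pre-synchronized sites and automated failover rather than restore-from-backup procedures.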
DR Site Types
Hot site: a fully operational duplicate of the production environment — equipment running, data synchronized (near real-time replication). Failover can occur in minutes. Most expensive option. Used for mission-critical systems where downtime is catastrophically costly.
Warm site: a partially equipped site with hardware and connectivity in place, but with data that is not fully synchronized. Takes hours to bring online: restore from a recent backup, then update configurations. Balances cost and recovery speed. A common choice for general enterprise DR.
Cold site: a physical space with power, cooling, and connectivity but no equipment. Equipment must be acquired, delivered, and configured during disaster recovery. Days to weeks to bring online. Cheapest option; appropriate for non-critical systems or systems that can tolerate a long RTO.
Cloud DR: using cloud services as a DR target. Advantages: pay-per-use (no idle hardware cost), geographic redundancy, rapid scaling. Cloud DR services replicate on-premises workloads to cloud VMs that can be started within minutes. DR as a Service (DRaaS) provides fully managed cloud DR.
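The trade-off between site cost and recovery speed can be expressed as a simple lookup: pick the cheapest site type whose bring-up time still fits inside the RTO. The bring-up times below are illustrative assumptions chosen to match the rough ranges above (minutes, hours, days to weeks); real figures depend on the environment. Cloud DR is left out because its bring-up time varies with the service used.

```python
from datetime import timedelta

# Ordered cheapest first. Bring-up times are assumed, illustrative values.
SITE_OPTIONS = [
    ("cold", timedelta(days=7)),
    ("warm", timedelta(hours=2)),
    ("hot", timedelta(minutes=10)),
]


def cheapest_site_for(rto):
    """Return the cheapest site type that can be online within the RTO,
    or None if even the hot site is too slow."""
    for name, bring_up in SITE_OPTIONS:
        if bring_up <= rto:
            return name
    return None


print(cheapest_site_for(timedelta(hours=2)))     # warm
print(cheapest_site_for(timedelta(minutes=15)))  # hot
print(cheapest_site_for(timedelta(days=14)))     # cold
```

A result of None means no standby site alone meets the target, which points toward clustering or active-active designs instead.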
Backup Strategies
Full backup: backs up all selected data every time. Slowest to back up, fastest to restore.
Differential backup: backs up all data changed since the last full backup. Faster than full; restore requires only the last full + last differential.
Incremental backup: backs up only data changed since the last backup of any type. Fastest to back up; slowest to restore (requires full + all incrementals since).
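The restore requirements of the three backup types can be made concrete by computing which backup sets must be applied, and in what order, to reach the latest state. This is a minimal sketch; the Backup class, the day numbering, and the sample schedules are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int
    kind: str  # "full", "differential", or "incremental"


def restore_chain(backups):
    """Return the backups to apply, oldest first, to restore the latest state."""
    ordered = sorted(backups, key=lambda b: b.day)
    full_positions = [i for i, b in enumerate(ordered) if b.kind == "full"]
    if not full_positions:
        raise ValueError("a restore needs at least one full backup")
    start = full_positions[-1]
    chain = [ordered[start]]
    for b in ordered[start + 1:]:
        if b.kind == "differential":
            # Holds everything changed since the last full, so it replaces
            # any earlier differentials and incrementals in the chain.
            chain = [chain[0], b]
        elif b.kind == "incremental":
            # Holds only changes since the previous backup, so every one counts.
            chain.append(b)
        else:
            raise ValueError(f"unknown backup kind: {b.kind}")
    return chain


differential_week = [Backup(0, "full")] + [Backup(d, "differential") for d in (1, 2, 3)]
incremental_week = [Backup(0, "full")] + [Backup(d, "incremental") for d in (1, 2, 3)]

print([b.day for b in restore_chain(differential_week)])  # [0, 3]
print([b.day for b in restore_chain(incremental_week)])   # [0, 1, 2, 3]
```

The differential schedule restores from two sets no matter how many days have passed, while the incremental chain grows by one set per day, and losing any one incremental breaks every restore point after it.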
3-2-1 backup rule: keep 3 copies of data, on 2 different media types, with 1 copy offsite. Protects against hardware failure (multiple copies), media failure (different types), and site disaster (offsite copy). Configuration backups for network devices follow the same principles — store configs in a version-controlled repository offsite.
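The 3-2-1 rule is easy to check mechanically against an inventory of copies. The sketch below assumes a hypothetical inventory format (label, media type, offsite flag) and counts the production copy as one of the three, which is the usual reading of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    label: str
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool


def satisfies_3_2_1(copies):
    """True if there are at least 3 copies, on at least 2 media types,
    with at least 1 copy stored offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


# Hypothetical inventories for illustration.
compliant = [
    DataCopy("production", "disk", offsite=False),
    DataCopy("local backup", "tape", offsite=False),
    DataCopy("remote backup", "cloud", offsite=True),
]
all_onsite = [
    DataCopy("production", "disk", offsite=False),
    DataCopy("local backup", "tape", offsite=False),
    DataCopy("second local backup", "disk", offsite=False),
]

print(satisfies_3_2_1(compliant), satisfies_3_2_1(all_onsite))
```

The second inventory has enough copies and media types but would be lost entirely in a site disaster, which is the case the offsite requirement exists to cover.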