Question 1

What are the 4 DR strategies, from cheapest to most expensive?

Accepted Answer

(1) Backup & Restore: cheapest. Back up data to the DR Region and restore it after a disaster. RTO: hours to days. (2) Pilot Light: core components (such as database replication) keep running while compute is off; scale up when needed. RTO: minutes to hours. (3) Warm Standby: a scaled-down copy of production is always running; scale it up on failover. RTO: minutes. (4) Multi-Site Active-Active: most expensive. Both Regions run at full capacity and traffic is split between them by weight. RTO: near zero.

Question 2

Does Pilot Light keep any resources running in the DR Region?

Accepted Answer

Yes, but only core components: database replication stays running (RDS cross-Region read replica, DynamoDB Global Tables). Compute (EC2, ECS) is switched off or scaled to the minimum. On disaster: (1) promote the DB replica, (2) launch or scale up compute, (3) update DNS. The name "Pilot Light" comes from gas heaters: a small flame that is always lit and ready to ignite the full burner. It is cheaper than Warm Standby because compute is not running.
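
The three failover steps above can be sketched as a small runbook function. This is a minimal sketch, not a production runbook: it assumes the database is a plain RDS read replica (Aurora clusters are promoted differently), that compute sits in an Auto Scaling group whose MaxSize already allows the requested capacity, and that DNS is a simple CNAME record in Route 53. The clients are expected to be boto3 clients for `rds`, `autoscaling`, and `route53`; every resource name passed in is hypothetical.

```python
def pilot_light_failover(rds, autoscaling, route53, *, replica_id, asg_name,
                         desired_capacity, hosted_zone_id, record_name,
                         dr_endpoint):
    """Run the three Pilot Light failover steps in order.

    rds, autoscaling, route53: boto3 clients created for the DR Region
    (Route 53 is a global service). All identifiers are caller-supplied.
    """
    # (1) Promote the cross-Region read replica to a standalone writable DB.
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)

    # (2) Scale compute up from its idle size to production capacity.
    # Assumption: the group's MaxSize is already >= desired_capacity.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=desired_capacity,
        DesiredCapacity=desired_capacity,
    )

    # (3) Point the application's DNS name at the DR endpoint.
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={
            "Comment": "Pilot Light failover to DR Region",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": dr_endpoint}],
                },
            }],
        },
    )
    return ["promote_db", "scale_compute", "update_dns"]
```

Replica promotion is asynchronous, so a real runbook would likely wait for the database to become available before shifting traffic. The clients are passed in rather than created inside the function so the sequence can be rehearsed against stubs during a DR drill.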

Question 3

What is the approximate RTO of Multi-Site Active-Active?

Accepted Answer

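Near zero: both Regions are already serving traffic. When one Region fails, Route 53 or Global Accelerator automatically routes 100% of traffic to the remaining Region, with no manual intervention. With DNS-based failover, the actual RTO depends on the DNS TTL and on the health check interval and failure threshold (often about a minute or less with fast health checks and a short TTL). RPO is also near zero when replication lag is very low; note that Aurora Global Database replicates asynchronously, with typical lag under one second. With slower asynchronous replication, RPO can be minutes.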
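
To see how DNS TTL and health check settings bound the RTO for DNS-based failover, here is a rough back-of-the-envelope model. It is a simplification, not how Route 53 computes anything: it assumes the failure is detected after `failure_threshold` consecutive failed checks, and that clients may keep using a cached DNS answer for up to one full TTL afterwards. The function name and parameters are illustrative.

```python
def estimated_dns_failover_seconds(check_interval_s, failure_threshold, dns_ttl_s):
    """Rough upper bound on DNS-based failover time, in seconds.

    detection time = interval between checks * consecutive failures required
    cache time     = worst case, a client cached the old answer just before
                     the record changed and keeps it for one full TTL
    """
    detection_s = check_interval_s * failure_threshold
    return detection_s + dns_ttl_s


# Fast health checks (10 s), threshold 3, short TTL (20 s)
print(estimated_dns_failover_seconds(10, 3, 20))   # 50

# Standard health checks (30 s), threshold 3, TTL 60 s
print(estimated_dns_failover_seconds(30, 3, 60))   # 150
```

Under this model, getting below one minute requires fast health checks and a short TTL; resolvers that ignore TTLs or long-lived client connections can stretch the real figure further.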

Question 4

What is AWS Resilience Hub used for?

Accepted Answer

AWS Resilience Hub assesses and helps improve the resiliency of an application. The workflow: import the application from CloudFormation, Terraform, or AppRegistry → define RTO/RPO targets → the Hub analyzes the architecture → it reports gaps and recommendations → it generates a resiliency score. It can also run resiliency drills (automated chaos engineering). This tells teams whether an application meets its DR objectives before a real disaster reveals the answer.
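
The "define targets, then find gaps" idea can be illustrated with a small conceptual sketch. This is not the Resilience Hub API or its scoring algorithm; it only shows the kind of comparison an assessment makes: estimated RTO/RPO per disruption type checked against the targets. The `Objective` class, the `find_gaps` function, the disruption names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    rto_s: int  # recovery time objective, in seconds
    rpo_s: int  # recovery point objective, in seconds


def find_gaps(targets, estimates):
    """Return one message per objective whose estimate exceeds its target."""
    gaps = []
    for disruption, target in targets.items():
        estimate = estimates[disruption]
        if estimate.rto_s > target.rto_s:
            gaps.append(f"{disruption}: estimated RTO {estimate.rto_s}s "
                        f"exceeds target {target.rto_s}s")
        if estimate.rpo_s > target.rpo_s:
            gaps.append(f"{disruption}: estimated RPO {estimate.rpo_s}s "
                        f"exceeds target {target.rpo_s}s")
    return gaps


# Hypothetical targets and estimates for two disruption types.
targets = {"AZ": Objective(rto_s=300, rpo_s=60),
           "Region": Objective(rto_s=3600, rpo_s=900)}
estimates = {"AZ": Objective(rto_s=120, rpo_s=30),
             "Region": Objective(rto_s=14400, rpo_s=600)}

for gap in find_gaps(targets, estimates):
    print(gap)  # Region: estimated RTO 14400s exceeds target 3600s
```

Here the application meets its AZ objectives but would miss its Region-level RTO, which is the kind of finding that is far cheaper to learn from an assessment than from an outage.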

Question 5

When should you use the Backup & Restore strategy?

Accepted Answer

When: (1) RTO/RPO requirements are relaxed (recovery within hours is acceptable), (2) the budget is limited and cannot justify running infrastructure in the DR Region continuously, (3) the workloads are non-critical, such as dev/test, batch processing, or archives, (4) data is fully backed up and restores are tested regularly, (5) compliance requires offsite backups but not fast failover. Not suitable for production systems that require an RTO under 1 hour.

Strategy	RTO	RPO	Cost	Complexity
Backup & Restore	Hours-days	Hours	$	Low
Pilot Light	10 min-1h	Minutes	$$	Medium
Warm Standby	< 30 min	Minutes	$$$	Medium-High
Multi-Site A/A	Seconds	Near-zero	$$$$	High
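
The table can be read as a simple selection rule: pick the cheapest strategy whose RTO still fits the target. The sketch below encodes that rule. The RTO figures are assumptions read off the upper end of each table row (24 hours for Backup & Restore and 60 seconds for Multi-Site are placeholders for "hours-days" and "seconds"), and `choose_strategy` is a hypothetical helper, not an AWS tool.

```python
# Strategies in cost order (cheapest first) with an assumed worst-case RTO
# in seconds, taken loosely from the table above.
STRATEGIES = [
    ("Backup & Restore", 24 * 3600),  # assumption for "hours-days"
    ("Pilot Light", 3600),            # upper end of "10 min-1h"
    ("Warm Standby", 30 * 60),        # "< 30 min"
    ("Multi-Site A/A", 60),           # assumption for "seconds"
]


def choose_strategy(target_rto_s):
    """Return the cheapest strategy whose assumed RTO meets the target.

    Returns None when even Multi-Site A/A does not meet the target
    under these assumed figures.
    """
    for name, worst_case_rto_s in STRATEGIES:
        if worst_case_rto_s <= target_rto_s:
            return name
    return None


print(choose_strategy(24 * 3600))  # Backup & Restore
print(choose_strategy(3600))       # Pilot Light
print(choose_strategy(45 * 60))    # Warm Standby
print(choose_strategy(120))        # Multi-Site A/A
```

A real decision would also weigh RPO, budget, and operational complexity, which this sketch leaves out to keep the cost-versus-RTO trade-off visible.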