Week 9 - Day 3: Backup and Restore Patterns
Learning objectives
- Apply AWS Backup for a centralized backup strategy
- Understand service-specific backup options
- Know the restore procedures and how to test them
1. Backup Strategy Tiers
Tier 1: Service-native backup
- Each service has its own native backup (RDS snapshots, EBS snapshots)
- Configure per service
- Manual or automatic
Tier 2: AWS Backup (centralized)
- A single service that manages backups for every supported service
- Backup plans, policies, vault management
- (Details in Week 3, Day 6)
Tier 3: Third-party tools
- Veeam, Commvault, Cohesity
- Integration with AWS
2. Backup Components Review
Backup Plan
- Schedule (cron)
- Lifecycle (warm + cold transitions)
- Cross-region copy
Backup Vault
- Storage container
- KMS encryption
- Access policy
- Vault Lock (immutable WORM)
Backup Job
- Instance of backup execution
Restore Job
- Restore from recovery point to new resource
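The components above map onto the request that AWS Backup expects when a plan is created. The following is a minimal sketch, not a definitive implementation: it only builds a dictionary shaped like the `BackupPlan` argument of boto3's `create_backup_plan`. The plan name, vault name, account ID, and day counts are hypothetical values chosen for illustration.

```python
def build_backup_plan(plan_name, vault_name, dr_vault_arn,
                      cron="cron(0 3 * * ? *)",
                      cold_after_days=30, delete_after_days=365):
    """Return a dict shaped like the BackupPlan argument of create_backup_plan."""
    # AWS Backup keeps cold-storage recovery points for at least 90 days,
    # so deletion must come at least 90 days after the cold transition.
    if delete_after_days < cold_after_days + 90:
        raise ValueError("delete_after_days must be >= cold_after_days + 90")
    lifecycle = {
        "MoveToColdStorageAfterDays": cold_after_days,
        "DeleteAfterDays": delete_after_days,
    }
    return {
        "BackupPlanName": plan_name,
        "Rules": [{
            "RuleName": "daily",
            "TargetBackupVaultName": vault_name,   # vault in the source region
            "ScheduleExpression": cron,            # schedule (cron)
            "Lifecycle": lifecycle,                # warm + cold transitions
            "CopyActions": [{                      # cross-region copy
                "DestinationBackupVaultArn": dr_vault_arn,
                "Lifecycle": dict(lifecycle),
            }],
        }],
    }


# Hypothetical names and account ID, for illustration only.
plan = build_backup_plan(
    "prod-daily",
    "prod-vault",
    "arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",
)
# With boto3 installed and credentials configured, the dict could be sent with:
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

Building the payload separately from the API call keeps the lifecycle check testable without touching an AWS account.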
3. Service-Specific Backup Options
EBS Snapshots
- Incremental backup to S3
- Cross-region copy
- Cross-account share (with KMS key share)
- DLM (Data Lifecycle Manager) automation
RDS Snapshots
- Automated (daily, retention 0-35 days, PITR; a retention of 0 disables automated backups)
- Manual (indefinite retention)
- Cross-region/cross-account copy
DynamoDB Backups
- PITR (35 days continuous backup)
- On-demand backup (indefinite)
- AWS Backup integration
S3
- Versioning (track changes, recover from accidental delete)
- Replication (SRR, CRR)
- Object Lock (WORM)
Aurora
- Continuous backup to S3 (1-35 days PITR)
- Snapshots
- Cross-region snapshots
EFS
- Automatic backups via AWS Backup
- Per-file restore
4. Backup Strategy Best Practices
3-2-1 Rule
- 3 copies of data
- 2 on different storage media
- 1 offsite (cross-region)
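The three conditions of the rule can be checked mechanically against an inventory of copies. This is a small illustrative sketch; the `DataCopy` type, the media labels, and the region names are assumptions made for the example, not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str
    media: str    # e.g. "ebs", "s3", "glacier" (labels are illustrative)
    region: str


def satisfies_3_2_1(copies, primary_region):
    """True when there are >= 3 copies, on >= 2 media, with >= 1 outside the primary region."""
    media_types = {c.media for c in copies}
    offsite = [c for c in copies if c.region != primary_region]
    return len(copies) >= 3 and len(media_types) >= 2 and len(offsite) >= 1


inventory = [
    DataCopy("live volume", "ebs", "us-east-1"),
    DataCopy("local backup vault", "s3", "us-east-1"),
    DataCopy("DR backup vault", "s3", "us-west-2"),
]
ok = satisfies_3_2_1(inventory, primary_region="us-east-1")
```

Dropping the DR vault from the inventory makes the check fail on both the copy count and the offsite condition.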
AWS implementation
- Production: live in S3 + EBS + RDS
- Backup vault: AWS Backup vault same region
- DR copy: AWS Backup vault cross-region
Test restore regularly
- "Untested backup = no backup"
- Quarterly restore test
- Document time taken (validate RTO)
Vault Lock
- Compliance Mode (immutable, even for the root user)
- Use for compliance archives
- Ransomware protection
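In the AWS Backup API, the difference between the two Vault Lock modes comes down to one parameter: supplying `ChangeableForDays` (a cooling-off period of at least 3 days) produces a Compliance Mode lock, while omitting it produces a Governance Mode lock. The sketch below only assembles the keyword arguments for boto3's `put_backup_vault_lock_configuration`; the helper name, vault name, and retention values are hypothetical.

```python
def vault_lock_params(vault_name, min_days, max_days,
                      compliance=False, grace_days=3):
    """Build kwargs for put_backup_vault_lock_configuration."""
    if min_days > max_days:
        raise ValueError("min_days must not exceed max_days")
    params = {
        "BackupVaultName": vault_name,
        "MinRetentionDays": min_days,
        "MaxRetentionDays": max_days,
    }
    if compliance:
        # The lock becomes immutable once the cooling-off period ends.
        if grace_days < 3:
            raise ValueError("grace_days must be at least 3")
        params["ChangeableForDays"] = grace_days
    return params


governance = vault_lock_params("prod-vault", 30, 365)
compliance = vault_lock_params("archive-vault", 2555, 3650, compliance=True)
# With boto3 and credentials:
#   boto3.client("backup").put_backup_vault_lock_configuration(**governance)
```

Starting with the governance variant is the safer way to rehearse, because a compliance lock cannot be undone after the cooling-off period.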
5. Cross-Region Backup
Set up AWS Backup cross-region copy
- Backup Plan with copy action
- Destination vault in DR region
- Lifecycle for cross-region copy
- Separate KMS keys per region
Cost
- Cross-region data transfer charge
- Storage cost in both regions
Use case
- DR (region-wide failure)
- Compliance (data in multiple geos)
6. Cross-Account Backup
Use case
- Ransomware protection (attacker in prod can't delete backups in security account)
- Compliance (separation of duties)
Setup (with Organizations)
- Enable AWS Backup at organization level
- Backup plan in management/delegated account
- Backup copies to central vault account
Vault in central account
- Vault Lock Compliance Mode
- Only backup admin role can manage
- Prod account compromise doesn't lose backups
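One way to express "production can copy in, only the backup admin can manage" is a vault access policy on the central vault. The sketch below is one possible shape under stated assumptions: the organization ID and admin role ARN are placeholders, and the exact list of denied actions should be reviewed against the current AWS Backup action reference before use.

```python
import json


def central_vault_policy(org_id, admin_role_arn):
    """Vault access policy: org accounts may copy in; only the admin role may delete or re-policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgCopyIn",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "backup:CopyIntoBackupVault",
                "Resource": "*",
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            },
            {
                "Sid": "DenyDestructiveExceptAdmin",
                "Effect": "Deny",
                "Principal": "*",
                "Action": [
                    "backup:DeleteRecoveryPoint",
                    "backup:UpdateRecoveryPointLifecycle",
                    "backup:DeleteBackupVault",
                    "backup:PutBackupVaultAccessPolicy",
                    "backup:DeleteBackupVaultAccessPolicy",
                ],
                "Resource": "*",
                "Condition": {"ArnNotLike": {"aws:PrincipalArn": admin_role_arn}},
            },
        ],
    }


# Placeholder identifiers, for illustration only.
policy = central_vault_policy(
    "o-exampleorgid",
    "arn:aws:iam::111122223333:role/BackupAdmin",
)
policy_json = json.dumps(policy, indent=2)
# With boto3 and credentials:
#   boto3.client("backup").put_backup_vault_access_policy(
#       BackupVaultName="central-vault", Policy=policy_json)
```

The access policy complements Vault Lock rather than replacing it: the policy limits who can act, while the lock limits what anyone can do.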
7. Database Specific
RDS PITR (Point-in-Time Recovery)
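- Restore to any second within the retention period (default 7 days when created in the console), up to the latest restorable time (typically within the last 5 minutes)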
- Restore creates NEW DB instance (cannot restore in-place)
- New endpoint → update app config
Aurora Backtrack (MySQL)
- "Rewind" cluster to previous timestamp (up to 72 hours)
- In-place (no new cluster)
- Faster than PITR
DynamoDB PITR
- Last 35 days
- Restore to new table
Choice
- PITR: standard, granular (any second)
- Backtrack: Aurora only, faster but only 72 hours
- Manual snapshot: long-term retention, after major changes
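The choice above can be written as a small decision function. This is an illustrative sketch only: the function name, the engine string, and the return labels are assumptions, and a real decision would also weigh whether a new endpoint is acceptable.

```python
def choose_restore_method(engine, hours_back,
                          backtrack_enabled=False, pitr_retention_days=7):
    """Pick a restore path based on engine and how far back the target time is."""
    # Backtrack: Aurora MySQL only, in-place, limited to 72 hours.
    if engine == "aurora-mysql" and backtrack_enabled and hours_back <= 72:
        return "backtrack"
    # PITR: any second within the automated backup retention window.
    if hours_back <= pitr_retention_days * 24:
        return "pitr"
    # Older than the retention window: only a retained snapshot can help.
    return "manual-snapshot"


recent_aurora = choose_restore_method("aurora-mysql", 2, backtrack_enabled=True)
recent_rds = choose_restore_method("postgres", 2)
old_rds = choose_restore_method("postgres", 24 * 40, pitr_retention_days=35)
```

Here `recent_aurora` resolves to Backtrack, `recent_rds` to PITR, and `old_rds` to a manual snapshot because 40 days exceeds the 35-day maximum retention.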
8. EBS Snapshot Patterns
DLM (Data Lifecycle Manager)
DLM Policy:
  Target: tag Environment=production
  Schedule:
    - Frequency: daily at 03:00 UTC
      Retention: 7 snapshots
    - Frequency: weekly, Monday 04:00 UTC
      Retention: 4 weeks
    - Frequency: monthly, 1st 05:00 UTC
      Retention: 12 months
  Copy across regions: us-west-2
  Encryption: customer-managed KMS key
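The policy outlined above could be expressed as the `PolicyDetails` argument of boto3's DLM `create_lifecycle_policy`. The sketch below is a hedged approximation: it uses count-based retention for all three schedules (7 daily, 4 weekly, 12 monthly) as a stand-in for the "4 weeks" and "12 months" in the outline, and the KMS key ARN and the retention of the cross-region copies are assumptions.

```python
def dlm_policy_details(kms_key_arn, copy_region="us-west-2"):
    """Build a PolicyDetails dict for DLM create_lifecycle_policy."""

    def schedule(name, cron, keep_count, copy_interval, copy_unit):
        return {
            "Name": name,
            "CopyTags": True,
            "CreateRule": {"CronExpression": cron},
            "RetainRule": {"Count": keep_count},
            "CrossRegionCopyRules": [{
                "Target": copy_region,
                "Encrypted": True,
                "CmkArn": kms_key_arn,      # customer-managed key in the target region
                "CopyTags": True,
                # Retention of the copies is an assumption mirroring the source schedule.
                "RetainRule": {"Interval": copy_interval, "IntervalUnit": copy_unit},
            }],
        }

    return {
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "Environment", "Value": "production"}],
        "Schedules": [
            schedule("daily", "cron(0 3 * * ? *)", 7, 7, "DAYS"),
            schedule("weekly", "cron(0 4 ? * MON *)", 4, 4, "WEEKS"),
            schedule("monthly", "cron(0 5 1 * ? *)", 12, 12, "MONTHS"),
        ],
    }


# Placeholder key ARN, for illustration only.
details = dlm_policy_details(
    "arn:aws:kms:us-west-2:111122223333:key/00000000-0000-0000-0000-000000000000"
)
# With boto3 and credentials (role ARN is also a placeholder):
#   boto3.client("dlm").create_lifecycle_policy(
#       ExecutionRoleArn="arn:aws:iam::111122223333:role/DlmRole",
#       Description="production snapshots", State="ENABLED",
#       PolicyDetails=details)
```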
Tags-based selection
- Tag instances/volumes with Backup=true or Tier=production
- DLM picks up automatically
- No manual maintenance
9. S3 as Backup Destination
Use case
- On-prem backup to S3 (replacing tape)
- App-level backup logs/files
- Database export
Tools
- AWS Storage Gateway (Tape Gateway): emulate tape library
- AWS Backup (with native sources)
- AWS DataSync (file sync)
- Custom scripts (aws s3 sync)
Storage class tiering
- Recent: S3 Standard or Standard-IA
- Older: Glacier Instant Retrieval
- Archive: Glacier Deep Archive
Cost optimization
- Lifecycle rules transition automatically
- Use S3 Object Lock Compliance Mode for immutability
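The storage class tiering and lifecycle rules above can be sketched as an S3 lifecycle configuration. The prefix and the day thresholds (30, 90, 365) are illustrative assumptions; the dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
def backup_lifecycle_rules(prefix="backups/", ia_days=30,
                           glacier_ir_days=90, deep_archive_days=365):
    """Build a LifecycleConfiguration that tiers backup objects as they age."""
    # S3 does not transition objects to Standard-IA before they are 30 days old.
    if ia_days < 30:
        raise ValueError("ia_days must be at least 30")
    if not ia_days < glacier_ir_days < deep_archive_days:
        raise ValueError("thresholds must increase: IA < Glacier IR < Deep Archive")
    return {
        "Rules": [{
            "ID": "tier-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_ir_days, "StorageClass": "GLACIER_IR"},
                {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }]
    }


config = backup_lifecycle_rules()
# With boto3 and credentials (bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-backup-bucket", LifecycleConfiguration=config)
```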
10. Restore Best Practices
Document procedures
- Step-by-step runbook for each restore scenario
- Include who has authority, dependencies
Test regularly
- Quarterly: simulate failure, restore from backup
- Measure: actual RTO/RPO vs target
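Measuring actual RTO and RPO against the targets is simple arithmetic on three timestamps from the drill. The sketch below assumes that RTO is measured from the simulated failure to the moment the restored service is usable, and RPO from the last usable recovery point to the failure; the timestamps and targets are made-up example values.

```python
from datetime import datetime, timedelta


def evaluate_restore_test(failure_at, last_backup_at, restored_at,
                          rto_target, rpo_target):
    """Compare measured RTO/RPO from a restore drill with the targets."""
    actual_rto = restored_at - failure_at       # downtime until service is back
    actual_rpo = failure_at - last_backup_at    # data written after the last backup is lost
    return {
        "actual_rto": actual_rto,
        "actual_rpo": actual_rpo,
        "rto_met": actual_rto <= rto_target,
        "rpo_met": actual_rpo <= rpo_target,
    }


result = evaluate_restore_test(
    failure_at=datetime(2024, 1, 15, 10, 0),
    last_backup_at=datetime(2024, 1, 15, 3, 0),
    restored_at=datetime(2024, 1, 15, 11, 30),
    rto_target=timedelta(hours=2),
    rpo_target=timedelta(hours=4),
)
```

In this example the restore finishes within the 2-hour RTO target, but the 7-hour gap since the last backup misses the 4-hour RPO target, which would argue for more frequent backups or PITR.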
Restore environment
- Often restore to isolated environment first (verify)
- Then move to production once validated
Communication
- Document stakeholder notification process
- Status updates during recovery
11. Backup Cost Optimization
Strategies
- Lifecycle to cold storage for old backups
- Right-size retention (don't keep forever unnecessarily)
- Skip dev/test backups (or shorter retention)
- Compress before backup if possible
- Deduplication (Storage Gateway, some tools)
- Avoid multiple backup tools (consolidate to AWS Backup)
Cost components
- Storage in S3/Glacier ($)
- Cross-region transfer ($)
- API calls (small)
- KMS API calls (encryption)
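The main cost components can be combined into a rough monthly estimate. The sketch below is deliberately simplified: it ignores API and KMS request charges, and the per-GB prices passed in are hypothetical placeholders, not actual AWS prices.

```python
def monthly_backup_cost(stored_gb, copied_gb, storage_price_per_gb,
                        transfer_price_per_gb, cross_region=True):
    """Rough monthly estimate: storage in each region plus cross-region transfer."""
    primary_storage = stored_gb * storage_price_per_gb
    # A cross-region copy is stored (and billed) a second time in the DR region.
    dr_storage = stored_gb * storage_price_per_gb if cross_region else 0.0
    transfer = copied_gb * transfer_price_per_gb if cross_region else 0.0
    return {
        "primary_storage": primary_storage,
        "dr_storage": dr_storage,
        "transfer": transfer,
        "total": primary_storage + dr_storage + transfer,
    }


# Hypothetical prices: 0.05 per GB-month of storage, 0.02 per GB transferred.
estimate = monthly_backup_cost(
    stored_gb=1000, copied_gb=100,
    storage_price_per_gb=0.05, transfer_price_per_gb=0.02,
)
single_region = monthly_backup_cost(1000, 100, 0.05, 0.02, cross_region=False)
```

Comparing `estimate` with `single_region` shows how much of the bill comes from keeping a DR copy, which is the figure to weigh against the cost of a region-wide outage.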
12. Common Patterns
Pattern 1: SaaS app
- AWS Backup daily for EC2, RDS, EFS
- Lifecycle to cold storage after 30 days
- Cross-region copy for DR
- Vault Lock Governance for production
Pattern 2: Compliance archive
- AWS Backup with Compliance Mode Vault Lock
- 7-year retention
- Cross-region for redundancy
- Backup Audit Manager reports
Pattern 3: Ransomware protection
- Cross-account backup to security account
- Vault Lock Compliance Mode
- Production can back up TO the vault but not delete FROM it
Pattern 4: Critical DB
- RDS Multi-AZ (HA)
- Automated backups + PITR (35 days)
- Cross-region snapshot copy (DR)
- Aurora Global Database for fast recovery
Review questions
- What is the 3-2-1 backup rule?
  Answer: Three copies of the data, on two different types of media, with one copy offsite. In an AWS context: three copies (S3 already replicates data across at least three Availability Zones for most storage classes, plus EBS snapshots), two media types (EBS + S3, or S3 Standard + Glacier), and one offsite copy (cross-Region backup). The rule ensures that a single hardware failure, a site disaster, or ransomware cannot wipe out every copy at once.
- Does RDS PITR allow an in-place restore?
  Answer: No. RDS Point-in-Time Recovery always creates a new DB instance; it cannot overwrite the existing instance. After the restore, update the connection string in the application or rename the instances. PITR granularity is 1 second (for MySQL/Aurora); it requires automated backups to be enabled, with a retention of 1-35 days. This is a safety feature: it prevents a restore from overwriting the production DB.
- What is the maximum Aurora Backtrack window?
  Answer: 72 hours (3 days) is the maximum backtrack window for Aurora MySQL. Backtrack rewinds the cluster to any point within the window without restoring a snapshot (seconds to minutes, no new instance required). The difference from PITR: Backtrack rewinds in place and is very fast, whereas PITR creates a new instance. Backtrack is available only for Aurora MySQL, not for Aurora PostgreSQL.
- How does Vault Lock Compliance Mode differ from Governance Mode?
  Answer: Compliance Mode is immutable: once the lock is confirmed (after the cooling-off period), no one, including the root user and AWS Support, can delete or modify the vault lock, and recovery points cannot be deleted until their retention period expires. It serves SEC Rule 17a-4(f), CFTC, and FINRA requirements. Governance Mode: administrators with sufficient privileges can override the lock. Governance suits cases that need operational flexibility; Compliance suits regulatory mandates that allow no exceptions.
- What does cross-account backup protect against?
  Answer: Mainly: (1) Ransomware, where malware encrypts or deletes both production data and backups in the same account; (2) Insider threat, where a malicious admin deletes the backups as well; (3) Account compromise, where an attacker with full access to one account still cannot delete backups held in another account; (4) Accidental deletion of backups. Cross-account backup plus Vault Lock gives the strongest protection for backup integrity.
Hands-on exercises
- Set up an AWS Backup plan: daily backup, 30 days retention, cross-region copy
- Tag resources with Backup=true, observe backup jobs
- Test PITR: restore RDS to 1 hour ago
- Set up Vault Lock in Governance Mode (test mode first)
- Document the restore procedure for one critical service
Next: Route 53 Failover Patterns