Week 11 - Day 1: Review Domain 1 & 2
Objectives
Review the first two domains (together 56% of the exam):
- Domain 1: Design Secure Architectures (30%)
- Domain 2: Design Resilient Architectures (26%)
Domain 1: Design Secure Architectures (30%)
1.1 Secure access to AWS resources
IAM Core concepts
- Users, Groups, Roles
- Policies: Identity-based vs Resource-based vs SCPs vs Permission Boundaries
- Policy evaluation: explicit Deny > explicit Allow > default Deny
- STS for temporary credentials
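The evaluation order above can be sketched as a small function. This is a simplified model, not IAM's actual engine: it ignores SCPs, permission boundaries, conditions, and wildcards, and the `Statement` type, the `evaluate` function, and the bucket ARN are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Statement:
    effect: str    # "Allow" or "Deny"
    action: str    # exact action name, e.g. "s3:GetObject" (no wildcards in this sketch)
    resource: str  # exact resource ARN (no wildcards in this sketch)


def evaluate(statements: Iterable[Statement], action: str, resource: str) -> str:
    """Return "Allow" or "Deny" for one request against all given statements."""
    matching = [s for s in statements if s.action == action and s.resource == resource]
    # 1. Any explicit Deny wins, no matter how many Allows also match.
    if any(s.effect == "Deny" for s in matching):
        return "Deny"
    # 2. Otherwise a matching explicit Allow grants access.
    if any(s.effect == "Allow" for s in matching):
        return "Allow"
    # 3. Nothing matched: default (implicit) Deny.
    return "Deny"


# Hypothetical statements: Delete is both allowed and denied, so Deny wins.
OBJECT_ARN = "arn:aws:s3:::demo-bucket/report.csv"
policies = [
    Statement("Allow", "s3:GetObject", OBJECT_ARN),
    Statement("Allow", "s3:DeleteObject", OBJECT_ARN),
    Statement("Deny", "s3:DeleteObject", OBJECT_ARN),
]

for requested in ("s3:GetObject", "s3:DeleteObject", "s3:PutObject"):
    print(requested, "->", evaluate(policies, requested, OBJECT_ARN))
```

The third request is denied only because nothing allows it, which is the "default Deny" case.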
Federation
- SAML 2.0 for enterprise SSO
- OIDC for web identity (Google, Facebook)
- Cognito for app users
- IAM Identity Center for multi-account AWS access
Best practices
- Root user MFA + minimal use
- IAM Roles for EC2/Lambda/ECS (no hard-coded keys)
- Least privilege
- ABAC for scalable tagging-based access
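As an illustration of ABAC, the sketch below builds an identity-based policy that allows an action only when the resource's tag matches the caller's tag. The condition keys `aws:ResourceTag/...` and `aws:PrincipalTag/...` are real global keys; the tag key `project`, the chosen EC2 actions, and the `Sid` are assumptions made for the example.

```python
import json

# Hypothetical tag key shared by principals and resources.
TAG_KEY = "project"

abac_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowWhenProjectTagsMatch",
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    # Resource tag value must equal the caller's own tag value.
                    "aws:ResourceTag/" + TAG_KEY: "${aws:PrincipalTag/" + TAG_KEY + "}"
                }
            },
        }
    ],
}

policy_json = json.dumps(abac_policy, indent=2)
print(policy_json)
```

One policy like this can cover new projects without edits, because access follows the tags rather than a hand-maintained list of resource ARNs.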
1.2 Secure workloads and applications
Data encryption
- KMS: at-rest encryption with KMS keys (formerly called CMKs)
- TLS: in-transit encryption
- S3 SSE: SSE-S3, SSE-KMS, SSE-C, DSSE-KMS, client-side
- EBS encryption (can be enabled by default, per Region)
Secrets management
- Secrets Manager: built-in rotation for RDS
- Parameter Store: cheaper, manual rotation
- CloudHSM: dedicated HSM for compliance
Network security
- Security Groups: stateful, instance level
- NACLs: stateless, subnet level, support Deny
- WAF: L7 filtering (SQLi, XSS, rate limit)
- Shield: DDoS protection (Standard free, Advanced $3000/mo)
- VPC Endpoints: private connectivity to AWS services
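The stateful vs stateless difference between Security Groups and NACLs can be shown with a toy model. This is a sketch only: it matches on destination port alone, and `NaclRule`, `nacl_decision`, and the round-trip helpers are hypothetical names. It relies on two facts: NACL rules are evaluated in ascending rule-number order with the first match winning, and NACLs check the response independently of the request.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int     # lower numbers are evaluated first
    action: str     # "allow" or "deny"
    port_from: int
    port_to: int


def nacl_decision(rules: list[NaclRule], port: int) -> str:
    """First matching rule in ascending number order wins; otherwise deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # the implicit catch-all deny


def round_trip_via_sg(inbound_allowed_ports: set[int], request_port: int) -> bool:
    # Stateful: once the request is allowed in, the response is allowed out automatically.
    return request_port in inbound_allowed_ports


def round_trip_via_nacl(inbound: list[NaclRule], outbound: list[NaclRule],
                        request_port: int, client_port: int) -> bool:
    # Stateless: the request and the response are checked separately.
    return (nacl_decision(inbound, request_port) == "allow"
            and nacl_decision(outbound, client_port) == "allow")


https_in = [NaclRule(100, "allow", 443, 443)]
ephemeral_out = [NaclRule(100, "allow", 1024, 65535)]

print(round_trip_via_sg({443}, 443))                         # inbound rule alone is enough
print(round_trip_via_nacl(https_in, [], 443, 50000))         # response blocked: no outbound rule
print(round_trip_via_nacl(https_in, ephemeral_out, 443, 50000))
```

This is why a NACL that allows inbound 443 still needs an outbound rule for the client's ephemeral port range, while a Security Group does not.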
1.3 Determine appropriate data security controls
Detection
- GuardDuty: threat detection (VPC Flow Logs, CloudTrail, DNS)
- Inspector: vulnerability assessment (EC2, ECR, Lambda)
- Macie: PII detection in S3
- Config: configuration tracking + compliance rules
- Security Hub: aggregator for security findings
Compliance
- CloudTrail: API audit logs
- Audit Manager: compliance evidence collection
- Object Lock (S3): WORM for compliance archives
- Vault Lock (AWS Backup): immutable backups
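WORM ("write once, read many") retention can be pictured with a toy object model. This is a sketch of compliance-mode semantics, not the S3 API: the `LockedObject` class and `RetentionError` exception are hypothetical, and real Object Lock applies per object version inside a versioned bucket.

```python
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    """Raised when an operation would violate the retention period."""


class LockedObject:
    """Toy model of one object version under compliance-mode retention."""

    def __init__(self, data: bytes, retain_until: datetime) -> None:
        self.data = data
        self.retain_until = retain_until

    def delete(self, now: datetime) -> None:
        # No caller, however privileged, can delete before the date passes.
        if now < self.retain_until:
            raise RetentionError("locked until " + self.retain_until.isoformat())
        self.data = b""

    def extend_retention(self, new_date: datetime) -> None:
        # Retention may be extended but never shortened.
        if new_date < self.retain_until:
            raise RetentionError("retention cannot be shortened")
        self.retain_until = new_date


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
archive = LockedObject(b"audit-log", retain_until=start + timedelta(days=365))

try:
    archive.delete(now=start + timedelta(days=30))
except RetentionError as err:
    print("delete rejected:", err)
```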
Domain 2: Design Resilient Architectures (26%)
2.1 Design scalable + loosely coupled architectures
Decoupling tools
- SQS: queue (point-to-point async)
- SNS: pub/sub (fanout)
- EventBridge: event bus with filtering
- Step Functions: workflow orchestration
Patterns
- Async API: long-running tasks
- Saga: distributed transactions
- CQRS: read/write separation
- Fanout: SNS + multiple SQS
- Pipeline: producer → queue → consumer
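The fanout pattern (SNS + multiple SQS) can be sketched in memory to show why each consumer works independently. The `Topic` class and the queue names are stand-ins invented for the example; no AWS SDK calls are involved.

```python
from collections import deque


class Topic:
    """Stand-in for an SNS topic: delivers a copy of each message to every subscriber."""

    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, queue: deque) -> None:
        self.subscribers.append(queue)

    def publish(self, message: dict) -> None:
        for queue in self.subscribers:
            queue.append(dict(message))  # each queue receives its own copy


# Two deques stand in for two SQS queues owned by different services.
billing_queue = deque()
shipping_queue = deque()

orders = Topic()
orders.subscribe(billing_queue)
orders.subscribe(shipping_queue)
orders.publish({"order_id": 1, "total": 42})

# Billing consumes its copy; shipping's copy is still waiting.
billing_queue.popleft()
print(len(billing_queue), len(shipping_queue))  # 0 1
```

A slow or failing shipping consumer only delays its own queue, which is the decoupling benefit the pattern is used for.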
2.2 Design highly available + fault-tolerant
HA fundamentals
- Multi-AZ: tolerate AZ failure (RDS Multi-AZ, ALB across AZs)
- Multi-Region: tolerate region failure (Aurora Global, DynamoDB Global)
- Auto Scaling: replace failed instances
- Health checks: ELB + Route 53
Stateful services HA
- RDS Multi-AZ: sync replication, automatic failover (~1-2 min)
- Aurora: 6 copies / 3 AZs, < 30s failover
- DynamoDB: built-in Multi-AZ
- ElastiCache Redis: replication groups, Multi-AZ
- EFS: multi-AZ by default
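Aurora's "6 copies / 3 AZs" layout is usually explained through its quorum rule: writes need 4 of 6 copies and reads need 3 of 6. The arithmetic below is a sketch of why that survives an AZ failure; the function names are hypothetical.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
TOTAL_COPIES = COPIES_PER_AZ * AZ_COUNT  # 6 copies of the data
WRITE_QUORUM = 4                         # copies that must acknowledge a write
READ_QUORUM = 3                          # copies needed to serve a read


def healthy_copies(failed_azs: int, extra_failed_copies: int = 0) -> int:
    """Copies still available after losing whole AZs plus individual copies."""
    return max(TOTAL_COPIES - failed_azs * COPIES_PER_AZ - extra_failed_copies, 0)


def can_write(healthy: int) -> bool:
    return healthy >= WRITE_QUORUM


def can_read(healthy: int) -> bool:
    return healthy >= READ_QUORUM


one_az_down = healthy_copies(failed_azs=1)
az_plus_one = healthy_copies(failed_azs=1, extra_failed_copies=1)
print(one_az_down, can_write(one_az_down), can_read(one_az_down))  # 4 True True
print(az_plus_one, can_write(az_plus_one), can_read(az_plus_one))  # 3 False True
```

Losing a full AZ leaves 4 copies, so writes continue; losing an AZ plus one more copy leaves 3, which still allows reads.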
DR strategies
- Backup & Restore: cheapest, RTO hours
- Pilot Light: minimal running, RTO 10 min - 1 hour
- Warm Standby: scaled-down full, RTO < 30 min
- Multi-Site Active-Active: most expensive, RTO seconds
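The four strategies form a cost ladder, so a common way to reason about them is "pick the cheapest one whose RTO still meets the target". The sketch below encodes the RTO figures from the list above; the 24-hour ceiling for Backup & Restore and the 60-second ceiling for Multi-Site are assumptions added to make the comparison computable, and `cheapest_strategy` is a hypothetical helper.

```python
# (strategy, assumed worst-case RTO in seconds), ordered cheapest first.
STRATEGIES = [
    ("Backup & Restore", 24 * 3600),   # notes say "hours"; 24 h is an assumed ceiling
    ("Pilot Light", 60 * 60),          # upper end of "10 min - 1 hour"
    ("Warm Standby", 30 * 60),         # "< 30 min"
    ("Multi-Site Active-Active", 60),  # notes say "seconds"; 60 s is an assumed ceiling
]


def cheapest_strategy(target_rto_seconds: int) -> str:
    """Return the cheapest strategy whose worst-case RTO fits the target."""
    for name, worst_case_rto in STRATEGIES:
        if worst_case_rto <= target_rto_seconds:
            return name
    # Nothing fits: fall back to the fastest option available.
    return STRATEGIES[-1][0]


for label, target in [("48 h", 48 * 3600), ("2 h", 2 * 3600), ("45 min", 45 * 60), ("10 s", 10)]:
    print(label, "->", cheapest_strategy(target))
```

A full decision would also weigh RPO and the cost of downtime, which this sketch leaves out.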
Route 53 routing for HA
- Failover routing: active-passive
- Latency-based: route to the Region with the lowest latency (often, but not always, the closest)
- Weighted: gradual rollout
- Multivalue: lightweight LB
- Health checks integration
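Failover and weighted routing can be sketched as two small selection functions driven by health-check results. This is a simplified model with hypothetical names and endpoints; in particular, it raises an error when no weighted record is healthy, which is a simplification of how the real service behaves.

```python
import random


def failover_answer(primary: str, secondary: str, healthy: set) -> str:
    """Active-passive: answer with the primary while it passes health checks."""
    return primary if primary in healthy else secondary


def weighted_answer(weights: dict, healthy: set, rng: random.Random) -> str:
    """Pick a healthy record with probability proportional to its weight."""
    candidates = {name: w for name, w in weights.items() if name in healthy and w > 0}
    if not candidates:
        raise LookupError("no healthy record with a positive weight")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


# Hypothetical endpoints for illustration.
print(failover_answer("eu-primary", "us-standby", healthy={"us-standby"}))  # us-standby

# Gradual rollout: roughly 10% of answers point at the new version.
rng = random.Random(42)
draws = [weighted_answer({"v1": 90, "v2": 10}, {"v1", "v2"}, rng) for _ in range(10_000)]
print(draws.count("v2") / len(draws))
```

Shifting the weights step by step (90/10, then 50/50, then 0/100) is how weighted records support a gradual rollout.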
Stateless design
- No session affinity needed (Redis/DynamoDB for session)
- Idempotent operations
- Externalize state to persistent stores
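Idempotency is what makes retries safe in a stateless, queue-driven design. A minimal sketch, assuming the caller supplies an idempotency key: the `PaymentService` class is hypothetical, and its in-memory dict stands in for an external store such as DynamoDB or Redis.

```python
class PaymentService:
    """Toy service whose charge() is safe to retry with the same key."""

    def __init__(self) -> None:
        self._results = {}     # idempotency key -> stored result (stand-in for an external store)
        self.charges_made = 0  # counts real side effects

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("order-1001", 2500)
retry = service.charge("order-1001", 2500)  # e.g. the client timed out and retried
print(first == retry, service.charges_made)  # True 1
```

Because the lookup table lives outside the handler in a real deployment, any instance behind the load balancer can answer the retry.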
Common Exam Scenarios
Scenario 1: HA web app
- ALB across 3 AZs
- ASG with min/desired/max
- RDS Multi-AZ
- ElastiCache for session
- S3 + CloudFront for static
Scenario 2: DR multi-region
- Aurora Global Database (primary + secondary region)
- S3 CRR for assets
- Route 53 failover routing
- AWS Backup cross-region
Scenario 3: Secure app
- VPC with private subnets for DB
- Security Groups referencing each other (web → app → DB)
- WAF on CloudFront/ALB
- Secrets Manager for DB password (auto-rotation)
- KMS encryption everywhere
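The chained Security Group setup in this scenario (web → app → DB) can be modeled as a lookup of which source each group admits. The group IDs, ports, and the `is_allowed` helper below are hypothetical; the point is that each tier names the previous tier's group as its source instead of an IP range.

```python
# Inbound rules per security group: port -> set of allowed sources.
# A source is either another security group ID or the literal "internet".
INBOUND_RULES = {
    "sg-web": {443: {"internet"}},
    "sg-app": {8080: {"sg-web"}},
    "sg-db": {5432: {"sg-app"}},
}


def is_allowed(source: str, target_sg: str, port: int) -> bool:
    """True if the target group has an inbound rule for this source and port."""
    return source in INBOUND_RULES.get(target_sg, {}).get(port, set())


print(is_allowed("sg-app", "sg-db", 5432))    # True: app tier may reach the DB
print(is_allowed("sg-web", "sg-db", 5432))    # False: web tier cannot skip the app tier
print(is_allowed("internet", "sg-db", 5432))  # False: DB is never exposed directly
```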
Scenario 4: Decoupled processing
- API Gateway → Lambda → SQS → Worker
- DLQ for failures
- EventBridge for fanout to multiple services
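The role of the DLQ in this scenario can be sketched with an in-memory queue: a message that keeps failing is retried a limited number of times and then set aside so it does not block the rest. `MAX_RECEIVE_COUNT`, `drain`, and `handler` are hypothetical names; the threshold of 3 is an assumed value for the redrive limit.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # assumed redrive threshold


def drain(queue: deque, dlq: list, handler) -> list:
    """Process every message; move repeatedly failing ones to the DLQ."""
    processed = []
    while queue:
        message = queue.popleft()
        message["receive_count"] += 1
        try:
            handler(message["body"])
            processed.append(message["body"])
        except Exception:
            if message["receive_count"] >= MAX_RECEIVE_COUNT:
                dlq.append(message["body"])  # give up and park it for inspection
            else:
                queue.append(message)        # becomes available again for a retry
    return processed


def handler(body: str) -> None:
    if body.startswith("bad"):
        raise ValueError("cannot process " + body)


work = deque({"body": b, "receive_count": 0} for b in ["ok-1", "bad-order", "ok-2"])
dead_letters = []
done = drain(work, dead_letters, handler)
print(done, dead_letters)  # ['ok-1', 'ok-2'] ['bad-order']
```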
Scenario 5: Compliance archive
- S3 Glacier Deep Archive
- Object Lock Compliance Mode
- KMS encryption
- Cross-region replication
- AWS Backup Vault Lock
Key Decision Points
When to use what for HA?
| Need | Service |
|---|---|
| Replace failed EC2 | Auto Scaling Group |
| Distribute traffic across instances | ALB / NLB |
| DB HA same region | RDS Multi-AZ / Aurora |
| DB HA cross-region | Aurora Global Database |
| Session storage | ElastiCache / DynamoDB |
| File storage HA | EFS, FSx (Multi-AZ option) |
| DNS failover | Route 53 health checks |
When to use what for Security?
| Need | Service |
|---|---|
| Audit API calls | CloudTrail |
| Detect threats | GuardDuty |
| Scan vulnerabilities | Inspector |
| Discover PII | Macie |
| DDoS protection | Shield (Std free, Adv paid) |
| L7 filtering | WAF |
| Encryption keys | KMS / CloudHSM |
| Manage secrets with rotation | Secrets Manager |
| User auth (apps) | Cognito |
| User auth (AWS workforce) | IAM Identity Center |
Quick Trade-offs
Cost vs Resilience
- Backup & Restore: cheap, slow recovery
- Multi-Site: expensive, fast recovery
- Choose based on cost of downtime
Coupling vs Performance
- Tightly coupled: low latency, hard to scale
- Loosely coupled (async): higher latency, resilient
Encryption everywhere?
- Yes for at-rest (KMS easy)
- TLS for in-transit
- Performance impact minimal with KMS
Common Exam Keywords
| Keyword | Likely answer |
|---|---|
| "Most secure" | Encryption + Least privilege + Private network |
| "Highly available" | Multi-AZ ALB + ASG + RDS Multi-AZ |
| "Tolerate region failure" | Multi-Region (Aurora Global, S3 CRR) |
| "Decouple" | SQS, SNS, EventBridge |
| "Async processing" | SQS, Lambda |
| "Reduce blast radius" | Multiple accounts (Organizations) |
| "Compliance immutable" | Object Lock, Vault Lock |
Practice Questions
Q1
Web app needs to tolerate AZ failure. Components needed:
- Solution: ALB across AZs + ASG across AZs + RDS Multi-AZ + ElastiCache Multi-AZ
Q2
Need to share file storage across 100 Linux EC2 instances:
- Solution: EFS (Multi-AZ, scalable)
Q3
Need DDoS protection + L7 filtering for global web app:
- Solution: CloudFront + WAF + Shield Standard (free)
Q4
Decouple order processing (high spike during sales):
- Solution: API Gateway → SQS → Lambda/EC2 ASG workers
Q5
DR for production with RPO < 1 second, RTO < 1 minute:
- Solution: Aurora Global Database + Multi-Region active-passive
Review materials
Next: Review Domain 3 & 4