Week 7 - Day 1: Amazon SQS
Learning objectives
- Understand SQS as a managed message queue
- Distinguish Standard and FIFO queues
- Understand visibility timeout, DLQ, and long polling
- Apply the decoupling pattern with SQS
1. SQS Overview
Amazon SQS (Simple Queue Service) is a fully managed message queue service for decoupling distributed components.
Characteristics
- Pull-based (consumers poll)
- At-least-once delivery (Standard) / exactly-once processing (FIFO)
- Highly available: messages are stored redundantly across multiple AZs
- Scalable: nearly unlimited throughput (Standard)
- Message size: up to 256 KB (use S3 for larger)
- Retention: 1 minute - 14 days (default 4 days)
Use cases
- Decouple components (producer and consumer can run at different rates)
- Buffer for traffic spikes
- Async processing (orders, emails, image processing)
- Task scheduling (delayed messages)
2. SQS Queue Types
Standard Queue
- At-least-once delivery (may deliver duplicate)
- Best-effort ordering (not strict FIFO)
- Unlimited throughput (auto-scale)
- Most use cases
FIFO Queue
- Exactly-once processing
- Strict FIFO ordering (per Message Group)
- 300 messages/sec without batching, 3,000 messages/sec with batching
- Higher cost (~$0.50 vs $0.40 per million requests)
- Queue name must end with .fifo
When to use FIFO
- Financial transactions (order matters)
- Sequential events processing
- Banking ledger
- Exactly-once required (no dedup logic in app)
When to use Standard
- High throughput needed
- Ordering does not matter (independent tasks)
- Lower cost
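As a rough illustration of the two queue types, the sketch below builds the arguments for the CreateQueue API. The helper name `build_create_queue_params` and the queue names are hypothetical; `FifoQueue` and `ContentBasedDeduplication` are queue attributes, passed as strings.

```python
def build_create_queue_params(name: str, fifo: bool = False) -> dict:
    """Build keyword arguments for the SQS CreateQueue API (hypothetical helper)."""
    attributes = {}
    if fifo:
        # FIFO queue names must end with ".fifo".
        if not name.endswith(".fifo"):
            name += ".fifo"
        attributes["FifoQueue"] = "true"
        # Optional: let SQS derive the deduplication ID from the message body.
        attributes["ContentBasedDeduplication"] = "true"
    params = {"QueueName": name}
    if attributes:
        params["Attributes"] = attributes
    return params


standard = build_create_queue_params("orders")
fifo = build_create_queue_params("payments", fifo=True)
print(standard)
print(fifo)
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("sqs").create_queue(**fifo)`.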
3. Producer
Send Message
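import boto3

sqs = boto3.client('sqs')
sqs.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/111/MyQueue',
    MessageBody='{"order_id": "123", "amount": 99.99}',
    DelaySeconds=10,  # delay before consumers can receive (0-900 seconds)
    # MessageAttributes={...},  # optional metadata; supply real attributes here
)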
Send Message Batch
- Up to 10 messages in 1 call
- Reduces cost (1 API call vs 10)
- Total batch size ≤ 256 KB
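A minimal sketch of batching, assuming `sqs` is a boto3 SQS client passed in by the caller. The helper names `chunk` and `send_in_batches` are hypothetical, and the sketch does not check the 256 KB total batch size.

```python
import json
from typing import Iterable, List

MAX_BATCH_ENTRIES = 10  # SendMessageBatch accepts at most 10 entries per call


def chunk(items: List[dict], size: int = MAX_BATCH_ENTRIES) -> Iterable[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(sqs, queue_url: str, payloads: List[dict]) -> List[str]:
    """Send payloads with SendMessageBatch; return the Ids of failed entries."""
    failed = []
    for batch_no, batch in enumerate(chunk(payloads)):
        entries = [
            # Each entry needs an Id that is unique within its batch.
            {"Id": f"{batch_no}-{i}", "MessageBody": json.dumps(p)}
            for i, p in enumerate(batch)
        ]
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # A batch call can partially fail, so collect the failed Ids.
        failed.extend(f["Id"] for f in response.get("Failed", []))
    return failed
```

Sending 25 payloads this way would take 3 API calls instead of 25. Checking the `Failed` list matters because a batch request can succeed overall while individual entries are rejected.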
4. Consumer
Poll Messages
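# assumes sqs = boto3.client('sqs')
response = sqs.receive_message(
    QueueUrl='...',
    MaxNumberOfMessages=10,  # 1-10
    WaitTimeSeconds=20,      # long polling
    VisibilityTimeout=30,    # seconds the messages stay hidden from other consumers
)
for msg in response.get('Messages', []):
    process(msg['Body'])  # process() is the application's own handler
    # MUST delete after successful processing, or the message reappears
    sqs.delete_message(
        QueueUrl='...',
        ReceiptHandle=msg['ReceiptHandle'],
    )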
Consumers
- EC2 instances (often in ASG)
- Lambda functions (Event Source Mapping → poll automatically)
- ECS / Fargate
- On-prem servers
Auto-scaling consumers
- ASG scales on the CloudWatch metric ApproximateNumberOfMessagesVisible
- Lambda scales automatically (up to the concurrency limit)
5. Visibility Timeout
Definition
Visibility Timeout = period during which a received message is hidden from other consumers.
Workflow
1. Consumer A: ReceiveMessage → message is hidden from other consumers
2. Visibility Timeout starts (default 30s)
3. Consumer A processes message
4. Consumer A: DeleteMessage → message removed
5. (Or) If A fails to delete within the timeout → message becomes visible again and another consumer can receive it
Configuration
- Default: 30 seconds
- Range: 0 to 12 hours
- Set per queue or per message (override)
Best practice
- Set timeout = average processing time × 2-3 (safety margin)
- Long-running tasks: extend the timeout with the ChangeMessageVisibility API
- Short tasks: use a short timeout (faster retry on failure)
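The sizing rule and the extension call above might be sketched as follows. `suggested_timeout` and `extend_visibility` are hypothetical helper names, and `sqs` is assumed to be a boto3 SQS client supplied by the caller.

```python
MAX_VISIBILITY_SECONDS = 12 * 60 * 60  # SQS caps the visibility timeout at 12 hours


def suggested_timeout(avg_processing_seconds: float, factor: float = 3.0) -> int:
    """Apply the 'average processing time x 2-3' rule of thumb, within SQS limits."""
    return int(min(avg_processing_seconds * factor, MAX_VISIBILITY_SECONDS))


def extend_visibility(sqs, queue_url: str, receipt_handle: str, seconds: int) -> int:
    """Give the current consumer more time with a message it already received."""
    timeout = max(0, min(seconds, MAX_VISIBILITY_SECONDS))
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=timeout,  # new timeout, counted from this call
    )
    return timeout
```

A long-running worker would typically call `extend_visibility` periodically, like a heartbeat, while it is still making progress, rather than setting one very long timeout up front.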
Common issue: duplicate processing
- Consumer A is slow to process a message
- The timeout expires before A deletes it
- The message reappears → Consumer B receives the same message
- → Both A and B process the same message
- Solution: idempotent processing and/or a longer (or extended) visibility timeout
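Idempotent processing can be sketched as a check against a record of already-handled message IDs. The in-memory set below is only an illustration: with consumers on separate machines, the record would have to live in a shared store, and a business key such as `order_id` may be a better dedup key than `MessageId`. The function name `handle_once` is hypothetical.

```python
from typing import Callable, Set


def handle_once(message: dict, seen_ids: Set[str], handler: Callable[[str], None]) -> bool:
    """Run handler only the first time a MessageId is seen; return True if it ran."""
    message_id = message["MessageId"]
    if message_id in seen_ids:
        return False  # duplicate delivery: skip the side effect
    handler(message["Body"])
    seen_ids.add(message_id)  # record only after the handler succeeded
    return True


processed = []
seen: Set[str] = set()
msg = {"MessageId": "m-1", "Body": '{"order_id": "123"}'}
first = handle_once(msg, seen, processed.append)
second = handle_once(msg, seen, processed.append)  # same message redelivered
print(first, second, len(processed))
```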
6. Long Polling vs Short Polling
Short Polling (default)
- WaitTimeSeconds = 0
- Returns immediately (even if no messages)
- More API calls → more cost
- Higher latency for low-traffic queues
Long Polling (recommended)
- WaitTimeSeconds = 1-20
- Wait up to N seconds for messages to arrive
- Fewer API calls → lower cost
- Lower latency
Setup
- Set ReceiveMessageWaitTimeSeconds at queue level (default polling behavior)
- Or override per ReceiveMessage call
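The queue-level setting could be applied as in the sketch below, which also estimates how many empty polls an idle queue costs. `enable_long_polling` and `idle_polls_per_hour` are hypothetical helpers, `sqs` is assumed to be a boto3 SQS client, and the estimate ignores network round-trip time.

```python
def enable_long_polling(sqs, queue_url: str, wait_seconds: int = 20) -> None:
    """Set the queue's default ReceiveMessage wait time (0 means short polling)."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait_seconds must be between 0 and 20")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        # Queue attribute values are passed as strings.
        Attributes={"ReceiveMessageWaitTimeSeconds": str(wait_seconds)},
    )


def idle_polls_per_hour(wait_seconds: float, pause_seconds: float = 0.0) -> int:
    """Empty ReceiveMessage calls one consumer makes per hour on an idle queue."""
    cycle = wait_seconds + pause_seconds  # time spent per polling loop iteration
    if cycle <= 0:
        raise ValueError("wait_seconds + pause_seconds must be positive")
    return int(3600 / cycle)


print(idle_polls_per_hour(20))      # long polling, back-to-back calls
print(idle_polls_per_hour(0, 1.0))  # short polling with a 1-second sleep between calls
```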
7. Dead Letter Queue (DLQ)
Definition
DLQ = secondary queue for messages that failed processing N times.
Workflow
1. Message delivered to consumer
2. Consumer fails to process (exception, timeout)
3. Message reappears, retry
4. After N failed receives (maxReceiveCount), the message moves to the DLQ
5. A CloudWatch alarm on the DLQ fires → investigate
Configuration
RedrivePolicy:
- deadLetterTargetArn: ARN of the DLQ
- maxReceiveCount: e.g., 3 (after 3 failed receives, the message moves to the DLQ)
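The redrive policy is set on the source queue as a JSON string. A minimal sketch, where `build_redrive_attributes` is a hypothetical helper and the ARN uses a placeholder account ID:

```python
import json


def build_redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the Attributes dict that attaches a DLQ to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # RedrivePolicy is a single attribute whose value is a JSON document.
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN for illustration only.
attrs = build_redrive_attributes("arn:aws:sqs:us-east-1:111122223333:MyQueue-dlq")
print(attrs)
```

With a boto3 client, this could be applied as `sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=attrs)`.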
DLQ best practices
- Set up CloudWatch alarm on DLQ message count
- Manually inspect DLQ for failure pattern
- Fix bug → redrive messages back to source queue (DLQ Redrive)
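An alarm on the DLQ depth might be defined as in the sketch below, which only builds the arguments for CloudWatch's PutMetricAlarm. The helper name, alarm name, and the 5-minute period are assumptions chosen for illustration.

```python
def build_dlq_alarm_params(dlq_name: str, alarm_topic_arn: str) -> dict:
    """Arguments for PutMetricAlarm: fire when the DLQ holds any message."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 300,               # evaluate over 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [alarm_topic_arn],  # e.g. an SNS topic that notifies Ops
        "TreatMissingData": "notBreaching",
    }


# Placeholder names for illustration only.
params = build_dlq_alarm_params("MyQueue-dlq", "arn:aws:sns:us-east-1:111122223333:ops-alerts")
print(params["AlarmName"])
```

The dict could then be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.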
Use case
- Poison messages (malformed data)
- App bug processing certain message types
- External service downtime causing retries
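8. Message Lifecycle
Send → (optional delay) → visible → received (in flight, hidden for the visibility timeout) → deleted on success, or visible again after the timeout → moved to the DLQ once maxReceiveCount is reached, or dropped when the retention period ends.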
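The lifecycle can be illustrated with a toy in-memory model that combines delay, visibility timeout, and the DLQ. This is a simplified simulation, not how SQS is implemented: `ToyQueue` and `Message` are hypothetical classes, time is passed in explicitly, and retention expiry is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # time from which the message can be received


@dataclass
class ToyQueue:
    visibility_timeout: float = 30.0
    max_receive_count: int = 3
    messages: List[Message] = field(default_factory=list)
    dlq: List[Message] = field(default_factory=list)

    def send(self, body: str, now: float, delay: float = 0.0) -> None:
        self.messages.append(Message(body, visible_at=now + delay))

    def receive(self, now: float) -> Optional[Message]:
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still delayed or in flight
            if msg.receive_count >= self.max_receive_count:
                # Already received the maximum number of times: dead-letter it.
                self.messages.remove(msg)
                self.dlq.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout  # hide from others
            return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)


q = ToyQueue(visibility_timeout=30, max_receive_count=3)
q.send("poison", now=0)
q.send("good", now=0, delay=10)

first = q.receive(now=0)       # "poison", in flight until t=30
hidden = q.receive(now=0)      # None: one message in flight, one delayed
good = q.receive(now=10)       # "good" becomes visible after its delay
q.delete(good)                 # successful processing ends the lifecycle
for t in (30, 60):             # the consumer keeps failing and never deletes
    q.receive(now=t)
after_max = q.receive(now=90)  # None: "poison" is moved to the DLQ instead
print([m.body for m in q.dlq])
```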
9. SQS Security
Access Control
- IAM policies (identity-based)
- SQS Queue Policy (resource-based, cross-account)
Encryption
- In transit: TLS
- At rest: SQS-managed encryption (SSE-SQS) or KMS (SSE-KMS)
- Enable per queue
VPC Endpoint
- Interface Endpoint for SQS (PrivateLink)
- Private subnets can reach SQS without going through the internet
10. SQS Advanced Features
Delay Queues
- Delay messages 0-15 minutes before visible to consumers
- Set per queue (default 0)
Message Timers
- Per-message delay (override queue default)
- Up to 15 minutes
FIFO Specific Features
- MessageGroupId: group messages, FIFO within group
- MessageDeduplicationId: dedupe within 5-minute window
- Multiple Groups can process in parallel
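A sketch of the FIFO-specific send parameters. `build_fifo_message` is a hypothetical helper and the `.fifo` queue URL is a placeholder. When no deduplication ID is given, the sketch hashes the body with SHA-256, the same idea content-based deduplication uses.

```python
import hashlib
import json
from typing import Optional


def build_fifo_message(queue_url: str, group_id: str, payload: dict,
                       dedup_id: Optional[str] = None) -> dict:
    """Build SendMessage arguments for a FIFO queue."""
    body = json.dumps(payload, sort_keys=True)  # stable serialization
    if dedup_id is None:
        # Identical bodies yield the same ID, so a resend within the
        # 5-minute deduplication window is treated as a duplicate.
        dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,          # ordering is kept within this group
        "MessageDeduplicationId": dedup_id,
    }


url = "https://sqs.us-east-1.amazonaws.com/111/MyQueue.fifo"  # placeholder
a = build_fifo_message(url, "account-1", {"op": "debit", "amount": 10})
b = build_fifo_message(url, "account-1", {"amount": 10, "op": "debit"})
c = build_fifo_message(url, "account-2", {"op": "credit", "amount": 5})
print(a["MessageDeduplicationId"] == b["MessageDeduplicationId"])
```

Messages for `account-1` and `account-2` are in different groups, so they can be processed in parallel while each account's own messages stay in order.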
Long Messages (> 256 KB)
- Use SQS Extended Client Library: store payload in S3, queue contains S3 reference
- Step Functions payloads are also limited to 256 KB, so pass an S3 reference there as well
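The store-in-S3 idea can be sketched by hand as below. This is not the Extended Client Library and does not use its message format: the pointer layout (`s3_bucket`, `s3_key`), the key prefix, and the helper name `send_large` are assumptions, and `sqs` and `s3` are boto3 clients supplied by the caller.

```python
import json
import uuid
from typing import Optional

MAX_SQS_BYTES = 256 * 1024  # message size limit stated above


def send_large(sqs, s3, queue_url: str, bucket: str, body: str) -> Optional[dict]:
    """Send body inline if it fits; otherwise store it in S3 and send a pointer.

    Returns the pointer dict when S3 was used, else None.
    """
    data = body.encode("utf-8")
    if len(data) <= MAX_SQS_BYTES:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
        return None
    key = f"sqs-payloads/{uuid.uuid4()}"  # hypothetical key layout
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    pointer = {"s3_bucket": bucket, "s3_key": key}
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(pointer))
    return pointer
```

The consumer would reverse the steps: parse the pointer, fetch the object with `s3.get_object`, and delete both the S3 object and the SQS message after successful processing.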
11. SQS vs SNS vs EventBridge
| | SQS | SNS | EventBridge |
|---|---|---|---|
| Type | Queue (point-to-point) | Topic (pub/sub) | Event Bus (pub/sub + routing) |
| Consumers | 1 consumer per message | Many subscribers per message | Many targets with filtering |
| Delivery | Pull (consumer polls) | Push to subscribers | Push to targets |
| Use case | Decouple, async work | Fanout notifications | Event-driven architecture |
Combined pattern: SNS + SQS fanout
One SNS topic pushes a copy of each message to several subscribed SQS queues. Each consumer then processes independently from its own queue.
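Wiring up the fanout might look like the sketch below, assuming `sns` is a boto3 SNS client and the topic and queues already exist. `fan_out` is a hypothetical helper. Each queue also needs a queue policy that allows the topic to send to it, which is not shown here.

```python
from typing import List


def fan_out(sns, topic_arn: str, queue_arns: List[str]) -> List[str]:
    """Subscribe each SQS queue to the topic; return the subscription ARNs."""
    subscription_arns = []
    for queue_arn in queue_arns:
        response = sns.subscribe(
            TopicArn=topic_arn,
            Protocol="sqs",
            Endpoint=queue_arn,
            # Deliver the published body as-is instead of the SNS JSON envelope.
            Attributes={"RawMessageDelivery": "true"},
            ReturnSubscriptionArn=True,
        )
        subscription_arns.append(response["SubscriptionArn"])
    return subscription_arns
```

After this, a single `sns.publish` to the topic places one copy of the message in every subscribed queue.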
12. SQS Pricing
- $0.40 per million requests (Standard)
- $0.50 per million requests (FIFO)
- Free tier: 1M requests/month
- No data transfer charge within same Region (to EC2)
Optimization
- Batch operations (up to 10 messages per call, so up to 10x fewer requests)
- Long polling (fewer empty polls)
- Right-size visibility timeout
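The effect of batching on the bill can be estimated with simple arithmetic. The sketch assumes each message costs one send, one receive, and one delete request, that every batch is full, and that there are no empty polls. `monthly_request_cost` is a hypothetical helper using the Standard price and free tier listed above.

```python
import math


def monthly_request_cost(messages: int, batch_size: int = 1,
                         price_per_million: float = 0.40,
                         free_requests: int = 1_000_000) -> float:
    """Estimate the monthly request charge in USD under the stated assumptions."""
    # Send + receive + delete, each issued once per batch.
    requests = 3 * math.ceil(messages / batch_size)
    billable = max(0, requests - free_requests)
    return round(billable / 1_000_000 * price_per_million, 2)


unbatched = monthly_request_cost(100_000_000)
batched = monthly_request_cost(100_000_000, batch_size=10)
print(unbatched, batched)
```

For 100 million messages a month, this works out to 299 million billable requests unbatched against 29 million with batches of 10, or roughly $119.60 versus $11.60.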
13. Common Patterns
Pattern 1: Order processing
Web app → SQS Queue → Order Processor (EC2 ASG)
→ DLQ if processing fails
Pattern 2: Image thumbnail generation
User upload → S3 → SNS → SQS → Lambda → resize → S3
Pattern 3: Email batch sending
App → SQS FIFO Queue (group by user)
→ Worker reads, sends email via SES
Pattern 4: Job retry mechanism
App → SQS (initial)
→ Lambda → 3 retries
→ DLQ → CloudWatch alarm → Ops investigation
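For the Lambda-based patterns, a handler can report which messages in a batch failed so that only those are retried and eventually dead-lettered. The sketch assumes the event source mapping has partial batch responses (`ReportBatchItemFailures`) enabled. `process_order` is hypothetical business logic.

```python
import json


def process_order(order: dict) -> None:
    """Hypothetical business logic; raises on invalid input."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event, context=None):
    """Lambda entry point for an SQS event source mapping."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; it becomes visible again
            # and counts toward maxReceiveCount on the source queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "1", "body": '{"order_id": "123"}'},
        {"messageId": "2", "body": "not json"},
    ]
}
result = handler(sample_event)
print(result)
```

Successfully processed messages are deleted by Lambda, so the handler does not call DeleteMessage itself.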
Review questions
- What is the core difference between Standard and FIFO queues?
  Answer: Standard: high throughput (nearly unlimited), at-least-once delivery (duplicates possible), best-effort ordering (not guaranteed). FIFO: limited throughput (300 msg/s, or 3,000 with batching), exactly-once processing (automatic deduplication), guaranteed ordering within a message group. FIFO queue names end with .fifo. Use FIFO for financial transactions and order processing, where strict ordering is required; use Standard for high-throughput async jobs.
- What is the default visibility timeout, in seconds?
  Answer: 30 seconds. Range: 0 seconds to 12 hours. Once a consumer receives a message, the message is invisible to other consumers for the visibility timeout, which reduces duplicate processing. If the consumer finishes, it deletes the message. If the timeout expires before the delete, the message becomes visible again and another consumer can receive it. Set the visibility timeout above the maximum processing time to avoid duplicates.
- How does long polling differ from short polling?
  Answer: Short polling (default) returns immediately even when the queue is empty, which produces many empty API responses and costs money. Long polling (WaitTimeSeconds 1-20s): SQS waits until a message arrives or the wait time elapses before responding, which reduces empty responses, cost, and latency. AWS recommends long polling (20 seconds) for production. Set ReceiveMessageWaitTimeSeconds at queue level or WaitTimeSeconds per request.
- What problem does a DLQ solve?
  Answer: A Dead Letter Queue (DLQ) holds messages that could not be processed successfully after N attempts (maxReceiveCount). It prevents poison-pill messages from being retried forever. A DLQ is an ordinary Standard or FIFO queue, so its messages can be inspected, debugged, or reprocessed. Teams usually set an alarm on DLQ metrics to get alerted on failures. The DLQ must be the same type (Standard/FIFO) as its source queue.
- What is the maximum number of messages in one SendMessage batch?
  Answer: 10 messages per batch (SendMessageBatch API). The total batch size is at most 256 KB. Batching reduces the number of API calls and the cost (billing is per request, not per message). Likewise, ReceiveMessage returns at most 10 messages per call and DeleteMessageBatch deletes at most 10 messages. Use batching to optimize throughput and cost.
Hands-on exercises
- Create a Standard SQS queue, send 10 messages, receive a batch of 10
- Create a FIFO queue, test ordering with MessageGroupId
- Set up a DLQ, force the consumer to fail, observe the message move to the DLQ
- Configure long polling at 20s, observe the drop in API calls
- Set up a Lambda trigger from SQS (Event Source Mapping)
- Test the SQS Extended Client with a message > 256 KB
Next: Amazon SNS