Week 7 - Day 1: Amazon SQS | SAA-C03

Learning objectives

Understand SQS as a managed message queue
Distinguish Standard and FIFO queues
Understand visibility timeout, DLQ, and long polling
Apply the decoupling pattern with SQS

1. SQS Overview

Amazon SQS (Simple Queue Service) is a fully managed message queue service for decoupling distributed components.

Characteristics

Pull-based (consumers poll)
At-least-once delivery (Standard) / Exactly-once (FIFO)
Highly available: messages replicated across multiple AZs
Scalable: handle 100K+ messages/sec (Standard)
Message size: up to 256 KB (use S3 for larger)
Retention: 1 minute - 14 days (default 4 days)

Use cases

Decouple components (producer ≠ consumer rate)
Buffer for traffic spikes
Async processing (orders, emails, image processing)
Task scheduling (delayed messages)

2. SQS Queue Types

Standard Queue

At-least-once delivery (may deliver duplicate)
Best-effort ordering (not strict FIFO)
Nearly unlimited throughput (scales automatically)
Suits most use cases

FIFO Queue

Exactly-once processing
Strict FIFO ordering (per Message Group)
300 messages/sec without batching, 3,000/sec with batching (10 messages per call)
Higher cost (~$0.50 vs $0.40 per million requests)
Queue name must end with .fifo

When to use FIFO

Financial transactions (order matters)
Sequential events processing
Banking ledger
Exactly-once required (no dedup logic in app)

When to use Standard

High throughput needed
Order does not matter (independent tasks)
Lower cost

3. Producer

Send Message

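import boto3

sqs = boto3.client('sqs')  # uses the default credentials and Region

sqs.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/111/MyQueue',
    MessageBody='{"order_id": "123", "amount": 99.99}',
    DelaySeconds=10,  # delay before consumers can receive
    # MessageAttributes={...}  # optional typed metadata sent alongside the body
)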

Send Message Batch

Up to 10 messages in 1 call
Reduces cost (1 API call vs 10)
Total batch size ≤ 256 KB
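
The batching rules above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: `chunk_entries` and `send_all` are hypothetical names, `sqs` is assumed to be a boto3 SQS client created by the caller, and the 256 KB total-size limit is not checked here.

```python
from typing import Iterable, Iterator

MAX_BATCH_ENTRIES = 10  # SendMessageBatch accepts at most 10 entries per call


def chunk_entries(bodies: Iterable[str], size: int = MAX_BATCH_ENTRIES) -> Iterator[list[dict]]:
    """Yield lists of SendMessageBatch entries, at most `size` per list."""
    batch: list[dict] = []
    for index, body in enumerate(bodies):
        # Id only has to be unique within a single batch request.
        batch.append({"Id": f"msg-{index}", "MessageBody": body})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_all(sqs, queue_url: str, bodies: Iterable[str]) -> list[str]:
    """Send every body in batches and return the Ids that SQS reported as failed."""
    failed: list[str] = []
    for entries in chunk_entries(bodies):
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # A batch call can succeed overall while individual entries fail.
        failed.extend(item["Id"] for item in response.get("Failed", []))
    return failed
```

With 23 message bodies this issues three calls (10, 10, and 3 entries) instead of 23 separate SendMessage requests. Checking the `Failed` list matters because a successful HTTP response does not guarantee that every entry was accepted.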

4. Consumer

Poll Messages

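import boto3

sqs = boto3.client('sqs')
queue_url = '...'  # URL of the queue to poll

response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,    # 1-10
    WaitTimeSeconds=20,        # long polling
    VisibilityTimeout=30       # seconds the messages stay hidden from other consumers
)

for msg in response.get('Messages', []):
    process(msg['Body'])  # process() stands for the application's own handler
    # MUST delete after processing, otherwise the message reappears
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=msg['ReceiptHandle']
    )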

Consumers

EC2 instances (often in ASG)
Lambda functions (Event Source Mapping → poll automatically)
ECS / Fargate
On-prem servers

Auto-scaling consumers

ASG scale based on CloudWatch metric ApproximateNumberOfMessagesVisible
Lambda scale automatically (up to concurrency limit)
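
Scaling an ASG on queue depth comes down to simple arithmetic, sketched below. The function name, the backlog-per-instance target, and the size limits are assumptions for illustration; in practice the ASG scaling policy performs the equivalent calculation from the CloudWatch metric.

```python
import math


def desired_instances(
    visible_messages: int,
    acceptable_backlog_per_instance: int,
    min_size: int = 1,
    max_size: int = 10,
) -> int:
    """Estimate how many consumers are needed to keep the backlog acceptable."""
    if acceptable_backlog_per_instance <= 0:
        raise ValueError("acceptable_backlog_per_instance must be positive")
    # Round up: a partial backlog share still needs a whole instance.
    needed = math.ceil(visible_messages / acceptable_backlog_per_instance)
    # Clamp to the group's minimum and maximum size.
    return max(min_size, min(max_size, needed))
```

For example, with 250 visible messages and a target of 100 messages per instance, the sketch asks for 3 instances; an empty queue falls back to the minimum size.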

5. Visibility Timeout

1. Consumer A: ReceiveMessage → message hidden from other consumers
2. Visibility timeout starts (default 30s)
3. Consumer A processes the message
4. Consumer A: DeleteMessage → message removed
5. (Or) If A fails to delete within the timeout → message becomes visible again and another consumer can receive it

Configuration

Default: 30 seconds
Range: 0 to 12 hours
Set per queue or per message (override)

Best practice

Set timeout = average processing time × 2-3 (safety margin)
Long-running tasks: extend the timeout with the ChangeMessageVisibility API
Short tasks: short timeout (faster retry on failure)
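
The sizing rule and the extension call can be sketched as follows. The helper names and the default factor of 3 are assumptions; `sqs` is assumed to be a boto3 SQS client. ChangeMessageVisibility counts the new timeout from the moment of the call, and the sketch only clamps each request to the 12-hour upper bound.

```python
import math

MAX_VISIBILITY_SECONDS = 12 * 60 * 60  # upper bound of the allowed range


def suggested_timeout(avg_processing_seconds: float, factor: float = 3) -> int:
    """Apply the 'average processing time x 2-3' rule of thumb."""
    return min(MAX_VISIBILITY_SECONDS, math.ceil(avg_processing_seconds * factor))


def extend_visibility(sqs, queue_url: str, receipt_handle: str, extra_seconds: int) -> int:
    """Keep a message hidden for `extra_seconds` more, counted from now."""
    timeout = max(0, min(extra_seconds, MAX_VISIBILITY_SECONDS))
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=timeout,
    )
    return timeout
```

A long-running worker would typically call `extend_visibility` periodically while it is still working, well before the current timeout expires.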

Common issue: duplicate processing

Consumer A processes message but slow
Timeout expires before A deletes
Message reappears → Consumer B receives same message
→ Both A and B process the same message
Solution: idempotent processing OR extend visibility timeout
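
Idempotent processing can be as simple as remembering which messages have already been handled. The sketch below is deliberately minimal and rests on assumptions: `process_once` is a hypothetical helper, and the in-memory set only protects a single process. Consumers on different machines would need a shared store, which is not shown here.

```python
from typing import Callable


def process_once(message: dict, seen_ids: set, handler: Callable[[str], None]) -> bool:
    """Run `handler` on the message body unless this MessageId was already handled.

    Returns True if the handler ran, False if the message was a duplicate.
    """
    message_id = message["MessageId"]  # stays the same when SQS redelivers a message
    if message_id in seen_ids:
        return False
    handler(message["Body"])
    # Record the id only after the handler succeeded, so failures are retried.
    seen_ids.add(message_id)
    return True
```

If the producer itself may send the same logical event twice, a business key from the body (such as an order id) is a better deduplication key than MessageId.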

6. Long Polling vs Short Polling

Short Polling (default)

WaitTimeSeconds = 0
Returns immediately (even if no messages)
More API calls → more cost
Higher latency for low-traffic queues

Long Polling (recommended)

WaitTimeSeconds = 1-20
Wait up to N seconds for messages to arrive
Fewer API calls → lower cost
Lower latency

Setup

Set ReceiveMessageWaitTimeSeconds at queue level (becomes the default for every receive)
Or override per ReceiveMessage call
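
The queue-level setting and the effect on API call volume can be sketched as below. `enable_long_polling` and `empty_receives_per_hour` are hypothetical helpers, `sqs` is assumed to be a boto3 SQS client, and the call-count estimate assumes an idle queue polled in a tight loop.

```python
def enable_long_polling(sqs, queue_url: str, wait_seconds: int = 20) -> None:
    """Make long polling the queue's default receive behaviour."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait_seconds must be between 0 and 20")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        # Queue attribute values are passed as strings.
        Attributes={"ReceiveMessageWaitTimeSeconds": str(wait_seconds)},
    )


def empty_receives_per_hour(seconds_between_calls: float) -> int:
    """Number of ReceiveMessage calls one consumer makes per hour on an idle queue."""
    return int(3600 // seconds_between_calls)
```

On an idle queue, a consumer that short-polls once per second makes 3,600 calls per hour, while a consumer that long-polls with a 20-second wait makes 180.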

7. Dead Letter Queue (DLQ)

1. Message delivered to consumer
2. Consumer fails to process (exception, timeout)
3. Message reappears, retry
4. After N retries (Max Receive Count), message moves to DLQ
5. DLQ alerts via CloudWatch → investigate

Configuration

RedrivePolicy:
- deadLetterTargetArn: ARN of DLQ
- maxReceiveCount: e.g., 3 (after 3 failed receives, the message moves to the DLQ)
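
The RedrivePolicy is passed to SQS as a JSON string inside the queue attributes. A minimal sketch, assuming a hypothetical helper name and that the DLQ already exists:

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the queue attributes that attach a DLQ to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # The whole policy is serialized into a single string attribute.
    return {"RedrivePolicy": json.dumps(policy)}


# Usage with a boto3 SQS client (source_queue_url and dlq_arn are placeholders):
# sqs.set_queue_attributes(
#     QueueUrl=source_queue_url,
#     Attributes=redrive_attributes(dlq_arn, max_receive_count=3),
# )
```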

DLQ best practices

Set up CloudWatch alarm on DLQ message count
Manually inspect DLQ for failure pattern
Fix bug → redrive messages back to source queue (DLQ Redrive)
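
An alarm on the DLQ message count could be defined roughly as follows. This is a sketch under assumptions: the alarm name, period, and threshold are illustrative choices, and the SNS topic used for notification is assumed to exist already.

```python
def dlq_alarm_params(dlq_name: str, sns_topic_arn: str) -> dict:
    """Build put_metric_alarm arguments that fire when the DLQ holds any message."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 60,              # evaluate one-minute data points
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Usage with a boto3 CloudWatch client (names are placeholders):
# cloudwatch.put_metric_alarm(**dlq_alarm_params("orders-dlq", topic_arn))
```

A threshold of zero treats any message in the DLQ as worth investigating, which fits the idea that a healthy system should leave the DLQ empty.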

Use case

Poison messages (malformed data)
App bug processing certain message types
External service downtime causing retries

8. SQS Security

Access control

IAM policies (identity-based)
SQS queue policy (resource-based, cross-account)

Encryption

In transit: TLS
At rest: SSE-SQS (SQS-managed keys) or SSE-KMS
Enable per queue

VPC Endpoint

Interface endpoint for SQS (PrivateLink)
Resources in private subnets reach SQS without traversing the internet

10. SQS Advanced Features

Delay Queues

Delay messages 0-15 minutes before visible to consumers
Set per queue (default 0)

Message Timers

Per-message delay (override queue default)
Up to 15 minutes

FIFO Specific Features

MessageGroupId: group messages, FIFO within group
MessageDeduplicationId: dedupe within 5-minute window
Multiple Groups can process in parallel
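
The two FIFO parameters can be sketched as follows. `fifo_message` is a hypothetical helper; deriving the deduplication id from a SHA-256 hash of the body is one possible choice, made here so that resending an identical body inside the deduplication window is treated as a duplicate.

```python
import hashlib


def fifo_message(body: str, group_id: str) -> dict:
    """Build send_message arguments for a FIFO queue."""
    # Identical bodies produce the same id, so a retry is deduplicated.
    dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "MessageBody": body,
        "MessageGroupId": group_id,          # ordering is kept within this group
        "MessageDeduplicationId": dedup_id,  # 64 hex characters
    }


# Usage with a boto3 SQS client (fifo_queue_url is a placeholder ending in .fifo):
# sqs.send_message(QueueUrl=fifo_queue_url, **fifo_message(body, "customer-42"))
```

Using one group per customer keeps each customer's messages in order while letting different customers be processed in parallel.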

Large Messages (> 256 KB)

Use SQS Extended Client Library: store payload in S3, queue contains S3 reference
Or apply the same idea manually: store the payload in S3 and pass only its key (Step Functions state payloads are likewise limited to 256 KB)
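
The store-in-S3 idea can be sketched by hand as below. This is not the Extended Client Library's actual message format: the pointer layout, the key prefix, and the `send_large` name are assumptions for illustration, `sqs` and `s3` are assumed to be boto3 clients, and message attributes are ignored when measuring size.

```python
import json
import uuid
from typing import Optional

MAX_SQS_BYTES = 256 * 1024  # message size limit stated above


def send_large(sqs, s3, queue_url: str, bucket: str, body: str) -> Optional[dict]:
    """Send `body` inline if it fits, otherwise park it in S3 and send a pointer.

    Returns the pointer dict when the payload was offloaded, else None.
    """
    data = body.encode("utf-8")
    if len(data) <= MAX_SQS_BYTES:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
        return None
    key = f"sqs-payloads/{uuid.uuid4()}"  # hypothetical key layout
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    pointer = {"s3_bucket": bucket, "s3_key": key}
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(pointer))
    return pointer
```

The consumer does the reverse: it parses the pointer, downloads the object from S3, and deletes both the object and the queue message once processing succeeds.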

11. SQS vs SNS vs EventBridge

Aspect	SQS	SNS	EventBridge
Type	Queue (point-to-point)	Topic (pub/sub)	Event bus (pub/sub + routing)
Consumers	1 consumer per message	Many subscribers per message	Many targets with filtering
Delivery	Pull (consumer polls)	Push to subscribers	Push to targets
Use case	Decouple, async work	Fanout notifications	Event-driven architecture

Combined pattern: SNS + SQS fanout

One SNS topic delivers each message to several subscribed SQS queues. Each consumer processes independently from its own queue.
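
For the fanout to work, each queue needs a resource-based policy that lets the topic send to it. A minimal sketch, assuming a hypothetical helper name and placeholder ARNs:

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> str:
    """Build a queue policy that allows one SNS topic to send to one queue."""
    statement = {
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        # Restrict the permission to messages coming from this topic only.
        "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
    }
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]})


# Usage with boto3 clients (queue_url, queue_arn, topic_arn are placeholders):
# sqs.set_queue_attributes(
#     QueueUrl=queue_url,
#     Attributes={"Policy": sns_to_sqs_policy(queue_arn, topic_arn)},
# )
# sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
```

Repeating the policy and the subscription for every queue gives each consumer its own copy of every message published to the topic.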

12. SQS Pricing

$0.40 per million requests (Standard)
$0.50 per million requests (FIFO)
Free tier: 1M requests/month
No data transfer charge within same Region (to EC2)

Optimization

Batch operations (up to 10 messages per call, so up to 10x fewer requests; each 64 KB chunk of payload is billed as one request)
Long polling (fewer empty polls)
Right-size visibility timeout
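
The effect of batching on the bill can be worked through with the prices listed above. The sketch counts send requests only and ignores receive and delete calls as well as payload size; the function name and the message volume in the example are assumptions.

```python
import math

PRICE_PER_MILLION = {"standard": 0.40, "fifo": 0.50}  # USD, from the list above
FREE_REQUESTS_PER_MONTH = 1_000_000


def monthly_request_cost(messages: int, batch_size: int = 1, queue_type: str = "standard") -> float:
    """Estimate the monthly cost in USD of sending `messages` messages."""
    requests = math.ceil(messages / batch_size)
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return round(billable / 1_000_000 * PRICE_PER_MILLION[queue_type], 2)
```

Sending 100 million messages one at a time to a Standard queue costs about $39.60 (99 million billable requests), while sending them in batches of 10 costs about $3.60 (9 million billable requests).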

13. Common Patterns

Pattern 1: Order processing

Web app → SQS Queue → Order Processor (EC2 ASG)
                   → DLQ if processing fails

Pattern 2: Image thumbnail generation

User upload → S3 → SNS → SQS → Lambda → resize → S3

Pattern 3: Email batch sending

App → SQS FIFO Queue (group by user)
    → Worker reads, sends email via SES

Pattern 4: Job retry mechanism

App → SQS (initial)
   → Lambda → 3 retries
   → DLQ → CloudWatch alarm → Ops investigation

Review questions

What is the core difference between Standard and FIFO queues?

Standard: high throughput (nearly unlimited), at-least-once delivery (duplicates are possible), best-effort ordering (not guaranteed). FIFO: limited throughput (300 msg/s, or 3,000 with batching), exactly-once processing (automatic deduplication), guaranteed ordering (FIFO). A FIFO queue name ends with .fifo. Use FIFO for financial transactions and order processing, where strict ordering is required; use Standard for high-throughput async jobs.

What is the default visibility timeout, in seconds?

30 seconds (default). Range: 0 seconds to 12 hours. When a consumer receives a message, the message is invisible to other consumers for the duration of the visibility timeout, which prevents duplicate processing. If the consumer finishes processing, it deletes the message. If the timeout expires before the message is deleted, the message becomes visible again and another consumer can receive it. Set the visibility timeout longer than the maximum processing time to avoid duplicates.

How does long polling differ from short polling?

Short polling (default): returns immediately even when the queue is empty, which produces many empty API responses and costs money. Long polling (WaitTimeSeconds 1-20s): SQS waits until a message arrives or the wait time expires before responding, which reduces empty responses, cost, and latency. AWS recommends long polling (20 seconds) for production. Set ReceiveMessageWaitTimeSeconds at queue level, or WaitTimeSeconds per request.

What problem does a DLQ solve?

A Dead Letter Queue (DLQ) stores messages that could not be processed successfully after N receives (maxReceiveCount). It keeps "poison pill" messages from being retried forever and holding up the queue. A DLQ is an ordinary Standard or FIFO queue, so its messages can be inspected, debugged, or reprocessed. An alarm on DLQ metrics is usually set up to alert on failures. The DLQ must be the same type (Standard/FIFO) as its source queue.

What is the maximum number of messages in one SendMessageBatch call?

10 messages per batch (SendMessageBatch API). The total batch size is at most 256 KB. Batching reduces the number of API calls and the cost (billing is per request, not per message). Similarly, ReceiveMessage can return up to 10 messages per call, and DeleteMessageBatch can delete up to 10 messages. Use batching to optimize throughput and cost.

Hands-on exercises

Create a Standard SQS queue, send 10 messages, and receive them in a batch of 10
Create a FIFO queue and test ordering with MessageGroupId
Set up a DLQ, force the consumer to fail, and observe the message move to the DLQ
Configure 20-second long polling and observe the reduction in API calls
Set up a Lambda trigger from SQS (Event Source Mapping)
Test the SQS Extended Client with a message > 256 KB
