Week 7 - Day 1: Amazon SQS

Learning objectives

  • Understand SQS as a managed message queue
  • Distinguish Standard and FIFO queues
  • Understand visibility timeout, DLQ, and long polling
  • Apply the decoupling pattern with SQS

1. SQS Overview

Amazon SQS (Simple Queue Service) = fully managed message queue service for decoupling distributed components.

Characteristics

  • Pull-based (consumers poll)
  • At-least-once delivery (Standard) / Exactly-once (FIFO)
  • Highly available: messages replicated across multiple AZs
  • Scalable: Standard queues support a nearly unlimited number of messages per second
  • Message size: up to 256 KB (use S3 for larger)
  • Retention: 1 minute - 14 days (default 4 days)

Use cases

  • Decouple components (producer ≠ consumer rate)
  • Buffer for traffic spikes
  • Async processing (orders, emails, image processing)
  • Task scheduling (delayed messages)

2. SQS Queue Types

Standard Queue

  • At-least-once delivery (may deliver duplicate)
  • Best-effort ordering (not strict FIFO)
  • Unlimited throughput (auto-scale)
  • Most use cases

FIFO Queue

  • Exactly-once processing
  • Strict FIFO ordering (per Message Group)
  • 300 messages/sec without batching, 3,000/sec with batching
  • Higher cost (~$0.50/M vs $0.40/M)
  • Queue name must end with .fifo

When to use FIFO

  • Financial transactions (order matters)
  • Sequential events processing
  • Banking ledger
  • Exactly-once required (no dedup logic in app)

When to use Standard

  • High throughput needed
  • Ordering does not matter (independent tasks)
  • Lower cost
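
The naming rule and the queue-type choice above can be sketched as a small helper that builds the arguments for boto3's `create_queue`. The helper name `create_queue_kwargs` and the queue names are hypothetical. The sketch assumes you create the client yourself with `boto3.client("sqs")`, and turning on `ContentBasedDeduplication` is an illustrative choice rather than a requirement.

```python
def create_queue_kwargs(name: str, fifo: bool = False) -> dict:
    """Build keyword arguments for sqs.create_queue (hypothetical helper)."""
    if not fifo:
        return {"QueueName": name}
    # FIFO queue names must end with the .fifo suffix.
    if not name.endswith(".fifo"):
        name += ".fifo"
    return {
        "QueueName": name,
        "Attributes": {
            "FifoQueue": "true",
            # Assumption: deduplicate on a hash of the body instead of
            # supplying MessageDeduplicationId on every send.
            "ContentBasedDeduplication": "true",
        },
    }
```

Usage would look like `sqs.create_queue(**create_queue_kwargs("orders", fifo=True))`.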

3. Producer

Send Message

import boto3

sqs = boto3.client('sqs')

sqs.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/111/MyQueue',
    MessageBody='{"order_id": "123", "amount": 99.99}',
    DelaySeconds=10,  # delay before consumers can receive the message
    MessageAttributes={...}  # elided in the original; supply your own attributes
)

Send Message Batch

  • Up to 10 messages in 1 call
  • Reduces cost (1 API call vs 10)
  • Total batch size ≤ 256 KB
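
A minimal sketch of the batching rule above: split a list of payloads into groups of at most 10 and send each group with `send_message_batch`. The helper names `chunked` and `send_in_batches` are hypothetical, `sqs` is assumed to be a client from `boto3.client("sqs")`, and the sketch does not check the 256 KB total batch size.

```python
import json
from typing import Iterator

MAX_BATCH_ENTRIES = 10  # SendMessageBatch accepts at most 10 entries


def chunked(items: list, size: int = MAX_BATCH_ENTRIES) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(sqs, queue_url: str, payloads: list) -> int:
    """Send JSON payloads with SendMessageBatch; return the number of API calls."""
    calls = 0
    for batch in chunked(payloads):
        entries = [
            # Id only has to be unique within one batch request.
            {"Id": str(i), "MessageBody": json.dumps(body)}
            for i, body in enumerate(batch)
        ]
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        calls += 1
        # A batch call can partially fail; failed entries are listed here.
        for failure in response.get("Failed", []):
            print("failed entry:", failure["Id"], failure.get("Message"))
    return calls
```

With 25 payloads this makes 3 API calls instead of 25.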

4. Consumer

Poll Messages

response = sqs.receive_message(
    QueueUrl='...',
    MaxNumberOfMessages=10,    # 1-10
    WaitTimeSeconds=20,         # long polling
    VisibilityTimeout=30
)

for msg in response.get('Messages', []):
    process(msg['Body'])
    # MUST delete after processing
    sqs.delete_message(
        QueueUrl='...',
        ReceiptHandle=msg['ReceiptHandle']
    )

Consumers

  • EC2 instances (often in ASG)
  • Lambda functions (Event Source Mapping → poll automatically)
  • ECS / Fargate
  • On-prem servers
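
For the Lambda consumer in the list above, a handler might look like the following sketch. It assumes the event source mapping has `ReportBatchItemFailures` enabled, so that only the message IDs returned in `batchItemFailures` are retried. `process_order` is a hypothetical stand-in for real business logic.

```python
import json


def process_order(order: dict) -> None:
    """Hypothetical business logic; raises on orders it cannot handle."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event: dict, context) -> dict:
    """Lambda entry point for an SQS event source mapping."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so the rest of the batch
            # is deleted from the queue by Lambda.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Unlike the EC2 polling loop, the handler never calls `delete_message`: Lambda deletes the messages that were not reported as failed.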

Auto-scaling consumers

  • ASG scale based on CloudWatch metric ApproximateNumberOfMessagesVisible
  • Lambda scale automatically (up to concurrency limit)

5. Visibility Timeout

Definition

Visibility Timeout = period during which a received message is hidden from other consumers.

Workflow

1. Consumer A: ReceiveMessage → message is hidden from other consumers
2. Visibility Timeout starts (default 30s)
3. Consumer A processes message
4. Consumer A: DeleteMessage → message removed
5. (Or) If A fails to delete within timeout → message visible again, another consumer can receive

Configuration

  • Default: 30 seconds
  • Range: 0 to 12 hours
  • Set per queue or per message (override)

Best practice

  • Set timeout = average processing time × 2-3 (safety margin)
  • Long-running tasks: extend the timeout with the ChangeMessageVisibility API
  • Short tasks: short timeout (faster retry on failure)
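
The sizing rule and the ChangeMessageVisibility call above can be sketched as two small helpers. The function names are hypothetical and `sqs` is assumed to be a client from `boto3.client("sqs")`. Note that the new timeout is counted from the moment of the call, not added to the time already elapsed.

```python
MAX_VISIBILITY_SECONDS = 12 * 60 * 60  # upper bound of the allowed range


def suggested_timeout(avg_processing_seconds: float, factor: float = 2.0) -> int:
    """Average processing time times a safety factor, clamped to the valid range."""
    seconds = int(round(avg_processing_seconds * factor))
    return max(0, min(seconds, MAX_VISIBILITY_SECONDS))


def extend_visibility(sqs, queue_url: str, receipt_handle: str, seconds: int) -> None:
    """Keep the message hidden for `seconds` more, counted from now."""
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=seconds,
    )
```

A long-running worker could call `extend_visibility` periodically as a heartbeat while it is still making progress.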

Common issue: duplicate processing

  • Consumer A processes message but slow
  • Timeout expires before A deletes
  • Message reappears → Consumer B receives same message
  • → Both A and B process duplicate
  • Solution: idempotent processing OR extend visibility timeout
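
One way to make processing idempotent is to remember which message IDs were already handled. The sketch below keeps them in an in-memory set, which is an assumption made for illustration only: with several consumers, the IDs would need to live in a shared store that supports an atomic "insert if absent" operation. `handle_once` is a hypothetical helper name.

```python
def handle_once(message: dict, processed_ids: set, process) -> bool:
    """Run `process` at most once per MessageId; return True if it ran."""
    message_id = message["MessageId"]
    if message_id in processed_ids:
        return False  # duplicate delivery: skip the work, still safe to delete
    process(message["Body"])
    # Record the ID only after processing succeeds, so failures are retried.
    processed_ids.add(message_id)
    return True
```

Whether or not `handle_once` returns True, the consumer can delete the message afterwards, because the work has been done exactly once.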

6. Long Polling vs Short Polling

Short Polling (default)

  • WaitTimeSeconds = 0
  • Returns immediately (even if no messages)
  • More API calls → more cost
  • Higher latency for low-traffic queues

Long Polling

  • WaitTimeSeconds = 1-20
  • Wait up to N seconds for messages to arrive
  • Fewer API calls → lower cost
  • Lower latency: a message is returned as soon as it arrives

Setup

  • Set ReceiveMessageWaitTimeSeconds at queue level (the default for every ReceiveMessage call)
  • Or override per ReceiveMessage call
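
To make the cost difference concrete, the sketch below counts how many ReceiveMessage calls one consumer makes against an empty queue, and shows how the queue-level default could be set. The one-second pause between short polls is an assumption, the function names are hypothetical, and `sqs` is assumed to be a client from `boto3.client("sqs")`.

```python
import math


def empty_receives(idle_seconds: int, wait_time_seconds: int,
                   short_poll_interval: float = 1.0) -> int:
    """Count ReceiveMessage calls made by one consumer against an empty queue."""
    if wait_time_seconds == 0:
        # Short polling returns at once; assume the loop sleeps between calls.
        return math.ceil(idle_seconds / short_poll_interval)
    # Long polling: each call blocks for the full wait when nothing arrives.
    return math.ceil(idle_seconds / wait_time_seconds)


def enable_long_polling(sqs, queue_url: str, seconds: int = 20) -> None:
    """Set the queue-level default wait time (attribute values are strings)."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"ReceiveMessageWaitTimeSeconds": str(seconds)},
    )
```

Over one idle hour, `empty_receives(3600, 0)` gives 3600 calls while `empty_receives(3600, 20)` gives 180.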

7. Dead Letter Queue (DLQ)

Definition

DLQ = secondary queue for messages that failed processing N times.

Workflow

1. Message delivered to consumer
2. Consumer fails to process (exception, timeout)
3. Message reappears, retry
4. After N retries (Max Receive Count), message moves to DLQ
5. DLQ alerts via CloudWatch → investigate

Configuration

  • RedrivePolicy:
    • deadLetterTargetArn: ARN of DLQ
    • maxReceiveCount: e.g., 3 (after 3 receives without a successful delete, the message moves to the DLQ)
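
The RedrivePolicy above is passed to SQS as a JSON string inside the queue attributes. A minimal sketch, with `redrive_policy` as a hypothetical helper name and the DLQ ARN supplied by the caller:

```python
import json


def redrive_policy(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the Attributes dict that attaches a DLQ to a source queue."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # Queue attributes are strings, so the policy is passed as a JSON document.
    return {"RedrivePolicy": json.dumps(policy)}
```

It could be applied with `sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=redrive_policy(dlq_arn))`, where `source_queue_url` and `dlq_arn` are placeholders for your own values.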

DLQ best practices

  • Set up CloudWatch alarm on DLQ message count
  • Manually inspect DLQ for failure pattern
  • Fix bug → redrive messages back to source queue (DLQ Redrive)

Use case

  • Poison messages (malformed data)
  • App bug processing certain message types
  • External service downtime causing retries

8. Message Lifecycle

Producer → SQS → Consumer

[Visible]
   ↓ ReceiveMessage by Consumer
[In-flight - hidden]
   ↓ Consumer DELETEs within visibility timeout?
   Yes → [Removed]
   No  → [Visible again] → Retry (up to maxReceiveCount) → [DLQ]
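
The lifecycle can also be modelled as a tiny in-memory state machine. This is a teaching sketch, not how SQS is implemented: there are no real timers, so `expire` stands in for a visibility timeout elapsing, and the class name `SimulatedQueue` is hypothetical.

```python
from collections import deque


class SimulatedQueue:
    """In-memory model of visible / in-flight / DLQ states (no real timers)."""

    def __init__(self, max_receive_count: int = 3):
        self.max_receive_count = max_receive_count
        self.visible = deque()   # entries are [body, receive_count]
        self.in_flight = {}      # receipt handle -> entry
        self.dlq = []
        self._next_handle = 0

    def send(self, body: str) -> None:
        self.visible.append([body, 0])

    def receive(self):
        """Return (receipt_handle, body), or None when nothing is deliverable."""
        while self.visible:
            entry = self.visible.popleft()
            if entry[1] >= self.max_receive_count:
                # Already received the maximum number of times: dead-letter it.
                self.dlq.append(entry[0])
                continue
            entry[1] += 1
            self._next_handle += 1
            self.in_flight[self._next_handle] = entry
            return self._next_handle, entry[0]
        return None

    def delete(self, handle: int) -> None:
        """Consumer finished in time: the message is removed for good."""
        del self.in_flight[handle]

    def expire(self, handle: int) -> None:
        """Visibility timeout elapsed without a delete: visible again."""
        self.visible.append(self.in_flight.pop(handle))
```

A message that is received and expired three times with `max_receive_count=3` lands in `dlq` on the next `receive` call.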

9. SQS Security

Access Control

  • IAM policies (identity-based)
  • SQS Queue Policy (resource-based, cross-account)

Encryption

  • In transit: TLS
  • At rest: SQS-managed encryption or KMS
  • Enable per queue

VPC Endpoint

  • Interface Endpoint for SQS (PrivateLink)
  • Resources in private subnets reach SQS without going over the internet

10. SQS Advanced Features

Delay Queues

  • Delay messages 0-15 minutes before visible to consumers
  • Set per queue (default 0)

Message Timers

  • Per-message delay (override queue default)
  • Up to 15 minutes

FIFO Specific Features

  • MessageGroupId: group messages, FIFO within group
  • MessageDeduplicationId: dedupe within 5-minute window
  • Multiple Groups can process in parallel
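
A sketch of how the two FIFO parameters above might be filled in when sending. `fifo_send_kwargs` is a hypothetical helper. Deriving the deduplication ID from a hash of the group and body is an assumption that treats identical bodies within a group as duplicates, which may not suit every workload.

```python
import hashlib


def fifo_send_kwargs(queue_url: str, body: str, group_id: str) -> dict:
    """Build keyword arguments for sqs.send_message on a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages sharing a group ID are delivered in send order.
        "MessageGroupId": group_id,
        # Assumption: identical bodies in one group are duplicates, so a
        # hash of group and body serves as the deduplication ID.
        "MessageDeduplicationId": hashlib.sha256(
            f"{group_id}:{body}".encode("utf-8")
        ).hexdigest(),
    }
```

Using one group per customer or per account, for example, keeps each customer's events in order while different customers are processed in parallel.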

Long Messages (> 256 KB)

  • Use SQS Extended Client Library: store payload in S3, queue contains S3 reference
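  • Or apply the same pattern by hand: upload the payload to S3 and send only the object reference (Step Functions state payloads are limited to 256 KB as well, so it does not lift the limit)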
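
The store-in-S3 idea above can be sketched by hand as follows. This is not the Extended Client Library and does not use its message format: the pointer layout, the key prefix, and the name `send_large` are all hypothetical. `sqs` and `s3` are assumed to be clients from `boto3.client("sqs")` and `boto3.client("s3")`.

```python
import json
import uuid

MAX_SQS_BYTES = 256 * 1024  # message size limit quoted in this lesson


def send_large(sqs, s3, queue_url: str, bucket: str, body: str) -> dict:
    """Send `body` directly if it fits, otherwise via an S3 pointer."""
    data = body.encode("utf-8")
    if len(data) <= MAX_SQS_BYTES:
        return sqs.send_message(QueueUrl=queue_url, MessageBody=body)
    key = f"sqs-payloads/{uuid.uuid4()}"  # hypothetical key layout
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    # Hypothetical pointer format; the Extended Client Library uses its own.
    pointer = json.dumps({"s3_bucket": bucket, "s3_key": key})
    return sqs.send_message(QueueUrl=queue_url, MessageBody=pointer)
```

The consumer then has to recognise a pointer message and fetch the object from S3 before processing it.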

11. SQS vs SNS vs EventBridge

          | SQS                    | SNS                          | EventBridge
Type      | Queue (point-to-point) | Topic (pub/sub)              | Event Bus (pub/sub + routing)
Consumers | 1 consumer per message | Many subscribers per message | Many targets with filtering
Delivery  | Pull (consumer polls)  | Push to subscribers          | Push to targets
Use case  | Decouple, async work   | Fanout notifications         | Event-driven architecture

Combined pattern: SNS + SQS fanout

Event → SNS Topic → SQS Queue 1 → Consumer A
                  → SQS Queue 2 → Consumer B
                  → SQS Queue 3 → Consumer C

Each consumer processes independently from their queue.
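
For the fanout to work, each queue needs a resource-based queue policy that lets the topic send to it. A minimal sketch of such a policy, with `sns_to_sqs_policy` as a hypothetical helper name and both ARNs supplied by the caller:

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> str:
    """Queue policy letting one SNS topic send messages to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restrict the grant to the one topic that feeds this queue.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    return json.dumps(policy)
```

The string goes into the queue's `Policy` attribute via `set_queue_attributes`, after which the queue can be subscribed with `sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)`.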

12. SQS Pricing

  • $0.40 per million requests (Standard)
  • $0.50 per million requests (FIFO)
  • Free tier: 1M requests/month
  • No data transfer charge within same Region (to EC2)

Optimization

  • Batch operations (up to 10 messages/call, so up to 10x fewer requests)
  • Long polling (fewer empty polls)
  • Right-size visibility timeout
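
The effect of batching on the bill can be worked through with the prices quoted above. The sketch counts send requests only and ignores receive and delete calls, so it is a rough estimate rather than a full pricing model. `monthly_cost` is a hypothetical helper name.

```python
import math

FREE_REQUESTS = 1_000_000  # monthly free tier quoted above
PRICE_PER_MILLION = {"standard": 0.40, "fifo": 0.50}  # USD, as quoted above


def monthly_cost(messages: int, batch_size: int = 1,
                 queue_type: str = "standard") -> float:
    """Estimate the send-side request cost for one month."""
    requests = math.ceil(messages / batch_size)
    billable = max(0, requests - FREE_REQUESTS)
    return billable / 1_000_000 * PRICE_PER_MILLION[queue_type]
```

For 100 million messages on a Standard queue, sending one by one costs about $39.60, while batches of 10 cost about $3.60.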

13. Common Patterns

Pattern 1: Order processing

Web app → SQS Queue → Order Processor (EC2 ASG)
                   → DLQ if processing fails

Pattern 2: Image thumbnail generation

User upload → S3 → SNS → SQS → Lambda → resize → S3

Pattern 3: Email batch sending

App → SQS FIFO Queue (group by user)
    → Worker reads, sends email via SES

Pattern 4: Job retry mechanism

App → SQS (initial)
   → Lambda → 3 retries
   → DLQ → CloudWatch alarm → Ops investigation

Review questions

  1. What is the core difference between Standard and FIFO queues?

    Answer: Standard queues offer high (nearly unlimited) throughput, at-least-once delivery (duplicates are possible), and best-effort ordering (not guaranteed). FIFO queues offer limited throughput (300 msg/s, or 3,000 with batching), exactly-once processing (automatic deduplication), and guaranteed ordering within a message group. FIFO queue names end with .fifo. Use FIFO for financial transactions and order processing, where strict ordering is required; use Standard for high-throughput async jobs.

  2. What is the default visibility timeout, in seconds?

    Answer: 30 seconds by default. Range: 0 seconds to 12 hours. Once a consumer receives a message, it is invisible to other consumers for the duration of the visibility timeout, which prevents duplicate processing. When the consumer finishes, it deletes the message. If the timeout expires before the delete, the message becomes visible again and another consumer can receive it. Set the visibility timeout longer than the maximum processing time to avoid duplicates.

  3. How does long polling differ from short polling?

    Answer: Short polling (default) returns immediately even when the queue is empty, which produces many empty API responses and costs money. Long polling (WaitTimeSeconds 1-20) makes SQS wait until a message arrives or the wait time elapses before responding, which reduces empty responses, cost, and latency. AWS recommends long polling for production; 20 seconds is the maximum wait. Set ReceiveMessageWaitTimeSeconds at queue level or override it per request.

  4. What problem does a DLQ solve?

    Answer: A Dead Letter Queue (DLQ) stores messages that could not be processed successfully after N receives (maxReceiveCount). It keeps "poison pill" messages from being retried forever and blocking the queue. A DLQ is an ordinary Standard or FIFO queue, so its messages can be inspected, debugged, or reprocessed. An alarm on DLQ metrics is commonly set up to alert on failures. The DLQ must be the same type (Standard or FIFO) as the source queue.

  5. What is the maximum number of messages in one SendMessage batch?

    Answer: 10 messages per batch (SendMessageBatch API). The total batch size is at most 256 KB. Batching reduces the number of API calls and the cost, because billing is per request rather than per message. Likewise, ReceiveMessage returns at most 10 messages per call and DeleteMessageBatch deletes at most 10. Use batching to optimize throughput and cost.

Hands-on exercises

  • Create a Standard SQS queue, send 10 messages, and receive them in a batch of 10
  • Create a FIFO queue and test ordering with MessageGroupId
  • Set up a DLQ, force the consumer to fail, and observe the message move to the DLQ
  • Configure 20-second long polling and observe the reduction in API calls
  • Set up a Lambda trigger from SQS (Event Source Mapping)
  • Test the SQS Extended Client with a message larger than 256 KB
