Week 5 - Day 3: DynamoDB Basics
Learning objectives
- Understand the DynamoDB data model: table, item, attribute
- Distinguish a simple primary key (Partition Key only) from a Composite Key
- Understand Provisioned vs On-Demand capacity
- Understand Global Tables, DAX, Streams
1. DynamoDB Overview
Amazon DynamoDB = fully managed NoSQL key-value & document database.
Characteristics
- Serverless: no servers to manage
- Single-digit ms latency at any scale
- Auto-scaling: handle millions of requests/sec
- Multi-AZ replication built-in (3 copies)
- NoSQL: no schema, flexible structure
- HTTP API (no JDBC/connection-based)
- Encryption at rest by default
Use cases
- High-traffic web/mobile apps
- Gaming leaderboards
- IoT data ingestion
- Real-time bidding
- Shopping carts
- Session storage
- User profiles
Not for
- Complex JOIN queries (NoSQL)
- Strong relational integrity needs
- Ad-hoc queries (need to design access patterns upfront)
- BI / analytics (use Redshift)
2. Data Model
Hierarchy: Table → Item → Attribute
Example
Table: Users
Item 1:
{
"UserId": "u-001", // Partition Key
"Name": "Alice",
"Email": "alice@example.com",
"Age": 30,
"Interests": ["coding", "music"] // List attribute
}
Item 2:
{
"UserId": "u-002",
"Name": "Bob",
"Email": "bob@example.com",
"Age": 25
// Note: no "Interests" — schema-less!
}
Attribute types
- Scalar: String, Number, Binary, Boolean, Null
- Set: String Set, Number Set, Binary Set
- Document: List, Map (nested structure)
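On the wire, the low-level DynamoDB API tags every value with a type descriptor such as `S`, `N`, `BOOL`, `L`, or `M`. The sketch below maps plain Python values to that format to make the type list concrete. The helper name `to_attribute_value` is illustrative, it skips the Binary types, and in practice the boto3 resource layer performs this conversion for you.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed JSON (simplified sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before numbers: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float, Decimal)) and not isinstance(v, bool)
               for v in value):
            return {"NS": sorted(str(v) for v in value)}
        raise TypeError("set members must all be strings or all be numbers")
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


item = {"UserId": "u-001", "Age": 30, "Interests": ["coding", "music"]}
wire_item = {name: to_attribute_value(v) for name, v in item.items()}
```

Note that a Python list becomes a List (`L`), while a Python set becomes a String Set or Number Set, mirroring the distinction in the type list above.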
Max sizes
- Item size: 400 KB
- Attribute name: 64 KB
- Partition key: 2048 bytes (string or binary); number keys hold up to 38 digits of precision
- Sort key: 1024 bytes
3. Primary Key
Two types
Simple Primary Key (Partition Key only)
- 1 attribute uniquely identifies item
- Hash function distributes items across partitions
Composite Primary Key (Partition Key + Sort Key)
- Partition Key + Sort Key combination must be unique
- Items with the same PK are grouped together and sorted by SK
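Because items that share a partition key are stored in sort-key order, a single Query can fetch a contiguous slice of them. The sketch below only builds the request parameters for the low-level Query API. The table name `Orders`, the function name, and the date-prefixed `OrderId` format are assumptions made for illustration.

```python
def build_orders_query(user_id, order_prefix=None, table_name="Orders"):
    """Build keyword arguments for the low-level Query API (sketch)."""
    condition = "UserId = :uid"  # the partition key must use equality
    values = {":uid": {"S": user_id}}
    if order_prefix is not None:
        # optional sort-key condition narrows the range within the partition
        condition += " AND begins_with(OrderId, :prefix)"
        values[":prefix"] = {"S": order_prefix}
    return {
        "TableName": table_name,
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
        "ScanIndexForward": True,  # ascending sort-key order
    }


params = build_orders_query("u-001", order_prefix="2024-")
# With credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```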
Partition Key best practice
- High cardinality (many distinct values) → spread evenly
- Avoid hot partitions: don't use timestamp/sequential as PK
- Common patterns: UserId, ProductId, SessionId, UUID
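A small simulation can show why cardinality matters. The sketch below hashes partition-key values into a fixed number of buckets and counts the load per bucket. SHA-256 and the bucket count of 8 are stand-ins chosen for illustration: DynamoDB's real hash function and partition count are internal and not something you configure.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical; DynamoDB manages the real partition count


def partition_for(key):
    """Map a partition-key value to a bucket using a stand-in hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def load_per_partition(keys):
    """Count how many requests land on each bucket."""
    return Counter(partition_for(k) for k in keys)


# 10,000 writes keyed by distinct user IDs vs. 10,000 writes keyed by one date
by_user = load_per_partition(f"u-{i:05d}" for i in range(10_000))
by_date = load_per_partition("2024-05-01" for _ in range(10_000))

print("buckets used with UserId keys:", len(by_user))
print("buckets used with a date key:", len(by_date))
```

With a single date as the key, every write lands in the same bucket, which is the hot-partition pattern the list above warns about.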
4. Read/Write Capacity Modes
Provisioned Capacity
- Specify RCU (Read Capacity Units) and WCU (Write Capacity Units)
- Pay for capacity reserved
- Auto-scaling available (scale RCU/WCU based on utilization target)
Capacity calculation
- 1 WCU = 1 write/sec for item up to 1 KB
- Transactional write: 2 WCU per write
- 1 RCU = 1 strongly-consistent read/sec for item up to 4 KB
- Eventually consistent: 2 reads/sec per RCU
- Transactional read: 2 RCU per read
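The rules above can be turned into a small calculator. This is a sketch under two assumptions: 1 KB is taken as 1024 bytes, and the function names are made up for illustration. Item size is rounded up to the next 4 KB block for reads and the next 1 KB block for writes before the multiplier is applied.

```python
import math

READ_MULTIPLIER = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}


def read_capacity_units(item_size_bytes, mode="strong"):
    """RCUs consumed by one read of an item of the given size."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # round up to 4 KB blocks
    return blocks * READ_MULTIPLIER[mode]


def write_capacity_units(item_size_bytes, transactional=False):
    """WCUs consumed by one write of an item of the given size."""
    blocks = max(1, math.ceil(item_size_bytes / 1024))  # round up to 1 KB blocks
    return blocks * (2 if transactional else 1)


print(read_capacity_units(8 * 1024))            # 8 KB, strongly consistent
print(read_capacity_units(1024, "eventual"))    # 1 KB, eventually consistent
print(write_capacity_units(1536))               # 1.5 KB write
```

For a provisioned table, multiply the per-request figure by the expected requests per second to size RCU and WCU.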
Use case
- Predictable traffic
- Cost-optimization (Reserved Capacity option)
On-Demand Capacity
- Pay per request ($/million reads, $/million writes)
- No capacity planning — auto-handle any traffic
- More expensive per request than provisioned
- Instant scale (no need to provision)
Use case
- Unpredictable traffic
- New apps where traffic unknown
- Spiky traffic (0 → millions in seconds)
Switching between modes
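- Switching is rate-limited within a rolling 24-hour window (older guidance: once per 24 hours; check the current quota for each direction)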
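In the CreateTable API the two modes correspond to the `BillingMode` parameter: `PAY_PER_REQUEST` for On-Demand and `PROVISIONED` (with `ProvisionedThroughput`) for Provisioned. The sketch below builds the parameters for a table with a composite key. The function name, the default of 5 RCU and 5 WCU, and the string key types are assumptions for illustration.

```python
def build_create_table_params(table_name, on_demand=True, rcu=5, wcu=5):
    """Build CreateTable keyword arguments for either capacity mode (sketch)."""
    params = {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "UserId", "AttributeType": "S"},
            {"AttributeName": "OrderId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},    # partition key
            {"AttributeName": "OrderId", "KeyType": "RANGE"},  # sort key
        ],
    }
    if on_demand:
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        }
    return params


on_demand_table = build_create_table_params("Orders")
provisioned_table = build_create_table_params("Orders", on_demand=False, rcu=10, wcu=5)
# With credentials configured:
#   boto3.client("dynamodb").create_table(**on_demand_table)
```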
5. Read Consistency Models
Eventually Consistent (default)
- Read might return stale data (latest update not propagated yet)
- Faster, cheaper
- Lag typically < 1 second
Strongly Consistent
- Read returns latest data (after writes)
- Slower (higher latency)
- 2x the RCU cost of an eventually consistent read
When to use Strongly Consistent
- Read after write must see update (e.g., post-payment check)
- Financial transactions
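# boto3 example (assumes a "Users" table keyed on UserId)
import boto3

table = boto3.resource("dynamodb").Table("Users")
response = table.get_item(
    Key={"UserId": "u-001"},
    ConsistentRead=True,  # strongly consistent; omit for the eventually consistent default
)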
6. Indexes
Local Secondary Index (LSI)
- Same partition key as base table, different sort key
- Created at table creation, cannot add later
- Max 5 LSIs per table
- Uses base table's capacity
Use case
- Different sort orders for same PK
- E.g., Orders sorted by date, by amount
Global Secondary Index (GSI)
- Different partition key (and optionally a different sort key)
- Can create anytime (no downtime)
- Max 20 GSIs per table
- Own provisioned capacity (separate RCU/WCU)
- Eventually consistent only
Use case
- Query by non-PK attribute
- E.g., Users by Email (PK is UserId)
Example
Base Table:
PK: UserId, SK: OrderId
LSI:
PK: UserId, SK: OrderDate ← same PK, different SK
GSI:
PK: ProductId, SK: OrderId ← different PK
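The example above can be expressed as index definitions for CreateTable, which accepts them under `LocalSecondaryIndexes` and `GlobalSecondaryIndexes`. In this sketch the index names `ByOrderDate` and `ByProduct`, the helper names, and the `ALL` projection are assumptions. The `lsi` helper enforces the rule that an LSI reuses the base table's partition key.

```python
TABLE_PARTITION_KEY = "UserId"


def _index(index_name, partition_key, sort_key=None):
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": {"ProjectionType": "ALL"},  # copy every attribute into the index
    }


def lsi(index_name, partition_key, sort_key):
    """Local index: same partition key as the table, different sort key."""
    if partition_key != TABLE_PARTITION_KEY:
        raise ValueError("an LSI must reuse the base table's partition key")
    return _index(index_name, partition_key, sort_key)


def gsi(index_name, partition_key, sort_key=None):
    """Global index: any partition key. On a provisioned table it would
    also need its own ProvisionedThroughput entry."""
    return _index(index_name, partition_key, sort_key)


local_indexes = [lsi("ByOrderDate", "UserId", "OrderDate")]
global_indexes = [gsi("ByProduct", "ProductId", "OrderId")]
```

Every attribute used as an index key (here `OrderDate` and `ProductId`) would also have to appear in the table's `AttributeDefinitions`.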
7. DynamoDB Streams
Definition
Streams = ordered log of item changes (insert, modify, delete) in a table.
Characteristics
- 24-hour retention
- Near real-time (< 1 sec lag)
- Enable per table
- Alternative: change data can also be streamed to Kinesis Data Streams (newer option)
View types
- KEYS_ONLY: only key attributes
- NEW_IMAGE: entire new item
- OLD_IMAGE: entire old item
- NEW_AND_OLD_IMAGES: both
Use cases
- Trigger Lambda on changes (CDC pattern)
- Replicate to other databases
- Audit trail
- Aggregations / analytics
- Notifications (SNS on item change)
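For the Lambda trigger use case, each invocation receives a batch of stream records. The sketch below walks that batch and summarizes every change. The handler name and the sample event are illustrative. In the record format, the three change types appear as the `eventName` values `INSERT`, `MODIFY`, and `REMOVE`, and which images are present depends on the stream view type.

```python
def handler(event, context):
    """Summarize DynamoDB Streams records delivered to Lambda (sketch)."""
    changes = []
    for record in event.get("Records", []):
        data = record["dynamodb"]
        changes.append({
            "event": record["eventName"],  # INSERT, MODIFY, or REMOVE
            "keys": data["Keys"],
            "new": data.get("NewImage"),   # absent for REMOVE or KEYS_ONLY
            "old": data.get("OldImage"),   # absent for INSERT or KEYS_ONLY
        })
    for change in changes:
        print(change["event"], change["keys"])
    return {"processed": len(changes)}


# Hand-written sample event in the stream record shape, for local testing
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"UserId": {"S": "u-001"}},
                "NewImage": {"UserId": {"S": "u-001"}, "Name": {"S": "Alice"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"UserId": {"S": "u-002"}},
                "OldImage": {"UserId": {"S": "u-002"}, "Name": {"S": "Bob"}},
            },
        },
    ]
}

result = handler(sample_event, None)
```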
8. DAX (DynamoDB Accelerator)
Definition
DAX = fully managed in-memory cache for DynamoDB, with microsecond latency for cached reads.
Characteristics
- Write-through cache: writes go through DAX to DynamoDB
- Up to 10x faster for cached reads
- Compatible with the DynamoDB API (swap in the DAX client; minimal code change)
- Multi-AZ cluster
- TTL configurable (5 min default)
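The write-through and TTL behavior can be illustrated with a toy cache in front of a dictionary. This is a conceptual sketch only, not the DAX client: the class name, the injected clock, and the plain dict standing in for the table are all assumptions.

```python
import time


class WriteThroughCache:
    """Toy write-through cache with per-entry TTL (concept sketch, not DAX)."""

    def __init__(self, store, ttl_seconds=300, clock=time.monotonic):
        self.store = store    # stands in for the DynamoDB table
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}      # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def put(self, key, value):
        self.store[key] = value  # write to the backing store first
        self._cache[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[1] > self.clock():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.store.get(key)
        if value is not None:
            self._cache[key] = (value, self.clock() + self.ttl)
        return value


now = [0.0]  # fake clock so the example is deterministic
table = {}
cache = WriteThroughCache(table, ttl_seconds=300, clock=lambda: now[0])

cache.put("u-001", {"Name": "Alice"})
cache.get("u-001")                   # served from the cache
table["u-001"] = {"Name": "Alicia"}  # a write that bypasses the cache
stale = cache.get("u-001")           # cached value is returned until the TTL expires
now[0] = 301.0
fresh = cache.get("u-001")           # TTL elapsed, so the store is read again
```

The stale read in the middle is the reason cached reads are only eventually consistent: a write that does not go through the cache stays invisible until the entry expires.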
When to use DAX
- Read-heavy workload (>80% reads)
- Repeated reads of same items
- Need sub-millisecond response
When NOT to use DAX
- Write-heavy
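- Regular DynamoDB reads (single-digit ms) are already fast enough
- Strongly consistent reads required (DAX serves only eventually consistent reads from its cache; strongly consistent reads pass through to DynamoDB)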
Cost
- DAX cluster (1 primary + readers): from about $0.04/hour per node, depending on node type and Region
9. Global Tables
Definition
Global Tables = multi-region, multi-active replication for DynamoDB.
Characteristics
- Active-active: write to ANY region, replicated to all others
- Sub-second replication lag (typical)
- Conflict resolution: last writer wins (based on timestamp)
- Requires Streams enabled
- Cost: charged per region capacity + cross-region data transfer
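Last-writer-wins can be sketched as a comparison of write timestamps. The function below is a conceptual illustration, not DynamoDB's internal implementation: the `updated_at` field, the function name, and the choice to keep the local version on a tie are assumptions.

```python
def resolve_last_writer_wins(local, remote):
    """Return whichever version carries the newer write timestamp.

    Ties keep the local version; this tie-break is an arbitrary choice
    made for the sketch.
    """
    return remote if remote["updated_at"] > local["updated_at"] else local


# The same item written in two Regions a fraction of a second apart
region_a = {"UserId": "u-001", "Name": "Alice", "updated_at": 1714550000.120}
region_b = {"UserId": "u-001", "Name": "Alicia", "updated_at": 1714550000.450}

winner = resolve_last_writer_wins(region_a, region_b)
print(winner["Name"])
```

The earlier write is discarded rather than merged, so applications that cannot tolerate a lost update should avoid writing the same item from two Regions concurrently.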
Use case
- Multi-region apps with low-latency for global users
- DR (any region can become primary)
- Compliance (data in multiple regions)
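Architecture: each Region holds a full replica of the table; applications read and write their local replica, and changes replicate asynchronously to the other Regions.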
10. DynamoDB Backups
Continuous Backup (PITR - Point in Time Recovery)
- Enable per table
- Restore to any second in last 35 days
- No performance impact
- Cost: $0.20/GB-month
On-Demand Backup
- User-initiated full backup
- Retain indefinitely
- No performance impact
- Used for compliance, long-term archive
Restore
- Always restore to new table (cannot in-place)
- Includes indexes (LSI, GSI), provisioned capacity, encryption settings
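A PITR restore request names a source table, a new target table, and either a timestamp or the latest restorable time. The sketch below builds those parameters and checks the two rules stated above: the target must be a new table, and the timestamp must fall inside the 35-day window. The function name and table names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)


def build_pitr_restore_params(source, target, restore_time=None, now=None):
    """Build arguments for a point-in-time restore request (sketch)."""
    if target == source:
        raise ValueError("restore always creates a new table; pick a different name")
    params = {"SourceTableName": source, "TargetTableName": target}
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
        return params
    now = now or datetime.now(timezone.utc)
    if not (now - PITR_WINDOW <= restore_time <= now):
        raise ValueError("restore_time must fall within the last 35 days")
    params["RestoreDateTime"] = restore_time
    return params


as_of = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)  # fixed clock for the example
one_hour_ago = as_of - timedelta(hours=1)
params = build_pitr_restore_params("Orders", "Orders-restored", one_hour_ago, now=as_of)
# With credentials configured:
#   boto3.client("dynamodb").restore_table_to_point_in_time(**params)
```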
11. DynamoDB Transactions
ACID transactions
- Up to 100 actions per transaction
- Up to 4 MB data
- Works across multiple tables in same region
- 2x RCU/WCU cost vs regular reads/writes
Operations
- TransactWriteItems: atomic write to 1+ items
- TransactGetItems: atomic read of 1+ items
Use case
- Financial transfers (debit + credit atomically)
- Order processing (inventory + payment + confirmation)
- Anything requiring "all or nothing"
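The financial transfer case can be sketched as a TransactWriteItems request with two Update actions. The table name `Accounts`, the `AccountId` and `Balance` attributes, and the helper names are assumptions for illustration. The condition on the debit makes the whole transaction fail if the balance is too low, so the credit is never applied on its own.

```python
def balance_update(table, account_id, expression, amount, condition=None):
    """One Update action that adjusts an account balance."""
    update = {
        "TableName": table,
        "Key": {"AccountId": {"S": account_id}},
        "UpdateExpression": expression,
        "ExpressionAttributeNames": {"#bal": "Balance"},
        "ExpressionAttributeValues": {":amt": {"N": str(amount)}},
    }
    if condition:
        update["ConditionExpression"] = condition
    return {"Update": update}


def build_transfer(table, from_id, to_id, amount):
    """Debit one account and credit another in a single transaction (sketch)."""
    return {
        "TransactItems": [
            # the debit only succeeds if enough funds are available
            balance_update(table, from_id, "SET #bal = #bal - :amt", amount,
                           condition="#bal >= :amt"),
            balance_update(table, to_id, "SET #bal = #bal + :amt", amount),
        ]
    }


request = build_transfer("Accounts", "acc-1", "acc-2", 50)
# With credentials configured:
#   boto3.client("dynamodb").transact_write_items(**request)
```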
12. TTL (Time to Live)
Definition
TTL = auto-delete items at specified expiry time.
Setup
- Designate an attribute (e.g., ExpiresAt = epoch timestamp in seconds)
- DynamoDB checks periodically; deletion is not immediate (typically within 48 hours of expiry, sometimes longer)
- Free operation (no WCU charge)
Use case
- Session storage (auto-cleanup expired sessions)
- Temporary data (cache, logs)
- Compliance (auto-delete old data)
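Since expired items can linger until the background sweep removes them, applications often compute the expiry value on write and filter on read. The sketch below shows both, plus the settings that enable TTL on a table. The `Sessions` table, the `ExpiresAt` attribute, and the helper names are assumptions for illustration.

```python
import time


def expires_at(ttl_seconds, now=None):
    """Epoch-seconds value to store in the TTL attribute."""
    now = time.time() if now is None else now
    return int(now) + ttl_seconds


def is_live(item, now=None):
    """Treat items past their expiry as gone, even if not yet deleted."""
    now = time.time() if now is None else now
    return item.get("ExpiresAt", float("inf")) > now


# Parameters that enable TTL on a table; with credentials configured:
#   boto3.client("dynamodb").update_time_to_live(**ttl_settings)
ttl_settings = {
    "TableName": "Sessions",
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": "ExpiresAt"},
}

created = 1_714_550_000  # fixed timestamp so the example is deterministic
session = {"SessionId": "s-001", "ExpiresAt": expires_at(3600, now=created)}
live_now = is_live(session, now=created + 60)      # one minute in
live_later = is_live(session, now=created + 7200)  # two hours in
```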
Review questions
- What is the maximum DynamoDB item size?
Answer: 400 KB per item. To store larger data (images, files), put it in S3 and keep only the S3 key in the DynamoDB item. Table size is unlimited; only item size is capped. Attribute names also count toward item size, so short names save storage and RCU/WCU.
- What is the core difference between a GSI and an LSI?
Answer: An LSI (Local Secondary Index) uses the same Partition Key with a different Sort Key. It can only be created together with the table, cannot be added later, and shares throughput with the base table. A GSI (Global Secondary Index) has a different Partition Key and can be created or deleted at any time. It has its own throughput (provisioned separately or on-demand). Maximum 5 LSIs and 20 GSIs per table.
- 1 RCU strongly consistent = how many reads/sec for a 4 KB item?
Answer: 1 read/second for an item up to 4 KB with strong consistency. An eventually consistent read gets 2 reads/sec for a 4 KB item (double). Item size rounds up to the next 4 KB, so an 8 KB item needs 2 RCUs, and a 1 KB item still costs 1 RCU when strongly consistent (0.5 RCU when eventually consistent). Transactional reads consume 2 RCUs per 4 KB read.
- Is DAX suited to read-heavy or write-heavy workloads?
Answer: Read-heavy. DAX (DynamoDB Accelerator) is an in-memory cache for DynamoDB that cuts read latency from milliseconds to microseconds and reduces RCU consumption. It is a poor fit for write-heavy workloads because write-through DAX still has to write to DynamoDB. It is also a poor fit when the application needs strongly consistent reads (DAX serves only eventually consistent reads from its cache) or when data changes constantly (low cache hit rate).
- How do Global Tables resolve conflicts?
Answer: DynamoDB Global Tables use "last writer wins" (LWW) based on timestamp. If the same item is written in two Regions at the same time, the write with the latest timestamp (by wall clock) wins. AWS recommends designing the application to avoid conflicts: assign a primary Region to each user/tenant, or use conditional writes with a version counter. There is no built-in CRDT or merge logic.
Hands-on exercises
- Create a DynamoDB table with a composite PK (UserId + OrderId) and On-Demand capacity
- Insert 100 items, then query by PK
- Add a GSI to query by Email
- Enable Streams and create a Lambda trigger that logs changes
- Enable PITR and test a restore to 1 hour ago
- Enable Global Tables across 2 Regions and test a write from Region B