Week 6 - Day 4: Data Analytics Services
1. Analytics Services Overview
2. Amazon Kinesis
Kinesis Services
KINESIS DATA STREAMS:
- Real-time data streaming
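- Retention: 24 hours by default, configurable up to 365 days
- Replay capability (reading does not delete records, so consumers can re-read data within the retention period)
- Consumers: Lambda, Kinesis Data Analytics (KDA), KCL applications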
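Replay works because a stream is a retained log rather than a queue. The sketch below is a toy in-memory model of one shard, written only to illustrate that idea. The iterator names mirror the real Kinesis shard iterator types (TRIM_HORIZON, LATEST, AT_SEQUENCE_NUMBER, AT_TIMESTAMP), but `ShardModel`, `Record` and their methods are hypothetical and are not the AWS SDK API. Real sequence numbers are large, non-consecutive strings; small integers are an assumption made for readability.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    sequence_number: int   # simplified: real sequence numbers are large, non-consecutive strings
    arrival_time: float    # seconds
    data: bytes


class ShardModel:
    """Toy model of one shard: an append-only log with time-based retention.

    Reading never removes records; only age does. That is what allows replay.
    """

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self.records: List[Record] = []
        self.next_seq = 0

    def _expire(self, now: float) -> None:
        # Keep only records still inside the retention window.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r.arrival_time >= cutoff]

    def put(self, data: bytes, now: float) -> int:
        self._expire(now)
        seq = self.next_seq
        self.next_seq += 1
        self.records.append(Record(seq, now, data))
        return seq

    def read(self, iterator_type: str, now: float,
             sequence_number: Optional[int] = None,
             timestamp: Optional[float] = None) -> List[bytes]:
        """Return retained records from a starting position named like a shard iterator type."""
        self._expire(now)
        if iterator_type == "TRIM_HORIZON":          # oldest retained record onward
            selected = self.records
        elif iterator_type == "LATEST":              # only records written after this call
            selected = []
        elif iterator_type == "AT_SEQUENCE_NUMBER":  # replay from a known position
            selected = [r for r in self.records if r.sequence_number >= sequence_number]
        elif iterator_type == "AT_TIMESTAMP":        # replay from a point in time
            selected = [r for r in self.records if r.arrival_time >= timestamp]
        else:
            raise ValueError(f"unknown iterator type: {iterator_type}")
        return [r.data for r in selected]
```

Reading with TRIM_HORIZON twice returns the same records both times; only once a record is older than the retention period does it disappear from every read.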
AMAZON DATA FIREHOSE (formerly Kinesis Data Firehose):
- Load data to destinations
- Near real-time (buffer interval 0 to 900 seconds; zero buffering supported since December 2023)
- Destinations: S3, Redshift, OpenSearch, Splunk, HTTP endpoints
- No data retention (pass-through)
- Note: AWS renamed "Kinesis Data Firehose" to "Amazon Data Firehose" in 2024. Exam questions may use either name; the concepts are the same.
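"Near real-time" comes from buffering: Firehose delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first. The sketch below simulates that rule in plain Python. It is a simplified model, assuming the interval timer starts at the first record of each batch and ignoring per-destination size ranges; `simulate_buffering` is a hypothetical helper, not part of any AWS SDK.

```python
from typing import List, Optional, Tuple


def simulate_buffering(events: List[Tuple[float, int]], size_limit: int,
                       interval: float) -> List[Tuple[float, int]]:
    """Return (delivery_time, bytes_delivered) for each batch.

    events: (arrival_time_seconds, record_size_bytes), sorted by arrival time.
    A batch is delivered when its size reaches size_limit or when `interval`
    seconds have passed since its first record arrived, whichever comes first.
    """
    deliveries: List[Tuple[float, int]] = []
    buffered = 0
    batch_start: Optional[float] = None
    for t, size in events:
        # Interval trigger: the batch's deadline passed before this record arrived.
        if batch_start is not None and t - batch_start >= interval:
            deliveries.append((batch_start + interval, buffered))
            buffered, batch_start = 0, None
        if batch_start is None:
            batch_start = t
        buffered += size
        # Size trigger: deliver as soon as the threshold is reached.
        if buffered >= size_limit:
            deliveries.append((t, buffered))
            buffered, batch_start = 0, None
    if batch_start is not None:
        # Whatever is left is delivered when its interval expires.
        deliveries.append((batch_start + interval, buffered))
    return deliveries
```

With `interval=0` every record is delivered at its own arrival time, which is the zero-buffering case; with a 60-second interval, a slow trickle of small records waits up to a minute before delivery.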
KINESIS DATA ANALYTICS (the Apache Flink option was renamed Amazon Managed Service for Apache Flink in 2023):
- SQL or Apache Flink applications on streaming data
- Real-time analytics
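A typical real-time analytics job groups a stream into time windows. The sketch below shows a tumbling window (fixed size, non-overlapping) count in plain Python, only to illustrate the grouping logic; it is not Flink or Kinesis Data Analytics code, and `tumbling_window_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[float, str]],
                           window_seconds: float) -> Dict[Tuple[float, str], int]:
    """Count events per key in fixed-size, non-overlapping (tumbling) windows.

    Each (timestamp, key) event falls into exactly one window, identified by its
    start time: floor(timestamp / window_seconds) * window_seconds.
    """
    counts: Counter = Counter()
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

A streaming engine computes the same groups incrementally and emits each window's result when the window closes, instead of processing a finished list as this batch function does.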
KINESIS VIDEO STREAMS:
- Stream video from devices
- Integrates with ML services such as Amazon Rekognition Video
Kinesis Shards
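Capacity per shard (provisioned mode):
- Read: 2 MB/s, 5 GetRecords calls/s (shared by all standard consumers; enhanced fan-out gives each registered consumer its own 2 MB/s)
- Write: 1 MB/s and up to 1,000 records/s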
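The per-shard limits above turn directly into a sizing rule: the stream needs enough shards to satisfy the tightest limit. A minimal sketch, assuming shared-throughput (non enhanced fan-out) consumers so that all reads count against the 2 MB/s per shard; the 5 GetRecords calls/s limit is not modelled, and `shards_needed` is a hypothetical helper.

```python
import math

# Per-shard limits from the notes (provisioned mode, shared-throughput reads).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s: float, write_records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies every per-shard limit at once."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )
```

For example, 5 MB/s of writes at 3,000 records/s, read in full by three standard consumers (15 MB/s of reads), needs 8 shards: reads, not writes, are the bottleneck, which is the usual argument for enhanced fan-out.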
Scaling:
- Shard splitting (scale up)
- Shard merging (scale down)
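Splitting and merging operate on hash key ranges. Kinesis maps each partition key to a 128-bit integer using MD5, and each shard owns a contiguous range of that space. A split divides one range at a new starting hash key; a merge joins two adjacent ranges. The sketch below models only the ranges (not child shard IDs or the closed parent shards), and all function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

Range = Tuple[int, int]          # inclusive (starting_hash_key, ending_hash_key)
MAX_HASH = 2**128 - 1            # hash keys are 128-bit unsigned integers


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_ranges(n: int) -> List[Range]:
    """Divide the whole hash key space into n contiguous ranges (one per shard)."""
    step = (MAX_HASH + 1) // n
    ranges = [(i * step, (i + 1) * step - 1) for i in range(n)]
    ranges[-1] = (ranges[-1][0], MAX_HASH)   # last shard absorbs any remainder
    return ranges


def split(ranges: List[Range], index: int, new_start: int) -> List[Range]:
    """Scale up: replace one range with two children that meet at new_start."""
    start, end = ranges[index]
    if not start < new_start <= end:
        raise ValueError("new starting hash key must lie inside the parent range")
    return ranges[:index] + [(start, new_start - 1), (new_start, end)] + ranges[index + 1:]


def merge(ranges: List[Range], index: int) -> List[Range]:
    """Scale down: combine range `index` with its right-hand neighbour."""
    (s1, e1), (s2, e2) = ranges[index], ranges[index + 1]
    if e1 + 1 != s2:
        raise ValueError("only shards with adjacent hash key ranges can be merged")
    return ranges[:index] + [(s1, e2)] + ranges[index + 2:]


def shard_for(ranges: List[Range], partition_key: str) -> int:
    """Index of the range (shard) that owns this partition key."""
    h = hash_key(partition_key)
    for i, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return i
    raise LookupError("hash key space is not fully covered")
```

Because routing depends only on the hash, splitting a hot shard helps only if its traffic is spread over many partition keys; a single very hot key always lands in one range.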
3. Amazon Redshift
4. AWS Glue
5. Amazon Athena
Serverless SQL on S3:
- Pay per query: $5 per TB scanned (10 MB minimum per query; price may vary by region)
- Standard SQL (Presto/Trino-based engine)
- Integrates with Glue Data Catalog
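Since Athena bills by data scanned, query cost is simple arithmetic. The sketch below assumes the $5/TB rate above, billing rounded up to whole megabytes with a 10 MB minimum per query, and binary units (1 TB = 1024^4 bytes); check the Athena pricing page for the exact units and your region's rate. `athena_query_cost` is a hypothetical helper.

```python
import math

PRICE_PER_TB_USD = 5.00     # rate quoted in these notes; verify for your region
MB = 1024 ** 2              # assumption: binary units for MB and TB
TB = 1024 ** 4
MIN_BILLED_MB = 10          # minimum billed data per query


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimated charge in USD for one query.

    Bytes scanned are rounded up to a whole MB, with a 10 MB minimum.
    """
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / MB))
    return billed_mb * MB / TB * PRICE_PER_TB_USD
```

Under these assumptions a query scanning 100 GiB costs about $0.49, and a full 1 TiB scan costs $5, so reducing bytes scanned is the main cost lever.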
Cost Optimization:
- Use columnar formats (Parquet, ORC)
- Partition data
- Compress data
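The three techniques above combine multiplicatively on bytes scanned. The sketch below is a rough estimate under simplifying assumptions: data spread evenly across partitions and columns, a single uniform compression ratio, and columnar formats reading only the referenced columns. The function name and the example numbers are hypothetical, not measurements.

```python
def estimate_scanned_bytes(total_bytes: float, partitions_total: int,
                           partitions_read: int, columns_total: int,
                           columns_read: int, columnar: bool,
                           compression_ratio: float = 1.0) -> float:
    """Rough estimate of bytes a query scans.

    compression_ratio is uncompressed size / compressed size (1.0 = uncompressed).
    """
    scanned = total_bytes / compression_ratio             # compression shrinks everything
    scanned *= partitions_read / partitions_total         # partition pruning skips whole prefixes
    if columnar:                                          # Parquet/ORC read only needed columns
        scanned *= columns_read / columns_total
    return scanned
```

For a 1 TiB table partitioned by month, a query reading one month and 3 of 20 columns from Parquet with 4:1 compression scans about 1/320 of the raw size (4 × 12 × 20/3 = 320), compared with the full 1 TiB for an unpartitioned, uncompressed CSV table.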
Federated Query:
- Query across data sources through Lambda-based data source connectors
- Sources: RDS, DynamoDB, Redshift, on-premises databases
6. AWS Lake Formation
Official references
- Amazon Redshift Documentation
- Amazon Athena User Guide
- Amazon Kinesis Documentation
- AWS Glue Developer Guide
Next day: Week 6 review quiz