
Week 6 - Day 4: Data Analytics Services

1. Analytics Services Overview

ANALYTICS PIPELINE:
Collect → Store → Process → Analyze → Visualize
- Collect: Kinesis Data Firehose
- Store: S3 / Redshift
- Process: EMR / Glue
- Analyze: Athena / Redshift
- Visualize: QuickSight

2. Amazon Kinesis

Kinesis Services

KINESIS DATA STREAMS:
- Real-time data streaming
- 1-365 days retention
- Replay capability
- Consumers: Lambda, KDA (Kinesis Data Analytics), Data Firehose, KCL apps
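Retention is what makes replay possible: reading a record does not delete it, so a consumer can go back to any position that is still retained. The sketch below models one shard as an append-only log in plain Python. It is a minimal illustration under stated assumptions: the class and method names are hypothetical, not the Kinesis API, and time is simplified to whole hours. Real consumers choose a starting position with shard iterator types such as TRIM_HORIZON or AT_SEQUENCE_NUMBER.

```python
from dataclasses import dataclass, field


@dataclass
class ShardLog:
    """Hypothetical toy model of one shard: an append-only, time-limited log."""

    retention_hours: int = 24
    records: list = field(default_factory=list)  # (sequence_number, arrival_hour, data)
    next_seq: int = 0

    def put(self, data, arrival_hour):
        # Each new record gets the next, strictly increasing sequence number.
        self.records.append((self.next_seq, arrival_hour, data))
        self.next_seq += 1

    def expire(self, now_hour):
        # Records older than the retention period are dropped.
        cutoff = now_hour - self.retention_hours
        self.records = [r for r in self.records if r[1] >= cutoff]

    def read_from(self, seq):
        # Reading never deletes data, so any consumer can re-read (replay)
        # from any sequence number that is still retained.
        return [data for s, _, data in self.records if s >= seq]

    def oldest_seq(self):
        # Comparable in spirit to TRIM_HORIZON: the oldest retained record.
        return self.records[0][0] if self.records else self.next_seq
```

Calling `read_from(1)` twice returns the same records both times; only `expire` removes data, which mirrors how the retention period (not consumption) bounds how far back a replay can go.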

AMAZON DATA FIREHOSE (formerly Kinesis Data Firehose):
- Load data to destinations
- Near real-time (buffer interval 0–900 seconds; zero buffering supported since Dec 2023)
- Destinations: S3, Redshift, OpenSearch, Splunk
- No data retention (pass-through)
- Note: AWS renamed "Kinesis Data Firehose" to "Amazon Data Firehose" (2024). Exam questions may use either name; the concepts are unchanged.
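"Near real-time" comes from buffering: Firehose delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first. The simulation below is a minimal sketch of that rule only; the function name is hypothetical, sizes are in arbitrary units, and it is not the Firehose API.

```python
def flush_times(events, buffer_size, buffer_interval_s):
    """Simulate when a Firehose-style buffer delivers a batch.

    events: list of (arrival_time_s, size) sorted by time.
    A batch is delivered when the buffered size reaches buffer_size or
    buffer_interval_s seconds have passed since the batch started,
    whichever happens first.
    """
    flushes = []
    buffered = 0
    batch_start = None
    for t, size in events:
        # The interval condition may have fired before this record arrived.
        if batch_start is not None and t - batch_start >= buffer_interval_s:
            flushes.append(batch_start + buffer_interval_s)
            buffered, batch_start = 0, None
        if batch_start is None:
            batch_start = t
        buffered += size
        # The size condition fires as soon as this record fills the buffer.
        if buffered >= buffer_size:
            flushes.append(t)
            buffered, batch_start = 0, None
    # Whatever is left is delivered when its interval expires.
    if batch_start is not None:
        flushes.append(batch_start + buffer_interval_s)
    return flushes
```

With `buffer_size=5` and `buffer_interval_s=60`, the events `[(0, 3), (10, 3), (20, 3), (100, 1)]` flush at `[10, 80, 160]`: the first batch is size-triggered, the others interval-triggered. Setting the interval to 0 makes every record flush on arrival, which is the idea behind zero buffering.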

KINESIS DATA ANALYTICS:
- SQL or Apache Flink on streaming data
- Real-time analytics
- Note: AWS renamed this service to "Amazon Managed Service for Apache Flink" (2023).
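A typical real-time analytics query aggregates a stream over time windows. The sketch below shows a tumbling (fixed-size, non-overlapping) window count in plain Python. It only illustrates the concept; it is not Flink or Kinesis Data Analytics SQL syntax, and the function name is hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_s):
    """Count events per key in fixed-size, non-overlapping time windows.

    events: iterable of (timestamp_s, key).
    Returns {window_start_s: Counter of keys}.
    """
    windows = {}
    for ts, key in events:
        window_start = (ts // window_s) * window_s  # the one window this event belongs to
        windows.setdefault(window_start, Counter())[key] += 1
    return windows
```

For 60-second windows, events at seconds 1, 30 and 59 fall into the window starting at 0, while an event at second 61 starts the next window at 60.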

KINESIS VIDEO STREAMS:
- Stream video from devices
- ML integration

Kinesis Shards

1 Shard capacity:
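- Read: 2 MB/s, 5 read transactions/s (shared by all standard consumers; enhanced fan-out gives each registered consumer its own 2 MB/s)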
- Write: 1 MB/s, 1000 records/s
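These per-shard limits lead directly to a sizing rule: the shard count must satisfy the write-bandwidth, write-record and read-bandwidth limits at the same time. The sketch below applies that rule. It assumes provisioned mode, traffic spread evenly across shards (no hot partition key) and standard (shared-throughput) consumers, and it ignores the 5 read transactions/s limit. The function name is hypothetical.

```python
import math

# Per-shard limits (provisioned mode), taken from the list above.
WRITE_MB_PER_S = 1.0
WRITE_RECORDS_PER_S = 1000
READ_MB_PER_S = 2.0  # shared by all standard consumers of the shard


def shards_needed(write_mb_s, write_records_s, total_read_mb_s):
    """Smallest shard count that satisfies every per-shard limit,
    assuming traffic spreads evenly across shards."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_S),
        math.ceil(write_records_s / WRITE_RECORDS_PER_S),
        math.ceil(total_read_mb_s / READ_MB_PER_S),
    )
```

Example: 4.5 MB/s and 2,000 records/s of writes, read in full by three standard consumers (13.5 MB/s of reads), needs `shards_needed(4.5, 2000, 13.5) == 7` shards. Reads, not writes, are the bottleneck here, which is a common case for enhanced fan-out.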

Scaling:
- Shard splitting (scale up)
- Shard merging (scale down)
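Splitting and merging operate on hash key ranges: Kinesis maps each partition key to a 128-bit integer using MD5, and each shard owns a contiguous range of that space. The sketch below models the ranges as `(start, end)` tuples. It is a minimal illustration, not the Kinesis API; the function names are hypothetical, and reading the digest as a big-endian integer is an assumption made here. The split point plays the role of `SplitShard`'s `NewStartingHashKey`.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1


def hash_key(partition_key: str) -> int:
    # MD5 of the partition key, read as an unsigned 128-bit integer
    # (big-endian byte order assumed here for illustration).
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(shards, partition_key):
    """Index of the shard whose [start, end] hash range covers the key."""
    h = hash_key(partition_key)
    for i, (start, end) in enumerate(shards):
        if start <= h <= end:
            return i
    raise ValueError("hash key range not covered")


def split_shard(shards, i, new_starting_hash_key):
    """Scale up: split shards[i] into two adjacent ranges."""
    start, end = shards[i]
    if not start < new_starting_hash_key <= end:
        raise ValueError("split point must fall inside the shard's range")
    return (shards[:i]
            + [(start, new_starting_hash_key - 1), (new_starting_hash_key, end)]
            + shards[i + 1:])


def merge_shards(shards, i):
    """Scale down: merge shards[i] with the adjacent shards[i + 1]."""
    (s1, e1), (s2, e2) = shards[i], shards[i + 1]
    if e1 + 1 != s2:
        raise ValueError("only adjacent hash ranges can be merged")
    return shards[:i] + [(s1, e2)] + shards[i + 2:]
```

Splitting `[(0, MAX_HASH_KEY)]` at `2**127` yields two half-ranges, and merging them restores the original range. In the real service, a split or merge closes the parent shard(s) and creates new child shards rather than editing ranges in place.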

3. Amazon Redshift

REDSHIFT

Cluster Types:
- Provisioned: RA3, DC2 nodes
- Serverless: Auto-scaling

Features:
- Columnar storage
- MPP (Massively Parallel Processing)
- Up to 16 PB
- Redshift Spectrum (query S3 directly)
- Federated Query (RDS, Aurora)
- ML with CREATE MODEL

Distribution Styles:
- AUTO (recommended)
- EVEN (round-robin)
- KEY (by column value)
- ALL (full copy on each node)
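The distribution styles decide which node stores each row. The toy model below shows EVEN, KEY and ALL side by side (AUTO is left out because Redshift chooses it for you). It is a sketch under simplifying assumptions: the function name is hypothetical, it distributes to nodes rather than to the slices inside each node, and MD5 stands in for Redshift's internal hash function.

```python
import hashlib


def distribute(rows, num_nodes, style, key=None):
    """Toy model of where rows land under EVEN, KEY and ALL distribution.

    Returns one list of rows per node. Redshift really distributes to
    slices inside nodes and uses its own hash function; MD5 is a stand-in.
    """
    nodes = [[] for _ in range(num_nodes)]
    if style == "EVEN":
        for i, row in enumerate(rows):
            nodes[i % num_nodes].append(row)  # round-robin
    elif style == "KEY":
        for row in rows:
            digest = hashlib.md5(str(row[key]).encode("utf-8")).digest()
            # Rows with the same key value always land on the same node.
            nodes[int.from_bytes(digest, "big") % num_nodes].append(row)
    elif style == "ALL":
        for node in nodes:
            node.extend(rows)  # full copy on every node
    else:
        raise ValueError(f"unsupported style: {style}")
    return nodes
```

The design trade-off follows from the placement: KEY on a join column keeps matching rows together so joins need less data movement, ALL suits small dimension tables at the cost of extra storage, and EVEN balances storage when no good key exists.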

4. AWS Glue

AWS GLUE Components:
- Glue Data Catalog: metadata repository
- Glue Crawlers: auto-discover schema
- Glue ETL Jobs: Spark (Python/Scala), Python Shell
- Glue DataBrew: visual data preparation
- Glue Studio: visual ETL
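To make "auto-discover schema" concrete, the sketch below infers column types from a CSV sample, roughly the kind of result a crawler writes into the Data Catalog. It is a toy heuristic under stated assumptions: the function names are hypothetical, the type names are Hive-style types as used in the Data Catalog, and Glue's real classifiers are considerably more elaborate.

```python
import csv
import io


def _parses(value, cast):
    # True if cast(value) succeeds, e.g. int("42") or float("9.99").
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess a column type from sample string values (toy heuristic)."""
    if all(v.lower() in ("true", "false") for v in values):
        return "boolean"
    if all(_parses(v, int) for v in values):
        return "bigint"
    if all(_parses(v, float) for v in values):
        return "double"
    return "string"


def infer_schema(csv_text):
    """Return {column_name: type} for a non-empty CSV with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}
```

For a sample with columns `id,price,active,name` and rows `1,9.99,true,Alice` and `2,15,false,Bob`, this yields `bigint`, `double`, `boolean` and `string` respectively.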

5. Amazon Athena

Serverless SQL on S3:
- Pay per query ($5/TB scanned)
- Standard SQL (Presto; engine version 3 is based on Trino)
- Integrates with Glue Data Catalog
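Because billing is per byte scanned, query cost is simple arithmetic. The sketch below estimates the cost of one query at the $5/TB rate above. Two details are assumptions to verify against the current Athena pricing page: that scanned data is rounded up to whole megabytes with a 10 MB minimum per query, and the use of binary units (1 TB = 1024^4 bytes), which is chosen here only for illustration. The function name is hypothetical.

```python
import math

PRICE_PER_TB_USD = 5.00
MB = 1024**2
TB = 1024**4          # assumption: binary units, for illustration only
MIN_BILLED_MB = 10    # assumption: per-query minimum; check current pricing


def athena_query_cost(bytes_scanned):
    """Estimated cost of one query: bytes scanned, rounded up to whole MB,
    with a per-query minimum, times the per-TB price."""
    billed_mb = max(math.ceil(bytes_scanned / MB), MIN_BILLED_MB)
    return billed_mb * MB / TB * PRICE_PER_TB_USD
```

Scanning 1 TB costs 5.0 USD and scanning 100 GB costs 0.48828125 USD under these assumptions; cutting the bytes scanned cuts the cost proportionally.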

Cost Optimization:
- Use columnar formats (Parquet, ORC)
- Partition data
- Compress data
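Partitioning saves money because Athena can skip whole S3 prefixes when a query filters on partition columns. The sketch below imitates that pruning over Hive-style `key=value` paths. The object keys, sizes and function names are hypothetical; it illustrates the idea and does not reproduce Athena's planner.

```python
def parse_partition(s3_key):
    """Extract Hive-style key=value partition columns from an object key."""
    parts = {}
    for segment in s3_key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts


def prune(objects, **filters):
    """Keep only objects whose partition values match every filter.

    objects: {s3_key: size_bytes}. Returns (kept_keys, bytes_scanned).
    """
    kept = [key for key in objects
            if all(parse_partition(key).get(col) == val for col, val in filters.items())]
    return kept, sum(objects[key] for key in kept)
```

With objects under `sales/year=2024/month=01/`, `sales/year=2024/month=02/` and `sales/year=2023/month=12/`, a filter on `year="2024", month="01"` scans only the first prefix, while no filter scans everything.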

Federated Query:
- Query across data sources
- RDS, DynamoDB, Redshift, on-prem (via Lambda-based data source connectors)

6. Lake Formation

LAKE FORMATION

Centralized data lake management:
1. Ingest data from sources
2. Store in S3 (with formatting)
3. Catalog with Glue
4. Secure with fine-grained permissions
5. Discover and analyze

Security:
- Column-level security
- Row-level security
- Cell-level security
- Tag-based access control
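Cell-level security is the combination of a row filter and a column filter, and tag-based access control decides which columns qualify by comparing tags. The sketch below models both in plain Python. It is a conceptual illustration only: the function names, tag keys and data are hypothetical. In Lake Formation itself these rules are defined as data filters and LF-Tag grants, not as code.

```python
def visible_columns(column_tags, granted):
    """Tag-based idea: a column is visible only if, for every granted tag key,
    the column's tag value is among the values the principal was granted."""
    return [col for col, tags in column_tags.items()
            if all(tags.get(key) in values for key, values in granted.items())]


def apply_data_filter(rows, columns, row_filter):
    """Cell-level security = row filter + column filter applied together."""
    return [{col: row[col] for col in columns} for row in rows if row_filter(row)]
```

A principal granted only `sensitivity=public` sees the `name` and `region` columns but not a `salary` column tagged `confidential`; adding a row filter such as `region == "EU"` then leaves only the permitted cells.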

Next day: Week 6 review quiz