Tuần 5 - Ngày 6: Database Migration (DMS & SCT)
Mục tiêu học tập
- Hiểu AWS DMS (Database Migration Service)
- Phân biệt Homogeneous vs Heterogeneous migration
- Nắm SCT (Schema Conversion Tool)
- Biết các migration strategies
1. Tổng quan AWS DMS
AWS Database Migration Service (DMS) = service di chuyển databases lên AWS, hoặc giữa các databases.
Đặc điểm
- Source → Replication Instance → Target
- Source vẫn chạy trong khi migration (minimum downtime)
- Support full load + CDC (Change Data Capture) continuous replication
- Multi-AZ replication instance for HA
2 loại migration
Homogeneous (cùng engine)
- MySQL → RDS MySQL
- PostgreSQL → Aurora PostgreSQL
- Oracle → RDS Oracle
- No schema conversion needed (DMS handles)
Heterogeneous (khác engine)
- Oracle → Aurora PostgreSQL
- SQL Server → MySQL
- Cần SCT để convert schema trước
- DMS migrate data sau
Architecture
2. DMS Components
Replication Instance
- EC2 instance runs DMS engine
- Sizing: t3, c5, r5 (CPU + memory)
- Multi-AZ option (HA)
- In VPC, has connectivity to source + target
Endpoints
- Source endpoint: where data comes from
- Target endpoint: where data goes to
- Support 20+ databases (MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, DynamoDB, S3, Aurora, Redshift...)
Migration Task
- Defines what to migrate (tables, columns, transformations)
- 3 types:
- Full load (initial migration)
- CDC only (continuous replication)
- Full load + CDC (most common — copy data, then keep in sync)
3. SCT (Schema Conversion Tool)
Định nghĩa
SCT = desktop tool convert source database schema sang target schema (heterogeneous).
Đặc điểm
- Free tool (download install local)
- Convert: tables, indexes, views, stored procedures, functions
- Assessment report: identify issues, manual conversion needed
- Generate target schema, run on target before DMS
Workflow
1. Install SCT
2. Connect to source DB (Oracle)
3. Convert schema → SQL for target (PostgreSQL)
4. Review assessment report
5. Manually fix complex objects
6. Apply schema to target
7. Run DMS task migrate data
Supported conversions
- Oracle → PostgreSQL, MySQL, MariaDB
- SQL Server → PostgreSQL, MySQL, MariaDB
- MySQL → PostgreSQL
- Sybase → PostgreSQL, MySQL, MariaDB
- DB2 → PostgreSQL, MySQL, MariaDB
- Oracle DW → Redshift
- Teradata → Redshift
4. DMS vs SCT
| DMS | SCT | |
|---|---|---|
| Purpose | Data migration | Schema conversion |
| Type | Managed service | Desktop tool |
| Homogeneous | Yes (no SCT needed) | Not needed |
| Heterogeneous | Needs schema in target | Yes (convert schema) |
Order of operations (heterogeneous)
- SCT: Convert schema, apply to target
- DMS: Full load + CDC migration of data
- Cut over apps to target
5. Migration Strategies (Common)
Strategy 1: Lift-and-Shift (Rehosting)
- Move on-prem DB to RDS without changing engine
- DMS homogeneous (Oracle → RDS Oracle)
- Minimum changes, fastest
Strategy 2: Re-platforming
- Move + change platform (e.g., Oracle → Aurora PG)
- DMS heterogeneous + SCT
- Cost saving (Aurora PG cheaper than Oracle license)
Strategy 3: Refactoring
- Re-architect (e.g., monolithic SQL → DynamoDB + microservices)
- More work, more benefit
- Often involves rewriting app logic
6. DMS Best Practices
Pre-migration
- Test in non-prod first
- Assess schema complexity (SCT report)
- Optimize source DB (vacuum, analyze)
- Pre-create target schema (indexes after migration for speed)
During migration
- Disable triggers, indexes on target during full load (faster)
- Re-enable after full load complete
- Monitor CloudWatch metrics (CPU, memory, replication lag)
- Validate data with AWS DMS Data Validation
Cutover
- Stop writes on source briefly
- Verify CDC caught up (lag = 0)
- Switch apps to target endpoint
- Keep CDC running few hours for safety
7. DMS for Specific Targets
Target: S3
- Migrate database → S3 in CSV, Parquet, JSON
- Use case: data lake ingest from RDS
- Combine with Glue Crawler + Athena
Target: Redshift
- Migrate OLTP DB → Redshift data warehouse
- Use case: build analytics from operational data
Target: DynamoDB
- Migrate from relational to NoSQL
- Use case: re-architecting for scale
Target: Kinesis / OpenSearch
- Stream changes via CDC to data pipeline
- Real-time analytics
8. Other Migration Tools
AWS Application Migration Service (MGN)
- Lift-and-shift servers (OS + DB if on EC2)
- Continuous replication of source servers
- For migrating entire VMs/servers
AWS DataSync
- Migrate files (NFS, SMB, S3, EFS, FSx)
- Different from DMS (DMS = databases, DataSync = files)
AWS Snow Family
- Snowball Edge, Snowmobile
- Physical device for large data migration (TB-PB)
- Use case: limited bandwidth, large dataset
AWS Transfer Family
- Managed SFTP, FTPS, FTP for file transfer
- Use case: existing SFTP-based workflows
9. Migration Decision Matrix
10. Cost Considerations
DMS pricing
- Replication instance: per hour ($)
- Storage (logs, cache): $/GB
- Data transfer: standard rates (inbound to AWS free)
Cost optimization
- Right-size replication instance (start small, scale up if needed)
- Use Multi-AZ only for production cutover (not pre-test)
- Delete tasks/instances after migration complete
SCT
- Free (download tool)
11. Common Migration Scenarios
Scenario 1: Oracle on-prem → Aurora PostgreSQL
- Why: Reduce Oracle license cost
- How: SCT convert schema, DMS migrate data with CDC
- Effort: 1-3 months (depends on schema complexity)
Scenario 2: MySQL on-prem → RDS MySQL
- Why: Managed service, HA
- How: DMS homogeneous, optional Aurora upgrade later
- Effort: 1-4 weeks
Scenario 3: SQL Server on-prem → Aurora PostgreSQL
- Why: Cost optimization
- How: SCT + DMS, app changes for SQL dialect
- Effort: 3-6 months
Scenario 4: On-prem files (100 TB) → S3
- Why: Cloud storage migration
- How: Snowball Edge devices (~10 TB each)
- Effort: 2-4 weeks (shipping included)
Câu hỏi ôn tập
-
Khi nào cần SCT? Khi nào không?
Xem đáp án
AWS Schema Conversion Tool (SCT) cần khi migrate giữa engines khác nhau (heterogeneous): Oracle → Aurora PostgreSQL, SQL Server → MySQL. SCT convert schema, stored procedures, functions từ source dialect sang target dialect. Không cần SCT khi migrate cùng engine (homogeneous): MySQL → RDS MySQL, Oracle → RDS Oracle — schema tương thích trực tiếp, chỉ cần DMS để migrate data.
-
DMS hỗ trợ CDC để làm gì?
Xem đáp án
CDC (Change Data Capture) cho phép DMS capture real-time changes từ source database (inserts, updates, deletes) và apply vào target — không chỉ full load một lần. Dùng cho: (1) Zero-downtime migration — full load trước, rồi CDC sync ongoing changes, cutover khi lag gần 0, (2) Ongoing replication — keep target DB synced với source. CDC đọc từ transaction logs (binlog cho MySQL, redo log cho Oracle).
-
Khác biệt giữa DMS và DataSync?
Xem đáp án
DMS (Database Migration Service): migrate database data — tables, rows, schema. Hỗ trợ SQL databases, NoSQL, data warehouses. Dùng cho database-to-database migration. DataSync: di chuyển file/object data — từ NFS, SMB, HDFS, S3, EFS, FSx. Tối ưu cho large file transfers với validation và scheduling. DataSync không hiểu database schema — nó transfer files như là files.
-
Khi migrate 200 TB data với bandwidth 100 Mbps, nên dùng gì?
Xem đáp án
AWS Snowball Edge (hoặc Snowball Family). 200 TB ÷ 100 Mbps = ~18,000 giây ÷ 3600 × 8 = khoảng 185 ngày qua internet — không thực tế. Snowball Edge thiết bị physical lưu đến 80 TB data, ship đến AWS datacenter để import vào S3 trong vài ngày. Quy tắc ngón tay cái: nếu transfer > 1 tuần qua internet → dùng Snowball. Snowmobile cho > 10 PB.
-
Replication instance trong DMS chạy ở đâu (managed or self-managed)?
Xem đáp án
AWS managed EC2 instance trong VPC của bạn — AWS quản lý infrastructure, bạn chỉ chọn instance class và cấu hình. DMS tự động patch, monitor, và restart nếu fail. Replication instance cần kết nối được đến cả source và target endpoints. Đặt replication instance cùng Region với target (hoặc giữa source và target) để giảm latency và data transfer cost.
Bài tập thực hành
- Tạo source MySQL trên EC2, target RDS MySQL
- Setup DMS replication instance + endpoints
- Tạo Full Load + CDC task, observe migration
- Download SCT, connect to demo Oracle, convert to PostgreSQL schema
- So sánh cost: DataSync vs Snowball cho 50 TB
Tài liệu tham khảo chính thức
- AWS DMS Documentation
- AWS SCT Documentation
- DMS Best Practices
- DataSync
- Application Migration Service
Tiếp theo: Quiz Tuần 5