
Mar 9, 2025 - 16:44
AWS Data Migration Considerations - Part IV

Migrating relational databases to an AWS S3 data lake using AWS DMS, Part IV:

This is the fourth blog in the multi-part series on considerations and observations made while migrating relational databases to an AWS S3 data lake using the AWS DMS service. Relational databases such as Oracle, MS SQL Server, MySQL, and PostgreSQL were migrated to AWS.

Part I in this series covered considerations such as:

  • Modified date not populated properly at the source
  • Enabling supplemental logging for Oracle as the source
  • Network bandwidth issues between source and target databases

This blog covers the additional considerations that helped fix the network bandwidth issue (#3) above.

The first blog already discussed the CloudWatch metrics verified to identify whether the delay was at the Oracle source or the S3 target:

  • CDCLatencySource
  • CDCLatencyTarget
  • CDCIncomingChanges
  • NetworkReceiveThroughput

Based on these metrics, in particular CDCLatencySource, it was clear that the issue was network bandwidth between the source (Oracle) and the target (AWS S3).

Some other points were considered, and changes were made to the DMS task configuration to resolve the network bandwidth issue; these are discussed here.

A single or few ongoing-replication (CDC) tasks for a given source:

AWS DMS tasks for the Oracle tables were configured as follows:

  • Separate full load task for each large table
  • Separate CDC task for each large table
  • One full load plus CDC task for each schema, with small and medium tables grouped.

Full load tasks read the existing data from the source database and load it into the target, but CDC tasks read the changes for committed transactions from the redo and archived logs (with Oracle as the source here), and network bandwidth is consumed to fetch these logs into the DMS replication instance for further processing.

Hence, it was observed that the network bandwidth issue was most evident at midnight during end-of-day processing and at month-end processing, when larger archived log files were generated: around 50-60 GB per hour, against 10-15 GB per hour at other times.

Also, the volume of Oracle redo and archived logs shipped to the DMS replication instance gets multiplied by the number of CDC tasks configured in DMS:

E.g., with 5 CDC tasks for the Oracle tables and 50 GB of archived logs generated, 50 × 5 = 250 GB of logs in total would be shipped to the replication instance for processing the changes.

Hence, considering the network bandwidth issue, it was decided to configure a single CDC task, or only a few, to avoid multiplying the gigabytes of archived logs shipped to the replication instance. So, the CDC tasks were reconfigured as:

A single or very few CDC tasks for all the tables across schemas

With a single CDC task, 50 GB of archived logs would be shipped, against 50 × 5 = 250 GB for the same changes with 5 CDC tasks.
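The scaling described above can be sketched with a small calculation (illustrative only; shipped_log_gb is a hypothetical helper for this blog, not a DMS API):

```python
def shipped_log_gb(archived_log_gb: float, num_cdc_tasks: int) -> float:
    """Approximate total archived-log volume shipped to the DMS
    replication instance: each CDC task reading the same Oracle
    source fetches its own copy of the redo/archived logs."""
    return archived_log_gb * num_cdc_tasks

# 50 GB of archived logs generated at the source in an hour:
print(shipped_log_gb(50, 5))  # 5 CDC tasks -> 250 GB shipped
print(shipped_log_gb(50, 1))  # 1 CDC task  -> 50 GB shipped
```

This is why consolidating CDC tasks reduces network load even though the same changes are replicated either way.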

Changes in memory configuration for DMS CDC tasks:

MemoryKeepTime - Sets the maximum time in seconds that each transaction can stay in memory before being written to disk; the default value is 60.

MemoryLimitTotal - Sets the maximum size (in MB) that all transactions together can occupy in memory before being written to disk; the default value is 1024.

Increasing both values helps the CDC tasks manage memory more efficiently by keeping transactions in memory longer, so these properties were changed to:

MemoryKeepTime - 600
MemoryLimitTotal - 10240
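In the DMS task settings JSON, these properties live under the ChangeProcessingTuning section. A minimal fragment with the increased values might look like the following (other tuning keys omitted for brevity):

```json
{
  "ChangeProcessingTuning": {
    "MemoryKeepTime": 600,
    "MemoryLimitTotal": 10240
  }
}
```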

Other points checked to optimize task performance were:

  • Source database configuration, to check the load on the source database
  • Replication instance configuration, to handle the incoming changes
  • Binary Reader over LogMiner for Oracle as the source, for better CDC performance when reading log files
  • Minimal required supplemental logging, to avoid capturing additional details in the archived logs
  • LOB support for the tables – the number of LOB columns in the tables and the LOB option chosen for migration (Limited, Full, or Inline) can affect data migration performance.
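For reference, the LOB-related options are controlled in the TargetMetadata section of the DMS task settings. A sketch of limited LOB mode with a 32 KB size cap follows; the setting names are real DMS task-settings keys, but the values here are examples, not recommendations:

```json
{
  "TargetMetadata": {
    "SupportLobs": true,
    "FullLobMode": false,
    "LimitedSizeLobMode": true,
    "LobMaxSize": 32,
    "InlineLobMaxSize": 0
  }
}
```

Limited LOB mode is generally faster than full LOB mode because DMS can fetch LOBs in one pass, at the cost of truncating LOBs larger than LobMaxSize.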

Conclusion:
This blog mainly discussed the performance-optimization considerations for the CDC tasks during the migration of the Oracle tables.