Considerations during data transfer between Salesforce and AWS using Amazon AppFlow:

This is the second blog of a three-part series on considerations and observations during data transfer between Salesforce (a software-as-a-service CRM) and AWS using Amazon AppFlow.
In Part I, we discussed an overview of Amazon AppFlow for data transfer from external CRM applications, along with a sample flow configuration and cost considerations when configuring and executing flows.
This blog will cover the following considerations for Salesforce as a source:
- Large data migration from Salesforce to AWS (with Amazon S3 as the destination here):
Amazon AppFlow limits the amount of data that can be transferred in a single flow run. For example, a maximum of 1 GB of data can be transferred per flow run with Marketo as the source; with Salesforce as the source, the limit is 15 GB per flow run.
Assuming roughly 2 KB per Salesforce record, 15 GB corresponds to approximately 7.5 million records (or roughly half that at 4 KB per record). This is fine for daily incremental or scheduled flow runs.
For the initial full-load run, small to medium tables that fall below this limit can be transferred in a single flow run.
For large tables with millions of records and more than 15 GB of data in a single table (for example, 30, 50, or 100 GB), the transfer must be split across separate flows, each configured so that a single flow run does not exceed 15 GB.
A practical approach is to filter on a date column and split the data by month, quarter, or year, loading each slice with its own flow. This also helps archive old data, if required.
Let’s consider three tables here for the flow design considerations:
Table A – 12 GB data size
Suppose this table has data from the last 10 years. As the data size is within the 15 GB limit, the table can be loaded in a single Amazon AppFlow flow run.
Table B – 40 GB data size
Suppose this table has data from the last 5 years. It can be split year-wise on a modified-date column, verifying the record count for each year to ensure no yearly slice exceeds the 15 GB limit. Accordingly, five separate flows could be created to load the data into AWS.
Table C – 82 GB data size
Suppose this table has data from the last 4 years. Splitting it by year would not work, as a single year's data may exceed the 15 GB limit. In this case, it may need to be split into half-yearly or quarterly slices.
In all these scenarios, query the data first to get the record count and approximate data size per year, half-year, or quarter, and design the flows accordingly.
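To make the sizing step concrete, here is a minimal Python sketch (planning logic only, not an AppFlow API) that groups per-year record counts into flow batches under the 15 GB limit. The yearly counts and the 2 KB average record size are hypothetical inputs you would obtain beforehand, for example with a SOQL aggregate query grouped on the modified-date column.

```python
# Hypothetical sizing helper: group per-year record counts into batches so
# that each Amazon AppFlow flow run stays under the 15 GB Salesforce limit.

FLOW_LIMIT_BYTES = 15 * 1024**3   # 15 GB per flow run (Salesforce as source)
AVG_RECORD_BYTES = 2 * 1024       # assumed ~2 KB per Salesforce record

# Illustrative record counts per year, gathered up front (e.g. via a SOQL
# aggregate query grouped on the modified-date column).
records_per_year = {
    2020: 3_000_000,
    2021: 4_500_000,
    2022: 6_000_000,
    2023: 7_000_000,
}

def plan_flows(records_per_year, avg_record_bytes=AVG_RECORD_BYTES,
               limit_bytes=FLOW_LIMIT_BYTES):
    """Pack consecutive years into flow batches without exceeding the limit."""
    batches, current, current_bytes = [], [], 0
    for year, count in sorted(records_per_year.items()):
        size = count * avg_record_bytes
        if size > limit_bytes:
            # A single year is too large for one flow run -- split it further
            # (half-yearly or quarterly), as with Table C above.
            if current:
                batches.append(current)
                current, current_bytes = [], 0
            batches.append([f"{year} (split further into quarters/half-years)"])
            continue
        if current_bytes + size > limit_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(year)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

for i, batch in enumerate(plan_flows(records_per_year), start=1):
    print(f"Flow {i}: {batch}")
```

With the illustrative numbers above, this yields three flows: one covering 2020–2021 and one each for 2022 and 2023. The actual date-range filter for each flow is then configured in AppFlow on the chosen date column.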
- Salesforce API preference:
Salesforce API preference settings allow you to specify which Salesforce APIs Amazon AppFlow can use during data transfer from Salesforce to AWS. There are three options available:
- Simple – Uses Salesforce REST API and is optimized for small to medium-sized data transfers.
- Bulk – Uses Salesforce Bulk API 2.0, which runs asynchronous data transfers and is optimized for large data transfers.
- Automatic – AppFlow decides which API to use based on the number of records the flow transfers.
With Salesforce as the source, the choice is made as follows:
- Salesforce REST API – for fewer than 1,000,000 Salesforce records.
- Salesforce Bulk API 2.0 – for 1,000,000 or more Salesforce records.
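For reference, when flows are defined programmatically, this preference maps to the dataTransferApi property of the Salesforce source in the AWS SDK. The fragment below is a minimal illustration only; the connection name and object are placeholders, and the same choice can be made in the AppFlow console.

```python
import boto3

appflow = boto3.client("appflow")

# Fragment of the source configuration for appflow.create_flow(); only the
# Salesforce source settings are shown. Connection/object names are placeholders.
source_flow_config = {
    "connectorType": "Salesforce",
    "connectorProfileName": "my-salesforce-connection",   # placeholder
    "sourceConnectorProperties": {
        "Salesforce": {
            "object": "Account",                           # placeholder object
            # API preference: "AUTOMATIC" (let AppFlow decide),
            # "REST_SYNC" (Simple), or "BULKV2" (Bulk).
            "dataTransferApi": "AUTOMATIC",
        }
    },
}
# This dictionary would be passed as sourceFlowConfig=source_flow_config.
```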
Automatic works well as the API preference in most cases, using the REST API for small and medium datasets and the Bulk API for large datasets. However, the Bulk API has one limitation for large datasets:
A flow can't transfer Salesforce compound fields, because Bulk API 2.0 doesn't support them.
Suppose the Salesforce object (table) has fields such as First name, Last name, and Full name, where Full name is configured as a compound field in Salesforce:
Full name = First name + Last name
In this scenario, the First name and Last name fields would be transferred to AWS, but the Full name would not be. Therefore, one would need to understand the logic for any compound fields in the dataset and derive them again at the destination during the transformations.
Similarly, there could be another compound field such as Complete address derived from Address1, Address2, City, State, Pin code, Country, etc.
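As a simple illustration of re-deriving a compound field at the destination, the following pandas sketch assumes the AppFlow output landed as a CSV file and uses the field names from the example above; the file name and column names are hypothetical.

```python
import pandas as pd

# Hypothetical AppFlow output file (copied locally from the S3 destination;
# reading s3:// paths directly also works if s3fs is installed).
df = pd.read_csv("contacts_from_appflow.csv")

# Re-derive the compound "Full name" field that Bulk API 2.0 did not transfer.
df["Full name"] = (
    df["First name"].fillna("") + " " + df["Last name"].fillna("")
).str.strip()

# A compound address could be rebuilt the same way from Address1, Address2,
# City, State, Pin code, and Country columns.
df.to_csv("contacts_with_full_name.csv", index=False)
```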
- OAuth grant type for Amazon AppFlow to communicate with Salesforce:
During Salesforce connection setup, one needs to choose an OAuth grant type. This choice determines how Amazon AppFlow communicates with Salesforce to access the data. There are two options available:
Authorization code
With this option, the Amazon AppFlow console opens a window that prompts you to sign in to your Salesforce account. After signing in, choose "Allow" to permit Amazon AppFlow to access your Salesforce data. AppFlow then creates an AWS-managed connected app in your Salesforce account, and no additional setup is required.
JSON Web Token (JWT)
With this option, you provide a JWT that is passed with the connection, and Salesforce grants access based on it.
The JWT must be created before it can be used to access Salesforce data.
The Authorization code option may be acceptable for lower environments such as Dev or Sandbox, but higher environments such as UAT or Prod may require JWT-based access.
Additionally, there could be an organizational mandate to restrict Authorization code access and use only JWT-based access.
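As a preview of the programmatic side, here is a minimal boto3 sketch of creating a Salesforce connection with the JWT grant type, assuming the AppFlow connector-profile API's Salesforce credential fields oAuth2GrantType and jwtToken; the profile name, instance URL, and token value are placeholders, and the JWT itself must already exist.

```python
import boto3

appflow = boto3.client("appflow")

# Minimal sketch: a Salesforce connection (connector profile) that uses the
# JWT bearer grant instead of the Authorization code grant.
# All names and values below are placeholders.
appflow.create_connector_profile(
    connectorProfileName="my-salesforce-jwt-connection",
    connectorType="Salesforce",
    connectionMode="Public",
    connectorProfileConfig={
        "connectorProfileProperties": {
            "Salesforce": {
                "instanceUrl": "https://mydomain.my.salesforce.com",
                "isSandboxEnvironment": False,
            }
        },
        "connectorProfileCredentials": {
            "Salesforce": {
                "oAuth2GrantType": "JWT_BEARER",
                "jwtToken": "<signed-jwt-assertion>",
            }
        },
    },
)
```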
The detailed process to set up a JWT and use it to access Salesforce data will be covered separately in the next blog.
Conclusion:
This blog covered key considerations and limitations, along with their workarounds, for data transfer between Salesforce and AWS using Amazon AppFlow.