MySQL to Redshift
NineData data replication supports both full data synchronization and incremental data synchronization from MySQL to Redshift data sources.
Prerequisites
Source and target data sources have been added to NineData. For instructions on how to add data sources, please refer to Adding Data Sources.
The source database type is MySQL or a MySQL-like database, such as MySQL, MariaDB, PolarDB MySQL, TDSQL-C, GaussDB MySQL, Aurora, etc.
The target database is a Redshift provisioned cluster or Redshift Serverless.
In the case of incremental replication, the source data source must have Binlog enabled, and the Binlog-related parameters are set as follows:
binlog_format
=ROW
binlog_row_image
=FULL
tipIf the source data source is a standby server, to ensure complete Binlog logs are obtained, you also need to enable the
log_slave_updates
parameter.
Usage Restrictions
- Incremental replication process does not currently support synchronizing DDL operations on the source side.
- Structural replication is not currently supported; you need to manually create tables on the target side that match the objects to be synchronized from the source.
- The data replication functionality is only for user databases in the data source; system databases are not replicated. For example, in MySQL-type data sources, databases like
information_schema
,mysql
,performance_schema
, andsys
will not be replicated. - The account of the source data source must have
SELECT
,REPLICATION CLIENT
, andREPLICATION SLAVE
permissions for the objects to be replicated (incremental replication), and the account of the target data source must have DML permissions. - Evaluate the performance of both the source and target data sources before executing data synchronization, and it is recommended to perform data synchronization during off-peak hours for the business. Otherwise, during full data initialization, there will be a certain consumption of reading and writing resources on both the source and target data sources, leading to an increase in database load.
- It is recommended to ensure that each table in the synchronized objects has a primary key or unique constraint, and column names are unique. Otherwise, the same data may be synchronously replicated repeatedly.
Operation Steps
NineData’s data replication product has been commercialized. You can still use 10 replication tasks for free, with the following considerations:
Among the 10 replication tasks, you can include 1 task, with a specification of Micro.
Tasks with a status of do not count towards the 10-task limit. If you have already created 10 replication tasks and want to create more, you can terminate previous replication tasks and then create new ones.
When creating replication tasks, you can only select the you have purchased. Specifications that have not been purchased will be grayed out and cannot be selected. If you need to purchase additional specifications, please contact us through the customer service icon at the bottom right of the page.
Log in to the NineData console.
Click on > in the left navigation bar.
On the page, click on in the top right corner.
On the tab, configure according to the table below and click .
Parameter Description Enter a name for the data synchronization task for easy later retrieval and management. Please use meaningful names. Up to 64 characters are supported. The data source where the synchronization object is located. The data source that receives the synchronized object. Select the content to be copied to the target data source. - : Synchronize all objects and data from the source data source, i.e., full data replication.
- : After full synchronization, perform incremental synchronization based on the source data source's logs. Click on the icon to unselect certain operation types as needed. These operations will be ignored during incremental synchronization after unselecting them.
IAM Role An IAM Role with permissions to execute commands in Redshift. During the replication process, NineData first copies MySQL data to NineData's Amazon S3 service, and then from S3 to your Redshift. This Role is used to authorize NineData to execute copy commands in your Redshift to copy data from S3 to Redshift. For information on obtaining the IAM Role, please refer to the section on this page. The should only be selected when is chosen. Options are: - : Perform incremental replication based on the start time of the current replication task.
- : Choose the starting point for incremental replication. You can select the time based on your business location's timezone.
- : Stop the task when data is detected in the target table during the pre-check stage.
- : Ignore the data detected in the target table during the pre-check stage and append other data.
- : Delete the data detected in the target table during the pre-check stage and rewrite it.
On the tab, configure the following parameters and then click .
Parameter Description Choose what to copy. You can select to copy all contents of the source library, or you can choose . In the list, select the content to be copied, click > to add it to the list on the right. (optional) Click to add a blacklist record. Select the library or object to be added to the blacklist; these contents will not be copied. This is used to exclude certain libraries or objects in full library replication with or . - Left dropdown: Select the library name to be added to the blacklist.
- Right dropdown: Select the corresponding objects in the library. You can click on multiple objects for multiple selections. Leave it blank to add the entire database to the blacklist.
On the tab, the system matches the target data source's schema with the same name by default. If it doesn't exist, you need to manually select the target schema. By default, table names and column names in the target library need to match the synchronized objects. If they don't match, you can manually map the table names and column names. After configuration, click Save and Pre-check.
On the tab, wait for the system to complete the pre-check. After the pre-check passes, click .
tip- If the pre-check fails, you need to click on the right of the target check item in the column to investigate the cause of the failure and manually fix it. Click to re-run the pre-check until it passes.
- Check items with in can be repaired or ignored depending on the specific situation.
On the page, when is displayed, the synchronization task starts running. At this point, you can:
- Click to view the execution status of each stage of the synchronization task.
- Click to return to the task list page.
Viewing Sync Results
Log in to the NineData console.
Click on > in the left navigation bar.
On the page, click on the of the target sync task. The page is explained as follows.
Serial Number Function Description 1 Sync Delay The data synchronization delay between the source data source and the target data source. 0 seconds indicates no delay between the two ends, allowing you to switch the business to the target data source for smooth migration. 2 Configure Alarm After configuring the alarm, the system will notify you in the way you choose when the task fails. For more information, see Operations and Monitoring Introduction. 3 More - Pause: Pause the task. Only tasks with the status Running can be selected.
- Terminate: End the task that is unfinished or in listening mode (i.e., in incremental sync). After terminating the task, it cannot be restarted, so please proceed with caution. If the synchronized objects contain triggers, a trigger copy option will pop up, choose as needed.
- Delete: Delete the task. Once the task is deleted, it cannot be recovered, so please proceed with caution.
4 Full Copy (displayed in scenarios involving full copy) Display the progress and details of the full copy. - Click Monitoring on the right side of the page: View various monitoring indicators during the full copy process. During the full copy process, you can also click Flow Control Settings on the right side of the monitoring indicator page to limit the rate of writing to the target data source in rows per second.
- Click Logs on the right side of the page: View the execution logs of the full copy.
- Click on the right side of the page: View the latest information.
5 Incremental Copy (displayed in scenarios involving incremental copy) Display various monitoring indicators for incremental copy. - Click on the right side of the page: View the operations currently being performed in the current copy task, including:
- : The copy task is performed by multiple threads, showing the thread number currently in progress.
- : Details of the SQL statement currently being executed by the thread.
- : The response time of the current thread. If this value increases, the current thread may be stuck for some reason.
- : The timestamp when the current thread was started.
- : The status of the current thread.
- Click Flow Control Settings on the right side of the page: Limit the rate of writing to the target data source in rows per second.
- Click Logs on the right side of the page: View the execution logs of the incremental copy.
- Click on the right side of the page: View the latest information.
6 Modify Object Display the modification records of synchronized objects. - Click Modify Synchronized Object on the right of the page to configure synchronized objects.
- Click on the right side of the page: View the latest information.
7 Expand Display detailed information about the current copy task, including Copy Type, Copy Object, Start Time, etc.
Appendix 1: Mapping Table of MySQL and Redshift Data Types
During the data copying process, MySQL data types are mapped to corresponding Redshift data types.
Category | MySQL Data Type | Redshift Data Type |
---|---|---|
Numeric | TINYINT | SMALLINT |
TINYINT UNSIGNED | SMALLINT | |
SMALLINT | SMALLINT | |
SMALLINT UNSIGNED | INTEGER | |
MEDIUMINT | INTEGER | |
MEDIUMINT UNSIGNED | BIGINT | |
INT | INTEGER | |
INT UNSIGNED | BIGINT | |
BIGINT | BIGINT | |
BIGINT UNSIGNED | BIGINT | |
BIT(M) | BOOL | |
Decimal | Decimal | |
Numeric | Numeric | |
Float | REAL | |
Double | REAL | |
BOOL, BOOLEAN | BOOL | |
DATE AND TIME | DATE | DATE |
DATETIME[(fsp)] | timestamp | |
Timestamp[(fsp)] | timestamptz | |
Time[(fsp)] | time | |
YEAR[(4)] | INT | |
STRING | CHAR | CHAR |
VARCHAR | VARCHAR | |
BINARY/VARBINARY | VARBINARY | |
TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT | VARCHAR | |
TINYBLOB/BLOB/MEDIUMBOLB/LONGBLOB | VARBINARY | |
ENUM | VARCHAR | |
SET | VARCHAR | |
JSON | JSON | VARCHAR |
Appendix 2: Pre-check Items Overview
Check Item | Check Item Description | Does Failure Prevent Task Continuation? | Failure Reasons and Remediation Strategies |
---|---|---|---|
Source Data Source Connection Test | Check the connectivity of the source data source. | Yes |
|
Target Data Source Connection Test | Check the connectivity of the target data source. | Yes |
|
Source Data Source Permission Check | Check the account permissions of the source data source. | Yes |
|
Target Data Source Permission Check | Check the account permissions of the target data source. | Yes |
|
Check if the Source Database has Binlog Enabled | Check if the source database has Binlog enabled. | Yes |
|
Check if the binlog format of the source database supports row mode | Check if the binlog format of the source database is row mode | Yes |
|
Check if binlog_row_image is full | Check if binlog_row_image of the source database is full | Yes |
|
Check for the Existence of Objects with the Same Name | Check if there are objects with the same name as the ones to be copied in the target |
|
|
Check for Data Existence | Check if the target data source already contains data for the copied objects |
|
|