Skip to main content

MySQL to Redshift

NineData data replication supports both full data synchronization and incremental data synchronization from MySQL to Redshift data sources.

Prerequisites

  • Source and target data sources have been added to NineData. For instructions on how to add data sources, please refer to Adding Data Sources.

  • The source database type is MySQL or a MySQL-like database, such as MySQL, MariaDB, PolarDB MySQL, TDSQL-C, GaussDB MySQL, Aurora, etc.

  • The target database is a Redshift provisioned cluster or Redshift Serverless.

  • In the case of incremental replication, the source data source must have Binlog enabled, and the Binlog-related parameters are set as follows:

    • binlog_format=ROW
    • binlog_row_image=FULL
    tip

    If the source data source is a standby server, to ensure complete Binlog logs are obtained, you also need to enable the log_slave_updates parameter.

Usage Restrictions

  • Incremental replication process does not currently support synchronizing DDL operations on the source side.
  • Structural replication is not currently supported; you need to manually create tables on the target side that match the objects to be synchronized from the source.
  • The data replication functionality is only for user databases in the data source; system databases are not replicated. For example, in MySQL-type data sources, databases like information_schema, mysql, performance_schema, and sys will not be replicated.
  • The account of the source data source must have SELECT, REPLICATION CLIENT, and REPLICATION SLAVE permissions for the objects to be replicated (incremental replication), and the account of the target data source must have DML permissions.
  • Evaluate the performance of both the source and target data sources before executing data synchronization, and it is recommended to perform data synchronization during off-peak hours for the business. Otherwise, during full data initialization, there will be a certain consumption of reading and writing resources on both the source and target data sources, leading to an increase in database load.
  • It is recommended to ensure that each table in the synchronized objects has a primary key or unique constraint, and column names are unique. Otherwise, the same data may be synchronously replicated repeatedly.

Operation Steps

  1. Log in to the NineData console.

  2. Click on > in the left navigation bar.

  3. On the page, click on in the top right corner.

  4. On the tab, configure according to the table below and click .

    Parameter
    Description
    Enter a name for the data synchronization task for easy later retrieval and management. Please use meaningful names. Up to 64 characters are supported.
    The data source where the synchronization object is located.
    The data source that receives the synchronized object.
    Select the content to be copied to the target data source.
    • : Synchronize all objects and data from the source data source, i.e., full data replication.
    • : After full synchronization, perform incremental synchronization based on the source data source's logs. Click on the setting icon to unselect certain operation types as needed. These operations will be ignored during incremental synchronization after unselecting them.
    IAM RoleAn IAM Role with permissions to execute commands in Redshift. During the replication process, NineData first copies MySQL data to NineData's Amazon S3 service, and then from S3 to your Redshift. This Role is used to authorize NineData to execute copy commands in your Redshift to copy data from S3 to Redshift. For information on obtaining the IAM Role, please refer to the section on this page.
    The should only be selected when is chosen. Options are:
    • : Perform incremental replication based on the start time of the current replication task.
    • : Choose the starting point for incremental replication. You can select the time based on your business location's timezone.
    • : Stop the task when data is detected in the target table during the pre-check stage.
    • : Ignore the data detected in the target table during the pre-check stage and append other data.
    • : Delete the data detected in the target table during the pre-check stage and rewrite it.
  5. On the tab, configure the following parameters and then click .

    Parameter
    Description
    Choose what to copy. You can select to copy all contents of the source library, or you can choose . In the list, select the content to be copied, click > to add it to the list on the right.
    (optional)Click to add a blacklist record. Select the library or object to be added to the blacklist; these contents will not be copied. This is used to exclude certain libraries or objects in full library replication with or .
    • Left dropdown: Select the library name to be added to the blacklist.
    • Right dropdown: Select the corresponding objects in the library. You can click on multiple objects for multiple selections. Leave it blank to add the entire database to the blacklist.
    If you want to add multiple libraries to the blacklist, you can click the Add button below to add a row.
  6. On the tab, the system matches the target data source's schema with the same name by default. If it doesn't exist, you need to manually select the target schema. By default, table names and column names in the target library need to match the synchronized objects. If they don't match, you can manually map the table names and column names. After configuration, click Save and Pre-check.

  7. On the tab, wait for the system to complete the pre-check. After the pre-check passes, click .

    tip
    • If the pre-check fails, you need to click on the right of the target check item in the column to investigate the cause of the failure and manually fix it. Click to re-run the pre-check until it passes.
    • Check items with in can be repaired or ignored depending on the specific situation.
  8. On the page, when is displayed, the synchronization task starts running. At this point, you can:

    • Click to view the execution status of each stage of the synchronization task.
    • Click to return to the task list page.

Viewing Sync Results

  1. Log in to the NineData console.

  2. Click on > in the left navigation bar.

  3. On the page, click on the of the target sync task. The page is explained as follows.

    redshift_result

    Serial Number
    Function
    Description
    1Sync DelayThe data synchronization delay between the source data source and the target data source. 0 seconds indicates no delay between the two ends, allowing you to switch the business to the target data source for smooth migration.
    2Configure AlarmAfter configuring the alarm, the system will notify you in the way you choose when the task fails. For more information, see Operations and Monitoring Introduction.
    3More
    • Pause: Pause the task. Only tasks with the status Running can be selected.
    • Terminate: End the task that is unfinished or in listening mode (i.e., in incremental sync). After terminating the task, it cannot be restarted, so please proceed with caution. If the synchronized objects contain triggers, a trigger copy option will pop up, choose as needed.
    • Delete: Delete the task. Once the task is deleted, it cannot be recovered, so please proceed with caution.
    4Full Copy (displayed in scenarios involving full copy)Display the progress and details of the full copy.
    • Click Monitoring on the right side of the page: View various monitoring indicators during the full copy process. During the full copy process, you can also click Flow Control Settings on the right side of the monitoring indicator page to limit the rate of writing to the target data source in rows per second.
    • Click Logs on the right side of the page: View the execution logs of the full copy.
    • Click refresh on the right side of the page: View the latest information.
    5Incremental Copy (displayed in scenarios involving incremental copy)Display various monitoring indicators for incremental copy.
    • Click on the right side of the page: View the operations currently being performed in the current copy task, including:
      • : The copy task is performed by multiple threads, showing the thread number currently in progress.
      • : Details of the SQL statement currently being executed by the thread.
      • : The response time of the current thread. If this value increases, the current thread may be stuck for some reason.
      • : The timestamp when the current thread was started.
      • : The status of the current thread.
    • Click Flow Control Settings on the right side of the page: Limit the rate of writing to the target data source in rows per second.
    • Click Logs on the right side of the page: View the execution logs of the incremental copy.
    • Click refresh on the right side of the page: View the latest information.
    6Modify ObjectDisplay the modification records of synchronized objects.
    • Click Modify Synchronized Object on the right of the page to configure synchronized objects.
    • Click refresh on the right side of the page: View the latest information.
    7ExpandDisplay detailed information about the current copy task, including Copy Type, Copy Object, Start Time, etc.

Appendix 1: Mapping Table of MySQL and Redshift Data Types

During the data copying process, MySQL data types are mapped to corresponding Redshift data types.

Category MySQL Data Type Redshift Data Type
Numeric TINYINT SMALLINT
TINYINT UNSIGNED SMALLINT
SMALLINT SMALLINT
SMALLINT UNSIGNED INTEGER
MEDIUMINT INTEGER
MEDIUMINT UNSIGNED BIGINT
INT INTEGER
INT UNSIGNED BIGINT
BIGINT BIGINT
BIGINT UNSIGNED BIGINT
BIT(M) BOOL
Decimal Decimal
Numeric Numeric
Float REAL
Double REAL
BOOL, BOOLEAN BOOL
DATE AND TIME DATE DATE
DATETIME[(fsp)] timestamp
Timestamp[(fsp)] timestamptz
Time[(fsp)] time
YEAR[(4)] INT
STRING CHAR CHAR
VARCHAR VARCHAR
BINARY/VARBINARY VARBINARY
TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT VARCHAR
TINYBLOB/BLOB/MEDIUMBOLB/LONGBLOB VARBINARY
ENUM VARCHAR
SET VARCHAR
JSON JSON VARCHAR

Appendix 2: Pre-check Items Overview

Check Item Check Item Description Does Failure Prevent Task Continuation? Failure Reasons and Remediation Strategies
Source Data Source Connection Test Check the connectivity of the source data source. Yes
  • Failure Reason: The connection between NineData and the source database is not established.
  • Remediation Method:
    • Self-built Database: Check if the database is configured with a firewall and confirm if NineData's service IP is allowed.
    • ECS Self-built Database: Confirm the security group settings of the ECS instance where the data source is located, allowing access from NineData's IP. If not allowed, modify the security group to open NineData IP and perform the pre-check again.
    • Cloud Database: Confirm the security group | whitelist settings of the cloud database, allowing access from NineData's service IP. If not allowed, modify the security group | whitelist to open NineData IP and perform the pre-check again.
    • JDBC Connection Failure: Confirm if the database account and password are incorrect. If incorrect, adjust the data source configuration and perform the pre-check again.
Target Data Source Connection Test Check the connectivity of the target data source. Yes
  • Failure Reason: The connection between NineData and the target data source is not established.
  • Remediation Method:
    • Cloud Database: Confirm the security group | whitelist settings of the cloud database, allowing access from NineData's service IP. If not allowed, modify the security group | whitelist to open NineData IP and perform the pre-check again.
    • JDBC Connection Failure: Confirm if the database account and password are incorrect. If incorrect, adjust the data source configuration and perform the pre-check again.
Source Data Source Permission Check Check the account permissions of the source data source. Yes
  • Failure Reason: Insufficient permissions for the source database.
  • Remediation Method: Assign SELECT, REPLICATION CLIENT, and REPLICATION SLAVE permissions to the account of the source data source, and perform the pre-check again.
Target Data Source Permission Check Check the account permissions of the target data source. Yes
  • Failure Reason: Insufficient permissions for the target database.
  • Remediation Method: Assign CREATE/ALTER/DROP TABLE, SELECT, INSERT, UPDATE, DELETE, REFERENCES, CREATE, TEMPORARY, and USAGE permissions to the account of the target data source, and perform the pre-check again.
Check if the Source Database has Binlog Enabled Check if the source database has Binlog enabled. Yes
  • Failure Reason: Binlog is not enabled for your source database.
  • Remediation Method: Enable Binlog and perform the pre-check again.
Check if the binlog format of the source database supports row mode Check if the binlog format of the source database is row mode Yes
  • Failure Reason: The binlog format of your source database is not row mode.
  • Remediation Method: Please refer to the MySQL official documentation to modify it, and perform the pre-check again.
Check if binlog_row_image is full Check if binlog_row_image of the source database is full Yes
  • Failure Reason: The binlog_row_image of your source database is not full.
  • Remediation Method: Please refer to the MySQL official documentation to modify it, and perform the pre-check again.
Check for the Existence of Objects with the Same Name Check if there are objects with the same name as the ones to be copied in the target
  • Yes: When the same-name object handling strategy is set to "pre-check error and stop the task".
  • No: In other cases.
  • Failure Reason: There are objects with the same name in the target data source.
  • Remediation Method:
    • Modify the same-name object handling strategy and perform the pre-check again.
    • Delete the objects with the same name and perform the pre-check again.
Check for Data Existence Check if the target data source already contains data for the copied objects
  • Yes: When the same-name object handling strategy is "pre-check error and stop the task".
  • No: In other cases.
  • Failure Reason: There is duplicate data in the target data source.
  • Remediation Method:
    • Modify the handling strategy for existing data and perform the pre-check again.
    • Manually clear the existing data in the target data source and perform the pre-check again.

Data Replication Introduction