MySQL to Greenplum

NineData data replication supports full data synchronization and incremental data synchronization between MySQL and Greenplum data sources.

Background Information

Greenplum is an open-source parallel data warehouse solution designed for large-scale data storage and analysis. By replicating MySQL data to Greenplum, enterprises can leverage its massively parallel processing architecture and high-performance data analysis capabilities to better handle and analyze large-scale data.

Prerequisites

Both source and target data sources have been added to NineData. For instructions on how to add data sources, please refer to Adding Data Sources.
The source database type is MySQL or a MySQL-like database, such as MySQL, MariaDB, PolarDB MySQL, TDSQL-C, GaussDB MySQL, Aurora, etc.
The target database is Greenplum 6.0, 5.0.
In the case of incremental replication, the source data source must have Binlog enabled, and the Binlog-related parameters should be set as follows:
- binlog_format=ROW
- binlog_row_image=FULL
tip
If the source data source is a standby server, to ensure complete Binlog logs, the log_slave_updates parameter must also be enabled.

Usage Limitations

Incremental replication does not currently support synchronizing DDL operations from the source.
Structural replication is not currently supported; you need to manually create tables on the target side that match those on the source. For the MySQL-to-Greenplum data type mapping, please refer to Appendix 1.
Data replication only applies to user databases in the data source; system databases will not be replicated. For example, databases like information_schema, mysql, performance_schema, and sys in MySQL-type data sources will not be replicated.
The account in the source data source must have the following privileges:
- Full replication: SELECT permission on the replicated objects.
- Incremental replication: SELECT permission on the replicated objects, REPLICATION CLIENT, and REPLICATION SLAVE permissions. The account in the target data source must have DML permissions.
The account in the target data source must have the following permissions on the target objects: CREATE/ALTER/DROP TABLE, SELECT, INSERT, UPDATE, DELETE, REFERENCES, CREATE, TEMPORARY, USAGE.
Evaluate the performance of the source and target data sources before executing data synchronization, and it is recommended to perform data synchronization during business off-peak hours. Otherwise, full data initialization will occupy certain read and write resources of both source and target data sources, leading to an increase in database load.
It is recommended to ensure that each table in the synchronized objects has a primary key or unique constraint and that column names are unique; otherwise, duplicate synchronization of the same data may occur.

Operating Steps

Commercialization Notice

NineData’s data replication product has been commercialized. You can still use 10 replication tasks for free, with the following considerations:

Among the 10 replication tasks, you can include 1 task, with a specification of Micro.
Tasks with a status of do not count towards the 10-task limit. If you have already created 10 replication tasks and want to create more, you can terminate previous replication tasks and then create new ones.
When creating replication tasks, you can only select the you have purchased. Specifications that have not been purchased will be grayed out and cannot be selected. If you need to purchase additional specifications, please contact us through the customer service icon at the bottom right of the page.

Log in to the NineData Console.
Click on > in the left navigation bar.
On the page, click on .

On the tab, configure according to the table below and click .

Parameter	Description
	Enter the name of the data synchronization task. For easy retrieval and management, use a meaningful name. Maximum 64 characters.
	Data source where the synchronized objects are located.
	Data source that receives the synchronized objects.
	Choose the content to be copied to the target data source. : Synchronize all objects and data from the source data source, i.e., full data replication. : After full synchronization, perform incremental synchronization based on the log of the source data source.
	Required only when is . : Incremental replication starts based on the current replication task start time. : Choose the time point when incremental replication starts. You can select the time zone according to your business location.
	: Stop the task when data is detected in the target table during the pre-check phase. : Ignore the data detected in the target table during the pre-check phase and append other data. : Delete the data detected in the target table during the pre-check phase and rewrite it.

On the tab, configure the following parameters and then click .

Parameter	Description
	Choose the content to be copied. You can choose to copy all content from the source library, or choose . In the list, select the content to be copied, and click > to add it to the right list.

On the tab, the system automatically matches the same-name schema in the target data source. If it does not exist, you need to manually select the target schema. By default, the table name and column name in the target database must be consistent with the synchronized objects. If they are inconsistent, you can manually map the table name and column name. After configuration, click Save and Pre-check.
On the tab, wait for the system to complete the pre-check. After the pre-check passes, click .
tip
- If the pre-check fails, click in the column on the right of the target check item to investigate the failure reasons. After manually fixing them, click to perform the pre-check again until it passes.
- For check items with as , you can fix or ignore them based on the specific situation.
On the page, when prompted with , the synchronization task starts running. At this point, you can perform the following operations:
- Click to view the execution status of each stage of the synchronization task.
- Click to return to the task list page .

View Sync Results

Log in to the NineData Console.
Click on > in the left navigation bar.

On the page, click on the of the target synchronization task, and the page is explained as follows.

greenplum_result

Number	Feature	Description
1	Sync Delay	Data synchronization delay between the source data source and the target data source. A delay of 0 seconds means no delay between the two ends. At this time, you can choose to switch the business to the target data source for a smooth migration.
2	Configure Alarms	After configuring alarms, the system will notify you in the way you choose when the task fails. For more information, see Introduction to Operation and Maintenance Monitoring.
3	More	Pause: Pause the task, only tasks with the status Running can be paused. Terminate: End tasks that are not completed or in listening (i.e., incremental synchronization). After terminating the task, it cannot be restarted, so operate with caution. If the synchronized objects include triggers, a trigger replication option will pop up. Select as needed. Delete: Delete the task. Once the task is deleted, it cannot be recovered, so operate with caution.
4	Full Replication (Displayed in scenarios involving full replication)	Display the progress and details of full replication. Click Monitoring on the right side of the page: View various monitoring indicators during full replication. During the full replication process, you can also click Flow Control on the right side of the monitoring indicator page to limit the rate of writing to the target data source per second. The unit is rows/second. Click Logs on the right side of the page: View the execution log of full replication. Click : View the latest information.
5	Incremental Replication (Displayed in scenarios involving incremental replication)	Display various monitoring indicators for incremental replication. Click on the right side of the page: View operations currently being executed by the current replication task, including: : The replication task is executed in multiple threads, showing the current thread number in progress. : Details of the SQL statement currently being executed by the current thread. : The response time of the current thread. If this value increases, it means that the current thread may be stuck for some reason. : Timestamp when the current thread started. : The current status of the thread. Click Flow Control on the right side of the page: Limit the rate of writing to the target data source per second. The unit is rows/second. Click Logs on the right side of the page: View the execution log of incremental replication. Click : View the latest information.
6	Modify Object	Display the modification records of synchronized objects. Click Modify Synchronized Objects on the right side of the page to configure synchronized objects. Click : View the latest information.
7	Data Comparison	Display the results of data comparison between the source data source and the target data source. If you have not enabled data comparison, click Enable Data Comparison on the page. Click Recompare on the right side of the page: Recompare the data between the current source and target ends. Click Logs on the right side of the page: View the execution log of consistency comparison. Click Monitoring (displayed only for data comparison): View the trend chart of RPS (records per second) for comparison. Click Details to view earlier records. Click in the Operation column on the right of the comparison list (displayed only in case of inconsistency): View details of the comparison between the source and target ends. Click in the Operation column on the right of the comparison list (displayed only in case of inconsistency): Generate change SQL. You can copy this SQL directly to the target data source to execute and modify the inconsistent content.
8	Expand	Display detailed information about the current replication task, including Replication Type, Replication Object, Start Time, etc.

Appendix 1: Mapping Table of MySQL and Greenplum Data Types

During the data replication process, MySQL data types are mapped to corresponding Greenplum data types.

Category	MySQL Data Type	Redshift Data Type
Numeric	TINYINT	SMALLINT
	TINYINT UNSIGNED	SMALLINT
	SMALLINT	SMALLINT
	SMALLINT UNSIGNED	INTEGER
	MEDIUMINT	INTEGER
	MEDIUMINT UNSIGNED	INTEGER
	INT	INTEGER
	INT UNSIGNED	BIGINT
	BIGINT	BIGINT
	BIGINT UNSIGNED	BIGINT
	BIT(M)	BIT(N)
	DECIMAL()	DECIMAL
	NUMERIC	DECIMAL
	FLOAT	FLOAT
	DOUBLE	FLOAT8
	BOOL，BOOLEAN	SMALLINT
DATE AND TIME	DATE	DATE
	DATETIME[(fsp)]	TIMESTAMP[(p)]
	TIMESTAMP[(fsp)]	TIMESTAMP[(p)] with time zone
	TIME[(fsp)]	TEXT
	YEAR[(4)]	INTEGER
STRING	CHAR	CHAR[(n)]
	VARCHAR	VARCHAR
	BINARY/VARBINARY	BYTEA
	TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT	TEXT
	TINYBLOB/BLOB/MEDIUMBOLB/LONGBLOB	BYTEA
	ENUM	VARCHAR
	SET	VARCHAR
JSON	JSON	JSON

Appendix 2: Overview of Pre-check Items

Check Item	Check Item Description	Does Failure Halt the Task?	Failure Reasons and Remediation Strategies
Source Data Source Connection Test	Check the connectivity of the source data source	Yes	Failure Reason: No connection between NineData and the source database. Remediation: Self-built database: Check if the database is configured with a firewall and confirm whether NineData's service IP is allowed. ECS self-built database: Confirm the security group settings of the ECS instance where the data source is located to allow access to NineData's service IP. If not allowed, modify the security group to open NineData IP and re-run the pre-check. Cloud database: Confirm the security group\|whitelist settings of the cloud database, allowing access to NineData's service IP. If not allowed, modify the security group\|whitelist to open NineData IP and re-run the pre-check. JDBC connection failure: Confirm if the database account and password are incorrect. If incorrect, adjust the data source configuration and re-run the pre-check.
Target Data Source Connection Test	Check the connectivity of the target data source	Yes	Failure Reason: No connection between NineData and the target data source. Remediation: Cloud database: Confirm the security group\|whitelist settings of the cloud database, allowing access to NineData's service IP. If not allowed, modify the security group\|whitelist to open NineData IP and re-run the pre-check. JDBC connection failure: Confirm if the database account and password are incorrect. If incorrect, adjust the data source configuration and re-run the pre-check.
Source Data Source Permission Check	Check the account permissions of the source data source	Yes	Failure Reason: Insufficient permissions for the source database. Remediation: Grant SELECT, REPLICATION CLIENT, and REPLICATION SLAVE permissions to the account of the source data source, and re-run the pre-check.
Target Data Source Permission Check	Check the account permissions of the target data source	Yes	Failure Reason: Insufficient permissions for the target database. Remediation: Grant CREATE/ALTER/DROP TABLE, SELECT, INSERT, UPDATE, DELETE, REFERENCES, CREATE, TEMPORARY, and USAGE permissions to the account of the target data source, and re-run the pre-check.
Source Database log_slave_updates Support Check	Check if the source database, in the case of being a standby, has enabled log_slave_updates	Yes	Failure Reason: Your source database is a standby, and log_slave_updates is not enabled. Remediation: Enable log_slave_updates and re-run the pre-check.
Source Database Binlog Activation Check	Check if the source database has enabled Binlog	Yes	Failure Reason: Your source database has not enabled Binlog. Remediation: Enable Binlog and re-run the pre-check.
Source Database binlog format Support Check (row mode)	Check if the binlog format of the source database is in row mode	Yes	Failure Reason: The binlog format of your source database is not in row mode. Remediation: Refer to the [MySQL official documentation](https://dev.mysql.com/doc/refman/8.0/en/grant.html) and modify accordingly, then re-run the pre-check.
binlog_row_image Check (full)	Check if binlog_row_image of the source database is set to full	Yes	Failure Reason: binlog_row_image of your source database is not set to full. Remediation: Refer to the [MySQL official documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html) and modify accordingly, then re-run the pre-check.
Same Name Object Existence Check	Check if there are objects with the same name as the objects to be replicated in the target	Yes: When the same name object processing strategy is set to "Pre-check error and stop the task". No: In other cases.	Failure Reason: There are objects with the same name in the target data source. Remediation: Modify the same name object processing strategy and re-run the pre-check. Delete the same name object and re-run the pre-check.
Data Existence Check	Check if the replication objects in the target data source already have data	Yes: When the same name object processing strategy is set to "Pre-check error and stop the task". No: In other cases.	Failure Reason: There is identical data in the target data source. Remediation: Modify the incremental data processing strategy and re-run the pre-check. Manually clear the existing data in the target data source and re-run the pre-check.

Introduction to Data Replication

MySQL to Greenplum

Background Information​

Prerequisites​

Usage Limitations​

Operating Steps​

View Sync Results​

Appendix 1: Mapping Table of MySQL and Greenplum Data Types​

Appendix 2: Overview of Pre-check Items​

Related Documents​