MySQL to Greenplum
NineData data replication supports full data synchronization and incremental data synchronization between MySQL and Greenplum data sources.
Background Information
Greenplum is an open-source parallel data warehouse solution designed for large-scale data storage and analysis. By replicating MySQL data to Greenplum, enterprises can leverage its massively parallel processing architecture and high-performance data analysis capabilities to better handle and analyze large-scale data.
Prerequisites
Both source and target data sources have been added to NineData. For instructions on how to add data sources, please refer to Adding Data Sources.
The source database type is MySQL or a MySQL-like database, such as MySQL, MariaDB, PolarDB MySQL, TDSQL-C, GaussDB MySQL, Aurora, etc.
The target database is Greenplum 6.0, 5.0.
In the case of incremental replication, the source data source must have Binlog enabled, and the Binlog-related parameters should be set as follows:
binlog_format
=ROW
binlog_row_image
=FULL
tipIf the source data source is a standby server, to ensure complete Binlog logs, the
log_slave_updates
parameter must also be enabled.
Usage Limitations
- Incremental replication does not currently support synchronizing DDL operations from the source.
- Structural replication is not currently supported; you need to manually create tables on the target side that match those on the source. For the MySQL-to-Greenplum data type mapping, please refer to Appendix 1.
- Data replication only applies to user databases in the data source; system databases will not be replicated. For example, databases like
information_schema
,mysql
,performance_schema
, andsys
in MySQL-type data sources will not be replicated. - The account in the source data source must have the following privileges:
- Full replication:
SELECT
permission on the replicated objects. - Incremental replication:
SELECT
permission on the replicated objects,REPLICATION CLIENT
, andREPLICATION SLAVE
permissions. The account in the target data source must have DML permissions.
- Full replication:
- The account in the target data source must have the following permissions on the target objects:
CREATE/ALTER/DROP TABLE
,SELECT
,INSERT
,UPDATE
,DELETE
,REFERENCES
,CREATE
,TEMPORARY
,USAGE
. - Evaluate the performance of the source and target data sources before executing data synchronization, and it is recommended to perform data synchronization during business off-peak hours. Otherwise, full data initialization will occupy certain read and write resources of both source and target data sources, leading to an increase in database load.
- It is recommended to ensure that each table in the synchronized objects has a primary key or unique constraint and that column names are unique; otherwise, duplicate synchronization of the same data may occur.
Operating Steps
NineData’s data replication product has been commercialized. You can still use 10 replication tasks for free, with the following considerations:
Among the 10 replication tasks, you can include 1 task, with a specification of Micro.
Tasks with a status of do not count towards the 10-task limit. If you have already created 10 replication tasks and want to create more, you can terminate previous replication tasks and then create new ones.
When creating replication tasks, you can only select the you have purchased. Specifications that have not been purchased will be grayed out and cannot be selected. If you need to purchase additional specifications, please contact us through the customer service icon at the bottom right of the page.
Log in to the NineData Console.
Click on > in the left navigation bar.
On the page, click on .
On the tab, configure according to the table below and click .
Parameter Description Enter the name of the data synchronization task. For easy retrieval and management, use a meaningful name. Maximum 64 characters. Data source where the synchronized objects are located. Data source that receives the synchronized objects. Choose the content to be copied to the target data source. - : Synchronize all objects and data from the source data source, i.e., full data replication.
- : After full synchronization, perform incremental synchronization based on the log of the source data source.
Required only when is . - : Incremental replication starts based on the current replication task start time.
- : Choose the time point when incremental replication starts. You can select the time zone according to your business location.
- : Stop the task when data is detected in the target table during the pre-check phase.
- : Ignore the data detected in the target table during the pre-check phase and append other data.
- : Delete the data detected in the target table during the pre-check phase and rewrite it.
On the tab, configure the following parameters and then click .
Parameter Description Choose the content to be copied. You can choose to copy all content from the source library, or choose . In the list, select the content to be copied, and click > to add it to the right list. On the tab, the system automatically matches the same-name schema in the target data source. If it does not exist, you need to manually select the target schema. By default, the table name and column name in the target database must be consistent with the synchronized objects. If they are inconsistent, you can manually map the table name and column name. After configuration, click Save and Pre-check.
On the tab, wait for the system to complete the pre-check. After the pre-check passes, click .
tip- If the pre-check fails, click in the column on the right of the target check item to investigate the failure reasons. After manually fixing them, click to perform the pre-check again until it passes.
- For check items with as , you can fix or ignore them based on the specific situation.
On the page, when prompted with , the synchronization task starts running. At this point, you can perform the following operations:
- Click to view the execution status of each stage of the synchronization task.
- Click to return to the task list page .
View Sync Results
Log in to the NineData Console.
Click on > in the left navigation bar.
On the page, click on the of the target synchronization task, and the page is explained as follows.
Number Feature Description 1 Sync Delay Data synchronization delay between the source data source and the target data source. A delay of 0 seconds means no delay between the two ends. At this time, you can choose to switch the business to the target data source for a smooth migration. 2 Configure Alarms After configuring alarms, the system will notify you in the way you choose when the task fails. For more information, see Introduction to Operation and Maintenance Monitoring. 3 More - Pause: Pause the task, only tasks with the status Running can be paused.
- Terminate: End tasks that are not completed or in listening (i.e., incremental synchronization). After terminating the task, it cannot be restarted, so operate with caution. If the synchronized objects include triggers, a trigger replication option will pop up. Select as needed.
- Delete: Delete the task. Once the task is deleted, it cannot be recovered, so operate with caution.
4 Full Replication (Displayed in scenarios involving full replication) Display the progress and details of full replication. - Click Monitoring on the right side of the page: View various monitoring indicators during full replication. During the full replication process, you can also click Flow Control on the right side of the monitoring indicator page to limit the rate of writing to the target data source per second. The unit is rows/second.
- Click Logs on the right side of the page: View the execution log of full replication.
- Click : View the latest information.
5 Incremental Replication (Displayed in scenarios involving incremental replication) Display various monitoring indicators for incremental replication. - Click on the right side of the page: View operations currently being executed by the current replication task, including:
- : The replication task is executed in multiple threads, showing the current thread number in progress.
- : Details of the SQL statement currently being executed by the current thread.
- : The response time of the current thread. If this value increases, it means that the current thread may be stuck for some reason.
- : Timestamp when the current thread started.
- : The current status of the thread.
- Click Flow Control on the right side of the page: Limit the rate of writing to the target data source per second. The unit is rows/second.
- Click Logs on the right side of the page: View the execution log of incremental replication.
- Click : View the latest information.
6 Modify Object Display the modification records of synchronized objects. - Click Modify Synchronized Objects on the right side of the page to configure synchronized objects.
- Click : View the latest information.
7 Data Comparison Display the results of data comparison between the source data source and the target data source. If you have not enabled data comparison, click Enable Data Comparison on the page. - Click Recompare on the right side of the page: Recompare the data between the current source and target ends.
- Click Logs on the right side of the page: View the execution log of consistency comparison.
- Click Monitoring (displayed only for data comparison): View the trend chart of RPS (records per second) for comparison. Click Details to view earlier records.
- Click in the Operation column on the right of the comparison list (displayed only in case of inconsistency): View details of the comparison between the source and target ends.
- Click in the Operation column on the right of the comparison list (displayed only in case of inconsistency): Generate change SQL. You can copy this SQL directly to the target data source to execute and modify the inconsistent content.
8 Expand Display detailed information about the current replication task, including Replication Type, Replication Object, Start Time, etc.
Appendix 1: Mapping Table of MySQL and Greenplum Data Types
During the data replication process, MySQL data types are mapped to corresponding Greenplum data types.
Category | MySQL Data Type | Redshift Data Type |
---|---|---|
Numeric | TINYINT | SMALLINT |
TINYINT UNSIGNED | SMALLINT | |
SMALLINT | SMALLINT | |
SMALLINT UNSIGNED | INTEGER | |
MEDIUMINT | INTEGER | |
MEDIUMINT UNSIGNED | INTEGER | |
INT | INTEGER | |
INT UNSIGNED | BIGINT | |
BIGINT | BIGINT | |
BIGINT UNSIGNED | BIGINT | |
BIT(M) | BIT(N) | |
DECIMAL() | DECIMAL | |
NUMERIC | DECIMAL | |
FLOAT | FLOAT | |
DOUBLE | FLOAT8 | |
BOOL,BOOLEAN | SMALLINT | |
DATE AND TIME | DATE | DATE |
DATETIME[(fsp)] | TIMESTAMP[(p)] | |
TIMESTAMP[(fsp)] | TIMESTAMP[(p)] with time zone | |
TIME[(fsp)] | TEXT | |
YEAR[(4)] | INTEGER | |
STRING | CHAR | CHAR[(n)] |
VARCHAR | VARCHAR | |
BINARY/VARBINARY | BYTEA | |
TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT | TEXT | |
TINYBLOB/BLOB/MEDIUMBOLB/LONGBLOB | BYTEA | |
ENUM | VARCHAR | |
SET | VARCHAR | |
JSON | JSON | JSON |
Appendix 2: Overview of Pre-check Items
Check Item | Check Item Description | Does Failure Halt the Task? | Failure Reasons and Remediation Strategies |
---|---|---|---|
Source Data Source Connection Test | Check the connectivity of the source data source | Yes |
|
Target Data Source Connection Test | Check the connectivity of the target data source | Yes |
|
Source Data Source Permission Check | Check the account permissions of the source data source | Yes |
|
Target Data Source Permission Check | Check the account permissions of the target data source | Yes |
|
Source Database log_slave_updates Support Check | Check if the source database, in the case of being a standby, has enabled log_slave_updates | Yes |
|
Source Database Binlog Activation Check | Check if the source database has enabled Binlog | Yes |
|
Source Database binlog format Support Check (row mode) | Check if the binlog format of the source database is in row mode | Yes |
|
binlog_row_image Check (full) | Check if binlog_row_image of the source database is set to full | Yes |
|
Same Name Object Existence Check | Check if there are objects with the same name as the objects to be replicated in the target |
|
|
Data Existence Check | Check if the replication objects in the target data source already have data |
|
|