Skip to main content

MySQL to Greenplum

NineData data replication supports full data synchronization and incremental data synchronization between MySQL and Greenplum data sources.

Background Information

Greenplum is an open-source parallel data warehouse solution designed for large-scale data storage and analysis. By replicating MySQL data to Greenplum, enterprises can leverage its massively parallel processing architecture and high-performance data analysis capabilities to better handle and analyze large-scale data.

Prerequisites

  • Both source and target data sources have been added to NineData. For instructions on how to add data sources, please refer to Adding Data Sources.

  • The source database type is MySQL or a MySQL-like database, such as MySQL, MariaDB, PolarDB MySQL, TDSQL-C, GaussDB MySQL, Aurora, etc.

  • The target database is Greenplum 6.0, 5.0.

  • In the case of incremental replication, the source data source must have Binlog enabled, and the Binlog-related parameters should be set as follows:

    • binlog_format=ROW
    • binlog_row_image=FULL
    tip

    If the source data source is a standby server, to ensure complete Binlog logs, the log_slave_updates parameter must also be enabled.

Usage Limitations

  • Incremental replication does not currently support synchronizing DDL operations from the source.
  • Structural replication is not currently supported; you need to manually create tables on the target side that match those on the source. For the MySQL-to-Greenplum data type mapping, please refer to Appendix 1.
  • Data replication only applies to user databases in the data source; system databases will not be replicated. For example, databases like information_schema, mysql, performance_schema, and sys in MySQL-type data sources will not be replicated.
  • The account in the source data source must have the following privileges:
    • Full replication: SELECT permission on the replicated objects.
    • Incremental replication: SELECT permission on the replicated objects, REPLICATION CLIENT, and REPLICATION SLAVE permissions. The account in the target data source must have DML permissions.
  • The account in the target data source must have the following permissions on the target objects: CREATE/ALTER/DROP TABLE, SELECT, INSERT, UPDATE, DELETE, REFERENCES, CREATE, TEMPORARY, USAGE.
  • Evaluate the performance of the source and target data sources before executing data synchronization, and it is recommended to perform data synchronization during business off-peak hours. Otherwise, full data initialization will occupy certain read and write resources of both source and target data sources, leading to an increase in database load.
  • It is recommended to ensure that each table in the synchronized objects has a primary key or unique constraint and that column names are unique; otherwise, duplicate synchronization of the same data may occur.

Operating Steps

  1. Log in to the NineData Console.

  2. Click on > in the left navigation bar.

  3. On the page, click on .

  4. On the tab, configure according to the table below and click .

    Parameter
    Description
    Enter the name of the data synchronization task. For easy retrieval and management, use a meaningful name. Maximum 64 characters.
    Data source where the synchronized objects are located.
    Data source that receives the synchronized objects.
    Choose the content to be copied to the target data source.
    • : Synchronize all objects and data from the source data source, i.e., full data replication.
    • : After full synchronization, perform incremental synchronization based on the log of the source data source.
    Required only when is .
    • : Incremental replication starts based on the current replication task start time.
    • : Choose the time point when incremental replication starts. You can select the time zone according to your business location.
    • : Stop the task when data is detected in the target table during the pre-check phase.
    • : Ignore the data detected in the target table during the pre-check phase and append other data.
    • : Delete the data detected in the target table during the pre-check phase and rewrite it.
  5. On the tab, configure the following parameters and then click .

    Parameter
    Description
    Choose the content to be copied. You can choose to copy all content from the source library, or choose . In the list, select the content to be copied, and click > to add it to the right list.
  6. On the tab, the system automatically matches the same-name schema in the target data source. If it does not exist, you need to manually select the target schema. By default, the table name and column name in the target database must be consistent with the synchronized objects. If they are inconsistent, you can manually map the table name and column name. After configuration, click Save and Pre-check.

  7. On the tab, wait for the system to complete the pre-check. After the pre-check passes, click .

    tip
    • If the pre-check fails, click in the column on the right of the target check item to investigate the failure reasons. After manually fixing them, click to perform the pre-check again until it passes.
    • For check items with as , you can fix or ignore them based on the specific situation.
  8. On the page, when prompted with , the synchronization task starts running. At this point, you can perform the following operations:

    • Click to view the execution status of each stage of the synchronization task.
    • Click to return to the task list page .

View Sync Results

  1. Log in to the NineData Console.

  2. Click on > in the left navigation bar.

  3. On the page, click on the of the target synchronization task, and the page is explained as follows.

    greenplum_result

    Number
    Feature
    Description
    1Sync DelayData synchronization delay between the source data source and the target data source. A delay of 0 seconds means no delay between the two ends. At this time, you can choose to switch the business to the target data source for a smooth migration.
    2Configure AlarmsAfter configuring alarms, the system will notify you in the way you choose when the task fails. For more information, see Introduction to Operation and Maintenance Monitoring.
    3More
    • Pause: Pause the task, only tasks with the status Running can be paused.
    • Terminate: End tasks that are not completed or in listening (i.e., incremental synchronization). After terminating the task, it cannot be restarted, so operate with caution. If the synchronized objects include triggers, a trigger replication option will pop up. Select as needed.
    • Delete: Delete the task. Once the task is deleted, it cannot be recovered, so operate with caution.
    4Full Replication (Displayed in scenarios involving full replication)Display the progress and details of full replication.
    • Click Monitoring on the right side of the page: View various monitoring indicators during full replication. During the full replication process, you can also click Flow Control on the right side of the monitoring indicator page to limit the rate of writing to the target data source per second. The unit is rows/second.
    • Click Logs on the right side of the page: View the execution log of full replication.
    • Click refresh: View the latest information.
    5Incremental Replication (Displayed in scenarios involving incremental replication)Display various monitoring indicators for incremental replication.
    • Click on the right side of the page: View operations currently being executed by the current replication task, including:
      • : The replication task is executed in multiple threads, showing the current thread number in progress.
      • : Details of the SQL statement currently being executed by the current thread.
      • : The response time of the current thread. If this value increases, it means that the current thread may be stuck for some reason.
      • : Timestamp when the current thread started.
      • : The current status of the thread.
    • Click Flow Control on the right side of the page: Limit the rate of writing to the target data source per second. The unit is rows/second.
    • Click Logs on the right side of the page: View the execution log of incremental replication.
    • Click refresh: View the latest information.
    6Modify ObjectDisplay the modification records of synchronized objects.
    • Click Modify Synchronized Objects on the right side of the page to configure synchronized objects.
    • Click refresh: View the latest information.
    7Data ComparisonDisplay the results of data comparison between the source data source and the target data source. If you have not enabled data comparison, click Enable Data Comparison on the page.
    • Click Recompare on the right side of the page: Recompare the data between the current source and target ends.
    • Click Logs on the right side of the page: View the execution log of consistency comparison.
    • Click Monitoring (displayed only for data comparison): View the trend chart of RPS (records per second) for comparison. Click Details to view earlier records.
    • Click details in the Operation column on the right of the comparison list (displayed only in case of inconsistency): View details of the comparison between the source and target ends.
    • Click sql in the Operation column on the right of the comparison list (displayed only in case of inconsistency): Generate change SQL. You can copy this SQL directly to the target data source to execute and modify the inconsistent content.
    8ExpandDisplay detailed information about the current replication task, including Replication Type, Replication Object, Start Time, etc.

Appendix 1: Mapping Table of MySQL and Greenplum Data Types

During the data replication process, MySQL data types are mapped to corresponding Greenplum data types.

CategoryMySQL Data TypeRedshift Data Type
NumericTINYINTSMALLINT
TINYINT UNSIGNEDSMALLINT
SMALLINTSMALLINT
SMALLINT UNSIGNEDINTEGER
MEDIUMINTINTEGER
MEDIUMINT UNSIGNEDINTEGER
INTINTEGER
INT UNSIGNEDBIGINT
BIGINTBIGINT
BIGINT UNSIGNEDBIGINT
BIT(M)BIT(N)
DECIMAL()DECIMAL
NUMERICDECIMAL
FLOATFLOAT
DOUBLEFLOAT8
BOOL,BOOLEANSMALLINT
DATE AND TIMEDATEDATE
DATETIME[(fsp)]TIMESTAMP[(p)]
TIMESTAMP[(fsp)]TIMESTAMP[(p)] with time zone
TIME[(fsp)]TEXT
YEAR[(4)]INTEGER
STRINGCHARCHAR[(n)]
VARCHARVARCHAR
BINARY/VARBINARYBYTEA
TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXTTEXT
TINYBLOB/BLOB/MEDIUMBOLB/LONGBLOBBYTEA
ENUMVARCHAR
SETVARCHAR
JSONJSONJSON

Appendix 2: Overview of Pre-check Items

Check Item Check Item Description Does Failure Halt the Task? Failure Reasons and Remediation Strategies
Source Data Source Connection Test Check the connectivity of the source data source Yes
  • Failure Reason: No connection between NineData and the source database.
  • Remediation:
    • Self-built database: Check if the database is configured with a firewall and confirm whether NineData's service IP is allowed.
    • ECS self-built database: Confirm the security group settings of the ECS instance where the data source is located to allow access to NineData's service IP. If not allowed, modify the security group to open NineData IP and re-run the pre-check.
    • Cloud database: Confirm the security group|whitelist settings of the cloud database, allowing access to NineData's service IP. If not allowed, modify the security group|whitelist to open NineData IP and re-run the pre-check.
    • JDBC connection failure: Confirm if the database account and password are incorrect. If incorrect, adjust the data source configuration and re-run the pre-check.
Target Data Source Connection Test Check the connectivity of the target data source Yes
  • Failure Reason: No connection between NineData and the target data source.
  • Remediation:
    • Cloud database: Confirm the security group|whitelist settings of the cloud database, allowing access to NineData's service IP. If not allowed, modify the security group|whitelist to open NineData IP and re-run the pre-check.
    • JDBC connection failure: Confirm if the database account and password are incorrect. If incorrect, adjust the data source configuration and re-run the pre-check.
Source Data Source Permission Check Check the account permissions of the source data source Yes
  • Failure Reason: Insufficient permissions for the source database.
  • Remediation: Grant SELECT, REPLICATION CLIENT, and REPLICATION SLAVE permissions to the account of the source data source, and re-run the pre-check.
Target Data Source Permission Check Check the account permissions of the target data source Yes
  • Failure Reason: Insufficient permissions for the target database.
  • Remediation: Grant CREATE/ALTER/DROP TABLE, SELECT, INSERT, UPDATE, DELETE, REFERENCES, CREATE, TEMPORARY, and USAGE permissions to the account of the target data source, and re-run the pre-check.
Source Database log_slave_updates Support Check Check if the source database, in the case of being a standby, has enabled log_slave_updates Yes
  • Failure Reason: Your source database is a standby, and log_slave_updates is not enabled.
  • Remediation: Enable log_slave_updates and re-run the pre-check.
Source Database Binlog Activation Check Check if the source database has enabled Binlog Yes
  • Failure Reason: Your source database has not enabled Binlog.
  • Remediation: Enable Binlog and re-run the pre-check.
Source Database binlog format Support Check (row mode) Check if the binlog format of the source database is in row mode Yes
  • Failure Reason: The binlog format of your source database is not in row mode.
  • Remediation: Refer to the [MySQL official documentation](https://dev.mysql.com/doc/refman/8.0/en/grant.html) and modify accordingly, then re-run the pre-check.
binlog_row_image Check (full) Check if binlog_row_image of the source database is set to full Yes
  • Failure Reason: binlog_row_image of your source database is not set to full.
  • Remediation: Refer to the [MySQL official documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html) and modify accordingly, then re-run the pre-check.
Same Name Object Existence Check Check if there are objects with the same name as the objects to be replicated in the target
  • Yes: When the same name object processing strategy is set to "Pre-check error and stop the task".
  • No: In other cases.
  • Failure Reason: There are objects with the same name in the target data source.
  • Remediation:
    • Modify the same name object processing strategy and re-run the pre-check.
    • Delete the same name object and re-run the pre-check.
Data Existence Check Check if the replication objects in the target data source already have data
  • Yes: When the same name object processing strategy is set to "Pre-check error and stop the task".
  • No: In other cases.
  • Failure Reason: There is identical data in the target data source.
  • Remediation:
    • Modify the incremental data processing strategy and re-run the pre-check.
    • Manually clear the existing data in the target data source and re-run the pre-check.

Introduction to Data Replication