Skip to main content

Self-managed MySQL to ClickHouse

NineData data replication supports the replication of self-built MySQL data to ClickHouse, and supports full replication and incremental replication.

Prerequisite

  • Added source and target data sources to NineData. For how to add, see Adding Data Sources .

  • The versions of the source and target data sources are shown in the table below.

    SourceTarget
    • MySQL 8.0
    • MySQL 5.7
    • MySQL 5.6
    ClickHouse 20.8 and above
  • The source must have Binlog enabled, and the related parameters of Binlog are set as follows:

    • binlog_format=ROW
    • binlog_row_image=FULL
    tip

    If the source is the standby database, in order to obtain the complete Binlog log, the log_slave_updates must be turn ON.

  • The access_management parameter value of the account used to log in to ClickHouse needs to be 1.

    Configuration method:

Open the User.xml configuration file located in the /etc/clickhouse-server directory with an editor, find the target user, and add the parameter: <access_management>1</access_management>.

:::

Precautions

  • In the source MySQL data source, if there is an unsigned integer type field, it will be mapped to Uint type in ClickHouse. For other data type mappings between MySQL and ClickHouse, please refer to Appendix.

  • Usually, the replication scenario from MySQL to ClickHouse needs to include schema replication, the system will automatically migrate the MySQL table schema to ClickHouse, and insert two system columns into the tables involved in the replication in ClickHouse. If the replication type does not include schema replication, user need to manually ensure the following:

    • For data objects involved in replication in the ClickHouse data source, the database table structure must be consistent with the MySQL data source.

    • The following two system columns need to be manually inserted into the tables involved in the replication of the ClickHouse data source to record the time of DML operations and Binlog.

      column namedata type
      column default value
      description
      _jz_data_signInt8DEFAULT 1Record the type of DML operation to ensure that the data on the ClickHouse side is consistent with the MySQL side.
      • INSERT operation: 1.
      • DELETE operation: -1.
      • UPDATE operation: split into INSERT and DELETE, i.e. two records with 1 and -1 .
      _jz_data_timeStringDEFAULT now()Time to record Binlog.

      Example:

      CREATE TABLE [IF NOT EXISTS] [db.] table_name [ON CLUSTER cluster]
      (
      ...
      _jz_data_sign Int8 DEFAULT 1 COMMENT 'replication data update sign: add = 1, remove = -1',
      _jz_data_time String DEFAULT now() COMMENT 'replication data update time'
      ) ENGINE = engine

Restrictions

  • The data replication function is only for the user databases in the data source, and the system databases will not be replicated. For example: information_schema, mysql, performance_schema, sys databases in MySQL type data sources will not be replicated.
  • The account for source data must have SELECT (for replicate database structure and full data), SHOW VIEW (for replicate views), and REPLICATION CLIENT, REPLICATION SLAVE (for replicate incremental data) privileges on the objects to be replicated. The account for target must have DML and DDL privileges.
  • Before performing data synchronization, user need to evaluate the performance of the source data source and the target data source, and it is recommended to perform data synchronization during off-peak time. Otherwise, the full data initialization will occupy a certain amount of read and write resources of the source data source and the target data source, increasing database load.
  • It is necessary to ensure that each table in the synchronization object has a primary key or unique constraint, and the column name is unique, otherwise the same data may be synchronized repeatedly.

Steps

Commercialization Notice

NineData’s data replication product has been commercialized. You can still use 10 replication tasks for free, with the following considerations:

  • Among the 10 replication tasks, you can include 1 task, with a specification of Micro.

  • Tasks with a status of do not count towards the 10-task limit. If you have already created 10 replication tasks and want to create more, you can terminate previous replication tasks and then create new ones.

  • When creating replication tasks, you can only select the you have purchased. Specifications that have not been purchased will be grayed out and cannot be selected. If you need to purchase additional specifications, please contact us through the customer service icon at the bottom right of the page.

  1. Log in to NineData console .

  2. Click Replication > in the left navigation bar .

  3. On the Replication page, click Create Replication in the upper right corner .

  4. On the Source & Target tab, configure according to the following table, and click Next.

    Parameters
    Description
    NameEnter the name of the data replication task. For the convenience of subsequent search and management, please try to use a meaningful name. Up to 64 characters are supported.
    SourceThe MySQL data source where the synchronization object resides.
    TargetThe ClickHouse data source that receives the synchronization object.
    Type
    • Schema : Only the database table structure of the source data is synchronized.
    • Full : Synchronize all objects and data of the source. That is, full data replication.
    • Incremental: After the full synchronization is completed, incremental synchronization is performed based on the logs of source. Click the setting icon to uncheck certain operation types according to your needs. Once unchecked, these operations will be ignored during incremental synchronization.
    Incremental start timeA selection is required when selecting Incremental replication.
    • From start time: Based on the start time of the current copy task, perform incremental copy.
    • Custom Time: Select the time point when the incremental replication starts. Please select the time zone according to the region where your data source is located. If the time point is configured to be before the start of the current replication task, the replication task will fail if there is a DDL operation within that time period.
    If target table already exist (need to select when Schema is selected)
    • Abort after error: The same name is detected in the pre-check stage table, stop the task.
    • Skip and continue the task: When a table with the same name is detected in the pre-check phase, send a prompt and continue the task. When the schema is copied, the table with the same name is ignored. If copy data at the same time, the data will be appended to the table with the same name, and the original data will not be overwritten.
    • Delete object and rewrite: When a table with the same name is detected in the pre-check phase, send a prompt and continue the task. When copying the schema, delete the table with the same name in the target database, and re-copy the table schema from the source database. If copy data at the same time, the data will be written after the table schema is copied.
    • Preserve the schema and clean data before write (optional when replicating the schema and data at the same time): When a table with the same name is detected in the pre-check stage, send a prompt and continue the task. When the schema is copied, the table schema is retained in the target database, and the data in the table with the same name is cleared when the data copy starts, and then copied from the original table again.
    Target table data processing policy (need to select when Schema is not selected)
    • Pre-check for errors and Exit: Stop the task when the precheck phase detects that data exists in the target table.
    • Ignore the existing data and append: When the pre-check stage detects that there is data in the target table, ignore this part of the data and write additional data.
    • Clear the existing data before write: When the pre-check stage detects that there is data in the target table, delete this part of the data and rewrite it.
  5. On the Objects tab, configure the following parameters, and then click Next.

    Parameter
    Description
    Replication ObjectsSelect the content to be replicated, you can choose All Objects to replicate all the contents of the source library, or you can choose Customized Object, select it in the Source list For the content to be replicated, click > to add to the right Target list.
    Blocklist (optional)Click Add to add a blocklist record, Select libraries or objects that need to be added to the blocklist, these contents will not be replicated. Used to exclude certain libraries or objects from All Objects replication or Customized Object.
    • The drop-down box on the left: select the name of the library that needs to be added to the blocklist.
    • Right drop-down box: select the object in the corresponding library, you can click multiple objects to make multiple selections, and leave it blank to add the entire database to the blocklist.
    If you want to add multiple libraries to the blocklist, you can click the Add button below to add a row.
  6. On the Mapping tab, select different operations based on the selected replication Type.

    • Contains Schema: configure the table name after the target table is synchronized to the target data source, and click Save and Pre-check.

      tip

      User can click Mapping & Filtering on the right side of the page to customize the column names after they are synchronized to the target data source.

    • Does not contains Schema: The system selects the database with the same name in the target data source by default. If it does not exist, user need to manually select the target database. The table name and column name in the target library must be consistent with the synchronization object. If not, user can also manually map table names and column names.

  7. On the Pre-check tab, wait for the system to complete the pre-check. After the pre-check passes, click Launch.

    tip
    • User can click to Enable data consistency comparison. After the synchronization task is completed, the data consistency comparison based on the source data is automatically enabled to ensure that the data at both ends are consistent. Depending on the replication Type, the data consistency comparison starts as follows:

      • Schema : Starts after the schema replication is complete.

      • Schema + Full : start after full replication is complete.

      • Schema + Full + Incremental (SQL Server is not currently supported): Start when the incremental data is consistent with the source data for the first time and the Delay is 0 seconds. You can click View Details to view the synchronization delay in the Details page.

        sync_delay

    • If the pre-check fails, user need to click the Details in the Actions column to the right of the target item to check the cause of the failure. After correction, click Check Again to re-execute the pre-check.

    • For Warnings, user can be correct or ignore according to specific circumstances.

  8. On the Launch page, a message is displayed indicating that Launch Successfully , and the replication task starts to run. At this point user can do the following:

    • Click View Details to view the execution of each phase of the synchronization task.
    • Click Back to list to return to the Replication list.

View sync results

  1. Log in to NineData console .

  2. Click Data Replication > in the left navigation bar .

  3. On the Replication page, click the Task ID of the target synchronization task . The page description is as follows.

    result

    Serial number
    Function
    Description
    1DelayThe data synchronization delay between the source and the target. 0 seconds means there is no delay between the two ends. At this time, user can choose to switch the business to the target data source to achieve smooth migration.
    2Configure AlertAfter configuring alerts, the system notifies user in event of failures. For more information, see Introduction to Operational Monitoring .
    3More
    • Pause : Pause the task, only for the task whose status is Running.
    • Terminate : End the task that has not been completed or is being monitored (that is, in the process of incremental synchronization). The task cannot be restarted after the task is terminated. Please operate with caution. If the synchronization object contains triggers, a trigger replication option will pop up, please select as needed.
    • Delete : Delete the task. After the task is deleted, it cannot be recovered. Please operate with caution.
    4Schema (in scenarios that include schema replication)Shows the progress and details of schema replication.
    • Click Log on the right side of the page : View the execution log of the fabric replication.
    • Click Refresh on the right side of the page : View the latest information.
    • Click View DDL in the Action column to the right of the target table in the list: To view the SQL playback.
    5Full (in scenarios that include full replication)Displays the progress and details of the full copy.
    • Click Monitor on the right side of the page: view the monitoring indicators during the full replication process. During the full copy process, you can also click Limit Settings on the right side of the monitoring indicators page to limit the rate of writing to the target data source per second. Unit: rows/second.
    • Click Log on the right side of the page : View the execution log of full replication.
    • Click Refresh on the right side of the page : View the latest information.
    6Incremental (in scenarios that include incremental replication)Displays various monitoring indicators of incremental replication.
    • Click Limit Settings on the right side of the page: Limit the rate of writing to the target data source per second. Unit: rows/second.
    • Click Log on the right side of the page : View the execution log of incremental replication.
    • Click Refresh on the right side of the page : View the latest information.
    7Modify ObjectsDisplays the modification history of the replication object.
    • Click Modify Objects on the right side of the page to configure replication objects.
    • Click Refresh on the right side of the page : View the latest information.
    8ComparisonDisplays the results of the data comparison between the source data source and the target data source, including structural comparisons and data comparisons. If you have not enabled comparison, please click Enable Comparison on the page.
    • Click Re-compare on the right side of the page : restart comparision of the data at both ends of the current source and target.
    • Click Log on the right side of the page : View the execution log of the consistency comparison.
    • Click Monitor on the right side of the page (Data Comparison display only): View the trend graph of the comparison RPS (records per second comparison). Click Details to view earlier records.
    • Click detailsin the Actions column on the right side of the comparison list: view the comparison details of the table definition DDL statements of the source and target.
    • Click sqlin the Actions column on the right side of the comparison list (displayed in case of inconsistency): generate change SQL, which can be directly applicated on the target data source for execution, and to modify the inconsistent content.
    9Show DetailsDisplays the detailed information of the current replication task, including Type, Replication Object, Started, etc.

Appendix 1: Mapping relationship between MySQL and ClickHouse data types

During data replication, MySQL data types will be mapped to corresponding ClickHouse data types.

MySQLClickHouse
INT、MEDIUMNINTInt32
BITUInt64
TINYINTInt8
SMALLINT UNSIGNED、YEARUInt16
BIGINTInt64
DECIMALDecimal
FLOATFloat32
DOUBLEFloat64
DATEDate32
DATETIME、TIMESTAMPDateTime64
TIME、CHAR、VARCHAR、BINARY、VARBINARY、TINYBLOB、TINYTEXT、BLOB、TEXT、MEDIUMBLOB、MEDIUMTEXT、LONGBLOB、LONGTEXT、ENUM、SET、GEOMETRY、GEOMETRYCOLLECTION、JSONString
POINT、MULTIPOINTPoint
LINESTRING、MULTILINESTRINGRing
POLYGOPolygon
MULTIPOLYGONMultiPolygon

Appendix 2: Checklist of Pre-Check Items

Check ItemCheck Content
Target Database Data Existence CheckCheck if data exists for the objects to be replicated in the target database
Target Database Same Name Object Existence CheckCheck if the objects to be replicated already exist in the target database
Check if Target Table Contains System FieldsCheck if the target table contains system fields _sign and _version
Source Data Source Connection CheckCheck the status of the source data source gateway, instance accessibility, and accuracy of username and password
Target Data Source Connection CheckCheck the status of the target data source gateway, instance accessibility, and accuracy of username and password
Target Database Permission CheckCheck if the account permissions in the target database meet the requirements
Source Database Permission CheckCheck if the account permissions in the source database meet the requirements
Check if Source Database log_slave_updates is SupportedCheck if log_slave_updates is set to ON when the source database is a slave
Source Data Source and Target Data Source Version CheckCheck if the versions of the source database and target database are compatible
Check if Source Database is Enabled with BinlogCheck if the source database is enabled with Binlog
Check if Source Database Binlog Format is SupportedCheck if the binlog format of the source database is 'ROW'
Check if Source Database binlog_row_image is SupportedCheck if the binlog_row_image of the source database is 'FULL'

Introduction to Data Replication