OceanBase MySQL to Kafka Replication

Use this page to replicate OceanBase MySQL data to Kafka in NineData. NineData writes full and incremental data as JSON messages for downstream consumption.

Before you begin

Add the source and target data sources to NineData. For details, see Creating a Data Source.
The source database is OceanBase MySQL.
The target database is Kafka 0.10 or later.
Enable binlog for the source data source and set the following parameters:
- binlog_format=ROW
- binlog_row_image=FULL
  tip
  If the source data source is a standby database, enable log_slave_updates so NineData can capture the complete binlog.
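The binlog requirements above can be checked before you create the task. The following is a minimal Python sketch, assuming you have already read the server variables into a dictionary by whatever means your environment allows. The function name `check_binlog_settings` and the dictionary shape are illustrative and are not part of NineData.

```python
# Settings required by this page for the source data source.
REQUIRED = {"binlog_format": "ROW", "binlog_row_image": "FULL"}


def check_binlog_settings(variables, is_standby=False):
    """Return a list of problems; an empty list means the settings match.

    `variables` is a hypothetical dict of server variable names to values.
    """
    required = dict(REQUIRED)
    if is_standby:
        # Standby sources additionally need log_slave_updates enabled.
        required["log_slave_updates"] = "ON"

    problems = []
    for name, expected in required.items():
        actual = str(variables.get(name, "")).upper()
        if actual != expected:
            problems.append(
                f"{name} should be {expected}, found {actual or 'unset'}"
            )
    return problems
```

An empty result suggests the source meets the documented binlog requirements; the precheck in NineData remains the authoritative check.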

Limitations

Data replication applies only to user databases. System databases such as information_schema, mysql, performance_schema, and sys are not replicated.
Check the performance of both data sources before you start. Run replication during off-peak hours so full initialization does not overload the source or target.
Make sure each synchronized table has a primary key or unique constraint, and keep column names unique to avoid duplicate rows.

Procedure

Sign in to the NineData Console.
In the left navigation pane, click Replication > Data Replication.
On the Replication page, click Create Replication.

On the Source & Target tab, configure the fields in the table, and click Next.

Parameter	Description
Name	Enter a name for the data synchronization task. To make the task easier to find and manage later, use a meaningful name. Up to 64 characters are supported.
Source	The data source that contains the objects to synchronize.
Datahub Project	Select the target Datahub Project. Data from the source data source will be written to the specified Project.
Kafka Topic	Select the target Kafka Topic. Data from the source data source will be written to the specified Topic.
Delivery Partition	When delivering data to a Topic, you can specify the partition to which the data is delivered. Deliver All to Partition 0: Deliver all data to the default partition 0. Deliver to different partition by the Hash value of [databaseName + tableName]: Hash data across different partitions. The system uses the hash value of the database name and table name to calculate the target partition, ensuring that data from the same table is delivered to the same partition during hash delivery.
Type	Select the replication type. Full: Synchronize all objects and data from the source data source, namely full data replication. The switch on the right enables periodic full replication. For more information, see Periodic Full Replication. Incremental: After full synchronization is complete, perform incremental synchronization based on the logs of the source data source.
Incremental Started	Required only when Type is Incremental. From Started: Use the current replication task start time as the baseline for incremental replication. Customized Time: Select the point in time from which incremental replication starts. You can select a time zone based on the region of your business. If the configured time point is earlier than the current replication task start time and DDL operations occurred during that period, the replication task will fail.
Target Table Exists Data (Required when Full is selected)	Pre-Check Error and Stop Task: Stop the task when data is detected in the target table during the precheck stage. Ignore existing target data and append to it: When data is detected in the target table during the precheck stage, keep that data and append the replicated data. Clear target existing data before write: When data is detected in the target table during the precheck stage, delete that data and then write the replicated data.
Incremental data conflict handling strategy for target table (Required when Incremental is selected)	Runtime error: During incremental replication, report an error when target data already exists and wait for manual intervention. Do not update target data: During incremental replication, do not write data when target data already exists, and continue subsequent tasks. Update target data: During incremental replication, overwrite the target data when target data already exists.
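The two Delivery Partition options in the table above can be illustrated with a short Python sketch. The hash function NineData actually uses is not documented on this page, so CRC32 is only a stand-in, and the function name `choose_partition` is hypothetical. The point is that a deterministic hash of databaseName + tableName always maps the same table to the same partition.

```python
import zlib


def choose_partition(database_name, table_name, partition_count, hash_by_table=True):
    """Pick a target partition for a message (illustrative only)."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    if not hash_by_table:
        # "Deliver All to Partition 0"
        return 0
    # Assumption: CRC32 stands in for the real, undocumented hash function.
    key = (database_name + table_name).encode("utf-8")
    return zlib.crc32(key) % partition_count
```

Because all messages for one table land in one partition, a consumer sees changes to that table in order, while different tables can be spread across partitions.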

On the Objects tab, configure the parameters in the table, and click Next.

Parameter	Description

To create multiple replication tasks with the same replication objects, import a configuration file. Click Import Config, click Download Template to download the template, edit the file, and then click Upload to upload it and import the objects in bulk. The configuration file uses these fields:

Parameter	Description
`source_table_name`	The source table name of the object to synchronize.
`destination_table_name`	The target table name that receives the synchronized object.
`source_schema_name`	The source schema name of the object to synchronize.
`destination_schema_name`	The target schema name that receives the synchronized object.
`source_database_name`	The source database name of the object to synchronize.
`target_database_name`	The target database name that receives the synchronized object.
`column_list`	The list of columns to synchronize.
`extra_configuration`	Additional configuration information. This field supports: `column_rules`: Defines column mappings and value rules. Field descriptions: `column_name`: Original column name. `destination_column_name`: Specifies the target column name. `column_value`: Specifies the column value, which can be an SQL function or a constant value. `filter_condition`: Specifies row-level data filtering conditions. Only rows that meet the conditions are replicated.

tip

Example of extra_configuration:

{
  "extra_config": {
    "column_rules": [
      {
        "column_name": "created_time",
        "destination_column_name": "migrated_time",
        "column_value": "current_timestamp()"
      }
    ],
    "filter_condition": "id != 0"
  }
}

In this example, created_time is mapped to migrated_time, the target column value is changed to current_timestamp(), and only rows whose id value is not 0 are synchronized.

For a complete example of the configuration file, see the downloaded template.
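Before uploading a configuration file, it can help to sanity-check the extra_configuration value. The following Python sketch is one possible check, assuming the value is a JSON string shaped like the example above, with a top-level `extra_config` key. The rule that every `column_rules` entry needs a `column_name` is an assumption made for illustration, and `validate_extra_configuration` is a hypothetical helper, not a NineData API.

```python
import json


def validate_extra_configuration(text):
    """Return a list of problems found in an extra_configuration JSON string."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    config = doc.get("extra_config") if isinstance(doc, dict) else None
    if not isinstance(config, dict):
        return ['missing top-level "extra_config" object']

    problems = []
    rules = config.get("column_rules", [])
    if not isinstance(rules, list):
        problems.append("column_rules must be a list")
        rules = []
    for index, rule in enumerate(rules):
        # Assumption: each rule must name the original column.
        if not isinstance(rule, dict) or not rule.get("column_name"):
            problems.append(f"column_rules[{index}] has no column_name")

    condition = config.get("filter_condition")
    if condition is not None and not isinstance(condition, str):
        problems.append("filter_condition must be a string")
    return problems
```

The sketch checks structure only. It does not parse the SQL in `column_value` or `filter_condition`, so the downloaded template remains the reference for the exact file layout.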

On the Mapping tab, you can configure each column to replicate to Kafka individually. By default, all columns of the selected table are replicated. If source or target metadata changes while you configure mappings, click Refresh Metadata to refresh the metadata. After completing the configuration, click Save and Pre-Check.

On the Pre-check tab, wait for the system to complete the precheck. After the precheck passes, click Launch.
- Select Enable data consistency comparison to start a data consistency comparison task based on the source data source after synchronization completes. Based on the selected Type, Enable data consistency comparison starts at these times:
  - Full: Starts after full replication is complete.
  - Full+Incremental, Incremental: Starts when incremental data is consistent with the source data source for the first time and Delay is 0 seconds. Click View Details to view synchronization delay on the Details page.
- If the precheck fails, click Details in the Actions column for the failed check item, review the cause, fix the issue, and then click Check Again to run the precheck again until it passes.
- Items with Warning in Result can be fixed or ignored as needed.
On the Launch page, the Launch Successfully message appears, indicating that the synchronization task has started. You can then perform these actions:
- Click View Details to view the execution status of each stage of the synchronization task.
- Click Back to list to return to the Replication task list page.

Results

Sign in to the NineData Console.
In the navigation menu, click Replication > Data Replication.

On the Replication page, click the ID of the target synchronization task to open the Details page. The task details page shows the following information.

No.	Feature	Description
1	Synchronization Delay	The synchronization delay between the source and target data sources. `0` seconds means Kafka has caught up with the source data source.
2	Configure Alerts	When the task fails, the system notifies the selected channel through the configured alert. For more information, see Operational Monitoring Overview.
3	More	Pause: Pause the task. Only tasks in the Running status can be paused. Terminate: End a task that is incomplete or listening (that is, in incremental synchronization). A terminated task cannot be restarted, so proceed with caution. Delete: Delete the task. A deleted task cannot be recovered, so proceed with caution.
4	Full Replication (Displayed in scenarios including full replication)	Displays the progress and detailed information of full replication. Click Monitor to view various monitoring metrics during the full replication process. During the full replication process, you can also click Flow Control Settings on the monitoring page to limit the rate of writing to the target data source per second. The unit is MB/S. Click Log to view the execution logs of full replication. Click the icon to view the latest information.
5	Incremental Replication (Displayed in scenarios including incremental replication)	Displays various monitoring metrics for incremental replication. Click Flow Control Settings to limit the rate of writing to the target data source per second. The unit is rows/second. Click Log to view the execution logs of incremental replication. Click the icon to view the latest information.
6	Modify Object	Displays the modification records of the synchronization object. Click Modify Objects to configure the synchronization object. Click the icon to view the latest information.
7	Expand	Displays detailed information of the current replication task, including Type, Replication Objects, Started, etc.

Appendix: Data format

Data replicated from OceanBase MySQL to Kafka is stored in JSON. NineData splits rows into JSON objects, and each JSON object represents one message.

During the full copy phase, the number of MySQL data rows stored in a single message is determined by the message.max.bytes parameter. message.max.bytes is the maximum message size allowed in the Kafka cluster. The default value is `1000000` bytes, or 1 MB. Adjust this value in the Kafka configuration file to store more MySQL data rows in each message. If the value is too large, Kafka may need more memory per message and cluster performance can drop.
During the incremental copy phase, a single message stores one row of MySQL data.
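The relationship between the message size limit and the number of rows per full-copy message can be sketched in Python as follows. This is not NineData's implementation. It is a simplified illustration that measures only the JSON-encoded row array and ignores the other message fields, and `batch_rows` is a hypothetical name.

```python
import json


def batch_rows(rows, max_bytes=1_000_000):
    """Group rows so each group's JSON encoding stays within max_bytes.

    A single row larger than max_bytes still gets a group of its own.
    """
    batches, current = [], []
    for row in rows:
        candidate = current + [row]
        # Re-encoding the whole candidate is simple but quadratic; fine for a sketch.
        size = len(json.dumps(candidate, separators=(",", ":")).encode("utf-8"))
        if size > max_bytes and current:
            batches.append(current)
            current = [row]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Raising the limit yields fewer, larger groups, which mirrors how a larger message size limit lets each message carry more rows.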

Each JSON object contains the following fields:

Field Name	Field Type	Field Description	Field Example
serverId	STRING	The data source information to which the message belongs, in the format: <connection_address:port>.	`"serverId":"47.98.224.21:3307"`
id	LONG	The record ID of the message. This value increases globally and can be used to detect duplicate message consumption.	`"id":156`
es	INT	The meaning depends on the task stage. Full replication stage: the start time of the full data replication task, represented as a Unix timestamp. Incremental replication stage: the time of the corresponding event (EVENT) in the Binlog.	`"es":1668650385`
ts	INT	The time when the data was delivered to Kafka, represented as a Unix timestamp.	`"ts":1668651053`
isDdl	BOOLEAN	Whether the data is DDL. Values: true (yes), false (no).	`"isDdl":true`
type	STRING	The type of data, with values: INIT: Represents full data replication. INSERT: Represents INSERT operation. DELETE: Represents DELETE operation. UPDATE: Represents UPDATE operation. DDL: Represents DDL operation.	`"type":"INIT"`
database	STRING	The database to which the data belongs.	`"database":"database_name"`
table	STRING	The table to which the data belongs. If the object corresponding to the DDL statement is not a table, the field value is `null`.	`"table":"table_name"`
mysqlType	JSON	The MySQL data type of each column, represented as a JSON object that maps column names to types.	`"mysqlType": {"id": "bigint(20)", "shipping_type": "varchar(50)" }`
sqlType	JSON	Reserved field. Ignore this field.	`"sqlType":null`
pkNames	ARRAY	The primary key names corresponding to the record (Record) in the Binlog. Values: If the record is of DDL type, the value is null. If the record is of INIT or DML type, the value is the primary key name of that record.	`"pkNames": ["id", "uid"]`
data	ARRAY[JSON]	The data delivered from MySQL to Kafka, stored in a JSON format array. Full data replication scenario (type = INIT): Stores the full data delivered from MySQL to Kafka. Incremental data replication scenario: Stores the details of the changes made to the data in the Binlog. INSERT: The values of the insert operation in each field. UPDATE: The values of the update operation (after the update) in each field. DELETE: The values of the delete operation in each field. DDL: The table structure after the table DDL operation.	`"old": [{ "name": "someone", "
old	ARRAY[JSON]	Records the incremental replication details from MySQL to Kafka. UPDATE: The values of each field before the update operation. DDL: The table structure before the DDL operation on the table. For other operations, the value of this field is `null`.	`"old": [{ "name": "someone", "phone": "(737)1234787", "email": "someone@example.com", "address": "somewhere", "country": "china" }]`
sql	STRING	If the current data is an incremental DDL operation, records the SQL statement corresponding to the operation. For other operations, the value of this field is `null`.	`"sql":"create table sbtest1(id int primary key,name varchar(20))"`
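A downstream consumer might use the fields in the table above as follows. This Python sketch works on raw JSON strings and leaves out the Kafka client itself. It assumes messages for one table arrive in order, which holds when a table always maps to a single partition, and it tracks the last seen `id` per database and table to skip duplicates. The class name `MessageHandler` and the counting logic are illustrative only.

```python
import json


class MessageHandler:
    """Skip duplicate messages and count processed rows by operation type."""

    def __init__(self):
        self.last_id = {}   # (database, table) -> highest id seen so far
        self.counts = {}    # operation type -> rows (or DDL statements) processed

    def handle(self, raw):
        """Process one message; return False if it was a duplicate."""
        msg = json.loads(raw)
        key = (msg.get("database"), msg.get("table"))
        msg_id = msg["id"]
        # id increases globally, so an id at or below the last one is a replay.
        if key in self.last_id and msg_id <= self.last_id[key]:
            return False
        self.last_id[key] = msg_id

        kind = "DDL" if msg.get("isDdl") else msg["type"]
        rows = msg.get("data") or []
        increment = 1 if kind == "DDL" else len(rows)
        self.counts[kind] = self.counts.get(kind, 0) + increment
        return True
```

A real consumer would replace the counting with its own logic, for example applying INSERT, UPDATE, and DELETE rows to a downstream store and using the `old` field to locate the previous version of an updated row.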

Appendix: Pre-check items

Check Item	Check Content
Source Data Source Connection Check	Checks the source data source gateway status, instance reachability, and username/password validity.
Target Data Source Connection Check	Checks the target data source gateway status, instance reachability, and username/password validity.
Source Database Permission Check	Checks whether the source database account permissions meet the requirements.
Check if Source Database log_slave_updates is Supported	Checks whether `log_slave_updates` is set to `ON` when the source database is a standby database.
Source Data Source and Target Data Source Version Check	Checks whether the source and target database versions are compatible.
Check if Source Database is Enabled with Binlog	Checks whether binlog is enabled for the source database.
Check if Source Database Binlog Format is Supported	Checks whether the source database binlog format is `ROW`.
Check if Source Database binlog_row_image is Supported	Checks whether `binlog_row_image` is set to `FULL`.
Target Database Permission Check	Checks whether the Kafka account has permission to access the Topic.
Target Database Data Existence Check	Checks whether data already exists in the Topic.

Related documents

Introduction to Data Replication