Replicating Sharded Databases and Tables

Use this workflow to replicate MySQL sharded databases and tables in NineData. Move data from multiple sharded databases and tables to another sharded setup, or consolidate the data into a single MySQL instance.

Before you begin

Add the source and target data sources to NineData first. For details, see Add Data Source.

Limitations

When creating database table groups, the physical sharded tables must exist and be extractable using an extraction expression.
Data replication applies only to user databases in the data source. System databases are not replicated. For example, in MySQL data sources, information_schema, mysql, performance_schema, and sys databases are not replicated.
The account for the source data source must have the SELECT privilege on the replication objects (for structure replication and full data replication), the SHOW VIEW privilege (for view replication), and the REPLICATION CLIENT and REPLICATION SLAVE privileges (for incremental replication). The account for the target data source must have DML and DDL privileges.
Before data synchronization, assess the performance of the source and target data sources. We recommend running synchronization during off-peak hours because full data initialization consumes database read and write resources and can create high load.
Only table objects are supported for replication. Other objects are ignored.
Ensure that every table in the synchronization objects has a partition key, primary key, or unique constraint. When a sharding routing algorithm is used, rows that are routed to the same target table must not have the same primary key or unique key value. Otherwise, the row written later overwrites the earlier one.
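The overwrite risk described in the last limitation can be illustrated with a small simulation. This is a minimal Python sketch, not NineData code: it assumes that writes to the target behave as overwrite-by-primary-key, as the limitation states, and the shard names, table names, and the `consolidate` helper are hypothetical.

```python
# Hypothetical simulation: two source shards both contain a row with primary
# key id=1 and are consolidated into one target table. Writes are modeled as
# overwrite-by-primary-key, so the later row replaces the earlier one.

def consolidate(shards: dict[str, list[dict]]) -> dict[int, dict]:
    """Merge rows from several shards into one target table keyed by `id`."""
    target: dict[int, dict] = {}
    for shard_name, rows in shards.items():
        for row in rows:
            # Same primary key in the same target table: the existing row is overwritten.
            target[row["id"]] = {**row, "from_shard": shard_name}
    return target


shards = {
    "logical_db_00.orders_00": [{"id": 1, "amount": 100}, {"id": 2, "amount": 50}],
    "logical_db_01.orders_00": [{"id": 1, "amount": 999}],  # id collides with shard 00
}
target = consolidate(shards)
print(len(target))          # 2 rows instead of 3
print(target[1]["amount"])  # 999: the row from logical_db_00 was overwritten
```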

Procedure

Step 1: Create a database table group

To replicate sharded databases and tables, add the source sharded databases and tables to a NineData database table group first. If the target also uses sharding, create a target database table group in the same way.

Sign in to the NineData Console.
Go to Datasource > Datasource.
Click the Database Grouping tab and click Create Database Grouping on the page.

Configure the form using this table, and click Create Database Grouping.

Parameter	Description
Database Grouping Name	Enter a name for the database group. Use only English letters, numbers, and underscores, and start with an English letter. Use a meaningful name so the group is easy to find and manage later.
Description (optional)	Enter a business description for this database group.
Environment	Select the environment that your business belongs to. This filters the available data sources.
Database	Click Add Datasource, and select the data sources to add to the group. Select multiple data sources, select all, deselect all, or search by data source name. After clicking OK, select the specific sharded databases.

After NineData redirects you to Database Grouping Detail, click Create Table Grouping.

Configure the form using this table, and click Create Table Grouping.

Parameter	Description
Table Grouping Name	Enter a name for the table group. Use only English letters, numbers, and underscores, and start with an English letter. Use a meaningful name so the group is easy to find and manage later.
Methods	Select Add Manually or Add by Expression.
Add by Expression	Configure this option when Methods is set to Add by Expression. Enter an expression and click Auto Fetch. NineData traverses the target database and extracts all tables that match the expression.
Routing Algorithm (optional)	Configure this parameter when Methods is set to Add by Expression. Set it based on the routing algorithm used by your application so NineData can quickly resolve the tables to access. For details about configuring Routing Algorithm, see the Appendix.
Database	Configure this option when Methods is set to Add Manually. Click Add Datasource, and select the data sources to add to the group. Select multiple data sources, select all, deselect all, or search by data source name. After clicking OK, select the databases and tables. Depending on the data source, select a schema if required.
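The syntax of NineData extraction expressions is not described in this section. To illustrate what "extract all tables that match the expression" means, the following sketch uses a Python regular expression purely as a stand-in; the pattern, the database and table names, and the `match_tables` helper are hypothetical and do not represent NineData syntax.

```python
import re

# Hypothetical physical tables found in the databases of a database group.
physical_tables = {
    "logical_db_00": ["test_time_00", "test_time_01", "audit_log"],
    "logical_db_01": ["test_time_02", "test_time_03", "tmp_test_time"],
}

# Stand-in for an extraction expression: a Python regex, NOT NineData syntax.
pattern = re.compile(r"test_time_\d{2}")


def match_tables(tables: dict[str, list[str]], expr: re.Pattern) -> list[str]:
    """Return db.table names whose table name fully matches the expression."""
    return [
        f"{db}.{table}"
        for db, names in tables.items()
        for table in names
        if expr.fullmatch(table)
    ]


print(match_tables(physical_tables, pattern))
# Expected: the four test_time_NN tables; audit_log and tmp_test_time are skipped.
```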

Step 2: Create a data replication task

Sign in to the NineData Console.
In the left navigation pane, click Replication > Data Replication.
On the Replication page, click Create Replication.

On the Source & Target tab, configure the fields in the table, and click Next.

Parameter	Description
Name	Enter a name for the data synchronization task. To make the task easier to find and manage later, use a meaningful name. Up to 64 characters are supported.
Source	The data source that contains the objects to synchronize.
Datahub Project	Select the target Datahub Project. Data from the source data source will be written to the specified Project.
Type	Select the replication type. Schema: Synchronize only the database and table schemas of the source data source, without synchronizing data. Full: Synchronize all objects and data from the source data source (full data replication). The switch on the right enables periodic full replication. For more information, see Periodic Full Replication.
Spec (Not available when only Schema is selected)	The specification of the replication task. A larger specification provides a higher replication rate. Hover over the icon to view the rate and configuration of each specification. Each specification shows its available quantity and total quantity. When the available quantity is 0, the specification is grayed out and cannot be selected.
If target table already exists (Required when Schema is selected)	Pre-Check Error and Stop Task: Stop the task when a table with the same name is detected during the precheck stage. Skip and Continue Task: When a table with the same name is detected during the precheck stage, display a message and continue the task. During schema replication, ignore the table with the same name. If you also perform data replication, data is appended to the table with the same name and existing data is not overwritten. Delete Objects and Rewrite: When a table with the same name is detected during the precheck stage, display a message and continue the task. During schema replication, delete the table with the same name in the target database and replicate the table schema again based on the source database. If you also perform data replication, data is written after schema replication completes.
Target Table Exists Data (Required when Full is selected)	Pre-Check Error and Stop Task: Stop the task when data is detected in the target table during the precheck stage. Ignore existing target data and append to it: When data is detected in the target table during the precheck stage, keep that data and append the replicated data to it. Clear target existing data before write: When data is detected in the target table during the precheck stage, delete that data before writing the replicated data.
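The difference between the Target Table Exists Data options can be modeled as follows. This Python sketch is an illustration under stated assumptions, not NineData behavior: the enum and function names are hypothetical, rows are plain tuples, and primary-key conflicts during append are not modeled.

```python
from enum import Enum


class ExistingDataPolicy(Enum):
    """Hypothetical names for the three Target Table Exists Data options."""
    STOP = "Pre-Check Error and Stop Task"
    APPEND = "Ignore existing target data and append to it"
    CLEAR = "Clear target existing data before write"


def full_replication(target_rows: list[tuple], source_rows: list[tuple],
                     policy: ExistingDataPolicy) -> list[tuple]:
    """Model the effect of each option on one target table during full replication."""
    if target_rows and policy is ExistingDataPolicy.STOP:
        # The precheck fails and nothing is written.
        raise RuntimeError("precheck failed: target table already contains data")
    if policy is ExistingDataPolicy.CLEAR:
        # Existing rows are deleted before the source rows are written.
        return list(source_rows)
    # APPEND (or STOP with an empty target): keep existing rows, add source rows.
    return list(target_rows) + list(source_rows)


existing = [(100, "old")]
incoming = [(1, "a"), (2, "b")]
print(full_replication(existing, incoming, ExistingDataPolicy.APPEND))  # [(100, 'old'), (1, 'a'), (2, 'b')]
print(full_replication(existing, incoming, ExistingDataPolicy.CLEAR))   # [(1, 'a'), (2, 'b')]
```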

On the Objects tab, configure the parameters in the table, and click Next.

Parameter	Description
Replication Objects	Select the content to replicate. Select All Objects to replicate all content in the source database, or select Customized Object, select the content to replicate in the Source Object list, and click > to add it to the Target Object list on the right.
Blacklist	Click Add to add a blacklist record, and select the database or object to add to the blacklist. The selected content will not be replicated. This is used to exclude specific databases or objects when performing full-database replication for Customized Object or Full Instance. Left drop-down list: Select the database name to add to the blacklist. Right drop-down list: Select objects in the corresponding database. Select multiple objects if required. Leave it empty to add the entire database to the blacklist. To add multiple databases to the blacklist, click the Add button below to add another row.


To create multiple replication tasks with the same replication objects, import a configuration file. Click Import Config, click Download Template to download the template, edit the file, and then click Upload to upload it and import the objects in bulk. The configuration file uses these fields:

Parameter	Description
`source_table_name`	The source table name of the object to synchronize.
`destination_table_name`	The target table name that receives the synchronized object.
`source_schema_name`	The source schema name of the object to synchronize.
`destination_schema_name`	The target schema name that receives the synchronized object.
`source_database_name`	The source database name of the object to synchronize.
`target_database_name`	The target database name that receives the synchronized object.
`column_list`	The list of columns to synchronize.
`extra_configuration`	Additional configuration in JSON format. Supported keys: `column_rules` defines column mappings and value rules, where `column_name` is the original column name, `destination_column_name` is the target column name, and `column_value` is the column value, which can be an SQL function or a constant value; `filter_condition` specifies a row-level filtering condition, and only rows that meet the condition are replicated.

Tip

Example of extra_configuration:

{
  "extra_config": {
    "column_rules": [
      {
        "column_name": "created_time",
        "destination_column_name": "migrated_time",
        "column_value": "current_timestamp()"
      }
    ],
    "filter_condition": "id != 0"
  }
}

In this example, the created_time column is mapped to migrated_time, the value written to migrated_time is set to current_timestamp(), and only rows whose id value is not 0 are synchronized.

For a complete example of the configuration file, see the downloaded template.
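When many rows of the configuration file need an extra_configuration value, generating the JSON programmatically avoids quoting mistakes. The following Python sketch builds a value with the same structure as the example above; the helper names are hypothetical, and omitting optional keys that are not set is an assumption, so check the downloaded template for the exact expectations.

```python
import json
from typing import Optional


def column_rule(column_name: str, destination_column_name: Optional[str] = None,
                column_value: Optional[str] = None) -> dict:
    """One entry of column_rules; optional keys are omitted when not set (assumption)."""
    rule = {"column_name": column_name}
    if destination_column_name is not None:
        rule["destination_column_name"] = destination_column_name
    if column_value is not None:
        rule["column_value"] = column_value
    return rule


def extra_configuration(rules: list[dict], filter_condition: Optional[str] = None) -> str:
    """Serialize the extra_configuration value as compact JSON."""
    config: dict = {"column_rules": rules}
    if filter_condition:
        config["filter_condition"] = filter_condition
    return json.dumps({"extra_config": config}, separators=(",", ":"))


value = extra_configuration(
    [column_rule("created_time", "migrated_time", "current_timestamp()")],
    filter_condition="id != 0",
)
print(value)
```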

On the Mapping tab, configure the mapping that matches the selected replication type, then click Save and Pre-Check. If source or target metadata changes while you configure mappings, click Refresh Metadata to reload it.
- If Type includes Schema: Configure the names that the tables will have in the target data source.
- If Type does not include Schema: NineData selects the database with the same name in the target data source by default. If no such database exists, select the target database manually. The table names and column names in the target database must match those of the synchronization objects. If they do not match, map the table names and column names manually.
Other available actions:
- Click Mapping & Filtering to customize the column names after synchronization to the target data source.
- On the Mapping & Filtering page, click Data Filter to configure filtering conditions by using comparison expressions. Only data that meets the filtering conditions is synchronized to the target data source. For example, if the filtering condition is set to emp_no>=10005, data whose emp_no column value is less than 10005 is not synchronized to the target data source.
- Click the icon to the right of "Target Table" to search for a table name and replace it with the target name.
- Enter a table name in the Search Table text box to quickly locate the target table.
- Click Batch Configuration to define common rules in batches, such as table name and column name case conversion, prefix or suffix addition, and replacement. Use this option to apply mapping configuration to many tables and columns at the same time.
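The Batch Configuration rules amount to applying a sequence of string transformations to every source name. The Python sketch below illustrates that idea; the rule helpers and `apply_rules` are hypothetical and do not reflect how NineData stores or orders rules.

```python
from typing import Callable

# A rule maps one name to another (hypothetical model of a batch rule).
Rule = Callable[[str], str]


def lower_case() -> Rule:
    return str.lower


def add_prefix(prefix: str) -> Rule:
    return lambda name: prefix + name


def add_suffix(suffix: str) -> Rule:
    return lambda name: name + suffix


def replace(old: str, new: str) -> Rule:
    return lambda name: name.replace(old, new)


def apply_rules(names: list[str], rules: list[Rule]) -> dict[str, str]:
    """Map each source name to its target name by applying rules in order."""
    mapping = {}
    for name in names:
        target = name
        for rule in rules:
            target = rule(target)
        mapping[name] = target
    return mapping


mapping = apply_rules(
    ["Orders_00", "Orders_01", "Users"],
    [lower_case(), replace("orders", "order"), add_suffix("_bak")],
)
print(mapping)
# {'Orders_00': 'order_00_bak', 'Orders_01': 'order_01_bak', 'Users': 'users_bak'}
```

Because the rules run in sequence, their order matters: here lowercasing runs first, so the replacement matches `orders` rather than `Orders`.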

On the Pre-check tab, wait for NineData to complete the precheck. After the precheck passes, click Launch.
- Select Enable data consistency comparison to start a data consistency comparison task based on the source data source after synchronization completes. The comparison start time depends on the selected Type:
  - Schema: Starts after schema replication completes.
  - Schema+Full: Starts after full replication completes.
  - Full: Starts after full replication completes.
- If the precheck fails, click Details in the Actions column for the failed check item, review the cause, fix the issue, and then click Check Again to run the precheck again until it passes.
- Items whose Result is Warning can be fixed or ignored as needed.
On the Launch page, the Launch Successfully message appears, indicating that the synchronization task has started. You can then:
- Click View Details to view the execution status of each stage of the synchronization task.
- Click Back to list to return to the Replication task list page.

Step 3: View the synchronization result

Sign in to the NineData Console.
In the left navigation pane, select Replication > Data Replication.

On the Replication page, click the Task ID of the target synchronization task to open the task details page. The following table describes the page.


Number	Function	Description
1	Configure Alerts	When the task fails, NineData notifies the selected channel through the configured alert. For more information, see Operational Monitoring Overview.
2	More	Pause: Pause the task. Available only for tasks whose status is Running. Duplicate: Create a new replication task with the same configuration as the current task. Terminate: End a task that has not completed or is listening (performing incremental synchronization). A terminated task cannot be restarted, so proceed with caution. If the synchronization objects include triggers, trigger replication options appear for selection. Delete: Delete the task. A deleted task cannot be recovered, so proceed with caution.
3	Structural Replication (displayed when the task includes structure replication)	Displays the progress and details of structure replication. Click Log to view the execution log of structure replication. Click the refresh icon to view the latest information. Click View DDL in the Actions column of an object in the list to view the DDL statements replayed for that object.
4	Full Replication (displayed when the task includes full replication)	Displays the progress and details of full replication. Click Monitor to view monitoring metrics for full replication. During full replication, click Flow Control Settings on the monitoring page to limit the number of rows written to the target data source per second (unit: rows/second). Click Log to view the execution log of full replication. Click the refresh icon to view the latest information.
5	Data Comparison	Displays the comparison results between the source and target data sources. If data comparison is not enabled, click Enable Comparison on the page to enable it. Click Re-compare to rerun the comparison for the current source and target data sources. Click Stop to stop a running comparison task. Click Log to view the execution log of the consistency comparison. Click Monitor (displayed only for data comparison) to view the trend chart of RPS (records compared per second). Click Details to view earlier comparison records. When inconsistencies are found, click the icon in the Actions column of the comparison list (displayed only on the Data tab) to view a detailed comparison of the source and target data. When inconsistencies are found, click the icon in the Actions column of the comparison list to generate change SQL. Copy this SQL to the target data source and run it to fix the mismatch.
6	Expand	Displays detailed information of the current replication task. Common actions: Export table configuration: Export the current task's database and table configuration for quick import when creating another replication task with the same objects. Alert Rules: Configure alerts for the current task.
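The comparison algorithm NineData uses is not described here. As a rough illustration of what "inconsistencies" and "change SQL" mean, the following Python sketch compares rows keyed by primary key and emits statements that would make the target match the source. The `change_sql` and `sql_literal` helpers, table name, and column names are hypothetical, and literal quoting is simplified to integers, strings, and NULL.

```python
def sql_literal(value) -> str:
    """Render a Python value as a simple SQL literal (ints, strings, None only)."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def change_sql(table: str, pk: str, source: dict, target: dict) -> list[str]:
    """Compare rows keyed by primary key; return SQL that aligns target with source.

    Each argument maps a primary-key value to a dict of the remaining columns.
    """
    statements = []
    for key in sorted(source.keys() | target.keys()):
        src, dst = source.get(key), target.get(key)
        if src == dst:
            continue  # consistent row
        if dst is None:
            # Row missing in the target.
            cols = ", ".join([pk, *src])
            vals = ", ".join(sql_literal(v) for v in [key, *src.values()])
            statements.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
        elif src is None:
            # Extra row in the target.
            statements.append(f"DELETE FROM {table} WHERE {pk} = {sql_literal(key)};")
        else:
            # Row present on both sides with different column values.
            sets = ", ".join(f"{c} = {sql_literal(v)}" for c, v in src.items() if dst.get(c) != v)
            statements.append(f"UPDATE {table} SET {sets} WHERE {pk} = {sql_literal(key)};")
    return statements


source = {1: {"name": "alice"}, 2: {"name": "bob"}, 3: {"name": "carol"}}
target = {1: {"name": "alice"}, 2: {"name": "bobby"}, 4: {"name": "dave"}}
for stmt in change_sql("users", "id", source, target):
    print(stmt)
# UPDATE users SET name = 'bob' WHERE id = 2;
# INSERT INTO users (id, name) VALUES (3, 'carol');
# DELETE FROM users WHERE id = 4;
```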

Appendix: Routing algorithm

Use routing algorithms to route data to target sharded databases and tables. In routing algorithm configuration, define the target database and table with the following expressions:

<dbname_expression>.<tablename_expression>

<dbname_expression>: Database name expression in the format: '<dbname_prefix>'+(<expression>)+'<dbname_suffix>'.

'<dbname_prefix>': The prefix of the database name, such as 'logical_db_0'.
(<expression>): The dynamic numeric part of the database name, for example #user_id#%4. If the value of the user_id column is 1, 1 % 4 = 1, so combining the result with the prefix gives the database name logical_db_01. In routing algorithms, NineData requires column names to be enclosed in # so that they can be parsed.
'<dbname_suffix>': The suffix of the database name. This value is optional. For example, use '_bak' to produce the final database name logical_db_01_bak.

Example: The following <dbname_expression> examples assume that #user_id#%4 evaluates to 0:

Target Database Name	`<dbname_expression>` Example
logical_db_01	'logical_db_0'+(#user_id#%4+1)
logical_db_01_bak	'logical_db_0'+(#user_id#%4+1)+'_bak'
logical_db_00	'logical_db_0'+(#user_id#%4)
logical_db_1	'logical_db_'+(#user_id#%4+1)

.<tablename_expression>: Table name expression in the format: '.<tablename_prefix>'+(<expression>)+'<tablename_suffix>'.

'.<tablename_prefix>': The prefix of the table name, such as '.test_time_0'. The dot (.) indicates the table belongs to the preceding database.
(<expression>): The dynamic numeric part of the table name, for example #user_id#%4. If the value of the user_id column is 1, 1 % 4 = 1, so combining the result with the prefix gives the table name test_time_01. In routing algorithms, NineData requires column names to be enclosed in # so that they can be parsed.
'<tablename_suffix>': The suffix of the table name. This value is optional. For example, use '_bak' to produce the final table name test_time_01_bak.

Example: The following .<tablename_expression> examples assume that #user_id#%4 evaluates to 0:

Target Table Name	`<tablename_expression>` Example
test_time_01	'.test_time_0'+(#user_id#%4+1)
test_time_01_bak	'.test_time_0'+(#user_id#%4+1)+'_bak'
test_time_00	'.test_time_0'+(#user_id#%4)
test_time_1	'.test_time_'+(#user_id#%4+1)

Using the examples above, if the value of the user_id column is 0, the following routing algorithm routes data to the test_time_01 table in the logical_db_01 database:

'logical_db_0'+(#user_id#%4+1)'.test_time_0'+(#user_id#%4+1)
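To make the string-building rules above concrete, the following Python sketch evaluates expressions of the form 'prefix'+(<expression>)+'suffix' for a single row. It is a hypothetical helper, not the NineData parser: it accepts only one parenthesized integer expression using +, -, *, % and // over #column# placeholders, assumes non-negative integer column values (so that % gives the same result as MySQL MOD), and concatenates the database and table parts directly, as in the combined example.

```python
import ast
import operator
import re

# Arithmetic operators the sketch accepts inside (<expression>).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Mod: operator.mod,
    ast.FloorDiv: operator.floordiv,
}

# 'prefix'+(<expression>) with an optional +'suffix'.
_EXPR = re.compile(r"^'([^']*)'\+\((.+)\)(?:\+'([^']*)')?$")


def _eval(node: ast.AST) -> int:
    """Evaluate a restricted integer arithmetic AST (no names, no calls)."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def evaluate(expr: str, row: dict) -> str:
    """Evaluate 'prefix'+(<expression>)+'suffix' for one row."""
    m = _EXPR.match(expr)
    if not m:
        raise ValueError(f"unrecognized expression: {expr}")
    prefix, body, suffix = m.group(1), m.group(2), m.group(3) or ""
    # Replace each #column# placeholder with the row's value for that column.
    body = re.sub(r"#(\w+)#", lambda c: str(row[c.group(1)]), body)
    number = _eval(ast.parse(body, mode="eval"))
    # The number is concatenated as text: 'logical_db_0' + 1 -> 'logical_db_01'.
    return f"{prefix}{number}{suffix}"


def route(db_expr: str, table_expr: str, row: dict) -> str:
    """Return the target db.table; the table expression already starts with '.'."""
    return evaluate(db_expr, row) + evaluate(table_expr, row)


row = {"user_id": 0}
print(route("'logical_db_0'+(#user_id#%4+1)", "'.test_time_0'+(#user_id#%4+1)", row))
# logical_db_01.test_time_01
print(evaluate("'logical_db_0'+(#user_id#%4+1)+'_bak'", row))  # logical_db_01_bak
print(evaluate("'logical_db_'+(#user_id#%4+1)", {"user_id": 6}))  # logical_db_3
```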
