Replicating Sharded Databases and Tables
NineData supports data replication for MySQL sharded databases and tables, allowing you to replicate data from multiple sharded databases and tables to other sharded databases or aggregate them into another MySQL instance.
Prerequisites
The source and target data sources must be added to NineData. For details on how to add data sources, see Adding Data Sources.
Usage Restrictions
- When creating database table groups, the physical sharded tables must exist and be extractable using an extraction expression.
- The data replication feature only applies to user databases in the data source. System databases will not be replicated. For example, in MySQL data sources,
information_schema,mysql,performance_schema, andsysdatabases will not be replicated. - The account for the source data source must have
SELECTprivileges for the replication objects (structure replication, full data replication),SHOW VIEWprivileges (for view replication), andREPLICATION CLIENTandREPLICATION SLAVEprivileges (for incremental replication). The account for the target data source must have DML and DDL privileges. - Before performing data synchronization, assess the performance of both the source and target data sources. It is recommended to run data synchronization during off-peak business hours to avoid high load on the databases during the full data initialization, which consumes read and write resources.
- Only table objects are supported for replication; other objects will be automatically ignored.
- Ensure that every table in the synchronization object has a partition key, primary key, or unique constraint. Additionally, when using the sharding routing algorithm, the two rows of data routed to the same target table must not have the same primary key or unique constraint, or new data will overwrite the old data.
Step 1: Create Database Table Group
You need to add the sharded databases and tables from the source data source into NineData’s database table group to enable their migration. If your replication target also involves sharded databases and tables, follow this step to create a target database table group as well.
Log in to the NineData Console.
In the left navigation pane, click Datasource > Datasource.
Click the Database Grouping tab and click Create Database Grouping on the page.
Configure the form according to the table below, and click Create Database Grouping.
Parameter Description Database Grouping Name Enter the name of the database group, using only English characters, numbers, and underscores, starting with an English letter. To facilitate future use and management, use a meaningful name. Description (optional) Enter a business description for this database group. Environment Select the name of the environment your business belongs to, which will help you filter the data sources. Database Click Add Datasource, select the data sources to add to the group. You can select multiple data sources, select all, or deselect them. You can also search by data source name. After clicking OK, you will need to select the specific sharded databases. The page will automatically redirect to Database Grouping Detail. Click Create Table Grouping.
Configure the form according to the table below, and click Create Table Grouping.
Parameter Description Table Grouping Name Enter the name of the table group, using only English characters, numbers, and underscores, starting with an English letter. To facilitate future use and management, use a meaningful name. Methods Supports Add Manually and Add by Expression. Add by Expression When Methods is set to Add by Expression, you need to configure this option. Enter an expression and click Auto Fetch. The system will automatically traverse and extract all qualifying tables in the target database based on the expression provided. Routing Algorithm (optional) When Methods is set to Add by Expression, you can configure this parameter based on the routing algorithm used in your application, which allows quick resolution of the tables to be accessed. For the configuration method of Routing Algorithm, please refer to the Appendix of this document. Database When Methods is set to Add Manually, you need to configure this. Click Add Datasource, select the data sources to add to the group. You can select multiple data sources, select all, or deselect them. You can also search by data source name. After clicking OK, you will need to select specific databases and tables. Depending on the data source, you may also need to select a specific schema.
Step 2: Create Data Replication Task
Log in to the NineData Console.
In the left navigation pane, click Replication > Data Replication.
On the Replication page, click Create Replication in the upper-right corner.
On the Source & Target tab, configure the parameters in the following table, and click Next.
Parameter Description Name Enter a name for the data synchronization task. To make the task easier to find and manage later, use a meaningful name. Up to 64 characters are supported. Source The data source that contains the objects to synchronize. Target The data source that receives the synchronized objects. Target Object Name Select the case conversion rule for object names after they are migrated from the source to the target. - Convert all to Lowercase: Regardless of the naming rule on the source, all target names are lowercase.
- Consistent with Source: Keep the naming rule of the source.
- Convert all to Uppercase: Regardless of the naming rule on the source, all target names are uppercase.
Type Select the replication type. - Schema: Synchronize only the database and table schemas of the source data source, without synchronizing data.
- Full: Synchronize all objects and data from the source data source, namely full data replication. The switch on the right enables periodic full replication. For more information, see Periodic Full Replication.
Spec The specification of the replication task. A larger specification provides a higher replication rate. Hover over the icon to view the rate and configuration information of each specification. If you configure the data replication task before purchasing resources, you can select the required specification here. If you purchase resources before configuring the task, the system selects the specification chosen during resource purchase, and you cannot change it in the task configuration.
If target table already exists (Required when Schema is selected) - Pre-Check Error and Stop Task: Stop the task when a table with the same name is detected during the precheck stage.
- Skip and Continue Task: When a table with the same name is detected during the precheck stage, display a prompt and continue the task. During schema replication, ignore the table with the same name. If you also perform data replication, data is appended to the table with the same name and existing data is not overwritten.
- Delete Objects and Rewrite: When a table with the same name is detected during the precheck stage, display a prompt and continue the task. During schema replication, delete the table with the same name in the target database and replicate the table schema again based on the source database. If you also perform data replication, data is written after schema replication is complete.
Target Table Exists Data (Required when Full is selected) - Pre-Check Error and Stop Task: Stop the task when data is detected in the target table during the precheck stage.
- Ignore existing target data and append to it.: When data is detected in the target table during the precheck stage, ignore that data and append other data.
- Clear target existing data before write: When data is detected in the target table during the precheck stage, delete that data and write it again.
On the Objects tab, configure the following parameters, and click Next.
Parameter Description Replication Objects Select the content to replicate. You can select All Objects to replicate all content in the source database, or select Customized Object, select the content to replicate in the Source Object list, and click > to add it to the Target Object list on the right. Blacklist Click Add to add a blacklist record, and select the database or object to add to the blacklist. The selected content will not be replicated. This is used to exclude specific databases or objects when performing full-database replication for Customized Object or Full Instance. - Left drop-down list: Select the database name to add to the blacklist.
- Right drop-down list: Select objects in the corresponding database. You can click multiple objects to select them. Leave it empty to add the entire database to the blacklist.
If you need to create multiple replication links with the same replication objects, you can create a configuration file and import it when creating a task. Click Import Config in the upper-right corner, click Download Template to download the configuration file template, edit the file, and then click Upload to upload it for batch import. Configuration file description:
Parameter Description source_table_nameThe source table name of the object to synchronize. destination_table_nameThe target table name that receives the synchronized object. source_schema_nameThe source schema name of the object to synchronize. destination_schema_nameThe target schema name that receives the synchronized object. source_database_nameThe source database name of the object to synchronize. target_database_nameThe target database name that receives the synchronized object. column_listThe list of columns to synchronize. extra_configurationAdditional configuration information. You can configure the following information here: column_rules: Defines column mappings and value rules. Field descriptions:column_name: Original column name.destination_column_name: Specifies the target column name.column_value: Specifies the column value, which can be an SQL function or a constant value.
filter_condition: Specifies row-level data filtering conditions. Only rows that meet the conditions are replicated.
tipThe following is an example of
extra_configuration:{
"extra_config":{
"column_rules":[
{
"column_name": "created_time", // Specifies the original column name to be mapped.
"destination_column_name": "migrated_time", // Maps the target column name to "migrated_time".
"column_value": "current_timestamp()" // Changes the column value to the current timestamp.
}
],
"filter_condition": "id != 0" // Only rows whose ID is not 0 are synchronized.
}
}For a complete example of the configuration file, see the downloaded template.
On the "Mapping" tab, choose different operations based on the selected replication type, and click Save and Pre-Check. If the source and target data sources are updated during mapping configuration, click Refresh Metadata in the upper-right corner of the page to refresh the source and target data source information.
Includes Schema: Configure the table name after synchronization to the target data source.
Does not include Schema: The system selects the database with the same name in the target data source by default. If no such database exists, you need to manually select the target database. The table names and column names in the target database must match the synchronization objects. If they do not match, you can manually map the table names and column names.
You can also perform the following operations:
- Click Mapping & Filtering on the right side of the page to customize the column names after synchronization to the target data source.
- On the Mapping & Filtering page, enter a comparison expression in the text box below Data Filter as the filtering condition. Only data that meets the filtering condition is synchronized to the target data source. For example, if the filtering condition is set to
emp_no>=10005, data whose emp_no column value is less than 10005 is not synchronized to the target data source. - Click the
icon to the right of "Target Table" to search for a table name and replace it with the target name.
- Enter a table name in the Search Table text box in the upper-right corner to quickly locate the target table.
- Click Batch Configuration in the upper-right corner to define common rules in batches, such as table name and column name case conversion, prefix or suffix addition, and replacement, so that mapping configuration can be applied uniformly to a large number of tables and columns.
On the Pre-check tab, wait for the system to complete the precheck. After the precheck is passed, click Launch.
You can select Enable data consistency comparison. After the synchronization task is complete, a data consistency comparison task based on the source data source automatically starts to ensure data consistency between both ends. Based on the selected Type, Enable data consistency comparison starts at the following times:
- Schema: Starts after schema replication is complete.
- Schema+Full: Starts after full replication is complete.
- Full: Starts after full replication is complete.
If the precheck fails, click Details in the Actions column on the right of the failed check item, troubleshoot the cause, manually fix the issue, and then click Check Again to run the precheck again until it passes.
Check items whose Result is Warning can be fixed or ignored depending on the situation.
On the Launch page, the Launch Successfully prompt appears, indicating that the synchronization task has started. You can then perform the following operations:
Click View Details to view the execution status of each stage of the synchronization task.
Click Back to list to return to the Replication task list page.
Step 3: View Synchronization Results
Log in to the NineData Console.
Click on Replication > Data Replication in the left navigation pane.
On the Replication page, click on the Task ID of the target synchronization task, and the page description is as follows.

Number Function Description 1 Configure Alerts After configuring alerts, the system will notify you in the selected way when the task fails. For more information, please refer to Operational Monitoring Overview. 2 More - Pause: Pause the task. Only tasks with the status Running are selectable.
- Duplicate: Create a new replication task with the same configuration as the current task.
- Terminate: End tasks that are incomplete or in listening (i.e., in incremental synchronization). After terminating the task, it cannot be restarted, so please proceed with caution. If triggers are included in the synchronization object, trigger replication options will pop up, please choose as needed.
- Delete: Delete the task. Once the task is deleted, it cannot be recovered, so please proceed with caution.
3 Structural Replication (Displayed in scenarios involving structural replication) Display the progress and details of structural replication. - Click on Log on the right side of the page: View the execution log of structural replication.
- Click on
on the right side of the page: View the latest information.
- Click on View DDL in the Actions column on the right side of the target object in the list: View SQL playback.
4 Full Replication (Displayed in scenarios involving full replication) Display the progress and details of full replication. - Click on Monitor on the right side of the page: View various monitoring indicators during full replication. During full replication, you can also click on Flow Control Settings on the right side of the monitoring indicator page to limit the rate of data written to the target data source per second. The unit is rows/second.
- Click on Log on the right side of the page: View the execution log of full replication.
- Click on
on the right side of the page: View the latest information.
5 Data Comparison Display the comparison results between the source data source and the target data source. If you have not enabled data comparison, click Enable Comparison on the page. - Click on Re-compare on the right side of the page: Re-initiate the comparison of data between the current source and target ends.
- Click on Stop on the right side of the page: After starting the comparison task, you can click this button to stop the comparison task immediately.
- Click on Log on the right side of the page: View the execution log of consistency comparison.
- Click on Monitor (only displayed in data comparison): View the trend chart of RPS (records per second compared) for comparison. Click on Details to view records from earlier times.
- Click on
in the Actions column on the right side of the comparison list (displayed only under the Data tab when there is inconsistency): View detailed comparison between the source and target ends.
- Click on
in the Actions column on the right side of the comparison list (displayed only in case of inconsistency): Generate change SQL, which you can directly copy to the target data source to execute and modify the inconsistent content.
6 Expand Display detailed information of the current replication task. Common Options: - Export table configuration: Export the current task's database and table configuration, allowing for quick import when creating a new replication task. This helps rapidly establish multiple replication links with the same replication objects.
- Alert Rules: Configure the alarm strategy for the current task.
Appendix: Routing Algorithm Description
The main function of routing algorithms is to automatically complete data routing. In routing algorithm configuration, target sharding of databases and tables is defined through the following expressions:
'<dbname_expression>''.<tablename_expression>'
<dbname_expression>: Database name expression in the format:'<dbname_prefix>'+(<expression>)+'<dbname_suffix>'.'<dbname_prefix>': The prefix of the database name, such as'logical_db_0'.(<expression>): The dynamic numeric part of the database name composition, e.g.,#user_id#%4. Suppose the value of theuser_idcolumn is1. Dividing1by4and taking the remainder gives1, resulting in a database name oflogical_db_01when combined with the prefix. NineData specifies using#to enclose field names in routing algorithms for easier parsing.'<dbname_suffix>': The suffix of the database name, which can be configured as needed or left empty, e.g.,'_bak'. The final database name would belogical_db_01_bak.
Example: If the result of
#user_id#%4is0, the corresponding database routing<dbname_expression>would be written as follows:Target Database Name <dbname_expression>Examplelogical_db_01 'logical_db_0'+(#user_id#%4+1) logical_db_01_bak 'logical_db_0'+(#user_id#%4+1)+'_bak' logical_db_00 'logical_db_0'+(#user_id#%4) logical_db_1 'logicaldb'+(#user_id#%4+1) .<tablename_expression>: Table name expression in the format:'.<tablename_prefix>'+(<expression>)+'<tablename_suffix>'.'.<tablename_prefix>': The prefix of the table name, such as'.test_time_0'. The dot (.) indicates the table belongs to the preceding database.(<expression>): The dynamic numeric part of the table name composition, e.g.,#user_id#%4. Suppose the value of theuser_idcolumn is1. Dividing1by4and taking the remainder gives1, resulting in a table name oftest_time_01when combined with the prefix. NineData specifies using#to enclose field names in routing algorithms for easier parsing.'<tablename_suffix>': The suffix of the table name, which can be configured as needed or left empty, e.g.,'_bak'. The final table name would be.test_time_01_bak.
Example: If the result of
#user_id#%4is0, the corresponding table routing.<tablename_expression>would be written as follows:Target Table Name <tablename_expression>Exampletest_time_01 '.test_time_0'+(#user_id#%4+1) test_time_01_bak '.test_time_0'+(#user_id#%4+1)+'_bak' test_time_00 '.test_time_0'+(#user_id#%4) test_time_1 '.testtime'+(#user_id#%4+1)
Using the examples above, if the value of the user_id column is 0, the following routing algorithm will route to the test_time_01 table in the logical_db_01 database:
'logical_db_0'+(#user_id#%4+1)'.test_time_0'+(#user_id#%4+1)