Replicating Sharded Databases and Tables
NineData supports data replication for MySQL sharded databases and tables, allowing you to replicate data from multiple sharded databases and tables to other sharded databases or aggregate them into another MySQL instance.
Prerequisites
The source and target data sources must be added to NineData. For details on how to add data sources, see Adding Data Sources.
Usage Restrictions
- When creating database table groups, the physical sharded tables must exist and be extractable using an extraction expression.
- The data replication feature only applies to user databases in the data source. System databases will not be replicated. For example, in MySQL data sources,
information_schema
,mysql
,performance_schema
, andsys
databases will not be replicated. - The account for the source data source must have
SELECT
privileges for the replication objects (structure replication, full data replication),SHOW VIEW
privileges (for view replication), andREPLICATION CLIENT
andREPLICATION SLAVE
privileges (for incremental replication). The account for the target data source must have DML and DDL privileges. - Before performing data synchronization, assess the performance of both the source and target data sources. It is recommended to run data synchronization during off-peak business hours to avoid high load on the databases during the full data initialization, which consumes read and write resources.
- Only table objects are supported for replication; other objects will be automatically ignored.
- Ensure that every table in the synchronization object has a partition key, primary key, or unique constraint. Additionally, when using the sharding routing algorithm, the two rows of data routed to the same target table must not have the same primary key or unique constraint, or new data will overwrite the old data.
Step 1: Create Database Table Group
You need to add the sharded databases and tables from the source data source into NineData’s database table group to enable their migration. If your replication target also involves sharded databases and tables, follow this step to create a target database table group as well.
Log in to the NineData Console.
In the left navigation pane, click > .
Click the tab and click on the page.
Configure the form according to the table below, and click .
Parameter Description Enter the name of the database group, using only English characters, numbers, and underscores, starting with an English letter. To facilitate future use and management, use a meaningful name. (optional) Enter a business description for this database group. Select the name of the environment your business belongs to, which will help you filter the data sources. Click , select the data sources to add to the group. You can select multiple data sources, select all, or deselect them. You can also search by data source name. After clicking , you will need to select the specific sharded databases. The page will automatically redirect to . Click .
Configure the form according to the table below, and click .
Parameter Description Enter the name of the table group, using only English characters, numbers, and underscores, starting with an English letter. To facilitate future use and management, use a meaningful name. Supports and . When is set to , you need to configure this option. Enter an expression and click . The system will automatically traverse and extract all qualifying tables in the target database based on the expression provided. (optional) When is set to , you can configure this parameter based on the routing algorithm used in your application, which allows quick resolution of the tables to be accessed. For the configuration method of , please refer to the Appendix of this document. When is set to , you need to configure this. Click , select the data sources to add to the group. You can select multiple data sources, select all, or deselect them. You can also search by data source name. After clicking , you will need to select specific databases and tables. Depending on the data source, you may also need to select a specific schema.
Step 2: Create Data Replication Task
NineData’s data replication product has been commercialized. You can still use 10 replication tasks for free, with the following considerations:
Among the 10 replication tasks, you can include 1 task, with a specification of Micro.
Tasks with a status of do not count towards the 10-task limit. If you have already created 10 replication tasks and want to create more, you can terminate previous replication tasks and then create new ones.
When creating replication tasks, you can only select the you have purchased. Specifications that have not been purchased will be grayed out and cannot be selected. If you need to purchase additional specifications, please contact us through the customer service icon at the bottom right of the page.
Log in to the NineData Console.
In the left navigation bar, click .
On the page, click in the upper right corner.
In the tab, configure the settings according to the table below, and click .
Parameter Description Enter the name of the data synchronization task. To make it easier to find and manage later, try to use a meaningful name. Maximum 64 characters. Select the database group where the synchronization object is located. The target data source or database group to receive the synchronized object. Choose the content to be copied to the target data source or database group. - : Only synchronize the structure of the source database group, not the data.
- : Synchronize all objects and data from the source database group, i.e., full data replication. The switch on the right is for periodic full replication. For more information, see Periodic Full Replication.
(required if is selected) - : If a table with the same name is detected during the pre-check, stop the task.
- : If a table with the same name is detected during the pre-check, continue the task with a warning. The table will be skipped during schema replication. If you also perform data replication, the data will be appended to the same-name table without overwriting existing data.
- : If a table with the same name is detected during the pre-check, continue the task with a warning. During schema replication, the target table will be dropped and recreated based on the source schema. If you also perform data replication, the data will be written after the schema replication.
- (optional if performing both schema and data replication): If a table with the same name is detected during the pre-check, continue the task with a warning. The target table schema will be retained, and the data will be cleared before writing new data from the source table.
(required if is not selected) - : Stop the task if data is found in the target table during the pre-check.
- : Ignore existing data in the target table and append new data.
- : Delete existing data in the target table and replace it with new data from the source.
In the tab, configure the following parameters, then click .
Parameter Description Choose the content to be copied. You can select to copy all contents of the source database, or to manually select objects in the list and add them to the list on the right. (optional) Click to add a blacklist record, selecting databases or objects that will not be copied. This is used for excluding certain databases or objects during full or custom replication. - Left dropdown: Select the database to be blacklisted.
- Right dropdown: Select specific objects within the database, or leave it empty to blacklist the entire database.
If you need to create multiple replication routes with the same objects, you can create a configuration file and import it when creating a new task. Click in the top right, then click Download Template to download the configuration file template. After editing, upload the file using to import it in bulk. Configuration file instructions:
Parameter Description source_table_name
Name of the source table containing the object to be synchronized. destination_table_name
Name of the destination table to receive the synchronized object. source_schema_name
Schema name of the source table containing the object. destination_schema_name
Schema name of the destination table receiving the object. source_database_name
Source database name. target_database_name
Target database name. column_list
List of fields to be synchronized. extra_configuration
Additional configurations for field mapping, value transformation, or filtering conditions. - Field mapping:
column_name
,destination_column_name
- Field value:
column_value
- Data filtering:
filter_condition
tipExample content for
extra_configuration
:{
"column_name": "created_time",
"destination_column_name": "migrated_time",
"column_value": "current_timestamp()",
"filter_condition": "id != 0"
}For a full example, see the downloaded template.
In the tab, choose different operations based on the type of replication, then click . If there are updates to the source or target data sources, click in the top right to retrieve the latest information.
If is included: Configure the target table's name in the target data source after synchronization.
If is not included: The system defaults to selecting a target database with the same name. If the target database does not exist, you must manually choose it. The table and column names in the target database should match the source. You can also manually map table and column names if they differ.
In addition, you can perform ETL (Extract, Transform, Load) operations:
Use : For example, to remove trailing spaces from a field name, click next to the target table, find the desired field, and enter
removepattern(x,' +$')
in the field, then click and .Add a new column to the target table: For example, to add a
hospital_code
column with a constant value, click next to the target table, then click under , fill in the column name, expression, and type, and click .Set : Click on the right side of the page, and in the field, enter
emp_no>=10005
to exclude rows where emp_no is less than 10005 from being synchronized to the target data source.
In the tab, wait for the system to complete the pre-check. After the pre-check passes, click .
You can select to automatically start a data consistency comparison after the synchronization task is completed, ensuring data consistency between the source and destination. Depending on the you select, will start at the following times:
- : After schema replication is completed.
- + , : After full replication is completed.
If the pre-check fails, click in the column next to the failed check item to troubleshoot the issue, then click to rerun the pre-check until it passes.
For items marked with , you can decide whether to fix or ignore them based on the specific situation.
In the page, once appears, the synchronization task will begin. At this point, you can:
Click to view the progress of each stage of the synchronization task.
Click to return to the task list page.
Step 3: View Synchronization Results
Log in to the NineData console.
Click on > in the left navigation pane.
On the page, click on the of the target synchronization task, and the page description is as follows.
Number Function Description 1 Configure Alerts After configuring alerts, the system will notify you in the selected way when the task fails. For more information, please refer to Operational Monitoring Overview. 2 More - : Pause the task. Only tasks with the status Running are selectable.
- : Create a new replication task with the same configuration as the current task.
- : End tasks that are incomplete or in listening (i.e., in incremental synchronization). After terminating the task, it cannot be restarted, so please proceed with caution. If triggers are included in the synchronization object, trigger replication options will pop up, please choose as needed.
- : Delete the task. Once the task is deleted, it cannot be recovered, so please proceed with caution.
3 Structural Replication (Displayed in scenarios involving structural replication) Display the progress and details of structural replication. - Click on on the right side of the page: View the execution log of structural replication.
- Click on on the right side of the page: View the latest information.
- Click on in the column on the right side of the target object in the list: View SQL playback.
4 Full Replication (Displayed in scenarios involving full replication) Display the progress and details of full replication. - Click on on the right side of the page: View various monitoring indicators during full replication. During full replication, you can also click on on the right side of the monitoring indicator page to limit the rate of data written to the target data source per second. The unit is rows/second.
- Click on on the right side of the page: View the execution log of full replication.
- Click on on the right side of the page: View the latest information.
5 Data Comparison Display the comparison results between the source data source and the target data source. If you have not enabled data comparison, click on the page. - Click on on the right side of the page: Re-initiate the comparison of data between the current source and target ends.
- Click on on the right side of the page: After starting the comparison task, you can click this button to stop the comparison task immediately.
- Click on on the right side of the page: View the execution log of consistency comparison.
- Click on (only displayed in data comparison): View the trend chart of RPS (records per second compared) for comparison. Click on to view records from earlier times.
- Click on in the column on the right side of the comparison list (displayed only under the tab when there is inconsistency): View detailed comparison between the source and target ends.
- Click on in the column on the right side of the comparison list (displayed only in case of inconsistency): Generate change SQL, which you can directly copy to the target data source to execute and modify the inconsistent content.
6 Expand Display detailed information of the current replication task. Common Options: - : Export the current task's database and table configuration, allowing for quick import when creating a new replication task. This helps rapidly establish multiple replication links with the same replication objects.
- : Configure the alarm strategy for the current task.
Appendix: Routing Algorithm Description
The main function of routing algorithms is to automatically complete data routing. In routing algorithm configuration, target sharding of databases and tables is defined through the following expressions:
'<dbname_expression>''.<tablename_expression>'
<dbname_expression>
: Database name expression in the format:'<dbname_prefix>'
+(<expression>)
+'<dbname_suffix>'
.'<dbname_prefix>'
: The prefix of the database name, such as'logical_db_0'
.(<expression>)
: The dynamic numeric part of the database name composition, e.g.,#user_id#%4
. Suppose the value of theuser_id
column is1
. Dividing1
by4
and taking the remainder gives1
, resulting in a database name oflogical_db_01
when combined with the prefix. NineData specifies using#
to enclose field names in routing algorithms for easier parsing.'<dbname_suffix>'
: The suffix of the database name, which can be configured as needed or left empty, e.g.,'_bak'
. The final database name would belogical_db_01_bak
.
Example: If the result of
#user_id#%4
is0
, the corresponding database routing<dbname_expression>
would be written as follows:Target Database Name <dbname_expression>
Examplelogical_db_01 'logical_db_0'+(#user_id#%4+1) logical_db_01_bak 'logical_db_0'+(#user_id#%4+1)+'_bak' logical_db_00 'logical_db_0'+(#user_id#%4) logical_db_1 'logicaldb'+(#user_id#%4+1) .<tablename_expression>
: Table name expression in the format:'.<tablename_prefix>'
+(<expression>)
+'<tablename_suffix>'
.'.<tablename_prefix>'
: The prefix of the table name, such as'.test_time_0'
. The dot (.
) indicates the table belongs to the preceding database.(<expression>)
: The dynamic numeric part of the table name composition, e.g.,#user_id#%4
. Suppose the value of theuser_id
column is1
. Dividing1
by4
and taking the remainder gives1
, resulting in a table name oftest_time_01
when combined with the prefix. NineData specifies using#
to enclose field names in routing algorithms for easier parsing.'<tablename_suffix>'
: The suffix of the table name, which can be configured as needed or left empty, e.g.,'_bak'
. The final table name would be.test_time_01_bak
.
Example: If the result of
#user_id#%4
is0
, the corresponding table routing.<tablename_expression>
would be written as follows:Target Table Name <tablename_expression>
Exampletest_time_01 '.test_time_0'+(#user_id#%4+1) test_time_01_bak '.test_time_0'+(#user_id#%4+1)+'_bak' test_time_00 '.test_time_0'+(#user_id#%4) test_time_1 '.testtime'+(#user_id#%4+1)
Using the examples above, if the value of the user_id
column is 0
, the following routing algorithm will route to the test_time_01
table in the logical_db_01
database:
'logical_db_0'+(#user_id#%4+1)'.test_time_0'+(#user_id#%4+1)