PostgreSQL to ClickHouse Data Replication

NineData Data Replication supports schema replication and full replication from self-managed PostgreSQL to ClickHouse.

Overview

Use this guide to initialize ClickHouse tables from PostgreSQL, copy full data, and verify the replicated result before analytics or downstream workloads use the target.

NineData data replication supports schema, full data, and incremental data replication between data sources. For supported data sources, it also supports bidirectional replication for geo-distributed active-active architectures.

- Schema replication: Replicates object structures between homogeneous and heterogeneous data sources.
- Full data replication: Uses data sharding and row-level concurrent batch replication to improve throughput. Breakpoint resume helps preserve data accuracy, including for tables without primary keys.
- Incremental data replication: Replicates DML and DDL changes for supported object types. Row-level concurrency and hotspot merge processing help maintain replication throughput.
- Bidirectional real-time data replication (only between MySQL instances): Replicates changes in both directions between nodes so data can stay current across participating nodes.

Use these capabilities for data migration (including low-downtime migration), synchronization, and data integration, with either full or incremental replication.

Before you begin

Add the source and target data sources to NineData. For instructions, see Add Data Source.
Use supported source and target versions.
Source data source	Target data source
PostgreSQL 9 or later	ClickHouse 20.8 or later
The account that connects to ClickHouse must have the access_management parameter set to 1. To configure it, open the users.xml configuration file in /etc/clickhouse-server, find the target user, and add <access_management>1</access_management>.
Make sure the source and target accounts have the following permissions.
Replication type	Source data source permissions	Target data source permissions
Full replication	CONNECT, SELECT	TABLE-related permissions (CREATE, CREATE TABLE, ALTER, ALTER COLUMN, DROP, SELECT, INSERT)
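
The permissions in the table above could be granted with statements like the ones this sketch prints. It is only an illustration: the helper names, the database `shop`, the schema `public`, and the account `ninedata_repl` are hypothetical, and the statements should be reviewed by a DBA before they are run.

```python
import re

# Accept only simple lowercase identifiers so they can be used unquoted.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check(*names):
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")


def postgres_grants(database, schema, user):
    """GRANT statements for the source account (CONNECT, SELECT)."""
    _check(database, schema, user)
    # Note: non-public schemas usually also need USAGE on the schema,
    # which is outside what the permissions table lists.
    return [
        f"GRANT CONNECT ON DATABASE {database} TO {user};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]


def clickhouse_grants(database, user):
    """GRANT statement for the target account, using the listed privileges."""
    _check(database, user)
    privileges = "CREATE, CREATE TABLE, ALTER, ALTER COLUMN, DROP, SELECT, INSERT"
    return [f"GRANT {privileges} ON {database}.* TO {user};"]


# Hypothetical names, for illustration only.
for statement in postgres_grants("shop", "public", "ninedata_repl"):
    print(statement)
for statement in clickhouse_grants("shop", "ninedata_repl"):
    print(statement)
```

Restricting the inputs to simple identifiers keeps the generated SQL free of quoting problems. Names that need quoting are better handled by writing the statements by hand.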


Notes

Unsigned integer fields in the source PostgreSQL data source are mapped to UInt types in ClickHouse.

PostgreSQL to ClickHouse replication typically includes schema replication. NineData migrates PostgreSQL table structures to ClickHouse and inserts two system columns into the replicated ClickHouse tables. If the replication type does not include schema replication, prepare the following items manually:

The ClickHouse table structures used in the task must match the PostgreSQL table structures.

The replicated ClickHouse tables must include the following two system columns to record DML operations and Binlog time.

Column name	Data type	Default value	Description
_jz_data_sign	Int8	DEFAULT 1	Records the DML operation type to keep ClickHouse and PostgreSQL data consistent. INSERT operation: records `1`. DELETE operation: records `-1`. UPDATE operation: splits the change into one INSERT record and one DELETE record.
_jz_data_time	String	DEFAULT now()	Records the time of Binlog.
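
To illustrate how the `_jz_data_sign` values described above can represent the current state of a table, the following sketch models the records in memory. It rests on assumptions: that an UPDATE writes the old row image with `-1` and the new row image with `1`, and that a reader keeps the rows whose signs sum to a positive value. The function names are hypothetical and this is not NineData's implementation.

```python
from collections import defaultdict

INSERT_SIGN = 1
DELETE_SIGN = -1


def to_sign_records(op, old_row=None, new_row=None):
    """Translate one source DML operation into signed target records."""
    if op == "INSERT":
        return [{**new_row, "_jz_data_sign": INSERT_SIGN}]
    if op == "DELETE":
        return [{**old_row, "_jz_data_sign": DELETE_SIGN}]
    if op == "UPDATE":
        # Assumption: the old image is cancelled and the new image is added.
        return [
            {**old_row, "_jz_data_sign": DELETE_SIGN},
            {**new_row, "_jz_data_sign": INSERT_SIGN},
        ]
    raise ValueError(f"unsupported operation: {op}")


def live_rows(records, columns):
    """Return the rows whose signs sum to a positive value."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[c] for c in columns)
        totals[key] += rec["_jz_data_sign"]
    return sorted(key for key, total in totals.items() if total > 0)


records = []
records += to_sign_records("INSERT", new_row={"id": 1, "name": "a"})
records += to_sign_records("INSERT", new_row={"id": 2, "name": "b"})
records += to_sign_records(
    "UPDATE", old_row={"id": 1, "name": "a"}, new_row={"id": 1, "name": "c"}
)
records += to_sign_records("DELETE", old_row={"id": 2, "name": "b"})

print(live_rows(records, ["id", "name"]))  # [(1, 'c')]
```

Five records are stored for four operations, and only the row `(1, 'c')` has a positive sign total.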

Example table definition that includes both system columns:

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    ...
    _jz_data_sign Int8 DEFAULT 1 COMMENT 'replication data update sign: add = 1, remove = -1',
    _jz_data_time String DEFAULT now() COMMENT 'replication data update time'
) ENGINE = engine
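
When many target tables have to be prepared by hand, the template above can be rendered from a column list. This is a minimal sketch: `build_clickhouse_ddl`, the database `analytics`, the table `orders`, and the choice of `MergeTree` with `ORDER BY` are illustrative assumptions. The guide itself does not prescribe an engine.

```python
def build_clickhouse_ddl(database, table, columns, engine, order_by):
    """Render a CREATE TABLE statement that ends with the two system columns.

    columns: list of (name, clickhouse_type) pairs for the business columns.
    engine, order_by: chosen by the caller (ORDER BY suits MergeTree-family engines).
    """
    system_columns = [
        "_jz_data_sign Int8 DEFAULT 1 "
        "COMMENT 'replication data update sign: add = 1, remove = -1'",
        "_jz_data_time String DEFAULT now() "
        "COMMENT 'replication data update time'",
    ]
    business = [f"`{name}` {ch_type}" for name, ch_type in columns]
    body = ",\n    ".join(business + system_columns)
    keys = ", ".join(f"`{c}`" for c in order_by)
    return (
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{table}`\n"
        f"(\n    {body}\n)\n"
        f"ENGINE = {engine}\n"
        f"ORDER BY ({keys})"
    )


# Hypothetical table, for illustration only.
ddl = build_clickhouse_ddl(
    database="analytics",
    table="orders",
    columns=[("id", "Int64"), ("amount", "Decimal(18, 2)")],
    engine="MergeTree",
    order_by=["id"],
)
print(ddl)
```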

Restrictions

Before you run a replication task, assess the performance of the source and target data sources. Run full replication during off-peak hours when possible because the initial load consumes read and write resources on both sides.
Each replicated table should have a primary key or unique constraint, and column names should be unique. Otherwise, duplicate rows may be replicated.

Procedure

Sign in to the NineData Console.
In the left navigation pane, click Replication > Data Replication.
On the Replication page, click Create Replication.

On the Source & Target tab, configure the fields in the table, and click Next.

Parameter	Description
Name	Enter a name for the data synchronization task. To make the task easier to find and manage later, use a meaningful name. Up to 64 characters are supported.
Source	The data source that contains the objects to synchronize.
Target	The data source that receives the synchronized objects.
Type	Select the replication type. Schema: Synchronize only the database and table schemas of the source data source, without synchronizing data. Full: Synchronize all objects and data from the source data source, namely full data replication. The switch on the right enables periodic full replication. For more information, see Periodic Full Replication.
Spec (Unavailable only when Schema is selected)	The specification of the replication task. A larger specification provides a higher replication rate. Hover over the icon to view the rate and configuration information of each specification. Each specification shows the available quantity and total quantity. When the available quantity is 0, the specification is grayed out and cannot be selected.
If target table already exists (Required when Schema is selected)	Pre-Check Error and Stop Task: Stop the task when a table with the same name is detected during the precheck stage. Skip and Continue Task: When a table with the same name is detected during the precheck stage, display a message and continue the task. During schema replication, ignore the table with the same name. If you also perform data replication, data is appended to the table with the same name and existing data is not overwritten. Delete Objects and Rewrite: When a table with the same name is detected during the precheck stage, display a message and continue the task. During schema replication, delete the table with the same name in the target database and replicate the table schema again based on the source database. If you also perform data replication, data is written after schema replication completes.
Target Table Exists Data (Required when Full is selected)	Pre-Check Error and Stop Task: Stop the task when data is detected in the target table during the precheck stage. Ignore existing target data and append to it: When data is detected in the target table during the precheck stage, keep that data and append the replicated data. Clear target existing data before write: When data is detected in the target table during the precheck stage, delete that data and then write the replicated data.

On the Objects tab, configure the parameters in the table, and click Next.

Parameter	Description

To create multiple replication tasks with the same replication objects, import a configuration file. Click Import Config, click Download Template to download the template, edit the file, and then click Upload to upload it and import the objects in bulk. The configuration file uses these fields:

Parameter	Description
`source_table_name`	The source table name of the object to synchronize.
`destination_table_name`	The target table name that receives the synchronized object.
`source_schema_name`	The source schema name of the object to synchronize.
`destination_schema_name`	The target schema name that receives the synchronized object.
`source_database_name`	The source database name of the object to synchronize.
`target_database_name`	The target database name that receives the synchronized object.
`column_list`	The list of columns to synchronize.
`extra_configuration`	Additional configuration information. This field supports: `column_rules`: Defines column mappings and value rules. Field descriptions: `column_name`: Original column name. `destination_column_name`: Specifies the target column name. `column_value`: Specifies the column value, which can be an SQL function or a constant value. `filter_condition`: Specifies row-level data filtering conditions. Only rows that meet the conditions are replicated.

Tip: Example of extra_configuration:

{
  "extra_config": {
    "column_rules": [
      {
        "column_name": "created_time",
        "destination_column_name": "migrated_time",
        "column_value": "current_timestamp()"
      }
    ],
    "filter_condition": "id != 0"
  }
}

In this example, created_time is mapped to migrated_time, the target column value is changed to current_timestamp(), and only rows whose id value is not 0 are synchronized.

For a complete example of the configuration file, see the downloaded template.
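
If the configuration file is produced by a script, the `extra_configuration` value can be generated instead of typed by hand, which avoids malformed JSON. The sketch below follows the shape of the example above. The helper name `build_extra_configuration` and its single validation rule are assumptions, not part of NineData.

```python
import json


def build_extra_configuration(column_rules, filter_condition=None):
    """Return extra_configuration JSON text in the shape of the example."""
    for rule in column_rules:
        if "column_name" not in rule:
            raise ValueError("each column rule needs column_name")
    extra = {"column_rules": list(column_rules)}
    if filter_condition:
        extra["filter_condition"] = filter_condition
    return json.dumps({"extra_config": extra}, indent=2)


text = build_extra_configuration(
    [
        {
            "column_name": "created_time",
            "destination_column_name": "migrated_time",
            "column_value": "current_timestamp()",
        }
    ],
    filter_condition="id != 0",
)
print(text)
```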

On the "Mapping" tab, configure the mapping that matches the selected replication type, then click Save and Pre-Check. If source or target metadata changes while you configure mappings, click Refresh Metadata to refresh the metadata.
- Includes Schema: Configure the table name after synchronization to the target data source.
- Does not include Schema: NineData selects the database with the same name in the target data source by default. If no such database exists, select the target database manually. The table names and column names in the target database must match the synchronization objects. If they do not match, map the table names and column names manually.
Other available actions:
- Click Mapping & Filtering to customize the column names after synchronization to the target data source.
- On the Mapping & Filtering page, click Data Filter to configure filtering conditions by using comparison expressions. Only data that meets the filtering conditions is synchronized to the target data source. For example, if the filtering condition is set to emp_no>=10005, data whose emp_no column value is less than 10005 is not synchronized to the target data source.
- Click the icon to the right of "Target Table" to search for a table name and replace it with the target name.
- Enter a table name in the Search Table text box to quickly locate the target table.
- Click Batch Configuration to define common rules in batches, such as table name and column name case conversion, prefix or suffix addition, and replacement. Use this option to apply mapping configuration to many tables and columns at the same time.
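
To make the Data Filter example concrete, this small sketch applies the condition emp_no>=10005 to a few sample rows. The rows are made up for illustration, and the filter is evaluated in Python only to show which rows would qualify, including the boundary value.

```python
# Made-up sample rows.
rows = [
    {"emp_no": 10003, "name": "A"},
    {"emp_no": 10005, "name": "B"},
    {"emp_no": 10008, "name": "C"},
]


def passes_filter(row, threshold=10005):
    """Mirror the example condition emp_no>=10005."""
    return row["emp_no"] >= threshold


replicated = [row["emp_no"] for row in rows if passes_filter(row)]
skipped = [row["emp_no"] for row in rows if not passes_filter(row)]

print(replicated)  # [10005, 10008]
print(skipped)     # [10003]
```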

On the Pre-check tab, wait for NineData to complete the precheck. After the precheck passes, click Launch.
- Select Enable data consistency comparison to start a data consistency comparison task, based on the source data source, after synchronization completes. Depending on the selected Type, the comparison starts at these times:
  - Schema: Starts after schema replication completes.
  - Schema+Full: Starts after full replication completes.
  - Full: Starts after full replication completes.
- If the precheck fails, click Details in the Actions column for the failed check item, review the cause, fix the issue, and then click Check Again to run the precheck again until it passes.
- Check items whose Result is Warning can be fixed or ignored as needed.
On the Launch page, the Launch Successfully message appears, indicating that the synchronization task has started. Then perform these actions:
- Click View Details to view the execution status of each stage of the synchronization task.
- Click Back to list to return to the Replication task list page.

Result

Sign in to the NineData Console.
In the left navigation pane, select Replication > Data Replication.

On the Replication page, click the Task ID of the target synchronization task. Review the task details page.

Number	Function	Description
1	Configure Alerts	When the task fails, NineData notifies the selected channel through the configured alert. For more information, see Operational Monitoring Overview.
2	More	Pause: Pause the task. Only tasks with the status Running are selectable. Duplicate: Create a new replication task with the same configuration as the current task. Terminate: End a task that is incomplete or listening (that is, in incremental synchronization). A terminated task cannot be restarted, so proceed with caution. If the synchronization objects include triggers, trigger replication options appear for selection. Delete: Delete the task. A deleted task cannot be recovered, so proceed with caution.
3	Structural Replication (Displayed in scenarios involving structural replication)	Displays the progress and details of structural replication. Select Log to view the execution log of structural replication. Select the refresh icon to view the latest information. Select View DDL in the Actions column for the target object in the list to view SQL replay.
4	Full Replication (Displayed in scenarios involving full replication)	Displays the progress and details of full replication. Select Monitor to view monitoring indicators during full replication. During full replication, select Flow Control Settings on the monitoring page to limit the rate of data written to the target data source. The unit is rows/second. Select Log to view the execution log of full replication. Select the refresh icon to view the latest information.
5	Data Comparison	Displays the comparison results between the source and target data sources. If data comparison is not enabled, select Enable Comparison on the page to enable it. Select Re-compare to rerun the comparison for the current source and target data sources. Select Stop to stop a running comparison task immediately. Select Log to view the execution log of consistency comparison. Select Monitor (only displayed in data comparison) to view the trend chart of RPS (records compared per second). Select Details to view records from earlier times. Select the icon in the Actions column of the comparison list (displayed only under the Data tab when inconsistencies are found) to view a detailed comparison between the source and target data sources. Select the icon in the Actions column of the comparison list that generates change SQL (displayed only when inconsistencies are found), then copy this SQL to the target data source and run it to fix the mismatch.
6	Expand	Displays detailed information of the current replication task. Common actions: Export table configuration: Export the current task's database and table configuration for quick import when creating another replication task with the same objects. Alert Rules: Configure alerts for the current task.
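
For a quick spot check outside the console, rows exported from both sides can be compared by key. The sketch below is a generic illustration and not the algorithm used by Data Comparison. `compare_rows` and the sample rows are hypothetical, and the system columns are assumed to have been removed from the target rows before comparing.

```python
def compare_rows(source_rows, target_rows, key):
    """Classify keys as missing, extra, or different between two row sets."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "different": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Made-up sample rows.
source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]

result = compare_rows(source, target, "id")
print(result)
```

A comparison by key only works for tables that have a primary key or unique constraint, which is one more reason to follow the restriction stated earlier in this guide.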

Appendix 1: PostgreSQL to ClickHouse data type mapping

During replication, PostgreSQL data types are mapped to the corresponding ClickHouse data types.

Category	PostgreSQL Data Type	ClickHouse Data Type
Numeric	SMALLINT/INT2	INT16
	INTEGER/INT4/INT	INT32
	BIGINT/INT8	INT64
	BIT	STRING
	BIT VARYING	STRING
	DOUBLE	FLOAT64
	DOUBLE PRECISION	FLOAT64
	REAL	FLOAT64
	FLOAT4	FLOAT64
	FLOAT8	FLOAT64
	NUMERIC	DECIMAL
	MONEY	DECIMAL
	BOOL/BOOLEAN	UINT8
DATE AND TIME	DATE	DATE32
	TIMESTAMP WITHOUT TIME ZONE	Version 20.3 and later: DATETIME64. Earlier than version 20.3: DATETIME with time precision, STRING without time precision.
	TIMESTAMP WITH TIME ZONE/TIMESTAMPTZ	Version 20.3 and later: DATETIME64. Earlier than version 20.3: DATETIME with time precision, STRING without time precision.
	TIME WITH TIME ZONE/TIMETZ	STRING
	TIMESTAMP	Version 20.3 and later: DATETIME64. Earlier than version 20.3: DATETIME with time precision, STRING without time precision.
	TIME	STRING
	INTERVAL	STRING
STRING	CHAR	STRING
	CHARACTER VARYING	STRING
	CHARACTER	STRING
	TEXT	STRING
	INET	STRING
	CIDR	STRING
	MACADDR	STRING
	MACADDR8	STRING
	UUID	STRING
RANGE	INT4RANGE	STRING
	INT8RANGE	STRING
	NUMRANGE	STRING
	DATERANGE	STRING
JSON	JSON	STRING
JSON	JSONB	STRING
BINARY	BYTEA	STRING
SPATIAL	POINT	POINT
	LINE	STRING
	LSEG	STRING
	BOX	STRING
	PATH	STRING
	POLYGON	STRING
	CIRCLE	STRING
XML	XML	STRING
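
When reviewing a schema before replication, a lookup built from an excerpt of the table above can predict the target column types. This is a sketch under assumptions: `map_pg_type` is a hypothetical helper, it covers only some rows of the table, and the version rule for timestamps reflects one reading of the table.

```python
# Excerpt of the mapping table, not the full list.
PG_TO_CLICKHOUSE = {
    "SMALLINT": "INT16", "INT2": "INT16",
    "INTEGER": "INT32", "INT4": "INT32", "INT": "INT32",
    "BIGINT": "INT64", "INT8": "INT64",
    "REAL": "FLOAT64", "DOUBLE PRECISION": "FLOAT64",
    "NUMERIC": "DECIMAL", "MONEY": "DECIMAL",
    "BOOL": "UINT8", "BOOLEAN": "UINT8",
    "DATE": "DATE32",
    "TEXT": "STRING", "UUID": "STRING", "JSONB": "STRING", "BYTEA": "STRING",
    "POINT": "POINT",
}
TIMESTAMP_TYPES = {
    "TIMESTAMP", "TIMESTAMP WITHOUT TIME ZONE",
    "TIMESTAMP WITH TIME ZONE", "TIMESTAMPTZ",
}


def map_pg_type(pg_type, clickhouse_version=(20, 8), has_time_precision=True):
    """Return the ClickHouse type name listed for a PostgreSQL type."""
    name = " ".join(pg_type.upper().split())  # normalize case and spacing
    if name in TIMESTAMP_TYPES:
        if clickhouse_version >= (20, 3):
            return "DATETIME64"
        return "DATETIME" if has_time_precision else "STRING"
    if name not in PG_TO_CLICKHOUSE:
        raise KeyError(f"{pg_type!r} is not in this excerpt of the mapping table")
    return PG_TO_CLICKHOUSE[name]


print(map_pg_type("bigint"))
print(map_pg_type("timestamptz"))
print(map_pg_type("timestamp", clickhouse_version=(20, 1), has_time_precision=False))
```

Because this guide requires ClickHouse 20.8 or later, the timestamp types map to DATETIME64 for supported targets. The older branches are kept only to mirror the table.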

Appendix 2: Precheck items

Check item	Check content
Source data source connection check	Checks the gateway status of the source data source, whether the instance is reachable, and whether the username and password are correct
Target data source connection check	Checks the gateway status of the target data source, whether the instance is reachable, and whether the username and password are correct
Target database privilege check	Checks whether the target database account has the required permissions
Source database privilege check	Checks whether the source database account has the required permissions
Target database data existence check	Checks whether the object to be replicated already contains data in the target database
Target database same-name object existence check	Checks whether an object with the same name already exists in the target database

Next steps

Data Replication overview
