Skip to main content

How to Synchronize Data from PostgreSQL to ClickHouse in Real Time

ClickHouse is a high-performance columnar storage database, and its characteristic can be summarized in one word: fast! However, ClickHouse is not just about speed; its columnar storage method makes it highly suitable for Online Analytical Processing (OLAP) scenarios, where it performs exceptionally well in complex queries and aggregation operations. This explains why an increasing number of enterprises choose it as their preferred real-time data warehouse.

In practical applications, the combination of PostgreSQL and ClickHouse is very common, which can fully leverage the advantages of both, for the following reasons.

PostgreSQL, as a powerful Online Transaction Processing system (OLTP), has the advantages of stability, strong scalability, rich functionality, and high performance, making it suitable for various scales and types of OLTP scenarios. However, there is still a certain performance gap in real-time data analysis.

Therefore, the speed advantage of ClickHouse in executing analytical queries can well compensate for the shortcomings of PostgreSQL in query performance.

How to Synchronize Data from PostgreSQL to ClickHouse in Real Time?

As we all know, data synchronization between two completely unrelated data management systems, especially real-time synchronization, is complex enough to give senior DBAs a headache. There are several issues that must be addressed:

  • Table Initialization: Before synchronizing data, it is necessary to manually create tables with the same structure as the source on the ClickHouse side, and then data can be received from the source. When there are a large number of tables with complex structures on the source side, don't you feel like dying?
  • Data Structure Mapping: The data structures of the two are different, how to ensure the integrity of the data after synchronization from the source? Unless you have an extremely thorough understanding of the data models of both and ensure absolute accuracy, the ideal result can be achieved.
  • Real-time Synchronization Rate: The premise of real-time data analysis through ClickHouse is that the data in ClickHouse must be the latest, that is, completely consistent with the source PostgreSQL, and the analysis results are meaningful. Ensuring this is not an easy task.
  • Source-side DDL Statement Linkage: The most painful point, how to ensure that the synchronization link is not interrupted when the data structure on the source side changes? It is necessary to capture the DDL changes of the source-side PostgreSQL in real time and synchronize them in the target-side ClickHouse in a timely manner.

If these issues can be resolved smoothly, the real-time synchronization work will be very smooth. So how to solve it? There are methods, and they are effortless. For the above 4 issues, NineData has absolute authority, let's first see how it solves these problems:

  • Structure Replication: Based on the characteristics of the target-side data source, automatically and efficiently complete table creation, data structure mapping, and other work.
  • Replication Performance: Based on dynamic batch accumulation, parallel replication, Stream Load, and other technologies, the replication performance can easily reach 200 MB/S.
  • DDL Capture and Execution: Real-time detection of DDL operations on the source side, and synchronization execution on the target side, ensuring that other business changes can proceed stably.

It's easy to solve and very pleasant, let's take a look at how to operate.

Step One: Enter Source and Target Data Sources

  1. Log in to the NineData console, click on Data Source Management > Data Sources, and then click on Create Data Source on the page to select the data source you need to enter.

    image-20240321112625957

  2. Configure according to the page prompts, and then click Create Data Source to complete the creation.

    image-20240319150009254

  1. Log in to the NineData console, click on Data Replication > Data Replication, and then click on Create Replication.

    image-20240319150313727

  2. Configure the replication task according to the page prompts. Since we want to achieve long-term real-time data synchronization, we need to check Incremental Replication in the Replication Type.

    image-20240319162149015

  3. After the configuration is completed, start the task. For all the synchronization objects you have configured, NineData will first perform a full migration of all the existing data, and then synchronize the incremental data newly added in PostgreSQL in real time. All newly written data will be synchronized to Doris without omission. Whenever the incremental data on the target side catches up with the source side, the task panel will display Delay 0 seconds, as shown in the figure below.

    image-20240320100557991

Step Three (Optional): Verify the Integrity of Synchronized Data on the Target Side

In addition to the synchronization function, NineData also provides a comparison function for the synchronized data between the source and target after synchronization to ensure the integrity of the data on the target side.

  1. Log in to the NineData console, click on Data Replication > Data Replication, and then click on the replication task ID created in Step Two.

    image-20240320101420287

  2. Click on the Data Comparison tab, and click on Start Data Comparison (if Start Data Consistency Comparison is checked in the task configuration of Step Two, the comparison results are directly displayed here).

    image-20240320101520639

    tip

    After enabling, the system will automatically compare the synchronization objects between the source and target, and provide comparison results.

    image-20240320101823011

  3. You can click on Re-compare on the page after a while to verify the synchronization results of the latest incremental data.

    image-20240320102016394

Step Four (Optional): Configure Task Exception Alerts

Since it is a long-term task, you may need the system to monitor the task status in real time and notify you immediately when there is an exception.

  1. Log in to the NineData console, click on Data Replication > Data Replication, and then click on the replication task ID created in Step Two.

    image-20240320101420287

  2. Click on Configure Alerts in the upper right corner.

    image-20240320104731392

  3. Enter the Policy Name, and click on Save Configuration. You can directly use the built-in default rules to send SMS reminders when the task fails or the replication delay is greater than or equal to 10 minutes. You can also create custom rules to notify according to your needs.

    image-20240320105025912

Summary

So far, you have obtained a long-term stable real-time synchronization link from PostgreSQL to ClickHouse, which is indispensable for users who need to use ClickHouse for real-time analysis of business data. At the same time, after synchronizing the old data in stock to ClickHouse, you can clean up the old data on the PostgreSQL side to free up storage space. Especially for enterprises using cloud resources, cloud resources are relatively expensive, and doing so can greatly reduce storage costs and invest more in more important businesses.