Skip to main content

How to Synchronize Data from PostgreSQL to Doris in Real Time

PostgreSQL can be said to be a currently popular relational database. In addition to having a variety of functions and strong performance, it also has excellent scalability. More importantly, it is open source, and there is a reason for its popularity.

Although PostgreSQL is very powerful, it also has its shortcomings. Compared with professional data analysis products, there are roughly the following deficiencies:

  • PostgreSQL itself is not very suitable for real-time data analysis. Although PostgreSQL has the ability to expand to support corresponding capabilities, its performance may not be very ideal when processing a large amount of real-time data.

  • As a business system, the accumulation of historical data in PostgreSQL will seriously affect the performance of the database and increase the storage cost, which is also a problem that must be solved.

Doris, on the other hand, is a professional data analysis product. It is a real-time data warehouse that returns query results at a sub-second level under massive data, and supports high-concurrency point queries and high-throughput complex analysis scenarios. It is very suitable for real-time data analysis and queries.

Therefore, we can choose to use Doris with PostgreSQL, which can relatively perfectly make up for the shortcomings of PostgreSQL.

Great~ The goal is set, so the question arises...

How to Synchronize Data from PostgreSQL to Doris in Real Time?

As we all know, the complexity of data synchronization between two unrelated data management systems, especially real-time synchronization, is enough to make senior DBAs headache. There are several problems that have to be solved:

  • Table Initialization: Before synchronizing data, it is necessary to manually create tables with the same structure as the source end in the Doris end, and then receive data from the source end. When there are a large number of tables and the table structure is very complex at the source end, do you want to die?

  • Data Structure Mapping: The data structures of the two are different, how to ensure the integrity of the data after synchronization from the source end? Unless you have an extremely acute understanding of the data models of the two and ensure that there are absolutely no mistakes, you can achieve the ideal result.

  • Real-time Synchronization Rate: The premise of real-time data analysis through Doris is that the data in Doris must be the latest, that is, completely consistent with the source end of PostgreSQL, and the analysis results are meaningful. Ensuring this is not an easy task.

  • Source End DDL Statement Linkage: The most painful point, how to ensure that the synchronization link is not interrupted when the data structure at the source end changes? It is necessary to capture the DDL changes of the source end PostgreSQL in real time and synchronize and execute them in the target end of Doris in a timely manner.

If these problems can be solved smoothly, the real-time synchronization work will be very smooth. So how to solve it? There are methods, and it is not difficult at all. For the above 4 problems, NineData has absolute speaking rights. Let's first look at its features:

  • Structure Copying: Based on the characteristics of the target end data source, automatically and efficiently complete the creation of tables, mapping of data structures, and other work.

  • Replication Performance: Based on dynamic batch accumulation, parallel replication, Stream Load, and other technologies, the replication performance can easily reach 200 MB/S.

  • DDL Capture and Execution: Real-time detection of DDL operations at the source end, and synchronization and execution at the target end, to ensure that other business changes can be carried out stably.

It is easy to solve and very pleasant. Let's take a look at how to operate.

Step One: Enter Source and Target Data Sources

  1. Log in to the NineData console, click on Data Source Management > Data Sources, and then click on Create Data Source on the page to select the data source you need to enter.

    image-20240319142535094

  2. Configure according to the page prompts, and then click on Create Data Source to complete the creation.

    image-20240319150009254

  1. Log in to the NineData console, click on Data Replication > Data Replication, and then click on Create Replication.

    image-20240319150313727

  2. Configure the replication task according to the page prompts. Since we want to achieve long-term real-time data synchronization, we need to check Incremental Replication in the Replication Type.

    image-20240319162149015

  3. After the configuration is completed, start the task. For all the synchronization objects you have configured, NineData will first perform a full migration of all the existing data, and then synchronize the incremental data newly added in PostgreSQL in real time. All newly written data will be synchronized to Doris without omission. Whenever the incremental data at the target end catches up with the source end, the task panel will display Delay 0 seconds, as shown in the figure below.

    image-20240320100557991

Step Three (Optional): Verify the Integrity of Synchronized Data at the Target End

In addition to the synchronization function, NineData also provides a comparison function for the synchronized data between the source end and the target end after synchronization to ensure the integrity of the data at the target end.

  1. Log in to the NineData console, click on Data Replication > Data Replication, and then click on the replication task ID created in Step Two.

    image-20240320101420287

  2. Click on the Data Comparison tab, and click on Start Data Comparison (if Start Data Consistency Comparison is checked in the task configuration of Step Two, the comparison result is directly displayed here).

    image-20240320101520639

    tip

    After starting, the system will automatically compare the synchronization objects at the source end and the target end, and give the comparison result.

    image-20240320101823011

  3. You can click on Re-compare on the page after a while to verify the synchronization result of the latest incremental data.

    image-20240320102016394

Step Four (Optional): Configure Task Exception Alerts

Since it is a long-term task, you may need the system to monitor the task status in real time and notify you immediately when there is an exception in the task.

  1. Log in to the NineData console, click on Data Replication > Data Replication, and then click on the replication task ID created in Step Two.

    image-20240320101420287

  2. Click on Configure Alerts in the upper right corner.

    image-20240320104731392

  3. Enter the Policy Name, and click on Save Configuration. You can directly use the built-in default rules to send a text message reminder when the task fails or the replication delay is greater than or equal to 10 minutes. You can also create custom rules to notify according to your needs.

    image-20240320105025912

Summary

So far, you have obtained a long-term stable real-time synchronization link from PostgreSQL to Doris. For users who need to use Doris for real-time analysis of business data, such a link is indispensable. At the same time, after synchronizing the old data in stock to Doris, you can clean up the old data on the PostgreSQL end to free up storage space. Especially for enterprises using cloud resources, cloud resources are relatively expensive, and doing so can greatly reduce storage costs and invest more in more important businesses.