Skip to main content

How to perform bulk data change operations without affecting business?

Many DBA colleagues have probably encountered this issue: when executing a large volume of data updates or deletions with a single DML SQL statement (for example: regularly deleting expired data or cleaning up invalid data records), if there are no appropriate indexes, and a single SQL affects too many rows, normal business write operations can be blocked, which may cause business fluctuations, and in severe cases, may lead to business unavailability.

Under normal circumstances, we wouldn't establish indexes for such one-time operations. To avoid causing too much impact on system performance, we usually adopt stored procedures or write temporary code to process these large datasets in batches.

However, since a single stored procedure or code can only correspond to one requirement and cannot be reused for other new requirements, it is necessary to modify or write them temporarily each time, which may also introduce uncertain risks. Therefore, when dealing with different types of data update requirements, we need a more flexible and general method.

Let's introduce our alternative solution today, which is to achieve data changes through a "lock-free" method, allowing other concurrent DMLs and improving the system's concurrency performance. Since the lock-free structural change is called OnlineDDL, let's call it OnlineDML for now.

Avoiding data table blocking and ensuring normal business operation

Since the goal is to prevent table blocking caused by a large transaction's DML operation, the core requirement of OnlineDML is to split the large DML into multiple small batch changes and execute them in batches to ensure that online business access is not affected. This is a crucial guarantee for online systems, especially in emergency situations.

Let's talk about the large volume of DML change scenarios in our company. We need to clean up the historical data in the order table from three years ago. This change involves tens of millions of rows of deletion operations. Using a single DML would have a significant impact on the business database, so we need to use NineData's OnlineDML feature to achieve lock-free data changes.

The operation is very simple, and they also have a more detailed document. I will list a rough operation process:

What I did:

  1. Enable the OnlineDML feature for the data source that needs to perform a large volume of DML changes.

    image-20240118153904372

  2. Initiate a change request for the data source through NineData's SQL task feature.

    iShot_2024-01-18_15.47.58

    iShot_2024-01-18_16.09.29

    tip

    This tool has a timing execution feature, which is quite surprising and has saved me a lot of trouble. I set the time to 3 a.m. during the off-peak period of business, so I don't have to stay up late at night.

    iShot_2024-01-18_16.10.46

What NineData does:

In fact, as long as you perform these two operations, the rest is taken care of by NineData. Generally, it will help me perform the following series of actions:

  1. Detect the number of data rows that need to be scanned by the DML statement.

  2. If the number of scanned data rows exceeds the preset threshold, split the DML statement appropriately.

  3. Execute the split DML statements in batches, with the number of rows changed in each batch based on the value I pre-configure.

  4. Synchronize and analyze the database pressure during the execution process. When the database pressure is detected to be too high, wait for a period of time automatically to ensure database performance.

Experience

After a series of operations, the first thing I felt was the simple and clear interface of NineData OnlineDML. In the NineData console, you can configure OnlineDML related rules through a few simple steps, from setting the DML statement scan row threshold to enabling the OnlineDML feature, and then configuring the batch size and waiting time, everything is very smooth.

The most important thing is the effect of use. During the process of cleaning up nearly 10 million rows of data, there was no obvious fluctuation in the business database. Without a doubt, the NineData OnlineDML solution has effortlessly solved the pain point of large volume data changes, providing a more flexible and general solution for various online systems.

Summary

This time, the NineData OnlineDML feature has easily solved the business table lock problem when cleaning up tens of millions of rows of expired data for the author. Next time, I will share a method to automatically archive periodic data and clean up the database, so stay tuned.