Skip to main content

Executing Large-scale Data Changes in Batches using NineData OnlineDML

NineData's SQL tasks support the OnlineDML feature, which automatically splits large-scale data changes into multiple statements for batch execution. This helps avoid table blocking caused by extensive data modifications, preventing any disruptions to business operations.

Background Information

Online systems often encounter various urgent situations, especially during malfunctions, requiring prompt handling of large-scale data updates and deletion requests. These demands typically lack appropriate indexes, and establishing indexes for one-time operations is often impractical. Therefore, we commonly resort to methods such as stored procedures or temporary code to process these vast datasets in batches, mitigating the impact on system performance.

However, this approach is not always flexible and universal. When dealing with different types of data update requirements, a more elastic method may be necessary. In such cases, lock-free data changes become an alternative solution, implementing data updates through a lock-free approach to enhance system concurrency.

In practical operations, we employ a strategy of batch scanning and shard modifications to ensure the efficient update of large-scale data without affecting normal online business access. This method not only proves efficient but also accommodates diverse data update needs, providing greater flexibility for system maintenance.

Prerequisites

  • You have created or joined an organization, and this organization has subscribed to either or . Please ensure that your annual or monthly subscription is still active. For more information, please refer to Manage Organizations.
  • Your current account has switched to the target organization. For more information, please refer to Switching to an Organization.
  • Your role is . For more information about roles, refer to Roles.
  • The data source type is MySQL, PostgreSQL.

Precautions

The OnlineDML feature is not supported under the following circumstances:

  • The DML statement executing the changes contains ORDER BY, GROUP BY, LIMIT, or involves operations on multiple tables.
  • The SELECT clause of the INSERT INTO SELECT statement includes a multi-table JOIN.
  • The target table lacks a primary key or unique key, or the corresponding primary key or unique key columns do not have the NOT NULL attribute.
  • The object being modified by the UPDATE statement is a primary key or unique key column.

Operational Steps

OnlineDML is enabled by default in the SQL standards provided by NineData. You need to adjust the execution strategy based on actual business scenarios and associate this standard with the target data source to make it effective. This article explains how to change the execution strategy for OnlineDML.

  1. Log in to the NineData Console.

  2. In the left navigation bar, click > .

  3. On the page, click the tab.

  4. Click on the target , or click on the column's .

  5. On the page, click the tab, find , and click the column's .

    tip

    Ensure that is enabled, indicated by the status in the column.

  6. Configure the execution strategy for Online DML. Refer to the table below for explanations of the main parameters.

    Parameter
    Description
    Set the threshold for the number of rows scanned by DML statements, indicating the number of rows the current DML statement needs to scan. The default is 20000, and if exceeded, it indicates a potential change risk. When OnlineDML is enabled, if the scanned rows exceed this threshold, SQL statements will automatically be executed in batches to avoid locking tables and affecting business operations.
    OnlineDML feature switch, off by default. When enabled, if the number of rows scanned by the DML statement exceeds the threshold, the system will automatically execute the DML statement in batches.
    Set the number of modified rows per batch for OnlineDML, default is 2000.
    Configure the wait time for OnlineDML, in seconds, disabled by default. When OnlineDML exceeds a certain batch count or the system detects excessive database pressure, it will automatically stop writing and wait for a specified time, the time you configure.
  7. Click .

Execution Method

After configuring the OnlineDML rule, submit SQL tasks containing DML statements for data sources with this strategy. During the standard pre-check phase, the system will automatically detect the number of rows scanned by the DML statement. If it exceeds the set threshold, NineData OnlineDML will be activated to execute the statement in batches.

onlinedml