Skip to main content

Data Archiving and Cleaning Features Have Been Significantly Upgraded, Everything You Need Is Here

NineData's data archiving and cleaning features have been released for nearly half a year. During this period, it has resolved a large number of issues that enterprises face in dealing with expired data, allowing for relatively automated periodic execution of the following data archiving and cleaning tasks.

Identify archival data in the business database → Move to the archival database → Clean up the archived data in the business database → Organize the business database table space

data_archive

In this way, in terms of disk capacity management, you can basically be a hands-off manager. In addition to no longer having to spend time and effort checking which data needs to be cleaned up, you also don't have to worry about mistakenly cleaning up valid data as invalid data, and the database access being slow due to the accumulation of invalid data and other issues.

In this case, if your database issues a capacity shortage alert, it means that the disk space really can no longer meet the current business needs, and you can directly and confidently expand the capacity.

NineData's Path to Pursue Perfection

Although the data archiving and cleaning features have a very high degree of completion, there may still be some stumbling blocks when dealing with various scenarios. The author spent several months collecting user feedback and roughly sorted out the following requirements:

  • No Time Zone Configuration Support: The time zone used by the database of overseas business is different from that of domestic business. After the user configures the execution time of the data archiving and cleaning for the overseas business, NineData will execute the task at the corresponding time point in China time, and at this time, the overseas business is in a busy period, which may affect the performance of the database's access and storage. Therefore, it is necessary to manually convert the archival time of the target area into China time, and then set the converted time as the execution time for data archiving and cleaning.

  • Incomplete Support for Time Fields: NineData needs to use a certain time field in the user table as the basis for archiving and cleaning. The data type of this time field needs to be of DATETIME type. However, due to the diversity of database configurations, not all time fields use the DATETIME type, and they may also be CHAR, VARCHAR, etc., which leads to NineData not recognizing the time field during configuration.

  • Data Conflict Strategy Not Flexible: It was found through actual testing that when the archiving operation encounters data conflicts, the system directly ignores the current operation, and some scenarios that need to be directly covered and updated cannot be configured.

  • No Preview SQL: For archiving and cleaning operations, although the page provides example SQL statements, there is no SQL statement that the system will actually execute according to the current configuration, so it is impossible to verify whether the SQL meets expectations before execution.

As of today, the above requirements have been fully delivered. In the latest iteration, NineData has released a series of optimizations for the data archiving and cleaning features. In addition to fixing known issues, it has also added a lot of practical new capabilities to this feature.

  • Add Time Zone Selection: When configuring data archiving and cleaning tasks, you can select the time zone of the target data source before configuring the task start time. There is no need to calculate it into China time, which not only increases efficiency but also avoids the possibility of calculation errors.

  • Support for Multiple Time Field Types: On the basis of the original DATETIME type, support for INT, BIGINT, CHAR, VARCHAR field types has been added, which basically meets the different time field type scenarios of most enterprises.

  • Support for Preview SQL: After the basic configuration of the parameters required for archiving and cleaning, you can clearly see the archiving and cleaning operations that NineData is about to perform on the database through Preview SQL, thereby determining whether the SQL meets expectations.

  • Add System Variable Configuration: A new bizdate system variable has been added, which can automatically obtain the current system time and supports customizing the format of the system time. It can be used for automatically taking the current system time as the table name when creating a new table during archiving, and it is also suitable for writing WHERE conditions and other scenarios.

  • Add Optimize Strategy: After performing the data cleaning operation in the database, it is usually necessary to manually perform the Optimize operation to release storage space. Although defining a stored procedure can automatically perform certain operations after the database performs some operations, it is difficult to achieve due to the high threshold of black screen operation. With NineData's Optimize strategy option, it is easy to configure NineData to operate and recover table space on your behalf.

  • Add Data Conflict Handling Strategy: Support users to choose the data conflict handling strategy during the data archiving process according to business requirements during the task configuration phase, supporting Ignore and Override options.

  • Add Execution Time Option: When creating a data archiving task, the Execution Time option is provided for periodic tasks. By limiting the execution time of the task through this option, it can avoid adverse effects on the business database when the task execution time exceeds the expected duration. For example, you can set the archiving task to start at 0 o'clock during the off-peak period of the business, and set the execution time to 6 hours. Then, at 6 o'clock in the morning, even if the task has not been completed, it will be automatically stopped to avoid affecting the business.

New Capability Configuration Demonstration

iShot_2024-06-19_18.00.11

  • Time Zone: The time zone can be selected on the first configuration page of the task, and all subsequent operations of the task will be based on this time zone.

  • Optimize Strategy: Choose how many times the cleaning operation is automatically executed after the Optimize operation to release table space.

  • Execution Time: Choose the maximum execution time allowed for the current task. If exceeded, the task will be stopped regardless of whether it has been completed, to avoid adverse effects on the business database during peak periods.

  • Data Conflict Handling Strategy: Choose the handling strategy when the same data exists in the target library during archiving, supporting Ignore and Override.

iShot_2024-06-17_17.23.29

  • System Variable: Set the format of the bizdate variable, the default is yyyyMMdd, for example, 20240617. You can also configure it according to your own needs, for example, yyyy-MM-dd, and the final effect will be 2024-06-17.

  • Add Table Name Based on System Variable: In the example, customers$(bizdate) is used, and the format of the newly created table name is customers<Current Time>, for example, customers20240617.

  • Time Field: It has been supported to select columns of INT, BIGINT, CHAR, VARCHAR field types as time fields.

  • Preview SQL: After the configuration is completed, click Preview SQL to display the archiving and cleaning operations that the system is about to perform on the database.

Summary

Through this major update, most of the problems have been basically solved. Even so, NineData is still not perfect and there may still be some subtle points that have not been noticed, so the voice of users is very important. NineData will continue to optimize and improve the product based on user feedback, to ensure the quality of the product and user experience as good as ever for ten years.