Skip to main content

How to Quickly Generate a Large Amount of Meaningful Test Data?

How to obtain test data for MySQL is a classic question. In various stages such as development, testing, and performance optimization, obtaining appropriate test data is essential. The MySQL official also provides a sample database called employees for testing purposes, but employees are not omnipotent. In some cases, generating test data closer to one's own business scenario is more effective.

In the past, the author used sysbench to generate test data, which is fast and easy to use, and very effective in performance testing scenarios. However, it also has a disadvantage, that is, the data generated by sysbench is a combination of meaningless characters and numbers, which is basically only suitable for performance pressure testing.

So why does test data need to be meaningful? Let's take a look together:

  • Performance testing is not just about read and write speed: In real business environments, database performance is not only related to data volume and query speed but also closely related to data distribution and the use of indexes. If the test data is too far from the real data, such as the random strings generated by sysbench and the logically regular data in actual business, the results of performance testing may not match the actual usage scenarios.
  • Functional testing and business logic verification require a sense of reality: Many system tests need to verify the correctness of business logic. Meaningless data cannot fully verify the system's performance under various input conditions.
  • Data correlation: In many business systems, data is correlated, such as the relationship between a user table and an order table in an e-commerce platform. When generating test data, if only random data is used, these correlations will be difficult to reflect, and the system's ability to correctly handle the relationships between these data during testing cannot be verified.

A long summary in one sentence: Test data created based on one's own business logic and meaningfulness can make test results more reliable. In this way, obtaining open-source datasets from the internet or using sysbench to create random strings will no longer meet the requirements. We need a method that can automatically generate real test data based on one's own business logic, and we have to mention the data generation function newly launched by NineData.

NineData Data Generation Introduction

NineData supports the automatic generation of random data in the database that conforms to specific business scenarios, simulating the actual production environment data, helping users to perform functional testing, stress testing, and other verification work without using real data.

Predefined simulation rules: There are 42 built-in predefined simulation rules, covering most business scenarios. In addition, it supports the creation of custom simulation rules.

data_generation_rules

Controllable data generation volume: Supports the generation of 1 to 10 million test data, which can be restricted through SQL development specifications.

image-20241015172212327

Automatic association of related fields: Through recognition rules, simulation rules will automatically associate corresponding database fields without manual association.

image-20241015172355026

image-20241015172618965

Practical Example

So let's directly try to add 1000 test data to the account table.

  1. On the data generation task page, select the data source, database, and table that need to generate test data, and then configure simulation rules for each field.

    image-20241015174222923

  2. Click on the Configure Algorithm on the right to configure more detailed options for simulation rules.

    image-20241015174338635

  3. The Generation data volume on the left can be configured to generate the required number of data. Click on the Preview on the right to preview the data generation effect and confirm whether the configuration algorithm meets expectations.

    image-20241015174549820

    image-20241017104017033

  4. If you need to configure foreign key logic, create test data for two tables with business correlations, you can use the Foreign Key Constraint rule to set the associated primary key value for the foreign key column. In this way, the program will randomly obtain data from the primary key of the associated table as the value of the current table's foreign key column.

    iShot_2024-10-17_17.18.52

  5. After execution, you can directly see the effect.

Summary

It can be seen that the test data generated by the NineData data generation function is very realistic, and it is difficult to distinguish between true and false at first glance. With this magic weapon, enterprises can quickly generate a large amount of real and meaningful test data, thereby accelerating the development process and optimizing system test results.