Lightweight ETL Sync: Process Data Before Loading to the Target Database | NineData
When source data needs to be filtered, mapped, calculated, or transformed before being written to the target database, traditional migration tools often cannot meet the requirement directly. NineData is suitable for lightweight ETL sync scenarios such as transformed migration, cleansed loading, and heterogeneous data synchronization.
Applicable Scenarios
- Cleansed loading: Filter invalid values, remove unwanted characters, or normalize field formats.
- Field transformation and mapping: Transform date, string, JSON, and numeric fields before writing to the target.
- Heterogeneous synchronization: Use expressions to reduce extra development work when source and target models differ.
For data migration scenarios with data processing needs, the steps usually include extracting data from the source database, then transforming it based on user rules, and finally loading the processed data into the target system. This entire process is called ETL (Extract, Transform, Load). In today's internet companies, ETL demand scenarios are numerous and diverse, such as:
- Data Cleaning: Remove phone numbers that do not meet requirements, remove unnecessary spaces or characters, etc.
- Data Standardization: Unify data formats, for example, process date fields, discard hours, minutes, and seconds, and only retain year, month, and day.
- Information Extraction: Extract required information from target fields, such as extracting household registration, gender, and date of birth from ID numbers.
- JSON Processing: Process JSON fields, for example, convert all fields in the source database JSON into columns.
- Column Processing: Add or remove columns, batch modify column names.
Under the premise of ETL requirements, the range of available migration products is further narrowed down. Products that are usable and easy to use are rare. In this field, good products are expensive, and there are also some open-source products to choose from. However, due to the lack of support for incremental replication, the business adaptability is not very high.
Most users want a product that meets the needs of low price, supports incremental data ETL, supports a visual interface, high performance, and stable operation. NineData's data replication function has also supported ETL in recent updates, providing more than 300 ETL functions, which can cover almost all common ETL scenarios.
Most importantly, NineData provides 10 free data replication quotas, including one incremental quota, which can meet the free use of most light users. So how can we efficiently perform ETL? Let's demonstrate next.
Take a common ETL scenario as an example. When analyzing data, we often only need to count the user visits on a specific date, and the date usually includes year, month, day, hour, minute, and second information. To optimize query performance, we need to reduce the date precision to reduce the consumption of computing resources. Therefore, retaining only the year, month, and day can achieve our goal.
Step 1: Enter Data Source
Enter all source and target data sources into the NineData platform.

Step 2: Configure Synchronization Chain ETL Rules
Log in to the NineData console, click on Data Replication > Data Replication, and then click on Create Replication.

When the task configuration process reaches the Mapping Configuration step, click on Mapping and Filtering on the right side of the target table.

Since we need to truncate the time of the
created_datecolumn in the source table to year, month, and day, discarding the hour, minute, and second information, we need to use the built-in function to transform this column. Let's try the following ETL configuration.
- Click on the Field Expression text box in the middle of the target field.
- In the pop-up window, find the function you need to use, and copy the function body. In this example, the
dateutils_truncatefunction is selected. - Paste the function body into the expression text box behind the equal sign (=), and configure the parameters. This function accepts two parameters, the first is the name of the time field, and the second is the time precision. The example in the figure
dateutils_truncate(created_date, 5)indicates that the hour, minute, and second information of thecreated_datefield is truncated, and the year, month, and day are retained. - Click Submit.
After the configuration is completed, click OK in the lower right corner to complete the ETL configuration. When the task starts, the
created_datein the source will be processed according to this ETL configuration and then loaded into the target end. Let's take a look at the final effect after the migration is completed.Source

Target

Summary
After the migration is completed, it can be seen that the date precision in the target has become year, month, and day, achieving our goal. In the incremental task, the newly written created_date in the business will also be synchronized and migrated to the target end in the same form. Retaining this task can achieve ETL on a long-term, stable, and unattended basis.
All the scenarios and demonstrations mentioned in this article are just the tip of the iceberg of ETL functions. The more than 300 functions provided by NineData each have different capabilities, which can handle various string, time, JSON transformation operations, and can also handle complex mathematical calculations, meeting various tricky ETL scenarios.

Related Solutions
- Cross-Cloud Database Migration: Non-Disruptive Migration and Real-Time Sync
- Oracle to PostgreSQL Migration: Non-Disruptive Migration Plan
- MySQL to TiDB Migration: Smooth Migration and Business Cutover
- MySQL to Doris Sync: Real-Time Data Synchronization
- MySQL to StarRocks Sync: Real-Time Data Synchronization