Skip to main content

MySQL to Elasticsearch

NineData data replication supports replicating data from MySQL to Elasticsearch, enabling Elasticsearch to search and analyze MySQL data.

Background Information

Elasticsearch is a popular distributed search and analytics engine used for processing large volumes of data and providing real-time search and analytics capabilities, including full-text search, auto-completion, aggregation, geospatial search, and machine learning. Replicating data from MySQL to Elasticsearch provides better search performance, powerful analytics and aggregation capabilities, and helps to further explore the value of your data.

Prerequisites

  • The source and target data sources have been added to NineData. For information on how to add them, please see Creating a Data Source.

  • The source database type is MySQL or a MySQL-compatible database, such as MySQL, MariaDB, PolarDB MySQL, TDSQL-C, GaussDB MySQL, etc.

  • The target database type is Elasticsearch version 8.7, 7.0, 6.0, or 5.0.

  • The source data source must have Binlog enabled, and the following Binlog-related parameters must be set:

    • binlog_format=ROW
    • binlog_row_image=FULL
    tip

    If the source data source is a standby database, to ensure that the complete Binlog log is obtained, the log_slave_updates parameter also needs to be enabled.

Procedure

  1. Log in to the NineData console.

  2. Click > in the left navigation pane.

  3. On the page, click the button in the upper right corner.

  4. On the tab, configure the parameters in the following table and click .

    Parameter
    Description
    Enter the name of the data synchronization task. To facilitate subsequent searching and management, please use a meaningful name of no more than 64 characters.
    The data source of the object to be synchronized. Select the MySQL data source where the data to be copied is located.
    The data source that receives the synchronized object. Select the target Elasticsearch data source.
    After data is migrated from MySQL to Elasticsearch, Elasticsearch adds indexes to make the management and querying of this data more convenient. You can choose the naming method for these indexes here.
    • (default): Use the table name of the original MySQL table directly. For example, if an index named "users" is created, it can be understood that the documents stored in this index have a structure similar to that of the "users" table.
    • : Use the format of <original database name>_<original table name> as the naming convention. For example, if an index named "mydb_users" is created, it can be understood that the documents stored in this index belong to the "users" table in the "mydb" database.
    • : Synchronize the database and table structure of the source data source.
    • : Synchronize all objects and data of the source data source, that is, full data replication.
    • : After the full synchronization is completed, perform incremental synchronization based on the log of the source data source. Click the setting icon to uncheck certain operation types according to your needs. Once unchecked, these operations will be ignored during incremental synchronization.
    Note: You can also click to expand and select the processing policy for tables with the same name or data.
    • (select when is selected)
      • : Stop the task when a table with the same name is detected during the pre-check stage.
      • : Send a prompt when a table with the same name is detected during the pre-check stage and continue the task. When replicating the structure, ignore the table with the same name. If you also replicate data, the data will be appended to the table with the same name without overwriting the original data.
      • : Send a prompt when a table with the same name is detected during the pre-check stage and continue the task. When replicating the structure, delete the table with the same name in the target database and replicate the table structure based on the source database. If you also replicate data, the data will be written after the table structure replication is complete.
      • (optional when replicating both structure and data): Send a prompt when a table with the same name is detected during the pre-check stage and continue the task. When replicating the structure, the table structure in the target database is retained, and the data in the table with the same name is cleared before being replicated from the original table.
    • (select when is not selected)
      • : Stop the task when data is detected in the target table during the pre-check stage.
      • : Ignore the data in the target table when it is detected during the pre-check stage and append and write other data.
      • : Delete the conflicting data in the target table when it is detected during the pre-check stage and write it again.
    • (currently only supported for MySQL to MySQL): After being enabled, data in the source data source that has sensitive columns configured is copied to the target data source in a desensitized form to ensure the security of sensitive data.
  5. On the tab, configure the following parameters, and then click .

    Parameter
    Description
    Select the content to be copied. You can select to copy all content in the source database, or select , select the content you want to copy in the list, and click >Add to the right list.
  6. On the tab, you can configure each table that needs to be copied to Elasticsearch, as well as each field in the table. By default, all tables and fields of the selected database will be copied. At the same time, you can customize the configuration of DOC ID for each index. DOC ID is the unique identifier for each document in Elasticsearch, and is used to retrieve, update or delete the document. You can specify it yourself or let Elasticsearch generate it, and this ID is unique in the same index., or leave it blank for Elasticsearch to generate.

    tip

    You can click on the right side of the page to customize the name of the fields after they are synchronized to Elasticsearch. In addition, you can set so that only data that meets the filtering conditions will be synchronized to Elasticsearch.

  7. On the tab, wait for the system to complete the pre-check. After the pre-check is passed, click .

    tip
    • If the pre-check fails, you need to click the button on the right side of the target check item in the column, troubleshoot the cause of the failure, manually fix it, and then click to perform the pre-check again until it passes.
    • The check items with in can be repaired or ignored depending on the specific situation.
  8. On the page, after being prompted that the task has started successfully, the synchronization task starts running. At this time, you can perform the following operations:

    • Click to view the execution status of each stage of the synchronization task.
    • Click to return to the task list page.

View Synchronization Results

  1. Log in to the NineData console.

  2. Click > in the left navigation pane.

  3. On the page, click the target.

    es_result

    No.
    Function
    Description
    1Sync delayThe data synchronization delay between the source and target data sources. 0 seconds means no delay between the two ends, which means that the data on the Kafka end has caught up with the MySQL end.
    2Alert configurationAfter configuring the alert, the system will notify you in the way you choose when the task fails. For more information, please see Introduction to operation and maintenance monitoring.
    3More
    • : Pause the task. Only tasks with a status of can be selected.
    • : Terminate tasks that are incomplete or listening (i.e., in incremental synchronization). After terminating the task, the task cannot be restarted, so please proceed with caution.
    • : Delete the task. Once the task is deleted, it cannot be restored, so please proceed with caution.
    4Structure replication (displayed in scenarios involving structure replication)Displays the progress and detailed information of structure replication.
    • Click on on the right side of the page: View the execution log of structure replication.
    • Click on : View the latest information.
    5Full replication (displayed in scenarios involving full replication)Displays the progress and detailed information of full replication.
    • Click on on the right side of the page: View various monitoring indicators during full replication. During full replication, you can also click on on the right side of the monitoring indicator page to limit the rate at which data is written to the target data source. The unit is MB/s.
    • Click on on the right side of the page: View the execution log of full replication.
    • Click on the icon on the right side of the page: View the latest information.
    6Incremental replication (displayed in scenarios involving incremental replication)Displays various monitoring indicators of incremental replication.
    • Click on on the right side of the page: View the operations currently being performed by the current replication task, including:
      • : The current thread number of the replication task which is executed in multiple threads.
      • : Details of the SQL statement currently executing in the thread.
      • : The response time of the current thread. If this value increases, the current thread may be stuck due to some reason.
      • : The timestamp when the current thread was started.
      • : The status of the current thread.
    • Click on on the right side of the page: Limit the rate at which data is written to the target data source. The unit is rows/s.
    • Click on on the right side of the page: View the execution log of incremental replication.
    • Click on the icon on the right side of the page: View the latest information.
    7Modify objectDisplays the modification records of the synchronized object.
    • Click on on the right side of the page to configure the synchronization object. .
    • Click on the icon on the right side of the page: View the latest information.
    8ExpandDisplays detailed information about the current replication task, including , , , etc.

Appendix 1: Mapping table of MySQL and Elasticsearch data types during data replication.

During the data replication process, MySQL data types are mapped to corresponding Elasticsearch data types.

Category MySQL Data Type ElasticSearch Data Type
Numeric TINYINT Short
TINYINT UNSIGNED
SMALLINT Integer
SMALLINT UNSIGNED
MEDIUMINT
MEDIUMINT UNSIGNED
INT
INT UNSIGNED
BIGINT Long
BIGINT UNSIGNED
BIT(M) Uint64
Decimal Decimal(10, 0)
Numeric
Decimal(P, S) Decimal(P, S)
Float Float32
Double Float64
BOOL, BOOLEAN 21.12 or later: Boolean
Before 21.12: Int8
DATE AND TIME DATE Date
DATETIME[(fsp)]
Timestamp[(fsp)]
Time[(fsp)]
YEAR[(4)]
STRING CHAR/VARCHAR Text
TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT
BINARY/VARBINARY Binary
TINYBLOB/BLOB/MEDIUMBOLB/LONGBLOB
ENUM Keyword
SET
JSON JSON Object
SPATIAL POINT Geo_point
LINESTRING Geo_shape
POLYGON
MULTIPOINT
MULTILINESTRING
MULTIPOLYGON
GEOMETRYCOLLECTION
GEOMCOLLECTION
GEOMETRY

Appendix 2: Checklist of Pre-Check Items

Check ItemCheck Content
Source Data Source Connection CheckCheck the status of the source data source gateway, instance accessibility, and accuracy of username and password
Target Data Source Connection CheckCheck the status of the target data source gateway, instance accessibility, and accuracy of username and password
Target Database Permission CheckCheck if the account permissions in the target database meet the requirements
Source Database Permission CheckCheck if the account permissions in the source database meet the requirements
Check if Source Database log_slave_updates is SupportedCheck if log_slave_updates is set to ON when the source database is a slave
Source Data Source and Target Data Source Version CheckCheck if the versions of the source database and target database are compatible
Check if Source Database is Enabled with BinlogCheck if the source database is enabled with Binlog
Check if Source Database Binlog Format is SupportedCheck if the binlog format of the source database is 'ROW'
Check if Source Database binlog_row_image is SupportedCheck if the binlog_row_image of the source database is 'FULL'
Target Database Data Existence CheckCheck if data exists for the objects to be replicated in the target database