Skip to main content

Step-by-Step Guide on How to Query Multiple Heterogeneous Data Sources with a Single SQL Statement

As the volume of enterprise data grows exponentially, the need for data interaction across departments, applications, and platforms is becoming more frequent. Traditional data querying methods are no longer sufficient to meet these needs. Moreover, there are differences in data formats, query languages, etc., between different database systems, making cross-database queries very difficult.

Limitations of Native Cross-Database Queries

Although database systems such as MySQL, Oracle, PostgreSQL, etc., provide their own cross-database query functions, there are many limitations:

  • Database Type Restriction: MySQL's Federated and Oracle's Database Links require the target database to be of the same type, which limits the scope of cross-database query functions and is not suitable for cross-database queries of heterogeneous databases.
  • Additional Performance Overhead: The built-in cross-database query functions may have additional performance overhead brought by JOIN operations, especially during large-scale data queries or data transfers, which can increase the system burden and affect the efficiency of other queries.

These limitations mean that the built-in cross-database query functions of database systems may not meet the complex and diverse needs of data analysis, especially in scenarios such as heterogeneous databases, large-scale data queries, and high-performance and high-security requirements.

To solve these problems, NineData DSQL was born.

What is NineData DSQL?

NineData DSQL is a function for querying multiple homogeneous and heterogeneous database systems across databases, currently supporting SELECT operations on tables and views. You can access multiple databases in a single query, obtain useful information scattered in various databases, and aggregate this information into a single query result, easily achieving data queries across multiple databases, multiple data sources, and even multiple heterogeneous data sources.

NineData DSQL has the following features:

  • Connect Multiple Database Systems: Supports connecting multiple database systems and ensures compatibility between these database systems. These database systems can come from different manufacturers or platforms, and there can be differences in data formats, storage methods, query languages, etc.
  • Unified Query Syntax: DSQL provides a unified query syntax, and you only need to use one syntax to query across multiple heterogeneous or homogeneous databases. The system will automatically parse the syntax and convert it into the query syntax of different types of data sources for query submission.
  • Result Integration and Output: Integrate the query results of homogeneous and heterogeneous database systems and unify the output format, making it easy for you to obtain the required information. You don't need to care which database the data is stored in, just one query can get the required results.
  • Data Privacy and Security Protection: Based on NineData's permission control function, it supports desensitization of sensitive data, control of access permissions, etc., to protect data privacy and security, and prevent data leakage and illegal access.
  • Graphical Interface Support: Provides a clear graphical interface, which is convenient for you to perform visual queries and management, and improves your work efficiency.

Easy-to-Remember DSQL Query Syntax

In DSQL, whether performing a join table (view) query or a single table (view) query, you need to use a three-part syntax when specifying the source data source, that is, <DBLINK Name>.<Database Name|Schema Name>.<Table Name (View Name)>.

Example 1: Cross-heterogeneous source query, query data from DBLINK1 (MySQL) and DBLINK2 (Oracle).

SELECT *
FROM DBLINK1.database_name.table_name a,
DBLINK2.schema_name.table_name b
WHERE a.id=b.id;

Example 2: Single table (view) query, query data from DBLINK1.

SELECT *
FROM DBLINK1.database.table_name
WHERE id=1;
tip

If your database name or table (view) name starts with a number or uses a reserved field, you need to enclose the name in double quotes (""), otherwise the query will fail.

For example: SELECT * FROM dblink_mysql_3451."9zdbtest3".sbtest1;

When to Use DSQL?

  • Enterprise Data Integration: Enterprises often need to integrate data from multiple departments or systems to provide a more comprehensive perspective. With DSQL, enterprises can access multiple databases in a single query, obtain useful information scattered in various databases, and then integrate them together, facilitating data analysis and decision-making.
  • Data Mining and Analysis: Data mining and analysis require a large amount of data support. With DSQL, you can retrieve the required data from multiple databases and then use mining and analysis tools to conduct in-depth research and analysis on the data.
  • Data Warehouse: A data warehouse is a system used to store and manage a large amount of data. With DSQL, you don't need to build a data warehouse to achieve the functionality of a data warehouse.

Operation Example

  • View the column information of the database table to be queried: You can view the DBLink name, database name|Schema name, table name, view name, column name, etc., that you need to query in the left navigation bar, without operating in the dark.

    dsql_1

  • Write and execute cross-database query statements: You can query multiple homogeneous and heterogeneous data sources through a simple query SQL, and support desensitization of sensitive fields in the query results.

    dsql_2

  • Integrate and export the results of multi-table queries: After executing a cross-database query, the system will integrate the query results into a table, and you can perform export operations on the result set. At the same time, it also supports searching the result set.

    dsql_3

  • Collect commonly used DBLink: If there are many DBLinks, you can collect a few commonly used DBLinks for easy search.

    dsql_4

In summary, NineData DSQL provides a global perspective on your data, reducing the complexity of data processing while increasing data utilization. It provides a more flexible, efficient, and secure way for enterprises to handle data, which can promote data-driven decision-making and business development, and help enterprises reduce costs and increase efficiency.