How to Query Multiple Sharded Databases and Tables with a Single SQL Query?

Fast-growing internet companies often run into the same problem: data volumes keep increasing, queries slow down, and eventually application access is affected. To address this, many companies shard their data across multiple databases and tables, which relieves the problems caused by storing everything in a single table. Sharding has its own downsides, however. This article discusses the problems that sharding introduces and how to solve them.

The Downsides of Sharding

Sharding effectively addresses data storage and access performance, but it also introduces new problems.

  • Manual Query and Data Integration: In a sharded environment, data is stored in different databases and tables. Even a simple query often requires connecting to multiple databases and then consolidating the results from each one into a data warehouse for unified analysis and processing. This makes queries more complex and adds a lot of query, migration, and transformation work.
  • Manual Calculation of Routing Algorithms: Sharding usually relies on a routing algorithm that determines which database and table a row is stored in. For every query, developers have to apply the routing algorithm by hand to locate the required data quickly without hurting system performance, which increases the difficulty and workload of development.

These problems are real, and they are not easy to solve. Does this mean we have to give up sharding? Of course not. The tool recommended below is designed to take care of them with very little setup.

NineData's Database and Table Group Query Feature

NineData's database and table group query feature is designed to simplify and optimize queries in a sharded environment. With a single SQL query statement, you can access and process data from multiple databases and tables, working as you would against a single database and table. This addresses both of the problems described above.

Let's take a look at the solutions provided by NineData for the above problems:

  • Simplify Query Complexity: NineData's database and table group query feature supports accessing multiple databases and tables with a single SQL query statement. Users only need to write a query once, and NineData will automatically parse and execute the corresponding subqueries, and finally merge the results of each subquery and return them to the user.

  • Improve Query Performance: NineData provides automatic routing and distributed execution engines. When executing database and table group queries, NineData will accurately route the query to the corresponding table based on the routing algorithm and query statement provided by the user, and quickly return the query results. This not only improves query speed but also effectively avoids single-point performance bottlenecks.
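The source does not describe how NineData's routing engine works internally, so the following is only a conceptual sketch of routing by key. It assumes modulus routing over eight tables and a hypothetical `target_tables` helper: when the WHERE clause pins the routing field to specific values, only the matching tables need to be queried, and otherwise the query goes to every table.

```python
SHARD_COUNT = 8  # assumed number of shard tables
ALL_TABLES = [f"orders_{i}" for i in range(SHARD_COUNT)]


def target_tables(order_ids=None):
    """Return the shard tables a query has to touch.

    order_ids holds the routing-key values named by the WHERE clause
    (for example from `order_id = 42` or `order_id IN (...)`), or None
    when the predicate does not pin the key to specific values.
    """
    if order_ids is None:
        # No pruning possible: the query is sent to every shard.
        return list(ALL_TABLES)
    # Deduplicate, because several ids can map to the same table.
    return sorted({f"orders_{order_id % SHARD_COUNT}" for order_id in order_ids})


if __name__ == "__main__":
    print(target_tables([42]))
    print(target_tables([3, 12]))
    print(len(target_tables()))
```

This is why the routing field matters in the query: the more precisely the predicate identifies key values, the fewer tables have to be read.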

Application Case

Take an e-commerce company that shards its order data by taking the modulus of the order number. With the traditional approach, querying order statistics for a given period means connecting to each sharded table, querying it for that period, and then summarizing the results by hand. With NineData's database and table group query feature, you write a single SQL query, and NineData runs the subqueries and merges the data for you, which saves time and removes a source of manual errors.

In this example, there are eight sharded tables from orders_0 to orders_7, storing order data with order_id from 1 to 100.
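As a quick illustration of this layout, the sketch below distributes order ids across the eight tables with order_id % 8. It assumes that every integer id from 1 to 100 is present, which the example does not state explicitly, and the `distribute` helper is hypothetical.

```python
from collections import defaultdict

SHARD_COUNT = 8  # eight tables, orders_0 through orders_7


def distribute(order_ids, shard_count=SHARD_COUNT):
    """Group order ids by the table that order_id % shard_count selects."""
    tables = defaultdict(list)
    for order_id in order_ids:
        tables[f"orders_{order_id % shard_count}"].append(order_id)
    return dict(tables)


# Assumption: ids 1..100 are all present, one row per id.
layout = distribute(range(1, 101))

if __name__ == "__main__":
    for name in sorted(layout):
        ids = layout[name]
        print(name, len(ids), ids[:3])
```

Under that assumption the rows spread almost evenly: each table holds 12 or 13 orders, and consecutive order ids land in different tables.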

  • Create Table Group

    1. Enter the expression orders[1-2].orders_[0-7] so that the eight sharded tables, orders_0 through orders_7, are added to the table group automatically.
    2. Because rows are assigned to these eight tables by order_id % 8, enter #order_id#%8 as the routing algorithm.
    3. Click Create Table Group.
  • Query Table Group

    1. Create a new query in the SQL Console, select the database group option, and choose the database group that contains the table group created above.
    2. In the SQL Console for this table group, enter the query statement. The WHERE clause should include the routing field used by the routing algorithm, which is order_id in this example. For example: SELECT * FROM orders2.orders_0 WHERE order_id < 50 AND order_date > '2003-01-01'.

After these few steps, NineData uses the routing algorithm to distribute the query to the relevant databases, queries the order tables for the requested period, and returns the merged results to the user.
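The two inputs used when creating the table group can be read as small expressions. The sketch below shows one plausible interpretation, not NineData's actual parser: it assumes that [a-b] denotes an inclusive integer range and that #field# is a placeholder for a column value. The `expand` and `route` helpers are hypothetical.

```python
import itertools
import re

RANGE = re.compile(r"\[(\d+)-(\d+)\]")
ROUTE = re.compile(r"#(\w+)#\s*%\s*(\d+)")


def expand(pattern):
    """Expand every [a-b] in the pattern into the inclusive integer range a..b."""
    # Splitting on a regex with two groups yields: text, lo, hi, text, lo, hi, ..., text
    parts = RANGE.split(pattern)
    literals = parts[0::3]
    ranges = [
        range(int(lo), int(hi) + 1) for lo, hi in zip(parts[1::3], parts[2::3])
    ]
    names = []
    for combo in itertools.product(*ranges):
        name = literals[0]
        for value, literal in zip(combo, literals[1:]):
            name += f"{value}{literal}"
        names.append(name)
    return names


def route(expression, row):
    """Evaluate a '#field#%N' routing expression against one row (a dict)."""
    match = ROUTE.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"unsupported routing expression: {expression!r}")
    field, modulus = match.group(1), int(match.group(2))
    return row[field] % modulus


if __name__ == "__main__":
    for name in expand("orders[1-2].orders_[0-7]"):
        print(name)
    print(route("#order_id#%8", {"order_id": 42}))
```

Read literally under these assumptions, the table expression names tables orders_0 through orders_7 in each of the databases orders1 and orders2. How those names map onto the eight sharded tables of the example is not spelled out in the article.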

Summary

NineData's database and table group query feature greatly simplifies multi-database and multi-table queries. It addresses the main problems that sharding introduces and helps companies manage and use their data efficiently in a sharded environment. It is a good fit for companies facing a surge in data volume.