Demystifying Shared Databases in Cloud Computing

Tags: shared database, cloud computing, resource isolation, pay-as-you-go, Jingdong Cloud Engine

Summary: With the advent of cloud computing, many new terms have emerged, such as cloud databases, cloud storage, elastic scaling, and resource isolation. This article explains what the popular term "shared database" actually means.


First, a brief introduction: I have 10 years of experience in the IT industry, including extensive experience as an architect and in product management. The following analysis is based on my personal experience and insights gained from communicating with cloud architects from companies like Google, BAT, and JD.com.

What is a Shared Database?

When discussing "shared databases," the most confusing part is the word "shared." It naturally brings to mind its opposite: "dedicated." Indeed, "shared database" describes who owns the underlying database resources, in contrast to a "dedicated database." It is a database model that emerged with cloud computing and aims to save resources. Generally, a database installed on your own server or PC and not shared with anyone else is called a "dedicated database."

Many people do not understand the characteristics of shared databases, so let me briefly introduce them.

Comparison of Characteristics: Shared vs. Dedicated Databases

Before introducing shared databases, let's analyze what a "dedicated database" is. As the name suggests, "dedicated" means your own database. Before cloud computing, we hardly had the concept of dedicated databases because there was no comparison; all databases were assumed to be dedicated.

A "dedicated database" is like having a piece of land (with an address and house number, similar to a database IP and port). You build a house on this land to use as a warehouse (similar to creating a database), and you can use any room in the house to store items (similar to creating tables).

A "shared database" was born to save resources and reduce developer costs. Here many people share one building, but the building does not belong to any of them; it is the landlord's asset. In the world of cloud platforms, the landlord is the cloud database provider, such as "Jingdong Cloud Engine," and the users are called "tenants." Everyone shares one database instance. You have permission to create and modify tables, but not to create or modify the database itself, because the building is shared by everyone and you are merely one of the tenants. Of course, your rooms are given a label and a name (which becomes the name of your database). Your rooms are completely isolated from everyone else's, so nothing private is exposed.

The rent for tenants in a shared database is relatively cheap, or even free, because it is charged on demand. "Pay-as-you-go" is a common feature of cloud platforms. Your rooms store items just as your database tables store data. A dedicated database used to require a one-time upfront investment, like spending a lifetime's savings to buy a house. But do you really need that many rooms? Perhaps you don't even know how much you will have to store. Shared databases solve this problem: you rent only as much room as your data needs, and the landlord charges you accordingly. This is called pay-as-you-go. Many cloud platforms offer such services, such as Baidu's BAE, Sina's SAE, and JD's JAE (Cloud Engine). Currently, BAE charges by storage space, SAE charges by both space and traffic, and JD's JAE is free for up to 25GB.
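To make the pay-as-you-go idea concrete, here is a minimal Python sketch of the three billing styles just mentioned: storage only, storage plus traffic, and a free tier. The Plan class, the monthly_charge function, and every price are hypothetical and chosen only for illustration; they are not the actual rates of BAE, SAE, or JAE. The sketch also assumes that usage above a free tier is billed per GB, which the platforms may or may not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical pricing plan; the numbers are not real provider rates."""
    free_storage_gb: float              # storage included at no charge
    price_per_gb: float                 # price per GB of storage above the free tier
    price_per_traffic_gb: float = 0.0   # price per GB of traffic (0 if not billed)


def monthly_charge(plan: Plan, storage_gb: float, traffic_gb: float = 0.0) -> float:
    """Charge only for what the tenant actually uses beyond the free tier."""
    billable_storage = max(0.0, storage_gb - plan.free_storage_gb)
    total = billable_storage * plan.price_per_gb + traffic_gb * plan.price_per_traffic_gb
    return round(total, 2)


# Three illustrative plans (all prices are assumptions).
storage_only = Plan(free_storage_gb=0, price_per_gb=0.5)
storage_and_traffic = Plan(free_storage_gb=0, price_per_gb=0.5, price_per_traffic_gb=0.1)
free_tier = Plan(free_storage_gb=25, price_per_gb=0.5)

if __name__ == "__main__":
    print(monthly_charge(storage_only, storage_gb=10))
    print(monthly_charge(storage_and_traffic, storage_gb=10, traffic_gb=20))
    print(monthly_charge(free_tier, storage_gb=10))
```

The point of the sketch is that the bill follows actual usage: a tenant storing 10GB under the free-tier plan pays nothing, whereas buying a dedicated server costs the same whether it holds 10GB or sits empty.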

Characteristics of Jingdong Cloud Engine's Cloud Database

Let's use Jingdong Cloud Engine (referred to as "Cloud Engine") as an example. Cloud Engine is an application hosting cloud platform belonging to the PaaS layer of the cloud computing model. Jingdong Cloud Databases are divided into dedicated and shared types. Since Cloud Engine is a PaaS platform, it uses shared databases to save resources and allocate them on demand.

Here is a hand-drawn architecture diagram:

Architecture Diagram

(Note: This does not represent the actual architecture of Cloud Engine's cloud database; it is drawn solely to clarify the issue.)

  1. Tenant Isolation: When multiple tenants share the same database instance, an effective isolation scheme is needed so that one user's slow or malicious queries do not affect other users. This isolation is implemented by the JProxy layer, which intercepts all user access. Based on table index information, it estimates the resources a request will need and blocks requests that are malicious or would affect other users. To control resource usage precisely, the system records and monitors each user's connection count, memory usage, disk space, and bandwidth, and limits them according to the user's quota.
  2. Consistent Cluster Routing Information: The cluster adopts a classic weakly centralized structure, which keeps performance high while remaining controllable. JManager manages the routing information for the entire cluster. To avoid a single point of failure, JManager is backed by multiple Slaves. When routing changes, JManager first synchronizes the change to its Slaves and only then to all JProxies, so routing stays consistent even if JManager fails.
  3. High Availability Assurance: No node in the cluster is a single point of failure. User databases run in a master-slave setup, with ZooKeeper used to provide high availability. If the master database fails, the system automatically switches to the slave and re-binds a floating IP to it, so service is not interrupted.
  4. Seamless Upgrade and Scaling: When a user's data exceeds the specified quota, the user can upgrade the database. Based on resource pool usage, the system automatically migrates the user's database to a less loaded instance without affecting the user's service. Incremental backups are achieved through scheduled snapshots combined with database binlog synchronization tools, which also make migration easier.
  5. Security Considerations: For security reasons, shared databases use JProxy to restrict potentially risky database statements. For instance, users cannot use CREATE DATABASE. As mentioned, others are also using the building, and you only have a few rooms or a few dozen. You can operate only within your own rooms and cannot interfere with anyone else's.
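Points 1 and 5 above can be illustrated with a minimal Python sketch of a proxy-side gate. This is not JProxy's actual implementation, which is not public in this article. The names ProxyGate, Quota, and Usage are hypothetical, and only the CREATE DATABASE restriction comes from the text; the other blocked pattern and the quota fields are assumptions. The sketch also leaves out the index-based resource estimation described in point 1 and shows only per-tenant quota checks and statement filtering.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple

# Statements a tenant may not run on a shared instance. CREATE DATABASE is
# named in the article; DROP DATABASE is an illustrative assumption.
BLOCKED_PATTERNS = [
    re.compile(r"^\s*CREATE\s+DATABASE\b", re.IGNORECASE),
    re.compile(r"^\s*DROP\s+DATABASE\b", re.IGNORECASE),
]
WRITE_PATTERN = re.compile(r"^\s*(INSERT|UPDATE)\b", re.IGNORECASE)


@dataclass
class Quota:
    max_connections: int
    max_disk_mb: int


@dataclass
class Usage:
    connections: int = 0
    disk_mb: int = 0


class ProxyGate:
    """Hypothetical gate that every tenant request passes through."""

    def __init__(self) -> None:
        self.quotas: Dict[str, Quota] = {}
        self.usage: Dict[str, Usage] = {}

    def register(self, tenant: str, quota: Quota) -> None:
        self.quotas[tenant] = quota
        self.usage[tenant] = Usage()

    def open_connection(self, tenant: str) -> bool:
        """Admit a new connection only while the tenant is under its quota."""
        usage, quota = self.usage[tenant], self.quotas[tenant]
        if usage.connections >= quota.max_connections:
            return False
        usage.connections += 1
        return True

    def check_statement(self, tenant: str, sql: str) -> Tuple[bool, str]:
        """Reject restricted statements, and writes once the disk quota is used up."""
        if any(p.match(sql) for p in BLOCKED_PATTERNS):
            return False, "statement not allowed on a shared instance"
        usage, quota = self.usage[tenant], self.quotas[tenant]
        if usage.disk_mb >= quota.max_disk_mb and WRITE_PATTERN.match(sql):
            return False, "disk quota exceeded"
        return True, "ok"


if __name__ == "__main__":
    gate = ProxyGate()
    gate.register("tenant_a", Quota(max_connections=1, max_disk_mb=100))
    print(gate.check_statement("tenant_a", "CREATE TABLE items (id INT)"))
    print(gate.check_statement("tenant_a", "CREATE DATABASE other"))
```

Because all access goes through one gate, a tenant can create tables inside its own database but never reaches the instance-level operations, which matches the "your own rooms only" rule in the building analogy.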

Conclusion: Cloud computing has brought convenience to enterprises, startups, and individuals alike. The era of spending manpower and materials on building self-hosted data centers, purchasing or renting servers, setting up environments and middleware, deploying applications, and managing domains is likely to fade into the past. Pay-as-you-go, fast, and convenient internet services have made cloud platforms the favorites of the future software service market. Shared databases will gradually demonstrate their value and have a bright future ahead, with demand for talent following closely behind. Wishing the Chinese cloud computing industry plenty of room to grow and more talent to serve it!

Links:

Thanks to Liu Wei for the submission. His blog address: http://www.cnblogs.com/cloud_china/