Databases: Partitioning & Sharding

Feb 19, 2025 - 18:21

As databases grow in size and complexity, ensuring efficient storage, retrieval, and management of data becomes a significant challenge. Two key strategies to handle large-scale data distribution are partitioning and sharding. While both techniques involve breaking down data into smaller segments, they serve different purposes and are used in different scenarios.

Partitioning

Partitioning means splitting a database table into smaller parts that all stay within one database.

✨ Think of a database as a club with different rooms for different music genres. Partitioning is how you decide who goes where—pop lovers in one room, rock fans in another. ✨

Vertical & Horizontal Partitioning

Vertical partitioning splits a table into multiple tables by columns. Each new table contains a subset of the columns from the original table, plus the key (here customer_id) needed to join the parts back together. For example:

Original Table:
customers (customer_id, name, email, address, phone_number)

Partitioned Tables:
customer_details (customer_id, name, email)
customer_contact (customer_id, address, phone_number)
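
To check that a vertical split loses no information, here is a minimal Python sketch using the standard-library sqlite3 module with an in-memory database. The table and column names follow the example above; the sample row and the column types are invented for illustration. Both tables keep customer_id, so a join on that key rebuilds the original wide row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_details (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE customer_contact (customer_id INTEGER PRIMARY KEY, address TEXT, phone_number TEXT);
""")

# Hypothetical sample row, split across the two column groups.
conn.execute("INSERT INTO customer_details VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO customer_contact VALUES (1, '1 Main St', '555-0100')")

# Joining on the shared key reconstructs the original wide row.
full_row = conn.execute("""
    SELECT d.customer_id, d.name, d.email, c.address, c.phone_number
    FROM customer_details AS d
    JOIN customer_contact AS c ON c.customer_id = d.customer_id
""").fetchone()
print(full_row)
```

Queries that only need the name and email can now read customer_details alone and never touch the contact columns, which is the usual motivation for splitting by columns.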

Horizontal partitioning splits a table into multiple tables by rows. Each new table has the same columns as the original but contains only a subset of its rows. For example:

Original Table:
sales (sale_id, product_id, sale_date, amount)

Partitioned Tables:
sales_2023 (sales from 2023)
sales_2024 (sales from 2024)
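
The row-based split above can be sketched the same way in Python with an in-memory SQLite database. This is only an illustration under some assumptions: the helper insert_sale, the sample rows, the column types, and the 'YYYY-MM-DD' date format are all hypothetical and not part of the original example. A view named sales stitches the yearly tables back into one logical table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_2023 (sale_id INTEGER, product_id INTEGER, sale_date TEXT, amount REAL);
    CREATE TABLE sales_2024 (sale_id INTEGER, product_id INTEGER, sale_date TEXT, amount REAL);
    -- A view that presents the partitions as one logical table again.
    CREATE VIEW sales AS
        SELECT * FROM sales_2023
        UNION ALL
        SELECT * FROM sales_2024;
""")

def insert_sale(sale_id, product_id, sale_date, amount):
    """Route a row to the table for its year (sale_date is 'YYYY-MM-DD')."""
    year = sale_date[:4]
    if year not in ("2023", "2024"):
        raise ValueError(f"no partition for year {year}")
    # The table name is safe to interpolate because year was validated above.
    conn.execute(f"INSERT INTO sales_{year} VALUES (?, ?, ?, ?)",
                 (sale_id, product_id, sale_date, amount))

# Hypothetical sample rows.
insert_sale(1, 10, "2023-05-01", 19.99)
insert_sale(2, 11, "2024-01-15", 5.00)
insert_sale(3, 10, "2024-03-09", 7.50)

count_2024 = conn.execute("SELECT COUNT(*) FROM sales_2024").fetchone()[0]
count_all = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count_2024, count_all)
```

A query that only concerns 2024 can read sales_2024 directly and skip the 2023 rows entirely, while the view still answers questions that span both years.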

Types of partitioning

1. By range: Data is split based on ranges of values, such as dates or numbers.

Suppose you have a music database and you want to partition the data based on the release year of the songs:

  • Partition: Songs released from 1960 to 1969
  • Partition: Songs released from 1970 to 1979
  • Partition: Songs released from 1980 to 1989
  • Partition: Songs released from 1990 to 1999
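
-- MySQL-style syntax: each partition holds the rows whose release_year
-- is below its bound and not already covered by an earlier partition.
CREATE TABLE songs (
    song_id INT,
    title VARCHAR(100),
    artist VARCHAR(100),
    release_year INT
)
PARTITION BY RANGE (release_year) (
    PARTITION p1 VALUES LESS THAN (1970),  -- 1969 and earlier, not only the 1960s
    PARTITION p2 VALUES LESS THAN (1980),  -- 1970 to 1979
    PARTITION p3 VALUES LESS THAN (1990),  -- 1980 to 1989
    PARTITION p4 VALUES LESS THAN (2000)   -- 1990 to 1999; 2000 and later has no partition
);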
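
To make the VALUES LESS THAN rule concrete, the following Python sketch mimics how a release year maps to one of the four partitions in the CREATE TABLE statement above. It illustrates the lookup rule only and is not how any database engine implements it; the function name partition_for and the two lists are hypothetical helpers introduced for this example.

```python
from bisect import bisect_right

# Upper bounds and names copied from the CREATE TABLE statement above.
UPPER_BOUNDS = [1970, 1980, 1990, 2000]
PARTITION_NAMES = ["p1", "p2", "p3", "p4"]

def partition_for(release_year: int) -> str:
    """Return the first partition whose bound is strictly greater than the year."""
    index = bisect_right(UPPER_BOUNDS, release_year)
    if index == len(UPPER_BOUNDS):
        # Mirrors the table definition: nothing covers the year 2000 or later.
        raise ValueError(f"no partition for release_year {release_year}")
    return PARTITION_NAMES[index]

print(partition_for(1965))  # p1
print(partition_for(1970))  # p2, because 1970 is not less than 1970
print(partition_for(1999))  # p4
print(partition_for(1950))  # p1, since p1 has no lower bound
```

Two details are worth noticing: a boundary value such as 1970 belongs to the next partition, and a year of 2000 or later matches no partition at all, so the table definition would need an extra partition before such rows could be stored.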