20250107

Denormalization

There are two major types of databases: relational databases such as PostgreSQL and document databases such as MongoDB.

When working with relational databases, normalization is usually recommended to reduce redundancy and make it easier to handle data.

Let’s say we have these records in Employees table

Employees

id name email department
1 Kazuki Kijima [email protected] R&D
2 Huy Nguyen [email protected] Sales
3 Gilbert Blythe [email protected] R&D

The common example I see in books/lectures is to consider a scenario to update the department fields R&D -> Engineering. If we keep data in this way, we need to iterate through all records and apply the update. In this example, we need to update records with id = 1 and id = 3.

Normalization comes in play here to extract the shared information and transfer it to another table.

Employees

id name email department_id
1 Kazuki Kijima [email protected] 1
2 Huy Nguyen [email protected] 2
3 Gilbert Blythe [email protected] 1

Departments

id name
1 R&D
2 Sales

Now, if we think about updating R&D -> Engineering, we just need to update one record in Departments table, leaving Employees records untouched.

One piece of information resides in one place. It’s neat, right?

Document databases tend to be used with a very different perspective on this; they encourage redundancy.

The problem with normalized relational data is that we need to query multiple tables to gather information (using multiple JOIN). This can be mitigated by having some redundancy and not avoiding storing the same data in multiple places.

To think about the example above, combining Employees table and Departments table into one can remove the need to join two different tables when reading. This can improve performance, and because we do the opposite operation to normalization, it’s called denormalization.

It’s beneficial under some scenarios, e.g., the entire document is used every time you render a web page, but write operations will cost much more.

(I didn’t provide much details, and the explanation may be incorrect, strictly speaking)


30 minutes walk


TODO:


index 20250106 20250108