There are two major types of databases: relational databases such as PostgreSQL and document databases such as MongoDB.
When working with relational databases, normalization is usually recommended to reduce redundancy and make it easier to handle data.
Let’s say we have these records in Employees
table
id | name | department | |
---|---|---|---|
1 | Kazuki Kijima | [email protected] | R&D |
2 | Huy Nguyen | [email protected] | Sales |
3 | Gilbert Blythe | [email protected] | R&D |
The common example I see in books/lectures is to consider a scenario to update the department fields R&D -> Engineering. If we keep data in this way, we need to iterate through all records and apply the update. In this example, we need to update records with id = 1 and id = 3.
Normalization comes in play here to extract the shared information and transfer it to another table.
id | name | department_id | |
---|---|---|---|
1 | Kazuki Kijima | [email protected] | 1 |
2 | Huy Nguyen | [email protected] | 2 |
3 | Gilbert Blythe | [email protected] | 1 |
id | name |
---|---|
1 | R&D |
2 | Sales |
Now, if we think about updating R&D -> Engineering, we just
need to update one record in Departments
table, leaving
Employees
records untouched.
One piece of information resides in one place. It’s neat, right?
Document databases tend to be used with a very different perspective on this; they encourage redundancy.
The problem with normalized relational data is that we need to query multiple tables to gather information (using multiple JOIN). This can be mitigated by having some redundancy and not avoiding storing the same data in multiple places.
To think about the example above, combining Employees
table and Departments
table into one can remove the need to
join two different tables when reading. This can improve performance,
and because we do the opposite operation to normalization, it’s called
denormalization.
It’s beneficial under some scenarios, e.g., the entire document is used every time you render a web page, but write operations will cost much more.
(I didn’t provide much details, and the explanation may be incorrect, strictly speaking)
30 minutes walk
TODO: