This chapter covers the implementation of databases’ read-and-write operations. Although the content is quite interesting, I don’t find much content relatable to my everyday work as an application engineer. Therefore, I will write down some clluttered memos for myself.
The use of the word “log”
The word log is often used to refer to application logs, where an application outputs text that describes what’s happening. In this book, log is used in the more general sense: an append-only sequence of records. It doesn’t have to be human-readable; it might be binary and intended only for other programs to read.
OLTP: online transaction processing
OLAP: online analytic processing
It is a common practice among big companies to have a read-only and separated copy of application data (from the main OLTP systems) for data analysis. The separated database is called data warehouse.
This way, we can ensure that the queries for data analysis run by non-engineers do not affect the central OLTP database, and by separating databases for different uses, we can optimize a data warehouse, in which we run OLAP-type (analytic-type) queries.
The process to populate a data warehouse with application data is called Extract-Transform-Load (ETL).
A data warehouse is commonly designed as column-oriented storage.
The idea behind column-oriented storage is simple: don’t store all the values from one row together, but store all the values from each column together instead.
A normal relational database is row-oriented, in a way that the entire row will be fetched from the database, and its columsn are tightly coupled (cannot extract some of them effectively).
I will skip the explanation of the column-oriented or materialized views (which cache the results of queries that are frequently used) for now.
This chapter also covers a wide range of the implementations of indexes. Starting from hash indexes, they show extensive forms or improvements of it.
Any kind of index usually slows down writes, because the index also needs to be updated every time data is written.
This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but every index slows down writes.
Around two years ago, my colleague tried to add a bunch of indexes that seemed to be rarely used, and I left a comment on his pull request opposing the addition, but most of the collaborators did not think it was problematic, and the pull request was merged.
I don’t blame them at all as we did our best, but I wish I had read this book at that time so that I could have been more helpful to the team.
Then the simplest possible indexing strategy is this: keep an in-memory hash map where every key is mapped to a byte offset in the data file (the location at which the value can be found).
This approach is adopted by Bitcask.
Although this approach works efficiently both for reads and writes, but there are some limitations.
On top of the hash indexes approach, keys are sorted. This constraint does not directly address the problems of the hash indexes, but it provides several improvements.
I repeated reading the section, but I could not get the definition of LSM-Tree. I will research this later (maybe).
LSM-Tree seems to be a principle/strategy of data management, using SSTable as building blocks, but I’m not confident.
LSM-Tree seems not to be a tree structure in the conventional sense such as Binary Tree.
Bloom Filter
A Bloom filter is a memory-efficient data structure for approximating the contents of a set. It can tell you if a key does not appear in the database, and thus saves many unnecessary disk reads for nonexistent keys.
The most widely used indexing structure
B-trees break the database down into fixeed-size blocks or pages, traditionally 4KB in size (sometimes bigger), and read or write one page at a time (whereas the log-structured indexes break the database down into variable-sized segments). This design corresponds more closely to the underlying hardware, as disks are also arranged in fixed-size blocks.
The implementation of B-trees is quite impressive, but without illustrations, it’s very hard to explain. Ask me to buy this book for you.
This section ends with the comparision of B-Trees and LSM-Trees
2 GB RAM, 2 vCPUs, 60 GB SSD -> 4 GB RAM, 2 vCPUs, 80 GB SSD
sudo certbot delete
to delete the old certificatesudo certbot certonly --standalone
to get the new
cerficatedocker compose up -d
Now, it costs $24/month to run the server. Yikes!
but it is a good experience running a server on my own.
162.6 lb
Protein shake 50g Tapioca 0g Tuna salad 60g Beef eye of round 70g
total protein -> 180 g (>= 140)
bench press, jogging (1 hour), lat pull down
MUST: