Because the author covers many topics in this chapter, I won’t summarize the content. Instead, I’ll make some notes on the topics I was able to understand (at least to some extent).
Surprisingly often I see software engineers make statements like, “In my experience, 99% of people only need X” or “…don’t need X” (for various values of X). I think that such statements say more about the experience of the speaker than about the actual usefulness of a technology.
I usually avoid making bold statements, but I must admin that I tend to criticize the overuse of NoSQL. I don’t have much experience with databases, so the message made me think I should be more neutral about such things.
The beauty of such a gradual migration is that every stage of the process is easily reversible if something goes wrong.
I think a similar concept is mentioned as the strangler pattern in The Art of Readable Code. When migration from an old function to a new one, you can keep both for a while. Once all usages of the old function are removed, it is safe to delete it.
The Lambda architecture combines batch processing for the periodic computation of accurate values with stream processing for the real-time approximation of values.
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of computers.
It consists of multiple components, with the Hadoop Distributed File System (HDFS) being the file storage system within Hadoop.
A materialized view is a precomputed and stored result of a query, acting as a cache to improve performance. Unlike regular views, it does not dynamically reflect changes in the underlying data unless it is refreshed.
As I said in the Preface, building for scale that you don’t need is wasted effort and may lock you into an inflexible design.
If there is a single technology that does everything you need, you’re most likely best off simply using that product rather than trying to reimplement it yourself from lower-level components.
Fortunately, I didn’t overengineer anything personally, but I’ve seen many teams build unnecessarily massive systems. That said, I understand that, at times, managers need to accommodate vanity requirements to keep engineering teams busy.
There has been a trend toward separating stateless application logic from stateful databases. One advantage of this separation is that it allows independent scaling.
Although challenging, making operations idempotent helps mitigate concerns about duplicate records. One way to achieve this is by including a request ID in INSERT or UPDATE operations. If a client accidentally sends the same request multiple times, the unique constraint on the request ID in the database ensures that only the first request is processed.
In slogan form: violations of timeliness are “eventual consistency,” whereas violations of integrity are “perpetual inconsistency.”
In practice, some level of data inconsistency may be acceptable. For example, if a customer is charged twice, the issue can be resolved by apologizing and issuing a refund. It is crucial to assess how much inconsistency your business can tolerate.
The author mentions his experience with hardware corruption and bugs in database software. Although very rare, it’s important to keep in mind that they are totally possible.
TODO: