Foundations of Scalable Systems - Designing Distributed Architectures
My personal notes on the book [Foundations of Scalable Systems: Designing Distributed Architectures](https://www.oreilly.com/library/view/foundations-of-scalable/9781098106058/).
Asynchronous Messaging
- Conceptually, a messaging system comprises the following:
- Message queues: queues that store a sequence of messages.
- Producers: send messages to queues.
- Consumers: retrieve messages from queues.
- Message broker: manages one or more queues.
- When messages are sent from producers to a queue, the broker adds messages to the queue in the order they arrive — basically a FIFO approach.
- There are two modes of behavior for consumers to retrieve messages:
- in pull mode, consumers send a request to the broker, which responds with the next message available for processing.
- in push mode, a consumer informs the broker that it wishes to receive messages from a queue. The consumer provides a callback function that should be invoked when a message is available.
- Consumers will also acknowledge message receipt. Upon consumer acknowledgment, the broker is free to mark a message as delivered and remove it from the queue.
- A one-to-many messaging requirement is known as a publish–subscribe architecture pattern. In publish–subscribe systems, message queues are known as topics. A topic is basically a message queue that delivers each published message to one or more subscribers.
- A message broker is potentially a single point of failure; therefore, most message brokers enable logical queues and topics to be physically replicated across multiple brokers, each running on its own node.
- Message queues trade data safety for performance. Therefore, you need to take into account scenarios such as when:
- a producer sends a message to a broker and the message is not successfully accepted by the broker.
- a message is in a queue and the broker crashes.
- a message is successfully delivered to the consumer but the consumer fails before fully processing the message.
- Common messaging patterns exist, such as:
- Competing Consumers: multiple consumers compete for the same message. This helps distribute messages between consumers and increase throughput.
- Exactly Once Processing: sometimes we need to put measures in place to ensure idempotent processing. These measures should be implemented by the consumer.
- Poison Messages: sometimes messages delivered to consumers can’t be processed, and they “poison” our system. To mitigate them, we need to limit the number of times a message is redelivered, and send it to a dead-letter queue once the redelivery limit is reached.
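The broker mechanics above (FIFO delivery, pull-mode consumption, acknowledgments, and dead-lettering poison messages after a redelivery limit) can be sketched with a toy in-memory broker. This is an illustration of the concepts, not any real broker's API:

```python
import collections

class Broker:
    """Toy in-memory broker: a FIFO queue, pull-mode delivery,
    acknowledgments, and a dead-letter queue after too many redeliveries."""

    def __init__(self, max_redeliveries=3):
        self.queue = collections.deque()        # main FIFO queue
        self.dead_letters = []                  # messages that exceeded the limit
        self.unacked = {}                       # delivery tag -> message
        self.attempts = collections.Counter()   # message -> delivery count
        self.max_redeliveries = max_redeliveries
        self._next_tag = 0

    def publish(self, message):
        self.queue.append(message)              # messages stored in arrival order

    def pull(self):
        """Pull mode: consumer asks for the next message; the broker
        tracks it as unacknowledged until the consumer acks it."""
        if not self.queue:
            return None
        message = self.queue.popleft()
        self.attempts[message] += 1
        self._next_tag += 1
        self.unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """Consumer confirms processing; broker can forget the message."""
        self.unacked.pop(tag)

    def nack(self, tag):
        """Processing failed: redeliver, or dead-letter once the limit is hit."""
        message = self.unacked.pop(tag)
        if self.attempts[message] >= self.max_redeliveries:
            self.dead_letters.append(message)
        else:
            self.queue.append(message)
```

A message that keeps failing cycles through `pull`/`nack` until it hits the redelivery limit and lands in `dead_letters`, where it can be inspected without blocking the queue.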
Serverless
- Tweak lambdas to extract more performance at lower cost - allocating more resources may look more expensive at first glance, but if the execution time shrinks by a larger factor, it saves you money in the end.
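The trade-off follows from per-GB-second billing, which Lambda-style platforms use. A quick worked example with hypothetical numbers (the rate and durations below are illustrative, not real prices):

```python
# Hypothetical rate: billing is memory * duration, so doubling memory
# still cuts cost if runtime shrinks by more than half.
PRICE_PER_GB_SECOND = 0.0000166667  # illustrative rate, not an authoritative price

def invocation_cost(memory_gb, duration_s):
    return memory_gb * duration_s * PRICE_PER_GB_SECOND

small = invocation_cost(memory_gb=0.5, duration_s=4.0)  # 2.0 GB-seconds
large = invocation_cost(memory_gb=1.0, duration_s=1.5)  # 1.5 GB-seconds
# Here the larger allocation is cheaper: large < small.
```

The larger allocation bills 1.5 GB-seconds versus 2.0, so despite the doubled memory it costs less per invocation.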
Microservices
- Apart from timeouts and the circuit breaker pattern, there is also another one called the bulkhead pattern. It ensures requests to one API in a microservice don’t utilize all available resources during a request burst.
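A bulkhead is commonly implemented as a bounded pool or semaphore per downstream dependency. A minimal sketch of the idea (the class and error message are illustrative, not from any specific library):

```python
import threading

class Bulkhead:
    """Caps concurrent calls to one downstream dependency, so a burst
    against that API cannot exhaust the service's entire capacity."""

    def __init__(self, max_concurrent):
        self._slots = threading.Semaphore(max_concurrent)

    def call(self, fn, *args):
        # Reject immediately instead of queueing when the bulkhead is full:
        # failing fast keeps resources free for other APIs.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full, rejecting request")
        try:
            return fn(*args)
        finally:
            self._slots.release()
```

Each API in the service would get its own `Bulkhead` instance, so a request burst against one endpoint is rejected at its bulkhead while the others keep serving.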
Eventual Consistency
- Versions can be used to handle concurrent updates and not lose data.
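One common form of this is optimistic concurrency control: each write carries the version the writer read, and stale writes are rejected rather than silently overwritten. A minimal sketch (the store and method names are illustrative):

```python
class VersionedStore:
    """Optimistic concurrency: a write must carry the version it read;
    a stale version is rejected instead of overwriting newer data."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        _, current = self.read(key)
        if expected_version != current:
            return False  # lost the race: caller must re-read and retry
        self._data[key] = (value, current + 1)
        return True
```

Two clients that read the same version race to write; the loser gets `False` back and must re-read, merge, and retry, so neither update is silently lost.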
Strong Consistency
- NewSQL databases are distributed databases that offer ACID guarantees.
- In systems based on eventually consistent databases, applications must be aware of the precise consistency guarantees of the underlying data store, and be designed to deal with these accordingly. In contrast, strongly consistent databases aim to deliver the same consistency guarantees as single-node systems.
- Strong consistency is used to describe two subtly different concepts in distributed databases:
- Transactional consistency, the C in ACID, also known as serializability.
- Replica consistency, also known as linearizability, meaning that all clients see the same value for a data object after it has been updated.
- Consensus algorithms are used to ensure strong consistency guarantees in distributed databases.
- From a developer's perspective, distributed transactions work the same way as regular transactions - you simply wait for the database to inform you of the transaction outcome.
- Two-Phase Commit (2PC) is a classic consensus algorithm.
- The protocol is driven by a coordinator (or leader), either an external service, or one partition that is being updated as part of a multipartition transactional update.
- When a database starts a transaction, a coordinator is selected.
- The coordinator allocates a globally unique transaction identifier (tid) and returns this to the client. This identifies a transaction context that is maintained by the coordinator.
- The client executes the operations defined by the transaction, passing the tid to each participant.
- Once all operations are completed, the client tries to commit the transaction. The 2PC algorithm commences by driving two rounds of votes with the participants:
- Prepare phase: the coordinator sends a message to all participants to prepare to commit the transaction. Each participant then informs the coordinator about its decision.
- Resolve phase: when all participants return, the coordinator examines the results. If all participants can commit, the coordinator sends a commit message to them, otherwise, it sends an abort message.
- 2PC has two failure modes: participant failure and coordinator failure. Participant failure does not threaten consistency, since the correct transaction outcome is still reached. However, a coordinator failure blocks the participants from acting. To guarantee consistency, the coordinator has to recover and examine its pre-crash transaction log - the cost is paid in availability.
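The coordinator's two phases can be sketched in a few lines. This is a simplified model of the protocol (no persistence, timeouts, or crash recovery, which a real implementation needs):

```python
class Participant:
    """A partition taking part in the transaction."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self):
        return self.can_commit  # vote yes/no on committing

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator logic: collect prepare votes, then enforce one outcome."""
    # Prepare phase: every participant votes.
    votes = [p.prepare() for p in participants]
    # Resolve phase: commit only if all voted yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote in the prepare phase forces every participant to abort, which is exactly why a crashed coordinator blocks everyone: the participants cannot decide the outcome on their own.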
- Distributed consensus approaches are based on a class of algorithms called atomic broadcast, total order broadcast, or replicated state machines.
- To be fault tolerant, a consensus algorithm has to handle both leader and follower failures. This involves electing a new leader and getting the followers to reach consensus on its log. Since followers may also fail, the algorithms must be designed to operate with a quorum (majority).
- Raft is a leader-based algorithm. The leader maintains a log of operations, and Raft essentially replicates this log to all members of the system.
- Leader election: the leader sends periodic heartbeats to followers, each of which maintains an election timer. If the timer expires before another heartbeat is received, the follower starts an election. Followers vote for a candidate only if its term (a monotonically increasing value associated with each leader) and log are at least as up to date as their own. This ensures the elected leader has all the committed entries from previous terms in its log.
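The voting rule can be sketched as a single check a follower applies to a vote request. This is a simplified model of Raft's RequestVote logic (a real implementation also persists state and tracks candidate identity):

```python
def grant_vote(follower, candidate_term, cand_last_log_term, cand_last_log_index):
    """Simplified RequestVote: a follower grants at most one vote per term,
    and only to a candidate whose log is at least as up to date as its own
    (higher last log term, or same term with an index >= its own)."""
    if candidate_term < follower["current_term"]:
        return False  # stale candidate
    if candidate_term > follower["current_term"]:
        # New term observed: adopt it and forget any earlier vote.
        follower["current_term"] = candidate_term
        follower["voted_for"] = None
    if follower["voted_for"] is not None:
        return False  # already voted in this term
    log_up_to_date = (cand_last_log_term, cand_last_log_index) >= (
        follower["last_log_term"], follower["last_log_index"])
    if log_up_to_date:
        follower["voted_for"] = "candidate"
        return True
    return False
```

Because a majority of voters must each pass this log check, any elected leader is guaranteed to hold every entry committed in earlier terms.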
Scalable Event-Driven Processing
- In event-driven architectures, events represent that something interesting has happened in the application context.
- You can implement an event-based architecture using messaging systems like RabbitMQ’s publish/subscribe features, but this has the effect of destroying any explicit record of the event. However, keeping a permanent record of immutable events in a simple log data structure has some useful characteristics:
- You can introduce new event consumers at any time.
- You can modify existing event-processing logic, either to add new features or fix, and then execute the new logic on the complete log to enrich results or fix errors.
- If a server or disk failure occurs, you can restore the last known state and replay events from the log to restore the data set.
- Some use cases require log entries to be deleted, like the “right to be forgotten” regulatory requirements. Apache Kafka provides two mechanisms to achieve that: time to live and compacted topics.
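The core idea behind compaction can be shown in a few lines: only the most recent event per key survives, so an older record is effectively erased once a newer value (or a deletion marker) is appended. A sketch of the idea, not Kafka's actual implementation:

```python
def compact(log):
    """Keep only the latest event per key, preserving log order.
    Appending a newer value (or a deletion marker such as None)
    makes the old value eligible for removal."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    # Emit survivors in their original log order.
    return [(key, value) for key, (offset, value) in
            sorted(latest.items(), key=lambda item: item[1][0])]
```

Appending `("user1", None)` after earlier `user1` events means compaction drops the old personal data, which is how a log store can honor deletion requests without rewriting history in place.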
- Kafka is a distributed persistent log store that employs a dumb broker/smart clients architecture.
- Kafka topics are persistent append-only logs. Consumers read events from them by specifying the name of the topic and the index (or offset) of the message they want to read. Reading an event is nondestructive. Kafka increments the consumer’s offset in the topic automatically to point to the next unprocessed event in the topic.
- Partitioning a topic has an implication for event ordering - there is no total order of events across partitions. It also affects concurrent event delivery to multiple consumers.
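A toy model of these mechanics: partitioned append-only logs, key-based partition assignment, per-consumer offsets, and nondestructive reads. This is an illustration of the concepts, not Kafka's client API:

```python
class Topic:
    """Toy Kafka-style topic: partitioned append-only logs with
    per-consumer offsets; reads never remove events."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = {}  # (consumer, partition) -> next index to read

    def append(self, key, event):
        # Events with the same key land in the same partition, so ordering
        # is guaranteed per key but not across partitions.
        partition = hash(key) % len(self.partitions)
        self.partitions[partition].append(event)
        return partition

    def poll(self, consumer, partition):
        offset = self.offsets.get((consumer, partition), 0)
        log = self.partitions[partition]
        if offset >= len(log):
            return None  # nothing new for this consumer
        # Advance this consumer's offset; the log itself is untouched.
        self.offsets[(consumer, partition)] = offset + 1
        return log[offset]
```

Because each consumer only advances its own offset, a newly added consumer can start from offset 0 and replay the full history that earlier consumers have already read.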
Stream Processing Systems
- Streaming systems process new data and events in real time - which can mean latencies from less than a second to a few seconds.
- Stream processing systems provide the capabilities for processing nodes to transform an input stream at one node into a new stream that is processed by one or more downstream nodes.
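This node-to-node transformation maps naturally onto chained generators, where each stage consumes one stream and emits a new derived stream. A minimal single-process sketch of the dataflow idea (stage names are illustrative):

```python
def parse(lines):
    """First stage: transform raw input lines into numeric readings."""
    for line in lines:
        yield float(line)

def threshold_alerts(readings, limit):
    """Downstream stage: consume one stream, emit a new derived stream."""
    for value in readings:
        if value > limit:
            yield f"ALERT: {value}"

raw = ["12.0", "99.5", "7.3"]
alerts = list(threshold_alerts(parse(raw), limit=50))
```

In a real stream processing system each stage would run on a separate node with the stream flowing over the network, but the shape of the computation - a pipeline of stream-in, stream-out operators - is the same.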