Foundations of Scalable Systems - Designing Distributed Architectures
My personal notes on the book [Foundations of Scalable Systems: Designing Distributed Architectures](https://www.oreilly.com/library/view/foundations-of-scalable/9781098106058/).
Asynchronous Messaging
- Conceptually, a messaging system comprises the following:
- Message queues: queues that store a sequence of messages.
- Producers: send messages to queues.
- Consumers: retrieve messages from queues.
- Message broker: manages one or more queues.
- When messages are sent from producers to a queue, the broker adds messages to the queue in the order they arrive — basically a FIFO approach.
- There are two modes of behavior for consumers to retrieve messages:
- in pull mode, consumers send a request to the broker, which responds with the next message available for processing.
- in push mode, a consumer informs the broker that it wishes to receive messages from a queue. The consumer provides a callback function that should be invoked when a message is available.
- Consumers will also acknowledge message receipt. Upon consumer acknowledgment, the broker is free to mark a message as delivered and remove it from the queue.
- A one-to-many messaging requirement is known as a publish–subscribe architecture pattern. In publish–subscribe systems, message queues are known as topics. A topic is basically a message queue that delivers each published message to one or more subscribers.
- A message broker is potentially a single point of failure; therefore, most message brokers enable logical queues and topics to be physically replicated across multiple brokers, each running on its own node.
- Message queues trade data safety for performance. Therefore, you need to take into account scenarios such as when:
- a producer sends a message to a broker and the message is not successfully accepted by the broker.
- a message is in a queue and the broker crashes.
- a message is successfully delivered to the consumer but the consumer fails before fully processing the message.
- Common messaging patterns exist, such as:
- Competing Consumers: multiple consumers compete for the same message. This helps distribute messages between consumers and increase throughput.
- Exactly Once Processing: sometimes we need to put measures in place to ensure idempotent processing. These measures should be implemented by the consumer.
- Poison Messages: sometimes messages delivered to consumers can’t be processed, and they “poison” our system. To mitigate them, we need to limit the number of times a message is redelivered, and send it to a dead-letter queue once the redelivery limit is reached.
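The broker mechanics above (FIFO delivery, pull-mode consumption, acknowledgments, and dead-lettering poison messages after a redelivery limit) can be sketched with a toy in-memory broker. This is an illustration of the concepts, not any real broker's API:

```python
import collections

class Broker:
    """Toy in-memory broker: a FIFO queue, pull-mode delivery,
    acknowledgments, and a dead-letter queue after too many redeliveries."""

    def __init__(self, max_redeliveries=3):
        self.queue = collections.deque()        # main FIFO queue
        self.dead_letters = []                  # messages that exceeded the limit
        self.unacked = {}                       # delivery tag -> message
        self.attempts = collections.Counter()   # message -> delivery count
        self.max_redeliveries = max_redeliveries
        self._next_tag = 0

    def publish(self, message):
        self.queue.append(message)              # messages stored in arrival order

    def pull(self):
        """Pull mode: consumer asks for the next message; the broker
        tracks it as unacknowledged until the consumer acks it."""
        if not self.queue:
            return None
        message = self.queue.popleft()
        self.attempts[message] += 1
        self._next_tag += 1
        self.unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """Consumer confirms processing; broker can forget the message."""
        self.unacked.pop(tag)

    def nack(self, tag):
        """Processing failed: redeliver, or dead-letter once the limit is hit."""
        message = self.unacked.pop(tag)
        if self.attempts[message] >= self.max_redeliveries:
            self.dead_letters.append(message)
        else:
            self.queue.append(message)
```

A message that keeps failing cycles through `pull`/`nack` until it hits the redelivery limit and lands in `dead_letters`, where it can be inspected without blocking the queue.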
Serverless
- Tweak lambdas to extract more performance at lower cost - allocating more resources may look more expensive at first glance, but if the execution time shrinks by a larger factor, it saves you money in the end.
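The trade-off follows from per-GB-second billing, which Lambda-style platforms use. A quick worked example with hypothetical numbers (the rate and durations below are illustrative, not real prices):

```python
# Hypothetical rate: billing is memory * duration, so doubling memory
# still cuts cost if runtime shrinks by more than half.
PRICE_PER_GB_SECOND = 0.0000166667  # illustrative rate, not an authoritative price

def invocation_cost(memory_gb, duration_s):
    return memory_gb * duration_s * PRICE_PER_GB_SECOND

small = invocation_cost(memory_gb=0.5, duration_s=4.0)  # 2.0 GB-seconds
large = invocation_cost(memory_gb=1.0, duration_s=1.5)  # 1.5 GB-seconds
# Here the larger allocation is cheaper: large < small.
```

The larger allocation bills 1.5 GB-seconds versus 2.0, so despite the doubled memory it costs less per invocation.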
Microservices
- Apart from timeouts and the circuit breaker pattern, there is also another one called the bulkhead pattern. It ensures requests to one API in a microservice don’t utilize all available resources during a request burst.
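A bulkhead is commonly implemented as a bounded pool or semaphore per downstream dependency. A minimal sketch of the idea (the class and error message are illustrative, not from any specific library):

```python
import threading

class Bulkhead:
    """Caps concurrent calls to one downstream dependency, so a burst
    against that API cannot exhaust the service's entire capacity."""

    def __init__(self, max_concurrent):
        self._slots = threading.Semaphore(max_concurrent)

    def call(self, fn, *args):
        # Reject immediately instead of queueing when the bulkhead is full:
        # failing fast keeps resources free for other APIs.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full, rejecting request")
        try:
            return fn(*args)
        finally:
            self._slots.release()
```

Each API in the service would get its own `Bulkhead` instance, so a request burst against one endpoint is rejected at its bulkhead while the others keep serving.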
Eventual Consistency
- Versions can be used to handle concurrent updates and not lose data.
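One common form of this is optimistic concurrency control: each write carries the version the writer read, and stale writes are rejected rather than silently overwritten. A minimal sketch (the store and method names are illustrative):

```python
class VersionedStore:
    """Optimistic concurrency: a write must carry the version it read;
    a stale version is rejected instead of overwriting newer data."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        _, current = self.read(key)
        if expected_version != current:
            return False  # lost the race: caller must re-read and retry
        self._data[key] = (value, current + 1)
        return True
```

Two clients that read the same version race to write; the loser gets `False` back and must re-read, merge, and retry, so neither update is silently lost.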
Strong Consistency
- NewSQL databases are distributed databases that offer ACID guarantees.
- In systems based on eventually consistent databases, applications must be aware of the precise consistency guarantees of the underlying data store, and be designed to deal with these accordingly. In contrast, strongly consistent databases aim to deliver the same consistency guarantees as single-node systems.
- Strong consistency is used to describe two subtly different concepts in distributed databases:
- Transactional consistency, the C in ACID, also known as serializability.
- Replica consistency, also known as linearizability, meaning that all clients see the same value for a data object after it has been updated.
- Consensus algorithms are used to ensure strong consistency guarantees in distributed databases.
- From a developer's perspective, distributed transactions work the same way as regular transactions - you simply wait for the database to inform you of the transaction outcome.
- Two-Phase Commit (2PC) is a classic consensus algorithm.
- The protocol is driven by a coordinator (or leader), either an external service, or one partition that is being updated as part of a multipartition transactional update.
- When a database starts a transaction, a coordinator is selected.
- The coordinator allocates a globally unique transaction identifier (tid) and returns this to the client. This identifies a transaction context that is maintained by the coordinator.
- The client executes the operations defined by the transaction, passing the tid to each participant.
- Once all operations are completed, the client tries to commit the transaction. The 2PC algorithm commences by driving two rounds of votes with the participants:
- Prepare phase: the coordinator sends a message to all participants to prepare to commit the transaction. Each participant then informs the coordinator about its decision.
- Resolve phase: when all participants return, the coordinator examines the results. If all participants can commit, the coordinator sends a commit message to them, otherwise, it sends an abort message.
- 2PC has two failure modes: participant failure and coordinator failure. Participant failure does not threaten consistency, since the correct transaction outcome is still reached. However, a coordinator failure blocks the participants from acting. To guarantee consistency, the coordinator has to recover and examine its pre-crash transaction log - the cost is paid in availability.
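The coordinator's two phases can be sketched in a few lines. This is a simplified model of the protocol (no persistence, timeouts, or crash recovery, which a real implementation needs):

```python
class Participant:
    """A partition taking part in the transaction."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self):
        return self.can_commit  # vote yes/no on committing

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator logic: collect prepare votes, then enforce one outcome."""
    # Prepare phase: every participant votes.
    votes = [p.prepare() for p in participants]
    # Resolve phase: commit only if all voted yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote in the prepare phase forces every participant to abort, which is exactly why a crashed coordinator blocks everyone: the participants cannot decide the outcome on their own.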
- Distributed consensus approaches are based on a class of algorithms called atomic broadcast, total order broadcast, or replicated state machines.
- To be fault tolerant, a consensus algorithm has to handle both leader and follower failures. This involves electing a new leader and getting the followers to reach consensus on its log. Since followers may also fail, the algorithms must be designed to operate with a quorum (majority).
- Raft is a leader-based algorithm. The leader maintains a log of operations, and Raft essentially replicates this log to all members of the system.
- Leader election: the leader sends periodic heartbeats to followers, each of which maintains an election timer. If the timer expires before another heartbeat is received, the follower starts an election. Followers vote for a candidate only if its term (a monotonically increasing value associated with each leader) and log are at least as up to date as their own. This ensures the elected leader has all the committed entries from previous terms in its log.
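The voting rule can be sketched as a single check a follower applies to a vote request. This is a simplified model of Raft's RequestVote logic (a real implementation also persists state and tracks candidate identity):

```python
def grant_vote(follower, candidate_term, cand_last_log_term, cand_last_log_index):
    """Simplified RequestVote: a follower grants at most one vote per term,
    and only to a candidate whose log is at least as up to date as its own
    (higher last log term, or same term with an index >= its own)."""
    if candidate_term < follower["current_term"]:
        return False  # stale candidate
    if candidate_term > follower["current_term"]:
        # New term observed: adopt it and forget any earlier vote.
        follower["current_term"] = candidate_term
        follower["voted_for"] = None
    if follower["voted_for"] is not None:
        return False  # already voted in this term
    log_up_to_date = (cand_last_log_term, cand_last_log_index) >= (
        follower["last_log_term"], follower["last_log_index"])
    if log_up_to_date:
        follower["voted_for"] = "candidate"
        return True
    return False
```

Because a majority of voters must each pass this log check, any elected leader is guaranteed to hold every entry committed in earlier terms.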
Scalable Event-Driven Processing
- In event-driven architectures, events represent that something interesting has happened in the application context.
- You can implement an event-based architecture using messaging systems like RabbitMQ’s publish/subscribe features, but this has the effect of destroying any explicit record of the event. However, keeping a permanent record of immutable events in a simple log data structure has some useful characteristics:
- You can introduce new event consumers at any time.
- You can modify existing event-processing logic, either to add new features or fix, and then execute the new logic on the complete log to enrich results or fix errors.
- If a server or disk failure occurs, you can restore the last known state and replay events from the log to restore the data set.
- Some use cases require log entries to be deleted, like the “right to be forgotten” regulatory requirements. Apache Kafka provides two mechanisms to achieve that: time to live and compacted topics.
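The core idea behind compaction can be shown in a few lines: only the most recent event per key survives, so an older record is effectively erased once a newer value (or a deletion marker) is appended. A sketch of the idea, not Kafka's actual implementation:

```python
def compact(log):
    """Keep only the latest event per key, preserving log order.
    Appending a newer value (or a deletion marker such as None)
    makes the old value eligible for removal."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    # Emit survivors in their original log order.
    return [(key, value) for key, (offset, value) in
            sorted(latest.items(), key=lambda item: item[1][0])]
```

Appending `("user1", None)` after earlier `user1` events means compaction drops the old personal data, which is how a log store can honor deletion requests without rewriting history in place.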
- Kafka is a distributed persistent log store that employs a dumb broker/smart clients architecture.
- Kafka topics are persistent append-only logs. Consumers read events from them by specifying the name of the topic and the index (or offset) of the message they want to read. Reading an event is nondestructive. Kafka increments the consumer’s offset in the topic automatically to point to the next unprocessed event in the topic.
- Partitioning a topic has an implication for event ordering - there is no total order of events across partitions. It also affects concurrent event delivery to multiple consumers.
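A toy model of these mechanics: partitioned append-only logs, key-based partition assignment, per-consumer offsets, and nondestructive reads. This is an illustration of the concepts, not Kafka's client API:

```python
class Topic:
    """Toy Kafka-style topic: partitioned append-only logs with
    per-consumer offsets; reads never remove events."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = {}  # (consumer, partition) -> next index to read

    def append(self, key, event):
        # Events with the same key land in the same partition, so ordering
        # is guaranteed per key but not across partitions.
        partition = hash(key) % len(self.partitions)
        self.partitions[partition].append(event)
        return partition

    def poll(self, consumer, partition):
        offset = self.offsets.get((consumer, partition), 0)
        log = self.partitions[partition]
        if offset >= len(log):
            return None  # nothing new for this consumer
        # Advance this consumer's offset; the log itself is untouched.
        self.offsets[(consumer, partition)] = offset + 1
        return log[offset]
```

Because each consumer only advances its own offset, a newly added consumer can start from offset 0 and replay the full history that earlier consumers have already read.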
Stream Processing Systems
- Streaming systems process new data and events in real time - which can mean latencies from less than a second to a few seconds.
- Stream processing systems provide the capabilities for processing nodes to transform an input stream at one node into a new stream that is processed by one or more downstream nodes.
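This node-to-node transformation maps naturally onto chained generators, where each stage consumes one stream and emits a new derived stream. A minimal single-process sketch of the dataflow idea (stage names are illustrative):

```python
def parse(lines):
    """First stage: transform raw input lines into numeric readings."""
    for line in lines:
        yield float(line)

def threshold_alerts(readings, limit):
    """Downstream stage: consume one stream, emit a new derived stream."""
    for value in readings:
        if value > limit:
            yield f"ALERT: {value}"

raw = ["12.0", "99.5", "7.3"]
alerts = list(threshold_alerts(parse(raw), limit=50))
```

In a real stream processing system each stage would run on a separate node with the stream flowing over the network, but the shape of the computation - a pipeline of stream-in, stream-out operators - is the same.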