Building Event-Driven Microservices

These are my personal notes on the book Building Event-Driven Microservices.

Chapter 1: Why Event-Driven Microservices

Core Concepts

In an event-driven microservices architecture, systems communicate by issuing and consuming events. Unlike messages in traditional message-passing systems, events are not destroyed upon consumption; they remain in the stream, available to all other consumers.

Key Characteristics:

  • Services are small and purpose-built
  • Services consume events from input streams, apply business logic, and emit output events
  • Events act as both data storage and communication mechanism

Communication Structures

Three Types of Communication Structures:

  1. Business Communication Structure - Communication between teams and departments
  2. Implementation Communication Structure - Data and logic pertaining to subdomain models
  3. Data Communication Structure - The process by which data moves across the business and its implementations

Conway’s Law and Communication Structures

Per Conway’s Law, an organization’s communication structure strongly shapes its engineering implementations, at both the organizational and team levels.

Event-Driven Communication Benefits

Events Are the Basis of Communication:

  • All shareable data is published to event streams
  • Forms a continuous, canonical narrative of everything that happened
  • Events are the data, not merely signals

Event Streams Provide Single Source of Truth:

  • Each event is a statement of fact
  • Together they form the single source of truth
  • Basis of communication for all systems

Key Advantages:

  • Decouples data production from access
  • Consumers perform their own modeling and querying
  • Readily accessible data makes it easier to adapt as business communication structures change
  • Asynchronous processing lets each consumer apply its own business logic transformations

Chapter 2: Event-Driven Microservice Fundamentals

Microservice Types

Consumer Microservices: Consume and process events from input streams
Producer Microservices: Produce events to streams for other services
Hybrid: Both consumer and producer (most common)

Topology Concepts

Microservice Topology: Event-driven topology internal to a single microservice
Business Topology: Set of microservices, event streams, and APIs fulfilling complex business functions

Event Structure

Events contain:

  • Complete details of what happened
  • Key/value format (key for identification, routing, aggregation)
  • All information required to accurately describe the event

Table-Stream Duality

Materializing State from Entity Events:

  • Apply entity events in order from event stream
  • Each event is upserted into key/value table
  • Most recent event for given key is represented
  • Tombstone events (null values) handle deletions
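
A minimal sketch of this materialization in plain Python, with the event stream stood in by a list of (key, value) pairs read in offset order:

```python
def materialize(events):
    """Apply entity events in order, upserting into a key/value table.

    Each event is a (key, value) pair; a None value is a tombstone
    that deletes the key from the materialized table.
    """
    table = {}
    for key, value in events:
        if value is None:      # tombstone: the entity was deleted
            table.pop(key, None)
        else:                  # upsert: keep only the most recent value
            table[key] = value
    return table

# The latest event per key wins; tombstoned keys disappear.
stream = [("user-1", {"name": "Ada"}),
          ("user-2", {"name": "Grace"}),
          ("user-1", {"name": "Ada L."}),
          ("user-2", None)]
print(materialize(stream))  # {'user-1': {'name': 'Ada L.'}}
```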

Core Principles

Microservice Single Writer Principle:

  • Each event stream has one and only one producing microservice
  • This microservice owns each event produced to that stream

Event Broker Features:

  • Append-only immutable log
  • Durable storage mechanism
  • Single source of truth
  • Identical copies guaranteed to all consumers

Consumption Patterns

Stream Consumption: Each consumer maintains its own offset pointer
Queue Consumption: Each event consumed by one and only one instance
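
A toy illustration of the two patterns over a shared in-memory log (no broker client): stream consumers each keep their own offset, while queue consumers share one cursor so each event is handed to exactly one instance.

```python
log = ["e1", "e2", "e3", "e4"]  # append-only, immutable log

# Stream consumption: every consumer group tracks its own offset,
# so both groups independently see the full log.
offsets = {"group-a": 0, "group-b": 0}

def stream_poll(group):
    i = offsets[group]
    if i < len(log):
        offsets[group] = i + 1
        return log[i]
    return None

# Queue consumption: one shared cursor; each event goes to exactly
# one consumer instance.
queue_cursor = 0

def queue_poll():
    global queue_cursor
    if queue_cursor < len(log):
        event = log[queue_cursor]
        queue_cursor += 1
        return event
    return None

print(stream_poll("group-a"), stream_poll("group-b"))  # e1 e1
print(queue_poll(), queue_poll())                      # e1 e2
```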


Chapter 3: Communication and Data Contracts

Fundamental Communication Problem

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” - Claude Shannon

Data Contracts

Two Components of Well-Defined Data Contract:

  1. Data Definition - What will be produced (fields, types, data structures)
  2. Triggering Logic - Why it is produced (business logic that triggered event creation)

Schema Management

Schema Benefits:

  • Explicit predefined structure prevents brittle implicit contracts
  • Comments and metadata support for communicating meaning
  • Schema evolution rules enable updates without breaking consumers

Compatibility Types:

  • Forward Compatibility - Newer schema data readable with older schema
  • Backward Compatibility - Older schema data readable with newer schema
  • Full Compatibility - Both forward and backward compatible (strongest guarantee)
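
For example, adding a field with a default is a change consumers can absorb. A hand-rolled sketch of the backward-compatible case, assuming defaults are applied at read time (real systems would delegate this to Avro/Protobuf and a schema registry):

```python
NEW_SCHEMA_DEFAULTS = {"discount": 0.0}  # new optional field with a default

def read_with_new_schema(record):
    """Backward compatibility: old-schema data read with the new schema.

    Missing fields are filled from defaults instead of failing.
    """
    return {**NEW_SCHEMA_DEFAULTS, **record}

old_record = {"order_id": "o-1", "total": 25.00}   # written pre-migration
print(read_with_new_schema(old_record))
# {'discount': 0.0, 'order_id': 'o-1', 'total': 25.0}
```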

Schema Evolution Best Practices

  • Communicate early and clearly with downstream consumers
  • Producer responsibility to resolve schema divergences
  • Leave old entities under old schema in original streams
  • Create new streams for updated entities with new schemas

Event Design Principles

Tell the Truth, the Whole Truth, and Nothing but the Truth:

  • Events must be complete descriptions of what happened
  • Consumers should not need other data sources to understand the event

Best Practices:

  • Use singular event definition per stream
  • Use narrowest data types possible
  • Keep events single-purpose (avoid type fields)
  • Minimize event size while maintaining completeness
  • Involve prospective consumers in design
  • Avoid events as semaphores or signals

Chapter 4: Integrating Event-Driven Architectures with Existing Systems

Data Liberation

Definition: Identification and publication of cross-domain data sets to corresponding event streams as part of migration strategy.

Goals:

  • Enforce single source of truth
  • Eliminate direct coupling between systems
  • Enable new event-driven microservices as consumers

Data Liberation Patterns

Three Main Patterns:

  1. Query-based - Extract data by querying underlying state store
  2. Log-based - Extract data by following append-only log for changes
  3. Table-based - Push data to output queue table, then emit to event streams

Critical Requirement: All patterns must produce events in sorted timestamp order using source record’s updated_at time.
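
A sketch of the query-based pattern under illustrative assumptions: a `customers` table with an `updated_at` column, polled incrementally from a high-water mark, with rows emitted in timestamp order. The producer is a stub.

```python
import sqlite3

def publish_event(key, value, timestamp):
    # Stand-in for a real event producer.
    print(f"emit key={key} value={value} ts={timestamp}")

def liberate_changes(conn, since):
    """Poll for rows changed after `since`, ordered by updated_at so the
    emitted events stay in sorted timestamp order."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at", (since,)).fetchall()
    for cid, name, ts in rows:
        publish_event(key=cid, value={"name": name}, timestamp=ts)
    return rows[-1][2] if rows else since  # new high-water mark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, name TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [("c1", "Ada", 100), ("c2", "Grace", 105)])
watermark = liberate_changes(conn, since=0)   # emits both rows, returns 105
```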

Change-Data Capture

Benefits:

  • Uses data store’s underlying logs (binary logs, write-ahead logs)
  • Real-time data liberation
  • Minimal impact on source systems

Outbox Pattern

Implementation:

  • Outbox table contains notable changes to internal data
  • Single transaction bundles internal updates and outbox updates
  • Prevents divergence with event stream as single source of truth
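
A sketch of the outbox write path, with SQLite standing in for the service’s datastore. The essential point is the single transaction covering both writes; a separate drainer later produces the outbox rows to the stream.

```python
import json, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
             "key TEXT, payload TEXT)")

def place_order(order_id):
    # One transaction covers both the internal update and the outbox row,
    # so the emitted event can never diverge from the stored state.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute("INSERT INTO outbox (key, payload) VALUES (?, ?)",
                     (order_id, json.dumps({"type": "OrderPlaced",
                                            "order_id": order_id})))

place_order("o-1")
# A separate drainer reads the outbox in order and produces to the stream.
for seq, key, payload in conn.execute("SELECT * FROM outbox ORDER BY seq"):
    print("produce", key, payload)   # stand-in for the real producer
```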

Event Sinking

Purpose: Consuming event data and inserting into data stores for non-event-driven applications

Use Cases:

  • Integration with legacy systems
  • Replacing point-to-point couplings
  • Batch-based big-data analysis

Chapter 5: Event-Driven Processing Basics

Stateless Topologies

Key Concept: Building microservice topology requires event-driven thinking - code executes in response to event arrival.

Topology Components:

  • Filters - Select relevant events
  • Routers - Direct events to appropriate streams
  • Transformations - Process single event, emit zero or more outputs
  • Materializations - Convert streams to tables
  • Aggregations - Combine multiple events
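
These components compose naturally as plain functions over an event iterator. A minimal stateless sketch chaining a filter, a transformation, and a router (field and topic names are invented for illustration):

```python
def topology(events):
    """Stateless topology: filter relevant events, transform each one,
    and route the result to an output stream by region."""
    for event in events:
        if event["amount"] <= 0:          # filter: drop non-positive amounts
            continue
        enriched = {**event, "amount_cents": int(event["amount"] * 100)}
        topic = f"orders-{event['region']}"   # router: pick output stream
        yield topic, enriched

events = [{"region": "eu", "amount": 9.99}, {"region": "us", "amount": 0}]
for topic, out in topology(events):
    print(topic, out)   # only the EU event survives the filter
```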

Stream Operations

Branching: Apply logical operator and output to new stream based on result
Merging: Consume from multiple streams, process, output to single stream

Important: When merging streams, define new unified schema representative of merged domain.

Partition Management

Consumer Groups: Each microservice maintains unique consumer group representing collective offsets

Partition Assignment Strategies: Ensure partitions are evenly distributed across consumer instances

Copartitioning: Event streams with same key, partitioner algorithm, and partition count guarantee data locality for consumer instances.

Failure Recovery

Stateless Recovery: Effectively same as adding new instance to consumer group - no state restoration required, immediate processing after partition assignment.


Chapter 6: Deterministic Stream Processing

Processing States

Two Main States:

  1. Near real-time processing (typical of long-running microservices)
  2. Historical processing (catching up to present time)

Determinism Goal

Microservice should produce same output whether processing in real-time or catching up to present time.

Timestamp Management

Critical Requirements:

  • Synchronized and consistent timestamps across distributed systems
  • Network Time Protocol (NTP) synchronization
  • Event time vs. processing time vs. ingestion time

Event Scheduling

Purpose: Process events consistently for reproducible results

Implementation: Select and dispatch event with oldest timestamp from all assigned input partitions.

When Needed: When order of event processing matters to business logic.
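
A sketch of oldest-timestamp-first dispatch using a heap keyed on each partition’s head event; partitions are modeled as lists already sorted by timestamp within themselves.

```python
import heapq

def schedule(partitions):
    """Dispatch events across partitions in oldest-timestamp-first order.

    Each partition is a sequence of (timestamp, event) pairs that is
    already in timestamp order within itself.
    """
    heap = []
    iters = [iter(p) for p in partitions]
    for i, it in enumerate(iters):
        first = next(it, None)
        if first:
            heapq.heappush(heap, (first[0], i, first[1]))
    while heap:
        ts, i, event = heapq.heappop(heap)
        yield ts, event                       # process the oldest event next
        nxt = next(iters[i], None)
        if nxt:
            heapq.heappush(heap, (nxt[0], i, nxt[1]))

p0 = [(1, "a"), (5, "c")]
p1 = [(2, "b"), (9, "d")]
print(list(schedule([p0, p1])))  # [(1,'a'), (2,'b'), (5,'c'), (9,'d')]
```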

Time Types

Event Time: When event actually occurred (most accurate)
Processing Time: When event is processed by consumer
Ingestion Time: When event is ingested into event broker

Best Practice: Use event time when reliable, ingestion time as fallback.

Watermarks and Stream Time

Watermarks: Declaration that all events of time t and prior have been processed

Stream Time: Highest timestamp of processed events - never decreases, useful for coordinating between consumer instances.

Out-of-Order Events

Definition: An event is out of order when its timestamp is earlier than the timestamps of events that precede it in the stream (i.e., earlier than the current stream time).

Handling Strategies:

  • Drop event - Window closed, aggregations complete
  • Wait - Delay output until fixed time passes (higher latency)
  • Grace period - Output results, keep window open for updates

Windowing Functions

Tumbling Windows: Fixed size, non-overlapping windows
Sliding Windows: Fixed size with incremental step
Session Windows: Dynamically sized, terminated by timeout
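
A sketch of the tumbling case, the simplest of the three: each event lands in exactly one fixed-size window determined by flooring its event-time timestamp.

```python
from collections import defaultdict

def tumbling_counts(events, window_ms):
    """Count events per fixed-size, non-overlapping (tumbling) window.

    Events are (event_time_ms, key) pairs; each lands in exactly one
    window identified by its start time.
    """
    windows = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms
        windows[(window_start, key)] += 1
    return dict(windows)

events = [(10, "clicks"), (40, "clicks"), (110, "clicks")]
print(tumbling_counts(events, window_ms=100))
# {(0, 'clicks'): 2, (100, 'clicks'): 1}
```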

Reprocessing

Key Capability: Rewind consumer offsets and replay from arbitrary point in time.

Requirement: Event scheduling ensures same processing order during reprocessing as real-time.


Chapter 7: Stateful Streaming

State Management

Materialized State: Projection of events from source event stream (immutable)
State Store: Where service’s business state is stored (mutable)

Changelog Streams

Purpose: Record of all changes made to state store data

Benefits:

  • Rebuild state from changelog
  • Checkpoint event processing progress
  • Permanent copy maintained outside microservice instance

Important: Changelog streams should be compacted (only need most recent key/value pairs).

State Store Types

Internal State Store: Coexists in same container/VM as microservice business logic

Global State Store: Materializes all partitions for complete data copy on each instance

  • Useful for small, commonly used, seldom-changing data

Scaling and Recovery

Process: New/recovered instance must materialize state before processing new events

Method: Reload changelog topic for each stateful store (quickest approach)

Hot Replicas: Multiple replicas for faster recovery

State Rebuilding vs Migration

Rebuilding Process:

  1. Stop microservice
  2. Reset consumer offsets to beginning
  3. Delete intermediate state
  4. Start new version - rebuild state from input streams

Transactions and Effectively-Once Processing

Goal: Updates to single source of truth consistently applied regardless of failures

Key Features:

  • Idempotent writes - Event written once and only once
  • Atomic transactions - Multiple events to multiple streams atomically

Deduplication Challenges:

  • Expensive without idempotent producers
  • Requires state store of processed dedupe IDs
  • Generally performed for specific time/offset windows
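
A sketch of windowed deduplication along these lines, with an in-memory dict standing in for the state store of processed IDs:

```python
class Deduplicator:
    """Track recently seen event IDs inside a fixed time window."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.seen = {}            # event_id -> last-seen timestamp

    def is_duplicate(self, event_id, now_ms):
        # Evict IDs that have aged out of the dedupe window; this bounds
        # the store's size but misses duplicates older than the window.
        cutoff = now_ms - self.window_ms
        self.seen = {eid: ts for eid, ts in self.seen.items() if ts >= cutoff}
        if event_id in self.seen:
            return True
        self.seen[event_id] = now_ms
        return False

dedupe = Deduplicator(window_ms=60_000)
print(dedupe.is_duplicate("evt-1", now_ms=1_000))   # False: first sighting
print(dedupe.is_duplicate("evt-1", now_ms=2_000))   # True: retry/duplicate
print(dedupe.is_duplicate("evt-1", now_ms=90_000))  # False: aged out
```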

Chapter 8: Building Workflows with Microservices

Workflow Patterns

Choreography Pattern

Characteristics:

  • Highly decoupled microservice architectures
  • React to input events without blocking or waiting
  • Independent from upstream producers and downstream consumers
  • Emergent behavior from microservice relationships

Benefits:

  • Ideal for independent business workflows
  • Highly decoupled communication

Challenges:

  • Can be brittle across multiple microservice instances
  • Difficult to monitor distributed workflows
  • Business logic changes may require modifying numerous services

Orchestration Pattern

Characteristics:

  • Central orchestrator microservice issues commands to worker microservices
  • Contains entire workflow logic for business process
  • Awaits responses and handles results according to workflow logic

Benefits:

  • Flexible workflow definition within single microservice
  • Better visibility and monitoring
  • Centralized coordination

Best Practice: Orchestrator’s bounded context limited to workflow logic, workers contain business fulfillment logic.
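
A toy sketch of the shape of an orchestrator: it owns the workflow sequence and advances on each worker result, while the (stubbed) workers own the fulfillment logic. Step names are invented.

```python
WORKFLOW = ["reserve_stock", "charge_payment", "ship_order"]

def worker(step, order):
    # Stand-in for an independent worker microservice; real workers would
    # consume a command event and emit a result event.
    print(f"worker handling {step} for {order}")
    return {"step": step, "order": order, "ok": True}

def orchestrate(order):
    """Issue commands in workflow order, advancing only on success."""
    for step in WORKFLOW:
        result = worker(step, order)
        if not result["ok"]:
            print(f"{step} failed; triggering compensation for {order}")
            return False
    print(f"workflow complete for {order}")
    return True

orchestrate("order-42")
```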

Distributed Transactions

Definition: Transaction spanning two or more microservices

Implementation: Often known as sagas in event-driven world

Best Practice: Avoid when possible due to significant risk and complexity

Requirements:

  • Synchronizing work between systems
  • Facilitating rollbacks
  • Managing transient failures
  • Network connectivity management

Compensation Workflows

Purpose: Handle workflows that don’t need perfect reversibility

Use Case: Customer-facing products where compensatory actions can remedy failures

Benefit: Alternative to complex distributed transactions


Chapter 9: Microservices Using Function-as-a-Service

FaaS Characteristics

Function Behavior:

  • Starts up, runs until completion, terminates
  • No persistent connections or state
  • Scales up/down based on load automatically

Think of FaaS: Basic consumer/producer that regularly fails and must restart

Design Principles

Bounded Context: Functions and internal event streams must strictly belong to bounded context

Consumer Groups: Each function-based microservice must have independent consumer group

Offset Commits: Best practice is commit only after processing completed
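
A sketch of commit-after-processing with a stubbed broker client (hypothetical `poll`/`commit` interface): if the function dies mid-processing, the uncommitted offset causes redelivery rather than silent loss.

```python
class StubBroker:
    """Minimal stand-in for a broker client with offset commits."""
    def __init__(self, events):
        self.events = events
        self.committed = 0        # offset of the next unprocessed event

    def poll(self):
        return (self.events[self.committed]
                if self.committed < len(self.events) else None)

    def commit(self):
        self.committed += 1

def handle(broker, process):
    event = broker.poll()
    if event is None:
        return
    process(event)     # do the work first...
    broker.commit()    # ...commit only afterward, so a crash mid-processing
                       # redelivers the event instead of silently losing it

broker = StubBroker(["e1", "e2"])
handle(broker, process=print)    # prints e1, then commits the offset
```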

Cold Start vs Warm Start

Cold Start: Default state upon starting first time or after inactivity
Warm Start: Function revived from hibernation cache

Suitable Use Cases

Ideal for:

  • Simple topologies
  • Stateless processing
  • Non-deterministic processing of multiple event streams
  • Wide scaling (queue-based processing)

Security: Use strict access permissions - nothing outside bounded context allowed access

Communication Patterns

Event-Driven: Output of one function produced to event stream for consuming function

Request-Response: Direct calls between functions

Hybrid: Combination of both patterns

Important: Complete processing of one event before processing next to avoid out-of-order issues

Function Optimization

Tuning Considerations:

  • Allocate sufficient resources based on workload
  • Optimize resource usage for performance and cost
  • Consider termination timeouts (typically 5-10 minutes)

Chapter 10: Basic Producer and Consumer Microservices

BPC Characteristics

Basic Producer and Consumer (BPC) microservices:

  • Ingest events from input streams
  • Apply transformations or business logic
  • Emit events to output streams
  • Use basic consumer and producer clients (no advanced features)

What BPCs Don’t Include:

  • Event scheduling
  • Watermarks
  • Materialization mechanisms
  • Changelogs
  • Horizontal scaling with local state stores

Suitable Use Cases

Simple Patterns:

  • Stateless transformations
  • Stateful patterns where deterministic event scheduling not required

Integration Scenarios:

  • Legacy system integration
  • Sidecar pattern for systems that can’t be modified safely

Gating Pattern: For business processes that do not depend on event order but require that all expected events eventually arrive before proceeding (see the sketch below)
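
A sketch of such a gate keyed by order ID, with invented event types: arrivals may come in any order, and the gate emits only once every required event has been seen for the key.

```python
REQUIRED = {"payment_confirmed", "stock_reserved", "address_validated"}

gates = {}   # order_id -> set of event types seen so far

def on_event(order_id, event_type):
    """Track arrivals per key; fire once all required events are present."""
    seen = gates.setdefault(order_id, set())
    seen.add(event_type)
    if seen >= REQUIRED:                 # superset check: gate is satisfied
        del gates[order_id]
        print(f"emit OrderReady for {order_id}")

# Arrival order does not matter; only completeness does.
on_event("o-1", "stock_reserved")
on_event("o-1", "payment_confirmed")
on_event("o-1", "address_validated")    # -> emit OrderReady for o-1
```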

Data Layer Heavy: When underlying data layer performs most business logic (geospatial, search, ML/AI)

Hybrid Applications

External Stream Processing: BPC can leverage external stream-processing systems for complex operations while maintaining access to language features and libraries

Example: External framework for complex aggregations, BPC for populating local data store and serving request-response queries

Limitations

BPCs require investment in libraries for:

  • Simple state materialization
  • Event scheduling
  • Timestamp-based decision making

Chapter 13: Integrating Event-Driven and Request-Response Microservices

Integration Necessity

Event-driven patterns cannot serve all business needs - request-response endpoints provide real-time data serving capabilities.

Use Cases for Request-Response

  • Human-driven interactions
  • Machine-driven external system communication
  • Real-time data queries
  • Synchronous API requirements

Integration Patterns

External System Integration:

  • Convert API requests/responses to events
  • Enable asynchronous processing by event-driven microservices

Human Interface Integration:

  • Convert user interactions to events
  • Process asynchronously, with UI feedback indicating that the request is being handled
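
A sketch of the conversion, with an invented endpoint and a stubbed producer: the handler turns the request into an event and returns 202 Accepted immediately, leaving the actual work to downstream event-driven services.

```python
import json, uuid

def produce(topic, event):
    print(f"produce to {topic}: {event}")   # stand-in for a real producer

def handle_request(body):
    """Convert a synchronous API request into an event and ack immediately."""
    event = {"event_id": str(uuid.uuid4()),
             "type": "ProfileUpdateRequested",
             "payload": body}
    produce("profile-update-requests", event)
    # 202 Accepted: the work happens asynchronously downstream; the UI can
    # poll or subscribe for the eventual result using event_id.
    return 202, json.dumps({"accepted": True, "event_id": event["event_id"]})

status, resp = handle_request({"user": "u-7", "name": "Ada"})
print(status, resp)
```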

Chapter 14: Supportive Tooling

Ownership and Governance

Explicit Ownership Tracking:

  • Single writer principle attributes stream ownership to producing microservice
  • Event stream metadata tagging for ownership assignment
  • Only owning teams can modify metadata tags

Event Stream Management

Creation and Modification Rights:

  • Teams can automatically create internal event streams
  • Full control over partition count, retention policy, replication factor

Schema Registry

A critical service for centralized schema management.

Benefits:

  • Event schema not transported with event (uses placeholder ID)
  • Significantly reduced bandwidth usage
  • Single point of reference for obtaining schemas
  • Data discovery capabilities with free-text search

Schema Registry Features:

  • Precise data definitions (names, types, defaults, documentation)
  • Clarity for producers and consumers
  • Version management and evolution tracking

Chapter 15: Testing Event-Driven Microservices

Testing Levels

Unit Testing: Test smallest pieces of code to ensure expected functionality - foundation for larger tests

Topology Testing: More complex than unit tests - exercises entire topology as specified by business logic

Think of topology: Single, large, complex function with many moving parts
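
A sketch of testing a topology as a pure function, with a hypothetical `enrich` transform: feed events in, assert on outputs, no broker required.

```python
def enrich(event):
    # Transform under test: tags large orders.
    return {**event, "large": event["amount"] >= 100}

def topology(events):
    # Topology under test: filter out non-positive amounts, then enrich.
    return [enrich(e) for e in events if e["amount"] > 0]

def test_topology_filters_and_enriches():
    events = [{"amount": 150}, {"amount": -5}, {"amount": 20}]
    out = topology(events)
    assert len(out) == 2                  # negative amount filtered out
    assert out[0]["large"] and not out[1]["large"]

test_topology_filters_and_enriches()      # pytest would discover this by name
print("topology tests passed")
```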

Schema Compatibility Testing

Automated Checks: Pull schemas from schema registry and perform evolutionary rule checking as part of code submission process

Ensures: Output schemas compatible with previous schemas according to stream evolution rules

Integration Testing

Two Main Flavors:

Local Integration Testing:

  • Testing performed on localized replica of production environment
  • Faster feedback loops
  • Isolated from external dependencies

Remote Integration Testing:

  • Microservice executed on environment external to local system
  • More realistic conditions
  • Tests actual integration points

Testing Strategy

Comprehensive Approach:

  • Unit tests for individual components
  • Topology tests for business logic flows
  • Schema compatibility for evolution safety
  • Integration tests for end-to-end validation