Building Event-Driven Microservices

These are my personal notes on the book Building Event-Driven Microservices.

Chapter 1: Why Event-Driven Microservices

Core Concepts

In an event-driven microservices architecture, systems communicate by issuing and consuming events. Unlike messages in traditional message-passing systems, events are not destroyed upon consumption; they remain in the stream, available to all other consumers.

Key Characteristics:

  • Services are small and purpose-built
  • Services consume events from input streams, apply business logic, and emit output events
  • Events act as both data storage and communication mechanism

Communication Structures

Three Types of Communication Structures:

  1. Business Communication Structure - Communication between teams and departments
  2. Implementation Communication Structure - Data and logic pertaining to subdomain models
  3. Data Communication Structure - The process by which data moves across the business and its implementations

Conway’s Law and Communication Structures

Per Conway’s Law, an organization’s communication structure strongly shapes its engineering implementations, at both the organizational and team levels.

Event-Driven Communication Benefits

Events Are the Basis of Communication:

  • All shareable data is published to event streams
  • Forms a continuous, canonical narrative of everything that happened
  • Events are the data, not merely signals

Event Streams Provide Single Source of Truth:

  • Each event is a statement of fact
  • Together they form the single source of truth
  • Basis of communication for all systems

Key Advantages:

  • Decouples data production from access
  • Consumers perform their own modeling and querying
  • Readily accessible data makes it easier to adapt as business communication structures change
  • Asynchronous processing lets each consumer apply its own business logic transformations

Chapter 2: Event-Driven Microservice Fundamentals

Microservice Types

Consumer Microservices: Consume and process events from input streams
Producer Microservices: Produce events to streams for other services
Hybrid: Both consumer and producer (most common)

Topology Concepts

Microservice Topology: Event-driven topology internal to a single microservice
Business Topology: Set of microservices, event streams, and APIs fulfilling complex business functions

Event Structure

Events contain:

  • Complete details of what happened
  • Key/value format (key for identification, routing, aggregation)
  • All information required to accurately describe the event

Table-Stream Duality

Materializing State from Entity Events:

  • Apply entity events in order from event stream
  • Each event is upserted into key/value table
  • Most recent event for given key is represented
  • Tombstone events (null values) handle deletions
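
A minimal sketch of this materialization in plain Python, with the event stream stood in by a list of (key, value) pairs read in offset order:

```python
def materialize(events):
    """Apply entity events in order, upserting into a key/value table.

    Each event is a (key, value) pair; a None value is a tombstone
    that deletes the key from the materialized table.
    """
    table = {}
    for key, value in events:
        if value is None:      # tombstone: the entity was deleted
            table.pop(key, None)
        else:                  # upsert: keep only the most recent value
            table[key] = value
    return table

# The latest event per key wins; tombstoned keys disappear.
stream = [("user-1", {"name": "Ada"}),
          ("user-2", {"name": "Grace"}),
          ("user-1", {"name": "Ada L."}),
          ("user-2", None)]
print(materialize(stream))  # {'user-1': {'name': 'Ada L.'}}
```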

Core Principles

Microservice Single Writer Principle:

  • Each event stream has one and only one producing microservice
  • This microservice owns each event produced to that stream

Event Broker Features:

  • Append-only immutable log
  • Durable storage mechanism
  • Single source of truth
  • Identical copies guaranteed to all consumers

Consumption Patterns

Stream Consumption: Each consumer maintains its own offset pointer
Queue Consumption: Each event consumed by one and only one instance
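
A toy illustration of the two patterns over a shared in-memory log (no broker client): stream consumers each keep their own offset, while queue consumers share one cursor so each event is handed to exactly one instance.

```python
log = ["e1", "e2", "e3", "e4"]  # append-only, immutable log

# Stream consumption: every consumer group tracks its own offset,
# so both groups independently see the full log.
offsets = {"group-a": 0, "group-b": 0}

def stream_poll(group):
    i = offsets[group]
    if i < len(log):
        offsets[group] = i + 1
        return log[i]
    return None

# Queue consumption: one shared cursor; each event goes to exactly
# one consumer instance.
queue_cursor = 0

def queue_poll():
    global queue_cursor
    if queue_cursor < len(log):
        event = log[queue_cursor]
        queue_cursor += 1
        return event
    return None

print(stream_poll("group-a"), stream_poll("group-b"))  # e1 e1
print(queue_poll(), queue_poll())                      # e1 e2
```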


Chapter 3: Communication and Data Contracts

Fundamental Communication Problem

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” - Claude Shannon

Data Contracts

Two Components of Well-Defined Data Contract:

  1. Data Definition - What will be produced (fields, types, data structures)
  2. Triggering Logic - Why it is produced (business logic that triggered event creation)

Schema Management

Schema Benefits:

  • Explicit predefined structure prevents brittle implicit contracts
  • Comments and metadata support for communicating meaning
  • Schema evolution rules enable updates without breaking consumers

Compatibility Types:

  • Forward Compatibility - Newer schema data readable with older schema
  • Backward Compatibility - Older schema data readable with newer schema
  • Full Compatibility - Both forward and backward compatible (strongest guarantee)
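
For example, adding a field with a default is a change consumers can absorb. A hand-rolled sketch of the backward-compatible case, assuming defaults are applied at read time (real systems would delegate this to Avro/Protobuf and a schema registry):

```python
NEW_SCHEMA_DEFAULTS = {"discount": 0.0}  # new optional field with a default

def read_with_new_schema(record):
    """Backward compatibility: old-schema data read with the new schema.

    Missing fields are filled from defaults instead of failing.
    """
    return {**NEW_SCHEMA_DEFAULTS, **record}

old_record = {"order_id": "o-1", "total": 25.00}   # written pre-migration
print(read_with_new_schema(old_record))
# {'discount': 0.0, 'order_id': 'o-1', 'total': 25.0}
```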

Schema Evolution Best Practices

  • Communicate early and clearly with downstream consumers
  • Producer responsibility to resolve schema divergences
  • Leave old entities under old schema in original streams
  • Create new streams for updated entities with new schemas

Event Design Principles

Tell the Truth, the Whole Truth, and Nothing but the Truth:

  • Events must be complete descriptions of what happened
  • Consumers should not need other data sources to understand the event

Best Practices:

  • Use singular event definition per stream
  • Use narrowest data types possible
  • Keep events single-purpose (avoid type fields)
  • Minimize event size while maintaining completeness
  • Involve prospective consumers in design
  • Avoid events as semaphores or signals

Chapter 4: Integrating Event-Driven Architectures with Existing Systems

Data Liberation

Definition: Identification and publication of cross-domain data sets to corresponding event streams as part of migration strategy.

Goals:

  • Enforce single source of truth
  • Eliminate direct coupling between systems
  • Enable new event-driven microservices as consumers

Data Liberation Patterns

Three Main Patterns:

  1. Query-based - Extract data by querying underlying state store
  2. Log-based - Extract data by following append-only log for changes
  3. Table-based - Push data to output queue table, then emit to event streams

Critical Requirement: All patterns must produce events in sorted timestamp order using source record’s updated_at time.
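
A sketch of the query-based pattern under illustrative assumptions: a `customers` table with an `updated_at` column, polled incrementally from a high-water mark, with rows emitted in timestamp order. The producer is a stub.

```python
import sqlite3

def publish_event(key, value, timestamp):
    # Stand-in for a real event producer.
    print(f"emit key={key} value={value} ts={timestamp}")

def liberate_changes(conn, since):
    """Poll for rows changed after `since`, ordered by updated_at so the
    emitted events stay in sorted timestamp order."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at", (since,)).fetchall()
    for cid, name, ts in rows:
        publish_event(key=cid, value={"name": name}, timestamp=ts)
    return rows[-1][2] if rows else since  # new high-water mark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, name TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [("c1", "Ada", 100), ("c2", "Grace", 105)])
watermark = liberate_changes(conn, since=0)   # emits both rows, returns 105
```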

Change-Data Capture

Benefits:

  • Uses data store’s underlying logs (binary logs, write-ahead logs)
  • Real-time data liberation
  • Minimal impact on source systems

Outbox Pattern

Implementation:

  • Outbox table contains notable changes to internal data
  • Single transaction bundles internal updates and outbox updates
  • Prevents divergence with event stream as single source of truth
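
A sketch of the outbox write path, with SQLite standing in for the service’s datastore. The essential point is the single transaction covering both writes; a separate drainer later produces the outbox rows to the stream.

```python
import json, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
             "key TEXT, payload TEXT)")

def place_order(order_id):
    # One transaction covers both the internal update and the outbox row,
    # so the emitted event can never diverge from the stored state.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute("INSERT INTO outbox (key, payload) VALUES (?, ?)",
                     (order_id, json.dumps({"type": "OrderPlaced",
                                            "order_id": order_id})))

place_order("o-1")
# A separate drainer reads the outbox in order and produces to the stream.
for seq, key, payload in conn.execute("SELECT * FROM outbox ORDER BY seq"):
    print("produce", key, payload)   # stand-in for the real producer
```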

Event Sinking

Purpose: Consuming event data and inserting into data stores for non-event-driven applications

Use Cases:

  • Integration with legacy systems
  • Replacing point-to-point couplings
  • Batch-based big-data analysis

Chapter 5: Event-Driven Processing Basics

Stateless Topologies

Key Concept: Building microservice topology requires event-driven thinking - code executes in response to event arrival.

Topology Components:

  • Filters - Select relevant events
  • Routers - Direct events to appropriate streams
  • Transformations - Process single event, emit zero or more outputs
  • Materializations - Convert streams to tables
  • Aggregations - Combine multiple events
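
These components compose naturally as plain functions over an event iterator. A minimal stateless sketch chaining a filter, a transformation, and a router (field and topic names are invented for illustration):

```python
def topology(events):
    """Stateless topology: filter relevant events, transform each one,
    and route the result to an output stream by region."""
    for event in events:
        if event["amount"] <= 0:          # filter: drop non-positive amounts
            continue
        enriched = {**event, "amount_cents": int(event["amount"] * 100)}
        topic = f"orders-{event['region']}"   # router: pick output stream
        yield topic, enriched

events = [{"region": "eu", "amount": 9.99}, {"region": "us", "amount": 0}]
for topic, out in topology(events):
    print(topic, out)   # only the EU event survives the filter
```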

Stream Operations

Branching: Apply logical operator and output to new stream based on result
Merging: Consume from multiple streams, process, output to single stream

Important: When merging streams, define new unified schema representative of merged domain.

Partition Management

Consumer Groups: Each microservice maintains unique consumer group representing collective offsets

Partition Assignment Strategies: Ensure partitions are evenly distributed across consumer instances

Copartitioning: Event streams with same key, partitioner algorithm, and partition count guarantee data locality for consumer instances.

Failure Recovery

Stateless Recovery: Effectively same as adding new instance to consumer group - no state restoration required, immediate processing after partition assignment.


Chapter 6: Deterministic Stream Processing

Processing States

Two Main States:

  1. Near real-time processing (typical of long-running microservices)
  2. Historical processing (catching up to present time)

Determinism Goal

Microservice should produce same output whether processing in real-time or catching up to present time.

Timestamp Management

Critical Requirements:

  • Synchronized and consistent timestamps across distributed systems
  • Network Time Protocol (NTP) synchronization
  • Event time vs. processing time vs. ingestion time

Event Scheduling

Purpose: Process events consistently for reproducible results

Implementation: Select and dispatch event with oldest timestamp from all assigned input partitions.

When Needed: When order of event processing matters to business logic.
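
A sketch of oldest-timestamp-first dispatch using a heap keyed on each partition’s head event; partitions are modeled as lists already sorted by timestamp within themselves.

```python
import heapq

def schedule(partitions):
    """Dispatch events across partitions in oldest-timestamp-first order.

    Each partition is a sequence of (timestamp, event) pairs that is
    already in timestamp order within itself.
    """
    heap = []
    iters = [iter(p) for p in partitions]
    for i, it in enumerate(iters):
        first = next(it, None)
        if first:
            heapq.heappush(heap, (first[0], i, first[1]))
    while heap:
        ts, i, event = heapq.heappop(heap)
        yield ts, event                       # process the oldest event next
        nxt = next(iters[i], None)
        if nxt:
            heapq.heappush(heap, (nxt[0], i, nxt[1]))

p0 = [(1, "a"), (5, "c")]
p1 = [(2, "b"), (9, "d")]
print(list(schedule([p0, p1])))  # [(1,'a'), (2,'b'), (5,'c'), (9,'d')]
```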

Time Types

Event Time: When event actually occurred (most accurate)
Processing Time: When event is processed by consumer
Ingestion Time: When event is ingested into event broker

Best Practice: Use event time when reliable, ingestion time as fallback.

Watermarks and Stream Time

Watermarks: Declaration that all events of time t and prior have been processed

Stream Time: Highest timestamp of processed events - never decreases, useful for coordinating between consumer instances.

Out-of-Order Events

Definition: An event is out of order when its timestamp is earlier than the timestamps of events that precede it in the stream (i.e., earlier than the current stream time).

Handling Strategies:

  • Drop event - Window closed, aggregations complete
  • Wait - Delay output until fixed time passes (higher latency)
  • Grace period - Output results, keep window open for updates

Windowing Functions

Tumbling Windows: Fixed size, non-overlapping windows
Sliding Windows: Fixed size with incremental step
Session Windows: Dynamically sized, terminated by timeout
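
A sketch of the tumbling case, the simplest of the three: each event lands in exactly one fixed-size window determined by flooring its event-time timestamp.

```python
from collections import defaultdict

def tumbling_counts(events, window_ms):
    """Count events per fixed-size, non-overlapping (tumbling) window.

    Events are (event_time_ms, key) pairs; each lands in exactly one
    window identified by its start time.
    """
    windows = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms
        windows[(window_start, key)] += 1
    return dict(windows)

events = [(10, "clicks"), (40, "clicks"), (110, "clicks")]
print(tumbling_counts(events, window_ms=100))
# {(0, 'clicks'): 2, (100, 'clicks'): 1}
```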

Reprocessing

Key Capability: Rewind consumer offsets and replay from arbitrary point in time.

Requirement: Event scheduling ensures same processing order during reprocessing as real-time.


Chapter 7: Stateful Streaming

State Management

Materialized State: Projection of events from source event stream (immutable)
State Store: Where service’s business state is stored (mutable)

Changelog Streams

Purpose: Record of all changes made to state store data

Benefits:

  • Rebuild state from changelog
  • Checkpoint event processing progress
  • Permanent copy maintained outside microservice instance

Important: Changelog streams should be compacted (only need most recent key/value pairs).

State Store Types

Internal State Store: Coexists in same container/VM as microservice business logic

Global State Store: Materializes all partitions for complete data copy on each instance

  • Useful for small, commonly used, seldom-changing data

Scaling and Recovery

Process: New/recovered instance must materialize state before processing new events

Method: Reload changelog topic for each stateful store (quickest approach)

Hot Replicas: Multiple replicas for faster recovery

State Rebuilding vs Migration

Rebuilding Process:

  1. Stop microservice
  2. Reset consumer offsets to beginning
  3. Delete intermediate state
  4. Start new version - rebuild state from input streams

Transactions and Effectively-Once Processing

Goal: Updates to single source of truth consistently applied regardless of failures

Key Features:

  • Idempotent writes - Event written once and only once
  • Atomic transactions - Multiple events to multiple streams atomically

Deduplication Challenges:

  • Expensive without idempotent producers
  • Requires state store of processed dedupe IDs
  • Generally performed for specific time/offset windows
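
A sketch of windowed deduplication along these lines, with an in-memory dict standing in for the state store of processed IDs:

```python
class Deduplicator:
    """Track recently seen event IDs inside a fixed time window."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.seen = {}            # event_id -> last-seen timestamp

    def is_duplicate(self, event_id, now_ms):
        # Evict IDs that have aged out of the dedupe window; this bounds
        # the store's size but misses duplicates older than the window.
        cutoff = now_ms - self.window_ms
        self.seen = {eid: ts for eid, ts in self.seen.items() if ts >= cutoff}
        if event_id in self.seen:
            return True
        self.seen[event_id] = now_ms
        return False

dedupe = Deduplicator(window_ms=60_000)
print(dedupe.is_duplicate("evt-1", now_ms=1_000))   # False: first sighting
print(dedupe.is_duplicate("evt-1", now_ms=2_000))   # True: retry/duplicate
print(dedupe.is_duplicate("evt-1", now_ms=90_000))  # False: aged out
```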

Chapter 8: Building Workflows with Microservices

Workflow Patterns

Choreography Pattern

Characteristics:

  • Highly decoupled microservice architectures
  • React to input events without blocking or waiting
  • Independent from upstream producers and downstream consumers
  • Emergent behavior from microservice relationships

Benefits:

  • Ideal for independent business workflows
  • Highly decoupled communication

Challenges:

  • Can be brittle across multiple microservice instances
  • Difficult to monitor distributed workflows
  • Business logic changes may require modifying numerous services

Orchestration Pattern

Characteristics:

  • Central orchestrator microservice issues commands to worker microservices
  • Contains entire workflow logic for business process
  • Awaits responses and handles results according to workflow logic

Benefits:

  • Flexible workflow definition within single microservice
  • Better visibility and monitoring
  • Centralized coordination

Best Practice: Orchestrator’s bounded context limited to workflow logic, workers contain business fulfillment logic.
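
A toy sketch of the shape of an orchestrator: it owns the workflow sequence and advances on each worker result, while the (stubbed) workers own the fulfillment logic. Step names are invented.

```python
WORKFLOW = ["reserve_stock", "charge_payment", "ship_order"]

def worker(step, order):
    # Stand-in for an independent worker microservice; real workers would
    # consume a command event and emit a result event.
    print(f"worker handling {step} for {order}")
    return {"step": step, "order": order, "ok": True}

def orchestrate(order):
    """Issue commands in workflow order, advancing only on success."""
    for step in WORKFLOW:
        result = worker(step, order)
        if not result["ok"]:
            print(f"{step} failed; triggering compensation for {order}")
            return False
    print(f"workflow complete for {order}")
    return True

orchestrate("order-42")
```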

Distributed Transactions

Definition: Transaction spanning two or more microservices

Implementation: Often known as sagas in event-driven world

Best Practice: Avoid when possible due to significant risk and complexity

Requirements:

  • Synchronizing work between systems
  • Facilitating rollbacks
  • Managing transient failures
  • Network connectivity management

Compensation Workflows

Purpose: Handle workflows that don’t need perfect reversibility

Use Case: Customer-facing products where compensatory actions can remedy failures

Benefit: Alternative to complex distributed transactions


Chapter 9: Microservices Using Function-as-a-Service

FaaS Characteristics

Function Behavior:

  • Starts up, runs until completion, terminates
  • No persistent connections or state
  • Scales up/down based on load automatically

Think of FaaS: Basic consumer/producer that regularly fails and must restart

Design Principles

Bounded Context: Functions and internal event streams must strictly belong to bounded context

Consumer Groups: Each function-based microservice must have independent consumer group

Offset Commits: Best practice is commit only after processing completed
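
A sketch of commit-after-processing with a stubbed broker client (hypothetical `poll`/`commit` interface): if the function dies mid-processing, the uncommitted offset causes redelivery rather than silent loss.

```python
class StubBroker:
    """Minimal stand-in for a broker client with offset commits."""
    def __init__(self, events):
        self.events = events
        self.committed = 0        # offset of the next unprocessed event

    def poll(self):
        return (self.events[self.committed]
                if self.committed < len(self.events) else None)

    def commit(self):
        self.committed += 1

def handle(broker, process):
    event = broker.poll()
    if event is None:
        return
    process(event)     # do the work first...
    broker.commit()    # ...commit only afterward, so a crash mid-processing
                       # redelivers the event instead of silently losing it

broker = StubBroker(["e1", "e2"])
handle(broker, process=print)    # prints e1, then commits the offset
```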

Cold Start vs Warm Start

Cold Start: Default state upon starting first time or after inactivity
Warm Start: Function revived from hibernation cache

Suitable Use Cases

Ideal for:

  • Simple topologies
  • Stateless processing
  • Non-deterministic processing of multiple event streams
  • Wide scaling (queue-based processing)

Security: Use strict access permissions - nothing outside bounded context allowed access

Communication Patterns

Event-Driven: Output of one function produced to event stream for consuming function

Request-Response: Direct calls between functions

Hybrid: Combination of both patterns

Important: Complete processing of one event before processing next to avoid out-of-order issues

Function Optimization

Tuning Considerations:

  • Allocate sufficient resources based on workload
  • Optimize resource usage for performance and cost
  • Consider termination timeouts (typically 5-10 minutes)

Chapter 10: Basic Producer and Consumer Microservices

BPC Characteristics

Basic Producer and Consumer (BPC) microservices:

  • Ingest events from input streams
  • Apply transformations or business logic
  • Emit events to output streams
  • Use basic consumer and producer clients (no advanced features)

What BPCs Don’t Include:

  • Event scheduling
  • Watermarks
  • Materialization mechanisms
  • Changelogs
  • Horizontal scaling with local state stores

Suitable Use Cases

Simple Patterns:

  • Stateless transformations
  • Stateful patterns where deterministic event scheduling not required

Integration Scenarios:

  • Legacy system integration
  • Sidecar pattern for systems that can’t be modified safely

Gating Pattern: For business processes that do not depend on event order but require that all expected events eventually arrive before proceeding (see the sketch below)
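
A sketch of such a gate keyed by order ID, with invented event types: arrivals may come in any order, and the gate emits only once every required event has been seen for the key.

```python
REQUIRED = {"payment_confirmed", "stock_reserved", "address_validated"}

gates = {}   # order_id -> set of event types seen so far

def on_event(order_id, event_type):
    """Track arrivals per key; fire once all required events are present."""
    seen = gates.setdefault(order_id, set())
    seen.add(event_type)
    if seen >= REQUIRED:                 # superset check: gate is satisfied
        del gates[order_id]
        print(f"emit OrderReady for {order_id}")

# Arrival order does not matter; only completeness does.
on_event("o-1", "stock_reserved")
on_event("o-1", "payment_confirmed")
on_event("o-1", "address_validated")    # -> emit OrderReady for o-1
```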

Data Layer Heavy: When underlying data layer performs most business logic (geospatial, search, ML/AI)

Hybrid Applications

External Stream Processing: BPC can leverage external stream-processing systems for complex operations while maintaining access to language features and libraries

Example: External framework for complex aggregations, BPC for populating local data store and serving request-response queries

Limitations

BPCs require investment in libraries for:

  • Simple state materialization
  • Event scheduling
  • Timestamp-based decision making

Chapter 13: Integrating Event-Driven and Request-Response Microservices

Integration Necessity

Event-driven patterns cannot serve all business needs - request-response endpoints provide real-time data serving capabilities.

Use Cases for Request-Response

  • Human-driven interactions
  • Machine-driven external system communication
  • Real-time data queries
  • Synchronous API requirements

Integration Patterns

External System Integration:

  • Convert API requests/responses to events
  • Enable asynchronous processing by event-driven microservices

Human Interface Integration:

  • Convert user interactions to events
  • Process asynchronously, with UI feedback indicating that the request is being handled
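
A sketch of the conversion, with an invented endpoint and a stubbed producer: the handler turns the request into an event and returns 202 Accepted immediately, leaving the actual work to downstream event-driven services.

```python
import json, uuid

def produce(topic, event):
    print(f"produce to {topic}: {event}")   # stand-in for a real producer

def handle_request(body):
    """Convert a synchronous API request into an event and ack immediately."""
    event = {"event_id": str(uuid.uuid4()),
             "type": "ProfileUpdateRequested",
             "payload": body}
    produce("profile-update-requests", event)
    # 202 Accepted: the work happens asynchronously downstream; the UI can
    # poll or subscribe for the eventual result using event_id.
    return 202, json.dumps({"accepted": True, "event_id": event["event_id"]})

status, resp = handle_request({"user": "u-7", "name": "Ada"})
print(status, resp)
```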

Chapter 14: Supportive Tooling

Ownership and Governance

Explicit Ownership Tracking:

  • Single writer principle attributes stream ownership to producing microservice
  • Event stream metadata tagging for ownership assignment
  • Only owning teams can modify metadata tags

Event Stream Management

Creation and Modification Rights:

  • Teams can automatically create internal event streams
  • Full control over partition count, retention policy, replication factor

Schema Registry

A critical service for centralized schema management.

Benefits:

  • Event schema not transported with event (uses placeholder ID)
  • Significantly reduced bandwidth usage
  • Single point of reference for obtaining schemas
  • Data discovery capabilities with free-text search

Schema Registry Features:

  • Precise data definitions (names, types, defaults, documentation)
  • Clarity for producers and consumers
  • Version management and evolution tracking

Chapter 15: Testing Event-Driven Microservices

Testing Levels

Unit Testing: Test smallest pieces of code to ensure expected functionality - foundation for larger tests

Topology Testing: More complex than unit tests - exercises entire topology as specified by business logic

Think of topology: Single, large, complex function with many moving parts
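
A sketch of testing a topology as a pure function, with a hypothetical `enrich` transform: feed events in, assert on outputs, no broker required.

```python
def enrich(event):
    # Transform under test: tags large orders.
    return {**event, "large": event["amount"] >= 100}

def topology(events):
    # Topology under test: filter out non-positive amounts, then enrich.
    return [enrich(e) for e in events if e["amount"] > 0]

def test_topology_filters_and_enriches():
    events = [{"amount": 150}, {"amount": -5}, {"amount": 20}]
    out = topology(events)
    assert len(out) == 2                  # negative amount filtered out
    assert out[0]["large"] and not out[1]["large"]

test_topology_filters_and_enriches()      # pytest would discover this by name
print("topology tests passed")
```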

Schema Compatibility Testing

Automated Checks: Pull schemas from schema registry and perform evolutionary rule checking as part of code submission process

Ensures: Output schemas compatible with previous schemas according to stream evolution rules

Integration Testing

Two Main Flavors:

Local Integration Testing:

  • Testing performed on localized replica of production environment
  • Faster feedback loops
  • Isolated from external dependencies

Remote Integration Testing:

  • Microservice executed on environment external to local system
  • More realistic conditions
  • Tests actual integration points

Testing Strategy

Comprehensive Approach:

  • Unit tests for individual components
  • Topology tests for business logic flows
  • Schema compatibility for evolution safety
  • Integration tests for end-to-end validation