These are my personal notes on the book Building Event-Driven Microservices.
Building Event-Driven Microservices
Chapter 1: Why Event-Driven Microservices
Core Concepts
In an event-driven microservices architecture, systems communicate by issuing and consuming events. Unlike messages in traditional message-passing systems, events are not destroyed upon consumption; they remain available for other consumers.
Key Characteristics:
- Services are small and purpose-built
- Services consume events from input streams, apply business logic, and emit output events (see the sketch after this list)
- Events act as both data storage and communication mechanism
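A minimal sketch of that consume-transform-emit loop, using plain Python lists as stand-ins for event streams and the broker client (the "large order" logic is illustrative, not from the book):

```python
# Minimal consume-transform-emit loop. Plain lists stand in for event
# streams; a real service would use a broker client (e.g., a Kafka consumer).
input_stream = [
    {"key": "order-1", "value": {"total": 40.0}},
    {"key": "order-2", "value": {"total": 125.0}},
]
output_stream = []

offset = 0  # each consumer tracks its own position in the stream
while offset < len(input_stream):
    event = input_stream[offset]
    # Business logic: flag large orders.
    if event["value"]["total"] > 100.0:
        output_stream.append({"key": event["key"], "value": "large-order"})
    offset += 1  # events remain in input_stream for other consumers

print(output_stream)  # [{'key': 'order-2', 'value': 'large-order'}]
```

Note that consumed events stay in input_stream: other consumers can read the same stream at their own offsets.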
Communication Structures
Three Types of Communication Structures:
- Business Communication Structure - Communication between teams and departments
- Implementation Communication Structure - Data and logic pertaining to subdomain models
- Data Communication Structure - The process by which data is communicated across the business and between implementations
Conway’s Law and Communication Structures
Per Conway’s Law, organizations produce system designs that copy their own communication structures; these structures greatly influence engineering implementations at both the organizational and team levels.
Event-Driven Communication Benefits
Events Are the Basis of Communication:
- All shareable data is published to event streams
- Forms a continuous, canonical narrative of everything that happened
- Events are the data, not merely signals
Event Streams Provide Single Source of Truth:
- Each event is a statement of fact
- Together they form the single source of truth
- Basis of communication for all systems
Key Advantages:
- Decouples data production from access
- Consumers perform their own modeling and querying
- Accessible data supports business communication changes
- Asynchronous processing enables business logic transformations
Chapter 2: Event-Driven Microservice Fundamentals
Microservice Types
Consumer Microservices: Consume and process events from input streams
Producer Microservices: Produce events to streams for other services
Hybrid: Both consumer and producer (most common)
Topology Concepts
Microservice Topology: Event-driven topology internal to a single microservice
Business Topology: Set of microservices, event streams, and APIs fulfilling complex business functions
Event Structure
Events contain:
- Complete details of what happened
- Key/value format (key for identification, routing, aggregation)
- All information required to accurately describe the event
Table-Stream Duality
Materializing State from Entity Events:
- Apply entity events in order from event stream
- Each event is upserted into key/value table
- The table retains only the most recent event for each key
- Tombstone events (null values) handle deletions
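A sketch of materializing a table from an entity stream, assuming events arrive in offset order and that a None value is the tombstone (names and event shapes are illustrative):

```python
def materialize(entity_events):
    """Project an entity event stream into a key/value table."""
    table = {}
    for key, value in entity_events:  # apply in offset order
        if value is None:
            table.pop(key, None)      # tombstone: delete the entity
        else:
            table[key] = value        # upsert: latest event wins
    return table

events = [("user-1", {"name": "Ada"}),
          ("user-2", {"name": "Lin"}),
          ("user-1", {"name": "Ada Lovelace"}),  # update
          ("user-2", None)]                      # tombstone deletion
print(materialize(events))  # {'user-1': {'name': 'Ada Lovelace'}}
```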
Core Principles
Microservice Single Writer Principle:
- Each event stream has one and only one producing microservice
- This microservice owns each event produced to that stream
Event Broker Features:
- Append-only immutable log
- Durable storage mechanism
- Single source of truth
- Identical copies guaranteed to all consumers
Consumption Patterns
Stream Consumption: Each consumer maintains its own offset pointer
Queue Consumption: Each event consumed by one and only one instance
Chapter 3: Communication and Data Contracts
Fundamental Communication Problem
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” - Claude Shannon
Data Contracts
Two Components of Well-Defined Data Contract:
- Data Definition - What will be produced (fields, types, data structures)
- Triggering Logic - Why it is produced (business logic that triggered event creation)
Schema Management
Schema Benefits:
- Explicit predefined structure prevents brittle implicit contracts
- Comments and metadata support for communicating meaning
- Schema evolution rules enable updates without breaking consumers
Compatibility Types:
- Forward Compatibility - Newer schema data readable with older schema
- Backward Compatibility - Older schema data readable with newer schema
- Full Compatibility - Union of forward and backward (strongest guarantee)
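A simplified sketch of what a backward-compatibility check enforces: a reader on the newer schema can decode older data as long as every field added in the new version carries a default. Real registries apply the full Avro/Protobuf resolution rules; this only illustrates the core idea, with made-up schemas:

```python
# Schemas modeled as field -> default; NO_DEFAULT marks required fields.
NO_DEFAULT = object()

v1 = {"id": NO_DEFAULT, "total": NO_DEFAULT}
v2 = {"id": NO_DEFAULT, "total": NO_DEFAULT, "coupon": "none"}  # added field

def backward_compatible(old_schema, new_schema):
    # Every field added in the new schema must have a default, so that
    # data written under the old schema can still be read.
    added_fields = set(new_schema) - set(old_schema)
    return all(new_schema[f] is not NO_DEFAULT for f in added_fields)

print(backward_compatible(v1, v2))  # True: old events readable by new readers
```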
Schema Evolution Best Practices
- Communicate early and clearly with downstream consumers
- Producer responsibility to resolve schema divergences
- Leave old entities under old schema in original streams
- Create new streams for updated entities with new schemas
Event Design Principles
Tell the Truth, the Whole Truth, and Nothing but the Truth:
- Events must be complete descriptions of what happened
- Consumers should not need other data sources to understand the event
Best Practices:
- Use singular event definition per stream
- Use narrowest data types possible
- Keep events single-purpose (avoid type fields)
- Minimize event size while maintaining completeness
- Involve prospective consumers in design
- Avoid events as semaphores or signals
Chapter 4: Integrating Event-Driven Architectures with Existing Systems
Data Liberation
Definition: Identification and publication of cross-domain data sets to corresponding event streams as part of migration strategy.
Goals:
- Enforce single source of truth
- Eliminate direct coupling between systems
- Enable new event-driven microservices as consumers
Data Liberation Patterns
Three Main Patterns:
- Query-based - Extract data by querying underlying state store
- Log-based - Extract data by following append-only log for changes
- Table-based - Push data to output queue table, then emit to event streams
Critical Requirement: All patterns must produce events in sorted timestamp order using source record’s updated_at time.
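A sketch of query-based liberation under that constraint, using sqlite3 with illustrative table and column names: poll for rows changed since the last highwater mark and emit them ordered by updated_at:

```python
import sqlite3

# Query-based data liberation sketch: poll the source table for rows
# changed since the last extraction and emit them in updated_at order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, name TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [("u1", "Ada", 100), ("u2", "Lin", 205), ("u3", "Joan", 310)])

last_seen = 100  # highwater mark saved from the previous polling pass
rows = conn.execute(
    "SELECT id, name, updated_at FROM users "
    "WHERE updated_at > ? ORDER BY updated_at", (last_seen,)).fetchall()

for key, name, updated_at in rows:
    # A real service would produce {key, value, timestamp} to the stream here.
    print(key, name, updated_at)  # emitted in sorted timestamp order
```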
Change-Data Capture
Benefits:
- Uses data store’s underlying logs (binary logs, write-ahead logs)
- Real-time data liberation
- Minimal impact on source systems
Outbox Pattern
Implementation:
- Outbox table contains notable changes to internal data
- Single transaction bundles internal updates and outbox updates
- Prevents divergence with event stream as single source of truth
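A sketch of the outbox write path using sqlite3 (schema and event payload are illustrative); the key point is that the state change and the outbox row commit or roll back together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT)")
conn.execute("INSERT INTO accounts VALUES ('a1', 100.0)")

# One transaction updates internal state AND records the outgoing event,
# so the two can never diverge.
with conn:  # sqlite3 context manager commits or rolls back atomically
    conn.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 'a1'")
    conn.execute("INSERT INTO outbox (event) VALUES (?)",
                 ('{"account": "a1", "withdrawn": 25}',))

print(conn.execute("SELECT event FROM outbox").fetchall())
```

A separate publisher process then drains the outbox table into the event stream and removes the published rows.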
Event Sinking
Purpose: Consuming event data and inserting into data stores for non-event-driven applications
Use Cases:
- Integration with legacy systems
- Replacing point-to-point couplings
- Batch-based big-data analysis
Chapter 5: Event-Driven Processing Basics
Stateless Topologies
Key Concept: Building microservice topology requires event-driven thinking - code executes in response to event arrival.
Topology Components:
- Filters - Select relevant events
- Routers - Direct events to appropriate streams
- Transformations - Process single event, emit zero or more outputs
- Materializations - Convert streams to tables
- Aggregations - Combine multiple events
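A sketch of wiring stateless components together as generator functions (the event shapes are illustrative; this is not a stream-processing framework API):

```python
# Stateless topology sketch: each component is a small function, wired
# together in the order filter -> transform -> emit.
def filter_events(events, predicate):
    return (e for e in events if predicate(e))

def transform(events, fn):
    for e in events:
        yield from fn(e)  # a transform may emit zero or more outputs

events = [{"type": "click", "user": "u1"},
          {"type": "view", "user": "u2"}]
clicks = filter_events(events, lambda e: e["type"] == "click")
out = transform(clicks, lambda e: [{"user": e["user"], "value": 1}])
print(list(out))  # [{'user': 'u1', 'value': 1}]
```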
Stream Operations
Branching: Apply a logical predicate to each event and route it to a new stream based on the result
Merging: Consume from multiple streams, process, output to single stream
Important: When merging streams, define new unified schema representative of merged domain.
Partition Management
Consumer Groups: Each microservice maintains unique consumer group representing collective offsets
Partition Assignment Strategies: Ensure partitions are evenly distributed across consumer instances
Copartitioning: Event streams with same key, partitioner algorithm, and partition count guarantee data locality for consumer instances.
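A sketch of why copartitioning works: with the same key, partitioner, and partition count, a key maps to the same partition number in every stream, so one consumer instance sees all events for that key (the byte-sum hash is a stable stand-in; real brokers use e.g. murmur2 on the key bytes):

```python
PARTITIONS = 4  # both streams must use the same partition count

def partition_for(key: str, partition_count: int = PARTITIONS) -> int:
    # Deterministic stand-in hash (Python's built-in hash() is salted
    # per process, so it would not be stable across instances).
    return sum(key.encode()) % partition_count

# Same result for every copartitioned stream -> data locality per key.
print(partition_for("user-42"))
```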
Failure Recovery
Stateless Recovery: Effectively same as adding new instance to consumer group - no state restoration required, immediate processing after partition assignment.
Chapter 6: Deterministic Stream Processing
Processing States
Two Main States:
- Near real-time processing (typical of long-running microservices)
- Historical processing (catching up to present time)
Determinism Goal
Microservice should produce same output whether processing in real-time or catching up to present time.
Timestamp Management
Critical Requirements:
- Synchronized and consistent timestamps across distributed systems
- Network Time Protocol (NTP) synchronization
- Event time vs. processing time vs. ingestion time
Event Scheduling
Purpose: Process events consistently for reproducible results
Implementation: Select and dispatch event with oldest timestamp from all assigned input partitions.
When Needed: When order of event processing matters to business logic.
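A sketch of oldest-timestamp dispatch across assigned partitions using a heap-based merge (partition names and events are illustrative):

```python
import heapq

# Event scheduling: always dispatch the event with the oldest timestamp
# across all assigned input partitions.
partitions = {
    "orders-0":   [(100, "o1"), (260, "o4")],
    "payments-0": [(130, "p1"), (180, "p2")],
}

heap = []
for name, events in partitions.items():
    ts, _ = events[0]
    heapq.heappush(heap, (ts, name, 0))  # seed with each partition's head

while heap:
    ts, name, idx = heapq.heappop(heap)
    print("process", partitions[name][idx][1], "at", ts)  # oldest first
    if idx + 1 < len(partitions[name]):
        heapq.heappush(heap, (partitions[name][idx + 1][0], name, idx + 1))
```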
Time Types
Event Time: When event actually occurred (most accurate)
Processing Time: When event is processed by consumer
Ingestion Time: When event is ingested into event broker
Best Practice: Use event time when reliable, ingestion time as fallback.
Watermarks and Stream Time
Watermarks: Declaration that all events of time t and prior have been processed
Stream Time: Highest timestamp of processed events - never decreases, useful for coordinating between consumer instances.
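A small sketch of stream time in action, showing that it never decreases even when a late event arrives (timestamps are illustrative):

```python
# Stream time = highest event timestamp processed so far.
stream_time = 0
for event_ts in [100, 130, 125, 180]:   # 125 arrives out of order
    stream_time = max(stream_time, event_ts)
    late = event_ts < stream_time       # behind stream time -> late event
    print(event_ts, "stream_time =", stream_time, "late =", late)
```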
Out-of-Order Events
Definition: An event whose timestamp is earlier than that of events ahead of it in the stream - it arrives behind stream time.
Handling Strategies:
- Drop event - Window closed, aggregations complete
- Wait - Delay output until fixed time passes (higher latency)
- Grace period - Output results, keep window open for updates
Windowing Functions
Tumbling Windows: Fixed size, non-overlapping windows
Sliding Windows: Fixed size with incremental step
Session Windows: Dynamically sized, terminated by timeout
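A sketch of tumbling-window assignment: each timestamp belongs to exactly one fixed-size, non-overlapping window, computed arithmetically (window size and timestamps are illustrative):

```python
SIZE = 60  # window size in seconds

def tumbling_window(ts: int) -> tuple[int, int]:
    """Assign a timestamp to its fixed, non-overlapping window."""
    start = ts - (ts % SIZE)
    return (start, start + SIZE)

# Count events per window.
counts = {}
for ts in [5, 42, 61, 119, 130]:
    counts[tumbling_window(ts)] = counts.get(tumbling_window(ts), 0) + 1
print(counts)  # {(0, 60): 2, (60, 120): 2, (120, 180): 1}
```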
Reprocessing
Key Capability: Rewind consumer offsets and replay from arbitrary point in time.
Requirement: Event scheduling ensures same processing order during reprocessing as real-time.
Chapter 7: Stateful Streaming
State Management
Materialized State: Projection of events from source event stream (immutable)
State Store: Where service’s business state is stored (mutable)
Changelog Streams
Purpose: Record of all changes made to state store data
Benefits:
- Rebuild state from changelog
- Checkpoint event processing progress
- Permanent copy maintained outside microservice instance
Important: Changelog streams should be compacted (only need most recent key/value pairs).
State Store Types
Internal State Store: Coexists in same container/VM as microservice business logic
Global State Store: Materializes all partitions for complete data copy on each instance
- Useful for small, commonly used, seldom-changing data
Scaling and Recovery
Process: New/recovered instance must materialize state before processing new events
Method: Reload changelog topic for each stateful store (quickest approach)
Hot Replicas: Multiple replicas for faster recovery
State Rebuilding vs Migration
Rebuilding Process:
- Stop microservice
- Reset consumer offsets to beginning
- Delete intermediate state
- Start new version - rebuild state from input streams
Transactions and Effectively-Once Processing
Goal: Updates to single source of truth consistently applied regardless of failures
Key Features:
- Idempotent writes - Event written once and only once
- Atomic transactions - Multiple events to multiple streams atomically
Deduplication Challenges:
- Expensive without idempotent producers
- Requires state store of processed dedupe IDs
- Generally performed for specific time/offset windows
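A sketch of windowed deduplication: processed IDs are kept only for a bounded time so the dedupe store stays small (the eager expiry scan is for clarity; a real store would expire entries lazily or by range):

```python
WINDOW = 300  # seconds to remember a processed dedupe ID

seen = {}  # dedupe_id -> timestamp of first sighting

def is_duplicate(dedupe_id: str, ts: int) -> bool:
    # Expire IDs that have fallen out of the window before checking.
    for old_id, old_ts in list(seen.items()):
        if ts - old_ts > WINDOW:
            del seen[old_id]
    if dedupe_id in seen:
        return True
    seen[dedupe_id] = ts
    return False

print(is_duplicate("evt-1", 100))  # False: first sighting
print(is_duplicate("evt-1", 150))  # True: retry within the window
print(is_duplicate("evt-1", 900))  # False: original sighting expired
```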
Chapter 8: Building Workflows with Microservices
Workflow Patterns
Choreography Pattern
Characteristics:
- Highly decoupled microservice architectures
- React to input events without blocking or waiting
- Independent from upstream producers and downstream consumers
- Emergent behavior from microservice relationships
Benefits:
- Ideal for independent business workflows
- Highly decoupled communication
Challenges:
- Can be brittle across multiple microservice instances
- Difficult to monitor distributed workflows
- Business logic changes may require modifying numerous services
Orchestration Pattern
Characteristics:
- Central orchestrator microservice issues commands to worker microservices
- Contains entire workflow logic for business process
- Awaits responses and handles results according to workflow logic
Benefits:
- Flexible workflow definition within single microservice
- Better visibility and monitoring
- Centralized coordination
Best Practice: Orchestrator’s bounded context limited to workflow logic, workers contain business fulfillment logic.
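A sketch of an orchestrator loop under that constraint: the workflow sequence lives in the orchestrator, workers fulfill individual steps, and completed steps are compensated on failure (step names and the undo flag are illustrative, not the book's implementation):

```python
# The orchestrator owns only the workflow logic; workers own fulfillment.
WORKFLOW = ["reserve_inventory", "charge_payment", "ship_order"]

def orchestrate(order, workers):
    """Run each step in order; compensate completed steps on failure."""
    completed = []
    for step in WORKFLOW:
        if not workers[step](order):              # issue command, await result
            for done in reversed(completed):
                workers[done](order, undo=True)   # compensate in reverse order
            return "failed"
        completed.append(step)
    return "completed"

workers = {s: (lambda order, undo=False: True) for s in WORKFLOW}
print(orchestrate({"id": "o1"}, workers))  # completed
```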
Distributed Transactions
Definition: Transaction spanning two or more microservices
Implementation: Often known as sagas in event-driven world
Best Practice: Avoid when possible due to significant risk and complexity
Requirements:
- Synchronizing work between systems
- Facilitating rollbacks
- Managing transient failures
- Network connectivity management
Compensation Workflows
Purpose: Handle workflows that don’t need perfect reversibility
Use Case: Customer-facing products where compensatory actions can remedy failures
Benefit: Alternative to complex distributed transactions
Chapter 9: Microservices Using Function-as-a-Service
FaaS Characteristics
Function Behavior:
- Starts up, runs until completion, terminates
- No persistent connections or state
- Scales up/down based on load automatically
Think of FaaS: a basic consumer/producer that regularly terminates and must restart
Design Principles
Bounded Context: Functions and internal event streams must strictly belong to bounded context
Consumer Groups: Each function-based microservice must have independent consumer group
Offset Commits: Best practice is commit only after processing completed
Cold Start vs Warm Start
Cold Start: Default state upon starting first time or after inactivity
Warm Start: Function revived from hibernation cache
Suitable Use Cases
Ideal for:
- Simple topologies
- Stateless processing
- Processing of multiple event streams where deterministic ordering is not required
- Wide scaling (queue-based processing)
Security: Use strict access permissions - nothing outside bounded context allowed access
Communication Patterns
Event-Driven: Output of one function produced to event stream for consuming function
Request-Response: Direct calls between functions
Hybrid: Combination of both patterns
Important: Complete processing of one event before processing next to avoid out-of-order issues
Function Optimization
Tuning Considerations:
- Allocate sufficient resources based on workload
- Optimize resource usage for performance and cost
- Consider termination timeouts (typically 5-10 minutes)
Chapter 10: Basic Producer and Consumer Microservices
BPC Characteristics
Basic Producer and Consumer (BPC) microservices:
- Ingest events from input streams
- Apply transformations or business logic
- Emit events to output streams
- Use basic consumer and producer clients (no advanced features)
What BPCs Don’t Include:
- Event scheduling
- Watermarks
- Materialization mechanisms
- Changelogs
- Horizontal scaling with local state stores
Suitable Use Cases
Simple Patterns:
- Stateless transformations
- Stateful patterns where deterministic event scheduling not required
Integration Scenarios:
- Legacy system integration
- Sidecar pattern for systems that can’t be modified safely
Gating Pattern: Business processes that do not rely on event order but require that all events eventually arrive
Data Layer Heavy: When underlying data layer performs most business logic (geospatial, search, ML/AI)
Hybrid Applications
External Stream Processing: BPC can leverage external stream-processing systems for complex operations while maintaining access to language features and libraries
Example: External framework for complex aggregations, BPC for populating local data store and serving request-response queries
Limitations
BPCs require investment in libraries for:
- Simple state materialization
- Event scheduling
- Timestamp-based decision making
Chapter 13: Integrating Event-Driven and Request-Response Microservices
Integration Necessity
Event-driven patterns cannot serve all business needs - request-response endpoints provide real-time data serving capabilities.
Use Cases for Request-Response
- Human-driven interactions
- Machine-driven external system communication
- Real-time data queries
- Synchronous API requirements
Integration Patterns
External System Integration:
- Convert API requests/responses to events
- Enable asynchronous processing by event-driven microservices
Human Interface Integration:
- Convert user interactions to events
- Asynchronous processing, with UI feedback indicating that the request is being handled asynchronously
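A sketch of converting a user request into an event, with a 202-style response signaling asynchronous handling (queue.Queue stands in for the event stream; the handler and event shape are illustrative):

```python
import json
import queue

events = queue.Queue()  # stand-in for the event stream / producer client

def handle_request(body: str) -> tuple[int, str]:
    """Convert a user request into an event; respond before processing."""
    events.put({"type": "profile_update", "payload": json.loads(body)})
    # HTTP 202 Accepted tells the caller that handling is asynchronous.
    return 202, json.dumps({"status": "accepted"})

print(handle_request('{"user": "u1", "name": "Ada"}'))
```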
Chapter 14: Supportive Tooling
Ownership and Governance
Explicit Ownership Tracking:
- Single writer principle attributes stream ownership to producing microservice
- Event stream metadata tagging for ownership assignment
- Only owning teams can modify metadata tags
Event Stream Management
Creation and Modification Rights:
- Teams can automatically create internal event streams
- Full control over partition count, retention policy, replication factor
Schema Registry
Critical service for schema management.
Benefits:
- Event schema not transported with event (uses placeholder ID)
- Significantly reduced bandwidth usage
- Single point of reference for obtaining schemas
- Data discovery capabilities with free-text search
Schema Registry Features:
- Precise data definitions (names, types, defaults, documentation)
- Clarity for producers and consumers
- Version management and evolution tracking
Chapter 15: Testing Event-Driven Microservices
Testing Levels
Unit Testing: Test smallest pieces of code to ensure expected functionality - foundation for larger tests
Topology Testing: More complex than unit tests - exercises entire topology as specified by business logic
Think of a topology: a single, large, complex function with many moving parts
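A sketch of a topology test in that spirit: treat the topology as one function and assert on its outputs (the topology and events are illustrative):

```python
import unittest

def topology(events):
    """Toy topology: filter small orders, then tag the rest."""
    return [{"key": e["key"], "value": "large-order"}
            for e in events if e["value"]["total"] > 100.0]

class TopologyTest(unittest.TestCase):
    def test_flags_only_large_orders(self):
        events = [{"key": "a", "value": {"total": 50.0}},
                  {"key": "b", "value": {"total": 150.0}}]
        self.assertEqual(topology(events),
                         [{"key": "b", "value": "large-order"}])

if __name__ == "__main__":
    unittest.main()
```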
Schema Compatibility Testing
Automated Checks: Pull schemas from schema registry and perform evolutionary rule checking as part of code submission process
Ensures: Output schemas compatible with previous schemas according to stream evolution rules
Integration Testing
Two Main Flavors:
Local Integration Testing:
- Testing performed on localized replica of production environment
- Faster feedback loops
- Isolated from external dependencies
Remote Integration Testing:
- Microservice executed on environment external to local system
- More realistic conditions
- Tests actual integration points
Testing Strategy
Comprehensive Approach:
- Unit tests for individual components
- Topology tests for business logic flows
- Schema compatibility for evolution safety
- Integration tests for end-to-end validation