Mastering API Architecture

My personal notes on the book Mastering API Architecture.

Architecture Fundamentals

Architecture is a journey without a destination, and you cannot predict how technologies and architectural approaches will change.

API-First design is an approach where developers and architects consider the functionality of their service and design an API in a consumer-centric manner.

An API represents an abstraction of the underlying implementation and has defined semantics or behavior to effectively model the exchange of information.

Documentation and Design

C4 Architecture Diagrams

C4 Context Diagram - The intention of this diagram is to set context for both a technical and nontechnical audience.

A container diagram helps describe the technical breakout of the major participants in the architecture. A container in C4 is defined as “something that needs to be running in order for the overall system to work”. Container diagrams are technical in nature and build on the higher-level system context diagram.

The C4 component diagram helps to define the roles and responsibilities within each container, along with the internal interactions. This diagram provides a very useful map to the codebase.

Architecture Decision Records (ADRs)

Architecture Decision Records (ADRs) help make decisions clear in software architecture. There will be many constraints that we have to build around, so it is important that our decisions are recorded and transparent.

There are four key sections in an ADR: status, context, decision, and consequences. An ADR is created in a proposed status and based on discussion will usually be either accepted or rejected.

The context helps to set the scene and describe the problem or the bounds in which the decision will be made
The decision clearly sets out what you plan to do and how you plan to do it
All decisions carry consequences or trade-offs in architecture, and these can sometimes be incredibly costly to get wrong
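
A minimal sketch of the four ADR sections and the proposed-to-accepted/rejected lifecycle described above. The class and method names (`ADR`, `Status`, `resolve`) are hypothetical and chosen only for illustration; real ADRs are usually plain text files kept alongside the code.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ADR:
    """One record with the four key sections: status, context, decision, consequences."""
    title: str
    context: str
    decision: str
    consequences: str
    status: Status = Status.PROPOSED  # every ADR starts out as proposed

    def resolve(self, accepted: bool) -> None:
        # Only a proposed ADR can be resolved; a settled record stays as it is.
        if self.status is not Status.PROPOSED:
            raise ValueError(f"ADR is already {self.status.value}")
        self.status = Status.ACCEPTED if accepted else Status.REJECTED
```

Rejecting a second `resolve` call reflects the idea that a settled decision is changed by writing a new ADR rather than by editing the old one; that is a design assumption of this sketch.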

REST API Design

REST (REpresentational State Transfer) is a set of architectural constraints, most commonly applied using HTTP as the underlying transport protocol.

REST is relatively straightforward to implement because the client and server relationship is stateless, meaning no client state is persisted by the server. The client must pass the context back to the server in subsequent requests.

Richardson Maturity Model

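The Richardson Maturity Model presents levels of adoption that teams apply to building APIs from a REST perspective, ranging from Level 0 (HTTP used as a transport for RPC) through Level 1 (resources) and Level 2 (HTTP verbs) to Level 3 (hypermedia controls), which is a full REST implementation.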

Error Handling and Versioning

Error handling is an important consideration when extending APIs to consumers. Errors should describe to the consumer exactly what has gone wrong with the request, which avoids increasing the support required for the API, and an accurate status code must always be provided.
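
A minimal sketch of an error payload that pairs an accurate status code with a description of what went wrong. The function name `error_response` and the body fields (`status`, `title`, `detail`, `field`) are assumptions for illustration, not a format prescribed by the book.

```python
import json
from http import HTTPStatus
from typing import Optional, Tuple


def error_response(status: HTTPStatus, detail: str,
                   field: Optional[str] = None) -> Tuple[int, str]:
    """Return (status code, JSON body) describing exactly what went wrong."""
    body = {
        "status": status.value,   # numeric code, repeated in the body for convenience
        "title": status.phrase,   # standard reason phrase, e.g. "Bad Request"
        "detail": detail,         # human-readable explanation for the consumer
    }
    if field is not None:
        body["field"] = field     # which part of the request was at fault
    return status.value, json.dumps(body)
```

For example, `error_response(HTTPStatus.BAD_REQUEST, "email must contain an @", field="email")` tells the consumer both that the request was invalid and which field to fix.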

API Versioning - What happens when there is a change to the API or one of the consumers requests the addition of new features to the API? Semantic versioning (major.minor.patch) can be applied to REST APIs: a major increment signals a breaking change, while minor and patch increments signal backward-compatible additions and fixes.

Versioning should be an active decision in the product feature set and a mechanism to convey versioning to consumers should be part of the discussion. Not planning for versioning in APIs exposed to consumers is dangerous.
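
A minimal sketch of how semantic versioning can answer the question "can this consumer keep working against that producer version?". The function names and the rule (same major version, producer at least as new) are assumptions of this sketch; it also ignores pre-release and build suffixes.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(consumer_built_against: str, producer_offers: str) -> bool:
    """True when the producer version should not break the consumer."""
    wanted = parse(consumer_built_against)
    offered = parse(producer_offers)
    # A different major version signals a breaking change.
    if offered[0] != wanted[0]:
        return False
    # Within one major version, the producer must be at least as new.
    return offered[1:] >= wanted[1:]
```

Here `is_compatible("1.2.0", "1.3.1")` is true, while `is_compatible("1.2.0", "2.0.0")` is false because the major version changed.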

OpenAPI Specifications are a useful way of sharing API structure and automating many coding-related activities.

Testing Strategy

Test Quadrant Framework

The test quadrant was first introduced by Brian Marick and brings together tests that help technology and business stakeholders alike. The four quadrants can be generally described as follows:

Unit and component tests for technology - verify that the service works using automated testing
Tests with the business - ensure what is being built is serving a purpose
Testing for the business - ensuring that functional requirements are met, includes exploratory testing
Technical validation - ensuring that what exists works from a technical standpoint (security enforcement, SLA integrity, autoscaling)

Test Pyramid

The test pyramid illustrates how much time should be spent on a given test area, its corresponding difficulty to maintain, and the value it provides. The test pyramid has unit tests as its foundation, service tests in the middle block, and UI tests at the peak.

Contract Testing

Contract testing has two entities: a consumer and a producer. A contract is a definition of an interaction between the consumer and producer. APIs should have a specification, and it is important that your API responses conform to the API specification.

A key benefit of using contracts is that once the producer agrees to implement a contract, the consumer and producer can be built independently of each other.

Consumer-driven contracts (CDCs) are implemented by a consumer driving the functionality that they wish to see in an interaction. Consumers submit contracts to the producer for new or additional API functionality.
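
A minimal sketch of what a consumer-submitted contract and its verification on the producer side might look like. The contract layout, the `/attendees/1` path, and the field names are hypothetical; dedicated contract-testing tools define their own formats.

```python
from typing import Any, Dict, List

# Hypothetical contract written by the consumer: the interaction it relies on.
contract: Dict[str, Any] = {
    "request": {"method": "GET", "path": "/attendees/1"},
    "response": {"status": 200, "body_fields": {"id": int, "givenName": str}},
}


def verify(contract: Dict[str, Any], status: int, body: Dict[str, Any]) -> List[str]:
    """Compare a producer response with the contract; return the problems found."""
    problems: List[str] = []
    expected = contract["response"]
    if status != expected["status"]:
        problems.append(f"expected status {expected['status']}, got {status}")
    for name, expected_type in expected["body_fields"].items():
        if name not in body:
            problems.append(f"missing field {name}")
        elif not isinstance(body[name], expected_type):
            problems.append(f"field {name} should be {expected_type.__name__}")
    return problems
```

Extra fields in the producer response are deliberately ignored, so the producer can add functionality without breaking this consumer.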

Infrastructure Patterns

API Gateway

An API gateway is a critical part of any modern technology stack, sitting at the network “edge” of systems and acting as a management tool that mediates between a consumer and a collection of backend services.

Always use the simplest solution for your requirements, with an eye to the immediate future and known requirements. If you have advanced cross-functional requirements, an API gateway is typically the best choice.

Service Mesh

Service mesh is a pattern for managing all service-to-service communication within a distributed software system. At a fundamental level, service mesh implementations provide traffic management (routing), resilience, observability, and security for that communication.

A service mesh can be implemented using language-specific libraries, sidecar proxies, proxyless communication frameworks (gRPC), or kernel-based technologies like eBPF.

Deployment and Release Strategies

Separating Deployment and Release

It’s important to understand the difference between deployment and release:

Deployment involves taking a feature all the way into production
Release involves activating the new feature in a controlled manner, allowing you to control the risk

Release Strategies

Canary releases introduce a new version of the software and route a small percentage of the traffic to the canary. Risk is reduced by performing a test or experiment with a small fraction of traffic and verifying the result.
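
A minimal sketch of the traffic split behind a canary release, assuming a per-request random choice. The function name, the `"canary"`/`"stable"` labels, and the weight parameter are hypothetical; in practice a gateway, load balancer, or service mesh applies the weighting.

```python
import random


def choose_backend(canary_weight: float, rng: random.Random) -> str:
    """Send roughly `canary_weight` of requests (0.0 to 1.0) to the canary."""
    # rng.random() is uniform in [0.0, 1.0), so a weight of 0.05 selects the
    # canary for about 5% of requests.
    return "canary" if rng.random() < canary_weight else "stable"
```

Raising the weight step by step, and verifying the results at each step, moves traffic gradually from the stable version to the new one.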

Traffic mirroring can copy or duplicate traffic and send this to additional locations. Frequently with traffic mirroring, the results of the duplicated requests are not returned to the calling service or end user.

Blue-green deployment uses a router, gateway, or load balancer, behind which sit a complete blue environment and a complete green environment. The blue environment represents the current live environment, and the green environment represents the next version.
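
A minimal sketch of the router's role in a blue-green deployment: both environments exist side by side and only the pointer to the live one changes. The `Router` class and the internal URLs are hypothetical.

```python
from typing import Dict


class Router:
    """Holds two complete environments and points all traffic at one of them."""

    def __init__(self, environments: Dict[str, str], live: str) -> None:
        if live not in environments:
            raise KeyError(live)
        self._environments = environments
        self._live = live

    def route(self) -> str:
        # Every request goes to whichever environment is currently live.
        return self._environments[self._live]

    def switch_to(self, name: str) -> None:
        # Cutting over (or rolling back) is a single pointer change.
        if name not in self._environments:
            raise KeyError(name)
        self._live = name
```

Because the previous environment keeps running after the switch, rolling back is the same operation in the other direction.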

Security

Threat Modeling

Threat modeling is a technique you can use to help identify threats, attacks, vulnerabilities, and countermeasures that could affect your application. This approach is beneficial as it is only possible to mitigate security risks once the threats have been clearly identified.

The high-level approach to threat modeling includes:

Identify your objectives
Gather the right information
Decompose the system
Identify threats
Evaluate the risk of the threats
Validate the results

STRIDE Methodology

The STRIDE acronym stands for:

Spoofing - Breaching the user’s authentication information
Tampering - Modifying system or user data with or without detection
Repudiation - An untrusted user performing an illegal operation without the ability to be traced
Information disclosure - Compromising private or business-critical information
Denial of Service - Making the system temporarily unavailable or unusable
Elevation of privilege - An unprivileged user gains privileged access

DREAD Risk Assessment

DREAD is a qualitative risk calculation system:

Damage - How bad would an attack be?
Reproducibility - Can an attack be easily reproduced?
Exploitability - How easy is it to mount a successful attack?
Affected Users - How many users are impacted?
Discoverability - What is the likelihood of this threat being discovered?
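
A minimal sketch of turning the five DREAD questions into a single number. The 1 to 10 rating scale and the plain average are assumptions of this sketch (a common convention, not something these notes specify), and the class name is hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class DreadRating:
    """One rating per DREAD category, each assumed to be between 1 and 10."""
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self) -> float:
        values = astuple(self)
        if any(not 1 <= value <= 10 for value in values):
            raise ValueError("each rating must be between 1 and 10")
        # Average the five ratings so threats can be ranked against each other.
        return sum(values) / len(values)
```

A threat rated `DreadRating(8, 6, 4, 10, 2)` scores 6.0, which can then be compared with the scores of other threats found during the STRIDE pass.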

Authentication and Authorization

Authentication Fundamentals

Authentication is the act of verifying an identity. Multi-Factor Authentication (MFA) is becoming more common to give higher levels of assurance that the user is who they say they are.

For system-to-system authentication, credentials can take the form of API keys or certificates. API keys should be generated with a cryptographically secure random number generator and be long enough to be unguessable.
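
A minimal sketch of generating and checking an API key with the Python standard library. The 32-byte length and the function names are assumptions; `secrets` draws from the operating system's cryptographically secure random source.

```python
import hmac
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe key built from `num_bytes` of secure randomness."""
    return secrets.token_urlsafe(num_bytes)


def keys_match(presented: str, expected: str) -> bool:
    """Compare keys in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

The general-purpose `random` module is not suitable here, because its output is predictable.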

OAuth2

OAuth2 is a token-based authorization framework that allows a user to consent that a third-party application can access their data on their behalf.

The OAuth2 roles include:

Resource Owner - An entity capable of granting access to a protected resource
Authorization Server - The server issuing access tokens to the client
Client - An application making protected resource requests on behalf of the resource owner
Resource Server - The server hosting the protected resources

JSON Web Tokens (JWT)

JSON Web Tokens (JWTs) are a token format standardized in RFC 7519 and the de facto standard token for OAuth2. JWTs are structured and encoded using standards: a signature makes any modification detectable, and the token can additionally be encrypted.

Reserved claims have special meaning:

iss (Issuer) - The authority that issued the token
sub (Subject) - A unique identifier to identify the principal
aud (Audience) - Who this token is intended for
exp (Expiration time) - When the token expires
iat (Issued at) - The time the token was issued
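
A minimal sketch of how a signed JWT carries these claims and how a resource server might check them, using only the standard library. It assumes an HS256 signature with a shared secret and a single-string `aud` claim; the function names are hypothetical, and production code should rely on a maintained JWT library rather than this sketch.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def b64url(data: bytes) -> str:
    """Base64url without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: Dict[str, Any], secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url(json.dumps(header).encode("utf-8")) + "."
                     + b64url(json.dumps(claims).encode("utf-8")))
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify(token: str, secret: bytes, audience: str,
           now: Optional[float] = None) -> Dict[str, Any]:
    """Check the signature, then the exp and aud claims; return the claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")  # the token was modified or mis-signed
    claims = json.loads(b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        raise ValueError("token expired")
    if claims["aud"] != audience:
        raise ValueError("wrong audience")
    return claims
```

Changing even one character of the payload makes the recomputed signature differ, which is what makes the token tamper-evident.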

OAuth2 Grants

Authorization Code Grant + PKCE allows you to use OAuth2 for single-page applications (SPAs). PKCE stands for Proof Key for Code Exchange and is used to mitigate authorization code interception attacks.
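
A minimal sketch of the two values a client creates for PKCE, following the S256 method from RFC 7636: a random code verifier that stays with the client, and a code challenge derived from it that is sent with the authorization request. The function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random, URL-safe verifier; 32 random bytes give a 43-character string."""
    return secrets.token_urlsafe(32)


def make_code_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client later sends the verifier when exchanging the authorization code for a token. An attacker who intercepted only the code cannot produce a verifier that hashes to the challenge, so the exchange fails.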

Refresh tokens are long-lived tokens used by the client to request additional access tokens when the previous token expires. It is good practice to issue tokens that are short-lived.

OAuth2 Scopes are used to limit the access of a client acting on behalf of a user. Scopes are typically used as a coarse-grained separation within an API and must make sense to the end user.
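
A minimal sketch of a coarse-grained scope check on the resource server, assuming the granted scopes arrive as the usual space-delimited string. The function name and the scope names in the example are hypothetical.

```python
from typing import Iterable


def has_required_scopes(granted_scope: str, required: Iterable[str]) -> bool:
    """True when every required scope appears in the space-delimited grant."""
    granted = set(granted_scope.split())
    return set(required) <= granted
```

For example, a token granted `"attendees:read sessions:read"` passes a check for `["attendees:read"]` but fails one for `["attendees:write"]`.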

OpenID Connect (OIDC)

OpenID Connect (OIDC) provides an identity layer that builds on top of OAuth2. Using the openid scope provides the client with an ID token, which is a JWT that contains claims about the user.

Never substitute ID tokens for access tokens - this is a very dangerous practice, as ID tokens are not intended for this purpose.

Evolutionary Architecture

API-Driven Architecture Benefits

APIs are the natural interfaces, abstractions, and (encapsulated) entry points to and within a system, and as such can be instrumental in supporting an evolutionary architecture.

APIs can help you, as an architect, evolve a system. An API can be a boundary to a module or component, and this makes an API a natural point of leverage when trying to ensure a system is highly cohesive and loosely coupled.

Cohesion and Coupling

Cohesion refers to the degree to which the elements inside a system belong together. Implementing APIs and systems with high cohesion makes it easier to evolve both the API provider and the consumer.

A loosely coupled system has two properties:

Components are weakly associated with each other
Each component has little or no knowledge of the definitions of other separate components

Zero Trust Architecture

The zero trust security model describes an approach where the main concept is “never trust, always verify”. Devices should not be trusted by default, even if they are connected to a permissioned network.

The zero trust approach advocates mutual authentication, including checking the identity and integrity of devices without respect to location.

Cloud Migration

There are a number of approaches to evolving or migrating an API-based system toward the cloud, ranging from retain (“do nothing”), to rehost, replatform, repurchase, refactor/re-architect, and retire.

An API gateway can be used as a tool for migration, as it can encapsulate functionality and act as a facade for multiple backend systems operating from different environments and networks.
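
A minimal sketch of the facade idea: consumers see one set of paths while the gateway decides which backend, in which environment, serves each one. The route table, hostnames, and function name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical routing table: one API still on-premises, one already migrated.
ROUTES: Dict[str, str] = {
    "/attendees": "http://legacy.internal:8080",
    "/sessions": "https://sessions.cloud.example",
}


def resolve(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend URL for a request path, or None if nothing matches."""
    # Try longer prefixes first so the most specific route wins.
    for prefix in sorted(routes, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix] + path
    return None
```

Migrating the attendees API then means changing one entry in the table; the path that consumers call stays the same.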

API management can play a key role in migration and also in unlocking the value of APIs across and even outside of an organization.