These are my notes on the book Unit Testing Principles, Practices, and Patterns, by Vladimir Khorikov. This is a great book on testing that I read during our book club at Instapro.
1. The Goal of Unit Testing
- The ability to unit test a piece of code is a good negative indicator - it points out poor-quality code with high accuracy. It does not say anything about the quality of the tests though.
- The goal of unit testing is to enable sustainable growth of the software project.
- Tests act as a safety net to prevent regressions.
- Focus on high-quality tests.
- Code is a liability, not an asset.
- Code coverage metrics are not a guarantee of a good test suite.
- A successful test suite:
- is integrated into the development life cycle.
- targets only the most important parts of your code base.
- provides maximum value with minimum maintenance costs.
2. What is a Unit Test?
- A unit test verifies a small piece of code and does it quickly.
- A controversial point is what it means for a unit test to run in an isolated manner.
- The London school of testing says one should replace dependencies with test doubles.
- The Classical school of testing interprets isolation as the unit tests being run in isolation from each other - they should never reach out to shared resources, such as the database.
- A mock is a test double that lets you analyze the interaction between the system under test and its collaborators.
- A collaborator is a dependency that is shared or mutable (e.g. database, API, etc).
- Tests should not verify units of code, but units of behavior.
- Types of tests:
- Unit test:
- verifies a single unit of behavior
- does it quickly
- does it in isolation from other tests.
- Integration test:
- any test that does not meet at least one of the criteria for a unit test.
- it verifies two or more units of behavior.
- Functional/End-to-End test:
- a test that verifies the system from the user's point of view.
- it is a subset of integration tests.
3. The Anatomy of a Unit Test
- How to structure a unit test?
- AAA → Arrange, Act, Assert
- Given → When → Then
- Try to keep the sections in order.
- Avoid `if` statements in tests - they are an anti-pattern.
- The Act section should usually be the smallest one. Multiple Act method calls may indicate issues with encapsulation, which can cause invariant violations.
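A minimal sketch of the AAA structure in Python (pytest style); the `Product` and `Order` classes are hypothetical, not taken from the book.

```python
# Hypothetical domain classes, used only to illustrate the AAA structure.
class Product:
    def __init__(self, name: str, price: float):
        self.name = name
        self.price = price

class Order:
    def __init__(self):
        self._products = []

    def add(self, product: Product) -> None:
        self._products.append(product)

    @property
    def total(self) -> float:
        return sum(p.price for p in self._products)

def test_adding_a_product_increases_the_order_total():
    # Arrange: bring the SUT and its inputs into the desired initial state.
    order = Order()
    product = Product("keyboard", 50.0)

    # Act: a single call on the SUT.
    order.add(product)

    # Assert: verify the outcome, not the implementation.
    assert order.total == 50.0
```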
- Reusing fixtures between tests
- Reusing code between arrange sections is a good way to shorten or simplify your tests.
- Private factory methods in the test class let you reuse arrange code without introducing coupling between tests.
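A sketch of that idea, reusing the hypothetical `Order` and `Product` classes from the sketch above: the private factory keeps construction details out of the individual tests.

```python
def _create_order_with_products(*prices: float) -> Order:
    # Private factory: tests state only what they care about (the prices),
    # not how an Order is wired together.
    order = Order()
    for price in prices:
        order.add(Product("any-product", price))
    return order

def test_order_total_is_the_sum_of_product_prices():
    order = _create_order_with_products(10.0, 20.0)  # Arrange via the factory

    total = order.total                               # Act

    assert total == 30.0                              # Assert
```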
- Naming
- Don’t follow a rigid naming policy.
- Name the test as if you were describing the scenario to a non-programmer.
- Separate words with underscores for better legibility.
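For instance (hypothetical names, loosely modelled on the book's delivery example), compare a rigid method_scenario_result policy with a plain-English name:

```python
# Rigid [method]_[scenario]_[expected result] policy: hard to read aloud.
def test_isdeliveryvalid_invaliddate_returnsfalse(): ...

# Plain-English scenario, words separated with underscores: reads like a spec
# a non-programmer could follow.
def test_delivery_with_a_past_date_is_invalid(): ...
```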
- SUT = System Under Test
4. The Four Pillars of a Good Unit Test
- Protection against regressions
- Protects against features breaking.
- It lets us grow and change the codebase with more confidence.
- How to evaluate a test:
- How much code is executed during the test.
- The complexity of the tested code.
- The code's domain significance.
- Resistance to refactoring
- The degree to which a test can sustain a refactoring of the underlying application code without failing.
- Refactoring: changing code without modifying its observable behaviour.
- How to evaluate a test: how many false positives (false alarms) does it generate?
- The best way to structure a test is to make it tell a story about the problem domain. This helps us focus on the system behaviour, not on its implementation details.
- Protection against regressions and resistance to refactoring aim at maximising the accuracy of the test suite.
- The accuracy metric itself consists of two components:
- How good the test is at indicating the presence of bugs (lack of false negatives, the sphere of protection against regressions)
- How good the test is at indicating the absence of bugs (lack of false positives, the sphere of resistance to refactoring)
- As a codebase grows, the need for refactoring increases, and the importance of resistance to refactoring in tests increases with it.
- Fast feedback
- The faster the tests, the more of them you can have in the suite and the more often you can run them.
- Maintainability
- How hard is it to understand the test?
- How hard is it to run the test?
- By evaluating how a test scores on each pillar, you can estimate its value and decide whether the test is worth keeping. A small number of highly valuable tests is much better than a large number of mediocre ones.
- A test must always be resistant to refactoring, which leaves us to choose between how good the tests are at pointing out bugs and how fast they do that: that is, between protection against regressions and fast feedback.
- The Test Pyramid is a concept that advocates for a certain ratio of the different types of tests in the suite: many unit tests, fewer integration tests, and only a few end-to-end tests.
- Tests in higher pyramid layers favour protection against regressions, while lower layers emphasise execution speed.
- Black-box testing is a method of software testing that examines the functionality of a system without knowing its internal structure. Such testing is normally built around specifications and requirements: what the application is supposed to do, rather than how it does it.
- White-box testing is the opposite of that. It’s a method of testing that verifies the application’s inner workings. The tests are derived from the source code, not requirements or specifications. They are often more brittle than black-box tests.
- Use the black-box testing method when writing tests. Use the white-box method when analysing the tests - use code coverage tools to see which code branches are not exercised, but then turn around and test them as if you know nothing about the code’s internal structure.
5. Mocks and test fragility
- Mocks often result in fragile tests - tests that lack resistance to refactoring.
- A test double is an overarching term that describes all kinds of non-production-ready, fake dependencies used in tests.
- Mocks (mock, spy) help to emulate and examine outcoming interactions - calls the SUT makes to its dependencies to change their state.
- Stubs (stub, dummy, fake) help to emulate incoming interactions - calls the SUT makes to its dependencies to get input data.
- Don’t assert interactions with stubs: a call to a stub is not part of the end result the SUT produces.
- This practice of verifying things that aren’t part of the end result is also called overspecification.
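A sketch of the mock/stub distinction using Python's `unittest.mock`; `UserService`, the repository, and the email gateway are hypothetical names.

```python
from unittest.mock import Mock

class UserService:
    """Hypothetical SUT: greets a user by sending an email."""
    def __init__(self, repository, email_gateway):
        self._repository = repository
        self._email_gateway = email_gateway

    def greet_user(self, user_id: int) -> None:
        email = self._repository.get_email(user_id)   # incoming interaction
        self._email_gateway.send_greeting(email)      # outcoming interaction

def test_greeting_email_is_sent_to_the_user():
    # Stub: emulates an incoming interaction (provides input data).
    repository = Mock()
    repository.get_email.return_value = "user@example.com"

    # Mock: used to examine an outcoming interaction.
    email_gateway = Mock()
    sut = UserService(repository, email_gateway)

    sut.greet_user(user_id=42)

    # Verify the interaction with the mock (part of the end result)...
    email_gateway.send_greeting.assert_called_once_with("user@example.com")
    # ...but do NOT assert calls on the stub; that would be overspecification:
    # repository.get_email.assert_called_once_with(42)  # <- avoid this
```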
- Observable behaviour vs. implementation details
- For a piece of code to be part of the system’s observable behaviour, it has to do one of the following things:
- Expose an operation that helps the client achieve one of its goals. An operation is a method that performs a calculation or incurs a side effect or both.
- Expose a state that helps the client achieve one of its goals. State is the current condition of the system.
- Any code that does neither of these two things is an implementation detail.
- In a well-designed API, the observable behaviour coincides with the public API, while all implementation details are hidden behind the private API.
- If the number of operations the client has to invoke on the class to achieve a single goal is greater than one, then that class is likely leaking implementation details. Ideally, any individual goal should be achieved with a single operation.
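A sketch (hypothetical `User` class, along the lines of the book's name-normalization example) of a leaky API versus one whose public surface matches the observable behaviour:

```python
# Leaky API: the client must call two operations (normalize, then assign)
# to achieve one goal, so normalization is a leaked implementation detail.
class LeakyUser:
    def __init__(self):
        self.name = ""

    def normalize_name(self, name: str) -> str:
        return name.strip()[:50]

# client code: user.name = user.normalize_name(new_name)

# Well-designed API: one operation per goal; normalization stays private.
class User:
    def __init__(self):
        self._name = ""

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        self._name = self._normalize(value)

    @staticmethod
    def _normalize(value: str) -> str:
        return value.strip()[:50]

# client code: user.name = new_name
```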
- If an out-of-process dependency (e.g. database) is only accessible through your application, then communications with such a dependency are not part of your system’s observable behaviour. When your application acts as a proxy to an external system, and no client can access it directly, the backward-compatibility requirement vanishes. Now you can deploy your application together with this external system, and it won’t affect the clients. The communication pattern with such a system becomes an implementation detail.
6. Styles of Unit Testing
- There are three styles of unit testing:
- Output-based testing: you feed an input to the system under test (SUT) and check the output it produces. This style is also known as functional testing, a name rooted in functional programming (code with no side effects).
- State-based testing: it is about verifying the state of the system after an operation is complete. State can be about anything in the system.
- Communication-based testing: this style uses mocks to verify communications between the SUT and its collaborators.
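A sketch of the three styles side by side; all class and function names are hypothetical.

```python
from unittest.mock import Mock

def add_vat(net: float, rate: float = 0.21) -> float:
    """Pure function: a natural fit for output-based testing."""
    return round(net * (1 + rate), 2)

class Counter:
    """Tiny stateful class for the state-based example."""
    def __init__(self):
        self.count = 0
    def increment(self) -> None:
        self.count += 1

class Greeter:
    """SUT that talks to a collaborator, for the communication-based example."""
    def __init__(self, mailer):
        self._mailer = mailer
    def greet(self, email: str) -> None:
        self._mailer.send(email)

def test_output_based_style():
    # Check only the value the SUT returns for a given input.
    assert add_vat(100.0) == 121.0

def test_state_based_style():
    # Verify the state of the SUT after the operation completes.
    counter = Counter()
    counter.increment()
    assert counter.count == 1

def test_communication_based_style():
    # Substitute the collaborator with a mock and verify the call the SUT makes.
    mailer = Mock()
    Greeter(mailer).greet("hello@example.com")
    mailer.send.assert_called_once_with("hello@example.com")
```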
- The biggest difference between the styles is how they score against the four attributes of a good unit test.
- Functional architecture maximizes the amount of code written in a purely functional (immutable) way, while minimizing code that deals with side effects.
- Immutability tackles this issue of preserving invariants from another angle. With immutable classes, you don’t need to worry about state corruption because it’s impossible to corrupt something that cannot be changed in the first place.
- The difference between functional and hexagonal architectures is in their treatment of side effects. Functional architecture pushes all side effects out of the domain layer. Conversely, hexagonal architecture is fine with side effects made by the domain layer, as long as they are limited to that domain layer only. Functional architecture is hexagonal architecture taken to an extreme.
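A sketch of that split, loosely inspired by the book's audit-log example (names are hypothetical): the functional core makes the decision, the mutable shell performs the side effect.

```python
from dataclasses import dataclass

# --- Functional core: pure and immutable, easy to test output-based. ---
@dataclass(frozen=True)
class FileUpdate:
    file_name: str
    new_content: str

def add_record(current_content: str, visitor: str, time_of_visit: str) -> FileUpdate:
    # Pure decision: no I/O, just input -> output.
    new_content = current_content + f"{visitor};{time_of_visit}\n"
    return FileUpdate("audit.txt", new_content)

# --- Mutable shell: applies the decision; all side effects live here. ---
def apply_update(update: FileUpdate) -> None:
    with open(update.file_name, "w") as f:
        f.write(update.new_content)

def test_adding_a_record_appends_the_visitor():
    # The core is tested without touching the file system.
    update = add_record("Alice;2024-01-01\n", "Bob", "2024-01-02")
    assert update.new_content == "Alice;2024-01-01\nBob;2024-01-02\n"
```

Only a thin integration test would need to cover `apply_update`; the interesting logic is exercised without any side effects.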
7. Refactoring toward valuable unit tests
- It’s rarely possible to significantly improve a test suite without refactoring the underlying code.
- All production code can be categorized along two dimensions:
- Code complexity or domain significance: complexity is defined by the number of decision-making (branching) points in the code; domain significance shows how significant the code is for the problem domain of your project.
- The number of collaborators, i.e. the dependencies that are either mutable or out-of-process, that a class or method has.
- Unit testing the top-left quadrant (domain model and algorithms) gives you the best return for your efforts.
- Trivial code shouldn't be tested at all; such tests have close-to-zero value.
- As for controllers, you should test them briefly as part of a much smaller set of overarching integration tests.
- The most problematic type of code is the overcomplicated quadrant. It’s hard to unit test but too risky to leave without test coverage.
- The more important or complex the code, the fewer collaborators it should have.
- Functional architecture goes even further and separates business logic from communications with all collaborators, not just out-of-process ones. This is what makes functional architecture so testable: its functional core has no collaborators. We separate business logic and orchestration.
- How to refactor toward valuable unit tests:
- Apply the Single Responsibility Principle extensively.
- Make implicit dependencies explicit.
- Introduce an application services layer.
- Test all pre-conditions that have domain significance.
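A sketch of where this refactoring lands: a domain class holding the decision-making logic with no collaborators, and a thin application service (controller) that only orchestrates. All names are hypothetical.

```python
class User:
    """Domain model: important logic, no out-of-process collaborators."""
    def __init__(self, user_id: int, email: str):
        self.user_id = user_id
        self.email = email

    def change_email(self, new_email: str) -> bool:
        # Decision-making stays in the domain model; easy to unit test in isolation.
        if self.email == new_email:
            return False
        self.email = new_email
        return True

class UserController:
    """Application service: orchestrates collaborators, makes no decisions worth unit testing."""
    def __init__(self, database, message_bus):
        self._database = database          # explicit, injected dependencies
        self._message_bus = message_bus

    def change_email(self, user_id: int, new_email: str) -> None:
        user = self._database.get_user_by_id(user_id)
        if user.change_email(new_email):
            self._database.save_user(user)
            self._message_bus.send_email_changed(user_id, new_email)

def test_changing_the_email_updates_the_user():
    user = User(1, "old@example.com")
    assert user.change_email("new@example.com") is True
    assert user.email == "new@example.com"
```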
8. Why Integration Testing?
- A test that does not meet at least one of the requirements for a unit test falls into the category of integration tests. In practice, these tests verify how your system works in integration with out-of-process dependencies.
- They are less maintainable, but since they involve a greater number of collaborators, they provide more protection against regressions.
- For an integration test, select the longest happy path in order to verify interactions with all out-of-process dependencies. If there’s no one path that goes through all such interactions, write additional integration tests—as many as needed to capture communications with every external system.
- Out-of-process dependencies: use real instances of managed dependencies (those only your application talks to, e.g. the application database); replace unmanaged dependencies (those other systems can observe, e.g. a message bus or SMTP server) with mocks.
- Abstractions are discovered, not invented - don't introduce interfaces for out-of-process dependencies unless you need to mock out those dependencies. You only mock out unmanaged dependencies, so the guideline boils down to this: use interfaces for unmanaged dependencies only.
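A sketch of such an integration test, assuming a hypothetical SQLite-backed repository as the managed dependency and a mocked message bus as the unmanaged one.

```python
import sqlite3
from unittest.mock import Mock

# Hypothetical production code -----------------------------------------------
class UserRepository:
    """Adapter over a managed dependency: the application's own database."""
    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection

    def save_email(self, user_id: int, email: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO users (id, email) VALUES (?, ?)", (user_id, email)
        )

    def get_email(self, user_id: int) -> str:
        return self._conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()[0]

class UserController:
    """Orchestrates the managed database and the unmanaged message bus."""
    def __init__(self, repository: UserRepository, message_bus):
        self._repository = repository
        self._message_bus = message_bus

    def change_email(self, user_id: int, new_email: str) -> None:
        self._repository.save_email(user_id, new_email)
        self._message_bus.send_email_changed(user_id, new_email)

# Integration test ------------------------------------------------------------
def test_changing_an_email_persists_it_and_notifies_the_bus():
    # Managed dependency: a real (in-memory) database instance.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    repository = UserRepository(conn)

    # Unmanaged dependency: replaced with a mock.
    message_bus = Mock()
    sut = UserController(repository, message_bus)

    sut.change_email(user_id=1, new_email="new@example.com")

    # Verify state via the real database and the interaction with the mock.
    assert repository.get_email(1) == "new@example.com"
    message_bus.send_email_changed.assert_called_once_with(1, "new@example.com")
```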
- Best practices:
- Making domain model boundaries explicit: the explicit separation of the domain classes and controllers makes it easier to tell the difference between unit and integration tests.
- Reducing the number of layers: layers of indirection negatively affect your ability to reason about the code.
- Eliminating circular dependencies: this also eliminates a significant amount of cognitive load.
- Each test should focus on a single unit of behavior - having multiple `act` blocks is a code smell.
- Hard-to-manage out-of-process dependencies are the only legitimate reason to write a test with more than one act section.
- Logging should only be tested if it is part of the application’s observable behavior.
9. Mocking Best Practices
- Mocks should be limited to unmanaged dependencies.
- When mocking, verify interactions with unmanaged dependencies only at the very edges of your system.
- Interfaces for out-of-process dependencies only make sense when you need to mock them; they can be deleted once that need no longer exists.
- Spies are usually superior to mocks when mocking classes residing at the edges, since they also let you reuse code in the assertion phase.
- Mocks are for integration tests only, and you shouldn’t use mocks in unit tests, because of the separation of business logic and orchestration. Your code should either communicate with out-of-process dependencies or be complex, but never both.
- Verify the number of calls: when it comes to communications with unmanaged dependencies, it’s important to ensure both of the following:
- The existence of expected calls
- The absence of unexpected calls
- Only mock types that you own: always write your adapter on top of third-party libraries and mock them instead of the underlying types.
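A sketch of both practices together: an adapter you own over a (hypothetical) third-party SMTP client, with the test checking the expected call and the absence of any other calls.

```python
from unittest.mock import Mock

class EmailGateway:
    """Adapter you own, written on top of a third-party SMTP client.
    Tests mock this class, never the third-party library directly."""
    def __init__(self, smtp_client):
        self._smtp_client = smtp_client   # e.g. smtplib.SMTP; not mocked in tests

    def send_email_changed_notification(self, user_id: int, new_email: str) -> None:
        self._smtp_client.sendmail(
            "noreply@example.com", new_email, f"Email for user {user_id} changed"
        )

class NotificationController:
    """Hypothetical controller that notifies through the gateway."""
    def __init__(self, gateway: EmailGateway):
        self._gateway = gateway

    def change_email(self, user_id: int, new_email: str) -> None:
        self._gateway.send_email_changed_notification(user_id, new_email)

def test_controller_sends_exactly_one_notification():
    gateway = Mock(spec=EmailGateway)

    NotificationController(gateway).change_email(1, "new@example.com")

    # The existence of the expected call:
    gateway.send_email_changed_notification.assert_called_once_with(1, "new@example.com")
    # The absence of unexpected calls on the unmanaged dependency:
    assert len(gateway.method_calls) == 1
```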
10. Testing the Database
- Running tests against a real database provides bulletproof protection against regressions, but those tests aren’t easy to set up.
- Prerequisites for testing the database:
- Keeping the database (schema) in the source control system.
- Using a separate database instance per developer.
- Applying the migration-based approach to database delivery (migrations transition the database from one version to another).
- Reference data (data that must be populated in order for the application to operate properly) is also part of the database schema.
- Don’t reuse database transactions or units of work between sections of the test - they should be isolated from each other.
- Always clean up the data between test runs.
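A sketch of cleaning up data at the start of each test with a pytest fixture (cleaning before the test also copes with runs that were aborted halfway); the database file and table are hypothetical.

```python
import sqlite3
import pytest

@pytest.fixture
def db_connection():
    # Hypothetical connection factory; a real suite would point at the
    # developer's own database instance.
    conn = sqlite3.connect("integration_tests.db")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)")

    # Clean up *before* the test, so leftovers from an interrupted run can't leak in.
    conn.execute("DELETE FROM users")
    conn.commit()

    yield conn
    conn.close()

def test_saved_users_can_be_read_back(db_connection):
    db_connection.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
    row = db_connection.execute("SELECT email FROM users WHERE id = 1").fetchone()
    assert row[0] == "a@example.com"
```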
- Executing integration tests sequentially is easier than in parallel. Run them in parallel only if the effort is worth it.
- I don’t agree with the author: wrapping integration tests into an outer transaction works really well.
- Extracting technical, non-business-related bits into private methods or helper classes makes integration tests more succinct.
- Using fluent interfaces for data assertions can help improve readability.
- Testing repositories may not be valuable, since they offer inferior protection against regressions and have high maintenance costs.
11. Unit testing anti-patterns
- Private methods should not be tested - they are usually implementation details and not part of the observable behavior. When testing the observable behavior does not provide sufficient coverage for private methods, it is likely that they are either dead code or there is a missing abstraction.
- Do not expose private state for the sole purpose of unit testing - test observable behavior only.
- Avoid leaking domain knowledge to tests - don't imply any specific implementation when writing tests. Instead of duplicating the algorithm, hard-code its results into the test.
- Avoid code pollution, i.e. adding production code that's only needed for testing.
- Don't mock concrete classes.
- When testing time, inject the time dependency explicitly (e.g. a `Clock`).
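A sketch of injecting the time dependency; the `Clock` protocol and the code using it are hypothetical.

```python
from datetime import datetime
from typing import Protocol

class Clock(Protocol):
    def now(self) -> datetime: ...

class SystemClock:
    def now(self) -> datetime:
        return datetime.now()

class FixedClock:
    """Test double: always returns the same, predetermined time."""
    def __init__(self, fixed: datetime):
        self._fixed = fixed
    def now(self) -> datetime:
        return self._fixed

def greeting(clock: Clock) -> str:
    # The code never calls datetime.now() directly; time is an explicit dependency.
    return "Good morning" if clock.now().hour < 12 else "Good afternoon"

def test_greets_with_good_morning_before_noon():
    clock = FixedClock(datetime(2024, 1, 1, 9, 0))
    assert greeting(clock) == "Good morning"
```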