
Testing Strategies in a Microservice Architecture

  • Writer: Tuba Kaya
  • Apr 3
  • 7 min read

Updated: Apr 7

The enhanced testability of the microservice architecture style allows multiple forms of automated testing to be used together to achieve maximum robustness. In this post, I have compiled a comprehensive list of the types of automated tests suitable for these systems.



Clarifying Microservices Architecture Terminology


The terminology surrounding microservices architecture is often confusing. To clarify the discussion in this blog post, let's define one important term: the autonomous microservice.

An autonomous microservice operates without runtime dependencies on other components that frequently change their business logic. In simpler terms, if your components communicate through RESTful APIs that handle business logic, then you do not have autonomous microservices. These components rely on one another during runtime.

To achieve true autonomy, components must remain temporally decoupled. Today, we can accomplish this by utilizing messaging through a message broker.


When we develop autonomous microservices, we gain the advantage of higher testability, along with other qualities and trade-offs associated with the microservices architecture style.

In this blog post, I refer to a microservice as an autonomous microservice that receives its data via messages, maintains its own state/database, and utilizes this state while executing business logic, instead of making calls to other components to retrieve data during the execution of business logic.
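
To make this concrete, below is a minimal sketch of such an autonomous microservice. It is an illustration only: the in-memory queue stands in for a real message broker, and the PaymentReceived message and AccountService class are hypothetical. The point is that the service consumes messages, keeps its own state, and executes business logic against that local state alone.

```python
# account_service.py -- a hypothetical autonomous microservice in miniature.
from dataclasses import dataclass
from queue import Queue


@dataclass
class PaymentReceived:
    """Hypothetical message published by another component via the broker."""
    account_id: str
    amount: int


class AccountService:
    """Keeps its own state and never calls other components
    while executing business logic."""

    def __init__(self) -> None:
        # Stands in for the service's own database.
        self._balances: dict[str, int] = {}

    def handle(self, message: PaymentReceived) -> None:
        # Business logic uses only local state; no runtime calls to other services.
        current = self._balances.get(message.account_id, 0)
        self._balances[message.account_id] = current + message.amount

    def balance_of(self, account_id: str) -> int:
        return self._balances.get(account_id, 0)


if __name__ == "__main__":
    # An in-memory queue stands in for the message broker. Producer and
    # consumer are temporally decoupled: the message is processed later.
    broker: Queue = Queue()
    broker.put(PaymentReceived(account_id="acc-1", amount=25))

    service = AccountService()
    while not broker.empty():
        service.handle(broker.get())

    print(service.balance_of("acc-1"))  # prints 25
```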

With this terminology clarified, let's explore the types of automated tests we create in such a distributed system.



The Test Pyramid for Microservices


Distributed systems utilizing autonomous microservices incorporate various levels of automated testing. This approach is an adaptation of the well-known test pyramid, featuring a few additional layers due to the enhanced testability of the system. Below is the structure of the test pyramid for such a distributed system with autonomous microservices.


This pyramid includes an extra level, Microservice Acceptance Tests, on top of the testing types listed in this resource. I added this level because out-of-process component tests are written by developers in technical language, while acceptance tests describe business cases in business terms. Acceptance tests do not mock out any dependencies; they simulate the scenario with customized input. One could argue that out-of-process component tests and acceptance tests of a single microservice should not be separate levels in this diagram, and that one or the other should be chosen. That makes sense, although both types can also be used side by side: component tests cover technical expectations from the developer's understanding, while acceptance tests cover business expectations from the business's understanding. Therefore, I left both in place in the diagram.



Adjusted Test Pyramid for Microservice Architecture

Unit Tests

Written by software developers in technical language, these tests focus on one unit at a time, such as a method, property, or class. They let us verify at the most detailed level that our code functions as expected, and they encourage us to think through the scenarios a unit might encounter more thoroughly. When a bug does surface, unit tests speed up root cause analysis by ruling out the parts that are not faulty.
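
As an illustration, here is what a couple of such tests could look like: minimal pytest-style unit tests against the hypothetical AccountService sketched earlier (assumed to be saved as account_service.py). The names and cases are illustrative assumptions.

```python
# test_account_service.py -- minimal pytest-style unit tests for the
# hypothetical AccountService from the earlier sketch.
from account_service import AccountService, PaymentReceived


def test_payment_increases_balance():
    service = AccountService()

    service.handle(PaymentReceived(account_id="acc-1", amount=25))

    assert service.balance_of("acc-1") == 25


def test_unknown_account_has_zero_balance():
    service = AccountService()

    assert service.balance_of("missing") == 0
```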

In-process Component Tests

Out-of-process Component Tests

Contract Tests

Acceptance Tests (for a single autonomous microservice)

Integration Tests

End-to-End Tests


We aim to write tests at the lowest level possible within this pyramid. The reason for this becomes evident when we consider the purpose of automated testing:


Faster Root Cause Detection: Reducing the time spent identifying the root cause of bugs. Research has long indicated that maintenance is the most expensive part of software ownership, and that within maintenance most of the time goes into locating the root cause of bugs; once the root cause is found, resolving the issue takes relatively little time. To speed up this time-intensive task, we aim our automated tests at small segments of the system, so that during troubleshooting we can eliminate the non-faulty parts and identify the root cause more quickly. We ensure that system units function correctly by thoroughly testing each unit against a wide range of cases. When a malfunction is detected and the root cause lies in our code, we add tests around that specific part of the system: first to reproduce the issue, then to confirm the fix, and finally as a regression test to ensure the problem does not recur.


Confidence in System Robustness During Regression: This means guaranteeing that the system continues to function correctly after modifications. Manual testing is valuable for exploratory testing, but relying on it alone to regression test a growing and often complex system after every change is not feasible.



Why Developer Tests Alone Aren't Enough


Even if developers have already written automated tests that cover most of the logic, here are some reasons why we still need the higher-level tests.


  • Clarify functional and non-functional requirements.

    Identifying test scenarios for crucial functionality, often with the help of quality assurance engineers, sharpens the development team's focus. It also helps bridge the gap between business and development, bringing consensus on the right functionality to deliver. Writing these test cases down is what facilitates these discoveries.


  • It is not always possible to verify certain behavior with more detailed tests.

    Some frameworks or libraries we use might not offer unit-testable interfaces. Take, for example, an ORM framework whose configuration is done in XML rather than in abstracted constructs: it does not let us unit test our own code that uses the framework.


  • Quickly discovering broken integrations, including the connections to the underlying infrastructure.

    Instead of relying on manual checks to verify that the integration with the infrastructure is not broken, being notified by automated tests saves time. Without these tests, breakage would be found either by often-coincidental manual tests or in production, and both are more frustrating to deal with. Thinking that a shipped feature was done and moving on to another task, only to find that the changes broke integration with a piece of infrastructure, means adjusting planning expectations. A sketch of such a test follows this list.
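
As a sketch of what such an infrastructure-facing test could look like, the example below assumes Python with pytest, SQLAlchemy, and the testcontainers library (which requires Docker); the balances table is a hypothetical stand-in for the service's own schema. The test spins up a throwaway PostgreSQL instance and fails fast if the database integration is broken.

```python
# test_database_integration.py -- an out-of-process style test that
# catches broken database integration early instead of in production.
from sqlalchemy import create_engine, text
from testcontainers.postgres import PostgresContainer


def test_service_can_write_and_read_its_own_state():
    # Start a throwaway PostgreSQL instance in a container.
    with PostgresContainer("postgres:16") as postgres:
        engine = create_engine(postgres.get_connection_url())

        with engine.begin() as conn:
            # Hypothetical schema for the service's own state.
            conn.execute(text(
                "CREATE TABLE balances (account_id TEXT PRIMARY KEY, amount INT)"
            ))
            conn.execute(text("INSERT INTO balances VALUES ('acc-1', 25)"))

        with engine.connect() as conn:
            amount = conn.execute(text(
                "SELECT amount FROM balances WHERE account_id = 'acc-1'"
            )).scalar_one()

        assert amount == 25
```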


In a distributed system featuring autonomous microservices, the most effective way to achieve these benefits is at the detailed testing level where test scenarios are articulated in business language, specifically through acceptance tests for an individual microservice. Consequently, it is preferable to conduct acceptance testing of a microservice instead of integration or end-to-end testing.



Why Acceptance Tests for Microservices Lag Behind


Developers typically do not write acceptance tests on their own. The crucial role of a quality assurance engineer is essential in creating these tests. Acceptance tests should be expressed in business language, distinguishing them from purely technical tests like unit or component tests. A quality assurance engineer assists the product owner and developers in crafting functional and non-functional requirements as test scenarios, emphasizing the business perspective. Therefore, hiring the right quality assurance engineer is integral to achieving automated acceptance testing.


At Eneco, it took us several months to establish automated acceptance testing for the critical behavior of one of our microservices:

  • Initially, we conducted numerous interviews to hire the right QA engineer.

  • Once hired, he took the time to learn the domain and asked insightful questions to understand the core functionality that needed to be developed.

  • He taught us how to define acceptance test scenarios that accurately reflect business expectations using the Gherkin language (a sketch of such a scenario follows this list).

  • We dedicated a significant portion of a sprint to set up the automated testing project, enabling the initial test to run.

  • We added a step to the continuous deployment pipeline to execute these tests against the deployed test instance after each deployment.

  • In each subsequent sprint, we continued to define new test scenarios and add more test code as the functionality evolved.
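
To give a flavor of such scenarios, here is a minimal sketch: a hypothetical Gherkin scenario with matching step definitions using Python's behave library. The feature text, step wording, and the AccountService from the earlier sketch are illustrative assumptions, not the actual Eneco scenarios.

```python
# features/steps/balance_steps.py -- behave step definitions for a
# hypothetical scenario in features/balance.feature:
#
#   Feature: Account balance
#     Scenario: Payment received via message
#       Given an account "acc-1" with a balance of 100 euros
#       When a payment message of 25 euros arrives for account "acc-1"
#       Then the balance of account "acc-1" is 125 euros
from behave import given, when, then

from account_service import AccountService, PaymentReceived  # the earlier sketch


@given('an account "{account_id}" with a balance of {amount:d} euros')
def given_account(context, account_id, amount):
    context.service = AccountService()
    context.service.handle(PaymentReceived(account_id=account_id, amount=amount))


@when('a payment message of {amount:d} euros arrives for account "{account_id}"')
def when_payment(context, account_id, amount):
    # A real acceptance test would publish this message to the broker
    # and wait for the deployed service instance to process it.
    context.service.handle(PaymentReceived(account_id=account_id, amount=amount))


@then('the balance of account "{account_id}" is {amount:d} euros')
def then_balance(context, account_id, amount):
    assert context.service.balance_of(account_id) == amount
```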


Looking at these steps, it is clear why achieving automated acceptance testing of a microservice's critical scenarios requires significant effort and a dedicated team.


Due to the extensive effort required, acceptance testing often falls behind. Startups and scale-ups with event-driven backend systems typically lack the budget for such a prolonged process.


Conclusion


The microservices architecture style enhances testability by allowing automated tests to be performed on smaller components independently from the rest of the system. In this context, we can develop automated tests such as in-process/out-of-process component testing and acceptance testing. Although component tests are conducted more often, acceptance testing may fall behind due to the time-consuming nature of creating these tests. However, acceptance testing is crucial for ensuring alignment between business and IT and for identifying issues before deployment to production.


What has been your experience with implementing automated tests to ensure system robustness? Do you recognize the points I make in this post? I would appreciate hearing your thoughts in the comments. Happy coding!
