Software Testing: Theory and Practice (Part 7) - Fundamentals and Strategies for Integration & E2E Testing
Key Takeaways
- Integration tests can uncover integration bugs caused by specification mismatches between components that unit tests cannot detect. Their downsides are long execution times, flaky results, and high maintenance costs.
- To keep maintenance costs low, minimize the number of integration-test cases and compose them from high-value test scenarios.
- Combine them with “autopilot”-style tests that require no explicit scenarios.
Characteristics of Integration & E2E Testing
Integration testing targets a group of components combined together.
Unlike unit tests—where all dependencies are replaced with test doubles—integration tests run the real components.
Because some of those components may have long processing times, test runs tend to be slow.
If the components communicate over a network, disconnections or latency can also make results unstable.
E2E (end-to-end) testing exercises the entire system with all components integrated.
Since more components are involved, execution takes even longer, and E2E tests inherit the same flakiness issues.
Despite these drawbacks, integration and E2E tests remain indispensable because they can detect integration bugs earlier than system-level manual testing.
An integration bug is a defect arising from a mismatch between the specifications of two or more components.
Example: Component A outputs a string, while Component B expects an integer.
Each component individually satisfies its own specification, so their unit tests pass.
But when they are wired together in a dynamically-typed language, a run-time type error can occur.
Other examples include misunderstandings of a dependency’s API or unintended states/transitions when concurrently running components interact.
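The string/integer mismatch described above can be sketched in a few lines. The component names (`readQuantity`, `formatQuantity`) and their specifications are hypothetical and chosen only for illustration; the point is that each function behaves correctly against its own specification, yet wiring them together fails at run time.

```javascript
// Component A (hypothetical): its spec says "return the quantity as a trimmed string".
function readQuantity(form) {
  return form.quantity.trim();
}

// Component B (hypothetical): its spec says "accept the quantity as an integer".
function formatQuantity(quantity) {
  return quantity.toFixed(0) + " items";
}

// Unit-level checks: each component satisfies its own spec in isolation.
const unitA = readQuantity({ quantity: " 3 " }); // a string, as A's spec promises
const unitB = formatQuantity(3);                 // B is fed an integer by its test

// Integration: A's string output is passed straight into B.
function integrate(form) {
  try {
    return { ok: true, value: formatQuantity(readQuantity(form)) };
  } catch (error) {
    // Strings have no toFixed method, so the call throws a TypeError.
    return { ok: false, errorName: error.name };
  }
}

const integrated = integrate({ quantity: " 3 " });
```

Only the integrated call exposes the defect, because neither unit-level check ever sees the other component's real output.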
The systems targeted by integration tests, and especially by E2E tests, almost always hold state.
As you combine more components, the state space explodes, so any practical test can explore only a tiny fraction of all possible states.
Here are two keys to successful integration/E2E testing:
- Cover the tiny fraction you can test with high-value scenarios.
- Use scenario-free tests to coarsely cover the vast remainder of the state space.
We start with choosing high-value scenarios.
Column: Formal Specification Descriptions
Because integration bugs stem from spec mismatches, they could be caught during specification writing—if specifications are written in a machine-checkable form.
Such techniques are known as formal specification descriptions.
A detailed explanation is beyond this article’s scope.
Choosing High-Value Scenarios
In this article, a test scenario consists of
- a sequence of events fed into the integrated system, and
- an oracle that judges the final state.
An event is anything that triggers a change of internal state—method calls, network messages, UI actions, and so on.
Events also represent interaction: one event can simultaneously change several components (e.g., a request alters both client and server state).
In short, a scenario executes events in a prescribed order/timing and then inspects the resulting state.
If an exception or crash occurs during the sequence, the test fails.
Below is a sample E2E scenario for a login screen:
    // Assumes Mocha and selenium-webdriver; the Chrome browser choice is an assumption.
    const { Builder, By, until } = require("selenium-webdriver");

    describe("Login screen", () => {
      let driver;
      before(async () => { driver = await new Builder().forBrowser("chrome").build(); });
      after(async () => { await driver.quit(); });

      context("When a valid user name and password are entered and the Login button is pressed", () => {
        it("navigates to the Welcome screen", async () => {
          // Event: open the login page
          await driver.get("https://example.com/login");
          // Event: type “kuniwak” into the user name field
          await driver.findElement(By.id("input.username")).sendKeys("kuniwak");
          // Event: type “p4$SW0rD” into the password field
          await driver.findElement(By.id("input.password")).sendKeys("p4$SW0rD");
          // Event: click the Login button
          await driver.findElement(By.id("button.login")).click();
          // Oracle: assert navigation to a page whose title is “Welcome” (waits up to 1000 ms)
          await driver.wait(until.titleIs("Welcome"), 1000);
        });
      });
    });
The value of a scenario is how much its success reduces the risk of failure in production.
The greater the risk reduction, the higher the value.
High-value scenarios fall into two patterns:
- High-frequency: executed often in production (typical happy paths).
- High severity on failure: even if rare, a defect would be catastrophic.
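One hypothetical way to compare candidates across both patterns is to score each scenario as frequency multiplied by severity. This scoring rule, the candidate names, and every number below are assumptions made for illustration; they do not come from the article, and real figures would come from usage analysis and expert interviews.

```javascript
// Hypothetical candidates: frequency is uses per day, severity is a relative
// cost of one failure. All figures are invented for illustration.
const candidates = [
  { name: "wish-list", frequency: 2000, severity: 1 },
  { name: "payment", frequency: 50, severity: 1000 },
  { name: "login", frequency: 50000, severity: 2 },
];

// Rank scenarios by an assumed risk score: frequency times severity.
function rankByRisk(scenarios) {
  return scenarios
    .map((s) => ({ ...s, score: s.frequency * s.severity }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankByRisk(candidates);
```

With these invented figures, a frequent but low-severity path (login) and a rare but high-severity path (payment) both rank above an auxiliary feature, which matches the two patterns listed above.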
High-frequency examples
The login path of a service that every user must pass through is high-value because of sheer usage volume.
Analyzing user behavior—e.g., with Google Analytics—helps identify such paths.
Because user behavior evolves, repeat this analysis periodically.
High-severity examples
Payment flows rarely fail, but when they do (charging without delivering goods, or delivering without charging), the impact is huge.
Thus payment scenarios are likewise high-value.
To identify high-severity cases, interview people who best understand how the system creates value—domain experts—and enumerate what could go wrong.
For an e-commerce site, two expert-provided scenarios might be:
- User flow: discover the site → find an item → add to cart → fill in details → pay → receive purchase.
- Service flow: procure popular items → store them → promote → ship according to user payments.
If the site fails at any step of either flow and no recovery is possible, the store ceases to function.
Both flows therefore deserve E2E protection.
Conversely, auxiliary features around high-value flows—e.g., wish-list management or viewing order history—might be left untested if their failure is tolerable or operationally recoverable.
But if a wish-list differentiates you from competitors, or customer-service costs are huge, these become high-value too.
The assessment is case-by-case; interview as many domain experts as possible for a broad view.
If experts are unavailable, risk-analysis techniques such as STAMP/STPA can help.
Selecting such scenarios lets you cut failure risk with few tests.
But scenario-based tests are costly to maintain, so limit them to high-value paths and complement the rest with scenario-free tests.
Combining Scenario-Free Tests
Scenario-free tests include model checking and property-based testing (PBT)—especially fuzzing.
We outlined them in Part 4 (Jan 2025 issue) and detailed PBT in Part 5 (Feb 2025 issue).
- Model checking exhaustively explores a system’s state space to verify properties.
- Property-based testing auto-generates inputs while humans supply properties relating inputs and outputs. When the only property is “no exception/crash,” the practice is called fuzzing.
Their advantage: even without concrete scenarios, they can roughly test vast areas that high-value scenarios never visit.
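As a minimal sketch of fuzzing in the sense used above, the loop below generates random strings and checks only that the target does not throw. The targets `fragileParse` and `robustParse`, the input generator, and the seeded random number generator are all illustrative assumptions, not something the article prescribes.

```javascript
// Small seeded PRNG (mulberry32) so that a failing run can be replayed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build a random printable-ASCII string of length 0..maxLength.
function randomString(rand, maxLength) {
  const length = Math.floor(rand() * (maxLength + 1));
  let text = "";
  for (let i = 0; i < length; i++) {
    text += String.fromCharCode(32 + Math.floor(rand() * 95)); // codes 32..126
  }
  return text;
}

// Fuzzing: the only property checked is "the target does not throw".
function fuzz(target, { runs = 500, seed = 1, maxLength = 16 } = {}) {
  const rand = mulberry32(seed);
  const crashes = [];
  for (let i = 0; i < runs; i++) {
    const input = randomString(rand, maxLength);
    try {
      target(input);
    } catch (error) {
      crashes.push({ input, error });
    }
  }
  return crashes;
}

// Hypothetical targets: one lets parse errors escape, one handles them.
const fragileParse = (text) => JSON.parse(text);
const robustParse = (text) => {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
};
```

Calling `fuzz(fragileParse)` returns the inputs that made the target throw, while `fuzz(robustParse)` should return an empty list. No scenario was written by hand for either call.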
Example: a crawler that follows every link on a site and asserts no 5xx status or client-side error.
It checks that reachable pages at least don’t error out—though it can’t verify you landed on the intended page, so it is “coarser” than scenario-based E2E tests.
A common pain point is wasting time revisiting already-seen states.
Consider a linear sequence of N screens with Next and Back buttons (only one button at each end).
A random tester that presses available buttons uniformly needs an expected $(N-1)^2$ presses to reach the far end.
For $N = 10$, that is 81 presses.
If the tester never revisits a state, the expectation drops to $N - 1 = 9$, an 81 versus 9 difference.
Thus, a key to efficient scenario-free testing is state deduplication: avoid revisiting.
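The two figures can be checked with a short calculation. The sketch below assumes the tester starts on the first screen and uses a standard argument that is not spelled out in the article: let $h_k$ be the expected number of presses to get from screen $k$ to screen $k+1$ for the first time. The first screen only has Next, so $h_1 = 1$; for a middle screen, $h_k = 1 + \frac{1}{2}(h_{k-1} + h_k)$, which simplifies to $h_k = h_{k-1} + 2$. The function names are illustrative.

```javascript
// Exact expected number of presses for a uniformly random tester to get from
// screen 1 to screen n on a linear chain of n screens.
function expectedRandomPresses(n) {
  let total = 0;
  let stepCost = 1; // h_1: the first screen only offers Next
  for (let k = 1; k < n; k++) {
    total += stepCost; // add h_k, the cost of advancing from screen k to k + 1
    stepCost += 2;     // h_{k+1} = h_k + 2
  }
  return total;
}

// A tester that never revisits a screen simply presses Next n - 1 times.
function pressesWithoutRevisits(n) {
  return n - 1;
}

const randomCost = expectedRandomPresses(10);
const dedupCost = pressesWithoutRevisits(10);
```

The loop sums the first $N - 1$ odd numbers, which is $(N-1)^2$, so the gap between the two testers grows quadratically with the number of screens.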
For the link crawler, treat each URL as a state ID; record visited URLs and skip repeats.
If no natural ID exists, you must add a mechanism to recognize and remember states.
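The link crawler with URL-based deduplication could be sketched as follows. Here `fetchPage` is a hypothetical injected function that resolves to `{ status, links }` for a URL; a real crawler would back it with an HTTP client or a browser driver, and would also need to collect client-side errors, which this sketch omits.

```javascript
// Crawl every reachable page once and report pages that return a 5xx status.
// fetchPage(url) is a hypothetical dependency resolving to { status, links }.
async function crawl(startUrl, fetchPage) {
  const visited = new Set(); // URLs act as state IDs
  const queue = [startUrl];
  const failures = [];

  while (queue.length > 0) {
    const url = queue.shift();
    if (visited.has(url)) continue; // state deduplication: skip repeats
    visited.add(url);

    const page = await fetchPage(url);
    if (page.status >= 500) {
      failures.push({ url, status: page.status });
    }
    for (const link of page.links) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return { visited: [...visited], failures };
}

// Usage with an in-memory fake site (illustrative data only).
const fakeSite = {
  "/": { status: 200, links: ["/a", "/b"] },
  "/a": { status: 200, links: ["/", "/b"] },
  "/b": { status: 500, links: ["/a"] },
};
let fetchCount = 0;
const fakeFetchPage = async (url) => {
  fetchCount += 1;
  return fakeSite[url];
};
```

Although the fake pages link back to each other, each URL is fetched only once, so the crawl terminates even on cyclic link structures.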