Software Testing: Theory and Practice (Part 1) - Fundamental Concepts of Software Testing

Apr 23, 2025 - 02:15

Purpose of This Series

In this series, we will thoroughly explain basic concepts and best practices related to writing test code, test design, and test strategies in software testing. The intended audience ranges from those who have heard of software testing but don't know what it entails, to those struggling to understand the theoretical aspects of it.

This series includes definitions and explanations of foundational concepts in software testing. Unfortunately, many books on software testing do not clearly define basic terms like "specification" and "behavior." Merely reading vague explanations can create a false sense of understanding, or lead to inconsistent interpretations. This series aims to provide a more solid understanding of software testing based on clear definitions from computer science.

Key Takeaways

Software testing is the process of verifying that the implementation meets the specification.
A specification refers to the expected behavior of a program. If the program determines output based on input, this behavior can be represented as an input-output table.
The purpose of software testing is to confirm that the risk of failure is within acceptable limits, and to prompt fixes if it is not.

Overview of Software Testing

In this series, software testing is succinctly defined as "the process of verifying that the implementation meets the specification"¹. Some literature may include checking whether the implementation meets requirements, but for the sake of clarity, this series adopts the narrower definition.

For example, consider a specification that requires "displaying the name of the logged-in user." If a user registers their name as null or constructor, and the name is not displayed or causes an error, then the implementation fails to satisfy the specification².

In this example, the purpose of software testing is to confirm that names like null or constructor are displayed without errors, and to determine that the implementation needs fixing if they are not.
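To make the constructor case concrete, here is a minimal TypeScript sketch. It assumes a hypothetical in-memory store in which a plain object is used as a dictionary; the names `displayNames`, `isRegisteredNaive`, and `isRegisteredSafe` are illustrative and do not come from the article.

```typescript
// Hypothetical store mapping registered user names to display names.
const displayNames: Record<string, string> = { alice: "Alice" };

// Naive lookup: property access also finds members inherited from
// Object.prototype, so the unregistered name "constructor" appears to exist.
function isRegisteredNaive(name: string): boolean {
    return displayNames[name] !== undefined;
}

// Safer lookup: only keys stored on the object itself are considered.
function isRegisteredSafe(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(displayNames, name);
}
```

A test that only tries ordinary names such as "alice" would pass for both functions; only an input like "constructor" reveals the difference.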

To understand software testing more deeply, we must also define the terms "specification" and "implementation." Can you clearly explain what a specification or implementation is? If not, read on for definitions and theory.

Definitions and Theory Surrounding Software Testing

Understanding the definitions of specification and implementation, and the relationship between them, is essential to understanding software testing. Yet, most testing books do not define these terms properly. In engineering disciplines like software testing, definitions are often provided by fundamental sciences like computer science. This section borrows definitions from computer science literature to explain key concepts.

Specification

In computer science, one common definition is:

Generally, the expected behavior of a program is called the program's specification.

A specification is defined as the "expected behavior" of a program. While this seems straightforward, digging deeper often reveals surprising misunderstandings.

Behavior

People often understand the word "behavior" differently, so it needs a definition. This article focuses on functional programs, where the output is uniquely determined by the input³.

For example, a Fizz Buzz program, which outputs "Fizz" for multiples of 3, "Buzz" for multiples of 5, "FizzBuzz" for multiples of both, and the number itself otherwise, is functional. A compiler translating source code into machine code is also a functional program.

In functional programs, behavior refers to the relationship between input and output, which can be represented in a table like Table 1. Exceptions can be listed instead of return values.

Table 1: Fizz Buzz Program Specification

Input	Output
1	"1"
2	"2"
3	"Fizz"
4	"4"
5	"Buzz"
6	"Fizz"
7	"7"
8	"8"
9	"Fizz"
10	"Buzz"
11	"11"
12	"Fizz"
13	"13"
14	"14"
15	"FizzBuzz"
(and so on⁴)	…

Inputs that the specification does not cover (for Fizz Buzz, inputs such as 0 or negative numbers) have undefined behavior: the implementation may handle them however it likes. This freedom allows simpler or more efficient implementations.

However, whatever an implementation does for undefined inputs adds no value in itself. Good specifications allow some freedom but avoid sacrificing value⁵.
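The idea of a specification as an input-output table can be sketched in TypeScript. This is only an illustration: `fizzBuzzSpec` and `expectedOutput` are hypothetical names, and the map holds just the first six rows of Table 1, whereas the full specification would cover every input from 1 upward.

```typescript
// First rows of Table 1, encoded as a lookup table (input -> expected output).
const fizzBuzzSpec: ReadonlyMap<number, string> = new Map<number, string>([
    [1, "1"],
    [2, "2"],
    [3, "Fizz"],
    [4, "4"],
    [5, "Buzz"],
    [6, "Fizz"],
]);

// Returns the expected output, or undefined when this excerpt says nothing
// about the input. For inputs the specification truly leaves undefined
// (such as 0), any behavior of the implementation is acceptable.
function expectedOutput(input: number): string | undefined {
    return fizzBuzzSpec.get(input);
}
```

For instance, `expectedOutput(3)` yields "Fizz", while `expectedOutput(0)` yields `undefined` because the table has no row for 0.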

Returning to the "display logged-in user name" example: a program that displays any logged-in user’s name is correct. If it fails even once or crashes, it does not meet the specification. If the user is not logged in, the spec allows undefined behavior (e.g., an error message).

Aside: Reactive Programs

So far, we’ve covered functional programs, which take input, produce output, and stop. Some programs are reactive — continuously interacting with users or other programs. A web server is one example.

Reactive programs require a different definition of behavior. One formalism for specifying such behavior is Communicating Sequential Processes (CSP), which can also express implementations and automatically verify conformance.

Implementation

An implementation is "a program intended to satisfy a given specification." While not always precisely defined in literature, this is the general usage.

Code 1 shows an example Fizz Buzz implementation. It throws an error for inputs less than 1 and for non-integer inputs.

Code 1: Example Implementation of the Fizz Buzz Program

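// Fizz Buzz program implementation written in TypeScript
function fizzBuzz(i: number): string {
    // Inputs the specification does not cover are rejected with an exception.
    if (i < 1 || !Number.isInteger(i)) throw new Error('Unsupported input');
    // Check divisibility by both 3 and 5 first, before the single-divisor cases.
    if (i % 3 === 0 && i % 5 === 0) return "FizzBuzz";
    if (i % 3 === 0) return "Fizz";
    if (i % 5 === 0) return "Buzz";
    return i.toString();
}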

Unlike a specification, an implementation yields some outcome (a return value, an exception, or non-termination) for every input. The input-output relationship of an implementation is called the meaning of the program and can also be shown in a table (Table 2).

Table 2: Fizz Buzz Program Meaning

Input	Output
<1	Exception
1	"1"
2	"2"
3	"Fizz"
4	"4"
5	"Buzz"
…	…

The implementation satisfies the specification if, for every input the specification defines, the meaning (actual output) equals the expected output. If the output differs for even one such input, the implementation fails to meet the specification.
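The comparison between meaning and specification can be sketched as a small check. This is a minimal illustration under stated assumptions: `specExcerpt` contains only a handful of rows rather than the full specification, `mismatches` is a hypothetical helper, and `fizzBuzz` is repeated from Code 1 so the sketch is self-contained.

```typescript
// Copy of Code 1 so that this sketch runs on its own.
function fizzBuzz(i: number): string {
    if (i < 1 || !Number.isInteger(i)) throw new Error("Unsupported input");
    if (i % 3 === 0 && i % 5 === 0) return "FizzBuzz";
    if (i % 3 === 0) return "Fizz";
    if (i % 5 === 0) return "Buzz";
    return i.toString();
}

// Excerpt of the specification: input -> expected output.
const specExcerpt: ReadonlyMap<number, string> = new Map<number, string>([
    [1, "1"],
    [3, "Fizz"],
    [5, "Buzz"],
    [7, "7"],
]);

// Returns the inputs on which the implementation disagrees with the spec.
// A thrown exception counts as a mismatch, because these rows expect a string.
function mismatches(
    impl: (input: number) => string,
    spec: ReadonlyMap<number, string>,
): number[] {
    const bad: number[] = [];
    spec.forEach((expected, input) => {
        let actual: string | undefined;
        try {
            actual = impl(input);
        } catch (e) {
            actual = undefined;
        }
        if (actual !== expected) bad.push(input);
    });
    return bad;
}
```

An empty result from `mismatches(fizzBuzz, specExcerpt)` means the implementation agrees with the specification on the sampled rows; it says nothing about inputs outside the excerpt.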

Defect

A defect (or bug) is a point where the implementation does not match the specification. Formally, it is an input for which the implementation's outcome differs from the specified output; an unexpected exception or an infinite loop counts as a differing outcome.

Failure

A failure occurs when the program is actually run with an input that has a defect, so that the observed output differs from the specification. A defect causes no failure until its input is actually used, but every failure has an underlying defect.
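The difference between a defect and a failure can be illustrated with a deliberately broken implementation. Everything here is hypothetical: `buggyFizzBuzz` checks divisibility by 3 before checking both divisors, and `specRows` is a small excerpt of the specification that writes the output for 15 as "FizzBuzz", matching Code 1 and Table 3.

```typescript
// Excerpt of the specification: input -> expected output.
const specRows: ReadonlyMap<number, string> = new Map<number, string>([
    [1, "1"],
    [3, "Fizz"],
    [5, "Buzz"],
    [15, "FizzBuzz"],
]);

// Hypothetical defective implementation: the check for 3 comes first,
// so multiples of 15 return "Fizz" instead of "FizzBuzz".
function buggyFizzBuzz(i: number): string {
    if (i % 3 === 0) return "Fizz";
    if (i % 5 === 0) return "Buzz";
    return i.toString();
}

// Defects: specified inputs whose actual output differs from the expected one.
// The defect at 15 exists whether or not anyone ever passes 15 in production.
const defectiveInputs: number[] = [];
specRows.forEach((expected, input) => {
    if (buggyFizzBuzz(input) !== expected) defectiveInputs.push(input);
});

// A failure is an actual run that hits a defective input.
function runCausesFailure(input: number): boolean {
    const expected = specRows.get(input);
    return expected !== undefined && buggyFizzBuzz(input) !== expected;
}
```

Running the program with 3 causes no failure even though the defect is present; only a run with 15 turns the defect into a failure.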

Failure Risk

Failure risk combines severity and probability. Often, severity is rated with integers, and risk is calculated as severity × probability to enable comparison.
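The severity × probability calculation can be sketched as follows. The failure modes, severity ratings, probabilities, and the acceptance threshold are all made-up values for illustration; real projects choose their own scales.

```typescript
// One possible failure, rated by severity (integer scale) and probability.
interface FailureMode {
    name: string;
    severity: number;    // hypothetical scale: 1 (minor) to 5 (critical)
    probability: number; // estimated chance of occurrence, between 0 and 1
}

// Risk score as described in the text: severity multiplied by probability.
function riskScore(mode: FailureMode): number {
    return mode.severity * mode.probability;
}

// A risk is acceptable when its score does not exceed the chosen threshold.
function isAcceptable(mode: FailureMode, threshold: number): boolean {
    return riskScore(mode) <= threshold;
}

// Hypothetical failure modes for the "display user name" feature.
const pageCrash: FailureMode = { name: "page crashes", severity: 5, probability: 0.125 };
const wrongCase: FailureMode = { name: "name shown in wrong case", severity: 1, probability: 0.5 };
```

With these numbers the rare but severe crash scores higher than the frequent but minor display glitch, which is the kind of comparison the single risk figure is meant to enable.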

With this groundwork, we can better understand the definition and purpose of software testing.

Software Testing

As stated, software testing is verifying that an implementation meets the specification — or confirming the absence of defects.

To verify that there are no defects, every input covered by the specification must be tested, and that set is usually enormous or unbounded. Even for Fizz Buzz, every integer ≥1 would have to be checked. Reactive programs can have infinite traces even with finite input spaces⁶.

Since exhaustive testing is unrealistic⁷, software testing relies on sampling — selecting testable inputs from the specification.
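A rough back-of-the-envelope calculation shows why exhaustive testing is unrealistic even for Fizz Buzz. The sketch below assumes the input space is the positive safe integers of a JavaScript number and assumes a throughput of one million checks per second; both figures are illustrative assumptions, not measurements.

```typescript
// Estimates how many years it would take to test every input once.
function yearsToTestExhaustively(inputCount: number, checksPerSecond: number): number {
    const secondsPerYear = 365 * 24 * 60 * 60;
    return inputCount / checksPerSecond / secondsPerYear;
}

// Assumed input space: 1 .. Number.MAX_SAFE_INTEGER (2^53 - 1 values).
// Assumed speed: 1,000,000 checks per second.
const years = yearsToTestExhaustively(Number.MAX_SAFE_INTEGER, 1e6);
```

Under these assumptions the result is on the order of a few hundred years, which is why the text turns to sampling.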

Testing only samples can't confirm the absence of all defects. Thus, the goal is often adjusted: to verify that failure risk is acceptable.

Even partial testing reduces failure risk compared to no testing. As long as the benefits outweigh the risks, the program has value.

So even if software testing doesn’t prove correctness, it remains highly valuable.

Test Case

A test case is a pair of input and expected output sampled from the specification⁸. For Fizz Buzz, test cases can be taken directly from rows of the specification table (Table 1); Table 3 shows a few examples.

There are two major approaches:

Example-based testing: selects concrete input-output pairs
Property-based testing: defines relations between inputs and outputs

Property-based testing allows more automatic input generation but requires discovering correct relationships — a more difficult task. Both will be discussed in later parts.
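As a rough illustration of the property-based approach, the sketch below states one relationship between input and output and checks it on randomly generated inputs. It is hand-rolled for the sake of a self-contained example; real projects would typically use a dedicated property-based testing library. `holdsFor` and `checkProperty` are hypothetical names, the input range is an arbitrary assumption, and `fizzBuzz` is repeated from Code 1.

```typescript
// Copy of Code 1 so that this sketch runs on its own.
function fizzBuzz(i: number): string {
    if (i < 1 || !Number.isInteger(i)) throw new Error("Unsupported input");
    if (i % 3 === 0 && i % 5 === 0) return "FizzBuzz";
    if (i % 3 === 0) return "Fizz";
    if (i % 5 === 0) return "Buzz";
    return i.toString();
}

// Property relating input and output for any integer n >= 1:
// - the output starts with "Fizz" exactly when n is divisible by 3,
// - the output ends with "Buzz" exactly when n is divisible by 5,
// - otherwise the output is n written as a string.
function holdsFor(n: number): boolean {
    const out = fizzBuzz(n);
    const by3 = n % 3 === 0;
    const by5 = n % 5 === 0;
    if ((out.slice(0, 4) === "Fizz") !== by3) return false;
    if ((out.slice(-4) === "Buzz") !== by5) return false;
    if (!by3 && !by5) return out === String(n);
    return true;
}

// Generates random inputs in [1, 1000000] and collects counterexamples.
function checkProperty(runs: number): number[] {
    const counterexamples: number[] = [];
    for (let k = 0; k < runs; k++) {
        const n = 1 + Math.floor(Math.random() * 1000000);
        if (!holdsFor(n)) counterexamples.push(n);
    }
    return counterexamples;
}
```

The inputs are generated automatically, but the property itself had to be worked out by hand, which is the harder part mentioned above.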

Table 3: Example Fizz Buzz Test Cases

Input	Expected Output
1	"1"
3	"Fizz"
5	"Buzz"
15	"FizzBuzz"
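The test cases in Table 3 can be run as example-based tests along the following lines. This is a minimal sketch without a test framework: `table3` and `runTestCases` are hypothetical names, and `fizzBuzz` is repeated from Code 1 so the block is self-contained.

```typescript
// Copy of Code 1 so that this sketch runs on its own.
function fizzBuzz(i: number): string {
    if (i < 1 || !Number.isInteger(i)) throw new Error("Unsupported input");
    if (i % 3 === 0 && i % 5 === 0) return "FizzBuzz";
    if (i % 3 === 0) return "Fizz";
    if (i % 5 === 0) return "Buzz";
    return i.toString();
}

// Table 3 as data: each entry is [input, expected output].
const table3: Array<[number, string]> = [
    [1, "1"],
    [3, "Fizz"],
    [5, "Buzz"],
    [15, "FizzBuzz"],
];

// Runs every test case and returns a description of each one that fails.
function runTestCases(): string[] {
    const failed: string[] = [];
    table3.forEach(([input, expected]) => {
        const actual = fizzBuzz(input);
        if (actual !== expected) {
            failed.push("input " + input + ": expected " + expected + ", got " + actual);
        }
    });
    return failed;
}
```

An empty result means all four sampled cases pass. As discussed above, that reduces failure risk but does not prove the absence of defects for other inputs.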

Conclusion

Software testing is the process of verifying that the implementation meets the specification.
A specification refers to the expected behavior of a program. If the program determines output based on input, this behavior can be represented as an input-output table.
The purpose of software testing is to confirm that the risk of failure is within acceptable limits, and to prompt fixes if it is not.

¹ This explanation limits software testing to verification: confirming that the implementation meets the specification. Broader definitions (e.g., from ISTQB or SQuaRE) include validation: confirming that the implementation meets the requirements.
² null can cause issues in SQL queries without parameter escaping. constructor is problematic in JavaScript when using objects as dictionaries without hasOwnProperty.
³ For clarity, we exclude non-deterministic programs like those relying on random numbers or the current time.
⁴ Although inputs ≥16 are omitted, they should be specified. Specifications may also be written in natural language or logic. For example: “For input n > 0, if divisible by both 3 and 5, return 'FizzBuzz'; by 3 only, 'Fizz'; by 5 only, 'Buzz'; otherwise, return n as a string.”
⁵ If all inputs are undefined, even random outputs or errors satisfy the spec, yet such implementations are worthless. Specifications must balance freedom and meaningfulness.
⁶ Functional programs have finite but huge input spaces. Reactive programs may have infinite traces, even if inputs are finite.
⁷ Logical proof tools like Isabelle or Rocq can prove absence of defects even for infinite input spaces.
⁸ ISTQB defines test cases more broadly to support reactive programs.