Data Science: From School to Work, Part IV
Good practices for testing your Python projects

Introduction
Let’s start with a simple example that will appeal to most of us. If you want to check whether your car’s blinkers are working properly, you sit in the car, turn on the ignition and flick a turn signal to see if the front and rear lights work. But if the lights don’t work, it’s hard to tell why: the bulbs may be dead, the battery may be flat, the turn signal switch may be faulty. In short, there’s a lot to check. This is exactly what tests are for. Every part of a feature such as the blinker must be tested to find out what is going wrong: a test of the bulbs, a test of the battery, a test of the communication between the control unit and the indicators, and so on.
To test all this, there are different types of tests, often presented in the form of a pyramid, from the fastest to the slowest and from the most isolating to the most integrated. This test pyramid can vary depending on the specifics of the project (database connection test, authentication test, etc.).
The Base of the Pyramid: Unit Tests
Unit tests form the basis of the test pyramid, regardless of the type of project (and language). Their purpose is to test a unit of code, e.g. a method or a function. For a unit test to be truly considered as such, it must adhere to a basic rule: a unit test must not depend on functionality outside the unit under test. Unit tests have the advantage of being fast and automatable.
Example: Consider a function that extracts even numbers from an iterable. To test this function, we need to create several types of iterables of integers and check the output. But we also need to check the behavior in edge cases: empty iterables, element types other than int, and so on.
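As a sketch (the function name and test cases below are purely illustrative), the function and a few of its unit tests might look like this:
from typing import Iterable, List

def extract_even(numbers: Iterable[int]) -> List[int]:
    # Keep only the elements divisible by 2.
    return [n for n in numbers if n % 2 == 0]

def test_extract_even():
    assert extract_even([1, 2, 3, 4]) == [2, 4]  # nominal case
    assert extract_even([]) == []                # empty iterable
    assert extract_even((7, 8)) == [8]           # works on any iterable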
Intermediate Level: Integration and Functional Tests
Just above the unit tests sit the integration tests. Their purpose is to detect errors that cannot be caught by unit tests: they check that a new feature does not cause problems when it is integrated into the application. Functional tests are similar, but aim at testing one precise functionality (e.g. an authentication process).
In a project, especially in a team environment, many functions are developed by different developers. Integration/functional tests ensure that all these features work well together. They are also run automatically, making them fast and reliable.
Example: Consider an application that displays a bank balance. When a withdrawal is carried out, the balance is modified. An integration test checks that, with a balance initialized at 1000 euros and a withdrawal of 500 euros, the balance changes to 500 euros.
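As a minimal sketch (the Account class below is hypothetical), such a test could be written as:
class Account:
    # Hypothetical account: in a real project, the balance logic and the
    # withdrawal operation would be separate, independently tested units.
    def __init__(self, balance: float):
        self.balance = balance

    def withdraw(self, amount: float) -> None:
        self.balance -= amount

def test_withdrawal_updates_balance():
    account = Account(balance=1000)
    account.withdraw(500)
    assert account.balance == 500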
The Top of the Pyramid: End-to-End Tests
End-to-end (E2E) tests sit at the top of the pyramid. They verify that the application functions as expected from end to end, i.e. from the user interface to the database or external services. They are generally long and complicated to set up, but only a few of them are needed.
Example: Consider a forecasting application based on new data. This can be very complex, involving data retrieval, variable transformations, model training and so on. The aim of the E2E test is to check that, given the new data selected, the forecasts match expectations.
The Unit Tests with Doctest
A fast and simple way of writing unit tests is to use docstrings. Let’s take the example of a script, calculate_stats.py, with two functions: calculate_mean(), with a complete docstring (as presented in the post on Python best practices), and calculate_std(), with a more minimal one.
import math
from typing import List

def calculate_mean(numbers: List[float]) -> float:
    """
    Calculate the mean of a list of numbers.

    Parameters
    ----------
    numbers : list of float
        A list of numerical values for which the mean is to be calculated.

    Returns
    -------
    float
        The mean of the input numbers, or 0 if the list is empty.

    Notes
    -----
    The mean is calculated as the sum of all elements divided by the number of elements.

    Examples
    --------
    >>> calculate_mean([1.0, 2.0, 3.0, 4.0])
    2.5
    >>> calculate_mean([])
    0
    """
    if len(numbers) > 0:
        return sum(numbers) / len(numbers)
    else:
        return 0

def calculate_std(numbers: List[float]) -> float:
    """
    Calculate the (population) standard deviation of a list of numbers.

    Parameters
    ----------
    numbers : list of float
        A list of numerical values for which the standard deviation is to be calculated.

    Returns
    -------
    float
        The standard deviation of the input numbers, or 0 if the list is empty.
    """
    if len(numbers) > 0:
        m = calculate_mean(numbers)
        gap = [abs(x - m) ** 2 for x in numbers]
        return math.sqrt(sum(gap) / len(numbers))
    else:
        return 0
The test is included in the “Examples” section at the end of the calculate_mean() docstring. A doctest follows the layout of a terminal session: three chevrons at the beginning of a line with the command to be executed, and the expected result just below. To run the tests, simply type the command
python -m doctest calculate_stats.py -v
or, if you use uv (which I encourage),
uv run python -m doctest calculate_stats.py -v
The -v flag displays a verbose report. With the two doctests above, it looks something like this (the exact output varies with your Python version):
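Trying:
    calculate_mean([1.0, 2.0, 3.0, 4.0])
Expecting:
    2.5
ok
Trying:
    calculate_mean([])
Expecting:
    0
ok
2 items had no tests:
    calculate_stats
    calculate_stats.calculate_std
1 items passed all tests:
   2 tests in calculate_stats.calculate_mean
2 tests in 3 items.
2 passed and 0 failed.
Test passed.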
As you can see, two tests ran with no failures, and doctest is smart enough to point out every method that has no test (such as calculate_std()).
The Unit Tests with Pytest
Using doctest is interesting, but quickly becomes limiting. For a truly comprehensive testing process, we use a dedicated framework. There are two main frameworks for testing: unittest and pytest. The latter is generally considered simpler and more intuitive.
To install the package, simply type:
pip install pytest (in your virtual environment)
or
uv add pytest
1 – Write your first test
Let’s take the calculate_stats.py script and write a test for the calculate_mean() function. To do this, we create a script test_calculate_stats.py containing the following lines:
from calculate_stats import calculate_mean

def test_calculate_mean():
    assert calculate_mean([1, 2, 3, 4, 5, 6]) == 3.5
Tests are based on the assert statement, which is used with the following syntax:
assert expression1 [, expression2]
expression1 is the condition to be tested, and the optional expression2 is the error message displayed if the condition is not met.
The Python interpreter transforms each assert statement into:
if __debug__:
    if not expression1:
        raise AssertionError(expression2)
(Note that running Python with the -O option sets __debug__ to False and strips assert statements away; this is why assert belongs in tests rather than in production control flow.)
2 – Run a test
To run the test, we use the following command:
pytest (in your virtual environment)
or
uv run pytest
The result looks something like this (platform details, paths and timings will differ on your machine):
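============================= test session starts ==============================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /path/to/your/project
collected 1 item

test_calculate_stats.py .                                                [100%]

============================== 1 passed in 0.01s ===============================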
3 – Analyze the output
One of the great advantages of pytest is the quality of its feedback. For each test, you get:
- A green dot (.) for a success;
- An F for a failure;
- An E for an error;
- An s for a skipped test (with the @pytest.mark.skip(reason="message") decorator).
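For instance, a test can be marked as skipped like this (the reason string is free text):
import pytest

@pytest.mark.skip(reason="feature not implemented yet")
def test_future_feature():
    ...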
In the event of failure, pytest provides:
- The exact name of the failed test;
- The problematic line of code;
- Expected and obtained values;
- A complete trace to facilitate debugging.
For example, if we replace the == 3.5 with == 4, we obtain output like the following (paths and timings will differ):
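============================= test session starts ==============================
collected 1 item

test_calculate_stats.py F                                                [100%]

=================================== FAILURES ===================================
_____________________________ test_calculate_mean _____________________________

    def test_calculate_mean():
>       assert calculate_mean([1, 2, 3, 4, 5, 6]) == 4
E       assert 3.5 == 4
E        +  where 3.5 = calculate_mean([1, 2, 3, 4, 5, 6])

test_calculate_stats.py:4: AssertionError
=========================== short test summary info ============================
FAILED test_calculate_stats.py::test_calculate_mean - assert 3.5 == 4
============================== 1 failed in 0.03s ===============================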
4 – Use parametrize
To test a function properly, you need to test it exhaustively, i.e. with different types of inputs and outputs. The problem is that you quickly end up with a succession of asserts and ever-longer test functions, which are hard to read.
To overcome this problem and test several datasets in a single unit test, we use parametrization. The idea is to create a list containing all the datasets you wish to test, as tuples, then apply the @pytest.mark.parametrize decorator. The previous test can then be rewritten as follows:
from calculate_stats import calculate_mean
import pytest

testdata = [
    ([1, 2, 3, 4, 5, 6], 3.5),
    ([], 0),
    ([1.2, 3.8, -1], 4 / 3),
]

@pytest.mark.parametrize("numbers, expected", testdata)
def test_calculate_mean(numbers, expected):
    assert calculate_mean(numbers) == expected
If you wish to add a test set, simply add a tuple to testdata.
It is also advisable to create another type of test to check that errors are raised, using the pytest.raises(Exception) context manager:
testdata_fail = [
    1,
    "a",
]

@pytest.mark.parametrize("numbers", testdata_fail)
def test_calculate_mean_fail(numbers):
    with pytest.raises(Exception):
        calculate_mean(numbers)
In this case, the test passes only if the function raises an error for the testdata_fail inputs.
5 – Use mocks
As mentioned in the introduction, the purpose of a unit test is to test a single unit of code and, above all, it must not depend on external components. This is where mocks come in.
Mocks simulate the behavior of a constant, a function or even a class. To create and use mocks, we use the pytest-mock package. To install it:
pip install pytest-mock (in your virtual environment)
or
uv add pytest-mock
a) Mock a function
To illustrate the use of a mock, let’s take our test_calculate_stats.py script and implement the test for the calculate_std() function. The problem is that it depends on the calculate_mean() function. So we use the mocker.patch method to mock its behavior. The test for the calculate_std() function is written as follows:
from calculate_stats import calculate_std

def test_calculate_std(mocker):
    # Force calculate_mean() to always return 0 inside calculate_stats.
    mocker.patch("calculate_stats.calculate_mean", return_value=0)
    assert calculate_std([2, 2]) == 2
    assert calculate_std([2, -2]) == 2
Executing the pytest command shows that both assertions pass.
Explanation:
The line mocker.patch("calculate_stats.calculate_mean", return_value=0) forces calculate_mean() in calculate_stats.py to return 0. The standard deviation computed for the series [2, 2] is therefore distorted, because the mocked calculate_mean() always returns 0. The computation is only correct when the mean of the series really is 0, as the second assertion shows.
b) Mock a class
In a similar way, you can mock the behavior of a class and simulate its methods and/or attributes. To do this, you implement a mock class with the methods/attributes to be replaced.
Consider a function, need_pruning(), which tests whether a decision tree should be pruned based on the minimum number of points in its leaves:
from sklearn.tree import BaseDecisionTree

def need_pruning(tree: BaseDecisionTree, max_point_per_node: int) -> bool:
    # Get the number of samples in each node
    n_samples_per_node = tree.tree_.n_node_samples
    # Identify which nodes are leaves.
    is_leaves = (tree.tree_.children_left == -1) & (tree.tree_.children_right == -1)
    # Get the number of samples in leaf nodes
    n_samples_leaf_nodes = n_samples_per_node[is_leaves]
    return any(n_samples_leaf_nodes < max_point_per_node)
Testing this function can be complicated, since it depends on a class, DecisionTree, from the scikit-learn package. What’s more, you’d need data to train a DecisionTree before testing the function. To get around these difficulties, we mock the attributes of a DecisionTree’s tree_ object.
from model import need_pruning
from sklearn.tree import DecisionTreeRegressor
import numpy as np

class MockTree:
    # Mock tree with two leaves of 5 points each.
    @property
    def n_node_samples(self):
        return np.array([20, 10, 10, 5, 5])

    @property
    def children_left(self):
        return np.array([1, 3, 4, -1, -1])

    @property
    def children_right(self):
        return np.array([2, -1, -1, -1, -1])

def test_need_pruning(mocker):
    new_model = DecisionTreeRegressor()
    new_model.tree_ = MockTree()
    assert need_pruning(new_model, 6)
    assert not need_pruning(new_model, 2)
Explanation:
The MockTree class mocks the n_node_samples, children_left and children_right attributes of a tree_ object. In the test, we create a DecisionTreeRegressor whose tree_ attribute is replaced by a MockTree. This gives us full control over the attributes that need_pruning() relies on.
6 – Use fixtures
Let’s complete the previous example by adding a function, get_predictions(), which retrieves the average of the target variable in each of the tree’s leaves:
def get_predictions(tree: BaseDecisionTree) -> np.ndarray:
    # Identify which nodes are leaves.
    is_leaves = (tree.tree_.children_left == -1) & (tree.tree_.children_right == -1)
    # Get the target mean in the leaves
    values = tree.tree_.value.flatten()[is_leaves]
    return values
One way of testing this function would be to repeat the first two lines of the test_need_pruning() test. But a simpler solution is to use the pytest.fixture decorator: since testing this new function requires the MockTree created earlier, we wrap the model setup in a fixture to avoid repeating code. The test script then becomes:
from model import need_pruning, get_predictions
from sklearn.tree import DecisionTreeRegressor
import numpy as np
import pytest

class MockTree:
    @property
    def n_node_samples(self):
        return np.array([20, 10, 10, 5, 5])

    @property
    def children_left(self):
        return np.array([1, 3, 4, -1, -1])

    @property
    def children_right(self):
        return np.array([2, -1, -1, -1, -1])

    @property
    def value(self):
        return np.array([[[5]], [[-2]], [[-8]], [[3]], [[-3]]])

@pytest.fixture
def tree_regressor():
    model = DecisionTreeRegressor()
    model.tree_ = MockTree()
    return model

def test_need_pruning(tree_regressor):
    assert need_pruning(tree_regressor, 6)
    assert not need_pruning(tree_regressor, 2)

def test_get_predictions(tree_regressor):
    assert all(get_predictions(tree_regressor) == np.array([3, -3]))
In our case, the fixture gives us a DecisionTreeRegressor object whose tree_ attribute is our MockTree.
The advantage of a fixture is that it provides a fixed development environment for configuring a set of tests with the same context or dataset. This can be used to:
- Prepare objects;
- Start or stop services;
- Initialize the database with a dataset;
- Create test clients for web projects;
- Configure mocks.
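For example, a fixture can also handle setup and teardown around each test thanks to yield (a generic sketch; connect_to_test_db is a hypothetical helper):
import pytest

@pytest.fixture
def db_connection():
    conn = connect_to_test_db()  # hypothetical setup: open a test database
    yield conn                   # the test runs at this point
    conn.close()                 # teardown: always executed after the test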
7 – Organize the tests directory
pytest runs tests in all files whose names begin with test_ or end with _test. With this convention, you can simply use the pytest command to run all the tests in your project.
As with the rest of a Python project, the test directory must be structured. We recommend the following, illustrated by the example layout below:
- Break down your tests by package;
- Test no more than one module per script.
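For example (package and module names are purely illustrative), the layout might look like this:
project/
├── src/
│   ├── Package1/
│   │   ├── module1.py
│   │   └── module2.py
│   └── Package2/
│       └── module3.py
└── test/
    ├── Package1/
    │   ├── test_module1.py
    │   └── test_module2.py
    └── Package2/
        └── test_module3.py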

However, you can also run only the tests of a single script by specifying the path of the .py file:
pytest .\test\Package1\test_module1.py (in your virtual environment)
or
uv run pytest .\test\Package1\test_module1.py
8 – Analyze your test coverage
Once the tests have been written, it’s worth looking at the test coverage rate. To do this, we install two packages, coverage and pytest-cov, and run a coverage measurement:
pip install pytest-cov coverage (in your virtual environment)
pytest --cov=your_main_directory
or
uv add pytest-cov coverage
uv run pytest --cov=your_main_directory
The tool measures coverage by counting the number of lines executed by the tests and reports a percentage per file. The 92% obtained for the calculate_stats.py script comes from the line where the squares of the deviations from the mean are calculated:
gap = [abs(x - m)**2 for x in numbers]
To prevent certain scripts from being analyzed, you can specify exclusions in a .coveragerc configuration file at the root of the project. For example, to exclude the two test files, write:
[run]
omit = .\test_*.py
The test files then disappear from the coverage report.
Finally, for larger projects, you can generate an HTML report of the coverage analysis by typing:
pytest --cov=your_main_directory --cov-report html (in your virtual environment)
or
uv run pytest --cov=your_main_directory --cov-report html
By default, the report is written to an htmlcov/ directory; open htmlcov/index.html in a browser to explore it.
9 – Some useful packages
- pytest-xdist: speeds up test execution by using multiple CPUs;
- pytest-randomly: randomly shuffles the order of the tests, reducing the risk of surprising inter-test dependencies;
- pytest-instafail: displays failures and errors immediately instead of waiting until all tests have completed;
- pytest-tldr: the default pytest output is chatty; this plugin limits it to the traces of failed tests;
- pytest-mpl: tests Matplotlib output by comparing the generated images;
- pytest-timeout: ends tests that take too long, probably due to infinite loops;
- freezegun: mocks the datetime module with the @freeze_time() decorator.
Special thanks to Banias Baabe for this list.
Integration and Functional Tests
Now that the unit tests have been written, most of the work is done. Take heart, we’re almost there!
As a reminder, unit tests aim to test a unit of code without it interacting with another function. This way we know that each function/method does what it was developed for. It is time to test how they work together!
1 – Integration tests
Integration tests are used to check the combinations of different code units, their interactions and the way in which subsystems are combined to form a common system.
The way we write integration tests is no different from the way we write unit tests. To illustrate, let’s create a very simple FastAPI application that gets or sets a Login/Password pair in a “database”. To keep the example simple, the database is just a dict named users. We create a main.py script with the following code:
from fastapi import FastAPI, HTTPException

app = FastAPI()

users = {"user_admin": {"Login": "admin", "Password": "admin123"}}

@app.get("/users/{user_id}")
async def read_user(user_id: str):
    if user_id not in users:
        raise HTTPException(status_code=404, detail="User not found")
    return users[user_id]

@app.post("/users/{user_id}")
async def create_user(user_id: str, user: dict):
    if user_id in users:
        raise HTTPException(status_code=400, detail="User already exists")
    users[user_id] = user
    return user
To test this application, use the httpx and fastapi.testclient packages to make requests to your endpoints and verify the responses. The test script is as follows:
from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_read_user():
    response = client.get("/users/user_admin")
    assert response.status_code == 200
    assert response.json() == {"Login": "admin", "Password": "admin123"}

def test_read_user_not_found():
    response = client.get("/users/new_user")
    assert response.status_code == 404
    assert response.json() == {"detail": "User not found"}

def test_create_user():
    new_user = {"Login": "admin2", "Password": "123admin"}
    response = client.post("/users/new_user", json=new_user)
    assert response.status_code == 200
    assert response.json() == new_user

def test_create_user_already_exists():
    new_user = {"Login": "duplicate_admin", "Password": "admin123"}
    response = client.post("/users/user_admin", json=new_user)
    assert response.status_code == 400
    assert response.json() == {"detail": "User already exists"}
In this example, the tests depend on the application created in the main.py script; they are therefore not unit tests. We test different scenarios to check that the application behaves as expected.
Integration tests determine whether independently developed code units work correctly when they are linked together. To implement an integration test, we need to:
- write a function that contains a scenario;
- add assertions to check the test case.
2 – Functional tests
Functional testing ensures that the application’s functionality complies with the specification. It differs from integration and unit testing in that you don’t need to know the code to perform it: a good knowledge of the functional specification suffices.
The project manager can write the full specification of the application, and developers can write tests that verify it.
In our previous FastAPI example, one of the specifications is being able to add a new user and then check that this user is in the database. So we test the “adding a user” functionality with this test:
from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_add_user():
    new_user = {"Login": "new_user", "Password": "new_password"}
    response = client.post("/users/new_user", json=new_user)
    assert response.status_code == 200
    assert response.json() == new_user
    # Check that the user was added to the database
    response = client.get("/users/new_user")
    assert response.status_code == 200
    assert response.json() == new_user
The End-to-End Tests
The end is near! End-to-end (E2E) tests focus on simulating real-world scenarios, covering a range of flows from simple to complex. In essence, they can be thought of as functional tests with multiple steps.
However, E2E tests are the most time-consuming to execute, as they require building, deploying, and launching a browser to interact with the application.
When E2E tests fail, identifying the issue can be challenging due to the broad scope of the test, which encompasses the entire application. So you can now see why the testing pyramid has been designed in this way.
E2E tests are also the most difficult to write and maintain, owing to their extensive scope and the fact that they involve the entire application.
It’s essential to understand that E2E testing is not a replacement for other testing methods, but rather a complementary approach. E2E tests should be used to validate specific aspects of the application, such as button functionality, form submissions, and workflow integrity.
Ideally, tests should detect bugs as early as possible, closer to the base of the pyramid. E2E testing serves to verify that the overall workflow and key interactions function correctly, providing a final layer of assurance.
In our last example, if the user database is connected to an authentication service, an E2E test would consist of creating a new user, selecting their username and password, and then testing authentication with that new user, all through the graphical interface.
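As a sketch only (the URLs, selectors and page text below are hypothetical, and Playwright is just one possible browser-automation tool), such an E2E test could look like this:
from playwright.sync_api import sync_playwright

def test_create_user_and_authenticate():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Create the user through the (hypothetical) registration form.
        page.goto("http://localhost:8000/register")
        page.fill("#login", "new_user")
        page.fill("#password", "new_password")
        page.click("text=Sign up")
        # Authenticate with the freshly created account.
        page.goto("http://localhost:8000/login")
        page.fill("#login", "new_user")
        page.fill("#password", "new_password")
        page.click("text=Log in")
        # The assumed landing page greets the authenticated user.
        assert page.inner_text("h1") == "Welcome, new_user"
        browser.close()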
Conclusion
To summarize, a balanced testing strategy is essential for any production project. By implementing a system of unit tests, integration tests, functional tests and E2E tests, you can ensure that your application meets its specifications. By following best practices and using the right testing tools, you can write more reliable, maintainable and efficient code and deliver high-quality software to your users. Finally, it also simplifies future development and ensures that new features don’t break existing code.
References
1 – pytest documentation: https://docs.pytest.org/en/stable/
2 – Two interesting articles on Real Python: https://realpython.com/python-testing/ and https://realpython.com/pytest-python-testing/