
About Me

A company that does scientific software.

ESSS

About ESSS... we build customized software for scientific simulation. We develop the full solution, from the numerical solver, which most of the time involves some kind of CFD, to the user interface and 3D visualization.

We started to use Python in 2003, and at the time very few people were using Python to build heavyweight desktop applications, so I think we were pretty innovative in that regard.

Rocky

We have a very strong testing culture. The tool that I will be showing in this talk was an internal testing lib that we open sourced some time ago, and our main product, Rocky, has more than 13,000 tests. Rocky is a particle interaction simulator built in Python and C++. It can simulate separation of mining raw material, coating of pharmaceutical drug pills... this animation on the right shows potato chips in a mechanical separator...

In 2018 we migrated the code base from Python 2 to 3, and thanks to this extensive test coverage, no defect related to the migration went to the production version undetected.

Shirt

It took us about 10 months to finish the 2to3 migration, not only for Rocky but for all other active projects within the company. At the end we were so happy that we made a special shirt to celebrate.

Whetting Your Appetite

Before explaining the concept, I want to give you a quick glance at the tool I'll be showing here, so you'll get a taste of what we'll be talking about.

Class

Let's say I have a class that stores car specifications. Here I have a dataclass, but it could be a standard Python class.

And I have a method that receives a car name and returns a CarSpec object.
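
A minimal sketch of what such a class and method could look like (the names CarSpec and get_car_spec, the attributes, and the values are assumptions for illustration, not the actual slide code):

```python
# Hypothetical sketch of the CarSpec example; names and values are
# assumptions, not the actual code from the slides.
from dataclasses import dataclass


@dataclass
class CarSpec:
    name: str
    cylinders: int
    horsepower: float
    displacement: float


# A lookup that could back the method described above.
_SPECS = {
    "fusca": CarSpec(name="fusca", cylinders=4, horsepower=46.0, displacement=1.6),
}


def get_car_spec(name: str) -> CarSpec:
    """Return the specification for a given car name."""
    return _SPECS[name]
```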

test

The standard unit test for this method would be like this. I call the method and assert that each object attribute has the expected value.

This is OK for an object of this size, but what if my object has dozens and dozens of attributes? You can also see that here I forgot to check one of the attributes (the displacement); this happens all the time in the real world, since this process is very manual.

So what I'll show you is how to replace all these asserts ...

after...

...with this single line. And it will make the test more complete, more maintainable, and easier to debug in case of failure.

But before we dive into more examples, I would like to properly define "Data Regression Testing."

Regression Testing

At first I thought of using just "Regression Testing" for this talk.

Most of the definitions of "Regression Testing" that I found were similar to this one: "regression testing is done to ensure that a change in the code has not introduced any new defect."

Well... but this definition seems to cover 99% of the tests I write. Most tests, be they end-to-end tests, integration tests, or unit tests, exist to ensure that our implementation will not be broken by future changes. We may have other kinds of tests, like tests to prevent performance degradation, or some smoke test that covers parts of the code to do type checking, but the great majority of my test suite has the objective of preventing software regression.

So, in search of a more specific definition, I ended up with the term "data regression testing".

Data regression testing

I couldn't find a definition for "data regression testing" in any of the "classical literature" of Software Quality and Software Testing books.

So I came up with this definition: "data regression testing is used to prevent software regression by comparing the output data of the code that I changed with the data generated by a previous version of this code"

As I said, I couldn't find it in any of the "classical literature" (if anyone knows of a book that mentions it, please send me a tweet), but I found some blog posts that indicate the idea is already being used by many teams. Some of these posts use "database regression testing" when it's about databases, and the matplotlib team defines something called "Image Comparison testing", where their test suite, when run, generates images from the current version of the code and compares them to known good results. If any difference between images is detected, the test suite fails.

So, I'm not introducing anything new here.

pytest-regressions

The implementation of data regression testing I'll be showing here is a pytest plugin.

You'll find that the concept is far from complex, so you can implement a version of it for other testing frameworks.

pytest graph

But having said that... it's safe to say that pytest is currently the de facto testing framework for Python.

If you are still using another framework to run Python tests right now, I strongly recommend you migrate to pytest, mainly due to the large collection of plugins available for it. When you go from a unit test tutorial that tests very simple functions to testing real-world code bases, a lot of challenges appear (like how to test code that depends on a database, how to test my web framework, and "oh, my test suite is taking too long to run"). Probably there's a pytest plugin that will help you with these kinds of challenges. So, one of the greatest assets of pytest is its collection of community-built plugins.

It'd be good, but not required, to know pytest and the concept of pytest fixtures to better understand the examples I'll show next.

Installation

pytest-regressions is available on PyPI and also on conda, using the conda-forge channel. Docs are available on Read the Docs.

pytest-regressions fixtures

So after installing pytest-regressions, these 4 fixtures will be available for use. I'll be showing each of them with examples.

num_regression

Let's talk about num_regression...

When "test first" doesn't fit

There's this concept of "test first" from TDD, where you must write the test before any real code. It should be a failing test; then you write the code to make it pass.

But for that, you must know beforehand the result you are expecting from the code you are writing, and there are many situations where this is not possible, or is very hard to do.

This happens all the time in numerical simulation, where it's not always possible to know beforehand the result of complex differential equations. It's also true for machine learning methods that use complex statistical algorithms, and many other situations.

We'll see that data regression becomes pretty handy in these kinds of situations.

Bezier

So, I'll be using one of these cases as an example. The Bezier curve algorithm, used in many drawing programs, defines a curve from two or more control points. The first and last control points are always the end points of the curve, and intermediate control points define the curve's inclination.

In this example, I'm gonna implement the quadratic Bezier to generate a 100-point curve from 3 control points.

test

If I followed the "test first" approach, this is how far I could get: the only information that I have is that the first and last control points are always the end points of the curve. But drawing a straight line would make this test pass. I can't know the intermediary points without solving the quadratic Bezier equation "by hand", and although the equation is not that complicated, I don't think any reasonable person would do that.

Of course, if I had another implementation of quadratic Bezier that I could trust, I could use it to get the curve points. But for the sake of the example, let's say that the only reference I have are the drawings I got from Wikipedia.

Bezier Implementation

So this is my quadratic Bezier implementation, and I'll plot the results to see if I got it right.
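
The implementation could be sketched like this (function and parameter names are my own; the formula is the standard quadratic Bezier, B(t) = (1-t)²P0 + 2(1-t)tP1 + t²P2):

```python
import numpy as np


def quadratic_bezier(p0, p1, p2, num_points=100):
    """Sample a quadratic Bezier curve defined by three control points.

    B(t) = (1-t)^2 * P0 + 2*(1-t)*t * P1 + t^2 * P2, for t in [0, 1].
    Returns an array of shape (num_points, 2).
    """
    t = np.linspace(0.0, 1.0, num_points)[:, None]
    return (
        (1 - t) ** 2 * np.asarray(p0)
        + 2 * (1 - t) * t * np.asarray(p1)
        + t**2 * np.asarray(p2)
    )
```

By construction the curve starts at P0 and ends at P2, which is exactly the property the "test first" test could check.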

LIVE -

Naive approach

I can now improve my test by picking some intermediary points of the curve and adding asserts for them. It's far from elegant, but it works.

And this is a key point here: in these kinds of situations, tests do not drive development; they are mostly used to avoid regression. And here is where pytest-regressions can improve our tests.

pytest-regressions approach

So, as I showed in the first example, I will replace all the asserts with a single call, and the test will be more complete and easier to maintain.

num-regression test

So I declare the pytest fixture num_regression and use the check method, passing a dict of the arrays I want to test. Let's see what happens.

LIVE

  • Run the test

  • Show the generated directory, same name as the test module

  • Run test again to show it passing

  • Show the file in Excel

  • Show a regression

defining tolerance

LIVE

  • Change the array type
  • Show the diff
  • Add tolerance and show the test pass

defining tolerance

Tolerance can also be set individually for each array using the tolerances parameter. The tolerance follows the numpy standard for comparing values, with the parameters "atol" (absolute tolerance) and "rtol" (relative tolerance). Most of the time you can go with "atol", but when values have different magnitudes, relative tolerance should be used to make sure you are correctly testing very small values.

file_regression

The next fixture I want to show is file_regression, which does data regression for generic text files.

markdownify

In this example let's say I have a function that converts a piece of HTML code to Markdown. It receives an HTML string and returns a Markdown string.

test_markdownify

To test this function we will use the file_regression fixture. Let's call check passing the generated string, and use the extra parameter extension, so the generated file has the appropriate extension (the default one is .txt).

LIVE

If any regression happens, a nice diff is shown in the error message

LIVE

An HTML diff file is also generated. The link is printed in the error message

Flask Views

file_regression is also a good fit for testing web frameworks' template-based views, like we see in Flask or Django. If you are not familiar with web frameworks, a template-based view is basically a web route that responds with an HTML file rendered at run time. The rendering can be parametrized by pre-defined variables in the template file. So here we have the template hello.html and the route /hello that will render my template with the string "PyCon 2020".

Naive Approach

Here is one approach: just check if the response data contains the string I'm expecting. There is no shame in that, it works, but it's far from complete.

Regression Approach

So I can use file_regression to make sure that there were no changes to the entire HTML file.

LIVE - As we showed earlier, any regression will fail with a nice diff in the error message.

Real HTML file

I showed a very short HTML example so it would fit on the slide, but if you have ever implemented something a little more complex than a web framework tutorial, you know that HTML files look more like this. There's a lot of metadata, javascript imports, style definitions. And if you do a good separation between the view and the contents, you probably don't want changes in the style to break your test.

So it's a good idea to use some HTML parser to do the regression only on the piece of HTML that matters. Here I'm using BeautifulSoup to select only the relevant element of the HTML file for regression. It'll also reduce the size of your regression file, making it easier to debug.

data_regression

The third fixture is data_regression, used to do data regression testing for serializable objects and dict-like structures.

Car class

Let's get back to our first example, the CarSpec class

test

And here is our test. The data_regression check method receives a dictionary, so the object to be tested must be serializable to a dict.

regression file

The generated regression file is a YAML file containing all the dict attributes. It will also contain nested objects' attributes.

data_regression diff

As with the other fixtures, any regression will make the test fail, and the diff will be shown in the error message.

Rest APIs

data_regression is a very good fit for testing web APIs. Let's say that this list of Heroes is a collection that should be exposed through a REST API.

Rest APIs - routes

Here I'll be using Flask to expose our collection through HTTP methods. The first route exposes a single item of the collection, while the second one returns the entire collection of heroes. Both use JSON as the protocol.

Rest APIs - tests

I can use data_regression just like that to test both endpoints. The Flask response object has a get_json method, which returns the JSON content of the response.

Rest APIs - test_heroes_item

In the first test, get_json returns a single dictionary representing the fourth item of the collection. The generated regression file looks like this.

Rest APIs - test_heroes_collection

In the second test, test_heroes_collection, get_json returns a list representing the collection. The generated regression file contains the list of items in YAML format. As we can see, data_regression.check also supports sequences of basic Python types.

image_regression

Last but not least, we have the image_regression fixture, which does data regression for images.

example

This is a sample code from Matplotlib which generates a 3D plot. Let's see how we can write a data regression test for it.

test

The image_regression check function receives the image contents as bytes. We use a BytesIO object to trick the plotting function into writing the image contents in memory, and pass the buffer contents to the image_regression fixture. On the first run, the image file is created by the fixture in the test folder. With the regression file created, a new run will compare the images, checking for any difference between pixels.

regression

In the case of a regression, the test fails with a difference percentage, where 100 means you are testing an all-white image against an all-black image.

threshold

The comparison threshold can be customized through the diff_threshold parameter. A very common problem when using image regression testing is the same font being rendered with small variations on different operating systems. Increasing the diff threshold may help with that.

Ok, other quirks...

You can regenerate all the regression data of your test suite with the --force-regen option. So, consider our web framework view example, where we use file_regression to compare HTML files. Let's say we change the version of a javascript library that is used on all the pages; I can use --force-regen to update all the HTML regression files without hassle.

num_regression depends on Pandas to write CSV files, and image_regression depends on Pillow to do the image comparison. pytest-regressions does not install these dependencies automatically, to avoid bloating your virtual env unnecessarily... you may only use the data_regression fixture in your project, so you probably don't want to install Pandas for that...

So these dependencies must be installed manually.

Bonus round

I want to highlight two more libraries that, in some way, are related to pytest-regressions.

pytest-datadir

The first one is pytest-datadir, a pytest plugin that makes it easier to store test support files.

pytest-datadir example1

So, let's say you want to test a function that counts the lines of a file. Using pytest-datadir, I can make a directory with the same name as my test module and add any support files to it. These files become easily accessible through the datadir fixture, which is a Path object. Here I'm passing support_file.txt to the count_lines function.

pytest-datadir example2

It's also worth noting that datadir copies the folder contents to a temporary directory, so changing the support file contents will not change the original file... you can also use datadir to write files generated by your tests.

Here we're testing a tab2space converter, writing the generated file to datadir and then checking that it doesn't have any tab characters.

pytest-datadir + pytest-regressions

pytest-regressions uses pytest-datadir behind the scenes, and they can be used together without problems. So, instead of testing whether spaced_file.txt has no tab characters, we can use file_regression to make the test much more effective at preventing regressions.

serialchemy

Serialchemy is a serialization library for SQLAlchemy models. SQLAlchemy is the most used Python ORM, and we originally created this library to build Web APIs for database models that need to be serialized to JSON.

But it can be combined with data_regression to greatly improve the test suite of SQLAlchemy models.

Here we have two very simple models, User and Address. Address has a