a company that does scientific software.
About ESSS... we build customized software for scientific simulation. We develop the full solution, from the numerical solver, which most of the time involves some kind of CFD, to the user interface and 3D visualization.
We started to use Python in 2003, and at the time very few people were using Python to build heavyweight desktop applications, so I think we were pretty innovative in that regard.
We have a very strong testing culture. The tool I'll be showing in this talk started as an internal testing library that we open sourced some time ago. Our main product, Rocky, has more than 13,000 tests. Rocky is a particle interaction simulator built in Python and C++. It can simulate the separation of mining raw material or the coating of pharmaceutical pills; the animation on the right shows potato chips in a mechanical separator...
In 2018 we migrated the code base from Python 2 to 3, and thanks to this extensive test coverage, no defect related to the migration made it into the production version undetected.
It took us about 10 months to finish the 2-to-3 migration, not only for Rocky but for all other active projects within the company. In the end we were so happy that we made a special shirt to celebrate.
Before explaining the concept, I want to give you a quick glimpse of the tool I'll be showing here, so you'll have a taste of what we'll be talking about.
Let's say I have a class that stores car specifications. Here I have a dataclass, but it could be a standard Python class.
And I have a method that receives a car name and returns a CarSpec object.
The standard unit test for this method would be like this. I call the method and assert that each object attribute has the expected value.
This is OK for an object of this size, but what if my object has dozens and dozens of attributes? You can also see that here I forgot to check one of the attributes (the displacement); this happens all the time in the real world, since this process is very manual.
So what I'll show you is how to replace all these asserts ...
...with this single line. And it will make the test more complete, more maintainable and easier to debug in case of failure.
But before we dive into more examples, I would like to properly define "Data Regression Testing."
At first I thought about using just "Regression Testing" for this talk.
Most of the definitions of "Regression Testing" that I found were similar to this one: "regression testing is done to ensure that a change in the code has not introduced any new defect".
Well... but this definition seems to cover 99% of the tests I write. Most tests, be they end-to-end, integration, or unit tests, exist to ensure that our implementation will not be broken by future changes. We may have other kinds of tests, like tests to prevent performance degradation, or some smoke test that covers parts of the code to do type checking, but the great majority of my test suite has the objective of preventing software regression.
So, in search of a more specific definition, I ended up with the term "data regression testing".
I couldn't find a definition for "data regression testing" in any of the "classical literature" of Software Quality and Software Testing books.
So I came up with this definition: "data regression testing is used to prevent software regression by comparing the output data of the code that I changed with the data generated by a previous version of this code"
As I said, I couldn't find it in any of the "classical literature" (if anyone knows of a book that mentions it, please send me a tweet), but I found some blog posts indicating the idea is already used by many teams. Some of these posts use "database regression testing" when it's about databases, and the matplotlib team defines something called "image comparison testing", where the test suite, when run, generates images from the current code version and compares them to known good results. If any difference between the images is detected, the test suite fails.
So, I'm not introducing anything new here.
The implementation of data regression testing I'll be showing here is a pytest plugin.
You'll find out that the concept is far from complex, so you can implement a version of it for other testing frameworks.
Having said that... it's safe to say that pytest is currently the de facto testing framework for Python.
If you are still using another framework to run Python tests, I strongly recommend migrating to pytest, mainly due to the large collection of plugins available for it. When you go from a unit testing tutorial that tests very simple functions to testing real-world code bases, a lot of challenges appear (how do I test code that depends on a database? how can I test my web framework? oh no, my test suite is taking too long to run). There's probably a pytest plugin that will help you with these kinds of challenges. So one of the greatest assets of pytest is its collection of community-built plugins.
It'd be good, but not required, to know pytest and the concept of pytest fixtures to better understand the examples I'll show next.
pytest-regressions is available on PyPI and also on conda, via the conda-forge channel. Docs are available on Read the Docs.
So, after installing pytest-regressions, these 4 fixtures become available. I'll show each of them with examples.
Let's talk about num_regression...
There's this concept of "test first" from TDD, where you must write the test before any real code. It should be a failing test, then you write the code to make it pass.
But for that, you must know beforehand the result you are expecting from the code you are writing, and there are many situations where this is not possible, or very hard to do.
This happens all the time in numerical simulation, where it's not always possible to know beforehand the result of complex differential equations. It's also true for machine learning methods that use complex statistical algorithms, and many other situations.
We'll see that data regression becomes pretty handy for these kind of situations.
So, I'll be using one of these cases as an example. The Bézier curve algorithm, used in many drawing applications, defines a curve from two or more control points. The first and last control points are always the end points of the curve, and the intermediate control points define the curve's inclination.
In this example, I'm going to implement the quadratic Bézier to generate a 100-point curve from 3 control points.
If I followed the "test first" approach, this is as far as I could get: the only information I have is that the first and last control points are always the end points of the curve. But drawing a straight line would make this test pass. I can't know the intermediate points without solving the quadratic Bézier equation "by hand", and although the equation is not that complicated, I don't think any reasonable person would do that.
Of course, if I had another implementation of the quadratic Bézier that I could trust, I could use it to get the curve points. But for the sake of the example, let's say that the only references I have are the drawings I got from Wikipedia.
So this is my quadratic Bézier implementation, and I'll plot the results to see if I got it right.
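The implementation shown live might look roughly like this sketch, which evaluates the standard quadratic Bézier formula B(t) = (1-t)²P0 + 2(1-t)tP1 + t²P2 over 100 samples of t. It's a plain-Python sketch, not necessarily the code from the talk.

```python
def quadratic_bezier(p0, p1, p2, num_points=100):
    """Evaluate the quadratic Bezier curve defined by three control points.

    B(t) = (1-t)^2 * P0 + 2*(1-t)*t * P1 + t^2 * P2, for t in [0, 1].
    Control points are (x, y) tuples; returns a list of (x, y) tuples.
    """
    points = []
    for i in range(num_points):
        t = i / (num_points - 1)  # t runs from 0.0 to 1.0 inclusive
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
        points.append((x, y))
    return points
```

Note that at t=0 and t=1 the formula collapses to P0 and P2 exactly, which is the only property the "test first" test above could check.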
LIVE -
I can now improve my test by picking some intermediate points of the curve and adding asserts for them. It's far from elegant, but it works.
And this is a key point here: in these kinds of situations, tests do not drive development; they are mostly used to avoid regression. And here is where pytest-regressions can improve our tests.
So, as I showed in the first example, I will replace all the asserts with a single call, and the test will be more complete and easier to maintain.
So I declare the pytest fixture num_regression and use the check method, passing a dict of the arrays I want to test. Let's see what happens.
LIVE
-
Run the test
-
Show the generated directory, same name as the test module
-
Run test again to show it passing
-
Show the file in Excel
-
shows a regression
LIVE
- Change the array type
- Show the diff
- Add tolerance and show the test pass
Tolerance can also be set individually for each array using the tolerances parameter. The tolerance uses the numpy standard for comparing values, defined by the parameters atol (absolute tolerance) and rtol (relative tolerance). Most of the time you can go with atol, but when values have different magnitudes, relative tolerance should be used to make sure you are correctly testing very small values.
The next fixture I want to show is file_regression, which does data regression for generic text files.
In this example let's say I have a function that converts a piece of HTML code to Markdown. It receives an HTML string and returns a Markdown string.
To test this function we will use the file_regression fixture. Let's call check, passing the generated string, and use the extra parameter extension, so the generated file has the appropriate extension (the default one is .txt).
LIVE
If any regression happens, a nice diff is shown in the error message
LIVE
An HTML diff file is also generated. The link is printed in the error message
file_regression is also a good fit for testing web frameworks' template-based views, like we see in Flask or Django. If you are not familiar with web frameworks, a template-based view is basically a web route that responds with an HTML file rendered at run time. The rendering can be parametrized by pre-defined variables in the template file. So here we have the template hello.html and the route /hello that will render my template with the string "PyCon 2020".
Here is one approach: just check whether the response data contains the string I'm expecting. There's no shame in that, it works, but it's far from complete.
So I can use file_regression to make sure that there were no changes in the entire HTML file.
LIVE - As we showed earlier, any regression will fail with a nice diff in the error message.
I showed a very short HTML example so it would fit on the slide, but if you have already implemented something a little more complex than a web framework tutorial, you know that HTML files look more like this. There's a lot of metadata, JavaScript imports, style definitions. And if you do a good separation between the view and the contents, you probably don't want changes in the style to break your test.
So it's a good idea to use an HTML parser to do the regression only on the piece of HTML that matters. Here I'm using BeautifulSoup to select only the relevant element of the HTML file for regression. It also reduces the size of your regression file, making it easier to debug.
The third fixture is data_regression, used to do data regression testing for serializable objects and dict-like structures.
Let's get back to our first example, the CarSpec class
And here is our test. The data_regression check method receives a dictionary, so the object to be tested must be serializable to a dict.
The generated regression file is a YAML file containing all the dict attributes. It will also contain the attributes of nested objects.
As in the other fixtures, any regression will make the test fail, and the diff will be shown in the error message.
data_regression is a very good fit for testing web APIs. Let's say that this list of heroes is a collection that should be exposed through a REST API.
Here I'll be using Flask to expose our collection through HTTP methods. The first route exposes a single item of the collection, while the second one returns the entire collection of heroes. Both use JSON as the protocol.
I could use data_regression just like that to test both endpoints. The Flask response object has a get_json method, which returns the JSON content of the response.
In the first test, get_json returns a single dictionary representing the fourth item of the collection. The generated regression file would look like this.
In the second test, test_heroes_collection, get_json returns a list representing the collection. The generated regression file would contain the list of items in YAML format. As we can see, data_regression.check also supports sequences of basic Python types.
Last but not least, we have the image_regression fixture, which does data regression for images.
This is a sample code from Matplotlib which generates a 3D plot. Let's see how we can write a data regression test for it.
The image_regression check function receives the image contents as bytes. We use a BytesIO object to trick the function into writing the image contents in memory, and pass the buffer contents to the image_regression fixture. On the first run, the image file will be created by the fixture in the test folder. With the regression file created, a new run will compare the images, checking for any difference between pixels.
In the case of a regression, the test fails with a difference percentage, where 100 means you are testing an all-white image against an all-black one.
The comparison threshold can be customized through the diff_threshold parameter. A very common problem when using image regression testing is the same font being rendered with small variations on different operating systems. Increasing the diff threshold may help with that.
Ok, other quirks...
You can regenerate all the regression data of your test suite with the --force-regen option.
So, consider our web framework view example, where we used file_regression to compare HTML files. Let's say we change a single JavaScript library version that is used in all the pages; I can use --force-regen to update all the HTML regression files without hassle.
num_regression depends on Pandas to write CSV files, and image_regression depends on Pillow to do image comparison. pytest-regressions does not install these dependencies automatically, to avoid bloating your virtual env unnecessarily... you may use only the data_regression fixture in your project, so you probably don't want to install Pandas for that... So these dependencies must be installed manually.
I want to highlight two more libraries that are in some way related to pytest-regressions.
The first one is pytest-datadir, a pytest plugin that makes it easier to store test support files.
So, let's say you want to test a function that counts the lines of a file. Using pytest-datadir, I can make a directory with the same name as my test module and add any support file to it. These files become easily accessible through the datadir fixture, which is a Path object.
Here I'm passing support_file.txt to the count_lines function.
It's also worth noting that datadir copies the folder contents to a temporary directory, so changing the support file contents does not change the original file... you can also use datadir to write files generated by your tests.
Here we're testing a tab-to-space converter, writing the generated file to datadir and then checking that it doesn't have any tab characters.
pytest-regressions uses pytest-datadir behind the scenes, and they can be used together without problems, so instead of testing if spaced_file.txt has no tab characters, we can use file_regression to make the test far more effective in terms of preventing regression.
Serialchemy is a serialization library for SQLAlchemy models. SQLAlchemy is the most used Python ORM, and we originally created this library to build web APIs for database models that need to be serialized to JSON.
But it can be combined with data_regression to greatly improve the test suite of SQLAlchemy models.
Here we have two very simple models, User and Address. Address has a