[feat] Add support for marking tests as expected failures #3481

Open · wants to merge 3 commits into base: develop
3 changes: 3 additions & 0 deletions docs/regression_test_api.rst
@@ -27,6 +27,8 @@ Test Decorators

.. autodecorator:: reframe.core.decorators.simple_test

.. autodecorator:: reframe.core.decorators.xfail


.. _builtins:

@@ -86,6 +88,7 @@ The use of this module is required only when creating new tests programmatically

.. autofunction:: reframe.core.builtins.variable

.. autofunction:: reframe.core.builtins.xfail

.. _pipeline-hooks:

142 changes: 142 additions & 0 deletions docs/tutorial.rst
@@ -1423,6 +1423,148 @@ Note that we are using another configuration file, which defines an MPI-enabled
You should adapt them if running on an actual parallel cluster.


Marking expected failures
=========================

.. versionadded:: 4.9

In ReFrame, you can mark a test failure as expected and attach a message explaining it.
There are two types of failures, sanity and performance failures, and you can mark either type as expected.

Expected sanity failures
------------------------

To mark an expected sanity failure, you need to decorate the test with the :func:`@xfail <reframe.core.decorators.xfail>` decorator as follows:

.. literalinclude:: ../examples/tutorial/stream/stream_runonly_xfail.py
   :caption:
   :lines: 5-

The :func:`@xfail <reframe.core.decorators.xfail>` decorator takes a message that is issued when the test fails, explaining the failure, and, optionally, a predicate that controls when the test is expected to fail (discussed later in this section).
In the example above, we have introduced a typo in the sanity checking function to make the test fail, but we have marked the failure as expected.
Here is the output:

.. code-block:: bash
   :caption: Run in the single-node container

   reframe -C tutorial/config/baseline.py -c tutorial/stream/stream_runonly_xfail.py -r

.. code-block:: console


   [==========] Running 1 check(s)
   [==========] Started on Thu May 15 09:10:06 2025+0000

   [----------] start processing checks
   [ RUN ] stream_test /2e15a047 @tutorialsys:default+baseline
   [ XFAIL ] stream_test /2e15a047 @tutorialsys:default+baseline [demo failure]
   [----------] all spawned checks have finished

   [ PASSED ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 1 expected failure(s), 0 skipped, 0 aborted)
   [==========] Finished on Thu May 15 09:10:09 2025+0000

Notice that the test is marked as ``XFAIL`` and no ``FAILURE INFO`` is generated.

As mentioned previously, you can control whether a sanity failure is considered expected by passing the ``predicate`` argument to :func:`@xfail <reframe.core.decorators.xfail>`.
In the following example, a failure is expected only if ``x<=2``:

.. literalinclude:: ../examples/tutorial/stream/stream_runonly_xfail_cond.py
   :caption:
   :lines: 5-

If we run with ``x=3``, the test will fail normally:

.. code-block:: bash
   :caption: Run in the single-node container

   reframe -C tutorial/config/baseline.py -c tutorial/stream/stream_runonly_xfail_cond.py -r -S x=3

.. code-block:: console

   [----------] start processing checks
   [ RUN ] stream_test /2e15a047 @tutorialsys:default+baseline
   [ FAIL ] (1/1) stream_test /2e15a047 @tutorialsys:default+baseline
   ==> test failed during 'sanity': test staged in '/home/user/reframe-examples/stage/tutorialsys/default/baseline/stream_test'
   [----------] all spawned checks have finished


If a test passes unexpectedly, its status will be set to ``XPASS`` and it will be counted as a failure.
In the previous example, the sanity function uses the correct pattern when ``x=2``, so the test passes instead of failing as expected.
Here is the outcome:

.. code-block:: bash
   :caption: Run in the single-node container

   reframe -C tutorial/config/baseline.py -c tutorial/stream/stream_runonly_xfail_cond.py -r -S x=2

.. code-block:: console

   [----------] start processing checks
   [ RUN ] stream_test /2e15a047 @tutorialsys:default+baseline
   [ XPASS ] stream_test /2e15a047 @tutorialsys:default+baseline
   [----------] all spawned checks have finished

   [ FAILED ] Ran 1/1 test case(s) from 1 check(s) (1 failure(s), 0 expected failure(s), 0 skipped, 0 aborted)

Note that the test is counted as a failure and the overall run result is ``FAIL``.
The reason in the test's ``FAILURE INFO`` mentions that the test passed unexpectedly and also states the original reason why it was expected to fail:

.. code-block:: console

   * Reason: unexpected success error: demo failure


Expected performance failures
-----------------------------

A test fails a performance check when it cannot meet its reference performance within certain user-defined bounds.
As discussed :ref:`earlier <writing-your-first-test>`, a test may define multiple performance variables, each one associated with a different reference.
To mark an expected performance failure, you need to wrap the :attr:`~reframe.core.pipeline.RegressionTest.reference` tuple with the :func:`~reframe.core.builtins.xfail` builtin.
Here is an example:

.. code-block:: python

   reference = {
       'tutorialsys': {
           'copy_bw': xfail('demo fail', (100_000, -0.1, 0.1, 'MB/s')),
           'triad_bw': (30_000, -0.1, 0.1, 'MB/s')
       }
   }

For demonstration purposes, we have significantly increased the ``copy_bw`` reference so as to make the test fail, and we mark this failure as expected.
Note that the :func:`~reframe.core.builtins.xfail` builtin is *different* from the :func:`@xfail <reframe.core.decorators.xfail>` decorator: although the first argument of both is the message to be printed, the builtin takes the reference tuple as its second argument, whereas the decorator optionally takes a predicate.
A sketch combining both forms follows the note below.

.. note::

   For testing out this example, you need to set the :attr:`reference` in the ``stream_run_only.py`` test based on your system's performance.
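
To make the distinction concrete, here is a minimal, hypothetical sketch that uses both forms in one test; the class name and reference values are placeholders and the test body is elided.
Inside the class body, ``xfail`` refers to the builtin, which ReFrame makes available in the test class namespace, while ``rfm.xfail`` is the decorator:

.. code-block:: python

   import reframe as rfm


   @rfm.simple_test
   @rfm.xfail('known sanity bug')  # decorator: marks an expected *sanity* failure
   class xfail_demo(rfm.RunOnlyRegressionTest):
       valid_systems = ['*']
       valid_prog_environs = ['*']
       executable = 'stream.x'
       reference = {
           'tutorialsys': {
               # builtin: marks an expected *performance* failure for copy_bw only
               'copy_bw': xfail('known perf bug', (100_000, -0.1, 0.1, 'MB/s')),
               'triad_bw': (30_000, -0.1, 0.1, 'MB/s')
           }
       }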

Since a test might define multiple performance variables, some of which may be marked as expected failures, the final state of the test is determined in a more involved way.
Assuming that the notation ``A>B`` means that ``A`` takes precedence over ``B``, the types of performance failures follow this hierarchy:

.. code-block:: console

   FAIL > XPASS > XFAIL > PASS

In other words, if at least one performance variable fails, the test is a ``FAIL``.
If no performance variable fails, but at least one passes unexpectedly, the test is an ``XPASS``.
If no performance variable fails or passes unexpectedly, and all the expected failures do fail, the test is an ``XFAIL``.
In all other cases, the test is a ``PASS``.
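
A minimal sketch of this precedence logic (illustrative only; not ReFrame's actual implementation):

.. code-block:: python

   # Illustrative sketch of the FAIL > XPASS > XFAIL > PASS precedence;
   # ReFrame's real implementation may differ.
   _PRECEDENCE = {'FAIL': 3, 'XPASS': 2, 'XFAIL': 1, 'PASS': 0}

   def overall_status(variable_statuses):
       '''Reduce per-variable statuses to the test's final state.'''
       return max(variable_statuses, key=_PRECEDENCE.__getitem__,
                  default='PASS')

   # One unexpected pass and one expected failure -> XPASS overall
   assert overall_status(['XFAIL', 'XPASS', 'PASS']) == 'XPASS'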

In case of failures or unexpected passes, the status of every performance variable will be printed in the ``FAILURE INFO``.
Also, if colors are enabled (see :option:`--nocolor`), each variable's status will be color-coded in the ``P:`` lines printed just after the test finishes.
Here is an example of a combined failure (``XPASS`` and ``FAIL``):


.. code-block:: console

   * Reason: performance error:
        reference(s) met unexpectedly: copy_bw=40299.1 MB/s, expected 40000 (l=36000.0, u=44000.0)
        failed to meet reference(s): triad_bw=30561.6 MB/s, expected 100000 (l=90000.0, u=110000.00000000001)

You can try different combinations of :func:`~reframe.core.builtins.xfail` markings and reference values to explore the behavior.
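
For instance, marking both references as expected failures against inflated values that the test cannot meet should yield an overall ``XFAIL``; a hypothetical variation:

.. code-block:: python

   reference = {
       'tutorialsys': {
           # both variables are now expected to fail
           'copy_bw': xfail('demo fail', (1_000_000, -0.1, 0.1, 'MB/s')),
           'triad_bw': xfail('demo fail', (1_000_000, -0.1, 0.1, 'MB/s'))
       }
   }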


Managing the run session
========================

26 changes: 26 additions & 0 deletions examples/tutorial/stream/stream_runonly_xfail.py
@@ -0,0 +1,26 @@
# Copyright 2016-2025 Swiss National Supercomputing Centre (CSCS/ETH Zurich)
# ReFrame Project Developers. See the top-level LICENSE file for details.
#
# SPDX-License-Identifier: BSD-3-Clause
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
@rfm.xfail('demo failure')
class stream_test(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'stream.x'

    @sanity_function
    def validate(self):
        # 'Slution' is an intentional typo that makes the sanity check fail,
        # triggering the expected failure declared by @rfm.xfail above.
        return sn.assert_found(r'Slution Validates', self.stdout)

    @performance_function('MB/s')
    def copy_bw(self):
        return sn.extractsingle(r'Copy:\s+(\S+)', self.stdout, 1, float)

    @performance_function('MB/s')
    def triad_bw(self):
        return sn.extractsingle(r'Triad:\s+(\S+)', self.stdout, 1, float)
30 changes: 30 additions & 0 deletions examples/tutorial/stream/stream_runonly_xfail_cond.py
@@ -0,0 +1,30 @@
# Copyright 2016-2025 Swiss National Supercomputing Centre (CSCS/ETH Zurich)
# ReFrame Project Developers. See the top-level LICENSE file for details.
#
# SPDX-License-Identifier: BSD-3-Clause
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
@rfm.xfail('demo failure', lambda test: test.x <= 2)
class stream_test(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'stream.x'
    x = variable(int, value=1)

    @sanity_function
    def validate(self):
        # For x != 2, the misspelled pattern ('Slution') makes the sanity
        # check fail; for x == 2, the correct pattern lets it pass.
        if self.x != 2:
            return sn.assert_found(r'Slution Validates', self.stdout)
        else:
            return sn.assert_found(r'Solution Validates', self.stdout)

    @performance_function('MB/s')
    def copy_bw(self):
        return sn.extractsingle(r'Copy:\s+(\S+)', self.stdout, 1, float)

    @performance_function('MB/s')
    def triad_bw(self):
        return sn.extractsingle(r'Triad:\s+(\S+)', self.stdout, 1, float)
19 changes: 18 additions & 1 deletion reframe/core/builtins.py
@@ -8,6 +8,8 @@
#

import functools
from collections import namedtuple

import reframe.core.parameters as parameters
import reframe.core.variables as variables
import reframe.core.fixtures as fixtures
@@ -19,7 +21,7 @@
 __all__ = ['deferrable', 'deprecate', 'final', 'fixture', 'loggable',
            'loggable_as', 'parameter', 'performance_function', 'required',
            'require_deps', 'run_before', 'run_after', 'sanity_function',
-           'variable']
+           'variable', 'xfail']

parameter = parameters.TestParam
variable = variables.TestVar
@@ -221,3 +223,18 @@ def _loggable(fn):

loggable = loggable_as(None)
loggable.__doc__ = '''Equivalent to :func:`loggable_as(None) <loggable_as>`.'''

_XFailReference = namedtuple('XFailReference', ['message', 'data'])


def xfail(message, reference):
    '''Mark a test :attr:`~reframe.core.pipeline.RegressionTest.reference` as
    an expected failure.

    :arg message: The message to issue when this expected failure is
        encountered.
    :arg reference: The original reference tuple.

    .. versionadded:: 4.9
    '''
    return _XFailReference(message=message, data=reference)
45 changes: 44 additions & 1 deletion reframe/core/decorators.py
@@ -7,7 +7,7 @@
# Decorators used for the definition of tests
#

-__all__ = ['simple_test']
+__all__ = ['simple_test', 'xfail']

import collections
import inspect
@@ -202,3 +202,46 @@
        _register_test(cls, variant_num=n)

    return cls


def xfail(message, predicate=None):
    '''Mark a test as an expected failure in sanity checking.

    If you want to mark an expected performance failure, take a look at the
    :func:`~reframe.core.builtins.xfail` builtin.

    If a marked test passes sanity checking, then it will be marked as a
    failure.

    :arg message: The message to be printed when this test fails as expected.
    :arg predicate: A callable taking the test instance as its sole argument
        and returning :obj:`True` or :obj:`False`. If it returns :obj:`True`,
        then the test is marked as an expected failure, otherwise not. For
        example, the following test will be marked as an expected failure
        only if the variable ``x`` equals 1:

        .. code-block:: python

            @rfm.xfail('bug 123', lambda test: test.x == 1)
            @rfm.simple_test
            class MyTest(...):
                x = variable(int, value=0)

        If ``predicate=None``, then the test is marked as an expected
        failure unconditionally. It is equivalent to
        ``predicate=lambda _: True``.

    .. versionadded:: 4.9
    '''
    def _default_predicate(_):
        return True

    predicate = predicate or _default_predicate

    def _xfail_fn(obj):
        return predicate(obj), message

    def _deco(cls):
        cls.__rfm_xfail_sanity__ = _xfail_fn
        return cls

    return _deco
8 changes: 8 additions & 0 deletions reframe/core/exceptions.py
@@ -279,6 +279,14 @@ class SkipTestError(ReframeError):
    '''Raised when a test needs to be skipped.'''


class ExpectedFailureError(ReframeError):
    '''Raised when a test failure is expected.'''


class UnexpectedSuccessError(ReframeError):
    '''Raised when a test unexpectedly passes.'''


def user_frame(exc_type, exc_value, tb):
    '''Return a user frame from the exception's traceback.
