Tests are very slow and a lot of time is spent in the wait4() system call #733

brgl · 2023-06-02T08:22:38Z

Describe the bug
The libgpiod project uses bats for its command-line tools tests. The tests use linux kernel's gpio-sim module for mocking up GPIO chips and in general cover the same functionality in shell tools as the GLib tests covering the C library.

While it's expected for the C tests to run much faster than bash ones, the bats test-suite is painfully slow - it takes 26 seconds on a regular work laptop to complete vs 0,28 seconds for the C tests on the same machine.

Upon investigation it turned out that the majority of the time in all sub-processes is spent waiting for child processes to exit as evidenced by the following output of strace -c -f:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 91.48   23.231926        1598     14537      6173 wait4
  2.30    0.584469           0    614218           rt_sigprocmask
  1.46    0.370631           2    158077         8 read
  1.21    0.308347           3     97831     80115 readlink
  0.64    0.162784          20      8059           clone
  0.51    0.129295           1     75338      7495 newfstatat
  0.41    0.104257          20      5081       498 execve
  0.24    0.061996           1     42027           mmap
  0.24    0.059774           1     51657     37681 ioctl
  0.20    0.050864           0    139673           rt_sigaction
  0.18    0.046581           0     57523      3246 close
  0.18    0.045685           3     13972           write
  0.10    0.025445           1     15038           mprotect
  0.10    0.025105           1     19771      6448 lseek
  0.10    0.025086           1     16949         9 openat
  0.09    0.022840           0     25950       657 fcntl
  0.06    0.016108           0     16903           brk
  0.04    0.010011           0     14011           dup2
  0.04    0.009773           2      4685           munmap
[...]

This doesn't seem normal and I'd like to request some help in figuring out what's wrong. I'm posting it as a bats issue but it may very well be an issue in how I'm using the framework.

To Reproduce
Steps to reproduce the behavior:

Clone current master of the libgpiod repo
run ./autogen.sh --enable-tests --enable-tools --enable-gpioset-interactive && make
run sudo ./tools/gpio-tools-test and observe slow execution.

Please note that your kernel needs to be built with the gpio-sim module enabled for the tests to work.

Alternatively one can use yocto to build a qemu vm allowing to run these tests without recompiling the kernel - if anyone is interested, I can prepare a repo that would automate the build.

Expected behavior
Tests execute much faster and don't spend ~92% of time in wait4().

Environment (please complete the following information):
Tested on two separate environments with same results - one was a regular Ubuntu 23.04 system, the other was a custom Yocto based distro.

Bats version: 1.8.2 and 1.9.0 respectively.
operating system (including version): Ubuntu 20.04 and Yocto based on current poky master branch.
bash --version: 5.2.15
Install method: distribution package and bitbake recipe respectively

The text was updated successfully, but these errors were encountered:

martin-schulze-vireso · 2023-06-02T16:06:20Z

I'd be interested in the yocto build for qemu to reproduce this.

brgl · 2023-06-05T14:44:23Z

@martin-schulze-vireso Here you go

Let me know if anything is missing from the README

brgl · 2023-06-15T09:49:53Z

I think this may just be the way bats works - running the bats' own test-suite results in a very similar strace output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 94,78   87,937096        2697     32603     14298 wait4
  1,20    1,117514           0   1695481           rt_sigprocmask
  1,09    1,008525           0   1045150        27 read
  0,66    0,608701          36     16741           clone
  0,48    0,441647          11     37725     23464 execve
  0,21    0,199233           0    274617    107718 newfstatat
  0,20    0,182347           1     98627      4962 openat
  0,18    0,165625           0    169950           mmap
  0,15    0,134789           0    184782    162361 ioctl
  0,14    0,129233           3     40735           write
  0,12    0,113985           0    198739      4196 close
  0,10    0,095850           0    139639      3159 fcntl
  0,09    0,086048           0    312173           rt_sigaction
  0,08    0,070505           0     95010     34204 lseek
  0,06    0,053366           0     56400           dup2
  0,06    0,052509           1     51470           mprotect
  0,05    0,045429           7      5704      3457 mkdir

martin-schulze-vireso · 2023-06-15T10:30:48Z

That may well be but maybe we can improve Bats runtime...

brgl added Priority: NeedsTriage Issue has not been vetted yet Type: Bug labels Jun 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tests are very slow and a lot of time is spent in the wait4() system call #733

Tests are very slow and a lot of time is spent in the wait4() system call #733

brgl commented Jun 2, 2023

martin-schulze-vireso commented Jun 2, 2023

brgl commented Jun 5, 2023

brgl commented Jun 15, 2023

martin-schulze-vireso commented Jun 15, 2023

Tests are very slow and a lot of time is spent in the wait4() system call #733

Tests are very slow and a lot of time is spent in the wait4() system call #733

Comments

brgl commented Jun 2, 2023

martin-schulze-vireso commented Jun 2, 2023

brgl commented Jun 5, 2023

brgl commented Jun 15, 2023

martin-schulze-vireso commented Jun 15, 2023