Open
Labels
enhancement, help wanted
Description
teip's performance degrades sharply when the input is split into a large number of small chunks.
$ yes | tr -d \\n | fold -w 1024 | TEIP_HIGHLIGHT="<{}>" teip -og . | head -n 1
<y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y><y>...
$ yes | tr -d \\n | fold -w 1024 | TEIP_HIGHLIGHT="<{}>" teip -og . | pv >/dev/null
.0MiB 0:00:05 [3.59MiB/s] [ <=> ]
$ yes | tr -d \\n | fold -w 1024 | TEIP_HIGHLIGHT="<{}>" teip -og '.{64}' | pv >/dev/null
30MiB 0:00:04 [31.7MiB/s] [ <=> ]
I believe this performance degradation is caused by the large amount of inter-thread communication.
Indeed, futex accounts for most of the time spent in system calls (92.52% in the strace summary below).
$ yes | tr -d \\n | fold -w 1024 | head -n 1024 | TEIP_HIGHLIGHT="<{}>" strace -cf teip -og .
︙
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 92.52    0.582339           7     85927     34598 futex
  5.85    0.036801          36      1026           write
  0.67    0.004226           9       485           sched_yield
  0.49    0.003090          24       130           read
  0.35    0.002197           9       248           brk
  0.06    0.000409         102         4           mmap
  0.01    0.000083          10         8           rt_sigprocmask
  0.01    0.000055          11         5           rt_sigaction
  0.01    0.000047           8         6           sigaltstack
  0.01    0.000039          13         3           munmap
  0.00    0.000028           9         3           mprotect
  0.00    0.000020          20         1           clone
  0.00    0.000016          16         1           poll
  0.00    0.000014          14         1           getrandom
  0.00    0.000013          13         1           ioctl
  0.00    0.000013          13         1           arch_prctl
  0.00    0.000010          10         1           set_tid_address
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.629400                 87852     34598 total
To mitigate the issue, it would be a good idea to reduce the number of futex calls.
Currently, PipeIntercepter sends a single chunk per tx.send call.
https://github.com/greymd/teip/blob/v2.2.0/src/pipeintercepter.rs#L23-L53
https://github.com/greymd/teip/blob/v2.2.0/src/pipeintercepter.rs#L235
This part can be improved.
If PipeIntercepter buffered chunks and sent them to the other thread in batches,
the number of futex calls would drop and performance should improve substantially in situations like the one above.
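The buffering idea could look roughly like the sketch below. It is not teip's actual code: `Chunk`, `BatchSender`, `run`, and the batch size of 64 are hypothetical names and values chosen for illustration, and it assumes the channel is a `std::sync::mpsc` channel whose message type can be changed to a `Vec` of chunks. The point is only that the channel is touched once per batch instead of once per chunk.

```rust
use std::sync::mpsc::{self, SendError, Sender};
use std::thread;

// Hypothetical stand-in for teip's chunk type.
type Chunk = String;

/// Collects chunks and forwards them in batches, so the channel
/// is used once per batch rather than once per chunk.
struct BatchSender {
    tx: Sender<Vec<Chunk>>,
    buf: Vec<Chunk>,
    cap: usize,
}

impl BatchSender {
    fn new(tx: Sender<Vec<Chunk>>, cap: usize) -> Self {
        BatchSender { tx, buf: Vec::with_capacity(cap), cap }
    }

    /// Buffers one chunk; sends the whole buffer once it holds `cap` chunks.
    fn send(&mut self, chunk: Chunk) -> Result<(), SendError<Vec<Chunk>>> {
        self.buf.push(chunk);
        if self.buf.len() >= self.cap {
            self.flush()?;
        }
        Ok(())
    }

    /// Sends whatever is buffered. Must be called at end of input,
    /// otherwise the last partial batch is lost.
    fn flush(&mut self) -> Result<(), SendError<Vec<Chunk>>> {
        if self.buf.is_empty() {
            return Ok(());
        }
        let batch = std::mem::replace(&mut self.buf, Vec::with_capacity(self.cap));
        self.tx.send(batch)
    }
}

/// Sends `n` one-character chunks in batches of `cap` and returns
/// (channel messages received, chunks received).
fn run(n: usize, cap: usize) -> (usize, usize) {
    let (tx, rx) = mpsc::channel::<Vec<Chunk>>();
    let consumer = thread::spawn(move || {
        let mut messages = 0;
        let mut chunks = 0;
        for batch in rx {
            messages += 1;
            chunks += batch.len();
        }
        (messages, chunks)
    });

    let mut sender = BatchSender::new(tx, cap);
    for _ in 0..n {
        sender.send("y".to_string()).unwrap();
    }
    sender.flush().unwrap();
    drop(sender); // closes the channel so the consumer loop ends
    consumer.join().unwrap()
}

fn main() {
    let (messages, chunks) = run(1024, 64);
    println!("{} chunks in {} channel messages", chunks, messages);
}
```

One trade-off to keep in mind: buffering delays delivery of chunks to the other thread, so a real implementation would probably need to flush at end of input and possibly at line boundaries as well.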