-
Notifications
You must be signed in to change notification settings - Fork 133
BUG: line events from non-profiled functions misattributed to profiled functions sharing bytecodes #424
Description
Short description
If a function not passed to the profiler compiles down to the same bytecode as another function that is, line events of the former may be misattributed to the latter.
Example
`dupe_test/dupe_test_module.py`
def func():
print(0)
def main():
func()
print(1)
print(2)`dupe-test-script.py`
from dupe_test_module import func
def main():
func()
print(1)
print(2)
if __name__ == '__main__':
main()Profiling (normal)
$ PYTHONPATH="${PYTHONPATH}:${PWD}/dupe_test" kernprof -l --outfile=out --prof-mod=dupe_test_module --prof-mod=dupe-test-script.py -vz dupe-test-script.py
0
1
2
Wrote profile results to 'out'
Timer unit: 1e-06 s
Total time: 1e-05 s
File: [...]/dupe_test_module.py
Function: func at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def func():
2 1 10.0 10.0 100.0 print(0)
Total time: 1.4e-05 s
File: dupe-test-script.py
Function: main at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 def main():
5 1 11.0 11.0 78.6 func()
6 1 2.0 2.0 14.3 print(1)
7 1 1.0 1.0 7.1 print(2)
Profiling (bugged)
$ PYTHONPATH="${PYTHONPATH}:${PWD}/dupe_test" kernprof -l --outfile=out --prof-mod=dupe_test_module -vz dupe-test-script.py
0
1
2
Wrote profile results to 'out'
Timer unit: 1e-06 s
Total time: 1.2e-05 s
File: [...]/dupe_test_module.py
Function: func at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def func():
2 1 12.0 12.0 100.0 print(0)
Total time: 1.6e-05 s
File: [...]/dupe_test_module.py
Function: main at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 1 13.0 13.0 81.2 def main():
6 1 2.0 2.0 12.5 func()
7 1 1.0 1.0 6.2 print(1)
8 print(2)
Note how for the first invocation (1) dupe_test/dupe_test_module.py::main() has never actually been called, but has lines attributed thereto due to its bytecode twin dupe-test-script.py::main(), and (2) the line-event attributions seem to be offset by one (since the two function definitions are one line apart).
Analysis and fixes
line_profiler/_line_profiler.pyx::LineProfiler.add_function() pads the .co_code of code objects sharing the same bytecode with (copies of) the no-op bytecode, in order to quickly distinguish between dupes. Unfortunately, this only happens from the second code object that the profiler has seen onwards.
Hence, if during the Python runtime multiple functions compiling down to the same bytecode exist, but only one of them is passed to the profiler, the profiled function's .co_code is left untouched. When any of them is called, line_profiler/_line_profiler.pyx::inner_trace_callback() would then hash all of them to the same block hash, which may result in stray lines in the non-profiled functions hashing to the same line hashes as other lines in the profiled function, depending on their physical line numbers in their respective source files.
(In the second kernprof invocation, both of the functions sharing the same implementation (and thus bytecode) were passed to the profiler. Hence, one of them went through bytecode de-duplication and the two are thus not conflated.)
The naïve way to fix this would be to include more metadata (e.g. .co_name, (.co_qualname), .co_filename, and .co_firstlineno) in the block hash, or just hash the entire code object, but that is obviously a bad idea since the hashing happens for every single line event the interpreter reports. A more reasonable way would be to start the padding on the first "duplicate", i.e. to pad all code objects the profiler ever sees. This would probably incur a slight (if any) bookkeeping overhead in add_function(), but not during code execution.
Note
As described in the discussions for #422; made a separate issue for documentation purposes.