BUG: line events from non-profiled functions misattributed to profiled functions sharing bytecodes

Short description
----
If a function not passed to the profiler compiles down to the same bytecode as another function that is, line events of the former may be misattributed to the latter.


Example
----

<details>
<summary>`dupe_test/dupe_test_module.py`</summary>

```python
def func():
    print(0)


def main():
    func()
    print(1)
    print(2)
```

</details>

<details>
<summary>`dupe-test-script.py`</summary>

```python
from dupe_test_module import func


def main():
    func()
    print(1)
    print(2)


if __name__ == '__main__':
    main()
```

</details>

<details>
<summary>Profiling (normal)</summary>

```shell
$ PYTHONPATH="${PYTHONPATH}:${PWD}/dupe_test" kernprof -l --outfile=out --prof-mod=dupe_test_module --prof-mod=dupe-test-script.py -vz dupe-test-script.py
0
1
2
Wrote profile results to 'out'
Timer unit: 1e-06 s

Total time: 1e-05 s
File: [...]/dupe_test_module.py
Function: func at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def func():
     2         1         10.0     10.0    100.0      print(0)

Total time: 1.4e-05 s
File: dupe-test-script.py
Function: main at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def main():
     5         1         11.0     11.0     78.6      func()
     6         1          2.0      2.0     14.3      print(1)
     7         1          1.0      1.0      7.1      print(2)

```

</details>

<details>
<summary>Profiling (bugged)</summary>

```shell
$ PYTHONPATH="${PYTHONPATH}:${PWD}/dupe_test" kernprof -l --outfile=out --prof-mod=dupe_test_module -vz dupe-test-script.py
0
1
2
Wrote profile results to 'out'
Timer unit: 1e-06 s

Total time: 1.2e-05 s
File: [...]/dupe_test_module.py
Function: func at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def func():
     2         1         12.0     12.0    100.0      print(0)

Total time: 1.6e-05 s
File: [...]/dupe_test_module.py
Function: main at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5         1         13.0     13.0     81.2  def main():
     6         1          2.0      2.0     12.5      func()
     7         1          1.0      1.0      6.2      print(1)
     8                                               print(2)

```

</details>

Note how for the first invocation (1) `dupe_test/dupe_test_module.py::main()` has never actually been called, but has lines attributed thereto due to its bytecode twin `dupe-test-script.py::main()`, and (2) the line-event attributions seem to be offset by one (since the two function definitions are one line apart).

Analysis and fixes
----

`line_profiler/_line_profiler.pyx::LineProfiler.add_function()` pads the `.co_code` of code objects sharing the same bytecode with (copies of) the no-op bytecode, in order to quickly distinguish between dupes. Unfortunately, this only happens from the second code object *that the profiler has seen* onwards.

Hence, if during the Python runtime multiple functions compiling down to the same bytecode exist, but only one of them is passed to the profiler, the profiled function's `.co_code` is left untouched. When any of them is called,  `line_profiler/_line_profiler.pyx::inner_trace_callback()` would then hash all of them to the same block hash, which may result in stray lines in the non-profiled functions hashing to the same line hashes as other lines in the profiled function, depending on their physical line numbers in their respective source files.

(In the second `kernprof` invocation, both of the functions sharing the same implementation (and thus bytecode) were passed to the profiler. Hence, one of them went through bytecode de-duplication and the two are thus not conflated.)

The naïve way to fix this would be to include more metadata (e.g. `.co_name`, (`.co_qualname`), `.co_filename`, and `.co_firstlineno`) in the block hash, or just hash the entire code object, but that is obviously a bad idea since the hashing happens for *every single line event* the interpreter reports. A more reasonable way would be to start the padding on the first "duplicate", i.e. to pad *all* code objects the profiler ever sees. This would probably incur a slight (if any) bookkeeping overhead in `add_function()`, but not during code execution.

Note
----
As described in [the discussions](https://github.com/pyutils/line_profiler/issues/422#issuecomment-4017146142) for #422; made a separate issue for documentation purposes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: line events from non-profiled functions misattributed to profiled functions sharing bytecodes #424

Short description

Example

Analysis and fixes

Note

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

BUG: line events from non-profiled functions misattributed to profiled functions sharing bytecodes #424

Description

Short description

Example

Analysis and fixes

Note

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions