Skip to content

Support chained attributes in thread unsafe detection #79

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lysnikolaou
Copy link
Member

  • Correctly detect calls to thread-unsafe functions when the function call comes from a chained attribute with length three or more.
  • Do the same for walking down functions in search for assignments to __thread_unsafe__.
  • Do a minor refactor to make code easier to reason about.

This is the first of a series of PRs aiming to make thread-unsafe detection a bit more stable. The final purpose is to be able to support #73 (disallow stuff at the module level), but going over a few improvements/fixes first, so that that becomes easier to implement.

- Correctly detect calls to thread-unsafe functions when the function
  call comes from a chained attribute with length three or more.
- Do the same for walking down functions in search for assignments
  to `__thread_unsafe__`.
- Do a minor refactor to make code easier to reason about.
@ngoldbaum
Copy link
Contributor

Good catch!

Did you check whether this impacts the time to collect tests in scipy or scikit-learn? Just to double-check that we don't repeat the slowdown we caused in the 0.4 release. I think this is still using the memoization I added to fix that issue so I'd be surprised if the performance impact is significant.

It may also be worth doing a before/after on both scipy and scikit-learn to see if this is marking any tests as thread-unsafe that were not marked before.

@rgommers
Copy link
Member

when the function call comes from a chained attribute with length three or more.

Limiting it to length <=2 was definitely on purpose, because arbitrary depth ast walking is prohibitively expensive.

@lysnikolaou
Copy link
Member Author

when the function call comes from a chained attribute with length three or more.

Limiting it to length <=2 was definitely on purpose, because arbitrary depth ast walking is prohibitively expensive.

I thought that was about the number of functions we analyze when searching for __thread_safe__ = ... assignments. That stays this way.

This is about things like a.b.c() etc. I don't expect people to chain more than a few attributes like this, so the performance cost should be negligible. Having said that, I'll do a check to verify that like Nathan suggested above.

@rgommers
Copy link
Member

I don't expect people to chain more than a few attributes like this, so the performance cost should be negligible

Ah okay, thanks for the explanation. If you can treat this separately and it doesn't impact collection time, then that sounds good.

@lysnikolaou
Copy link
Member Author

So here's the results of testing this on scipy and scikit-learn.

SciPy:

Collection time Collected items to run in parallel
0.4.4 32.10s 53047
HEAD 33.76s 53047

scikit-learn:

Collection time Collected items to run in parallel
0.4.4 11.19s 23103
HEAD 11.26s 23103

My feeling is this is an okay trade-off. It also makes sense that no new tests were marked as thread-unsafe, because:

  • The only modules we're blocklisting that could be used in chained attributes are in _pytest, so not so widely used.
  • SciPy only uses __thread_safe__ once and that function is always called in a normal ast.Name node. scikit-learn does not use __thread_safe__ at all.

@ngoldbaum
Copy link
Contributor

Sorry for the pause here. Given Ralf indicated the changes are OK for SciPy, let's pull this in.

@ngoldbaum ngoldbaum merged commit c92280f into Quansight-Labs:main Jun 23, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants