Skip to content

Improvements to thread safety detection and implement disallowing whole modules #82

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

lysnikolaou
Copy link
Member

@lysnikolaou lysnikolaou commented Jun 25, 2025

This includes a few things:

  • General improvements to thread safety detection like:
    • Call generic_visit in all paths and bail out early when already thread unsafe
    • Do not recurse when target is not a function
    • Fix message printed in error report when unsafety is detected because of __thread_safe__
  • Add ability to blocklist whole modules and do that for ctypes and unittest.mock

It's easier to review this commit by commit.

Also, here's the test results for scipy and scikit-learn when it comes to collection performance:

SciPy:

Collection time Collected items to run in parallel
main 33.92s 53101
HEAD 38.19s 53089

scikit-learn:

Collection time Collected items to run in parallel
main 11.40s 23049
HEAD 10.46s 22833

EDIT: Changed number of tests discovered to run in parallel under HEAD in scikit-learn.

@ngoldbaum
Copy link
Contributor

Can you update the readme based on the new functionality?

I'll try to take a look at the code ASAP.

@ngoldbaum
Copy link
Contributor

Do you have any idea why there are so many more scikit-learn tests getting collected? What's special about the 12 tests in SciPy that don't get collected anymore?

@ngoldbaum
Copy link
Contributor

All the code changes look reasonable to me and the new tests look like they're catching tricky corner cases along with just detecting that a module is imported.

Maybe add a test to make sure that tests still run in parallel if they don't use any functionality from a blacklisted module?

@lysnikolaou
Copy link
Member Author

Can you update the readme based on the new functionality?

README updated.

Do you have any idea why there are so many more scikit-learn tests getting collected? What's special about the 12 tests in SciPy that don't get collected anymore?

The scikit-learn number was due to a bug after all. The inspect.isfunction returns False for methods so that thread safety inspection was stopping there.

What's special about the 12 tests in SciPy that don't get collected anymore?

The SciPy and scikit-learn that now get skipped were false positives. The ones I had a look at had calls to functions that were importing a different way than what we had blocklisted (e.g. from unittest.mock import patch) and/or tricky edge cases with methods that weren't analyzed before.

@ngoldbaum
Copy link
Contributor

Thanks for looking at that! Making this stuff more robust will pay ecosystem-wide dividends.

Does adding the test I suggested in my other comment make sense?

@lysnikolaou
Copy link
Member Author

Maybe add a test to make sure that tests still run in parallel if they don't use any functionality from a blacklisted module?

Isn't this covered from all of the other tests (in files other than tests/test_thread_unsafe_detection.py) that do not exercise this path?

@ngoldbaum
Copy link
Contributor

Isn't this covered from all of the other tests (in files other than tests/test_thread_unsafe_detection.py) that do not exercise this path?

I don't think so. Or at least it's not obvious to me.

I think you have to add a new test to the test file created by test_thread_unsafe_ctypes that asserts num_parallel_threads is > 1 and doesn't actually use the ctypes module.

@lysnikolaou
Copy link
Member Author

I think you have to add a new test to the test file created by test_thread_unsafe_ctypes that asserts num_parallel_threads is > 1 and doesn't actually use the ctypes module.

Done!

@ngoldbaum
Copy link
Contributor

I spent some time profiling this and I didn't see any obvious way to speed things up - now that the visitor is doing more stuff, it takes more time. We also spend a decent amount of time in ast.parse and inspect.getsource - if there were faster alternatives that would also help, but I don't think there are any right now.

@ngoldbaum
Copy link
Contributor

@rgommers what do you think about the marginally increased collection time for SciPy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants