MacOS state_dict tests in CI are failing during shutdown #1256
Comments
I've been trying to isolate the problem on this branch: #1255. I'm unable to repro on my Mac laptop, so I'm just trying to bisect it by kicking off

So far it's definitely due to test_state_dict.py. The best sign I get is sometimes the

Digging into the docs and code, it looks like on MacOS the default sharing strategy is file_system (instead of file_descriptor), which launches a torch_shm_manager process in the background. It gets launched here, but the PID is never held on to, and there is no obvious cleanup code that gets called here.
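To check which sharing strategy a given machine or CI runner is actually using, something like the following sketch may help. It relies on `torch.multiprocessing.get_sharing_strategy()` and `torch.multiprocessing.get_all_sharing_strategies()`; the helper name `describe_sharing_strategy` is made up for illustration, and the function returns `None` when torch is not installed.

```python
def describe_sharing_strategy():
    """Report the current and available torch sharing strategies.

    Hypothetical helper for debugging; returns None if torch is missing.
    """
    try:
        import torch.multiprocessing as torch_mp
    except ImportError:
        return None
    return {
        # On macOS this is expected to be "file_system".
        "current": torch_mp.get_sharing_strategy(),
        # Set of strategies supported on this platform.
        "available": sorted(torch_mp.get_all_sharing_strategies()),
    }


if __name__ == "__main__":
    print(describe_sharing_strategy())
```

Printing this at the start of the failing test job would at least confirm whether the file_system strategy (and therefore torch_shm_manager) is in play on the runner.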
It seems like on MacOS, multiprocessing fork is more like a spawn and requires importing all the modules again. Something about increasing the total number of worker subprocesses in the test causes massive slowdowns in cleanup. The simplest thing to do at this point is to shard the tests. I'll probably give this a shot tomorrow.
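A rough sketch of both points, using only the standard library: `start_method_info` reports which multiprocessing start method is in effect (useful for confirming the fork-vs-spawn observation on a runner), and `shard_tests` splits test IDs into shards so each CI job starts fewer worker subprocesses. Both function names and the test IDs in the usage example are hypothetical, and this is not necessarily how the repository ended up sharding its tests.

```python
import multiprocessing as mp


def start_method_info():
    """Return (current start method, all start methods on this platform)."""
    return mp.get_start_method(), mp.get_all_start_methods()


def shard_tests(test_ids, num_shards, shard_index):
    """Pick the tests belonging to one shard, round-robin over sorted IDs.

    Sorting first keeps the assignment stable across runs and machines.
    """
    if num_shards < 1 or not 0 <= shard_index < num_shards:
        raise ValueError("need 0 <= shard_index < num_shards")
    return sorted(test_ids)[shard_index::num_shards]


if __name__ == "__main__":
    print(start_method_info())
    # Hypothetical test IDs, for illustration only.
    tests = [
        "test_state_dict.py::test_a",
        "test_state_dict.py::test_b",
        "test_state_dict.py::test_c",
    ]
    for i in range(2):
        print(i, shard_tests(tests, num_shards=2, shard_index=i))
```

Each CI job would then run only the IDs returned for its own `shard_index`, so every test still runs exactly once across the shards.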
🐛 Describe the bug
The MacOS tests in the StatefulDataLoader CI action fail intermittently during shutdown. On Mac, shutdown also takes a lot longer than on Windows and Ubuntu (10 minutes vs. 1 s). I'm not sure what causes GitHub Actions to mark the test as failed, but
Created an issue here on actions/setup-python but still no response: actions/setup-python#857
Although we still get positive signals from the test, it shows up as an X in GitHub.
Versions
Nightly / main branch in CI