We use s3fs to access s3, which is apparently not safe for multi-processing. For example, I have data-loading code which calls `read_eval_log`. Using a `ProcessPoolExecutor` (but not a `ThreadPoolExecutor`!) hangs when calling `read_eval_log` on an s3 bucket:
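The original snippet was lost; a hypothetical reproduction sketch follows. It assumes the `inspect_ai` package is installed and that you have read access to an s3 bucket (the bucket and log path below are placeholders):

```python
from concurrent.futures import ProcessPoolExecutor

from inspect_ai.log import read_eval_log  # uses s3fs under the hood for s3:// paths

LOG_URI = "s3://my-bucket/logs/example.eval"  # placeholder path

def load(uri):
    return read_eval_log(uri)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # On Linux, where the default start method is "fork", this call
        # never returns; swapping in ThreadPoolExecutor works fine.
        log = pool.submit(load, LOG_URI).result()
```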
I opened an issue in their repo, and they replied that this is a known limitation and suggested using the spawn start method (fresh Python interpreter state) when doing multi-processing, i.e.:
This works, but we should find a more robust solution for multi-processing.