Multi-processing w/ s3 logs, unless you use spawn to launch your subprocesses #840

Open
max-kaufmann opened this issue Nov 13, 2024 · 0 comments

Comments

@max-kaufmann
Contributor

max-kaufmann commented Nov 13, 2024

We use s3fs to access S3, which is apparently not safe for multi-processing. For example, I have data loading code which calls read_eval_log. Running it in a ProcessPoolExecutor (but not a ThreadPoolExecutor!) hangs when read_eval_log reads from an S3 bucket:

from concurrent.futures import ProcessPoolExecutor
# Hangs when eval_logs point at an S3 bucket
with ProcessPoolExecutor(max_workers=num_workers) as executor:
    futures = {executor.submit(read_eval_log, eval_log): eval_log for eval_log in eval_logs}

I opened an issue in the s3fs repo. They replied that this is a known limitation and suggested using the spawn start method (which gives each worker a fresh Python interpreter state) when doing multi-processing, i.e.:

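from multiprocessing import get_context
# Start workers with "spawn" so each one gets a fresh interpreter
with ProcessPoolExecutor(max_workers=num_workers, mp_context=get_context("spawn")) as executor:
    futures = {executor.submit(process_eval_log_single, eval_log): eval_log for eval_log in eval_logs}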

This works, but we should have a more robust solution for multi-processing with S3 logs.
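For reference, the spawn workaround could be wrapped in a small helper so callers do not have to remember the mp_context argument. This is only a sketch: load_one and load_all are hypothetical names, and load_one is a stand-in for read_eval_log that does not touch S3 at all.

```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import get_context


def load_one(path: str) -> str:
    """Hypothetical stand-in for read_eval_log: returns only the file name."""
    return os.path.basename(path)


def load_all(paths, max_workers=2, start_method="spawn"):
    """Run load_one over paths in worker processes.

    Workers are started with the given multiprocessing start method;
    "spawn" gives each worker a fresh interpreter instead of a copy of
    the parent's state.
    """
    ctx = get_context(start_method)
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as executor:
        # Map each future back to the path it was submitted for.
        futures = {executor.submit(load_one, path): path for path in paths}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

With spawn, each worker re-imports the main module, so a call such as load_all(eval_logs) should sit under an `if __name__ == "__main__":` guard, and the worker function has to be defined at module level so it can be pickled.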

@max-kaufmann max-kaufmann changed the title The library we use to access s3 isn't thread-safe Multi-processing with s3 is broken, unless you use spawn() Nov 13, 2024
@max-kaufmann max-kaufmann changed the title Multi-processing with s3 is broken, unless you use spawn() Multi-processing while reading s3 logs hangs, unless you use spawn to launch your subprocesses Nov 13, 2024
@max-kaufmann max-kaufmann changed the title Multi-processing while reading s3 logs hangs, unless you use spawn to launch your subprocesses Multi-processing w/ s3 logs, unless you use spawn to launch your subprocesses Nov 13, 2024