Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update LAION-example and default value of scheduled_tasks #1092

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

SvenDS9
Copy link
Contributor

@SvenDS9 SvenDS9 commented Mar 16, 2023

Changes

  • Update LAION-example to use threadpool_map
  • Change default value of scheduled_tasks to 500

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 16, 2023
@SvenDS9
Copy link
Contributor Author

SvenDS9 commented Mar 16, 2023

128 was still to low - at least with a timeout of 5 seconds. With this new value I am mostly limited by my internet connection.

I have also compared the speed to async_map. With the correct values (high batch_size e.g. 800 and high enough max_concurrency e.g. 64) I get similar performance - also limited by my internet connection.

examples/vision/laion5b.py Outdated Show resolved Hide resolved
@@ -695,7 +696,7 @@ def __init__(
fn: Callable,
input_col=None,
output_col=None,
scheduled_tasks: int = 128,
scheduled_tasks: int = 500,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, this highly depends on the size of request and response.
If we are loading video files, such a high number might not provide the best perf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm scheduled_tasks negatively affects performance if it is too low, yes. Higher values will increase memory usage (e.g. if the current video takes a lot longer to download than the next few which are then prefetched and stored in memory) but in this case one is limited by internet connection speed anyway. Why should it negatively affect performance (throughput) if it's too high?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants