-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set CPU threads according to the machine #965
Set CPU threads according to the machine #965
Conversation
faster_whisper/transcribe.py
Outdated
@@ -587,7 +588,7 @@ def __init__( | |||
device: str = "auto", | |||
device_index: Union[int, List[int]] = 0, | |||
compute_type: str = "default", | |||
cpu_threads: int = 16, | |||
cpu_threads: int = multiprocessing.cpu_count() // 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not be correct on Intel processors that use E and P cores, for example intel i7 12700k has 12 cores and 20 threads, wouldn't it be better to set it to 0 to use the system default value? as even if cpu_count//2
is correct, this is the number of threads that will be used per process, and if num_workers
was set to 2 or higher, each worker is going to request the total number of threads to itself which will result in a lot of context switching.
Refer to intra_thread
history in CT2 Changelog and Performance Tips for more information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not be correct on Intel processors that use E and P cores, for example intel i7 12700k has 12 cores and 20 threads, wouldn't it be better to set it to 0 to use the system default value? as even if
cpu_count//2
is correct, this is the number of threads that will be used per process, and ifnum_workers
was set to 2 or higher, each worker is going to request the total number of threads to itself which will result in a lot of context switching.Refer to
intra_thread
history in CT2 Changelog and Performance Tips for more information
To account for this you could take function in my last message and divide whatever number you want to use by the number of workers, could you not...thus always leaving four threads/logical cores for a user's background tasks on their computer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather use the CT2 default value which will suit most users, power users always have the option to tune this value to their needs, as you mentioned, each use case will have a best value, there is no one value fits all here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather use the CT2 default value which will suit most users, power users always have the option to tune this value to their needs, as you mentioned, each use case will have a best value, there is no one value fits all here
True, impossible to plan for all use cases dynamically...At least @ozancaglayan has some cool functionality if he wants to experiment with it in his scripts - i.e. a dynamic way to set "cpu_threads" based on p/e cores, logical cores or what have you! Just determine the cpu_threads in your own script and then pass it to faster-whisper
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverting to the original '0' seems best for widest compatibility
I just created a perfect function for this hold on and I'll post it.. |
Here it is - NOTE, this is for Intel cpus and would need to be modified to work with apple/amd:
EXPLANATION:
Apple/AMD
Conclusion:I've always wondered why the default cpu_threads was 4...likely implemented awhile ago before the increasing core counts. Regardless, if you want to set it dynamically to maximize the cpu-speed of |
Yes, its true that the
So in short:
https://stackoverflow.com/a/55423170/821797 I don't have any preferences tbh. Just trying to contribute. I'm not even using CPUs for Whisper. Thanks |
Personally, I'd like to like for the 4 number to be updated for faster-whisper, even if it deviates from what ctranslate2 sets as the default. This would reflect the reality that cpu thread counts have increased. With that being said, I understand that others like @MahmoudAshraf97 and others have more clout than me since they actually contribute to the code base while I simply stand on the sidelines reaping the benefits...but if it were my repo here's how I'd set it:
I don't see the harm in this. 4 threads is clearly outdated...maybe bump it to 6...8 even? That's all I'll say. Will let the big boys who actually contribute code to the library decide. ;-) |
We are talking two different directions here, setting it to 0 to use CT2 default value might not result in the best performance but it will be the most compatible option for most users, setting it to use the max number of threads may (or may not, needs to be tested on multiple configurations) result in the best performance, but it also has an implicit assumption that FW is the highest priority process running which might not be the case, in such case, I'd opt for the safest option which is setting it to 0, in all cases this is not a hardcoded parameters and users should tune it to their liking |
8ad11ae
to
37a0365
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after a further investigation, I found that after using 4 or more cores the returns are negligible, so I'd prefer to stick to the original value until a better solution is implemented in CT2
Co-authored-by: Mahmoud Ashraf <[email protected]>
just to clarify - in 1.0.3 before the 'big pr' cpu_threads default was 0 not 4, and as ozancaglayan noted it ends up being 4 if OMP_NUM_THREADS isn't set. Haven't looked at the code but perhaps better as 0 for back compatibility?
|
@ooobo You are correct, my bad for not confirming this |
Fixes suboptimal performance on CPUs with less than 16 cores by setting the number of CPU threads to the actual number of physical cores dynamically.
Fixes #917