Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Threads.@threads support #26

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

cuihantao
Copy link

@cuihantao cuihantao commented Jun 12, 2020

Hello,

This PR contains some API changes to enable threading. All in-place assignments to mutable structs have been removed/deprecated. All structs have been changed to immutable.

Using two threads gives the best performance on my PC (Ryzen 7 2700X, using export JULIA_NUM_THREADS=2). Time went down from 7 minutes to 5 minutes for the example in README.

Assigning more threads will not further improve the performance, because now it is memory-bound. There are lots of memory assignments (about 700MB for each parameter combination). I suspect that is related to TS but have not looked into it.

I know the API changes, especially getting rid of ! operations, may not be ideal. So, feel free to accept, reject, or update this PR.

@dysonance
Copy link
Owner

@cuihantao Thanks for proposing these changes! Threading is definitely something that has been on the to-do list for a while. The only thing that makes me a little curious is the move to immutable structs. If I recall correctly, Julia passes immutable structs by value and mutable by reference, which can generally cut down on memory usage.

Would you be able to experiment with minor alterations where the structs are mutable and see if that improves memory efficiency?

@cuihantao
Copy link
Author

cuihantao commented Jun 18, 2020

I did some tests and found that changing the structs to mutable does not bring any benefits. It also surprises me that the proposed change does not bring convincing improvements.

Three codebases are tested:

  • A: your original code with threaded for loops
  • B: proposed PR, which is based on A with all mutable structs changed to immutable. Modifications to mutable structs have been avoided.
  • C: based on B with all structs changed to mutable. Still, modifications to structs have been removed.

For a backtest of 1,000 combinations of parameters, the run times are as follows.

-- 1 Thread 2 Threads 3 Threads
A 37.7 34.1 34.4
B 37.5 33.59 34.45
C 37.82 34 34.68

All the above cases have around 22.5 GBs of memory assignment.

So for now, I cannot reach any conclusion. Could you investigate where most assignments happen?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants