-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround for @cfunction
closures on unsupported platforms
#131
base: master
Are you sure you want to change the base?
Conversation
…res are unavailable
I might have PR'ed too soon, as I see some julia> using Polyester
julia> let
@batch threadlocal=rand(10:99) for i in 0:Threads.nthreads()
threadlocal += 1
end
findall(i -> !isassigned(threadlocal, i), 1:length(threadlocal))
end
7-element Vector{Int64}:
65
66
67
68
69
70
71 All threads after the 64-th are |
Well... I might have forgotten to commit a single line from my PR a year ago, sorry about that. Somehow I didn't stumble upon this sooner. Now everything is working fine on my side. CI is failing but apparently it is only because of Aqua.jl, all multithreading tests are passing. |
I found that while this works, it does allocate a bit more than what I would like (which is zero), and it seems to be proportional to the number of threads, which is too detrimental for performance. More info tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polyester is supposed to try and avoid closures closing over objects by identifying the objects that would be captured, and then passing them as function arguments instead.
Ideally, we'd be able to fix examples of cases where this doesn't work.
Behavior dependent on capturing and mutating isn't going to be threadsafe anyway, and is thus unsupported.
fptr::Ptr{Cvoid}, | ||
closure_obj, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want both fptr
and closure_obj
? Wouldn't it be one or the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fptr
allows to infer the type of BatchClosure{F,A,C}
thanks to the @cfunction
.
We can't place the closure object directly into the TLS because it contains the captured values in a Core.Box
, which doesn't have a fixed size at compile-time, and therefore store!
fails.
So I put the closure in a Reference
, which does have a fixed size, and when calling FakeClosure{F,A,C}
we have the type of the closure before load
-ing it, allowing type-inference.
But currently users are oblivious of this optimization on x86, even when it fails and falls back to a closure instead, as performance is only slightly altered. I think that having a working fallback on all platforms might be preferable over the current unhelpful error
It seems that the allocation issue is unrelated to my changes, and only happen on ARM when using |
Fixes #88 by storing the
BatchClosure
object in the thread local storage of each task.Architectures which do not support closures are detected in the same way as in this line, which should cover all affected platforms.
Tested on Nvidia Grace, not on Mac. Comparing this method with the
@cfunction
closure on x86, the overhead seems to be almost the same, but with additional allocations when usingthreadlocal
.