Distributed sharding and cache agnosticism #317
Alternative for the Connector, rough concept:

```elixir
def handle_call({:ratelimit, bucket_pattern}, {to, tag} = _from, state) do
  {delay, state} = calculate_delay(bucket_pattern, state)
  # This is how it's done internally in GenServers [1]
  Process.send_after(to, {tag, :ok}, delay)
  {:noreply, state}
end
```
Summarizing from DMs with @jchristgit:

Next steps: work on supervisors and shards is still to be finalized.
Distributed caching and cache agnosticism have been implemented, with the cache kept as generic as possible and nostrum performing internal query operations cache-agnostically via QLC. For distributed sharding, which is planned to be the headline feature for 1.0, I will open a separate issue.
With Elixir/BEAM, we have a big advantage over most other languages: almost no need for orchestration. There are still hurdles, but ideally those would be handled by tools like Horde or Swarm. Still, there are blockers that keep nostrum from running distributed.
There are three primary aspects of Nostrum that need to be extensible or replaceable in order for this to be reasonable: supervision, shard management, and caching.
Supervision
To future-proof this issue, the best suggestion I have is to delegate as much supervision to the end user as possible. This means taking the prep work out of the Application and moving it elsewhere, perhaps into a GenServer meant for coordination.
The line can be drawn at Sessions. Sessions are a fairly thin wrapper around the WS connection, so they can remain children of the Shard. The only thing I think is missing at this point is persisting enough information to enable the Session to resume on crash. I'd recommend using an Agent that holds the sequence number. If we close the Session intentionally, we can set the value to `nil`. This migration will allow the user to use whatever spawning/supervision techniques they desire. It will enable Horde support, since Horde requires you to use its own DynamicSupervisor. We can still allow the user to have us supervise everything. We can take a function that accepts our intended process name and returns the actual process name to be used. This will allow much simpler support for Swarm.
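The resume bookkeeping could look something like the minimal sketch below; the module name and API are assumptions for illustration, not nostrum's actual code. An Agent keeps the last gateway sequence number, and clearing it to `nil` marks an intentional close:

```elixir
defmodule Nostrum.Session.Sequence do
  # Hypothetical sketch: holds the last gateway sequence number so a
  # crashed Session can resume. `nil` means "start fresh, do not resume".
  use Agent

  def start_link(_opts) do
    Agent.start_link(fn -> nil end, name: __MODULE__)
  end

  # Record the sequence number from each dispatched event.
  def update(seq) when is_integer(seq) do
    Agent.update(__MODULE__, fn _ -> seq end)
  end

  # Intentional close: wipe the value so the next connect identifies anew.
  def clear, do: Agent.update(__MODULE__, fn _ -> nil end)

  def get, do: Agent.get(__MODULE__, & &1)
end
```

In a real multi-shard setup the Agent would be registered per shard rather than under a single module name, which is exactly where the name-mapping function above would come in.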
Caching
Some people prefer per-shard caching and some prefer global. Both have valid reasons and issues, so both should be supported in order to best support all use cases.
A cache Protocol should be written to enable swapping backends. CRUD operations aside, I think the simplest way to handle the divergence of approaches is to have a function in the Protocol that takes the shard ID and returns a PID. This lets the cache implementation decide whether each shard gets its own cache or they all share one. Note that implementations will need to use this function to determine the process name, or they won't be able to take advantage of all the benefits in the previous section.
This work should probably get adopted by #143.
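As a rough sketch of that shape, the shard-to-cache mapping could look like the code below. Since Elixir Protocols dispatch on data types, this sketch uses a behaviour instead; all module and function names here are hypothetical, and the Agent-backed store is a stand-in for a real backend:

```elixir
defmodule Nostrum.Cache do
  # Hypothetical behaviour for a swappable cache backend.
  # `cache_for_shard/1` lets the implementation decide whether all
  # shards share one cache process or each shard gets its own.
  @callback cache_for_shard(shard_id :: non_neg_integer()) :: pid() | atom()
  @callback get(cache :: pid() | atom(), key :: term()) :: {:ok, term()} | :error
  @callback put(cache :: pid() | atom(), key :: term(), value :: term()) :: :ok
  @callback delete(cache :: pid() | atom(), key :: term()) :: :ok
end

defmodule Nostrum.Cache.Global do
  # Example implementation of the global flavour: every shard maps to
  # the same named process, backed here by a single Agent holding a map.
  @behaviour Nostrum.Cache
  use Agent

  def start_link(_opts) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  @impl true
  def cache_for_shard(_shard_id), do: __MODULE__

  @impl true
  def get(cache, key), do: Agent.get(cache, &Map.fetch(&1, key))

  @impl true
  def put(cache, key, value), do: Agent.update(cache, &Map.put(&1, key, value))

  @impl true
  def delete(cache, key), do: Agent.update(cache, &Map.delete(&1, key))
end
```

A per-shard implementation would instead return a distinct registered name from `cache_for_shard/1` for each shard ID, without the calling code having to change.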
Shard Management
Shard management is mostly solved by the above proposals. There is nuance in breaking out the Shard Supervisor and Connector, however. The Connector can become part of the aforementioned coordination process. To avoid blocking this new process, we can put the process sleep in a Task that then uses `GenServer.reply/2`. The Supervisor needs to become a Protocol, converting the concrete module into a series of helper functions and a `use` macro. This should allow people to compose their own shard management with whatever rules they wish, or pick our standard implementation with the `use`.