Standalone consensus component #75
Conversation
#profile {:dev    "/ip4/127.0.0.1/tcp/62071"
          :prod   nil
          :docker "/ip4/127.0.0.1/tcp/62071"}]}}
:fluree/raft {:log-history #or [#env FLUREE_RAFT_LOG_HISTORY
I think we want to preserve these settings somewhere, maybe a second config-raft.edn file to at least have them captured.
Yes, either in a separate raft config file, in the docs, or somewhere else?
I was planning on starting another branch for config stuff to see how far I get with implementing your suggestions from yesterday.
I've saved the raft config options in a separate file. We can merge the config map in that file with the main one, then dissoc the :fluree/standalone key and start raft normally. I'll wait until we implement the more user-friendly configs to make that process more ergonomic.
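For reference, a minimal sketch of that merge-then-dissoc step, assuming a hypothetical config-raft.edn resource; the real config uses reader tags like #profile and #env, so the actual loader would go through the existing config machinery rather than plain edn/read-string:

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

;; Sketch only: file name and loading path are assumptions.
(defn start-raft-config
  "Merge the main config map with the separately saved raft config,
  then drop the :fluree/standalone key so raft can start normally."
  [main-config]
  (let [raft-config (-> (io/resource "config-raft.edn")
                        slurp
                        edn/read-string)]
    (-> (merge main-config raft-config)
        (dissoc :fluree/standalone))))
```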
(queue-new-ledger [_ ledger-id tx-id txn opts]
  (go
    (let [event-msg (msg-format/queue-new-ledger ledger-id tx-id txn opts)]
      (>! tx-queue event-msg)
At some point I think we need to use put! instead of >! (maybe you are already thinking of that upstream); otherwise we can never return a response and the whole server is locked.
If put! returns false, the "requestor" should get a 503 server-busy response or equivalent (though the server might still be able to respond to queries, for example). Hopefully it could then catch up with the backlog and accept new tx requests again soon.
If we block all the way up the stack, nothing can happen.
I don't think we can use put! because we'll run into a "too many pending puts" error under heavy load.
We could wrap upstream code with a timeout channel to handle returning the 503s, or we could put a dropping buffer on the tx-queue channel and bubble up the return value of >! (as long as the return value is false when the buffer is full, but I'd have to verify that this is the case).
I must have misunderstood put!. I thought it was exactly for this reason and would return false (and not try to >! if the buffer was full). I thought there was something like that, maybe under a different fn name?
Here it is: https://clojuredocs.org/clojure.core.async/offer!
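For illustration, a rough sketch of how offer! could back that 503 path; the channel size and the response maps are just placeholders, not the actual handler:

```clojure
(require '[clojure.core.async :as async])

;; offer! never blocks: it returns true if the value was accepted
;; immediately and a falsey value when the buffer is full, which maps
;; cleanly onto a 503.
(def tx-queue (async/chan 16))

(defn enqueue-tx
  "Try to enqueue an event without blocking; the response maps here are
  hypothetical ring-style placeholders."
  [event-msg]
  (if (async/offer! tx-queue event-msg)
    {:status 202 :body "transaction queued"}
    {:status 503 :body "transactor busy, retry later"}))
```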
::closed)

:else
(let [result (<! (process-event conn subscriptions watcher event))
I think we are missing exception handling, which should likely be addressed upstream from this, but this is the point where it is swallowed (process-event will return nil) and things will progress.
The issue is that no response will be delivered to the "requestor" that something happened, and they will get a timeout response instead.
fluree/db
(fluree/stage txn opts)
deref!)
commit-result (deref!
Per the process-event exception handling comment below, here we'll throw and never make it to delivering the message that the HTTP handler is waiting for.
Probably here we shouldn't throw, and should instead deliver the response below (which might be an exception).
Same thing with create-ledger! above.
I think I've covered this scenario by reusing the error-handling machinery from the raft implementation. If there's an error while processing events, that error is broadcast out and delivered to the assigned watcher.
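Roughly the shape I mean; process-fn and the watcher channel are stand-ins for the real machinery, so the names here are illustrative only:

```clojure
(require '[clojure.core.async :as async :refer [go >!]])

;; Sketch: catch anything thrown while processing an event and deliver the
;; exception itself to the watcher, so the HTTP handler gets an error
;; response instead of timing out on a swallowed nil.
(defn process-event-safely
  [process-fn watcher-ch event]
  (go
    (let [result (try
                   (process-fn event)
                   (catch Throwable e e))]
      ;; channels can't carry nil, so substitute a sentinel if needed
      (>! watcher-ch (if (nil? result) ::no-result result)))))
```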
For your test failure here, I'm not sure, but it might need fluree/db#800, which isn't merged yet. I'm not certain that's the culprit, but I think it might be.
Not all implementations are "groups"
(defn new-tx-queue
  [conn subscriptions watcher]
  (let [tx-queue (async/chan 16)]
I set the buffer size to 16 here to allow for 16 pending transactions at once before we report back that the transactor is too busy. I just made that number up because it kinda felt right, but I have no evidence to back this up. I'm happy to make this number higher or lower. Perhaps we could make it a configuration option that users can set.
It is probably more size-sensitive than quantity-sensitive since it sits in memory, but it's a lot of work to estimate/maintain the size, so I'm not suggesting we go down that path, at least for now.
I don't have a better suggestion; mine would just be a guess too. I'd probably guess a number like 50, but I have no problem with 16 either.
I think ultimately it should be a server config once we have more robust server configs.
It's easy enough to add it as a config option now with a default.
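Something along these lines, with the key name and the way the config map gets threaded in being assumptions about how server config ends up getting passed:

```clojure
(require '[clojure.core.async :as async])

;; Sketch: the pending-transaction limit comes from config with a default
;; of 16; how the config map actually reaches this fn is left open.
(defn new-tx-queue
  [{:keys [max-pending-txns] :or {max-pending-txns 16}}
   conn subscriptions watcher]
  (let [tx-queue (async/chan max-pending-txns)]
    ;; ... start the consumer loop over tx-queue as before ...
    tx-queue))
```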
These components explicitly interact with the consensus subsystem, so they should come under the consensus namespace hierarchy.
LGTM!
This patch to the feature/no-consensus branch renames the consensus type from "none" to "standalone", uses explicit configuration values up front instead of passing around config maps, starts and stops the new standalone transactors with integrant, and includes some cleanup.
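For context on the integrant wiring, a rough sketch of what init/halt for the standalone key could look like; the :fluree/standalone key matches the config above, but the component body here is purely illustrative rather than the actual implementation:

```clojure
(require '[clojure.core.async :as async]
         '[integrant.core :as ig])

;; Illustrative only: the real component wires up the transactor, watcher,
;; and subscriptions; here the component is just a map holding the tx queue.
(defmethod ig/init-key :fluree/standalone
  [_ {:keys [max-pending-txns] :or {max-pending-txns 16}}]
  {:tx-queue (async/chan max-pending-txns)})

(defmethod ig/halt-key! :fluree/standalone
  [_ {:keys [tx-queue]}]
  (async/close! tx-queue))
```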