
Standalone consensus component #75


Merged
zonotope merged 39 commits into feature/no-consensus on Jul 21, 2024

Conversation

zonotope (Contributor)

This patch to the feature/no-consensus branch renames the consensus type from "none" to "standalone", uses explicit configuration values up front instead of passing config maps around, starts and stops the new standalone transactors with Integrant, and includes some cleanup.
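A minimal sketch of the Integrant wiring, assuming a hypothetical component key, config shape, and queue size (the real component does more than this):

(require '[clojure.core.async :as async]
         '[integrant.core :as ig])

;; Hypothetical key and state: init builds the transactor's state map with a
;; transaction queue; halt closes the queue so the processing loop can stop.
(defmethod ig/init-key :fluree.consensus/standalone
  [_ config]
  {:config   config
   :tx-queue (async/chan 16)})

(defmethod ig/halt-key! :fluree.consensus/standalone
  [_ {:keys [tx-queue]}]
  (async/close! tx-queue))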

@zonotope zonotope requested review from bplatz and a team July 20, 2024 11:06
@zonotope zonotope self-assigned this Jul 20, 2024
#profile {:dev    "/ip4/127.0.0.1/tcp/62071"
          :prod   nil
          :docker "/ip4/127.0.0.1/tcp/62071"}]}}
:fluree/raft {:log-history #or [#env FLUREE_RAFT_LOG_HISTORY
bplatz (Contributor)

I think we want to preserve these settings somewhere, maybe a second config-raft.edn file to at least have them captured.

zonotope (Contributor, Author)

Yes, either in a separate raft config file, in the docs, or somewhere else?

I was planning on starting another branch for config stuff to see how far I get with implementing your suggestions from yesterday.

zonotope (Contributor, Author)

I've saved the raft config options in a separate file. We can merge the config map in that file with the main one, then dissoc the :fluree/standalone key and start raft normally. I'll wait until we implement the more user-friendly configs to make that process more ergonomic.
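A minimal sketch of that merge step, with hypothetical fn and argument names:

;; Combine the saved raft options with the main config map, then drop the
;; standalone component so raft can be started normally.
(defn raft-startup-config
  [main-config saved-raft-config]
  (-> main-config
      (merge saved-raft-config)
      (dissoc :fluree/standalone)))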

(queue-new-ledger [_ ledger-id tx-id txn opts]
  (go
    (let [event-msg (msg-format/queue-new-ledger ledger-id tx-id txn opts)]
      (>! tx-queue event-msg)
bplatz (Contributor), Jul 20, 2024

At some point I think we need to use put! instead of >! (maybe you are already thinking of that upstream); otherwise we can never return a response and the whole server is locked.

If put! returns false, the "requestor" should get a 503 "server busy" response or equivalent (the server might still be able to respond to queries, for example). Hopefully it could then catch up with the backlog and accept new tx requests soon.

If we block all the way up the stack, nothing can happen.

zonotope (Contributor, Author)

I don't think we can use put! because we'll run into a "too many pending puts" error under heavy load.

We could wrap the upstream code with a timeout channel to handle returning the 503s, or we could put a dropping buffer on the tx-queue channel and bubble up the return value of >! (as long as the return value is false when the buffer is full, but I'd have to verify that that's the case).
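A minimal sketch of the timeout-channel option, assuming a hypothetical response channel and response shape:

(require '[clojure.core.async :as async :refer [go alts!]])

;; Race the transaction's response channel against a timeout; if the timeout
;; wins, answer with a 503 so the caller isn't blocked indefinitely.
(defn await-tx-response
  [resp-ch timeout-ms]
  (go
    (let [timeout-ch (async/timeout timeout-ms)
          [val port] (alts! [resp-ch timeout-ch])]
      (if (= port timeout-ch)
        {:status 503, :body "Server busy; please retry."}
        {:status 200, :body val}))))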

bplatz (Contributor)

I must have misunderstood put!. I thought it was for exactly this reason and would return false (and not try to >!) if the buffer was full. I thought there was something like that; maybe a different fn name?
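core.async's offer! behaves roughly like that: it puts a value only if the buffer (or a waiting taker) can accept it immediately, never parks, and returns true on success and a falsey value otherwise. A minimal sketch, with hypothetical response maps:

(require '[clojure.core.async :as async])

;; Try a non-blocking put; if the queue can't accept the message right now,
;; report that the transactor is busy instead of parking the caller.
(defn enqueue-tx
  [tx-queue event-msg]
  (if (async/offer! tx-queue event-msg)
    {:status 202, :body "Transaction queued."}
    {:status 503, :body "Transaction queue full; please retry."}))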


      ::closed)

      :else
      (let [result (<! (process-event conn subscriptions watcher event))
bplatz (Contributor)

I think we are missing exception handling here. It should likely be addressed upstream from this, but this is the point where the exception is swallowed (process-event will return nil) and things will progress.

The issue is that no response will be delivered to the "requestor" that something happened, and they will get a timeout response instead.
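One common core.async pattern for this, sketched here: catch inside the producing go block and return the exception as the channel's value, so callers see the error instead of nil.

(require '[clojure.core.async :as async])

;; Like go, but any exception thrown in the body becomes the channel's value
;; rather than being swallowed.
(defmacro go-try
  [& body]
  `(async/go
     (try
       ~@body
       (catch Exception e# e#))))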

fluree/db
(fluree/stage txn opts)
deref!)
commit-result (deref!
bplatz (Contributor)

Per the process-event exception handling comment below, here we'll throw and never make it to delivering the message that the HTTP handler is waiting for.

Probably here we shouldn't throw, and should instead deliver the response below (which might be an exception).

Same thing with create-ledger! above.

zonotope (Contributor, Author)

I think I've covered this scenario by reusing the error-handling machinery from the raft implementation. If there's an error while processing events, that error is broadcast out and delivered to the assigned watcher.
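A minimal sketch of that delivery step, modeling the watcher as a promise the HTTP handler derefs (the response shapes are hypothetical):

;; Deliver an error result just like a success so the waiting handler gets an
;; answer rather than timing out.
(defn deliver-event-result
  [watcher-promise result]
  (deliver watcher-promise
           (if (instance? Throwable result)
             {:status 500, :error (ex-message result)}
             {:status 200, :result result})))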

bplatz (Contributor) commented Jul 20, 2024

For your test failure here: I'm not sure, but it might need fluree/db#800, which isn't merged yet. Not certain that's the culprit, but I think it might be.


(defn new-tx-queue
  [conn subscriptions watcher]
  (let [tx-queue (async/chan 16)]
zonotope (Contributor, Author)

I set the buffer size to 16 here to allow for 16 pending transactions at once before we report back that the transactor is too busy. I just made that number up because it kinda felt right, but I have no evidence to back this up. I'm happy to make this number higher or lower. Perhaps we could make it a configuration option that users can set.

bplatz (Contributor)

It is probably more size-sensitive than quantity-sensitive since it sits in memory, and it's a lot of work to estimate/maintain the size, so I'm not suggesting we go down that path, at least for now.

I don't have a better suggestion; mine would just be a guess too. I'd probably guess a number like 50, but I have no problem with 16 either.

I think ultimately it should be a server config, once we have more robust server configs.

zonotope (Contributor, Author)

It's easy enough to add it as a config option now with a default.
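A minimal sketch of that, with a hypothetical config key and the current default:

(require '[clojure.core.async :as async])

(def default-max-pending-txns 16)

;; Size the transaction queue from server config when provided, falling back
;; to the current default of 16 pending transactions.
(defn tx-queue-chan
  [{:keys [max-pending-txns] :or {max-pending-txns default-max-pending-txns}}]
  (async/chan max-pending-txns))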

These components explicitly interact with the consensus subsystem, so they
should come under the consensus namespace hierarchy.
bplatz (Contributor) left a comment

LGTM!

@zonotope zonotope merged commit 0d0ac54 into feature/no-consensus Jul 21, 2024
3 of 4 checks passed
@zonotope zonotope deleted the feature/standalone-consensus-component branch July 21, 2024 18:27
@zonotope zonotope mentioned this pull request Jul 24, 2024