Standalone consensus component #75
Conversation
#profile {:dev    "/ip4/127.0.0.1/tcp/62071"
          :prod   nil
          :docker "/ip4/127.0.0.1/tcp/62071"}]}}
:fluree/raft {:log-history #or [#env FLUREE_RAFT_LOG_HISTORY
I think we want to preserve these settings somewhere, maybe a second config-raft.edn file to at least have them captured.
Yes, either in a separate raft config file, in the docs, or somewhere else?
I was planning on starting another branch for config stuff to see how far I get with implementing your suggestions from yesterday.
I've saved the raft config options in a separate file. We can merge the config map in that file with the main one, then dissoc the :fluree/standalone key and start raft normally. I'll wait until we implement the more user-friendly configs to make that process more ergonomic.
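For reference, a minimal sketch of that merge-then-dissoc step, assuming a hypothetical config-raft.edn resource; the real config uses reader tags like #profile and #env, so the actual loader would go through the existing config machinery rather than plain edn/read-string:

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

;; Sketch only: file name and loading path are assumptions.
(defn start-raft-config
  "Merge the main config map with the separately saved raft config,
  then drop the :fluree/standalone key so raft can start normally."
  [main-config]
  (let [raft-config (-> (io/resource "config-raft.edn")
                        slurp
                        edn/read-string)]
    (-> (merge main-config raft-config)
        (dissoc :fluree/standalone))))
```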
(queue-new-ledger [_ ledger-id tx-id txn opts]
  (go
    (let [event-msg (msg-format/queue-new-ledger ledger-id tx-id txn opts)]
      (>! tx-queue event-msg)
At some point I think we need to use put! instead of >! (maybe you are already thinking of that upstream); otherwise we can never return a response and the whole server is locked.
If put! returns false, the "requestor" should get a 503 server-busy response or equivalent (though the server might still be able to respond to queries, for example). Hopefully it could then catch up with the backlog and accept new tx requests again soon.
If we block all the way up the stack, nothing can happen.
I don't think we can use put! because we'll run into a "too many pending puts" error under heavy load.
We could wrap upstream code with a timeout channel to handle returning the 503s, or we could put a dropping buffer on the tx-queue channel and bubble up the return value of >! (as long as the return value is false when the buffer is full, but I'd have to verify that this is the case).
I must have misunderstood put!. I thought it was exactly for this reason and would return false (and not try to >! if the buffer was full). I thought there was something like that, maybe under a different fn name?
Here it is: https://clojuredocs.org/clojure.core.async/offer!
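For illustration, a rough sketch of how offer! could back that 503 path; the channel size and the response maps are just placeholders, not the actual handler:

```clojure
(require '[clojure.core.async :as async])

;; offer! never blocks: it returns true if the value was accepted
;; immediately and a falsey value when the buffer is full, which maps
;; cleanly onto a 503.
(def tx-queue (async/chan 16))

(defn enqueue-tx
  "Try to enqueue an event without blocking; the response maps here are
  hypothetical ring-style placeholders."
  [event-msg]
  (if (async/offer! tx-queue event-msg)
    {:status 202 :body "transaction queued"}
    {:status 503 :body "transactor busy, retry later"}))
```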
::closed)

:else
(let [result (<! (process-event conn subscriptions watcher event))
I think we are missing exception handling, which should likely be addressed upstream from this, but this is the point where it is swallowed (process-event will return nil) and things will progress.
The issue is that no response will be delivered to the "requestor" that something happened, and they will get a timeout response instead.
fluree/db
(fluree/stage txn opts)
deref!)
commit-result (deref!
Per the process-event exception handling comment below, here we'll throw and never make it to delivering the message that the HTTP handler is waiting for.
Probably here we shouldn't throw, and should instead deliver the response below (which might be an exception).
Same thing with create-ledger! above.
I think I've covered this scenario by reusing the error-handling machinery from the raft implementation. If there's an error while processing events, that error is broadcast out and delivered to the assigned watcher.
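Roughly the shape I mean; process-fn and the watcher channel are stand-ins for the real machinery, so the names here are illustrative only:

```clojure
(require '[clojure.core.async :as async :refer [go >!]])

;; Sketch: catch anything thrown while processing an event and deliver the
;; exception itself to the watcher, so the HTTP handler gets an error
;; response instead of timing out on a swallowed nil.
(defn process-event-safely
  [process-fn watcher-ch event]
  (go
    (let [result (try
                   (process-fn event)
                   (catch Throwable e e))]
      ;; channels can't carry nil, so substitute a sentinel if needed
      (>! watcher-ch (if (nil? result) ::no-result result)))))
```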
For your test failure here, I'm not sure, but it might need fluree/db#800, which isn't merged yet. I'm not certain that's the culprit, but I think it might be.
Not all implementations are "groups"
(defn new-tx-queue
  [conn subscriptions watcher]
  (let [tx-queue (async/chan 16)]
I set the buffer size to 16 here to allow for 16 pending transactions at once before we report back that the transactor is too busy. I just made that number up because it kinda felt right, but I have no evidence to back this up. I'm happy to make this number higher or lower. Perhaps we could make it a configuration option that users can set.
It is probably more size-sensitive than quantity-sensitive since it sits in memory, but it's a lot of work to estimate/maintain the size, so I'm not suggesting we go down that path, at least for now.
I don't have a better suggestion; mine would just be a guess too. I'd probably guess a number like 50, but I have no problem with 16 either.
I think ultimately it should be a server config once we have more robust server configs.
It's easy enough to add it as a config option now with a default.
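Something along these lines, with the key name and the way the config map gets threaded in being assumptions about how server config ends up getting passed:

```clojure
(require '[clojure.core.async :as async])

;; Sketch: the pending-transaction limit comes from config with a default
;; of 16; how the config map actually reaches this fn is left open.
(defn new-tx-queue
  [{:keys [max-pending-txns] :or {max-pending-txns 16}}
   conn subscriptions watcher]
  (let [tx-queue (async/chan max-pending-txns)]
    ;; ... start the consumer loop over tx-queue as before ...
    tx-queue))
```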
These components explicitly interact with the consensus subsystem, so they should come under the consensus namespace hierarchy.
LGTM!
This patch to the feature/no-consensus branch renames the consensus type from "none" to "standalone", uses explicit configuration values up front instead of passing around config maps, starts and stops the new standalone transactors with integrant, and includes some cleanup.
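For context on the integrant wiring, a rough sketch of what init/halt for the standalone key could look like; the :fluree/standalone key matches the config above, but the component body here is purely illustrative rather than the actual implementation:

```clojure
(require '[clojure.core.async :as async]
         '[integrant.core :as ig])

;; Illustrative only: the real component wires up the transactor, watcher,
;; and subscriptions; here the component is just a map holding the tx queue.
(defmethod ig/init-key :fluree/standalone
  [_ {:keys [max-pending-txns] :or {max-pending-txns 16}}]
  {:tx-queue (async/chan max-pending-txns)})

(defmethod ig/halt-key! :fluree/standalone
  [_ {:keys [tx-queue]}]
  (async/close! tx-queue))
```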