
Introduce pluggable cluster backends #3642

Draft · wants to merge 15 commits into master
Conversation

@awelzel awelzel commented Mar 7, 2024

This is draft/RFC material. I'm happy to hear feedback on the terminology and whether people see anything conceptually off. Initially this work was meant to result in some documentation/prototype, but it's been reasonably straightforward. I could even see something like this going in, just so there's no large diff hanging around.

High-level

There's a new Cluster::publish() method that acts like Broker::publish(). Cluster::publish(), however, may use an alternative cluster backend if one is enabled. There are also new Cluster::subscribe() and Cluster::unsubscribe() bifs.

By default, these new cluster bifs use the existing broker implementation.
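As a sketch, publishing through the new bif could look like the following Zeek script. The event name, its argument, and the handler body are illustrative only and not taken from this PR; Cluster::worker_topic is assumed to be the usual cluster framework topic constant.

```zeek
# Hypothetical example event; not part of this PR.
global my_event: event(msg: string);

event my_event(msg: string)
	{
	print fmt("got: %s", msg);
	}

event zeek_init()
	{
	# Acts like Broker::publish(), but routes through whichever
	# cluster backend is enabled (Broker by default).
	Cluster::publish(Cluster::worker_topic, my_event, "hello");
	}
```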

However, it's now possible to redef Cluster::backend to another cluster backend. This PR provides one possible example using NATS. It's not decided that NATS will be Zeek's future cluster backend; it's just very easy to get set up with, and its C library is available as a package on Ubuntu 😅

Additionally, redef Cluster::serializer allows changing the encoding of Zeek events. The added implementation, BROKER_BIN_V1, directly leverages Broker's functions for serialization/deserialization.
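Assuming the backend and serializer are exposed as script-level enum values, switching both could then be a matter of two redefs. The exact identifiers below are guesses for illustration; only BROKER_BIN_V1 is named in the PR text.

```zeek
# Hypothetical enum names; the PR only mentions NATS as an example
# backend and BROKER_BIN_V1 as the added serializer.
redef Cluster::backend = Cluster::CLUSTER_BACKEND_NATS;
redef Cluster::serializer = Cluster::CLUSTER_SERIALIZER_BROKER_BIN_V1;
```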

The commit titles should give an impression of the steps involved. A summary of the new components:

  • CLUSTER_SERIALIZER

    • Serializing a cluster::detail::Zeek instance (basically FuncValPtr/EventHandler, zeek::Args and (todo) metadata) into a byte buffer
    • De-serializing byte buffers back into a cluster::detail::Zeek instance that can then be queued
    • There's one implementation which directly reuses the existing broker::data and Broker binary v1 format. Leveraging the JSON one should be straightforward.
  • CLUSTER_BACKEND

    • Mainly defines an API for publishing and subscribing.
    • Receives a CLUSTER_SERIALIZER instance upon construction.
    • Stored on zeek::cluster::backend which is used by the Cluster bifs.
    • No stores, no logging, just Zeek events

Missing

  • CI. This PR adds a NATS.io cluster backend, and that upsets baselines. It would also introduce a new optional dependency. Either the code is moved into a proper plugin, or it could be made opt-in and not built by default. Not sure. It's a bit sad not to build it by default, because a number of baselines fall apart 😢
  • Update existing Broker::publish() calls in the base scripts, unless they are Broker-specific (think acld or zeekctl, where the Broker Python bindings are involved).
  • Logging. @rsmmr mentioned logging could possibly be just an event. With the batching of individual log writes, that may be efficient enough (transporting the encoded batch as a zeek::StringVal that's deserialized at the receiver). One wouldn't handle that event in script land, but could set up a C++-based event handler directly. Could be an interesting debug channel, though.
  • Feedback

With the idea of an alternative cluster backend, we should not maintain cluster state within low-level Broker events.

Bit annoying doing the double loop, but hey.

Allows cluster backends control over the topics to which messages are published.

There are a few places where Broker::node_id() is used. Make this configurable by introducing Cluster::node_id(), which can be redefined by alternative cluster backends.
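A minimal sketch of what such a redef might look like; the function name and the returned identifier are made up for illustration:

```zeek
# Hypothetical: an alternative backend supplying its own node identifier
# in place of Broker::node_id().
function my_backend_node_id(): string
	{
	return "nats-node-1";
	}

redef Cluster::node_id = my_backend_node_id;
```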
Puts infrastructure in place to make pub-sub communication and serialization of events pluggable.

The manager is really just there to allow registration of components. It does not offer any functionality itself. A backend that is instantiated provides the actual functionality.

And use the zeek::cluster::backend instance rather than zeek::broker_mgr.

These are essentially the same as Broker::subscribe() and Broker::unsubscribe(), but depend on the enabled backend.

When Broker is the cluster backend, this makes no difference, but going through Cluster::subscribe() allows using alternative backends.
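A usage sketch, with an illustrative topic string:

```zeek
event zeek_init()
	{
	# Same shape as Broker::subscribe(), but dispatched to the
	# enabled cluster backend instead of Broker directly.
	Cluster::subscribe("zeek/example/topic");
	}
```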
This isn't great because we convert back and forth. If we settle on Backend::PublishEvent(), we should re-implement this more efficiently.

Do not raise node_up() and node_down() from Broker::peer_added() or Broker::peer_lost() if the cluster backend isn't actually Broker.

We could also move these into policy/backend/broker, but for now this seems okay.

All the event lookup and type conversion could perhaps be moved into a common base or helper class. But for now there's only one format.

This could also be an external plugin, in which case the policy scripts could be moved, too.
@awelzel force-pushed the topic/awelzel/plugable-cluster-backend branch from d972031 to 6273516 on March 7, 2024, 12:49
@awelzel commented Mar 13, 2024

Logging. @rsmmr mentioned logging could possibly be just an event. With the batching of individual log writes, that may be efficient enough (transporting the encoded batch as a zeek::StringVal that's deserialized at the receiver). One wouldn't handle that event in script land, but could set up a C++-based event handler directly. Could be an interesting debug channel, though.

Had chatted with Robin. In the end this appears more like a workaround, and using a dedicated message type seems more sensible. Potentially as a separate interface.

The round-robin/batching logic of log writes currently lives in broker/Manager. Lifting this logic into logging/Manager allows other transports to benefit as well — well, maybe not the round-robin part, as NATS, for example, has a queue feature that could be leveraged instead.
