
Introduce pluggable cluster backends #3642

Draft · wants to merge 15 commits into master
Conversation

@awelzel awelzel commented Mar 7, 2024

This is draft/RFC material. I'm happy to hear feedback on the terminology and whether people see anything conceptually off. Initially this work was meant to result in some documentation/prototype, but it's been reasonably straightforward. I could even see something like this going in, just so there's no large diff hanging around.

High-level

There's a new Cluster::publish() method that acts like Broker::publish(). Cluster::publish(), however, may use an alternative cluster backend if one is enabled. There are also new Cluster::subscribe() and Cluster::unsubscribe() bifs.

By default, these new cluster bifs use the existing broker implementation.
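As a sketch, publishing through the new bif could look like the following Zeek script. The event name, its argument, and the handler body are illustrative only and not taken from this PR; Cluster::worker_topic is assumed to be the usual cluster framework topic constant.

```zeek
# Hypothetical example event; not part of this PR.
global my_event: event(msg: string);

event my_event(msg: string)
	{
	print fmt("got: %s", msg);
	}

event zeek_init()
	{
	# Acts like Broker::publish(), but routes through whichever
	# cluster backend is enabled (Broker by default).
	Cluster::publish(Cluster::worker_topic, my_event, "hello");
	}
```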

However, it's now possible to redef Cluster::backend to another cluster backend. This PR provides one possible example using NATS. It's not decided that NATS will be Zeek's future cluster backend; it's just very easy to get set up with, and its C library is available as a package on Ubuntu 😅

Additionally, redef Cluster::serializer allows changing the encoding of Zeek events. The added implementation, BROKER_BIN_V1, directly leverages Broker's functions for serialization/deserialization.
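Assuming the backend and serializer are exposed as script-level enum values, switching both could then be a matter of two redefs. The exact identifiers below are guesses for illustration; only BROKER_BIN_V1 is named in the PR text.

```zeek
# Hypothetical enum names; the PR only mentions NATS as an example
# backend and BROKER_BIN_V1 as the added serializer.
redef Cluster::backend = Cluster::CLUSTER_BACKEND_NATS;
redef Cluster::serializer = Cluster::CLUSTER_SERIALIZER_BROKER_BIN_V1;
```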

The commit titles should give an impression of the steps involved. A summary of the new components:

  • CLUSTER_SERIALIZER

    • Serializing a cluster::detail::Zeek instance (basically FuncValPtr/EventHandler, zeek::Args and (todo) metadata) into a byte buffer
    • De-serializing byte buffers back into a cluster::detail::Zeek instance that can then be queued
    • There's one implementation which directly reuses the existing broker::data and Broker binary v1 format. Leveraging the JSON one should be straightforward.
  • CLUSTER_BACKEND

    • Mainly defines an API for publishing and subscribing.
    • Receives a CLUSTER_SERIALIZER instance upon construction.
    • Stored on zeek::cluster::backend which is used by the Cluster bifs.
    • No stores, no logging, just Zeek events

Missing

  • CI. This PR adds a NATS.io cluster backend, and that upsets baselines. It would also introduce a new optional dependency. Either the code is moved into a proper plugin, or it could be made opt-in and not built by default. Not sure. It's a bit sad not to build it by default, because a number of baselines fall apart 😢
  • Update existing Broker::publish() calls in the base scripts, unless they are Broker-specific (think acld or zeekctl, where the Broker Python bindings are involved).
  • Logging. @rsmmr mentioned logging could possibly be just an event. With the batching of individual log writes, that may be efficient enough (transporting the encoded batch as a zeek::StringVal that's deserialized at the receiver). One wouldn't handle that event in script land, but could set up a C++-based event handler directly. Could be an interesting debug channel, though.
  • Feedback

With the idea of an alternative cluster backend, we should not maintain cluster state within low-level Broker events.

Bit annoying doing the double loop, but hey.

Allows cluster backends control over the topics to which messages are published.

There are a few places where Broker::node_id() is used. Make this configurable by introducing Cluster::node_id(), which can be redefined by alternative cluster backends.
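A minimal sketch of what such a redef might look like; the function name and the returned identifier are made up for illustration:

```zeek
# Hypothetical: an alternative backend supplying its own node identifier
# in place of Broker::node_id().
function my_backend_node_id(): string
	{
	return "nats-node-1";
	}

redef Cluster::node_id = my_backend_node_id;
```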
Puts infrastructure in place to make pub-sub communication and serialization of events pluggable.

The manager is really just there to allow registration of components. It does not offer any functionality itself. A backend that is instantiated provides the actual functionality.

And use the zeek::cluster::backend instance rather than zeek::broker_mgr.

These are essentially the same as Broker::subscribe() and Broker::unsubscribe(), but depend on the enabled backend.

When Broker is the cluster backend, this makes no difference, but going through Cluster::subscribe() allows using alternative backends.
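A usage sketch, with an illustrative topic string:

```zeek
event zeek_init()
	{
	# Same shape as Broker::subscribe(), but dispatched to the
	# enabled cluster backend instead of Broker directly.
	Cluster::subscribe("zeek/example/topic");
	}
```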
This isn't great because we convert back and forth. If we settle on Backend::PublishEvent(), we should re-implement this more efficiently.

Do not raise node_up() and node_down() from Broker::peer_added() or Broker::peer_lost() if the cluster backend isn't actually Broker.

We could also move these into policy/backend/broker, but for now this seems okay.

All the event lookup and type conversion could perhaps be moved into a common base or helper class. But for now there's only one format.

This could also be an external plugin, in which case the policy scripts could be moved, too.
@awelzel force-pushed the topic/awelzel/plugable-cluster-backend branch from d972031 to 6273516 on March 7, 2024, 12:49
@awelzel commented Mar 13, 2024

Logging. @rsmmr mentioned logging could possibly be just an event. With the batching of individual log writes, that may be efficient enough (transporting the encoded batch as a zeek::StringVal that's deserialized at the receiver). One wouldn't handle that event in script land, but could set up a C++-based event handler directly. Could be an interesting debug channel, though.

Had chatted with Robin. In the end this appears more like a workaround, and using a dedicated message type seems more sensible. Potentially as a separate interface.

The round-robin/batching logic of log writes currently lives in broker/Manager. Lifting this logic into logging/Manager allows other transports to benefit as well — well, maybe not the round-robin part, as NATS, for example, has a queue feature that could be leveraged instead.
