Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Operator proposal - align closely with Gateway API #52

Closed
wants to merge 1 commit into from

Conversation

robobario
Copy link

No description provided.

Signed-off-by: Robert Young <[email protected]>
@robobario robobario changed the title Draft: Add operator design Draft: Operator proposal Feb 17, 2025
@robobario robobario changed the title Draft: Operator proposal Draft: Operator proposal - align closely with Gateway API Feb 17, 2025
Copy link
Member

@SamBarker SamBarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the hard work on this @tombentley & @robobario.

I was initially quite keen on the idea of aligning with GatewayAPI but reading this and comparing it with #51 I'm seeing a lot of compromises between what the GatewayAPI spec wants and what we want to offer. Thats both in terms of difficulty expressing things in a gateway API compliant way and in exposing Kafka the way we want to.

What I'm not clear on at this stage is how many of the compromises outlined here actually carry over to #51 because you've done a lot more detailed work on this.

name: my-tls-policy
spec:
targetRefs:
- kind: Service
Copy link
Contributor

@k-wall k-wall Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes the backend is represented by a service. This might not be the case if the backend is off-cluster or a cloud.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use an ExternalName service but it's a bit indirect yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://kubernetes.io/docs/concepts/services-networking/service/#externalname doesn't include a port. I think we'd need our reference object.

- advertisedBrokerIds: [ 1,2,3,4,5,6,7,8,9 ]
brokerIdentityMapping: identity | hash # to be added eventually, default is identity
hostnames:
# When a hostnamePattern ends with the cluster/namespace
Copy link
Contributor

@k-wall k-wall Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using naming convention to let the user hint the exposition behaviour: cluster ip, load balancer or openshift routes. What if we want to extend this to exposing the cluster using a TLSRoute of a real Gateway? I don't think this scales.

config:
kms: Foo
kmsConfig:
myApiTokenFile: $(secretFile:secretName:subPath) # secret will be mounted into container and config replaced with path to secret
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this idea passes the 'good enough' test.

certificateFile: /opt/kroxylicious/server/key-material-2/tls.crt
```

Then the Proxy will be responsible for determining which certificate to use for the virtual cluster based on the SNI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to remember to make sure the intermediate certificates too.

# When a hostnamePattern ends with the cluster/namespace
# domain we'd create ClusterIP Services automatically. Else
# we expose a single LoadBalancer service.
- foo-%.svc.ns.cluster.local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the operator should publish the bootstrap addresses to the status section so a kubernetes based client app can read the bootstrap information for itself.

- Known issues with haproxy getting restarted every time a new Route is created = dropped Kafka connections. However
they seem to be fixing that (https://issues.redhat.com/browse/NE-879)!

We could have a configurable list of ingress domains in KafkaGatewayParameters, then when we find a hostname that is a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is somewhat unfortunate to need to duplicate the ingressdomain. If the ingressdomain changes for whatever, the configuration in the Kroxylicious CRs would be stale.

`my-ingress-route-broker-1.<ingress-domain>`
`my-ingress-route-broker-2.<ingress-domain>`

Then the user can bootstrap with `my-ingress-route-bootstrap.<ingress-domain>:443`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's slightly weird. The KafkaGateway's port is ignored in this case.

Because the hostname is not suffixed with `svc.cluster.local` the operator will infer that we want to expose this
hostname off-cluster.

The operator will create a LoadBalancer Service named `my-route` mapping port 9092 to all the Proxy instances
Copy link
Contributor

@k-wall k-wall Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes you want to make hints to the loadbalancer to control its behaviour. This is done via the Service's annotations.

We've got the infrastructure: field at the Gateway, but that will apply to all the resources created by the operator. AFAICS, we have no way to target an annotation at a particular resource.

We have the same issue with OpenShift Routes. Sometimes you want extra labels on those (https://docs.openshift.com/container-platform/4.17/networking/configuring_ingress_cluster_traffic/configuring-ingress-cluster-traffic-ingress-controller.html#nw-traditional-sharding_configuring-ingress-cluster-traffic-ingress-controller). Again, we haven't got a way to apply those in a precise way.

port: 9093
```

The Operator will manifest this as two virtual clusters in the proxy configuration, both with 9092 as the bootstrap
Copy link
Contributor

@k-wall k-wall Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want to fix this so that the proxy has the ability to present n listeners on each virtualcluster.

@k-wall
Copy link
Contributor

k-wall commented Feb 17, 2025

Thank you, this is a really through write up of this idea.

@k-wall
Copy link
Contributor

k-wall commented Feb 17, 2025

I too initially believed that modelling our API based on the Gateway API was the way to go. I've changed my mind on that over the last few weeks. Whilst I think the proposal is implementable and would function, I worry the result would difficult to use.

Thinking about my biggest areas of concern:

  1. I don't like using the suffix of the hostname field to decide which sort of ingress to create. It works for ClusterIP and LoadBalancer but it does really extend to other forms of ingress. For OpenShift Routes, we'd need to hardcode router addresses (which might change). I also have doubts about how it work would with a real Gateway API. How could our operator know whether to follow its LoadBalancer or Gateway API path? It'd wouldn't have sufficient information.
  2. KafkaGateway has a port number, but in the case of OpenShift Routes or real Gateway TLSRoute, we'd ignore it. The port number is owned by the Ingress Operator (443) or (real) Gateway Operator.
  3. It seems common for ingress behaviours to controlled via labels or annotations. Whilst we can apply labels via the infrastructure field in the KafkaGateway, but we've got no way to be specific and just apply a label to say, the ClusterIP's service.

@robobario
Copy link
Author

Closing, we have decided #51 has fewer compromises and we will go in that direction

@robobario robobario closed this Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants