
Improve validation of PodCIDR and ServiceClusterIPRange #15623

Merged
merged 1 commit into kubernetes:master on Jul 19, 2023

Conversation

johngmyers
Member

Fixes #15034

Removes the requirement (on non-GCE) that the ServiceClusterIPRange be within NonMasqueradeCIDR, as kube-proxy reroutes the ServiceClusterIPRange before masquerading.

Prohibits overlap between the PodCIDR and the ServiceClusterIPRange

Prohibits overlap between either the PodCIDR or the ServiceClusterIPRange and any subnet CIDR or IPv6CIDR (except that there is no IPv6 PodCIDR, so that particular overlap can't occur).

Adds validation of any PodCIDR and requires that it be within any NonMasqueradeCIDR.
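For context on what these rules check, here is a minimal sketch using only Go's standard net package. The helper names (cidrsOverlap, cidrWithin) and the serviceClusterIPRange value are illustrative and are not the actual kops validation code:

```go
package main

import (
	"fmt"
	"net"
)

// cidrsOverlap reports whether two CIDR blocks share any addresses.
// Well-formed CIDR blocks are either disjoint or nested, so it is enough
// to test whether either block contains the other's network address.
func cidrsOverlap(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// cidrWithin reports whether inner is fully contained in outer: outer must
// contain inner's network address and have a prefix no longer than inner's.
func cidrWithin(inner, outer *net.IPNet) bool {
	innerOnes, _ := inner.Mask.Size()
	outerOnes, _ := outer.Mask.Size()
	return outer.Contains(inner.IP) && innerOnes >= outerOnes
}

func main() {
	// NonMasqueradeCIDR and PodCIDR values match the test fixture quoted
	// later in this thread; the serviceClusterIPRange is a made-up example.
	_, nonMasqueradeCIDR, _ := net.ParseCIDR("100.64.0.0/10")
	_, podCIDR, _ := net.ParseCIDR("100.96.0.0/11")
	_, serviceClusterIPRange, _ := net.ParseCIDR("100.64.0.0/13")

	fmt.Println(cidrsOverlap(podCIDR, serviceClusterIPRange)) // false: disjoint ranges are allowed
	fmt.Println(cidrWithin(podCIDR, nonMasqueradeCIDR))       // true: PodCIDR sits inside NonMasqueradeCIDR
}
```

Per the list above, subnet CIDRs and IPv6CIDRs would be checked against both ranges with the same kind of overlap test.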

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/api labels Jul 12, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 12, 2023
@johngmyers
Member Author

/retest

@johngmyers
Member Author

/cc @justinsb

@@ -460,6 +461,7 @@ func Test_Validate_AdditionalPolicies(t *testing.T) {
Networking: kops.NetworkingSpec{
NetworkCIDR: "10.10.0.0/16",
NonMasqueradeCIDR: "100.64.0.0/10",
PodCIDR: "100.96.0.0/11",
Member

Shouldn't PodCIDR and ServiceClusterIPRange both be part of NonMasqueradeCIDR?

Member Author

ServiceClusterIPRange doesn't have to be part of NonMasqueradeCIDR. That is because only things within the cluster network can talk to ServiceClusterIPRange and kube-proxy reroutes ServiceClusterIPRange before masquerading. GCE currently doesn't require ServiceClusterIPRange to be in NonMasqueradeCIDR; this PR drops that validation for all of the other cloud providers as well.

Member

In this case, is the NonMasqueradeCIDR still used for anything? It was dropped by kubelet when the Docker shim was removed.

Member Author

It's used by ContainerdBuilder and AWSCloudControllerManagerOptionsBuilder. It is also used as an exclusion for the egress proxy.

Member Author

The use by AWSCloudControllerManagerOptionsBuilder looks to me like it's a bug. It should be using the PodCIDR instead.

Member
@hakman commented Jul 19, 2023

Yup, same thought. Not sure about the other places, but I suspect we may get rid of it also.
The PR looks ok as is. We can merge it and see about NonMasqueradeCIDR later. Your choice.

Member Author

We can handle AWSCloudControllerManagerOptionsBuilder as a separate PR. That one we might want to backport.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 19, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 19, 2023
@hakman
Member

hakman commented Jul 19, 2023

/retest

@k8s-ci-robot k8s-ci-robot merged commit d5c2458 into kubernetes:master Jul 19, 2023
8 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jul 19, 2023
@johngmyers johngmyers deleted the service-ip-range branch July 19, 2023 14:02
@minkimipt
Contributor

We are running a number of production clusters on AWS that were set up with kops 1.18 or even earlier. We've been upgrading kops and Kubernetes over the years and are currently on kops 1.27 and Kubernetes 1.25. We are expanding our infrastructure into other regions and aim to keep our older clusters upgraded too. We recently kicked off creation of new clusters, which we do directly with kops 1.28, following the same subnetting scheme that we have applied to all of our clusters, but we've hit some problems in kops 1.28, which leave us in doubt about how to proceed. Here's the particular problem we are facing when trying to create a new cluster:

Error: completed cluster failed validation: spec.networking.serviceClusterIPRange: Forbidden: serviceClusterIPRange "10.22.128.0/17" must not overlap podCIDR "10.22.128.0/17"

Those subnets indeed overlap, but that wasn't an issue until kops 1.28. We've been able to track the change down to this PR, which introduced the serviceClusterIPRange check:

			if subnet.Overlap(podCIDR, serviceClusterIPRange) {
				allErrs = append(allErrs, field.Forbidden(fldPath.Child("serviceClusterIPRange"), fmt.Sprintf("serviceClusterIPRange %q must not overlap podCIDR %q", serviceClusterIPRange, podCIDR)))
			}

Since our clusters used to work well while those subnets were overlapping, could we make this check optional? Is there any risk in adding a command-line parameter that disables this check? Will we hit any issues in kops 1.28 and later, or in k8s 1.28, if this check is not done?

@johngmyers may I ask you to elaborate on the reason for this change?

@justinsb
Member

justinsb commented Feb 10, 2024

@minkimipt thanks for reporting and sorry about the issue. IIRC, when we merged this we believed this configuration was not supported / caused other problems. However, as it's a regression and existing clusters at least appear to be working fine, I think we should reduce this to a warning / disable it entirely while we track down exactly what goes wrong when these CIDRs overlap and, if it is indeed a problem, figure out how to move the CIDRs to be non-overlapping. I propose we track this as #16340

@ebdekock

> @minkimipt thanks for reporting and sorry about the issue. IIRC, when we merged this we believed this configuration was not supported / caused other problems. However, as it's a regression and existing clusters at least appear to be working fine, I think we should reduce this to a warning / disable it entirely while we track down exactly what goes wrong when these CIDRs overlap and, if it is indeed a problem, figure out how to move the CIDRs to be non-overlapping. I propose we track this as #16340

@justinsb @johngmyers we are running into the same issue, except the overlap is with the subnet CIDR rather than the PodCIDR. We are trying to update a cluster where the subnet and service IP range overlap, and we are hitting this validation, which prevents the update:

https://github.com/kubernetes/kops/pull/15623/files#diff-ae412ac68b83570fd50e9d5d63873f060eb8f503f5e44be78d710076294c3285R623-R625

I have the same questions here:

> Since our clusters used to work well while those subnets were overlapping, could we make this check optional? Is there any risk in adding a command-line parameter that disables this check? Will we hit any issues in kops 1.28 and later, or in k8s 1.28, if this check is not done?

I see the pod overlap validation has been reverted in a1bba9d.
Is the subnet case different, or would the same revert be possible there?

@minkimipt
Contributor

@ebdekock we were able to upgrade kops to 1.28.5, where that validation was disabled, in a cluster where we had those networks overlapping. Just to confirm that it's no longer a problem.

@ebdekock

> @ebdekock we were able to upgrade kops to 1.28.5, where that validation was disabled, in a cluster where we had those networks overlapping. Just to confirm that it's no longer a problem.

I was waiting for the 1.28.5 release, as I thought it would also fix my issue, but our overlap is with the subnet CIDR and not the PodCIDR.

@johngmyers
Member Author

The issue is that if the podCIDR and serviceClusterIPRange overlap, then a pod could be assigned the same IP as a service. Traffic intended for the pod would then be routed to the wrong place.

Similarly, if the subnet range overlaps with the serviceClusterIPRange, then a node could be assigned the same IP as a service, causing routing problems for host-network pods.

I would say the "appear to be working fine" is merely that problems are rarely detected, despite being there. I would not be in favor of weakening the validations and would suggest the clusters' networking configurations be fixed.
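As a rough illustration of the collision described above, here is a small Go sketch using the overlapping CIDRs from the earlier report; the specific address is made up for the example:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Overlapping values from the report above: podCIDR and
	// serviceClusterIPRange are the same /17, so every address in the
	// block is eligible to be handed out twice.
	_, podCIDR, _ := net.ParseCIDR("10.22.128.0/17")
	_, serviceRange, _ := net.ParseCIDR("10.22.128.0/17")

	// Hypothetical address chosen for illustration.
	ip := net.ParseIP("10.22.200.14")

	// The CNI could assign this IP to a pod while the apiserver assigns it
	// as a Service ClusterIP; kube-proxy's service rules would then capture
	// traffic that was meant for the pod.
	fmt.Println(podCIDR.Contains(ip), serviceRange.Contains(ip)) // true true
}
```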

@ebdekock

> I would say the "appear to be working fine" is merely that problems are rarely detected, despite being there. I would not be in favor of weakening the validations and would suggest the clusters' networking configurations be fixed.

Yup, that makes total sense and we agree that fixing the underlying issue is the best way forward. We would like to get the cluster updated and need to prioritize getting the networking fixed. Would it be possible to get this validation behind a temporary flag (that's off by default) to give us time to correct the issue?

@ebdekock

> I would say the "appear to be working fine" is merely that problems are rarely detected, despite being there. I would not be in favor of weakening the validations and would suggest the clusters' networking configurations be fixed.

We have fixed the networking like you suggested and no longer need this, thanks!

Successfully merging this pull request may close these issues: Adjust ServiceClusterIPRange validation restrictions