Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ip-reconciler: Use ContainerID instead of PodRef #180

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

xagent003
Copy link
Contributor

Fixes issue: #176

@xagent003 xagent003 force-pushed the private/arjun/useCIDnotPodRef branch 2 times, most recently from 5f03ce5 to 4b04987 Compare January 20, 2022 20:56
@xagent003
Copy link
Contributor Author

@dougbtv @maiqueb Can you please review?

One of the failing testcases did not make sense, but I fixed it regardless by using container ID. It is negative testcase that checks whether the ip-reconciler fails to delete a reservation because the podRef name does not match.

But this could never be the actual case, because the orphanedIP Reservation object is formed from the IPPool itself. I instead changed the testcase so now the podRef names are the same, but the container ID is different, because my fix changes the MatchingFunc to use containerID instead of podName.

@xagent003 xagent003 force-pushed the private/arjun/useCIDnotPodRef branch 14 times, most recently from a48b24c to 8d22508 Compare January 21, 2022 04:19
Copy link

@roopakparikh roopakparikh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this PR to my to-do list, I'll get back to you next week.

What about the unit test currently failing on this PR ?
I think you need to add the containerID to this remainingAllocation allocation map

remainingAllocation := map[string]v1alpha1.IPAllocation{

EDIT: couldn't help noticing there's also something wrong with the vendoring ... You'll need to sort that out as well.

@@ -289,7 +291,8 @@ func generateIPPoolSpec(ipRange string, namespace string, poolName string, podNa
allocations := map[string]v1alpha1.IPAllocation{}
for i, podName := range podNames {
allocations[fmt.Sprintf("%d", i+1)] = v1alpha1.IPAllocation{
PodRef: fmt.Sprintf("%s/%s", namespace, podName),
PodRef: fmt.Sprintf("%s/%s", namespace, podName),
ContainerID: strconv.Itoa(rand.Intn(1000)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please run gofmt before pushing; there's something wrong with the formatting on this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code looks good; I honestly do not understand right now why I hadn't matched using the containerID instead.

Let's wait for the follow up work :)

Good catch.

func findOutPodRefsToDeallocateIPsFrom(orphanedIP OrphanedIPReservations) []string {
var podRefsToDeallocate []string
func findOutContainerIDsToDeallocateIPsFrom(orphanedIP OrphanedIPReservations) []string {
var cidsToDeallocate []string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rename this variable. All the time I've seen cid my brain translated it to CIDR and I couldn't understand what was happening. I suggest using containerID , which makes it crystal clear.

@@ -289,7 +291,8 @@ func generateIPPoolSpec(ipRange string, namespace string, poolName string, podNa
allocations := map[string]v1alpha1.IPAllocation{}
for i, podName := range podNames {
allocations[fmt.Sprintf("%d", i+1)] = v1alpha1.IPAllocation{
PodRef: fmt.Sprintf("%s/%s", namespace, podName),
PodRef: fmt.Sprintf("%s/%s", namespace, podName),
ContainerID: strconv.Itoa(rand.Intn(1000)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@maiqueb
Copy link
Collaborator

maiqueb commented Jan 26, 2022

@TothFerenc would you review this PR as well, since you're introduced the pod ref attribute to the model.

I'm also thinking of replacing the whole reconciler pod / cron by a controller that would listen to pod deletions, and assure the allocations for the deleted pods are indeed gone.

If / once this route holds, we can drop the reconciler, and, for instance, merge your bash-based alternative for it.

@xagent003
Copy link
Contributor Author

xagent003 commented Mar 29, 2022

@dougbtv @maiqueb Took care of comments and gofmt, rebased to top of tree upstream

@xagent003 xagent003 force-pushed the private/arjun/useCIDnotPodRef branch from f7d884c to f5a64e6 Compare March 30, 2022 03:14
Copy link
Collaborator

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I worry about this is upgrade: let's say you have a running cluster with an existing IPPool. We then upgrade whereabouts to store the allocations (and garbage collect them) based on the container ID. Wouldn't we garbage collect the entire pool ?

My point is... don't we need some sort of an upgrade out of band tool to re-generate the pool, porting it to the "new" model ?

@maiqueb
Copy link
Collaborator

maiqueb commented Apr 1, 2022

BTW: you should rebase this with latest master. It will fix the unit test issue you're currently facing, and give us results.

@xagent003
Copy link
Contributor Author

One thing I worry about this is upgrade: let's say you have a running cluster with an existing IPPool. We then upgrade whereabouts to store the allocations (and garbage collect them) based on the container ID. Wouldn't we garbage collect the entire pool ?

My point is... don't we need some sort of an upgrade out of band tool to re-generate the pool, porting it to the "new" model ?

@maiqueb The storage model doesnt change nor the logic to determine what is a stale reservation in first place (that still uses podRef, since getting the pause Container ID from a Pod resource is impossible).

This is just the matching function that looks for reservations to delete AFTER on the already computed orphaned reservations list. The bug was that it tried to match an already computed orphaned reservation to a IP reservation in the IPPool and used the podRef in the matching function

We've already upgraded and used this fix, FYI in our custom whereabouts release.

@dougbtv
Copy link
Member

dougbtv commented Jun 23, 2022

@xagent003 do you know if the reconciliation control loop covers this one appropriately? It might.

If it doesn't, mind rebasing it and we can take another look? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants