Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

neonvm-controller: make IPAM code production-ready #1278

Merged
merged 11 commits into from
Mar 7, 2025
Merged

Conversation

Omrigan
Copy link
Contributor

@Omrigan Omrigan commented Feb 21, 2025

Split into commits, to be merged with rebase.

Fixes #1281

Copy link

github-actions bot commented Feb 21, 2025

No changes to the coverage.

HTML Report

Click to open

@cicdteam
Copy link
Contributor

I haven't gone into the changes in depth, but I just want to ask what issues are being addressed?

@Omrigan
Copy link
Contributor Author

Omrigan commented Feb 23, 2025

I haven't gone into the changes in depth, but I just want to ask what issues are being addressed?

Key issues are:

  1. Leader election times out when the rate of allocations is significant.
  2. When a release of an IP fails, it is not retried, so we have 13k stale allocations.
  3. k8s client is recreated on every request.

Plus a few other things.

@Omrigan Omrigan changed the title neonvm-controller: overhaul IPAM neonvm-controller: make IPAM code production-ready Feb 23, 2025
Base automatically changed from oleg/migration-tests to main February 23, 2025 21:42
@Omrigan Omrigan force-pushed the oleg/ipam-fixes branch 2 times, most recently from 49ef878 to 382bcfa Compare February 25, 2025 12:00
@Omrigan Omrigan marked this pull request as ready for review February 25, 2025 12:04
@Omrigan Omrigan force-pushed the oleg/ipam-fixes branch 5 times, most recently from e675b7b to ffed348 Compare February 25, 2025 16:14
Copy link
Member

@sharnoff sharnoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, just skimming through -- broadly looks good, need to dig a little deeper into the new core of the ipam actions.

One thing that I'm not sure about: What happens if we get a lot of reconcile operations simultaneously waiting on ipam operations? Should we have some mechanism to immediately bail if there's too many other threads also waiting? (so that, under load, other operations don't completely grind to a halt as well)

@Omrigan Omrigan force-pushed the oleg/ipam-fixes branch 2 times, most recently from 185af9f to 0176fbe Compare February 27, 2025 12:59
Copy link
Contributor Author

@Omrigan Omrigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied suggestions from Em

@sharnoff
Copy link
Member

(assigned you for now @Omrigan, please feel free to assign back to me after resolving comments if you think it's ready for another round of review)

@Omrigan Omrigan force-pushed the oleg/ipam-fixes branch 2 times, most recently from b7c36e5 to 0b620bd Compare March 3, 2025 11:29
@Omrigan Omrigan assigned sharnoff and unassigned Omrigan Mar 3, 2025
Copy link
Member

@sharnoff sharnoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thorough review, lots of comments -- broadly looks ok. If there's smaller commits that I haven't commented on, I'm happy to approve them separately if that would help (reducing surface area of the diff, etc.)

@Omrigan Omrigan assigned Omrigan and unassigned sharnoff Mar 4, 2025
Omrigan added a commit that referenced this pull request Mar 5, 2025
Leader election is too heavy to run it on every IP allocation.

We can achieve the same with a leader election on neonvm-controller +
a local mutex.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 6, 2025
We must not leak IP addresses. The release will be retried.

Also, we have to make it idempotent.

Signed-off-by: Oleg Vasilev <[email protected]>
@Omrigan Omrigan disabled auto-merge March 6, 2025 22:20
Omrigan added a commit that referenced this pull request Mar 6, 2025
We must not leak IP addresses. The release will be retried.

Also, we have to make it idempotent.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added 11 commits March 7, 2025 11:59
Leader election is too heavy to run it on every IP allocation.

We can achieve the same with a leader election on neonvm-controller +
a local mutex.

Signed-off-by: Oleg Vasilev <[email protected]>
We will know if it is broken when we try to use it.

Signed-off-by: Oleg Vasilev <[email protected]>
Non-functional change.

Signed-off-by: Oleg Vasilev <[email protected]>
Two things:
1. Use NamespacedName.
2. Make the loop less verbose

Signed-off-by: Oleg Vasilev <[email protected]>
This supresses the following warning, which would be introduced by the
next commit:
 exitAfterDefer: os.Exit will exit, and `defer ipam.Close()` will not run

Signed-off-by: Oleg Vasilev <[email protected]>
There is no need to create it for every request.

Signed-off-by: Oleg Vasilev <[email protected]>
Non-functional change.

Preparation to have a cleanup action.

Signed-off-by: Oleg Vasilev <[email protected]>
The previous version is marked deprecated.

Signed-off-by: Oleg Vasilev <[email protected]>
Otherwise all reconcilliation workers can be stuck

Signed-off-by: Oleg Vasilev <[email protected]>
We must not leak IP addresses. The release will be retried.

Also, we have to make it idempotent.

Signed-off-by: Oleg Vasilev <[email protected]>
@Omrigan Omrigan enabled auto-merge (rebase) March 7, 2025 14:14
@Omrigan Omrigan merged commit 2723e60 into main Mar 7, 2025
35 checks passed
@Omrigan Omrigan deleted the oleg/ipam-fixes branch March 7, 2025 14:15
Omrigan added a commit that referenced this pull request Mar 7, 2025
Leader election is too heavy to run it on every IP allocation.

We can achieve the same with a leader election on neonvm-controller +
a local mutex.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
We will know if it is broken when we try to use it.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
Non-functional change.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
Two things:
1. Use NamespacedName.
2. Make the loop less verbose

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
This supresses the following warning, which would be introduced by the
next commit:
 exitAfterDefer: os.Exit will exit, and `defer ipam.Close()` will not run

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
There is no need to create it for every request.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
Non-functional change.

Preparation to have a cleanup action.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
Omrigan added a commit that referenced this pull request Mar 7, 2025
The previous version is marked deprecated.

Signed-off-by: Oleg Vasilev <[email protected]>
Omrigan added a commit that referenced this pull request Mar 7, 2025
Otherwise all reconcilliation workers can be stuck

Signed-off-by: Oleg Vasilev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

neonvm-controller: rework IPAM
3 participants