-
Notifications
You must be signed in to change notification settings - Fork 183
[ARO-9395] FixSSH refactoring for CPMS enablement #4177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[ARO-9395] FixSSH refactoring for CPMS enablement #4177
Conversation
- Refactoring fixssh to work with CPMS. - Upgraded armnetwork package to v6.2.0 to get azfake.PagerResponder[armnetwork.InterfacesClientListResponse] for mocking InterfacesClient.NewListPager. - Refactoring fixssh tests to use mocked pager responses for various cases.
- Imports auto-updated and armnetwork mocks regenerated.
- Refactored fixssh_test to move client option for mock interface server into function to remove duplicate code and some cleanup. - Minor refactoring for fixssh.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks extensive, nothing that leaps out at me red flag wise. Would be a good one to go through code review wise IMO
armnetwork "github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v2" | ||
armnetwork0 "github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we loading in two version of the armnetwork library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did test upgrading the armnetwork package from v2 to v6 everywhere in the code base and didn't hit any issues, but figured that is a change that might be better in a separate PR. I'm pretty sure it's fully backward compatible.
The main reason for using the newer armnetwork v6 package is to get the fake server implementations, which make it easier to mock the InterfacesClient interface. The fake servers allow us to return custom responses to API requests with the same behaviour as the real Azure APIs.
pkg/cluster/fixssh_test.go
Outdated
"github.com/Azure/azure-sdk-for-go/sdk/azcore/policy" | ||
"github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime" | ||
"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v6" | ||
fake_armnetwork "github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v6/fake" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the fake_armnetwork section refer to what that library will be referenced as in code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's an import alias to the fake package name, which is just "fake". Using an alias makes it clear which fake package we're referring to.
pkg/cluster/fixssh.go
Outdated
func (m *manager) checkAndUpdateNICsInResourceGroup(ctx context.Context, resourceGroup string, infraID string, lb *mgmtnetwork.LoadBalancer) (err error) { | ||
masterSubnetID := m.doc.OpenShiftCluster.Properties.MasterProfile.SubnetID | ||
pagerOpts := &armnetwork.InterfacesClientListOptions{} | ||
pager := m.armInterfaces.NewListPager(resourceGroup, pagerOpts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have traditionally constructed wrappers around clients which will fetch all the resources rather than paging through them.
I don't know if we're doing this today still, but can/should we add an Addon to the interfaces client we have today?
Something like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10-4. I'll refactor that bit and add the wrapper as per the example you linked to.
- Adding new InterfacesClient.List wrapper method InterfacesClient.NewListPager and refactored fixssh code to use that. - Refactored fixssh mocks to return lists of interfaces directly instead of via mocked pager.
Which issue this PR addresses:
Fixes https://issues.redhat.com/browse/ARO-9395
What this PR does / why we need it:
Running a CPMS update of a cluster breaks SSH connectivity to the master nodes. This PR refactors the FixSSH admin update step to add the master NICs to the correct SSH backend pools after the update. In addition, we are adding the capability to find and delete any orphaned NICs from the previous master nodes.
There are some differences in how the NICs for a public vs private cluster are updated by the machine API. For public clusters, the new NICs are not added to the SSH backend pools at all. For the private clusters, all 3 new NICs are added to the ssh-0 backend address pool. The refactoring covers both cases.
Test plan for issue:
The changes have been tested against private and public dev clusters. This branch will be tested in canary against prod public and private clusters as well.
make test-go is passing all tests.
Is there any documentation that needs to be updated for this PR?
N/A
How do you know this will function as expected in production?
Production testing in the canary environment.