-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Daemon redesign - using controller-runtime #788
base: master
Are you sure you want to change the base?
Conversation
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
9f5207e
to
7d4e946
Compare
e17b584
to
ddc70f2
Compare
929886d
to
9377404
Compare
79cc69b
to
dc6eb11
Compare
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
It's only a one time run script so it should not be a problem to run with debug logs. Signed-off-by: Sebastian Sch <[email protected]>
dc6eb11
to
50ad79a
Compare
Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
50ad79a
to
f369b65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor comments. This PR hugely improves the config-daemon shape.
nodeStateDrainAnnotationCurrent, nodeStateExist, err := dr.ensureAnnotationExists(ctx, nodeNetworkState, constants.NodeStateDrainAnnotationCurrent) | ||
nodeStateDrainAnnotationCurrent, currentNodeStateExist, err := dr.ensureAnnotationExists(ctx, nodeNetworkState, constants.NodeStateDrainAnnotationCurrent) | ||
if err != nil { | ||
reqLogger.Error(err, "failed to ensure nodeStateDrainAnnotation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reqLogger.Error(err, "failed to ensure nodeStateDrainAnnotation") | |
reqLogger.Error(err, "failed to ensure nodeStateDrainAnnotationCurrent") |
can you also fix on L123 with nodeStateDrainAnnotation
-> nodeDrainAnnotation
?
log.Log.V(2).Info("ReadSriovResult(): file does not exist, return empty result") | ||
return &SriovResult{}, nil | ||
return nil, false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not return an empty SriovResult anymore. who's right, the log or the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code always! :P
return false, nil | ||
} | ||
|
||
func (dn *DaemonReconcile) getHostNetworkStatus(nodeState *sriovnetworkv1.SriovNetworkNodeState) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please rename to updateStatusFromHost()
or something similar.
a get*()
function does not sound like something that changes the passed parameters.
var sriovResult = &systemd.SriovResult{SyncStatus: consts.SyncStatusSucceeded, LastSyncError: ""} | ||
dn.desiredNodeState, err = dn.sriovClient.SriovnetworkV1().SriovNetworkNodeStates(vars.Namespace).Get(context.Background(), vars.NodeName, metav1.GetOptions{}) | ||
// if we are running in systemd mode we want to get the sriov result from the config-daemon that runs in systemd | ||
sriovResult, exist, err := dn.checkSystemdStatus(ctx, desiredNodeState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sriovResult, exist, err := dn.checkSystemdStatus(ctx, desiredNodeState) | |
sriovResult, sriovResultExists, err := dn.checkSystemdStatus(ctx, desiredNodeState) |
Renaming this variable might be useful when reading around L235, where it's not obvious to know what "exist" refers to.
@@ -1,290 +0,0 @@ | |||
package daemon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to see this file go away! :)
c := current.Status.DeepCopy().Interfaces | ||
d := desiredNodeState.Status.DeepCopy().Interfaces | ||
for idx := range d { | ||
// check if it's a new device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// check if it's a new device | |
if idx >= len(c) { | |
return true, nil | |
} | |
// check if it's a new device |
I don't know if it can ever happen (maybe if a new device is hot-plugged between reconcile loops). But it's safer to check for slice length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but in theory (not sure if it's possible) you can have a new device like hot-unplug and after that hot-plug so the len will be the same but the device is not no?
return nil | ||
} | ||
|
||
func (dn *DaemonReconcile) shouldUpdateStatus(current, desiredNodeState *sriovnetworkv1.SriovNetworkNodeState) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this function returns no error. can get rid of the second return parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review
@@ -108,14 +108,7 @@ type DrainStateAnnotationPredicate struct { | |||
} | |||
|
|||
func (DrainStateAnnotationPredicate) Create(e event.CreateEvent) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about removing the Create fn ?
the default of predicate.Funcs is true.
is object being nil a real concern here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the concern is not nil. without this when a new node is added to the cluster the reconcile will not get called without this.
and if the reconcile is not created no one adds the labels to make the drain possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this when a new node is added to the cluster the reconcile will not get called without this
are you sure ? i thought by default it will call reconcile since predicate.Funcs contains implementation that will return true. which means to enqueue event for reconcile.
// creates the in-cluster config | ||
config, err = rest.InClusterConfig() | ||
func getOperatorConfig(kClient runtimeclient.Client) (*sriovnetworkv1.SriovOperatorConfig, error) { | ||
// Init feature gates once to prevent race conditions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unrelated comment
|
||
refreshCh := make(chan daemon.Message) | ||
defer close(refreshCh) | ||
func UseKubeletKubeConfig() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it need to start with upper case ?
|
||
// Setup reconcile loop with manager | ||
if err = dm.SetupWithManager(mgr); err != nil { | ||
setupLog.Error(err, "unable to create setup daemon manager for SriovNetworkNodeState") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unable to setup daemon with manager for SriovNetworkNodeState
} | ||
|
||
type Daemon struct { | ||
type DaemonReconcile struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DaemonReconciler ? or better yet, NodeReconciler ?
ns := &sriovnetworkv1.SriovNetworkNodeState{} | ||
// init openstack info | ||
if vars.PlatformType == consts.VirtualOpenStack { | ||
ns, err = dn.HostHelpers.GetCheckPointNodeState() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we limit the usage of nodestate from checkpoint to just this block ? i.e define a new var.
it seems odd that other platform types we dont load from checkpoint during init (which initializes InitialState
global var
|
||
// save init state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: // save init state if it does not exist on host
log.Log.Error(err, "nodeStateSyncHandler(): Failed to fetch node state", "name", vars.NodeName) | ||
return err | ||
reqLogger.Error(err, "failed to check systemd status unexpected error") | ||
return ctrl.Result{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont want to fail here ?
<-dn.syncCh | ||
return nil | ||
} | ||
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: lets define requeueTime as a const
return err | ||
} | ||
postNetworkServiceEnabled, err := dn.HostHelpers.IsServiceEnabled(systemd.SriovPostNetworkServicePath) | ||
// if there are no node drifted changes, and we are on the latest applied policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: // if there are no host state drift changes, ...
} | ||
reqDrain = reqDrain || systemdConfModified || !exist | ||
// require reboot if drain needed for systemd mode | ||
reqReboot = reqReboot || systemdConfModified || reqDrain || !exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can drop : || systemdConfModified
and || !exist
they are checked in reqDrain
log.Log.Error(err, "nodeStateSyncHandler(): failed to write supported nic ids file for systemd mode") | ||
return err | ||
} | ||
func (dn *DaemonReconcile) checkSystemdStatus(ctx context.Context, desiredNodeState *sriovnetworkv1.SriovNetworkNodeState) (*systemd.SriovResult, bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add docstring explaining the returned args ?
|
||
// update the lastAppliedGeneration | ||
dn.lastAppliedGeneration = desiredNodeState.Generation | ||
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same as previous comment. lets use a const
// return sriovResult, err | ||
//} | ||
|
||
// TODO: check if we need this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe we need.
in case systemd services were enabled, and there is error in sriovResult, we want to reflect that in status and stop reconcile. else we end up in a reboot loop no ? (which we dont want here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ykulazhenkov for some more feedback on this bit
once we get this in, i think we should move the node controllers under |
@@ -7,7 +7,7 @@ contents: | | |||
|
|||
[Service] | |||
Type=oneshot | |||
ExecStart=/var/lib/sriov/sriov-network-config-daemon -v 2 --zap-log-level 2 service --phase pre | |||
ExecStart=/var/lib/sriov/sriov-network-config-daemon service --phase pre |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means reboot on update ? since systemd file changes
@@ -7,4 +7,4 @@ GOPATH="${GOPATH:-$HOME/go}" | |||
JUNIT_OUTPUT="${JUNIT_OUTPUT:-/tmp/artifacts}" | |||
export PATH=$PATH:$GOPATH/bin | |||
|
|||
${root}/bin/ginkgo -output-dir=$JUNIT_OUTPUT --junit-report "unit_report.xml" -v "$SUITE" -- -report=$JUNIT_OUTPUT | |||
${root}/bin/ginkgo --timeout 3h -output-dir=$JUNIT_OUTPUT --junit-report "unit_report.xml" -v "$SUITE" -- -report=$JUNIT_OUTPUT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change related ?
No description provided.