Describe the bug
TL;DR: After a restart (during the initial reconciliation loop iteration) the replicator incorrectly determines whether target objects exist.
This results in errors being logged.
Note that the replicator works in general;
in the worst case the change is replicated to the object in the second reconciliation loop, 30 minutes later.
More detailed info:
I upgraded the replicator pod and the logs contain the following errors:
time="2026-02-02T09:35:53Z" level=debug msg="using flag values main.flags{Kubeconfig:\"\", ResyncPeriodS:\"30m\", ResyncPeriod:1800000000000, StatusAddr:\":9102\", AllowAll:false, LogLevel:\"debug\", LogFormat:\"plain\", ReplicateSecrets:true, ReplicateConfigMaps:true, ReplicateRoles:false, ReplicateRoleBindings:false, ReplicateServiceAccounts:true, SyncByContent:false, ExcludeNamespaces:\"\"}"
time="2026-02-02T09:35:53Z" level=info msg="using in-cluster configuration"
time="2026-02-02T09:35:53Z" level=info msg="running Namespace controller" kind=Namespace
time="2026-02-02T09:35:53Z" level=info msg="starting liveness monitor at :9102"
time="2026-02-02T09:35:53Z" level=info msg="running ConfigMap controller" kind=ConfigMap
time="2026-02-02T09:35:53Z" level=info msg="running Secret controller" kind=Secret
time="2026-02-02T09:35:53Z" level=info msg="running ServiceAccount controller" kind=ServiceAccount
time="2026-02-02T09:35:54Z" level=info msg="ServiceAccount kubernetes-replicator/default to be replicated to: [*]" kind=ServiceAccount source=kubernetes-replicator/default
time="2026-02-02T09:35:54Z" level=info msg="Checking if kube-system/default exists? true" kind=ServiceAccount source=kubernetes-replicator/default target=kube-system/default
time="2026-02-02T09:35:54Z" level=debug msg="ServiceAccount kube-system/default is already up-to-date" kind=ServiceAccount source=kubernetes-replicator/default target=kube-system/default
time="2026-02-02T09:35:54Z" level=info msg="Replicated kubernetes-replicator/default to: kube-system" source=kubernetes-replicator/default
time="2026-02-02T09:35:54Z" level=info msg="Checking if logging/default exists? false" kind=ServiceAccount source=kubernetes-replicator/default target=logging/default
time="2026-02-02T09:35:54Z" level=debug msg="Creating a new serviceAccount logging/default" kind=ServiceAccount source=kubernetes-replicator/default target=logging/default
time="2026-02-02T09:35:55Z" level=info msg="Checking if operators/default exists? false" kind=ServiceAccount source=kubernetes-replicator/default target=operators/default
time="2026-02-02T09:35:55Z" level=debug msg="Creating a new serviceAccount operators/default" kind=ServiceAccount source=kubernetes-replicator/default target=operators/default
...
time="2026-02-02T09:35:55Z" level=error msg="could not replicate object to other namespaces" error="# Error edited to be more readable (see bellow)" kind=ServiceAccount resource=kubernetes-replicator/default
Replicated kubernetes-replicator/default to 11 out of 23 namespaces: 12 errors occurred:
* Failed to replicate ServiceAccount kubernetes-replicator/default -> logging: Failed to update serviceAccount logging/default: serviceaccounts "default" already exists: Failed to update serviceAccount logging/default: serviceaccounts "default" already exists
* Failed to replicate ServiceAccount kubernetes-replicator/default -> operators: Failed to update serviceAccount operators/default: serviceaccounts "default" already exists: Failed to update serviceAccount operators/default: serviceaccounts "default" already exists
...
Note that we are syncing the following service account.
We are syncing it in order to set default imagePullSecrets across the cluster.
# kubectl get serviceaccount default --namespace kubernetes-replicator --output yaml
apiVersion: v1
imagePullSecrets:
- name: img-pull-XXXXXXX.dkr.ecr.eu-central-1.amazonaws.com
- name: img-pull-registry01
kind: ServiceAccount
metadata:
  annotations:
    replicator.v1.mittwald.de/replicate-to: '*'
  creationTimestamp: "2025-03-19T14:37:39Z"
  name: default
  namespace: kubernetes-replicator
  resourceVersion: "396010346"
  uid: 84b113ae-e15a-4c1d-814e-9f185148c218

Obviously the serviceaccount default already exists in all namespaces (this is a prerequisite to hit the issue: the target object has to exist).
Note the creation timestamps.
# kubectl get serviceaccount default --namespace logging --output yaml
apiVersion: v1
imagePullSecrets:
- name: img-pull-XXXXXXX.dkr.ecr.eu-central-1.amazonaws.com
- name: img-pull-registry01
kind: ServiceAccount
metadata:
  annotations:
    replicator.v1.mittwald.de/replicated-at: "2025-03-19T14:44:54Z"
    replicator.v1.mittwald.de/replicated-from-version: "396010346"
  creationTimestamp: "2024-10-24T15:23:15Z"
  name: default
  namespace: logging
  resourceVersion: "396010543"
  uid: fdf45f0c-f444-4e2e-96bb-b08b4307e4de

We also replicate (image pull) secrets this way and we see the same issue for secrets as well.
To Reproduce
Configure the replicator to replicate a secret / service account to all namespaces (a minimal example is sketched below).
Restart the pod and watch the logs. The error reproduces reliably.
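For illustration, a minimal sketch of a source Secret carrying the same replicate-to annotation as the service account above; the name and namespace follow the objects shown above, and the credential payload is a dummy placeholder, not a value from our cluster:

apiVersion: v1
kind: Secret
metadata:
  name: img-pull-registry01            # placeholder name
  namespace: kubernetes-replicator     # namespace that holds the source objects
  annotations:
    replicator.v1.mittwald.de/replicate-to: '*'
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: e30=              # "{}" base64-encoded, placeholder credentials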
Expected behavior
No error is logged, and changes to existing objects are propagated completely even during the first reconciliation run.
Environment:
- Kubernetes version: 1.29
- kubernetes-replicator version: 2.12.2 (log analysis suggests 2.11.1 behaves the same)
Additional context
My suspicion is that you are using the cache/Store before it is fully synced/filled (see the sketch after the citations below).
- the usage (based on logs), e.g.
kubernetes-replicator/replicate/serviceaccount/serviceaccounts.go
Lines 103 to 107 in 71d6bbe

targetResource, exists, err := r.Store.GetByKey(targetLocation)
if err != nil {
	return errors.Wrapf(err, "Could not get %s from cache!", targetLocation)
}
logger.Infof("Checking if %s exists? %v", targetLocation, exists)
- store/cache is created in
kubernetes-replicator/replicate/common/generic-replicator.go
Lines 70 to 90 in 71d6bbe

}
store, controller := cache.NewInformer(
	&cache.ListWatch{
		ListFunc:  config.ListFunc,
		WatchFunc: config.WatchFunc,
	},
	config.ObjType,
	config.ResyncPeriod,
	cache.ResourceEventHandlerFuncs{
		AddFunc: repl.ResourceAdded,
		UpdateFunc: func(old interface{}, new interface{}) {
			repl.ResourceAdded(new)
		},
		DeleteFunc: repl.ResourceDeleted,
	},
)
namespaceWatcher.OnNamespaceAdded(config.Client, config.ResyncPeriod, repl.NamespaceAdded)
namespaceWatcher.OnNamespaceUpdated(config.Client, config.ResyncPeriod, repl.NamespaceUpdated)
repl.Store = store
repl.Controller = controller
- Documentation for Informers: https://pkg.go.dev/k8s.io/client-go#section-readme
Quote from "Example (Using Informers)":
// Wait for the initial cache sync.
if !cache.WaitForCacheSync(stopCh, podInformer.HasSynced) {
	log.Println("Timed out waiting for caches to sync")
	return
}
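A minimal sketch of the kind of fix I have in mind, not the project's actual code: a hypothetical helper that starts the informer and blocks until its cache is filled before the first reconciliation pass. It assumes the cache.Controller returned by cache.NewInformer (stored in repl.Controller above) is available to the caller.

package sketch

import (
	"log"

	"k8s.io/client-go/tools/cache"
)

// waitForInitialSync is a hypothetical helper: it starts the informer and
// blocks until its local cache has been filled from the API server, so that
// later Store.GetByKey lookups return a trustworthy "exists" answer.
func waitForInitialSync(controller cache.Controller, stopCh <-chan struct{}) bool {
	go controller.Run(stopCh)
	if !cache.WaitForCacheSync(stopCh, controller.HasSynced) {
		log.Println("timed out waiting for caches to sync")
		return false
	}
	return true
}

If something like this ran before the first reconciliation loop, GetByKey should no longer report exists=false for objects that are already present in the target namespaces, and the "already exists" errors above should disappear.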