SolrCloud Pod moved to new Node - Replica Migration pending #668

Open
brickpattern opened this issue Dec 14, 2023 · 8 comments

@brickpattern

brickpattern commented Dec 14, 2023

Environment:
Solr Operator Helm : 0.8.0
Solr 9.4 container image
3 node cluster
Persistent storage option (w/ localvolume provisioner)
Managed upgrade strategy.

The K8s node hosting solrcloud-0 was cordoned and the Pod was moved to a new node.

When the Pod came up on the new node, it was recognized as part of the SolrCloud StatefulSet, but at the collection level the replica on that node was lost. Looking at the StatefulSet, there's a cluster lock:

solr.apache.org/clusterOpsLock: >-
  {"operation":"RollingUpdate","lastStartTime":"2023-12-14T22:08:47Z","metadata":"{\"requiresReplicaMigration\":false}"}
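For anyone reading the lock: the annotation value is JSON whose `metadata` field is itself a JSON document stored as a string, so it needs decoding twice. A minimal sketch, assuming the value is pasted in as-is from the StatefulSet (the helper name `parse_cluster_ops_lock` is made up for illustration):

```python
import json

# Annotation value copied from the StatefulSet above.
LOCK_ANNOTATION = (
    '{"operation":"RollingUpdate","lastStartTime":"2023-12-14T22:08:47Z",'
    '"metadata":"{\\"requiresReplicaMigration\\":false}"}'
)


def parse_cluster_ops_lock(raw: str) -> dict:
    """Decode the lock annotation, including its nested JSON `metadata` string."""
    lock = json.loads(raw)
    metadata = lock.get("metadata")
    # `metadata` is a JSON document serialized as a string, so decode it again.
    lock["metadata"] = json.loads(metadata) if metadata else {}
    return lock


lock = parse_cluster_ops_lock(LOCK_ANNOTATION)
print(lock["operation"], lock["metadata"].get("requiresReplicaMigration"))
```

Here the decoded lock shows a RollingUpdate that was started without a replica migration step.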

Please let me know if I'm missing a step in the process.

Should the operator automatically do the replica migration?
I have read about the Rebalance API and am using Solr 9.4.

Is there a way to manually kick off the replica migration step for that specific Pod?

SolrCloud custom resource:

apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  annotations:
    meta.helm.sh/release-name: solr
    meta.helm.sh/release-namespace: solr
  creationTimestamp: '2023-12-14T21:10:55Z'
  finalizers:
    - storage.finalizers.solr.apache.org
  generation: 3
  labels:
    app.kubernetes.io/instance: solr
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: solr
    app.kubernetes.io/version: 8.11.1
    helm.sh/chart: solr-0.8.0
  managedFields:
    - apiVersion: solr.apache.org/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          # ... metadata removed here for readability ...
      manager: solr-operator
      operation: Update
      time: '2023-12-14T21:10:55Z'
    - apiVersion: solr.apache.org/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          # ... metadata removed here for readability ...
      manager: helm
      operation: Update
      time: '2023-12-14T21:58:46Z'
    - apiVersion: solr.apache.org/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          # ... fields{} removed for readability ...
      manager: solr-operator
      operation: Update
      subresource: status
      time: '2023-12-14T21:59:05Z'
  name: solr
  namespace: solr
  resourceVersion: '115467'
  uid: 559977d2-2fd0-42fa-bf28-08bc5cebf851
  selfLink: /apis/solr.apache.org/v1beta1/namespaces/solr/solrclouds/solr
status:
  externalCommonAddress: http://solr-solr-solrcloud.k8s.solr.cloud
  internalCommonAddress: http://solr-solrcloud-common.solr
  podSelector: solr-cloud=solr,technology=solr-cloud
  readyReplicas: 3
  replicas: 3
  solrNodes:
    - externalAddress: http://solr-solr-solrcloud-0.k8s.solr.cloud
      internalAddress: http://solr-solrcloud-0.solr
      name: solr-solrcloud-0
      nodeName: ip-x-y-162-17.us-west-2.compute.internal
      ready: true
      scheduledForDeletion: false
      specUpToDate: true
      version: '0.8'
    - externalAddress: http://solr-solr-solrcloud-1.k8s.solr.cloud
      internalAddress: http://solr-solrcloud-1.solr
      name: solr-solrcloud-1
      nodeName: ip-x-y-160-139.us-west-2.compute.internal
      ready: true
      scheduledForDeletion: false
      specUpToDate: false
      version: '0.8'
    - externalAddress: http://solr-solr-solrcloud-2.k8s.solr.cloud
      internalAddress: http://solr-solrcloud-2.solr
      name: solr-solrcloud-2
      nodeName: ip-x-y-163-213.us-west-2.compute.internal
      ready: true
      scheduledForDeletion: false
      specUpToDate: false
      version: '0.8'
  upToDateNodes: 1
  version: '0.8'
  zookeeperConnectionInfo:
    chroot: /
    externalConnectionString: N/A
    internalConnectionString: >-
      solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181
spec:
  availability:
    podDisruptionBudget:
      enabled: true
      method: ClusterWide
  busyBoxImage:
    repository: library/busybox
    tag: 1.28.0-glibc
  customSolrKubeOptions:
    podOptions:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: technology
                      operator: In
                      values:
                        - solr-cloud
                    - key: solr-cloud
                      operator: In
                      values:
                        - solr
                topologyKey: topology.kubernetes.io/zone
              weight: 100
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: technology
                    operator: In
                    values:
                      - solr-cloud
                  - key: solr-cloud
                    operator: In
                    values:
                      - solr
              topologyKey: kubernetes.io/hostname
      annotations:
        manualrestart: '2023-12-14T00:00:01Z'
      defaultInitContainerResources: {}
      resources:
        limits:
          cpu: '16'
          memory: 32G
        requests:
          cpu: '8'
          memory: 16G
      serviceAccountName: solr-operator
      tolerations:
        - effect: NoSchedule
          key: role
          operator: Equal
          value: solr-cluster
  dataStorage:
    persistent:
      pvcTemplate:
        metadata: {}
        spec:
          resources:
            requests:
              storage: 500Gi
          storageClassName: my-disks
      reclaimPolicy: Delete
  replicas: 3
  scaling:
    populatePodsOnScaleUp: true
    vacatePodsOnScaleDown: true
  solrAddressability:
    commonServicePort: 80
    external:
      domainName: k8s.solr.cloud
      method: Ingress
      nodePortOverride: 80
      useExternalAddress: false
    podPort: 8983
  solrImage:
    pullPolicy: Always
    repository: mycustom.registry.builton.9-4solr
    tag: latest
  solrJavaMem: '-Xms8192m -Xmx16384m'
  solrLogLevel: INFO
  solrOpts: '-Denable.runtime.lib=true -Denable.packages=true'
  updateStrategy:
    managed: {}
    method: Managed
  zookeeperRef:
    provided:
      adminServerService: {}
      chroot: /
      clientService: {}
      config: {}
      ephemeral:
        emptydirvolumesource: {}
      headlessService: {}
      image:
        pullPolicy: IfNotPresent
        repository: pravega/zookeeper
      maxUnavailableReplicas: 1
      replicas: 3
      zookeeperPodPolicy:
        resources: {}
        serviceAccountName: solr-operator

@brickpattern
Author

brickpattern commented Dec 14, 2023

Solr Operator logs

2023-12-14T21:39:42Z	INFO	Update required because field changed	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "c747c5ad-f7b1-400a-8957-ff60a71b7531", "statefulSet": "solr-solrcloud", "kind": "statefulSet", "field": "Spec.Template.Annotations", "from": {"kubectl.kubernetes.io/restartedAt":"2023-12-14T15:39:42-06:00","solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}, "to": {"solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}}
2023-12-14T21:39:42Z	INFO	Updating StatefulSet	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "c747c5ad-f7b1-400a-8957-ff60a71b7531", "statefulSet": "solr-solrcloud"}
2023-12-14T21:39:42Z	INFO	Started locked clusterOp	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "122fddd5-d036-4d31-872e-0d4b95bb27a3", "clusterOp": "RollingUpdate", "clusterOpMetadata": "{\"requiresReplicaMigration\":false}"}
2023-12-14T21:39:42Z	INFO	Updating SolrCloud Status	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "122fddd5-d036-4d31-872e-0d4b95bb27a3", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:39:42Z	INFO	Removed unneeded clusterOpLock annotation from statefulSet	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "cb77fc36-beda-4e54-a0c5-0b2726756c66", "reason": "RollingUpdate complete"}
2023-12-14T21:39:42Z	INFO	Updating SolrCloud Status	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "cb77fc36-beda-4e54-a0c5-0b2726756c66", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":3,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:46Z	INFO	Update required because field changed	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "71340161-9ee5-47d9-b7f4-7630ad2ab132", "statefulSet": "solr-solrcloud", "kind": "statefulSet", "field": "Spec.Template.Annotations", "from": {"solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}, "to": {"manualrestart":"2023-12-14T00:00:01Z","solr.apache.org/solrXmlMd5":"5fe99d590bc63efc3caa743ca939aa5a"}}
2023-12-14T21:58:46Z	INFO	Updating StatefulSet	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "71340161-9ee5-47d9-b7f4-7630ad2ab132", "statefulSet": "solr-solrcloud"}
2023-12-14T21:58:46Z	INFO	Started locked clusterOp	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "a09d90c0-06f2-4b59-9c68-463c97261d5b", "clusterOp": "RollingUpdate", "clusterOpMetadata": "{\"requiresReplicaMigration\":false}"}
2023-12-14T21:58:46Z	INFO	Updating SolrCloud Status	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "a09d90c0-06f2-4b59-9c68-463c97261d5b", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:46Z	INFO	ManagedUpdateSelector	Pod update selection started.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "outOfDatePods": 3, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0, "maxPodsToUpdate": 1}
2023-12-14T21:58:46Z	INFO	ManagedUpdateSelector	Pod selected to be deleted for update.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "pod": "solr-solrcloud-0", "reason": "Pod's replicas are safe to take down, adhering to the minimum active replicas per shard."}
2023-12-14T21:58:46Z	INFO	ManagedUpdateSelector	Pod update selection complete. Maximum number of pods able to be updated reached.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "maxPodsToUpdate": 1}
2023-12-14T21:58:46Z	INFO	ManagedUpdateSelector	Deleting solr pod for update	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:46Z	INFO	Updating SolrCloud Status	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "22e68695-cbc6-4f0d-8a40-7cf26a1841b6", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":3,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:46Z	INFO	ManagedUpdateSelector	Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d091496b-8199-419f-b541-3709fc4cbd03", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 1}
2023-12-14T21:58:46Z	INFO	ManagedUpdateSelector	Deleting solr pod for update	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d091496b-8199-419f-b541-3709fc4cbd03", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:46Z	INFO	Updating SolrCloud Status	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d091496b-8199-419f-b541-3709fc4cbd03", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":false,"version":"0.8","specUpToDate":false,"scheduledForDeletion":true},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":2,"upToDateNodes":0,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:47Z	INFO	ManagedUpdateSelector	Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "63e8cd10-0938-451f-bdfe-d5c7e0bbcb6a", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 1}
2023-12-14T21:58:47Z	INFO	ManagedUpdateSelector	Deleting solr pod for update	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "63e8cd10-0938-451f-bdfe-d5c7e0bbcb6a", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:47Z	INFO	ManagedUpdateSelector	Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d7ad8183-8e6b-4377-8901-fbd4a8129672", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 1}
2023-12-14T21:58:47Z	INFO	ManagedUpdateSelector	Deleting solr pod for update	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d7ad8183-8e6b-4377-8901-fbd4a8129672", "pod": "solr-solrcloud-0"}
2023-12-14T21:58:49Z	INFO	ManagedUpdateSelector	Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "291ef125-ec68-40aa-945f-c3fd10cfbf5b", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:58:49Z	INFO	Updating SolrCloud Status	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "291ef125-ec68-40aa-945f-c3fd10cfbf5b", "status": {"solrNodes":[{"name":"solr-solrcloud-0","nodeName":"ip-x-y-162-17.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-0.solr","externalAddress":"http://solr-solr-solrcloud-0.k8s.solr.cloud","ready":false,"version":"0.8","specUpToDate":true,"scheduledForDeletion":false},{"name":"solr-solrcloud-1","nodeName":"ip-x-y-160-139.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-1.solr","externalAddress":"http://solr-solr-solrcloud-1.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false},{"name":"solr-solrcloud-2","nodeName":"ip-x-y-163-213.us-west-2.compute.internal","internalAddress":"http://solr-solrcloud-2.solr","externalAddress":"http://solr-solr-solrcloud-2.k8s.solr.cloud","ready":true,"version":"0.8","specUpToDate":false,"scheduledForDeletion":false}],"replicas":3,"podSelector":"solr-cloud=solr,technology=solr-cloud","readyReplicas":2,"upToDateNodes":1,"version":"0.8","internalCommonAddress":"http://solr-solrcloud-common.solr","externalCommonAddress":"http://solr-solr-solrcloud.k8s.solr.cloud","zookeeperConnectionInfo":{"internalConnectionString":"solr-solrcloud-zookeeper-0.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-1.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181,solr-solrcloud-zookeeper-2.solr-solrcloud-zookeeper-headless.solr.svc.cluster.local:2181","externalConnectionString":"N/A","chroot":"/"},"backupRestoreReady":false}}
2023-12-14T21:58:49Z	INFO	ManagedUpdateSelector	Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "1bfacd90-5943-43cb-8635-b773b5f4a2f0", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:58:50Z	INFO	ManagedUpdateSelector	Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "d03b7902-1f06-4e1f-9a26-6334d327a8de", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:59:00Z	INFO	ManagedUpdateSelector	Pod update selection not started. The number of unavailable pods unavailable (or scheduled for deletion) equals or exceeds the calculated maxPodsUnavailable.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "ce0ba8f5-6eb5-4717-8fc4-e3d2c1bc750a", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 1, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0}
2023-12-14T21:59:05Z	INFO	ManagedUpdateSelector	Pod update selection started.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "bb32cce3-ce4b-4a71-bf07-f828890cb313", "outOfDatePods": 2, "maxPodsUnavailable": 1, "unavailableUpdatedPods": 0, "outOfDatePodsNotStarted": 0, "alreadyScheduledForDeletion": 0, "maxPodsToUpdate": 1}
2023-12-14T21:59:05Z	INFO	ManagedUpdateSelector	Pod not able to be killed for update.	{"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "bb32cce3-ce4b-4a71-bf07-f828890cb313", "pod": "solr-solrcloud-2", "reason": "Shard bookmarks|shard1 already has 1 replicas not active, taking down 1 more would put it over the maximum allowed down: 1"}
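The log above is hard to scan because of the repeated status dumps. A minimal sketch for pulling out only the per-pod decisions, assuming the columns are tab-separated as in the paste (timestamp, level, optional logger name, message, JSON fields). The three sample lines are abbreviated from the log above, and the function names are made up for illustration:

```python
import json

# Three lines abbreviated from the operator log above (most JSON fields trimmed).
SAMPLE_LOG = "\n".join([
    "2023-12-14T21:58:46Z\tINFO\tUpdating StatefulSet\t"
    '{"statefulSet": "solr-solrcloud"}',
    "2023-12-14T21:58:46Z\tINFO\tManagedUpdateSelector\tPod selected to be deleted for update.\t"
    '{"pod": "solr-solrcloud-0", "reason": "Pod\'s replicas are safe to take down, adhering to the minimum active replicas per shard."}',
    "2023-12-14T21:59:05Z\tINFO\tManagedUpdateSelector\tPod not able to be killed for update.\t"
    '{"pod": "solr-solrcloud-2", "reason": "Shard bookmarks|shard1 already has 1 replicas not active, taking down 1 more would put it over the maximum allowed down: 1"}',
])


def parse_line(line: str) -> dict:
    """Split one log line; the last column is JSON, the one before it is the message."""
    parts = line.rstrip("\n").split("\t")
    return {
        "time": parts[0],
        "level": parts[1],
        "message": parts[-2],
        "fields": json.loads(parts[-1]),
    }


def pod_decisions(log_text: str) -> list:
    """Return (time, pod, message, reason) for entries that name a pod and a reason."""
    decisions = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        entry = parse_line(line)
        fields = entry["fields"]
        if "pod" in fields and "reason" in fields:
            decisions.append((entry["time"], fields["pod"], entry["message"], fields["reason"]))
    return decisions


for time, pod, message, reason in pod_decisions(SAMPLE_LOG):
    print(time, pod, message, reason, sep=" | ")
```

On the sample this leaves two entries: solr-solrcloud-0 being selected for deletion, and solr-solrcloud-2 being held back because shard1 of the bookmarks collection already has a replica that is not active.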

@brickpattern
Author

brickpattern commented Dec 18, 2023

I changed the configs and eliminated the dependency on ZK being on the same node as the SolrCloud pod.
Regardless of the dataStorage type (ephemeral or persistent), when the Pod comes up, the core config it is looking for is at $SOLR_HOME/data/<<collection_shardN_replica_nN>>
This file/folder is missing on the rehomed pod.

On the other Pods, the contents of the corresponding folder are marked "#Written by CorePropertiesLocator".
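As far as I understand it, Solr discovers cores at startup by looking for `core.properties` files, which is what the CorePropertiesLocator header refers to. A minimal sketch for checking which core directories exist under a data directory. The demo builds a throwaway directory instead of touching a real `$SOLR_HOME/data`, and the core name `bookmarks_shard1_replica_n1` is a hypothetical example:

```python
import os
import tempfile


def find_cores(data_dir: str) -> list:
    """Return the directories under data_dir that contain a core.properties file."""
    cores = []
    for root, dirs, files in os.walk(data_dir):
        if "core.properties" in files:
            cores.append(root)
            dirs[:] = []  # do not descend into the core's own index directories
    return sorted(cores)


# Demo on a temporary directory standing in for $SOLR_HOME/data.
with tempfile.TemporaryDirectory() as data_dir:
    core_dir = os.path.join(data_dir, "bookmarks_shard1_replica_n1")  # hypothetical core name
    os.makedirs(core_dir)
    with open(os.path.join(core_dir, "core.properties"), "w") as fh:
        fh.write("#Written by CorePropertiesLocator\n")
    os.makedirs(os.path.join(data_dir, "not_a_core"))
    print([os.path.basename(path) for path in find_cores(data_dir)])
```

Running the same function inside the rehomed Pod against the real data directory would presumably return an empty list, matching what is described above.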

So, questions:

  • Should the Solr Operator, or SolrCloud itself by talking to ZK, place the necessary config files so that the replica gets filled in with data?

  • In previous versions of Solr (say Solr 7), the Create Collection API had an autoAddReplicas parameter. I don't see that in 9.4. Is the data loss on Pod replacement related to any collection property?

@brickpattern
Author

bump...

@HoustonPutman
Contributor

Newer versions of Solr do not have an AutoAddReplica feature.

What kind of persistent volumes are you using? The data should not be missing when the pod is restarted. That's a failure of Kubernetes or your PVC, and the Solr Operator isn't built to handle that. When you are running with persistent data, it will expect the data to be there when restarted.

If you are running with ephemeral data, it will remove the data from the node before killing the pod. It can get into a bad state if the pod is killed on its own, and the data isn't moved beforehand.

@brickpattern
Author

I have tested with both persistent and ephemeral storage, and both result in loss of data for that Solr node.

For persistent storage, I'm using the local volume provisioner. As long as the Pod comes back on the same EKS node, the PVC binds to the same PV and the data is retained. But when the Pod gets scheduled to another EKS node (which is my scenario), the data is lost: the core config directories on the replaced Pod are empty.

@HoustonPutman
Contributor

Local volumes only work as PVs if the PVs that are created have node limitations (i.e. the Pod connected to the PV cannot be rescheduled onto another node). Are you sure that the local volume provisioner is set up correctly?
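One way to check the node limitation is to look at `spec.nodeAffinity` on the PV. A minimal sketch that reads the pinned hostnames out of a PV object shaped like `kubectl get pv <name> -o json` output. The PV name and local path in the sample are hypothetical; the storage class and node name are taken from the issue description:

```python
def pinned_hostnames(pv: dict) -> set:
    """Hostnames a PersistentVolume is restricted to via spec.nodeAffinity."""
    terms = (
        pv.get("spec", {})
        .get("nodeAffinity", {})
        .get("required", {})
        .get("nodeSelectorTerms", [])
    )
    hosts = set()
    for term in terms:
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == "kubernetes.io/hostname" and expr.get("operator") == "In":
                hosts.update(expr.get("values", []))
    return hosts


# Sample PV; metadata.name and local.path are hypothetical placeholders.
SAMPLE_PV = {
    "metadata": {"name": "local-pv-example"},
    "spec": {
        "storageClassName": "my-disks",
        "local": {"path": "/mnt/disks/example"},
        "nodeAffinity": {
            "required": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": ["ip-x-y-162-17.us-west-2.compute.internal"],
                            }
                        ]
                    }
                ]
            }
        },
    },
}

print(pinned_hostnames(SAMPLE_PV))
```

An empty result for a local PV would suggest the volume is not pinned to a node, which is the misconfiguration being asked about here.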

@brickpattern
Author

Yes, the PVs are set up correctly. Solr Pods come up correctly after either eviction or restart.

The specific scenario I'm certifying is an EKS node taint and drain (replacing a node with new hardware).

So from your description, it appears the Operator will NOT move the data, as the local volume is tied to an EKS node.

Is there a recommended way to manually trigger data replication from the other 2 nodes, e.g. an API call to repopulate the following directory?

$SOLR_HOME/data/<<collection_shardN_replica_nN>>

@HoustonPutman
Contributor

Ahhh yes during node draining. That is a problem.

Yes, that is correct. What I would do is issue a Replace node command, moving all of the replicas off of the data-less pod. Then you can do a balance after that to move replicas back onto that pod.
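A rough sketch of those two calls, assuming the Collections API `REPLACENODE` action and the v2 Balance Replicas endpoint (`POST /api/cluster/replicas/balance`, which I believe is available from Solr 9.3 on; please verify against the reference guide for your version). The node names are hypothetical placeholders, so copy the real ones from `live_nodes` in a CLUSTERSTATUS response. The code only builds the requests and does not send them:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_URL = "http://solr-solrcloud-common.solr"  # internalCommonAddress from the status above

# Hypothetical node names; take the real values from live_nodes in CLUSTERSTATUS.
EMPTY_NODE = "solr-node-0.example:8983_solr"
ALL_NODES = [EMPTY_NODE, "solr-node-1.example:8983_solr", "solr-node-2.example:8983_solr"]


def replace_node_url(base_url: str, source_node: str, async_id: str) -> str:
    """Collections API URL that moves every replica off source_node.

    targetNode is left out on the assumption that Solr then picks the destinations.
    """
    params = {"action": "REPLACENODE", "sourceNode": source_node, "async": async_id}
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"


def balance_request(base_url: str, nodes: list, async_id: str) -> Request:
    """v2 Balance Replicas request that spreads replicas across the given nodes."""
    body = json.dumps({"nodes": nodes, "async": async_id}).encode("utf-8")
    return Request(
        f"{base_url}/api/cluster/replicas/balance",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


step1 = replace_node_url(SOLR_URL, EMPTY_NODE, "replace-node-1")
step2 = balance_request(SOLR_URL, ALL_NODES, "balance-1")
print(step1)
print(step2.get_method(), step2.full_url, step2.data.decode("utf-8"))
# To actually run them, pass step1 and then step2 to urllib.request.urlopen(),
# waiting for the first async request to finish before sending the second.
```

Both calls are issued with an `async` id so they can be polled with REQUESTSTATUS rather than holding an HTTP connection open while data is copied.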

It would be nice to have a command to fix all of the data on broken replicas. Maybe I'll make a JIRA for that.

One thing the operator can do is notice that a PV has changed, and if so automate the replica moving to restore data. Can you confirm that the PVs that are tied to the Solr PVCs change after draining the node? If so we can watch those PVs and try to fix the data if they are changed. (i.e. the data might be gone)
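To confirm whether the PVs change, one option is to capture `kubectl get pvc -n solr -o json` before and after the drain and compare which PV each PVC is bound to (`spec.volumeName`). A minimal sketch, with hypothetical PVC and PV names in the sample snapshots:

```python
def pvc_bindings(pvc_list: dict) -> dict:
    """Map PVC name -> bound PV name from `kubectl get pvc -o json` style output."""
    return {
        item["metadata"]["name"]: item.get("spec", {}).get("volumeName", "")
        for item in pvc_list.get("items", [])
    }


def changed_bindings(before: dict, after: dict) -> dict:
    """Return {pvc_name: (old_pv, new_pv)} for PVCs whose bound PV differs."""
    old, new = pvc_bindings(before), pvc_bindings(after)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


def snapshot(bindings: dict) -> dict:
    """Build a minimal PVC list in the shape kubectl returns (helper for the demo only)."""
    return {
        "items": [
            {"metadata": {"name": pvc}, "spec": {"volumeName": pv}}
            for pvc, pv in bindings.items()
        ]
    }


# Hypothetical PVC and PV names, for illustration only.
BEFORE = snapshot({"data-solr-0": "local-pv-aaa", "data-solr-1": "local-pv-bbb"})
AFTER = snapshot({"data-solr-0": "local-pv-ccc", "data-solr-1": "local-pv-bbb"})

print(changed_bindings(BEFORE, AFTER))
```

Any PVC that shows up in the result is bound to a different PV than before the drain, so its replicas would be candidates for the data-restore step discussed here.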
