storageusers pod in CrashLoopBackOff mode after upgrade #855
Comments
How did you upgrade? Do you use a chart that has …
Yes! I pulled the latest from this repository. So the Chart.yaml shows 7.0.0. And …
Are you using the built-in NATS or an external one?
I use NATS like one of the examples here in this repository. My helmfile.yaml may answer your question. It's deployed to its own namespace, so kind of external, but used exclusively by OCIS.
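The actual helmfile.yaml is not quoted in this thread; the following is only an illustrative sketch of an external NATS deployment in its own namespace, assuming the upstream nats Helm chart (1.x values layout) with clustering and JetStream file storage enabled. The replica count and PVC size are placeholders:

# helmfile.yaml (illustrative sketch, not the file from this issue)
repositories:
  - name: nats
    url: https://nats-io.github.io/k8s/helm/charts/

releases:
  - name: nats
    namespace: ocis-nats
    chart: nats/nats
    values:
      - config:
          cluster:
            enabled: true
            replicas: 3
          jetstream:
            enabled: true
            fileStore:
              pvc:
                size: 10Gi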
The configuration actually looks fine. Could you please execute the following command to ensure that the relevant pods are on the new version?
Also it would be interesting if the output is similar (I have replicas set to 2, so we see the same output twice.):
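The exact command is not preserved above. One generic way to verify that every pod has been rolled to the upgraded image (assuming the oCIS release is deployed to an ocis namespace) is:

> kubectl -n ocis get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'

Each output line lists a pod name and its container image, which should all point at the new oCIS version.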
Yes. It looks like all of them are updated.
And the second output is similar to yours:
I also checked the connection to NATS from inside the cluster:

> kubectl run curlpod --image=curlimages/curl -ti -- sh
If you don't see a command prompt, try pressing enter.
~ $ curl -v nats.ocis-nats.svc.cluster.local:4222
* Host nats.ocis-nats.svc.cluster.local:4222 was resolved.
* IPv6: (none)
* IPv4: 10.96.123.122
* Trying 10.96.123.122:4222...
* Connected to nats.ocis-nats.svc.cluster.local (10.96.123.122) port 4222
* using HTTP/1.x
> GET / HTTP/1.1
> Host: nats.ocis-nats.svc.cluster.local:4222
> User-Agent: curl/8.11.1
> Accept: */*
>
* Received HTTP/0.9 when not allowed
* closing connection #0
curl: (1) Received HTTP/0.9 when not allowed
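The HTTP/0.9 error itself is expected here: port 4222 speaks the NATS client protocol rather than HTTP, so curl cannot parse the INFO banner the server sends on connect. If the chart exposes the NATS monitoring port (8222 by default), a cleaner in-cluster health check is the /healthz endpoint. The service and port below are assumptions and may differ in your setup:

~ $ curl -s nats.ocis-nats.svc.cluster.local:8222/healthz

A healthy server answers with {"status":"ok"}.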
I suppose NATS is a kv-store. But is it dependent on persistence? Can I delete the NATS PVCs and recreate NATS from scratch? Maybe the kv-store can become corrupted in some way after updating to 7.0.0. Although the logs look fine to me:

> kubectl -n ocis-nats logs -l app.kubernetes.io/component=nats
Defaulted container "nats" out of: nats, reloader
Defaulted container "nats" out of: nats, reloader
Defaulted container "nats" out of: nats, reloader
[7] 2025/02/03 11:27:26.073835 [WRN] Catchup for stream '$OCIS > KV_service-registry' resetting first sequence: 386508 on catchup request
[7] 2025/02/03 11:27:26.132553 [INF] JetStream cluster new stream leader for '$OCIS > KV_eventhistory'
[7] 2025/02/03 11:27:26.355371 [INF] JetStream cluster new stream leader for '$OCIS > KV_ids-storage-users'
[7] 2025/02/03 11:27:26.550106 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > userlog'
[7] 2025/02/03 11:27:27.066558 [INF] JetStream cluster new metadata leader: nats-2/nats
[7] 2025/02/03 11:27:27.641533 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > search'
[7] 2025/02/03 11:27:27.689822 [INF] JetStream cluster new stream leader for '$OCIS > KV_postprocessing'
[7] 2025/02/03 11:27:28.489216 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > frontend'
[7] 2025/02/03 11:27:29.294879 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > activitylog'
[7] 2025/02/03 11:27:35.960745 [INF] JetStream cluster new consumer leader for '$OCIS > KV_service-registry > atTLH0de'
[7] 2025/02/03 11:27:19.999821 [WRN] RAFT [yrzKKRBu - C-R3F-VcxU0MuI] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.005924 [WRN] RAFT [yrzKKRBu - S-R3F-GQ0lBcwu] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.009586 [WRN] RAFT [yrzKKRBu - S-R3M-tPuEdTd1] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.029709 [WRN] RAFT [yrzKKRBu - S-R3M-eSXnkVG4] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.153287 [INF] 10.233.205.70:45004 - rid:300 - Route connection created
[7] 2025/02/03 11:27:20.154363 [INF] 10.233.205.70:45004 - rid:300 - Router connection closed: Duplicate Route
[7] 2025/02/03 11:27:24.855415 [INF] JetStream cluster new stream leader for '$OCIS > KV_activitylog'
[7] 2025/02/03 11:27:24.858442 [INF] JetStream cluster new stream leader for '$OCIS > KV_settings-cache'
[7] 2025/02/03 11:27:26.100417 [INF] JetStream cluster new stream leader for '$OCIS > KV_ocis-pkg'
[7] 2025/02/03 11:27:36.483472 [INF] JetStream cluster new consumer leader for '$OCIS > KV_service-registry > beIVHnry'
[7] 2025/02/03 11:27:26.074897 [INF] Catchup for stream '$OCIS > KV_service-registry' complete
[7] 2025/02/03 11:27:26.119206 [INF] JetStream cluster new stream leader for '$OCIS > KV_userlog'
[7] 2025/02/03 11:27:26.165808 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > postprocessing'
[7] 2025/02/03 11:27:26.188788 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > graph'
[7] 2025/02/03 11:27:26.958469 [INF] JetStream cluster new stream leader for '$OCIS > KV_cache-roles'
[7] 2025/02/03 11:27:27.061159 [INF] Self is new JetStream cluster metadata leader
[7] 2025/02/03 11:27:28.034532 [INF] JetStream cluster new stream leader for '$OCIS > main-queue'
[7] 2025/02/03 11:27:28.238499 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > jsoncs3sharemanager'
[7] 2025/02/03 11:27:29.882287 [INF] JetStream cluster new stream leader for '$OCIS > KV_storage-system'
[7] 2025/02/03 11:27:35.968347 [INF] JetStream cluster new consumer leader for '$OCIS > KV_service-registry > 0x9QNsBr'
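Regarding the question about wiping the PVCs: before deleting anything, the JetStream state can be inspected directly with the nats CLI, for example from a temporary nats-box pod. This is only a sketch assuming the cluster service shown above and anonymous access; add credentials or TLS flags if your deployment requires them, and note that the oCIS buckets live in the $OCIS account, so the listing may be empty when connecting as a different account:

> kubectl run nats-box -ti --rm --image=natsio/nats-box -- sh
> nats --server nats://nats.ocis-nats.svc.cluster.local:4222 stream ls
> nats --server nats://nats.ocis-nats.svc.cluster.local:4222 kv ls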
After upgrading from 5.0.3 to 7.0.0, the storageusers pod remains in CrashLoopBackOff mode. The logs complain about needing nats, but nats is running and healthy. Does anyone have any idea about that?
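For anyone hitting the same symptom: capturing the log of the previous (crashed) container and the NATS-related environment of the failing workload is usually the quickest way to see which endpoint the service cannot reach. A sketch, assuming the release lives in an ocis namespace and the deployment is named storageusers:

> kubectl -n ocis logs deploy/storageusers --previous
> kubectl -n ocis get deploy storageusers -o jsonpath='{.spec.template.spec.containers[0].env}'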