data pool for metadata pool isn't found #5103

Open
yellowpattern opened this issue Jan 27, 2025 · 1 comment
Labels
component/deployment Helm chart, kubernetes templates and configuration Issues/PRs component/rbd Issues related to RBD

Comments


yellowpattern commented Jan 27, 2025

Describe the bug

Creating a PVC using a StorageClass with different data & metadata pools fails.

rbd_util.go:1641] ID: 27 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting image options on my-rbd-repl/my-vol-cd61239b-5756-4b4a-be8b-63dff4c31b58, data pool %!s(MISSING)my-rbd

The two pools are there:

ceph df | grep my-rbd
my-rbd       17   32    8 KiB      473   12 KiB      0     6.5 TiB
my-rbd-repl  19   32     19 B        5    8 KiB      0     5 TiB

The storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: 'false'
provisioner: rbd.csi.ceph.com
parameters:
  pool: my-rbd-repl
  dataPool: my-rbd
  clusterID: ....
  volumeNamePrefix: my-vol-
  imageFeatures: layering
  imageFormat: "2"
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: default
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
volumeBindingMode: Immediate
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard

Environment details

  • Image/version of Ceph CSI driver: quay.io/cephcsi/cephcsi:v3.13.0
  • Helm chart version: ceph-csi-rbd-3.13.0
  • Kernel version: 5.14
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd):
  • Kubernetes cluster version: 1.27.4
  • Ceph cluster version: 19.2.0

Steps to reproduce

Steps to reproduce the behavior:

  1. Configure the ceph-csi StorageClass with both metadata and data pools:
    • my-rbd is an erasure-coded pool
    • my-rbd-repl is a replicated pool.
  2. If I just use "my-rbd-repl" for the StorageClass, there is no problem (just inefficient disk usage).
  3. If I just use "my-rbd" for the StorageClass, I get a different error: the StorageClass wants a replicated pool for metadata.
  4. The error above occurs when I use a different pool for metadata and data.

Actual results

I get an error implying that ceph-csi can't find the data pool.

Expected behavior

For ceph-csi to use my-rbd-repl for metadata and my-rbd for data.

Logs

If the issue is in PVC creation, deletion, or cloning, please attach complete logs
of the containers below.

This is from the provisioner that's doing the work:

I0127 08:37:11.457540       1 utils.go:266] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c GRPC call: /csi.v1.Controller/CreateVolume
I0127 08:37:11.457919       1 utils.go:267] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c GRPC request: {"capacity_range":{"required_bytes":52428800},"name":"pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c","parameters":{"clusterID":"fd9c1e26-da6e-11ef-8593-3cecef103636","csi.storage.k8s.io/pv/name":"pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c","csi.storage.k8s.io/pvc/name":"raw-block-pvc","csi.storage.k8s.io/pvc/namespace":"ceph-csi-rbd","dataPool":"my-rbd","imageFeatures":"layering","imageFormat":"2","pool":"my-rbd-repl","volumeNamePrefix":"my-vol-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":1}}]}
I0127 08:37:11.458319       1 rbd_util.go:1387] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting disableInUseChecks: false image features: [layering] mounter: rbd
I0127 08:37:11.459737       1 omap.go:89] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c got omap values: (pool="my-rbd-repl", namespace="", name="csi.volumes.default"): map[]
I0127 08:37:11.465571       1 omap.go:159] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c set omap keys (pool="my-rbd-repl", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c:58568a22-4043-4326-84ee-62d860bdf19d])
I0127 08:37:11.467524       1 omap.go:159] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c set omap keys (pool="my-rbd-repl", namespace="", name="csi.volume.58568a22-4043-4326-84ee-62d860bdf19d"): map[csi.imagename:my-vol-58568a22-4043-4326-84ee-62d860bdf19d csi.volname:pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c csi.volume.owner:ceph-csi-rbd])
I0127 08:37:11.467548       1 rbd_journal.go:515] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c generated Volume ID (0001-0024-fd9c1e26-da6e-11ef-8593-3cecef103636-0000000000000013-58568a22-4043-4326-84ee-62d860bdf19d) and image name (my-vol-58568a22-4043-4326-84ee-62d860bdf19d) for request name (pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c)
I0127 08:37:11.467596       1 rbd_util.go:437] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c rbd: create my-rbd-repl/my-vol-58568a22-4043-4326-84ee-62d860bdf19d size 50M (features: [layering]) using mon 10.0.1.1:6789,10.0.1.2:6789,10.0.1.3:6789
I0127 08:37:11.467650       1 rbd_util.go:1641] ID: 77 **Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting image options on my-rbd-repl/my-vol-58568a22-4043-4326-84ee-62d860bdf19d, data pool %!s(MISSING)my-rbd**
E0127 08:37:11.480323       1 controllerserver.go:749] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c failed to create volume: failed to create rbd image: rbd: ret=-22, Invalid argument
I0127 08:37:11.484437       1 omap.go:126] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c removed omap keys (pool="my-rbd-repl", namespace="", name="csi.volumes.default"): [csi.volume.pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c]
E0127 08:37:11.484478       1 utils.go:271] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c GRPC error: rpc error: code = Internal desc = failed to create rbd image: rbd: ret=-22, Invalid argument
nixpanic (Member) commented

This is a debug message, and its formatting looks broken:

 I0127 08:37:11.467650 1 rbd_util.go:1641] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c setting image options on my-rbd-repl/my-vol-58568a22-4043-4326-84ee-62d860bdf19d, data pool %!s(MISSING)my-rbd

It comes from this line:

logMsg += ", data pool %s" + rv.DataPool

There is a %s marker in the logMsg, which should not be there. It causes the %!s(MISSING) part in the output.
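
As a standalone illustration (not the ceph-csi code itself, and using a hypothetical local dataPool variable), this tiny Go program reproduces the artifact and shows the obvious fix:

package main

import "fmt"

func main() {
	dataPool := "my-rbd"

	// Buggy pattern: the %s verb ends up inside the format string with no
	// matching argument, so fmt prints %!s(MISSING) in its place.
	fmt.Println(fmt.Sprintf("data pool %s" + dataPool))
	// -> data pool %!s(MISSING)my-rbd

	// Fix: either drop the verb and keep plain concatenation ...
	fmt.Println("data pool " + dataPool)
	// ... or keep the verb and pass the value as an argument.
	fmt.Printf("data pool %s\n", dataPool)
}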

That also means that setting the data pool did not fail: the debug log message is only written at the end of the function, when no failure has occurred.

The real problem seems to be this:

 E0127 08:37:11.480323 1 controllerserver.go:749] ID: 77 Req-ID: pvc-9e5f3d1a-0120-438a-88d0-aaf410a8854c failed to create volume: failed to create rbd image: rbd: ret=-22, Invalid argument

Which happens at the time of the image creation.

err = librbd.CreateImage(pOpts.ioctx, pOpts.RbdImageName,
	uint64(util.RoundOffVolSize(pOpts.VolSize)*helpers.MiB), options)
if err != nil {
	return fmt.Errorf("failed to create rbd image: %w", err)
}
It is not clear which image option could be invalid. The dataPool option is something that we test with an erasure coded pool in our e2e that runs for every PR. We can be quite confident that it works, generally. There must be something else in your environment that causes RBD-image creation to fail. Can you check the following:

  • do the credentials for the provisioner have access to both pools?
  • can you create an image manually with the same configuration (for example with a small go-ceph program, as sketched after this list)?
  • are there any logs on the Ceph side about the failure (in the OSDs, or maybe MONs)?
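
For the second point, the sketch below mirrors the CreateImage call quoted above, using go-ceph with the same metadata pool, data pool, image format and layering feature. It is only a rough, assumption-laden illustration: the connection setup, the pool and image names, and the exact option-constant spellings (go-ceph has used both ImageOption* and the older RbdImageOption* names) need to be checked against the go-ceph version in use.

package main

import (
	"fmt"
	"log"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

func main() {
	// Connect with the same credentials the provisioner uses
	// (assumption: ceph.conf and a keyring are available locally).
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatal(err)
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		log.Fatal(err)
	}
	if err := conn.Connect(); err != nil {
		log.Fatal(err)
	}
	defer conn.Shutdown()

	// Metadata (replicated) pool, the StorageClass "pool" parameter.
	ioctx, err := conn.OpenIOContext("my-rbd-repl")
	if err != nil {
		log.Fatal(err)
	}
	defer ioctx.Destroy()

	// Same options ceph-csi sets: format 2, layering, and the EC data pool.
	// NOTE: older go-ceph releases name these constants RbdImageOption*.
	options := rbd.NewRbdImageOptions()
	defer options.Destroy()
	_ = options.SetUint64(rbd.ImageOptionFormat, 2)
	_ = options.SetUint64(rbd.ImageOptionFeatures, rbd.FeatureLayering)
	_ = options.SetString(rbd.ImageOptionDataPool, "my-rbd")

	// 50 MiB, matching the PVC in the logs; "manual-test-image" is arbitrary.
	err = rbd.CreateImage(ioctx, "manual-test-image", 50*1024*1024, options)
	fmt.Println("CreateImage:", err)
}

If this fails with the same ret=-22 (EINVAL), the problem is between librbd and the cluster rather than in ceph-csi; if it succeeds, the options passed by the provisioner deserve a closer look.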

@nixpanic nixpanic added component/rbd Issues related to RBD component/deployment Helm chart, kubernetes templates and configuration Issues/PRs labels Jan 29, 2025
nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Jan 30, 2025
When a `dataPool` is passed while creating a volume, there is a
`%!s(MISSING)` piece added to a debug log message. By using
`fmt.Sprintf()` instead of concatenating the string, this should be gone
now.

Updates: ceph#5103
Signed-off-by: Niels de Vos <[email protected]>
nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Jan 30, 2025
When a `dataPool` is passed while creating a volume, there is a
`%!s(MISSING)` piece added to a debug log message. When concatenating
strings, the `%s` formatter is not needed.

Updates: ceph#5103
Signed-off-by: Niels de Vos <[email protected]>
mergify bot pushed a commit that referenced this issue Jan 30, 2025
When a `dataPool` is passed while creating a volume, there is a
`%!s(MISSING)` piece added to a debug log message. When concatenating
strings, the `%s` formatter is not needed.

Updates: #5103
Signed-off-by: Niels de Vos <[email protected]>