From f71177a7c518a7a2cbd63070e384013847cfe263 Mon Sep 17 00:00:00 2001
From: Fabian Deutsch
Date: Tue, 15 Aug 2017 09:44:35 +0200
Subject: [PATCH 1/7] docs: Add file-system PV disks design

This proposal introduces a design which allows a user to use plain
file-system based PersistentVolumes for backing virtual disks.

Fixes #266

Signed-off-by: Fabian Deutsch
---
 docs/filesystem-pv-disks.md | 120 ++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 docs/filesystem-pv-disks.md

diff --git a/docs/filesystem-pv-disks.md b/docs/filesystem-pv-disks.md
new file mode 100644
index 000000000000..623cd0a17654
--- /dev/null
+++ b/docs/filesystem-pv-disks.md
@@ -0,0 +1,120 @@
+# Using file-system based PersistentVolumes as virtual disk backends
+
+## Introduction
+
+So far Kubernetes PersistentVolumes are remote file-systems attached to a pod at
+runtime. This is currently the only way of exposing remote or local storage to
+pods, through a file-system.
+Despite the fact that storage is exposed as file-systems to pod, KubeVirt has
+no mechanism to use these plain file-system based PVs to act as backends for
+virtual disks.
+
+
+## Goals
+
+This proposal is about adding a simple mechanism to allow using file-system
+based PersistentVolumes for storing file images backing virtual disks.
+
+The focus is primarily on providing the right cluster levels mechanics to enable
+future changes for improving usability, performance, or other characteristics.
+
+## Non-Goals
+
+- Provide a totally atumatic solution
+- Cover snapshots explicitly
+
+
+## Use-case
+
+The primary use-case is to allow a VM to use PVs as backend storage for their
+virtual disks.
+
+
+## Design Overview
+
+Currently KubeVirt only supports [direct PV disks](direct-pv-disks.md).
+This feature works, by assigning PVCs backed by network block storage (currently
+only iSCSI) to a VM. The connection details of the PVC are then leveraged with
+qemu's built-in network storage drivers, to directly connect qemu to the remote
+storage. This effectively bypasses all other Kubernetes or KubeVirt components,
+and establishes a direct connection between qemu and the storage backend.
+
+The mechanism proposed in proposal however is leveraging Kubernetes to attach
+the remote storage as a file-system to a pod, to use this file-system to storage
+disk images which are acting as a backend storage to the VM.
+
+
+## API
+
+The general API to use PersistentVolume claims as virtual disk backends was
+introduced with [direct PV proposal](direct-pv-disks.md).
+
+This proposal is merely adding an additional field to support addressing files
+on a file-system.
+
+Thus, to use a PV as a virtual disk backend, a user needs to create a claim for
+the required PV, and then reference the to be used disk image using the newly
+introduced `file` parameter.
+The `file` field takes a path, relative to the source of the file-system held by
+the referenced PVC.
+
+An example:
+
+```yaml
+kind: VM
+spec:
+  domain:
+    devices:
+      disks:
+      - type: PersistentVolumeClaim
+        source:
+          name: vm-01-disks
+          file: disk-01.img # The change
+        target:
+          bus: scsi
+          device: sda
+```
+
+Here the user attaches the PersistentVolumeClaim _vm-01-disks_ to a VM, and uses
+the file `disk-01.img` as the backing file for the disk `sda`.
+
+### Storage Type Inference
+
+The system can look at the PVC to infer whether file-system or raw block storage
+should be used with a PV.
+If there is a conflict between the VM API configuration and the backing PV, then
+an error must be raised.
+One error condition for example would be if a `file` field is given, but the
+backing PV is of `volumeType: block`.
+
+
+## Additional Notes
+
+### Introduction of a `driver` struct
+
+In future we might want to introduce a `driver` struct for disks to
+differentiate between (up to now) qemu's built-in drivers or using kubelet's
+file-system and ([in close future](https://github.com/kubernetes/community/pull/805))
+raw block storage support, i.e.:
+```yaml
+  - type: PersistentVolumeClaim
+    driver:
+      name: qemu
+    source:
+      name: vm-01-disks
+---
+  - type: PersistentVolumeClaim
+    driver:
+      name: kubelet
+    source:
+      name: vm-01-disks
+```
+
+### Snapshots
+
+Also something for a different proposal, but to be considered are snapshots.
+A general approach to snapshots which works with the existing direct PV and
+this proposal is, to either use intermediate transparent qcow2 files, or improve
+qemu to have a cow subsystem, which is agnostic to the backing store type.
+Both solutions however would be independent of the storage type and don't
+contradict with our designs.
From a0893441c0bc517edee0f9aa4cf30f3450efbf19 Mon Sep 17 00:00:00 2001
From: Fabian Deutsch
Date: Tue, 15 Aug 2017 09:46:59 +0200
Subject: [PATCH 2/7] docs: Fix direct PV yaml syntax

Signed-off-by: Fabian Deutsch
---
 docs/direct-pv-disks.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/direct-pv-disks.md b/docs/direct-pv-disks.md
index 727d856a3c9d..8c4bf5407742 100644
--- a/docs/direct-pv-disks.md
+++ b/docs/direct-pv-disks.md
@@ -70,9 +70,9 @@ spec:
     devices:
       disks:
       - type: PersistentVolumeClaim
-      - source:
+        source:
           name: disk-01
-      - target:
+        target:
           bus: scsi
           device: sda
 ```
From 876ba098eac284fbe4f6c15a5251f410941cac6f Mon Sep 17 00:00:00 2001
From: Fabian Deutsch
Date: Tue, 15 Aug 2017 09:51:44 +0200
Subject: [PATCH 3/7] docs: Additional notes

Signed-off-by: Fabian Deutsch
---
 docs/filesystem-pv-disks.md | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/docs/filesystem-pv-disks.md b/docs/filesystem-pv-disks.md
index 623cd0a17654..f228ff806cd2 100644
--- a/docs/filesystem-pv-disks.md
+++ b/docs/filesystem-pv-disks.md
@@ -18,6 +18,7 @@ based PersistentVolumes for storing file images backing virtual disks.
 The focus is primarily on providing the right cluster levels mechanics to enable
 future changes for improving usability, performance, or other characteristics.
 
+
 ## Non-Goals
 
 - Provide a totally atumatic solution
@@ -90,9 +91,9 @@ backing PV is of `volumeType: block`.
 
 ## Additional Notes
 
-### Introduction of a `driver` struct
+### Introduction of a `driver` field
 
-In future we might want to introduce a `driver` struct for disks to
+In future we might want to introduce a `driver` field for disks to
 differentiate between (up to now) qemu's built-in drivers or using kubelet's
 file-system and ([in close future](https://github.com/kubernetes/community/pull/805))
 raw block storage support, i.e.:
@@ -118,3 +119,18 @@ this proposal is, to either use intermediate transparent qcow2 files, or improve
 qemu to have a cow subsystem, which is agnostic to the backing store type.
 Both solutions however would be independent of the storage type and don't
 contradict with our designs.
+
+### Virtfs
+
+Virtfs might allow us to directly use file-systems as a backing store for
+virtual machines.
+This API design should not contradict with this use-case, but it will probably
+depend on the `driver` field mentioned above to signal the system how the PV has
+to beconsumed (mounted).
+
+### `VMConfig` level feature based usage
+
+It might be of value to users to only specify a PV, and not care about the file
+to be used for a disk. We can add such logic, but this should probably reside on
+the `VMConcifg` level. The `VM` API should focus to provide the mechanics to
+support this more opinionated approach.
From 37a629b9a7c10ea0c75beeec314ffc4546f9a3d1 Mon Sep 17 00:00:00 2001
From: Fabian Deutsch
Date: Thu, 24 Aug 2017 16:00:09 +0200
Subject: [PATCH 4/7] docs: Simplify file-system volume proposal

- Introduce 1:1 mapping
- Introduce how the data can be shared between libvirt and kubelet

Signed-off-by: Fabian Deutsch
---
 docs/filesystem-pv-disks.md | 68 +++++++++++++++++++++++++++++----------
 1 file changed, 50 insertions(+), 18 deletions(-)

diff --git a/docs/filesystem-pv-disks.md b/docs/filesystem-pv-disks.md
index f228ff806cd2..39b906bdf507 100644
--- a/docs/filesystem-pv-disks.md
+++ b/docs/filesystem-pv-disks.md
@@ -21,7 +21,7 @@ future changes for improving usability, performance, or other characteristics.
 
 ## Non-Goals
 
-- Provide a totally atumatic solution
+- Provide a totally automatic solution
 - Cover snapshots explicitly
 
 
@@ -44,20 +44,22 @@ The mechanism proposed in proposal however is leveraging Kubernetes to attach
 the remote storage as a file-system to a pod, to use this file-system to storage
 disk images which are acting as a backend storage to the VM.
 
+To keep the story compatability between the block and file-system storage, this
+proposal assumes that the file-system based storage will also just support a 1:1
+mapping between the virtual disk and volume.
+This aligns with the fact that a block volume can also just back a single
+virtual disk.
+
 
 ## API
 
 The general API to use PersistentVolume claims as virtual disk backends was
 introduced with [direct PV proposal](direct-pv-disks.md).
 
-This proposal is merely adding an additional field to support addressing files
-on a file-system.
+The change suggested by this proposal, does not require any API change.
 
-Thus, to use a PV as a virtual disk backend, a user needs to create a claim for
-the required PV, and then reference the to be used disk image using the newly
-introduced `file` parameter.
-The `file` field takes a path, relative to the source of the file-system held by
-the referenced PVC.
+To use a PV as a virtual disk backend, a user needs to create a claim for the
+required PV, this claim is then used as a disk source for a virtual disk.
 
 An example:
 
@@ -69,24 +71,54 @@ spec:
       disks:
       - type: PersistentVolumeClaim
         source:
-          name: vm-01-disks
-          file: disk-01.img # The change
+          name: vm-01-disk
         target:
           bus: scsi
           device: sda
 ```
 
-Here the user attaches the PersistentVolumeClaim _vm-01-disks_ to a VM, and uses
-the file `disk-01.img` as the backing file for the disk `sda`.
+Here the user attaches the PersistentVolumeClaim _vm-01-disk_ to a VM, the
+assumption is that the _vm-01-disk_ volume is a file-system based volume.
+
 
 ### Storage Type Inference
 
-The system can look at the PVC to infer whether file-system or raw block storage
-should be used with a PV.
-If there is a conflict between the VM API configuration and the backing PV, then
-an error must be raised.
-One error condition for example would be if a `file` field is given, but the
-backing PV is of `volumeType: block`.
+The system needs to know if the referenced volume needs to be treated as a
+block or file-system volume. This information can be infered from the existing
+PV metadata.
+
+
+## Implementation
+
+### Volume layout
+
+A file-system based volume will contain the image file only, this file must be
+named `disk.img`.
+The format of the file must be `raw`.
+
+The file-system layout of a mounted volume then looks like:
+
+```
+/disk.img
+```
+
+### Mounting & sharing
+
+The file-system volume needs to be associated with the launcher container of
+the VM pod, and is thus mounted in the VMs pod launcher mount namespace.
+
+As the qemu proccesses run in the libvirt's mount namespace, the volume mount
+namespace has to be shared with libvirt.
+
+To achieve this, libvrit needs to gain access to the `/var/lib/kubelet/pods`
+path in the `kubelet`'s mount namespace.
+This path contains all volume mounts of all containers.
+
+The handler can craft the (relative) path to a volume, by taking the pod's UUID
+and volume informations. This path is then passed to libvirt, which in turn uses
+it in a disk definition.
+
+FIXME define the EXACT way of how to craft the path.
 
 
 ## Additional Notes
From 1ced88ddd550db92bd898fc7a778b962bbc539f3 Mon Sep 17 00:00:00 2001
From: Fabian Deutsch
Date: Fri, 25 Aug 2017 12:27:12 +0200
Subject: [PATCH 5/7] Add reference to kubelet path for volumes

Signed-off-by: Fabian Deutsch
---
 docs/filesystem-pv-disks.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/filesystem-pv-disks.md b/docs/filesystem-pv-disks.md
index 39b906bdf507..8e055882d939 100644
--- a/docs/filesystem-pv-disks.md
+++ b/docs/filesystem-pv-disks.md
@@ -119,6 +119,8 @@ and volume informations. This path is then passed to libvirt, which in turn uses
 it in a disk definition.
 
 FIXME define the EXACT way of how to craft the path.
+Path is defined here:
+https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_executor.go#L394
 
 
 ## Additional Notes
From 52d6980d7e6d8910eaf17b9b8ad5613fdaf45cee Mon Sep 17 00:00:00 2001
From: Fabian Deutsch
Date: Mon, 22 Jan 2018 14:17:46 +0100
Subject: [PATCH 6/7] docs: FS PV updated

Signed-off-by: Fabian Deutsch
---
 docs/filesystem-pv-disks.md | 64 +++++-------------------------------
 1 file changed, 18 insertions(+), 46 deletions(-)

diff --git a/docs/filesystem-pv-disks.md b/docs/filesystem-pv-disks.md
index 8e055882d939..7543658f0bea 100644
--- a/docs/filesystem-pv-disks.md
+++ b/docs/filesystem-pv-disks.md
@@ -35,14 +35,11 @@ virtual disks.
 
 Currently KubeVirt only supports [direct PV disks](direct-pv-disks.md).
 This feature works, by assigning PVCs backed by network block storage (currently
-only iSCSI) to a VM. The connection details of the PVC are then leveraged with
-qemu's built-in network storage drivers, to directly connect qemu to the remote
-storage. This effectively bypasses all other Kubernetes or KubeVirt components,
-and establishes a direct connection between qemu and the storage backend.
+only iSCSI) to a VM.
 
-The mechanism proposed in proposal however is leveraging Kubernetes to attach
-the remote storage as a file-system to a pod, to use this file-system to storage
-disk images which are acting as a backend storage to the VM.
+The mechanism proposed in this proposal is leveraging Kubernetes to attach
+the remote storage as a file-system to a container, to use this file-system to
+store a disk image to act as a backend storage to the VM's disk.
 
 To keep the story compatability between the block and file-system storage, this
 proposal assumes that the file-system based storage will also just support a 1:1
@@ -69,31 +66,30 @@ spec:
   domain:
     devices:
       disks:
-      - type: PersistentVolumeClaim
-        source:
-          name: vm-01-disk
-        target:
-          bus: scsi
-          device: sda
+      - name: root-disk
+        volumeName: my-fs-store
+  volumes:
+  - name: my-fs-store
+    persistentVolumeClaim: my-fs-claim
 ```
 
-Here the user attaches the PersistentVolumeClaim _vm-01-disk_ to a VM, the
-assumption is that the _vm-01-disk_ volume is a file-system based volume.
+Here the user attaches the PersistentVolumeClaim `my-fs-claim` as a disk to a
+VM.
 
 
 ### Storage Type Inference
 
 The system needs to know if the referenced volume needs to be treated as a
-block or file-system volume. This information can be infered from the existing
-PV metadata.
+block or file-system volume.
+This information can be infered from the existing PV metadata.
 
 
 ## Implementation
 
 ### Volume layout
 
-A file-system based volume will contain the image file only, this file must be
-named `disk.img`.
+A file-system based volume will contain only a single image file, this file
+must be named `disk.img`.
 The format of the file must be `raw`.
 
 The file-system layout of a mounted volume then looks like:
@@ -102,25 +98,8 @@ The file-system layout of a mounted volume then looks like:
 /disk.img
 ```
 
-### Mounting & sharing
-
-The file-system volume needs to be associated with the launcher container of
-the VM pod, and is thus mounted in the VMs pod launcher mount namespace.
-
-As the qemu proccesses run in the libvirt's mount namespace, the volume mount
-namespace has to be shared with libvirt.
-
-To achieve this, libvrit needs to gain access to the `/var/lib/kubelet/pods`
-path in the `kubelet`'s mount namespace.
-This path contains all volume mounts of all containers.
-
-The handler can craft the (relative) path to a volume, by taking the pod's UUID
-and volume informations. This path is then passed to libvirt, which in turn uses
-it in a disk definition.
-
-FIXME define the EXACT way of how to craft the path.
-Path is defined here:
-https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_executor.go#L394
+In future we might want to add additional files to carry metadata, but the limit
+of a single image file per volume must not be changed.
 
 
 ## Additional Notes
@@ -134,7 +113,7 @@ raw block storage support, i.e.:
 ```yaml
   - type: PersistentVolumeClaim
     driver:
-      name: qemu
+      name: qemu-built-in
     source:
       name: vm-01-disks
 ---
@@ -161,10 +140,3 @@ virtual machines.
 This API design should not contradict with this use-case, but it will probably
 depend on the `driver` field mentioned above to signal the system how the PV has
 to beconsumed (mounted).
-
-### `VMConfig` level feature based usage
-
-It might be of value to users to only specify a PV, and not care about the file
-to be used for a disk. We can add such logic, but this should probably reside on
-the `VMConcifg` level. The `VM` API should focus to provide the mechanics to
-support this more opinionated approach.
From 359c0bf8fdb63d46d0c37065a43818fde6157f5e Mon Sep 17 00:00:00 2001
From: Fabian Deutsch
Date: Mon, 12 Feb 2018 14:51:46 +0100
Subject: [PATCH 7/7] filesystem PV: Address comments

Signed-off-by: Fabian Deutsch
---
 docs/filesystem-pv-disks.md | 38 +++---------------------------------
 1 file changed, 3 insertions(+), 35 deletions(-)

diff --git a/docs/filesystem-pv-disks.md b/docs/filesystem-pv-disks.md
index 7543658f0bea..33d6fef5d390 100644
--- a/docs/filesystem-pv-disks.md
+++ b/docs/filesystem-pv-disks.md
@@ -33,10 +33,6 @@ virtual disks.
 
 ## Design Overview
 
-Currently KubeVirt only supports [direct PV disks](direct-pv-disks.md).
-This feature works, by assigning PVCs backed by network block storage (currently
-only iSCSI) to a VM.
-
 The mechanism proposed in this proposal is leveraging Kubernetes to attach
 the remote storage as a file-system to a container, to use this file-system to
 store a disk image to act as a backend storage to the VM's disk.
@@ -81,7 +77,8 @@ VM.
 
 The system needs to know if the referenced volume needs to be treated as a
 block or file-system volume.
-This information can be infered from the existing PV metadata.
+Since Kubernetes 1.9 this information can be infered from the existing PV
+metadata.
 
 
 ## Implementation
@@ -104,39 +101,10 @@ of a single image file per volume must not be changed.
 
 ## Additional Notes
 
-### Introduction of a `driver` field
-
-In future we might want to introduce a `driver` field for disks to
-differentiate between (up to now) qemu's built-in drivers or using kubelet's
-file-system and ([in close future](https://github.com/kubernetes/community/pull/805))
-raw block storage support, i.e.:
-```yaml
-  - type: PersistentVolumeClaim
-    driver:
-      name: qemu-built-in
-    source:
-      name: vm-01-disks
----
-  - type: PersistentVolumeClaim
-    driver:
-      name: kubelet
-    source:
-      name: vm-01-disks
-```
-
-### Snapshots
-
-Also something for a different proposal, but to be considered are snapshots.
-A general approach to snapshots which works with the existing direct PV and
-this proposal is, to either use intermediate transparent qcow2 files, or improve
-qemu to have a cow subsystem, which is agnostic to the backing store type.
-Both solutions however would be independent of the storage type and don't
-contradict with our designs.
-
 ### Virtfs
 
 Virtfs might allow us to directly use file-systems as a backing store for
 virtual machines.
 This API design should not contradict with this use-case, but it will probably
 depend on the `driver` field mentioned above to signal the system how the PV has
-to beconsumed (mounted).
+to be consumed (mounted).
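
Patch 7/7 relies on the PV metadata available since Kubernetes 1.9 to tell block from file-system volumes: the `volumeMode` field of PVs and PVCs (alpha in 1.9, behind the `BlockVolume` feature gate) is either `Block` or `Filesystem`, and an unset field means `Filesystem`. A minimal Go sketch of that decision, assuming the handler already has the bound PV's `volumeMode`, could look like this. The mode type is redeclared locally so the example compiles without `k8s.io/api`; `DiskBackend` and `inferBackend` are hypothetical names, not KubeVirt API.

```go
package main

import "fmt"

// PersistentVolumeMode mirrors the string type behind the Kubernetes
// `volumeMode` field of PVs and PVCs; redeclared here so the sketch has no
// dependency on k8s.io/api.
type PersistentVolumeMode string

const (
	PersistentVolumeBlock      PersistentVolumeMode = "Block"
	PersistentVolumeFilesystem PersistentVolumeMode = "Filesystem"
)

// DiskBackend is a hypothetical enum describing how a VM disk is wired up.
type DiskBackend int

const (
	FileBacked  DiskBackend = iota // raw image /disk.img on a mounted volume
	BlockBacked                    // raw block device passed on to qemu
)

func (b DiskBackend) String() string {
	switch b {
	case FileBacked:
		return "file"
	case BlockBacked:
		return "block"
	default:
		return fmt.Sprintf("DiskBackend(%d)", int(b))
	}
}

// inferBackend picks the backend from a PV's volumeMode. A nil mode means the
// field is unset, which Kubernetes treats as Filesystem. The returned backend
// is meaningless when err != nil.
func inferBackend(mode *PersistentVolumeMode) (DiskBackend, error) {
	if mode == nil {
		return FileBacked, nil
	}
	switch *mode {
	case PersistentVolumeFilesystem:
		return FileBacked, nil
	case PersistentVolumeBlock:
		return BlockBacked, nil
	default:
		return 0, fmt.Errorf("unsupported volumeMode %q", string(*mode))
	}
}

func main() {
	block := PersistentVolumeBlock
	for _, mode := range []*PersistentVolumeMode{nil, &block} {
		backend, err := inferBackend(mode)
		fmt.Println(backend, err)
	}
}
```

Returning an error for an unrecognized mode, instead of silently falling back to a file image, is a choice of this sketch; the proposal itself does not say how unknown modes are handled.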