Handle lost sparseness in non http data sources #3213
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
/test all
/retest
/test pull-containerized-data-importer-fossa
Really nice, just a couple of comments
return info.VirtualSize >= imageContentSize, nil
fmt.Fprintf(ginkgo.GinkgoWriter, "INFO: VerifySparse comparison: OriginalVirtual: %d vs SizeOnDisk: %d\n", originalVirtualSize, info.ActualSize)
// The size on disk of a sparse image is significantly lower than the image's virtual size
return originalVirtualSize-info.ActualSize >= units.MB, nil
Probably intentional, but why use units.MB instead of the previously used MiB?
Yeah, intentional, but it's been a while so I don't remember the exact reasoning; I'll have to take a look.
What is the 1MB fudge factor all about?
It's probably to ensure we align the size on 1MiB. qemu gets unhappy if the image is not aligned on the actual storage block size.
I think this check is a little too strict; I'm wondering why the current virtual size is not appropriate.
So I looked up the 1MiB->1MB switch.
It's because qemu-img convert only manages to shave off one megabyte (not mebibyte) for our test image:
INFO: VerifySparse comparison: OriginalVirtual: 18874368 vs SizeOnDisk: 17829888
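To make the MB-vs-MiB choice concrete, here is a standalone sketch (not CDI code; the constants mirror units.MB and units.MiB from docker/go-units) plugging in the numbers from that log line:

```go
package main

import "fmt"

const (
	mb  = 1000 * 1000 // decimal megabyte, like units.MB
	mib = 1024 * 1024 // binary mebibyte, like units.MiB
)

// savings is how many bytes sparsification shaved off the on-disk footprint.
func savings(originalVirtualSize, actualSizeOnDisk int64) int64 {
	return originalVirtualSize - actualSizeOnDisk
}

func main() {
	// Numbers from the test log above.
	saved := savings(18874368, 17829888)
	fmt.Println(saved)        // 1044480
	fmt.Println(saved >= mb)  // true: the check passes with units.MB
	fmt.Println(saved >= mib) // false: it would fail with units.MiB
}
```

Since 1,044,480 falls between 1,000,000 and 1,048,576, a MiB threshold would reject this particular converted image while an MB threshold accepts it.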
So this check is specific to a particular image?
No, you pass the virtual size of the used image as a parameter
pkg/importer/upload-datasource.go
@@ -51,8 +54,7 @@ func (ud *UploadDataSource) Info() (ProcessingPhase, error) {
if ud.contentType == cdiv1.DataVolumeArchive {
return ProcessingPhaseTransferDataDir, nil
}
if !ud.readers.Convert {
// Uploading a raw file, we can write that directly to the target.
if ud.directWriteException {
Maybe pass the source content type and do the sourceContentType == common.BlockdeviceClone check here? It makes this easier to debug and understand than having an arbitrary directWriteException that's not too informative. Just a nit though, feel free to ignore.
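A minimal sketch of that suggestion, with hypothetical names (needsDirectWrite and the string value stand in for the real fields; the actual value of common.BlockdeviceClone is assumed): decide based on the source content type instead of an opaque boolean flag.

```go
package main

import "fmt"

// blockdeviceClone stands in for common.BlockdeviceClone; the actual
// constant value in CDI may differ.
const blockdeviceClone = "blockdevice-clone"

// needsDirectWrite sketches the reviewer's idea: the decision to bypass
// conversion is derived from the source content type at the call site,
// which is self-documenting when stepping through the code.
func needsDirectWrite(sourceContentType string) bool {
	return sourceContentType == blockdeviceClone
}

func main() {
	fmt.Println(needsDirectWrite(blockdeviceClone)) // true: write directly to the target
	fmt.Println(needsDirectWrite("archive"))        // false: goes through the normal path
}
```

The trade-off discussed below is that the upload data source then has to know about source content types at all.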
Yeah, that makes sense
It could be a little awkward having it know about the source content type
/hold
@@ -284,24 +283,15 @@ func (f *Framework) VerifyBlankDisk(namespace *k8sv1.Namespace, pvc *k8sv1.Persi
}

// VerifySparse checks a disk image being sparse after creation/resize.
func (f *Framework) VerifySparse(namespace *k8sv1.Namespace, pvc *k8sv1.PersistentVolumeClaim, imagePath string) (bool, error) {
func (f *Framework) VerifySparse(namespace *k8sv1.Namespace, pvc *k8sv1.PersistentVolumeClaim, imagePath string, originalVirtualSize int64) (bool, error) {
Why not get rid of this function and check !VerifyImagePreallocated()?
Mainly because of the third point in the PR description
The third issue is that we almost always resize which results in significantly larger virtual size.
What we really care about is comparing against the original virtual size of the image.
We could get rid of one of them by negating the other, but those additions have to stay.
What we really care about is comparing against the original virtual size of the image.
Why?
Since the current virtual size is post-resize, you always end up comparing 1Gi or more (the requested disk image size) against a modest couple of megabytes on disk, which is always going to pass, a sort of false negative.
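A toy before/after sketch of that comparison (hypothetical helpers, not the actual VerifySparse code): the old check pits the post-resize virtual size against the on-disk size, while the new one measures savings relative to the image's original virtual size.

```go
package main

import "fmt"

const mb = 1000 * 1000 // decimal megabyte, like units.MB

// oldCheck (sketch): the current virtual size is the requested disk size
// after a resize (e.g. 1Gi), so this comparison nearly always passes.
func oldCheck(currentVirtualSize, actualSizeOnDisk int64) bool {
	return currentVirtualSize >= actualSizeOnDisk
}

// newCheck (sketch): compare against the *original* virtual size and
// require the on-disk footprint to be meaningfully (>= 1 MB) smaller.
func newCheck(originalVirtualSize, actualSizeOnDisk int64) bool {
	return originalVirtualSize-actualSizeOnDisk >= mb
}

func main() {
	// A fully allocated 18MiB image (sparseness lost), PVC resized to 1Gi:
	fmt.Println(oldCheck(1<<30, 18874368))    // true: false negative, lost sparseness goes unnoticed
	fmt.Println(newCheck(18874368, 18874368)) // false: correctly flagged as not sparse
	// A genuinely sparse image, using the numbers from the log above:
	fmt.Println(newCheck(18874368, 17829888)) // true
}
```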
The first issue is that we look at content size since #2168, which is not beneficial in the sparseness check as opposed to disk usage. The second issue is that the check we have in tests for sparseness is not honest because of the "equal to" part. The virtual size has to be strictly greater than the content (as the content is displayed by tools that understand sparseness). The third issue is that we almost always resize, which results in a significantly larger virtual size. What we really care about is comparing against the original virtual size of the image. Signed-off-by: Alex Kalenyuk <[email protected]>
Basically a follow-up to #3219 for upload sources, which suffer from the same issue of losing sparseness. Signed-off-by: Alex Kalenyuk <[email protected]>
- ImageIO raw images are not passed through any sparsification mechanism - VDDK ones do, but the fake plugin used for testing lies that they're not Signed-off-by: Alex Kalenyuk <[email protected]>
/retest
@akalenyu: The following test failed:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten
/remove-lifecycle rotten
What this PR does / why we need it:
Handle some non-http sources where we lose sparseness similarly to #3219
For this to be complete, we must fix the broken sparseness check in e2e so it can guard against regressions.
The first issue is that we look at content size since #2168, which, unlike disk usage, is not useful for a sparseness check.
The second issue is that the check we have in tests for sparseness is not honest because of the "equal to" part.
The virtual size has to be strictly greater than the size on disk.
The third issue is that we almost always resize, which results in a significantly larger virtual size.
What we really care about is comparing against the original virtual size of the image.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Release note: