Enhancement suggestion for image prepull #1368

Open
NymanRobin opened this issue Mar 13, 2024 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@NymanRobin
Member

When preparing the host for metal3-dev-env, the virtual machine base images are rather big, which makes the current process quite fragile to network or process interruptions. This might lead to errors in the configuration, since the integrity checks in image_prepull.sh are quite loose.

The first problem arises from the check for whether the image exists:
if [[ ! -f "${IMAGE_NAME}" ]]; then
This only checks that the file exists and says nothing about its content.
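
To illustrate, a stricter check could look roughly like the sketch below. The function name `image_is_valid` and its arguments are hypothetical and not taken from image_prepull.sh, and it assumes the expected SHA-256 comes from a trusted source.

```shell
# Hypothetical helper, not part of image_prepull.sh.
# Succeeds only if the file exists AND its SHA-256 matches the expected value.
# $1: path to the image, $2: expected SHA-256 hex digest (assumed trusted)
image_is_valid() {
    local image="$1" expected="$2" actual
    # Same existence test as today, but it is no longer the only condition
    [ -f "${image}" ] || return 1
    actual="$(sha256sum "${image}" | awk '{print $1}')"
    [ "${actual}" = "${expected}" ]
}
```

The download would then be triggered when `image_is_valid "${IMAGE_NAME}" "${expected}"` fails, rather than only when the file is missing.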

The checksum check does not help either: if the checksum file does not exist, it is generated from the downloaded file, which might already be corrupt at that point.
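
One possible way around this, as a sketch only: never create the checksum from the local file, and treat a missing checksum as an error. `verify_or_discard` and the return codes are made up for illustration, and the checksum file is assumed to hold the digest as its first field.

```shell
# Hypothetical helper, not part of image_prepull.sh.
# $1: path to the image, $2: path to a checksum file obtained from a trusted source
# Returns 0 on match, 1 on mismatch (image is deleted), 2 if no checksum is available.
verify_or_discard() {
    local image="$1" checksum_file="$2" expected actual
    if [ ! -s "${checksum_file}" ]; then
        echo "No trusted checksum for ${image}; refusing to generate one locally" >&2
        return 2
    fi
    # Assumption: the digest is the first whitespace-separated field of the first line
    expected="$(awk 'NR==1 {print $1}' "${checksum_file}")"
    actual="$(sha256sum "${image}" 2>/dev/null | awk '{print $1}')"
    if [ "${actual}" != "${expected}" ]; then
        echo "Checksum mismatch for ${image}; removing it so it gets downloaded again" >&2
        rm -f "${image}"
        return 1
    fi
}
```

Deleting the image on mismatch means the existing "does the file exist" test would trigger a fresh download on the next run.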

Finally, I think the user experience could be improved by adding a progress bar to these slow downloads so the user is not confused about what is happening, for example with these wget options: --show-progress --progress=bar:force:noscroll
Note: these are fairly new wget options and might break on older machines, so it is probably best to add a fallback in case wget does not recognize them.
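
A fallback could be sketched like this: only pass the options when the installed wget advertises them in its help text. `wget_progress_args` is a hypothetical name, and grepping `wget --help` is just one assumed way of detecting support.

```shell
# Hypothetical helper, not part of image_prepull.sh.
# Prints the progress options to pass to wget, one per line, or nothing
# if the installed wget does not list --show-progress in its help output.
# $1 (optional): help text to inspect; defaults to the output of `wget --help`
wget_progress_args() {
    local help_text="${1-$(wget --help 2>&1)}"
    if printf '%s\n' "${help_text}" | grep -q -- '--show-progress'; then
        printf '%s\n' --show-progress --progress=bar:force:noscroll
    fi
}

# Possible usage (left unquoted on purpose so the options split into words):
#   wget -q $(wget_progress_args) "${IMAGE_LOCATION}/${IMAGE_NAME}"
# IMAGE_LOCATION is an assumed variable name here.
```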

One suggested improvement is to download the checksum directly from Artifactory each time and compare it with the checksum of the actual file.
This is normally done by appending the checksum type to the end of the file path, something like this:
https://artifactory.nordix.org/ui/native/metal3/images/k8s_v1.29.0/UBUNTU_22.04_NODE_IMAGE_K8S_v1.29.0.qcow2.sha256

This did not work even though I can see in the Artifactory UI that the checksum is generated. It might be related to repository settings and could be investigated further.

Otherwise, this could most likely also be achieved with some wget options.
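
As one untested idea for getting the checksum without a separate file: Artifactory is assumed here to send an `X-Checksum-Sha256` response header for artifacts, which could be read from the response headers. Whether this server actually returns that header has not been verified, and `extract_sha256_header` is a hypothetical name.

```shell
# Hypothetical helper, not part of image_prepull.sh.
# Reads HTTP response headers on stdin and prints the value of the
# X-Checksum-Sha256 header, if present (ASSUMPTION: the server sends it).
extract_sha256_header() {
    # Strip carriage returns, then match the header name case-insensitively.
    # Leading spaces are allowed because `wget --server-response` indents headers.
    tr -d '\r' | awk -F': *' 'tolower($1) ~ /^ *x-checksum-sha256$/ {print $2; exit}'
}

# Possible usage with curl (a HEAD request, so the image itself is not downloaded):
#   remote_sum="$(curl -fsSI "${url}" | extract_sha256_header)"
#   local_sum="$(sha256sum "${IMAGE_NAME}" | awk '{print $1}')"
#   [ -n "${remote_sum}" ] && [ "${remote_sum}" = "${local_sum}" ]
```

An empty result should be treated as "no checksum available" rather than as a match.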

@metal3-io-bot metal3-io-bot added the needs-triage Indicates an issue lacks a `triage/foo` label and requires one. label Mar 13, 2024
@tuminoid
Member

Is there a case where the file download is corrupt and wget would still return success, or is the failure case that the pre-pulled image exists and is corrupt?

Finally, I think the user experience could be improved by adding a progress bar to these slow downloads so the user is not confused about what is happening, for example with these wget options: --show-progress --progress=bar:force:noscroll
Note: these are fairly new wget options and might break on older machines, so it is probably best to add a fallback in case wget does not recognize them.

Yes, this option does not work on our CentOS variant. All the nice things are missing on the CentOS side :)

Also that option does not look great in logs:

--2024-03-14 07:02:26--  https://artifactory.nordix.org/artifactory/metal3/images/k8s_v1.29.0/CENTOS_9_NODE_IMAGE_K8S_v1.29.0.qcow2
Resolving artifactory.nordix.org (artifactory.nordix.org)... 91.106.198.25
Connecting to artifactory.nordix.org (artifactory.nordix.org)|91.106.198.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2270668288 (2.1G) [application/octet-stream]
Saving to: ‘CENTOS_9_NODE_IMAGE_K8S_v1.29.0.qcow2.3’

^MCENTOS_9_NODE_IMAGE   0%[                    ]       0  --.-KB/s               ^MCENTOS_9_NODE_IMAGE   2%[                    ]  54.73M   274MB/s               ^MCENTOS_9_NODE_IMAGE   6%[>                   ] 130.85M   327MB/s               ^MCENTOS_9_NODE_IMAGE   9%[>                   ] 204.76M   341MB/s               ^MCENTOS_9_NODE_IMAGE  12%[=>                  ] 281.48M   352MB/s               ^MCENTOS_9_NODE_IMAGE  16%[==>                 ] 359.76M   360MB/s               ^MCENTOS_9_NODE_IMAGE  20%[===>                ] 437.87M   365MB/s               ^MCENTOS_9_NODE_IMAGE  23%[===>                ] 517.19M   369MB/s               ^MCENTOS_9_NODE_IMAGE  27%[====>               ] 598.85M   374MB/s               ^MCENTOS_9_NODE_IMAGE  31%[=====>              ] 671.87M   373MB/s               ^MCENTOS_9_NODE_IMAGE  33%[=====>              ] 735.64M   368MB/s               ^MCENTOS_9_NODE_IMAGE  36%[======>             ] 799.74M   363MB/s               ^MCENTOS_9_NODE_IMAGE  39%[======>             ] 863.41M   360MB/s               ^MCENTOS_9_NODE_IMAGE  42%[=======>            ] 927.01M   356MB/s               ^MCENTOS_9_NODE_IMAGE  45%[========>           ] 991.03M   354MB/s               ^MCENTOS_9_NODE_IMAGE  49%[========>           ]   1.04G   354MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  52%[=========>          ]   1.11G   362MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  56%[==========>         ]   1.18G   361MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  59%[==========>         ]   1.26G   362MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  63%[===========>        ]   1.34G   363MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  66%[============>       ]   1.41G   363MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  70%[=============>      ]   1.49G   363MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  73%[=============>      ]   1.56G   361MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  77%[==============>     ]   1.63G   358MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  80%[===============>    ]   
1.70G   355MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  83%[===============>    ]   1.77G   359MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  87%[================>   ]   1.85G   362MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  90%[=================>  ]   1.92G   366MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  94%[=================>  ]   2.00G   372MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  97%[==================> ]   2.07G   374MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE 100%[===================>]   2.11G   377MB/s    in 5.9s   

That said, the current logging is really spammy, printing a literal thousand lines...

This did not work even though I can see in the Artifactory UI that the checksum is generated. It might be related to repository settings and could be investigated further.

Checking the file listing at https://artifactory.nordix.org/ui/native/metal3/images/k8s_v1.29.0/ does not show any checksum files to be downloaded. We should probably be uploading them with the images themselves.
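
On the publishing side, producing the checksum file next to the image could be as simple as the sketch below. `write_checksum_file` and the `.sha256` suffix are assumptions for illustration; the actual upload step is not shown.

```shell
# Hypothetical helper for the image publishing side.
# $1: path to the image; writes "<image>.sha256" next to it.
write_checksum_file() {
    local image="$1" dir name
    dir="$(dirname "${image}")"
    name="$(basename "${image}")"
    # Run inside the directory so the checksum file records a bare file name,
    # which lets consumers verify it from wherever they downloaded both files.
    (cd "${dir}" && sha256sum "${name}" > "${name}.sha256")
}

# A consumer that downloaded both files into one directory could then run:
#   sha256sum -c UBUNTU_22.04_NODE_IMAGE_K8S_v1.29.0.qcow2.sha256
```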

@NymanRobin
Member Author

It is only in a failure case that the corruption can happen. And indeed, the progress bar does not render well in log files.

Seeing the checksum in the file browser depends on this Artifactory setting: artifactory.ui.hideChecksums

But in the UI view I can at least see it: https://artifactory.nordix.org/ui/repos/tree/General/metal3/images/k8s_v1.29.0/UBUNTU_22.04_NODE_IMAGE_K8S_v1.29.0.qcow2

@Rozzii
Member

Rozzii commented Mar 27, 2024

/triage accepted
IMO the outcome of this should be:

  • nicer logging (no bar please, I agree with @tuminoid that it looks bad in logs)
  • implementation of checksum validation

/kind feature
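
For the logging part, one option that avoids the bar in log files is wget's dot-style indicator. The sketch below picks a style depending on whether output goes to a terminal. `wget_progress_style` and its "tty"/"log" modes are hypothetical, and `dot:giga` is only one of the styles wget offers (per the wget manual it prints one line per 32M retrieved).

```shell
# Hypothetical helper, not part of image_prepull.sh.
# Prints a wget --progress option suited to the output destination.
# $1 (optional): "tty" or "log"; autodetected from stderr when omitted,
#                since wget writes its progress output to stderr.
wget_progress_style() {
    local mode="${1:-}"
    if [ -z "${mode}" ]; then
        if [ -t 2 ]; then mode="tty"; else mode="log"; fi
    fi
    case "${mode}" in
        tty) echo "--progress=bar" ;;       # interactive terminal: plain bar
        *)   echo "--progress=dot:giga" ;;  # logs: no carriage-return redraws
    esac
}

# Possible usage:
#   wget "$(wget_progress_style)" "${url}"
```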

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. kind/feature Categorizes issue or PR as related to a new feature. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Mar 27, 2024
@metal3-io-bot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2024
@Rozzii
Member

Rozzii commented Jun 26, 2024

/remove-lifecycle stale

@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 26, 2024
@Rozzii
Member

Rozzii commented Jun 28, 2024

/remove-lifecycle stale

@metal3-io-bot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 26, 2024
@tuminoid
Member

/remove-lifecycle stale
/lifecycle frozen

@metal3-io-bot metal3-io-bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 27, 2024
Projects
Status: Backlog
Development

No branches or pull requests

4 participants