Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait VM becomes getable after triggering restore #1419

Merged

Conversation

albinsun
Copy link
Contributor

@albinsun albinsun commented Aug 2, 2024

Which issue(s) this PR fixes:

What this PR does / why we need it:

  1. ADD vm_checker.wait_getable between api_client.backups.restore and vm_checker.wait_ip_addresses
    • vm_checker.wait_ip_addresses calls api_client.vms.start in the callback chain and will assert FAIL if target VM does not exist.
    • api_client.backups.restore needs some time to communicate with backup-target to restore VM. So depends on the env., VM instance may not be created immediately.

Special notes for your reviewer:

This is a quick fix.
IMO, coder tends to expect function wait_XXX does some polling GET, thus some error-prone. To enhance, we may:

  1. Only polling GET and call POST, PUT, DELETE explicitly.
  2. Refactor function name to something like create_AAA_and_wait_XXX.
  3. ...

Additional documentation or context

Verification

  • Before (harvester-runtests#61)
    image

  • After (harvester-runtests#62)
    image

Copy link
Contributor

@TachunLin TachunLin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thank you @albinsun for checking and bringing the solution to fix the test_restore with_new_vm related test cases.

I added the vm_checker.wait_getable into all related test case in the test_4_vm_backup_restore.py and also test along with the new class TestBackupRestoreWithSnapshot for PR #1384

And trigger a new test inside the main Jenkins vm toward the raven cluster.

The result is all test_restore_with_new_vm test cases are running well and also PASS most of the backup and restore test cases except those flaky cases already failed before for other reason.
image

@khushboo-rancher khushboo-rancher merged commit 1d57e65 into harvester:main Aug 5, 2024
3 checks passed
@lanfon72
Copy link
Member

lanfon72 commented Aug 5, 2024

I think the root cause would be, we introduced new state while restoring VM, so we should check the restoring process is completed rather than the VM is available.

@albinsun albinsun deleted the add_wait_getable_in_vm_checker branch August 6, 2024 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants