wait_compose module doesn't exit when compose finishes #281

Open
sallyom opened this issue Aug 16, 2023 · 2 comments

Comments


sallyom commented Aug 16, 2023

Builder roles fail by timing out while waiting for the compose to finish, even though the compose finished several minutes earlier. The builder roles are running on an EC2 RHEL 9.2 instance.

JSON output from the VM shows the compose as finished:

    {
        "method": "GET",
        "path": "/compose/finished",
        "status": 200,
        "body": {
            "finished": [
                {
                    "blueprint": "rhde",
                    "compose_type": "edge-container",
                    "id": "01d2e66b-96bc-4477-8978-4d27e16e417f",
                    "image_size": 0,
                    "job_created": 1692152909.3148224,
                    "job_finished": 1692153570.499627,
                    "job_started": 1692152909.3239973,
                    "queue_status": "FINISHED",
                    "version": "0.0.1"
                }
            ]
        }
    },
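
As a reading aid, here is a minimal Python sketch of how a response shaped like the one above could be checked for a specific compose ID. The function name `compose_is_finished` is hypothetical, and this is not taken from the `wait_compose` module; it only assumes the `body` / `finished` / `id` / `queue_status` layout visible in the output.

```python
from typing import Any, Mapping


def compose_is_finished(response: Mapping[str, Any], compose_id: str) -> bool:
    """Return True if compose_id appears in the response's finished list.

    Assumes the response layout shown in this issue:
    {"body": {"finished": [{"id": ..., "queue_status": "FINISHED", ...}]}}
    """
    finished = response.get("body", {}).get("finished", [])
    return any(
        entry.get("id") == compose_id and entry.get("queue_status") == "FINISHED"
        for entry in finished
    )
```

With the output above, calling it with the ID `01d2e66b-96bc-4477-8978-4d27e16e417f` would report the compose as finished.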
    

The run never progresses past the wait_compose.py / Wait for compose to finish task.

TASK [infra.osbuild.builder : Wait for compose to finish] **********************
task path: /runner/requirements_collections/ansible_collections/infra/osbuild/roles/builder/tasks/main.yml:121
--- no useful info ---
Collaborator

matoval commented Aug 22, 2023

Hey @sallyom, I spun up an EC2 instance and wasn't able to reproduce this issue. I successfully built an edge-container and an edge-commit with no problems.

Are you still experiencing this issue?

Author

sallyom commented Aug 25, 2023

@matoval the issue happens when I'm running the multi-stage edge-installer compose_type.

I'm running AAP in OpenShift, and I have a RHEL 9.2 builder VM in EC2 configured as the remote host.
The first stage, edge-commit, completes in the VM successfully. So I know the playbook, inventory, and connection are fine, and several weldr API calls also succeed (the blueprint push, the compose start, etc.). The playbook running from AAP never proceeds past this first edge-commit stage: the result reporting that the edge-commit compose has finished never gets through, so the wait_compose task hangs until it times out. There is no other error.

Here's the weird thing. I can watch the weldr socket API calls in the RHEL 9 VM, and I see that wait_compose checks every 20s (the default recheck frequency). The instant the compose finishes, wait_compose goes silent and no longer checks in every 20s. So something has registered that the compose finished, but then there is silence, followed by the eventual timeout.
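
To make the behaviour described above easier to reason about, here is a rough Python sketch of a poll-until-finished loop with a 20s interval and an overall timeout. It is an illustration only, not the actual `wait_compose` code: `wait_for_compose`, its parameters, and the 1800s default timeout are all assumptions. Logging every poll result, as done here, is one way to tell whether a loop like this ever observed the finished state or stalled somewhere after the last request.

```python
import time
from typing import Callable


def wait_for_compose(
    is_finished: Callable[[], bool],
    interval: float = 20.0,   # matches the 20s recheck frequency mentioned above
    timeout: float = 1800.0,  # hypothetical overall limit, not the module's default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll is_finished() until it returns True or the timeout elapses.

    Returns True if the compose was seen as finished, False on timeout.
    """
    deadline = clock() + timeout
    attempt = 0
    while True:
        attempt += 1
        done = is_finished()
        # Record every poll so a stall after the final check is visible.
        log(f"poll {attempt}: finished={done}")
        if done:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a loop like this, seeing the finished state should cause an immediate return, so polls stopping without a return would point at something after the check (for example, returning the result over the remote connection) rather than at the polling itself.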

Here's the weirder thing. I can run the exact same playbook with the exact same vars to completion if I instead ssh into the RHEL 9.2 EC2 instance and configure a localhost inventory. When I run it directly on the host, I see the multi-stage composes complete: first the edge-commit finishes and the commit is served as expected, then an empty blueprint is created, and then the edge-installer compose completes and I have the ISO image.
