Teleport fails to release due to OPRT race condition #52488

fheinecke · 2025-02-25T19:30:07Z

Expected behavior:

Teleport should release successfully

Current behavior:

Teleport OS package publishing can fail due to a race condition. The OS package repo tool (OPRT) guards against this race with several checks that do not use a mutex, but the race can still occur, infrequently. This is the first recorded occurrence in a few years, as logged here.

Only one instance of the OPRT per environment (stage, prod) can run at a time; running more than one concurrently is guaranteed to corrupt data. Several protections are built into the system to prevent this. The protection we primarily rely on handles this well via retries and backoffs, but it does not allow any form of communication between pending concurrent instances. In some extremely rare cases it can fail, at which point the next "level" of protection stops the job from running at all.

Bug details:

Teleport version - all
Recreation steps
Debug logs

fheinecke · 2025-02-25T19:31:07Z

The fix for this would require significant changes to how we publish packages, so I'm going to group this under the "replace our current OS package publishing setup" work item.

camscale · 2025-02-27T21:20:19Z

Failure today: https://github.com/gravitational/teleport.e/actions/runs/13564944838/job/37918292297

fheinecke added the release-build-failures label Feb 25, 2025
