Teleport OS package publishing can fail due to a race condition. The OS package repo tool (OPRT) avoids the race with several mechanisms that do not take a real lock (as in, no mutex), so the race can still occur, although infrequently. This occurred for the first (recorded) time in a few years as logged here.
Only one instance of the OPRT per environment (stage, prod) can run at a time; running more than one concurrently is guaranteed to corrupt data. Several protections are built into the system to prevent this. The protection we primarily rely on handles this via retries and backoffs pretty well, but it does not allow for any form of communication between pending concurrent instances. In some extremely rare cases it can fail, at which point the next "level" of protection takes over and stops the job from running at all.
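To illustrate the failure mode described above, here is a minimal sketch of a retry-and-backoff guard in which waiting instances never communicate and simply poll. It is not the OPRT's actual implementation: the lock file, `acquire_with_backoff`, `release`, `LockUnavailable`, and all parameter values are hypothetical names chosen for this example.

```python
import os
import time


class LockUnavailable(Exception):
    """Raised when every retry attempt found the lock already held."""


def acquire_with_backoff(lock_path, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Try to create lock_path exclusively, doubling the wait after each failure.

    Hypothetical helper: waiters cannot see each other or queue up,
    they only retry on a timer.
    """
    delay = base_delay
    for attempt in range(attempts):
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return attempt + 1  # number of attempts used
        except FileExistsError:
            if attempt < attempts - 1:
                sleep(delay)
                delay *= 2
    # Retries exhausted: the job gives up instead of running concurrently.
    raise LockUnavailable(f"{lock_path} still held after {attempts} attempts")


def release(lock_path):
    """Remove the lock file so another instance can acquire it."""
    os.remove(lock_path)
```

Because each waiter only polls on its own timer, a long-running holder or an unlucky sequence of retries can exhaust the attempts even when nothing is actually wrong, which in this sketch surfaces as `LockUnavailable` and a job that does not run.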
Bug details:
Teleport version - all
Recreation steps
Debug logs
The fix for this would require significant changes to how we publish packages, so I'm going to group this under the "replace our current OS package publishing setup" work item.
Expected behavior:
Teleport should release successfully