@Najib632
Contributor

Description

Fixes #36680

This PR addresses a race condition in system/manager/manager.py that caused unexpected device shutdowns.

The Issue:
Previously, manager.py updated the /var/tmp/power_watchdog file by opening it in "w" mode, which truncates the file to 0 bytes before the new timestamp is written. If the external AGNOS power monitoring process read the file during this brief window, it saw an empty file (interpreted as 0), assumed the watchdog had expired, and triggered an immediate shutdown.
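
The truncation window described above can be observed with a small sketch. This is only an illustration, not the code from manager.py: the function name observe_truncation_window and its parameters are hypothetical, and the path is supplied by the caller.

```python
import os


def observe_truncation_window(path: str, new_value: str) -> tuple[int, int]:
    """Return (file size right after open(path, "w"), file size after the write is closed).

    Hypothetical helper for illustration only; it is not part of manager.py.
    """
    with open(path, "w") as f:
        # open(..., "w") has already truncated the file, so a concurrent
        # reader at this point would see 0 bytes.
        size_during = os.path.getsize(path)
        f.write(new_value)
    # The file is closed here, so the new content has been flushed.
    size_after = os.path.getsize(path)
    return size_during, size_after
```

Any reader that opens the file between the truncation and the completed write sees empty content, which is the window the issue describes.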

The Fix:
Implemented an atomic write pattern:

  1. Write the new timestamp to a temporary file (/var/tmp/power_watchdog.tmp).
  2. Use os.rename to atomically replace the target file.
    This guarantees that the reader process always sees valid file content: either the old timestamp or the new one, never an empty file.
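
The two steps above can be sketched as follows. This is a minimal sketch rather than the code in this PR: the function name write_watchdog_atomically is hypothetical, and it assumes the temporary file lives on the same filesystem as the target, which is what makes os.rename atomic on POSIX systems.

```python
import os


def write_watchdog_atomically(path: str, timestamp: int) -> None:
    """Write timestamp to path via a temporary file plus os.rename.

    Hypothetical helper for illustration; assumes path + ".tmp" is on the
    same filesystem as path.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(timestamp))
        # Push the data to disk before the rename makes it visible.
        f.flush()
        os.fsync(f.fileno())
    # On POSIX, rename replaces the target in a single step, so readers see
    # either the old file or the new one.
    os.rename(tmp_path, path)
```

os.replace behaves the same way on POSIX and also overwrites an existing target on Windows, so it may be the better choice where portability matters.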

Verification:

  • Added a runtime self-verification step that reads the file back immediately after the update to ensure the timestamp is valid.
  • If the verification fails (which should be impossible with os.rename), a critical error is logged to help with future debugging.
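
A read-back check of this kind might look like the sketch below. The function name verify_watchdog is hypothetical, the standard logging module stands in for whatever logger manager.py actually uses, and "valid" is assumed here to mean a positive integer.

```python
import logging


def verify_watchdog(path: str) -> bool:
    """Return True if path holds a positive integer timestamp.

    Hypothetical helper for illustration; the validity rule and the use of
    the stdlib logging module are assumptions.
    """
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, empty, or non-numeric file.
        logging.critical("power watchdog file %s is missing or malformed", path)
        return False
    if value <= 0:
        logging.critical("power watchdog file %s holds invalid timestamp %d", path, value)
        return False
    return True
```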

@greatgitsby
Contributor

this PR makes a ton of unnecessary changes that bloat the diff

@Najib632
Contributor Author

Najib632 commented Nov 26, 2025

this PR makes a ton of unnecessary changes that bloat the diff

Thank you sir, I have removed the unnecessary changes I made

@greatgitsby
Contributor

i meant primarily on reorganizing imports and indents

@Najib632 Najib632 force-pushed the fix-power-watchdog-race branch from 81c51ec to 2e654a8 Compare November 26, 2025 08:12
@Najib632
Contributor Author

i meant primarily on reorganizing imports and indents

Okay sir, I switched the HEAD back to a previous commit that didn't have the bloat and then applied the changes from there

@Bglg2k

Bglg2k commented Nov 26, 2025 via email

@adeebshihadeh
Contributor

Still way too complicated, see how I did it here: 436e3de

@Najib632 Najib632 deleted the fix-power-watchdog-race branch December 1, 2025 23:49
@Najib632
Contributor Author

Najib632 commented Dec 2, 2025

Still way too complicated, see how I did it here: 436e3de

That was neat sir, adding it to the atomic_write really made it simpler.

Development

Successfully merging this pull request may close these issues.

unexpected shutdown from AGNOS power monitor

4 participants