Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Persistent data storage in Python controller bindings is not power loss safe. #36654

Open
kepstin opened this issue Nov 27, 2024 · 0 comments
Labels
bug Something isn't working needs triage

Comments

@kepstin
Copy link

kepstin commented Nov 27, 2024

Reproduction steps

A Home Assistant user reported an issue where following a power loss, all of their connected Matter devices had disappeared.

From the user's provided logs: matter-logs.txt
It looks like the json file was corrupted (with partially written data), and upon reading the corrupted file the Matter SDK decided to reset and start from a blank storage.

The code responsible for writing the file is here:

def Commit(self):
''' Commits the cached JSON configuration to file (if one was provided in the constructor).
Otherwise, this is a no-op.
'''
if (self._path is None):
return
if (self._file is None):
try:
self._file = open(self._path, 'w')
except Exception as ex:
LOGGER.error(
f"Could not open {self._path} for writing configuration. Error: {ex}")
return
self._file.seek(0)
json.dump(self._jsonData, self._file, ensure_ascii=True, indent=4)
self._file.truncate()
self._file.flush()

The logic here does an in-place overwrite of the file. The problem this causes is that if power is cut at any point after the write is started, but before the operating system finishing flushing writes to disk, then the user may be left with a corrupt partially written file which contains a mix of data - parts of the new file and parts of the old file.

I'd recommend the following logic instead:

  • (optional) Make a backup of of the storage file by copying it to a different filename, e.g. with a ".backup" suffix, leaving the existing file in place. (flush and fsync the newly written file to ensure the backup is updated before the original file gets replaced.)
  • Write the new data to a temporary file in the same directory (e.g. using tempfile.mkstemp or tempfile.NamedTemporaryFile with delete=False).
  • After writing the data, flush and fsync to ensure the file contents are persisted on the disk. This is needed to ensure ordering - make sure that the file contents will be fully written before the file gets renamed.
  • Rename the temporary file to the final filename, atomically replacing the existing file.
  • Run fsync on the renamed file to ensure the rename is committed to storage.

Bug prevalence

Rarely - requires unlucky timing of power loss

GitHub hash of the SDK that was being used

648d7bf (chip-wheels 2024.9.0)

Platform

python

Platform Version(s)

No response

Anything else?

I originally reported this issue on Home Assistant's python-matter-server: home-assistant-libs/python-matter-server#977 but it turns out that the responsible code was actually in the Matter SDK, not provided by Home Assistant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage
Projects
None yet
Development

No branches or pull requests

1 participant