No system response after updating Raspberry Pi 5 with NVMe to 14.0 #3720

Open
Ladenburg1 opened this issue Dec 5, 2024 · 29 comments
Labels
board/raspberrypi Raspberry Pi Boards bug

Comments

@Ladenburg1

Describe the issue you are experiencing

After clicking Update, the system stops responding. Only switching the power off and on restarts it, and after the restart it is still on version 13.2.
I tried about 5 times, with the same behaviour each time.

What operating system image do you use?

rpi5-64 (Raspberry Pi 5 64-bit OS)

What version of Home Assistant Operating System is installed?

13.2

Did the problem occur after upgrading the Operating System?

Yes

Hardware details

Raspberry Pi 5 8GB
NVMe 256 GB Intenso installed directly on the Pi (HAT module)

Steps to reproduce the issue

  1. Click Install
  2. The system hangs...
  3. Restart by disconnecting from power and reconnecting
    ...

Anything in the Supervisor logs that might be useful for us?

2024-12-05 22:57:13.946 INFO (MainThread) [supervisor.os.manager] Fetch OTA update from https://os-artifacts.home-assistant.io/14.0/haos_rpi5-64-14.0.raucb
2024-12-05 22:57:18.303 INFO (MainThread) [supervisor.os.manager] Completed download of OTA update file /data/tmp/hassos-14.0.raucb
2024-12-05 22:57:22.288 INFO (MainThread) [supervisor.os.manager] Install of Home Assistant Operating System 14.0 success
2024-12-05 22:57:22.289 INFO (MainThread) [supervisor.host.control] Initialize host reboot using logind
2024-12-05 22:57:22.289 INFO (MainThread) [supervisor.addons.manager] Phase 'application' stopping 3 add-ons
2024-12-05 22:57:22.297 INFO (SyncWorker_6) [supervisor.docker.manager] Stopping addon_core_configurator application
2024-12-05 22:57:25.571 INFO (SyncWorker_6) [supervisor.docker.manager] Cleaning addon_core_configurator application
2024-12-05 22:57:25.594 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping addon_db21ed7f_filebrowser application
2024-12-05 22:57:25.865 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning addon_db21ed7f_filebrowser application
2024-12-05 22:57:25.886 INFO (SyncWorker_4) [supervisor.docker.manager] Stopping addon_de91e161_hassio_onedrive_backup application
2024-12-05 22:57:26.112 INFO (SyncWorker_4) [supervisor.docker.manager] Cleaning addon_de91e161_hassio_onedrive_backup application
2024-12-05 22:57:26.172 INFO (SyncWorker_1) [supervisor.docker.manager] Stopping homeassistant application
2024-12-05 22:57:33.673 INFO (MainThread) [supervisor.addons.manager] Phase 'services' stopping 4 add-ons
2024-12-05 22:57:33.678 INFO (SyncWorker_5) [supervisor.docker.manager] Stopping addon_core_ssh application
2024-12-05 22:57:36.930 INFO (SyncWorker_5) [supervisor.docker.manager] Cleaning addon_core_ssh application
2024-12-05 22:57:36.952 INFO (SyncWorker_2) [supervisor.docker.manager] Stopping addon_core_matter_server application
2024-12-05 22:57:41.548 INFO (SyncWorker_2) [supervisor.docker.manager] Cleaning addon_core_matter_server application
2024-12-05 22:57:41.569 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping addon_a0d7b954_influxdb application
2024-12-05 22:57:45.027 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning addon_a0d7b954_influxdb application
2024-12-05 22:57:45.053 INFO (SyncWorker_1) [supervisor.docker.manager] Stopping addon_a0d7b954_grafana application
2024-12-05 22:57:48.806 INFO (SyncWorker_1) [supervisor.docker.manager] Cleaning addon_a0d7b954_grafana application
2024-12-05 22:57:48.824 INFO (MainThread) [supervisor.addons.manager] Phase 'system' stopping 1 add-ons
2024-12-05 22:57:48.828 INFO (SyncWorker_5) [supervisor.docker.manager] Stopping addon_core_mosquitto application
2024-12-05 22:57:52.389 INFO (SyncWorker_5) [supervisor.docker.manager] Cleaning addon_core_mosquitto application
2024-12-05 22:57:52.407 INFO (MainThread) [supervisor.addons.manager] Phase 'initialize' stopping 0 add-ons
2024-12-05 22:57:52.407 INFO (MainThread) [supervisor.plugins.cli] Stopping cli plugin
2024-12-05 22:57:52.410 INFO (SyncWorker_6) [supervisor.docker.manager] Stopping hassio_cli application
2024-12-05 22:57:55.656 INFO (SyncWorker_6) [supervisor.docker.manager] Cleaning hassio_cli application
2024-12-05 22:57:55.671 INFO (MainThread) [supervisor.plugins.dns] Stopping CoreDNS plugin
2024-12-05 22:57:55.674 INFO (SyncWorker_2) [supervisor.docker.manager] Stopping hassio_dns application
2024-12-05 22:57:58.898 INFO (SyncWorker_2) [supervisor.docker.manager] Cleaning hassio_dns application
2024-12-05 22:57:58.916 INFO (MainThread) [supervisor.plugins.audio] Stopping Audio plugin
2024-12-05 22:57:58.920 INFO (SyncWorker_7) [supervisor.docker.manager] Stopping hassio_audio application
2024-12-05 22:58:02.174 INFO (SyncWorker_7) [supervisor.docker.manager] Cleaning hassio_audio application
2024-12-05 22:58:02.190 INFO (MainThread) [supervisor.plugins.multicast] Stopping Multicast plugin
2024-12-05 22:58:02.193 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping hassio_multicast application
2024-12-05 22:58:05.356 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning hassio_multicast application
s6-rc: info: service legacy-services: stopping
2024-12-05 22:58:05.478 INFO (MainThread) [supervisor.misc.scheduler] Shutting down scheduled tasks
2024-12-05 22:58:05.478 INFO (MainThread) [supervisor.docker.monitor] Stopped docker events monitor
2024-12-05 22:58:05.479 INFO (MainThread) [supervisor.api] Stopping API on 172.30.32.2
2024-12-05 22:58:05.483 INFO (MainThread) [supervisor.hardware.monitor] Stopped Supervisor hardware monitor
2024-12-05 22:58:05.487 INFO (MainThread) [supervisor.dbus.manager] Closed conection to system D-Bus.
2024-12-05 22:58:05.490 INFO (MainThread) [supervisor.core] Supervisor is down - 0
2024-12-05 22:58:05.491 INFO (MainThread) [__main__] Closing Supervisor
[21:58:05] INFO: Watchdog restart after closing
[21:58:05] WARNING: Halt Supervisor
[21:58:05] INFO: Supervisor restart after closing
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/udev.sh
[22:02:19] INFO: Using udev information from host
cont-init: info: /etc/cont-init.d/udev.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun supervisor (no readiness notification)
services-up: info: copying legacy longrun watchdog (no readiness notification)
[22:02:19] INFO: Starting local supervisor watchdog...
s6-rc: info: service legacy-services successfully started
2024-12-05 22:02:21.173 INFO (MainThread) [__main__] Initializing Supervisor setup
2024-12-05 22:02:21.234 INFO (MainThread) [supervisor.utils.sentry] Initializing Supervisor Sentry
2024-12-05 23:02:21.239 INFO (MainThread) [supervisor.bootstrap] Setting up coresys for machine: raspberrypi5-64
2024-12-05 23:02:21.244 INFO (MainThread) [supervisor.docker.supervisor] Attaching to Supervisor ghcr.io/home-assistant/aarch64-hassio-supervisor with version 2024.11.4
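
For anyone comparing their own logs with the one above, a small shell sketch can reduce a saved Supervisor log to the update and reboot milestones. The function name `ota_milestones` and the file name `supervisor.log` are made up for illustration; the search strings are copied from the log lines in this report and may differ in other Supervisor versions.

```shell
# Hypothetical helper: print only the OTA and restart milestones
# from a Supervisor log that was saved to a file.
# Usage: ota_milestones supervisor.log
ota_milestones() {
    # -E enables alternation; each alternative is a message fragment
    # taken from the log in this report.
    grep -E 'Fetch OTA update|Completed download of OTA|Install of Home Assistant Operating System|Initialize host reboot|Closing Supervisor|Initializing Supervisor setup' "$1"
}
```

Note that in the log above the first Supervisor lines after the restart are stamped 22:02 and later ones 23:02, which looks like a time zone offset being applied during startup, so timestamps around a restart should not be compared without checking for that.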

Anything in the Host logs that might be useful for us?

no

System information

Version | core-2024.12.0
-- | --
Installation type | Home Assistant OS
Development | false
Supervisor | true
Docker | true
User | root
Virtual environment | false
Python version | 3.13.0
Operating system family | Linux
Operating system version | 6.6.31-haos-raspi
CPU architecture | aarch64
Timezone | Europe/Berlin
Configuration directory | /config

Home Assistant Community Store

GitHub API | ok
GitHub Content | ok
GitHub Web | ok
HACS Data | ok
GitHub API Calls Remaining | 5000
Installed Version | 2.0.1
Stage | running
Available Repositories | 1476
Downloaded Repositories | 12

Home Assistant Cloud

Logged in | false
Certificate server reachable | ok
Authentication server reachable | ok
Home Assistant Cloud reachable | ok

Home Assistant Supervisor

Host operating system | Home Assistant OS 13.2
Update channel | beta
Supervisor version | supervisor-2024.11.4
Agent version | 1.6.0
Docker version | 27.2.0
Disk total | 228.5 GB
Disk used | 15.4 GB
Healthy | true
Supported | true
host_connectivity | true
supervisor_connectivity | true
ntp_synchronized | true
virtualization |
Board | rpi5-64
Supervisor API | ok
Version API | ok
Installed add-ons | File editor (5.8.0), Terminal & SSH (9.15.0), Filebrowser (2.23.0_14), Matter Server (6.6.1), Let's Encrypt (5.2.7), Mosquitto broker (6.4.1), Cloudflared (5.2.2), InfluxDB (5.0.1), Grafana (10.2.2), Samba Backup (5.2.0), OneDrive Backup (2.3.6)

Dashboards

Dashboards | 7
Resources | 0
Views | 24
Mode | storage

Recorder

Oldest run start time | November 25, 2024 at 10:30
Current run start time | December 5, 2024 at 23:03
Estimated database size (MiB) | 875.75 MiB
Database engine | sqlite
Database version | 3.45.3

Core metrics

Processor usage | 0.3 %
Memory usage | 9 %

Additional information

No response

@Ladenburg1 Ladenburg1 added the bug label Dec 5, 2024
@plumbum00

Hi,
You are not the only one :(
I have exactly the same configuration: RP5-8G and 256G M.2 NVMe.
Same issue here: install, crash..., power down and up, and back to 13.2.

@jonpaterson

This comment was marked as off-topic.

@mark-carline

mark-carline commented Dec 6, 2024

Plus 1 for me, same issue. Good that power cycling brings back 13.2, though.

RP5-8G and 256G M.2 NVMe

but I have this board:
https://thepihut.com/products/argon-neo-5-m-2-nvme-expansion-board?variant=42787704078531

@g4njawizard

g4njawizard commented Dec 6, 2024

You aren't the only one.
Yesterday, when I freshly installed on a new Pi 5, it worked. Today I reflashed and now it won't boot.
The LED flashes, flickers, and then turns off.
It makes no difference whether a PCIe device or anything else is connected.

@Puma7

This comment was marked as off-topic.

@Gigoo25

Gigoo25 commented Dec 7, 2024

Same issue here. Raspberry Pi 5, 13.2 -> 14.0. NVMe HAT with drive, and no boot.

@sk-ilya

sk-ilya commented Dec 7, 2024

In my case, the system completely bricked. I tried booting from a Raspberry Pi OS microSD card and (re-)writing the HA OS image to the NVMe, but I kept getting random I/O errors (like "no space left on device", and something related to power). I thought the disk was dying, or some issue with the board... so I ended up disconnecting the drive and all USB peripherals, then flashed another microSD with HA 13.2. I was able to boot successfully from that and restore from a backup. I ran the system without the disk connected for about a day.

Eventually, this is what worked for me the next day to get 14.0 installed:

  1. Create a full backup. Disconnect all peripherals.
  2. Boot from Raspberry Pi OS on a microSD.
  3. Update the system: sudo apt update && sudo apt full-upgrade (in my case, the kernel updated from 6.6.51 to 6.6.62)
  4. Reboot.
  5. Download and write the HA OS 14.0 image to the NVMe:
     wget https://github.com/home-assistant/operating-system/releases/download/14.0/haos_rpi5-64-14.0.img.xz
     sudo rpi-imager --cli haos_rpi5-64-14.0.img.xz /dev/nvme0n1
  6. sudo poweroff, disconnect the microSD, turn on the Pi, wait for homeassistant.local:8123, restore from backup. Reconnect the peripherals and reboot.

@durd

durd commented Dec 7, 2024

Similar issue here, RPi 5 8GB, NVMe.
Mine upgraded to core v12.0 (I can't remember when I upgraded HAOS to v14.0) and instantly had DB and supervisor issues. Could barely reboot. Pulled the power twice and it got back to "normal". Then a day or two later the same happened again, I pulled the power again and got it up, backed up and downloaded the backup immediately. Started the SSH addon, found that I could switch the HAOS boot-partition to the previous 13.2 and did that.
Seems fine now, but time will tell. I'll be wary about future versions...

@d96moe

d96moe commented Dec 8, 2024

I got this behavior with pi5 and nvme hat instead:
#3432

I assume that the conclusion is not to even try a clean install and to have some patience?

@ico2k2developer

Despite the very different setup, the very same behavior occurs when trying to update on a Raspberry Pi 3B with a microSD card.

@Yoda-Soda

Same issue, but with an SD card.

@richard-doornbos

Same issue. rpi4 4gb, V-NAND SSD 500 GB (via USB).
Completely bricked SSD. Not recognized on Ubuntu or Windows (Balena Etcher)...
I have to go back to my old setup, I think.

@mark-carline

Plus 1 for me, same issue. Good that power cycling brings back 13.2, though.

RP5-8G and 256G M.2 NVMe

but I have this board: https://thepihut.com/products/argon-neo-5-m-2-nvme-expansion-board?variant=42787704078531

UPDATE: I just retried with the latest OS / Core updates and it all worked for me this time. I am now on:

Core 2024.12.2
Supervisor 2024.11.4
Operating System 14.0
Frontend 20241127.7

@werfpsa

werfpsa commented Dec 12, 2024

Plus 1 for me. Also RP5-8G and M.2 NVMe 256G.

@brentm5

brentm5 commented Dec 13, 2024

The same thing is happening on my install when attempting to upgrade to version 14. Interestingly, a simple restart of the Pi does appear to resolve the issue.

Hardware

Device: RP5-8G
Storage: Inland NVME SSD 256G
NVME Hat: Geekworm x1012 v1.2 POE+ /NVME Shield

Software

Core 2024.12.3
Supervisor 2024.11.4
Operating System 13.2
Frontend 20241127.6

Logs

I have included host logs from my instance. The important timestamps are as follows:

  • 2024-12-13 16:26:00 - This was around when I kicked off the install
  • 2024-12-13 16:54:00 - This is around when I did a power cycle of the pi

@durd

durd commented Dec 13, 2024

I tried upgrading to Core 12.3 and OS 14.0 again; the issue persists :(

@beebop5

beebop5 commented Dec 16, 2024

Same issue, RPi 5 + X1001 HAT + Crucial P3 1 TB NVMe. I have rebuilt on 13.2 for now and will try upgrading again later today.

@litinoveweedle

This comment was marked as off-topic.

@NW4FUN

NW4FUN commented Dec 17, 2024

Has anyone had any luck upgrading to 14.0?
I'm still sitting on the fence here...

@sairon
Member

sairon commented Dec 17, 2024

With issues like this, it's always helpful to connect an HDMI display and check what's shown on the display after the upgrade - the boot failure most likely happens early in the boot process and the data partition is not mounted at that point to preserve any logs. A little insight is also provided by the on-board LED (color and blinking pattern) but that is only helpful for rough troubleshooting.

That said, we can't proceed with troubleshooting and fixing the issue without more detailed information. Issues with NVMe can be specific to certain shield and drive combinations that we can't fully test, yet the problem obviously does not affect all configurations, as I'm not able to reproduce it on my end (official M.2 HAT with a Samsung PM9A1a drive).

@sairon sairon added the board/raspberrypi Raspberry Pi Boards label Dec 17, 2024
@litinoveweedle

I would not say that the issue is tied to a particular type of NVMe HAT. It is more likely to be an intermittent issue, as a few users reported that it succeeded on the second run (with the same hardware).

@sairon
Member

sairon commented Dec 17, 2024

@litinoveweedle Yes, I agree on that. However, it's still crucial to find out when the failure happens and what the cause is. There are not that many differences in the Linux kernel, and the boot process on the RPi 5 is the same as on RPi OS (unlike on previous Pis, we're not using U-Boot), so it is possible that this is not a downstream issue in HAOS and that the same problem could intermittently appear with this hardware combination on RPi OS as well. Chances are it is not a regression in this particular HAOS version either; some users were just "luckier" booting the other version.

@litinoveweedle

Great, thanks. I would say that the issue is in the way HAOS upgrades the system partitions. Does it keep /boot/firmware/config.txt modifications? Does it understand the difference in partition layout of the NVMe disks? Does it call sync after the upgrade? I do not think that you will find any common message on the boot screen pointing to the root cause. I understand your requests, but it is also tricky to post the boot logs here without having a KVM. Maybe some users should post pictures of the screen. Also, the root cause can be lost as the screen scrolls, so maybe a video would be better? As you can see, these are not very straightforward requests to fulfill. Did you try to perform the upgrade process multiple times to see if it works reliably?

@sairon
Member

sairon commented Dec 17, 2024

Does it keep /boot/firmware/config.txt modifications?

It performs some sed replacements to create the tryboot.txt config but otherwise the custom configuration (overlays, etc.) is preserved.

Does it understand the difference in partition layout of the NVMe disks?

The layout is the same as on a system running from an SD card.

Does it call sync after upgrade?

Obviously, as the kernel goes through a standard shutdown.

I do not think that you will find any common message on the boot screen pointing to the root cause. I understand your requests, but it is also tricky to post the boot logs here without having a KVM. Maybe some users should post pictures of the screen.

Checking the screen, and eventually sending a picture of it, is a great starting point, and it's exactly what I'm asking for here and what we should wait for.
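
To illustrate the kind of sed replacement described above, here is a minimal sketch. It is not the HAOS update hook: the function name `make_tryboot`, the `os_prefix=slot-A/` line, and the slot letters are assumptions for illustration only, since the thread does not quote the actual expressions. It shows how one line can be rewritten into a separate tryboot.txt while every other line of config.txt, such as user-added overlays, is copied unchanged.

```shell
# Hypothetical helper, NOT the real HAOS hook.
# Usage: make_tryboot SRC DST NEW_SLOT
#   SRC      existing config.txt
#   DST      file to create (for example tryboot.txt)
#   NEW_SLOT slot letter to boot on the next try (assumed to be A or B)
make_tryboot() {
    src=$1
    dst=$2
    new_slot=$3
    # Rewrite only the assumed "os_prefix=slot-X/" line; sed passes all
    # other lines (dtparam, dtoverlay, ...) through untouched.
    # SRC itself is never modified, the result goes to DST.
    sed "s|^\(os_prefix=slot-\)[AB]/\$|\1${new_slot}/|" "$src" > "$dst"
}
```

Because the source file is only read, a failed or interrupted run would leave the original configuration as it was, which matches the observation in this thread that power cycling brings back 13.2.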

@Ladenburg1
Author

Same behavior here with 14.1.rc1.
The RPi with my NVMe doesn't reboot, and only comes back up, on 13.2, after a hard power off/on.

@brentm5

brentm5 commented Dec 17, 2024

@sairon I attempted to get you a screenshot of the HA instance in a stuck state after the install of 14.0. However, when I actually kicked off the upgrade, it surprisingly worked. I had previously tried to install this upgrade 2 - 3 times, all of which failed and required a power cycle. My assumption is that it's an intermittent issue.

@Jpsy

Jpsy commented Dec 18, 2024

Today I tried again to upgrade to 14.0 and it failed again.
My problems deviate a bit from the majority as I can always start the system with 14.0 but it dies after some hours, usually with elementary files becoming unavailable (maybe mounts becoming unavailable). A typical effect is that the button "Check configuration" in developer tools results in "File configuration.yaml not found.".
I can still see HA Core logs in HTML mode, but not in raw mode. Supervisor logs and Host logs become fully unavailable. Below is a screenshot of my HA Core log from the moment where things go wrong. This happened 3:40 hours after upgrading the system at 6:00 in the morning. System logs show nothing unusual; CPU usage and temperature, fan speed, RAM usage, etc. are all totally normal until the log freezes at 9:40.

image

I will go back to 13.2 now. This usually requires pulling the plug as the restart button refuses to do its job when configuration.yaml cannot be found. If I can provide some more information, please tell me.

System:

  • latest HA Core 2024.12.4
  • latest Supervisor 2024.12.0
  • RPi 5b 8GB
  • NVMe 500 GB (Transcend MTS400S), connected using PCIe 3 mode
  • M.2 hat: Geekworm X1001
  • all partitions on SSD (no SD card)
  • System is rock solid on 13.2

@Jpsy

Jpsy commented Dec 18, 2024

BTW:
One strange thing I notice is that my system currently shows a strange SSH prompt and there is no ASCII art greeting when I connect:

image

This happens on 13.2 and on 14.0. The CLI works normally though. But I guess this is just something totally different.

@d96moe

d96moe commented Dec 18, 2024

I got this behavior with pi5 and nvme hat instead: #3432

I assume that the conclusion is not to even try a clean install and to have some patience?

The above was a different problem, from trying to run 14.0 rc-1 earlier. However, I had another go now, without success. What I did:

  1. Booted into netboot, selected HA with HAOS 14.0, and did a clean install on my NVMe SSD -> result: a lot of disk error messages after the initial boot, and I never even got to the HA CLI in the terminal. This usually happens when the boot process is starting the Docker containers.
  2. Booted into Raspberry Pi OS from an SD card, downloaded the HAOS 14 image, and wrote it to the NVMe with pi-imager -> result: same as above.
  3. Booted Raspberry Pi OS again, but this time flashed HAOS 13.2 to the SSD with pi-imager -> result: works fine.
  4. Updated /mnt/boot/config.txt by adding dtparam=pciex1 and dtparam=pciex1_gen=3.
  5. Updated to HAOS 14.0 from the web UI -> result: more or less the same as with 14 above, disk errors, but rebooting over and over again finally got me into the HA CLI from the terminal. So slightly better. I managed to change the boot slot back to 13.2 with "os boot-slot other".
  6. I don't know if it's related, but I then flashed HAOS 14 onto an SD card and tried to start from it -> result: I get the boot printouts in the terminal, but just before it jumps to the CLI I briefly see a message about a watchdog, then it reboots and ends up in a boot loop.
  7. Removed the SD card, restored the backup, and I am now staying on 13.2 for a while.

So some conclusions, and open questions..

  • pciex1_gen=3 somehow improves things for me. On 13.x I previously had the problem that the system would lose the SSD after a week or two; after enabling Gen 3, that stopped.
  • So why isn't 14 booting from the SD card? Can this be a power issue? The NVMe disk is still mounted and enabled when booting from the SD card (I didn't disconnect it; maybe that's the next step?).
  • If it's a power issue, why is it rock solid on 13.2 (with Gen 3 enabled, and with the standard Pi 5 power supply)?
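
Step 4 above can be sketched as a small shell helper that adds a line to config.txt only when it is missing, so repeated runs do not pile up duplicates. The function name `ensure_config_line` is hypothetical, and the path in the commented example is the one mentioned in step 4. The parameter names follow the Raspberry Pi documentation (`dtparam=pciex1` and `dtparam=pciex1_gen=3`); Gen 3 speed is outside what the Pi 5 is certified for, so treat it as an experiment.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line exists.
# Usage: ensure_config_line FILE LINE
ensure_config_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats LINE as a fixed string.
    if ! grep -qxF -- "$line" "$file"; then
        # If the file does not end with a newline, add one first so the
        # new setting does not get glued onto the previous line.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file"
        fi
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example, commented out so that nothing is modified by accident:
# ensure_config_line /mnt/boot/config.txt "dtparam=pciex1"
# ensure_config_line /mnt/boot/config.txt "dtparam=pciex1_gen=3"
```

Keeping a copy of config.txt before editing makes it easy to revert if the drive stops being detected.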
