
InfluxDB fails to start / crashes #150

Closed
mholzma opened this issue Dec 9, 2024 · 6 comments

@mholzma

mholzma commented Dec 9, 2024

I have been running rpipowermonitor for a year or two, and all of a sudden the InfluxDB service has become unstable.

When I check the status of the service, I see a message with a continuously incrementing attempt count:
rpipowermonitor influxd-systemd-start.sh: InfluxDB API unavailable after XX attempts...

I think the database may have grown too large for the Raspberry Pi 3 with 1GB of memory.

I could try setting up Influx on an alternative platform, but in that case it would probably make more sense to set up a 2.x version of Influx.

@David00
Owner

David00 commented Dec 10, 2024

Hi Mike, that's an interesting one that I haven't run into yet. Do you have any files in /var/log/influxdb?

Some comments in a popular issue thread suggest that raising the service's start timeout can help, and others suspect a lack of memory as the culprit. See the link below for more information. If systemd isn't killing the process during startup, a longer timeout likely won't help, which leaves your board's memory as the likely suspect.
influxdata/influxdb#23639 (comment)
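If you want to rule out the start timeout first, one common way to raise it is a systemd drop-in that sets `TimeoutStartSec`. This is a minimal sketch, assuming the service is named `influxdb.service` as in the InfluxDB 1.x packages; the helper name `write_timeout_dropin`, the file name `timeout.conf`, and the 600-second default are hypothetical choices, not settings from the power monitor project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in that raises the start timeout
# for the InfluxDB service. Directory and timeout defaults are assumptions.
write_timeout_dropin() {
    local dropin_dir="${1:-/etc/systemd/system/influxdb.service.d}"
    local seconds="${2:-600}"
    mkdir -p "$dropin_dir"
    # A drop-in only overrides the keys it sets; the packaged unit file is untouched
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" > "$dropin_dir/timeout.conf"
}
```

Run it as root, then apply it with `systemctl daemon-reload` and `systemctl restart influxdb`; `systemctl cat influxdb` shows the unit together with the drop-in. If Influx still dies with an out-of-memory error, the timeout was not the problem.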

Is this on a Pi 3 with my custom OS image? If so, can you check to see if you have a file at /root/rpi_power_monitor_os-version.txt and share the details here? If you don't have the file, you're on an older build before I added this to my build script. If you do have it, it will tell me which base image you flashed to your Pi.

Regarding InfluxDB 2.x, I really like it, and I use it at work on some much larger datasets. It might be a good time to add support for Influx 2, now that the open-source release of Influx 3 is right around the corner. I do intend to explore interfacing with InfluxDB 2 and to respond to this discussion soon:
#149

One other note: you may want to make sure that you have a backup of the power monitor data and your config file (either config.py on v0.2.0, or config.toml on v0.3.0+). Unfortunately, the backup script I wrote requires InfluxDB to be running. So, if you don't already have a backup from when InfluxDB was working, I would recommend simply tarballing the Influx data directory with:

sudo tar -cvzf /var/www/html/influx-backup.tar.gz /opt/influxdb

This puts the archive directly into the web server's directory, so you can download it from another computer on your network at http://<your Pi's IP address>/influx-backup.tar.gz and get it off the Pi.
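Before deleting anything on the Pi, it may be worth confirming that the downloaded archive is actually readable. A minimal sketch, assuming GNU tar with gzip support; `verify_backup` is a hypothetical helper name, not part of the power monitor's backup script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a .tar.gz backup can be read end to end.
# Prints the number of entries, or returns non-zero if the archive is damaged.
verify_backup() {
    local archive="$1"
    local listing
    # tar -tzf walks every member; it fails on a corrupt gzip stream or tar header
    if ! listing=$(tar -tzf "$archive" 2>/dev/null); then
        echo "backup is unreadable: $archive" >&2
        return 1
    fi
    echo "$(printf '%s\n' "$listing" | wc -l) entries in $archive"
}
```

For example, `verify_backup influx-backup.tar.gz` on the computer you downloaded it to.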

@mholzma
Author

mholzma commented Dec 10, 2024

Thanks again for the quick response. There seemed to be a couple of things going on, and to be honest, I am not sure what cleared it. I had a large number of WAL files sitting around, and after clearing those out and prodding a couple of other things, I got the DB back up.
I noticed the default data retention policy was 0s, so I was worried that my database might be too large for my old Raspberry Pi.

The text file shows:
Build Name: rpi_power_monitor Version v0.3.1+release
Build Date: Sat Aug 19 23:31:52 BST 2023
Base Image: /distro/image/2023-05-03-raspios-bullseye-armhf-lite.img.xz

I have Influx 2 running on a Linux box, and I think it would make sense to send the data there instead, which would get around any of these issues, especially since I am now also running a temperature gauge and an air quality meter off the Pi. I support the migration and have started poking around in the code to see what would need to be done. Worst case, I could run InfluxDB v1 on my Linux box as well.

@mholzma
Author

mholzma commented Dec 10, 2024

Happy to continue the conversation but I'll close this thread for now.

@mholzma mholzma closed this as completed Dec 10, 2024
@David00
Owner

David00 commented Dec 16, 2024

I looked into this further and found that when there are a lot of Influx .wal files on the disk, Influx will attempt to load all of them into memory on startup before they get compacted and written to the database. Setting the Influx logging level to debug in /etc/influxdb/influxdb.conf showed the details of what it was doing during startup, including the out-of-memory failure:

... hundreds of similar lines excluded ...
2024-12-16T01:11:41.016234Z     info    Reading file    {"log_id": "0tVG5gO0000", "engine": "tsm1", "service": "cacheloader", "path": "/var/lib/influxdb/wal/_internal/monitor/1048/_00483.wal", "size": 461096}
2024-12-16T01:11:41.230689Z     info    Reading file    {"log_id": "0tVG5gO0000", "engine": "tsm1", "service": "cacheloader", "path": "/var/lib/influxdb/wal/_internal/monitor/1048/_00484.wal", "size": 461648}
runtime: out of memory: cannot allocate 376832-byte block (569278464 in use)
fatal error: out of memory

To get around this, I moved all of the _00###.wal files out of the /var/lib/influxdb/wal/_internal/monitor/1048/ directory and only moved back in a handful at a time:

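# Move every WAL file for this shard out of the way
mkdir -p ~/temp-wal/_internal/monitor/1048
mv /var/lib/influxdb/wal/_internal/monitor/1048/* ~/temp-wal/_internal/monitor/1048/

# Move back one batch (here, _00100.wal through _00199.wal)
mv ~/temp-wal/_internal/monitor/1048/_001* /var/lib/influxdb/wal/_internal/monitor/1048/

# Start Influx in the foreground and let it load all the files,
# then stop it with Ctrl+C once it has fully started
influxd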

Then I'd start Influx, let it load all those .wal files and fully start up, stop Influx, move in the next batch, and repeat until I had gotten through all the .wal files.
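The "move in the next batch" step could be scripted so each round takes a fixed number of files in segment order. A minimal sketch, assuming bash and the staging layout from the commands above; `move_next_batch` is a hypothetical helper, not something provided by InfluxDB or the power monitor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the next N WAL segments (in name order) from a
# staging directory back into the shard's WAL directory.
# Zero-padded names such as _00483.wal sort in segment order.
move_next_batch() {
    local staging="$1" wal_dir="$2" batch_size="$3"
    local files=() f
    # Glob results are sorted, so the oldest segments go back first
    for f in "$staging"/_*.wal; do
        [ -e "$f" ] || continue   # the glob matched nothing
        files+=("$f")
        [ "${#files[@]}" -ge "$batch_size" ] && break
    done
    if [ "${#files[@]}" -eq 0 ]; then
        echo "no WAL files left in $staging"
        return 1
    fi
    mv -- "${files[@]}" "$wal_dir"/
    echo "moved ${#files[@]} file(s) into $wal_dir"
}
```

For example, `move_next_batch ~/temp-wal/_internal/monitor/1048 /var/lib/influxdb/wal/_internal/monitor/1048 100`, then start Influx, wait for it to finish loading, stop it, and call the helper again until it returns non-zero. It likely needs to run as root, since /var/lib/influxdb is typically owned by the influxdb user.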

This InfluxDB issue seems to be the most closely related from what I've found.

As for why so many .wal files pile up in the first place, disk performance could be the cause. I'll continue to look into this and see whether something like tuning InfluxDB can keep the .wal file count down.
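To spot a backlog building up again before it exhausts memory at startup, it may help to count WAL segments per shard. A rough sketch, assuming GNU find (as on Raspberry Pi OS) and the wal/<database>/<retention policy>/<shard id>/ layout visible in the log lines above; `wal_backlog` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<segment count> <total bytes> <shard dir>" for
# every shard under the WAL root, tab-separated.
# Assumed layout: <wal_root>/<database>/<retention_policy>/<shard_id>/_NNNNN.wal
wal_backlog() {
    local wal_root="${1:-/var/lib/influxdb/wal}"
    local shard_dir count bytes
    for shard_dir in "$wal_root"/*/*/*/; do
        [ -d "$shard_dir" ] || continue
        count=$(find "$shard_dir" -maxdepth 1 -type f -name '_*.wal' | wc -l)
        # -printf '%s' is GNU find's file size in bytes; awk sums them (0 if none)
        bytes=$(find "$shard_dir" -maxdepth 1 -type f -name '_*.wal' -printf '%s\n' \
            | awk '{ s += $1 } END { print s + 0 }')
        printf '%d\t%d\t%s\n' "$count" "$bytes" "${shard_dir%/}"
    done
}
```

For example, `wal_backlog /var/lib/influxdb/wal | sort -nr | head` lists the shards with the most segments first. A count that keeps climbing between runs points to the kind of backlog described above.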

@David00
Owner

David00 commented Dec 16, 2024

Also, thanks for letting me know about the retention policy being set to 0s (infinite) retention! This is definitely not what I intended, so something broke in the retention policy management part of the Influx initialization.

I am working on v0.4.0 so I'll open a new issue for this and fix it in the new upcoming release.
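In the meantime, the current setting can be inspected from the InfluxDB 1.x CLI. A minimal sketch: the helpers below only build the InfluxQL text, and the database name `power_monitor`, the policy name `autogen`, and the `52w` duration in the usage example are assumptions rather than values confirmed for this project.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that build InfluxQL statements for retention policies.
# Database and policy names passed in are placeholders; check yours first.
show_rp_query() {
    printf 'SHOW RETENTION POLICIES ON "%s"' "$1"
}

alter_rp_query() {
    local db="$1" rp="$2" duration="$3"
    printf 'ALTER RETENTION POLICY "%s" ON "%s" DURATION %s' "$rp" "$db" "$duration"
}
```

For example, `influx -execute "$(show_rp_query power_monitor)"` lists the policies, where a duration of 0s means data is kept forever, and `influx -execute "$(alter_rp_query power_monitor autogen 52w)"` would cap it. Shortening a retention policy makes InfluxDB delete data older than the new duration, so take a backup first.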

@mholzma
Author

mholzma commented Dec 16, 2024 via email
