Skip to content

Conversation

@DL6ER
Copy link
Member

@DL6ER DL6ER commented Apr 6, 2025

What does this implement/fix?

Do not try to restart FTL when a streamed pihole -g run is still ongoing. This to prevent being stuck in a situation where FTL has shut down 99% and cannot serve DNS until pihole -g finished running.


Related issue or feature (if applicable): Fixes #2395

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

Checklist:

  • The code change is tested and works locally.
  • I based my code and PRs against the repositories developmental branch.
  • I signed off all commits. Pi-hole enforces the DCO for all contributions
  • I signed all my commits. Pi-hole requires signatures to verify authorship
  • I have read the above and my PR is ready for review.

@DL6ER DL6ER added the Bugfix label Apr 6, 2025
@DL6ER DL6ER requested a review from a team April 6, 2025 18:11
@yubiuser
Copy link
Member

yubiuser commented Apr 6, 2025

Why are we offering the gravity endpoint at all? Because we want to be able to run gravity from the web interface?


Im not sure if this is the right way to fix #2395 - maybe the external program should simply call pihole -g instead of using the API endpoint?

@DL6ER
Copy link
Member Author

DL6ER commented Apr 6, 2025

Why are we offering the gravity endpoint at all? Because we want to be able to run gravity from the web interface?

Yes

Im not sure if this is the right way to fix #2395 - maybe the external program should simply call pihole -g instead of using the API endpoint?

Yes, but my point here is that - as long as we are offering this - this can very well be considered a bug as FTL can be brought into a (temporary, but maybe long) defunct state until gravity finishes it can finally restart

Copy link
Member

@yubiuser yubiuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes some logspam

2025-04-06 21:31:59.322 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:00.323 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:01.322 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:02.323 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:03.324 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:04.325 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:05.325 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:06.326 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:07.327 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:08.328 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:09.328 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:10.329 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:11.329 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:12.330 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:13.332 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:13.641 CEST [135290/T135558] INFO: Restarting FTL: API action request
2025-04-06 21:32:13.642 CEST [135290M] INFO: Asked to terminate by "/usr/bin/pihole-FTL -f" (PID 135290, user pihole UID 999)
2025-04-06 21:32:13.642 CEST [135290M] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:14.331 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:15.332 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:16.332 CEST [135290/T135309] INFO: Not restarting as gravity is still running...
2025-04-06 21:32:17.333 CEST [135290/T135309] INFO: Not restarting as gravity is still running...

__

This is working as expected, but something very 'gravity' specific. Shouldn't this be extended to all endpoints - only restart when the previous requests have been finished?

@DL6ER
Copy link
Member Author

DL6ER commented Apr 21, 2025

Sorry, seeing your reply only now. The gravity case is very special here because it is (really!) long-lived and can prevent FTL from shutting down. Is there any other action that is capable of doing something similar? I don't think so.

We can surely reduce the logging to do it ... once and then again every 30 seconds if still waiting? ...

@yubiuser
Copy link
Member

We can surely reduce the logging to do it ... once and then again every 30 seconds if still waiting? ...

Yes please. And more importantly: Only start logging when gravity is running and a restart request has been received. As you can see in the log above the logging started when the gravity endpoint has been requested, and only 14 seconds later the restart requests was received.

@rdwebdesign rdwebdesign requested a review from a team May 13, 2025 01:07
@yubiuser
Copy link
Member

I think this needs some attention from @DL6ER. I added some review points

Yes please. And more importantly: Only start logging when gravity is running and a restart request has been received. As you can see in the log above the logging started when the gravity endpoint has been requested, and only 14 seconds later the restart requests was received.

@DL6ER
Copy link
Member Author

DL6ER commented May 14, 2025

Added. I also just have seen your comment on #5815 and I don't actually think so. All this PR is doing is ensuring the DNS server is not shut down too early. The gravity process will - in any case be finished.

Having said that, I did just see a possible evasion of this: Assume the gravity runs really long (think minutes). DNS operation has stopped. Users may log in via SSH and force a restart of FTL using systemctl which cannot be stopped from within FTL.

However, even in this case, I don't think users will end up with a broken database as the gravity process has been designed to avoid this. We are always working on a temporary file the entire time until we swap the files at the very end. This should be an almost atomic operation and the likeliness that the process is interrupted at this very point seems ultra-low to me. But I may of course be wrong about this.

@github-actions
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

DL6ER added 2 commits June 24, 2025 19:32
…ing. This to prevent being stuck in a situation where FTL has shut down 99% and cannot serve DNS until pihole -g finished running

Signed-off-by: DL6ER <[email protected]>
@DL6ER DL6ER force-pushed the tweak/prevent_restart_during_gravity_run branch from b503db3 to 9281118 Compare June 24, 2025 17:32
@github-actions
Copy link

Conflicts have been resolved.

@yubiuser yubiuser linked an issue Jul 22, 2025 that may be closed by this pull request
@github-actions
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link

Conflicts have been resolved.

Copy link
Member

@yubiuser yubiuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I tested it last time, but now I tried it with two approaches and it killed gravity right away without printing any warning or waiting for gravity.

  1. Triggered gravity from the web interface, then re-started resolver from Settings/System.
  2. Triggered gravity from the web interface, then changed a setting that required a FTL restart

@DL6ER
Copy link
Member Author

DL6ER commented Oct 5, 2025

@yubiuser Did you meant to submit this as request for changes? If so, I'm not seeing an actual request.

@yubiuser
Copy link
Member

yubiuser commented Oct 5, 2025

Yes it is. The request is to make it work. Currently, a running gravity is killed without waiting when a FTL restart is triggered

Signed-off-by: Dominik <[email protected]>
@DL6ER
Copy link
Member Author

DL6ER commented Oct 5, 2025

Looking at the comment immediately revealed that the ! on the second variable shouldn't be there as this is the code part that prevents an early shutdown...

Screenshot From 2025-10-05 10-12-15

yubiuser
yubiuser previously approved these changes Oct 5, 2025
Copy link
Member

@yubiuser yubiuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2025-10-05 11:50:32.551 CEST [172358/T172828] INFO: Restarting FTL: API action request
2025-10-05 11:50:32.552 CEST [172358M] INFO: Asked to terminate by "/usr/bin/pihole-FTL -f" (PID 172358, user pihole UID 999)
2025-10-05 11:50:32.552 CEST [172358M] INFO: Not terminating as gravity is still running...
2025-10-05 11:51:03.959 CEST [172358/T172818] INFO: Not terminating as gravity is still running...
2025-10-05 11:51:34.975 CEST [172358/T172818] INFO: Not terminating as gravity is still running...
2025-10-05 11:51:42.033 CEST [172358/T172820] INFO: Terminating timer thread
2025-10-05 11:51:42.044 CEST [172358/T172817] INFO: Terminating database thread
2025-10-05 11:51:42.329 CEST [172358M] INFO: Finished final database update
2025-10-05 11:51:42.330 CEST [172358M] INFO: Waiting for threads to join
2025-10-05 11:51:42.330 CEST [172358M] INFO: Thread housekeeper (1) is idle, terminating it.
2025-10-05 11:51:42.330 CEST [172358M] INFO: Thread dns-client (2) is idle, terminating it.

@yubiuser
Copy link
Member

yubiuser commented Oct 5, 2025

RISCV is constantly failing. Maybe needs a rebase/merge of development to include #2631

@DL6ER
Copy link
Member Author

DL6ER commented Oct 5, 2025

Merged latest development right now

@github-actions
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link

Conflicts have been resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FTL takes 5 minutes to reboot?

3 participants