Description
Several issues could be helped by better management of the GHA process.
This would not necessarily solve them, but it could help with the following:
- Losing network for a while can end up with the runner running forever (GH at least) #1014 @DavidGOrtega?
- No space left on device creates a hung EC2 instance #1006 @dacbd
- OOM reaping / crashes @dacbd
- Oddities in runner log parsing/detecting events #1037
For cloud runners, launch the GHA client as a systemd unit to better control the process and separate it from the cml
process. Then hook into the unit's logs for monitoring, triggering shutdown events, etc.
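
A rough sketch of what this could look like, not a proposal for the actual implementation. The unit name `gha-runner.service`, the helper names, and the log patterns are all hypothetical; the patterns in particular would have to be checked against real runner output. It assumes the runner is started via a `run.sh` in its install directory and that logs are read with `journalctl -u <unit> -f -o cat`.

```python
import re
import subprocess
from typing import Iterable, Iterator, Optional

UNIT_NAME = "gha-runner.service"  # hypothetical unit name


def render_unit(runner_dir: str, user: str, memory_max: Optional[str] = None) -> str:
    """Render a systemd unit that runs the GHA client outside the cml process."""
    lines = [
        "[Unit]",
        "Description=GitHub Actions runner (managed separately from cml)",
        "After=network-online.target",
        "Wants=network-online.target",
        "",
        "[Service]",
        f"User={user}",
        f"WorkingDirectory={runner_dir}",
        # Assumption: the runner is launched through run.sh in its directory.
        f"ExecStart={runner_dir}/run.sh",
    ]
    if memory_max:
        # Optional cgroup memory cap, so an OOM hits the runner unit only.
        lines.append(f"MemoryMax={memory_max}")
    lines += ["", "[Install]", "WantedBy=multi-user.target"]
    return "\n".join(lines) + "\n"


# Illustrative patterns only; not verified against actual runner logs.
EVENT_PATTERNS = {
    "job_started": re.compile(r"Running job: (?P<job>.+)"),
    "job_ended": re.compile(r"Job (?P<job>.+) completed with result: (?P<result>\w+)"),
    "disk_full": re.compile(r"No space left on device"),
}


def classify(line: str) -> Optional[str]:
    """Return the name of the first matching event, or None."""
    for name, pattern in EVENT_PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def follow_journal(unit: str = UNIT_NAME) -> Iterator[str]:
    """Yield log messages of the unit as they are written to the journal."""
    proc = subprocess.Popen(
        ["journalctl", "-u", unit, "-f", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    yield from proc.stdout


def events(lines: Iterable[str]) -> Iterator[str]:
    """Filter a stream of log lines down to the recognized event names."""
    for line in lines:
        name = classify(line)
        if name is not None:
            yield name
```

cml would then iterate over `events(follow_journal())` and decide when to shut the instance down, while `systemctl` handles starting, stopping, and resource limits for the runner itself.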