We are running a large-scale ROS 2 application that is launched via a ros2 launch service call. After approximately 8–9 hours of continuous operation, the following symptoms occur:
- ros2 node list shows no active nodes.
- However, all corresponding node processes are still alive (confirmed via ps, top, or systemd status).
- ros2 topic list shows that all topics are still present.
- Running ros2 topic info -v /battery_state shows the associated node as:
NODE_NAME_UNKNOWN
- All logs suddenly stop at a certain point in time, including rosout.
This suggests that while the nodes are still running and publishing data, their ROS graph metadata has been lost or not properly maintained, which may be related to RMW/DDS internal state or lifecycle management.
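To pin down exactly when the graph information disappears, a small watchdog could be run alongside the application. The following is only a minimal sketch, not something we have deployed: the node name graph_watchdog, the state labels, and the 10-second polling period are all made up for illustration. It uses the rclpy calls get_node_names() and get_publishers_info_by_topic(), and prints to stdout instead of using ROS logging, since rosout itself goes quiet in the failure case.

```python
import time

WATCHDOG_NAME = "graph_watchdog"  # hypothetical name for the watchdog node


def classify_graph(node_names, publisher_node_names, self_name=WATCHDOG_NAME):
    """Classify one snapshot of the ROS graph as seen by the watchdog.

    node_names: names returned by Node.get_node_names()
    publisher_node_names: node_name of each publisher endpoint on the topic
    """
    others = [n for n in node_names if n != self_name]
    unknown = [n for n in publisher_node_names if "NODE_NAME_UNKNOWN" in n]
    if publisher_node_names and not others:
        # Endpoints are still discovered, but no node names are known.
        return "nodes_missing"
    if unknown:
        # Some publisher can no longer be matched to a node.
        return "unknown_publisher"
    if not others:
        return "empty"
    return "ok"


def main(topic="/battery_state", period_sec=10.0):
    # Imported here so classify_graph() can be used without a ROS install.
    import rclpy

    rclpy.init()
    node = rclpy.create_node(WATCHDOG_NAME)
    last_state = None
    try:
        while rclpy.ok():
            names = node.get_node_names()
            pubs = [p.node_name for p in node.get_publishers_info_by_topic(topic)]
            state = classify_graph(names, pubs)
            if state != last_state:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} graph state: {last_state} -> {state}", flush=True)
                last_state = state
            time.sleep(period_sec)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()

# Call main() on a machine with ROS 2 sourced to start the watchdog.
```

Because this process does its own discovery, comparing its output with what ros2 node list reports at the same moment might also help tell whether the problem sits in the nodes themselves or only in the CLI's view of the graph.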
My environment:
- ROS version: Humble
- RMW implementation: rmw_fastrtps_cpp
- Fast RTPS version: 6.2.7
- OS: Ubuntu 22.04
- Kernel version: 5.15.136-tegra
- CPU architecture: aarch64 (Orin Nano)
Has anyone seen this before? Your help would be appreciated.