xmlrpc.XmlRpcNode consumes 100%+ cpu after being shutdown #2238

drjsmith · 2022-04-26T18:17:10Z

TLDR: calling shutdown() on a XmlRpcNode does not fully shut it down and leaves a thread running.
I am currently on Noetic/Ubuntu 20.04/python 3.8.

What is going on:

Calling start() results in a new thread running self.server.serve_forever(), where server is of type ThreadingXMLRPCServer which has a base class of socketserver.BaseServer.
That function, defined for BaseServer in socketserver.py:215, will not exit until the loop's condition is satisfied: while not self.__shutdown_request:
self.__shutdown_request can be set by calling shutdown() on BaseServer
However, XmlRpcNode's shutdown() method never calls self.server.shutdown(), only server.socket.close() and server.server_close(). I'm pretty sure those last 2 functions do the same thing, since:

class TCPServer(BaseServer):
    ...
    def server_close(self):
        self.socket.close()

The thread is then stuck in an infinite loop. Before shutting down XmlRpcNode, the thread not use a significant amount of cpu; afterwards, however, the thread consumes 100% of a single core (as shown by top).

It seems to me that the solution may be as simple as adding self.server.shutdown() to XmlRpcNode's shutdown() function, though I'm not certain whether it should be before or after closing the socket or if there are other considerations.

Why this matters to me:
I'm guessing this isn't a common problem, since generally the process would be ending shortly afterwards. However, I am using roslaunch.parent.ROSLaunchParent to launch and shutdown nodes as part of a python-based robot navigation benchmarking codebase. I wrote the code and used it extensively on Kinetic/Ubuntu 16.04/python 2.7 and did not encounter this issue at that time: starting and stopping hundreds of instances of roslaunch.parent.ROSLaunchParent did not cause any noticeable issues. As soon as I switched to Noetic/Ubuntu 20.04/python 3.8, I noticed that after running a few experiments the python process was at times using ~166% cpu. Usage hits 100% if there is only 1 'stopped' instance; ~166% if there are 2 or more.
The cpu usage of the process drops very low as soon as I start a new instance; I'm not familiar enough with the Selector-related code to understand why this is so.

Minimum example:
The script starts a roscore and then repeatedly creates, starts, and stops roslaunch.parent.ROSLaunchParent instances; run top to see how the process' cpu usage changes.

import os
import time
import roslaunch

class RoslaunchTest(object):

    def __init__(self, is_core=True):
        os.environ["ROS_MASTER_URI"]="http://localhost:11312"
        self.is_core = is_core

    def start(self):
        print("Starting...")
        uuid = roslaunch.rlutil.get_or_generate_uuid(None, not self.is_core)

        self.roslaunch_object = roslaunch.parent.ROSLaunchParent(
            run_id=uuid, roslaunch_files=[],
            is_core=self.is_core, port=11312
        )
        self.roslaunch_object.start()
        print("Started!")

    def stop(self):
        print("Stopping...")
        self.roslaunch_object.shutdown()
        print("Stopped!")

def demo():
    core = RoslaunchTest(is_core=True)
    core.start()

    test = RoslaunchTest(is_core=False)
    def cycle():
        test.start()
        time.sleep(5)
        test.stop()
        time.sleep(10)

    for _ in range(4):
        cycle()

if __name__ == '__main__':
    demo()

The text was updated successfully, but these errors were encountered:

fgrcar · 2022-05-03T13:26:00Z

I've noticed the same issues on the ROS Melodic/Ubuntu 18 but only with the Python 3.6.9. On the other hand this is not the case with the Python 2.7.

BRNKR · 2022-08-17T08:14:21Z

Same for me. Unusable at the moment. We want to switch between two driving modes with different set of nodes. we want to avoid running all at the time when it is not necessary. for that we have a launch_handler with service which should start the corresponding launch file. when killing one each time a new instance of my handler appears in my process list, which generates 100% cpu load. for each on/off it generates a new one.

drjsmith · 2022-08-24T20:48:52Z

@TobiMiller @fgrcar Here is how I worked around the problem:

#Class that fixes the problem
class RoslaunchShutdownWrapper(roslaunch.parent.ROSLaunchParent):

    def shutdown(self):
        server = self.server.server if self.server is not None and self.server.server is not None else None
        super(RoslaunchShutdownWrapper, self).shutdown()
        if server:
            server.shutdown()

    def __del__(self):
        print("Shutting down now!!!!!!")
        self.shutdown()

#How to use it:
roslaunch_object = RoslaunchShutdownWrapper(
                                run_id=..., roslaunch_files=...,
                                is_core=..., port=...,
                                sigterm_timeout=...
                            )
roslaunch_object.start()

Let me know if it works for you.

RodolpheCyber · 2022-08-27T05:11:08Z

@drjsmith Thanks so much for posting what helped for you !

I'm surprised this hasn't been addressed sooner, it's really a pain. I made a QT user Interface using the roslaunch API basically to launch and shutdown nodes of our stack. But I'm seeing the same CPU error you mentioned as I shutdown the nodes, which makes our Qt interface extremely slow and impossible to use as some nodes are being shut.

Your proposition seems to work for me so far though! Based on it, I wrote a quick class that the Qt interface calls to launch and shutdown nodes :

class RoslaunchWrapperObject(roslaunch.parent.ROSLaunchParent):

    def start(self):
        super(RoslaunchWrapperObject, self).start()
        print(self.server)

    def stop(self):
        print("Stopping...")
        print(self.server)
        server = self.server.server if self.server is not None and self.server.server is not None else None
        super(RoslaunchWrapperObject, self).shutdown()
        if server:
           server.shutdown()

To launch a node from the QT :

def launch(parameter):
    uuid = roslaunch.rlutil.get_or_generate_uuid(None, False)
    roslaunch.configure_logging(uuid)
    cli_args = parameter
    roslaunch_args = cli_args[1:]
    roslaunch_file = [(roslaunch.rlutil.resolve_launch_arguments(cli_args)[0], roslaunch_args)]
    launcher = RoslaunchWrapperObject(run_id = uuid, roslaunch_files = roslaunch_file)
    return launcher

and the calls are made like this :

cli_args = ['PATH/online.launch','simulation:=true']
        self.launch = launch(cli_args)
        self.launch.start() # to start it 
        self.launch.stop() # to stop it

Thank you so much for this, it really helped ! I do hope they address the issue and find a proper way to fix it so we can reuse the API just like on previous ROS versions.

drjsmith · 2022-08-27T18:19:59Z

@RodolpheCyber You're welcome!

Fixing the issue with 100% CPU usage after the shutdown ros#2238

adipotnis · 2022-11-01T07:02:27Z

is this issue still present on melodic?

drjsmith · 2022-11-01T19:10:15Z

@adipotnis I discovered the problem when upgrading from kinetic to noetic, so I haven't personally tested it on melodic. @fgrcar reported encountering the problem on melodic but only when using python 3. I wouldn't be surprised if the root bug has been present for many versions and has only been exposed due to the transition to python 3, though I haven't verified this.

stop CPU usage spike when a rospy node receives a signals.SIGINT interrupt but before it terminates the process (e.g. pressing CTRL+C while the node is sleeping due to a rospy.sleep() call no longer sends CPU usage of a core to 100%) should also fix ros#2238

l1va added a commit to l1va/ros_comm that referenced this issue Sep 27, 2022

Thread is stucking on shutdown

dc658fe

Fixing the issue with 100% CPU usage after the shutdown ros#2238

l1va mentioned this issue Sep 27, 2022

Thread is stucking on shutdown #2281

Open

jamesdarrenmuir linked a pull request Jan 26, 2024 that will close this issue

properly shutdown xml rpc server #2368

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xmlrpc.XmlRpcNode consumes 100%+ cpu after being shutdown #2238

xmlrpc.XmlRpcNode consumes 100%+ cpu after being shutdown #2238

drjsmith commented Apr 26, 2022

fgrcar commented May 3, 2022

BRNKR commented Aug 17, 2022

drjsmith commented Aug 24, 2022

RodolpheCyber commented Aug 27, 2022 •

edited

Loading

drjsmith commented Aug 27, 2022

adipotnis commented Nov 1, 2022

drjsmith commented Nov 1, 2022

xmlrpc.XmlRpcNode consumes 100%+ cpu after being shutdown #2238

xmlrpc.XmlRpcNode consumes 100%+ cpu after being shutdown #2238

Comments

drjsmith commented Apr 26, 2022

fgrcar commented May 3, 2022

BRNKR commented Aug 17, 2022

drjsmith commented Aug 24, 2022

RodolpheCyber commented Aug 27, 2022 • edited Loading

drjsmith commented Aug 27, 2022

adipotnis commented Nov 1, 2022

drjsmith commented Nov 1, 2022

RodolpheCyber commented Aug 27, 2022 •

edited

Loading