When shutdown is requested through the signal handler, the worker sends a DisconnectRequest. It should then wait for the DisconnectResponse, but this was never implemented: the current version (main branch) exits immediately after sending the DisconnectRequest.
As evidence, the scheduler side already sends the response:
    async def on_disconnect(self, worker_id: WorkerID, request: DisconnectRequest):
        await self.__disconnect_worker(request.worker)
        await self._binder.send(worker_id, DisconnectResponse.new_msg(request.worker))  # <-- HERE
The worker currently does not check for the DisconnectResponse message type. Because of a race condition, the worker almost always exits before reading the last message from the scheduler, so no exception is ever thrown.
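To make the intended handshake concrete, here is a minimal sketch of a worker that waits for the acknowledgement before exiting. It is not the project's code. The dataclasses are hypothetical stand-ins for the real message types; `graceful_disconnect`, the `send`/`receive` callables, the timeout value, and the in-memory queues are all assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-ins: only the names DisconnectRequest and
# DisconnectResponse come from the issue, not their fields.
@dataclass
class DisconnectRequest:
    worker: str


@dataclass
class DisconnectResponse:
    worker: str


async def graceful_disconnect(send, receive, worker_id: str, timeout: float = 5.0) -> bool:
    """Send a DisconnectRequest, then wait for the matching DisconnectResponse.

    Returns True if the scheduler acknowledged, False if the wait timed out.
    """
    await send(DisconnectRequest(worker_id))

    async def wait_for_ack() -> None:
        while True:
            message = await receive()
            if isinstance(message, DisconnectResponse) and message.worker == worker_id:
                return
            # Any other message arriving during shutdown is ignored in this sketch.

    try:
        await asyncio.wait_for(wait_for_ack(), timeout)
        return True
    except asyncio.TimeoutError:
        # Assumption: the worker should still exit if the scheduler never answers.
        return False


async def demo() -> bool:
    # In-memory queues stand in for the real transport (an assumption).
    to_scheduler: asyncio.Queue = asyncio.Queue()
    to_worker: asyncio.Queue = asyncio.Queue()

    async def fake_scheduler() -> None:
        request = await to_scheduler.get()
        await to_worker.put(DisconnectResponse(request.worker))

    scheduler_task = asyncio.create_task(fake_scheduler())
    acknowledged = await graceful_disconnect(to_scheduler.put, to_worker.get, "worker-1")
    await scheduler_task
    return acknowledged


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The timeout is a design choice in this sketch: without it, a worker would hang on shutdown whenever the scheduler is already gone.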
There's a more detailed explanation in #430. This issue is also fixed by that PR. Please check accordingly.