yaws is stucked and can not receive any new HTTP connection #492
Comments
First thing you might try is building from the yaws master branch to see if you can reproduce the problem with the latest code. |
For this observer image it would be good if you could provide all the process info you can get from observer for pid |
The same image referenced in the previous comment shows a |
The second snapshot is the process info of <0.767.0>. I think inet_async is the message the Erlang VM creates to deliver a TCP connection accepted at the OS level. yaws is stuck with this message pending, so the new TCP connection is never processed. We have no idea why the yaws gen_server is stuck and cannot process new messages. |
For the second image it would be good to also see the Current Stack tab for that process. |
Thanks. Given that yaws doesn't make any calls itself that would result in an |
This is another case from today, attached below. I cannot provide the code since it's a company product. Two Erlang cluster nodes run yaws on two different machines. The websocket process Pids are stored in an mnesia table and synced between the nodes. When one websocket client needs to send data to other clients, we look up the process Pids in the mnesia table and send the data to those Pids. HAProxy sits in front of the yaws nodes. It is configured with "option httpchk *" to check the yaws online status every 5 seconds, so HAProxy also opens HTTP connections to yaws. |
When your websocket code is finished, does it perform a socket close? If not, can you try adding a call to |
The server does not close the websocket connection; it always waits for the client to open or close the connection. Based on the server log, when clients close websocket connections, the websocket callback module functions are invoked:
Should I put yaws_api:websocket_close/1 in terminate/2? terminate/2 seems to be the yaws websocket callback function. After testing over the past few days, we found that "clamscan" from ClamAV may cause more yaws_server:acceptor0/2 processes to be left with a pending inet_async message. While "clamscan" is running, CPU load and memory usage are extremely high.
"clamscan" runs at 4:20 AM in a daily cron job, but yaws_server:acceptor0 processes with a pending "inet_async" message still appear when we monitor the VM during the daytime. Sometimes the VM has 1 or 2 yaws_server:acceptor0 processes with a pending inet_async message, yet yaws still accepts new HTTP connections. Is it possible to monitor for processes stuck with such a pending message and kill them? |
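On the monitoring question above, a minimal sketch of how such processes could be listed from a shell attached to the node. It assumes the stuck processes report yaws_server:acceptor0/2 as their current function, as the observer_cli output in this thread suggests; if they are blocked deeper in the call stack the match would need adjusting. The module name acceptor_watch and the function suspects are hypothetical and not part of yaws.

```erlang
-module(acceptor_watch).
-export([suspects/0, suspects/1]).

%% Hypothetical helper, not part of yaws.
%% Returns [{Pid, QueueLen}] for every process whose current function is
%% yaws_server:acceptor0/2 and whose mailbox holds at least one message.
suspects() ->
    suspects(erlang:processes()).

suspects(Pids) ->
    lists:filtermap(fun check/1, Pids).

check(Pid) ->
    %% process_info/2 returns the items in the order requested,
    %% or 'undefined' if the process has already exited.
    case erlang:process_info(Pid, [current_function, message_queue_len]) of
        [{current_function, {yaws_server, acceptor0, 2}},
         {message_queue_len, Len}] when Len > 0 ->
            {true, {Pid, Len}};
        _ ->
            false
    end.
```

Calling acceptor_watch:suspects() periodically would show whether the same Pids stay in the list. A listed process could then be terminated with exit(Pid, kill), but nothing in this thread establishes that killing an acceptor is safe, and a healthy acceptor can briefly have a queued message too, so a single snapshot is not proof that a process is stuck.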
Apologies for not being clear. Rather than making an API call, if your websocket callback module receives any sort of message particular to your application in its |
It's written this way
|
OK, that's good, but I assume the application protocol you've created for your application on top of websocket also has a shutdown or closing aspect, and the message you use to indicate that also needs to be handled with |
Our code follows basic_echo_callback_extended. What's the difference between handle_message/1 and handle_message/2? |
Ok, since you're using the extended example, do you have a The |
We found that the problem originates from Line 2736 in 0c1d25e
yaws tries to read all the HTTP headers. It invokes do_recv with Count =:= 0, but do_recv does not return. It looks like gen_tcp:recv in do_recv neither times out nor returns an error.
The problem is not related to websocket. It's caused by HAProxy's regular httpchk requests that check whether the server is alive. The problem does not happen immediately after the server starts. We tried decreasing the httpchk interval from 5 seconds to 200 ms on two servers. Sometimes it happened after 1 hour and sometimes after 2 or 3 hours. Some posts suggest using |
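For comparison, a small standalone sketch of what a passive-mode gen_tcp:recv/3 call with an explicit timeout normally does when the peer sends nothing. It does not reproduce the hang described here and does not use yaws or its do_recv; the module name recv_timeout_demo is hypothetical, and the listener is a throwaway socket on the loopback interface.

```erlang
-module(recv_timeout_demo).
-export([run/1]).

%% Hypothetical demo, independent of yaws.
%% Opens a loopback connection on which the client never sends data,
%% then reads on the server side with a bounded wait.
run(TimeoutMs) ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port,
                                   [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(Listen, 1000),
    %% Length 0 asks for whatever bytes are available;
    %% the third argument limits the wait, in milliseconds.
    Result = gen_tcp:recv(Server, 0, TimeoutMs),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Listen),
    Result.
```

With a silent peer, recv_timeout_demo:run(100) is expected to return {error, timeout} after about 100 ms. A do_recv call that never comes back would therefore point either at a recv issued without a finite timeout or at something below gen_tcp, which is the open question in this report.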
But surely |
We run the yaws server in embedded mode; the server starts and processes some requests.
HAProxy is used as the front end to monitor the yaws cluster nodes and to receive and dispatch new HTTP webpage/websocket connections.
A yaws node suddenly gets stuck and cannot receive any new HTTP connections.
We used observer_cli to check the Erlang VM node and found the stuck node:
We tested yaws versions 2.0.6 and 2.0.9 and tried Erlang versions 20 and 25.
The problem always occurs at random.