TLSSocket appears to be leaking, possibly on Connection Reset with HTTP2 #42710
Comments
Forgot to mention, I've logged the following errors:
HTTP/2 client error [Error: 140455755077568:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:332:
] {
library: 'SSL routines',
function: 'ssl3_get_record',
reason: 'wrong version number',
code: 'ERR_SSL_WRONG_VERSION_NUMBER'
}
HTTP/2 client error [Error: 140455755077568:error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
library: 'SSL routines',
function: 'tls_early_post_process_client_hello',
reason: 'unsupported protocol',
code: 'ERR_SSL_UNSUPPORTED_PROTOCOL'
}
HTTP client error [Error: Parse Error: Invalid method encountered] {
bytesParsed: 0,
code: 'HPE_INVALID_METHOD',
reason: 'Invalid method encountered',
rawPacket: <Buffer 00 0e 38 a6 27 6b 99 23 84 88 be 00 00 00 00 00>
}
HTTP/2 client error [Error: 140455755077568:error:1417A0C1:SSL routines:tls_post_process_client_hello:no shared cipher:../deps/openssl/openssl/ssl/statem/statem_srvr.c:2313:
] {
library: 'SSL routines',
function: 'tls_post_process_client_hello',
reason: 'no shared cipher',
code: 'ERR_SSL_NO_SHARED_CIPHER'
}
HTTP/2 client error [Error: 140455755077568:error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
library: 'SSL routines',
function: 'tls_early_post_process_client_hello',
reason: 'unsupported protocol',
code: 'ERR_SSL_UNSUPPORTED_PROTOCOL'
}
HTTP/2 client error [Error: 140455755077568:error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
library: 'SSL routines',
function: 'tls_early_post_process_client_hello',
reason: 'unsupported protocol',
code: 'ERR_SSL_UNSUPPORTED_PROTOCOL'
}
HTTP/2 client error [Error: 140455755077568:error:1420918C:SSL routines:tls_early_post_process_client_hello:version too low:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
library: 'SSL routines',
function: 'tls_early_post_process_client_hello',
reason: 'version too low',
code: 'ERR_SSL_VERSION_TOO_LOW'
}
HTTP/2 client error [Error: 140455755077568:error:142090FC:SSL routines:tls_early_post_process_client_hello:unknown protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1689:
] {
library: 'SSL routines',
function: 'tls_early_post_process_client_hello',
reason: 'unknown protocol',
code: 'ERR_SSL_UNKNOWN_PROTOCOL'
}
I also get general connection reset errors. I ensure the socket is destroyed, so I don't think it's that:
this.http2Server.addListener('clientError', (err, socket) => {
// No clear way to signal back to client why a socket has failed because
// it's unsure if it's HTTP1 or HTTP2. Just destroy and log server-side.
if (!socket.destroyed) socket.destroy(err);
if (err?.code === 'ECONNRESET') {
console.warn('HTTP/2 client connection reset.');
return;
}
console.error('HTTP/2 client error', err);
}); |
More diving. Each lingering TLSSocket is referenced by
|
Pinging at a set interval seems to fix it:
/** @type {Set<import('node:http2').Http2Session>} */
const sessions = new Set();
this.http2Server.addListener('session', (session) => {
sessions.add(session);
});
let max = 0;
let previous = 0;
setInterval(() => {
max = Math.max(max, sessions.size);
let removals = 0;
const added = sessions.size - previous;
for (const session of sessions) {
if (session.destroyed) {
removals++;
sessions.delete(session);
} else {
session.ping(Buffer.from('pingpong'), (err) => {
if (err) {
console.error(err);
}
});
}
}
const count = sessions.size;
console.log(`HTTP2 Sessions: ${count} (+${added}/-${removals}) / ${max}`);
previous = count;
}, 30_000);
It forces the stream to try to communicate with the client and then realize the connection has been dropped. Eight hours later, there are no leaks if you use ping. |
So... apparently we have to set a timeout on Http2 server objects, but not Http1. The documentation contradicts itself:
https://nodejs.org/api/http2.html#event-timeout_3
And yet:
https://nodejs.org/api/http2.html#servertimeout_1
Apparently it was changed in v13. But this also means that you need to track sessions and close them manually. This is prone to memory leaks because it's unusual for a user to track sessions manually; they'd more likely want to track streams instead. We need a better solution to address the original issue. |
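To make the workaround concrete, here is a minimal sketch (certificate paths, port, and timeout values are assumptions for illustration, not from the thread) of setting both a server-level timeout and a per-session timeout:

```js
import fs from 'node:fs';
import http2 from 'node:http2';

// Assumed self-signed certificate files for illustration.
const server = http2.createSecureServer({
  key: fs.readFileSync('server.key'),
  cert: fs.readFileSync('server.crt'),
});

// Server-level idle timeout; HTTP/2 servers no longer apply a default one.
server.setTimeout(120_000, (socket) => {
  if (socket) socket.destroy(new Error('SOCKET_TIMEOUT'));
});

// Per-session timeout so an idle session (and its TLSSocket) gets torn down.
server.on('session', (session) => {
  session.setTimeout(60_000, () => {
    session.destroy(new Error('SESSION_TIMEOUT'));
  });
});

server.listen(8443);
```

The fuller code block later in this thread takes the same approach and adds periodic pings on top.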
Could you provide a code snippet to reproduce the problem? |
@ywave620 It happens with a simple http2 server, but only for some connections, not all of them. Maybe, as @clshortfuse mentioned:
I tried to simulate these situations with tools like h2i, but they can't do this. Maybe @clshortfuse knows. |
@rahbari |
@ywave620 This is not a common thing, and I don't know exactly how it happens. As you can see in the attached picture, out of 467665 connections on a cluster worker, almost 2670 (0.5%) are not garbage collected even though they are destroyed and not writable, but they still hold an output buffer. They are from several days ago, although I have set the http2 server timeout to 2 minutes. Most of them are from similar IPs, which makes me think there is a problem with their client or connection that causes node not to garbage collect their sessions and streams. This leads to a gradual increase in NodeJS rss until it halts. I keep a weakref to each stream, and each row is a valid one which is not garbage collected yet: |
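As an aside, the kind of weakref bookkeeping described here can be sketched roughly as follows (the `http2Server` variable and the interval are assumptions for illustration):

```js
/** @type {Set<WeakRef<import('node:http2').ServerHttp2Stream>>} */
const streamRefs = new Set();
const collected = new FinalizationRegistry((id) => {
  console.log(`stream #${id} was garbage collected`);
});

let nextId = 0;
http2Server.on('stream', (stream) => {
  nextId += 1;
  streamRefs.add(new WeakRef(stream));
  collected.register(stream, nextId);
});

// Periodically prune refs whose targets were collected and report the rest;
// rows that linger long after their stream closed point at a leak.
setInterval(() => {
  for (const ref of streamRefs) {
    if (ref.deref() === undefined) streamRefs.delete(ref);
  }
  console.log(`streams still referenced: ${streamRefs.size}`);
}, 60_000);
```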
Hmm, this is definitely intriguing. And if true, definitely worth investigation. cc @nodejs/http2 |
Unfortunately, tracking this down without a reproduction and/or access to your production systems will be extremely hard. |
@mcollina is there any h2 debugger that can simulate some of these situations:
so I can test what causes this on our production server. I searched a lot but couldn't find one. |
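For what it's worth, the server-side SSL errors quoted earlier don't need a dedicated h2 tool; a few lines of plain net/tls code can usually provoke them. A rough sketch, with a placeholder host/port and a self-signed test certificate assumed:

```js
import net from 'node:net';
import tls from 'node:tls';

const HOST = 'localhost'; // placeholder target
const PORT = 8443;

// 1. Plain HTTP bytes on the TLS port -> "wrong version number" / "unknown protocol".
const raw = net.connect(PORT, HOST, () => {
  raw.write('GET / HTTP/1.1\r\nHost: example\r\n\r\n');
  setTimeout(() => raw.destroy(), 100);
});
raw.on('error', () => {});

// 2. Finish the TLS handshake, then vanish without ever sending the HTTP/2 preface,
//    approximating a client that drops off mid-connection (ECONNRESET on the server).
const secure = tls.connect({
  host: HOST,
  port: PORT,
  ALPNProtocols: ['h2'],
  rejectUnauthorized: false, // self-signed test certificate assumed
}, () => {
  secure.destroy();
});
secure.on('error', () => {});
```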
Reproducing is also a bit hard because almost all tests I've built emit
I'm sharing my code because it's given me the best results. I do have a leak somewhere, but I no longer think it's from the listener. It's probably somewhere else in the code (I think my CDN Proxy). Or it's on my HTTP1 server.
/** @return {Promise<import('node:http2').Http2SecureServer>} */
startHttp2Server() {
return new Promise((resolve, reject) => {
this.createHttp2Server();
this.http2Server.listen({
port: this.securePort,
host: this.secureHost,
}, () => {
this.http2Server.removeListener('error', reject);
this.http2Server.setTimeout(120_000, (socket) => {
if (!socket) {
console.warn('HTTP2 socket (unknown) timed out.');
return;
}
const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;
console.warn(`HTTP2 socket ${identity} timed out.`);
socket.destroy(new Error('SOCKET_TIMEOUT'));
});
// Error Handlers
this.http2Server.addListener('error', (err) => {
console.error('HTTP/2 server error', err);
});
this.http2Server.addListener('clientError', (err, socket) => {
// No clear way to signal back to client why a socket has failed because
// it's unsure if it's HTTP1 or HTTP2. Just destroy and log server-side.
if (!socket.destroyed) socket.destroy(err);
if (err?.code === 'ECONNRESET') {
console.warn('HTTP/2 client connection reset.');
return;
}
console.error('HTTP/2 client error', err);
});
this.http2Server.addListener('sessionError', (err, session) => {
if (!session.destroyed) session.destroy(err);
if (err?.code === 'ECONNRESET') {
console.warn('HTTP/2 client connection reset.');
return;
}
console.error('HTTP/2 sessionError error', err);
});
this.http2Server.addListener('session', (session) => {
const identity = `${session.socket.remoteFamily}:${session.socket.remoteAddress}:${session.socket.remotePort}`;
session.setTimeout(60_000, () => {
console.warn(`HTTP/2 session ${identity} timed out.`);
session.destroy(new Error('SESSION_TIMEOUT'));
});
const pingInterval = setInterval(() => {
if (session.destroyed) {
clearInterval(pingInterval);
} else {
session.ping((err) => {
if (!err) return;
if (session.destroyed) return;
if (err.code === 'ERR_HTTP2_PING_CANCEL') return;
console.error(`Ping to ${identity} failed.`, err);
});
}
}, 15_000);
});
// Logic handlers
this.http2Server.addListener('stream', (stream, headers) => {
stream.addListener('error', (err) => {
if (err?.code === 'ECONNRESET') {
console.warn('HTTP/2 stream connection reset.', headers[':path']);
} else {
console.error('HTTP/2 stream error', headers, err);
}
});
this.httpHandler.handleHttp2Stream(stream, headers).catch((err) => {
console.error('HTTP2 handler failed.', err);
});
});
this.http2Server.addListener('request', (req, res) => {
if (req.httpVersionMajor >= 2) return;
// @ts-ignore Ignore typings
req.setTimeout(300_000, (socket) => {
if (!socket) {
console.warn('HTTP1 in HTTP2 request (unknown) timed out.');
return;
}
const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;
console.warn(`HTTP1 in HTTP2 request ${identity} timed out.`);
socket.destroy(new Error('SOCKET_TIMEOUT'));
});
res.setTimeout(300_000, (socket) => {
if (!socket) {
console.warn('HTTP1 in HTTP2 response (unknown) timed out.');
return;
}
const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;
console.warn(`HTTP1 in HTTP2 response ${identity} timed out.`);
socket.destroy(new Error('SOCKET_TIMEOUT'));
});
req.addListener('error', (err) => {
if (err?.code === 'ECONNRESET') {
console.warn('Request stream connection reset.', req.url);
} else {
console.error('Request stream error.', req.url, req.headers, err);
}
});
res.addListener('error', (err) => {
if (err?.code === 'ECONNRESET') {
console.warn('Response stream connection reset.', req.url);
} else {
console.error('Response stream error', req.url, req.headers, err);
}
});
this.httpHandler.handleHttp1Request(req, res).catch((err) => {
console.error('HTTP1 in HTTP2 handler failed.', err);
});
});
resolve(this.http2Server);
});
this.http2Server.addListener('error', reject);
});
}
I do have a way to pull a
Edit: The most common output is the following three lines together:
I don't know why clients leave without Connection Reset, but I can imagine that if you don't handle session timeouts, you will leak. 99% of all my connections are mobile, and many are asset tracking (moving vehicles). Disconnecting on one IP and reconnecting on another because they switch cell towers could be a likely explanation. The other is the use of WebSockets that go idle when they switch away from the app (PWAs). We did see a higher number of Connection Resets with iOS 15 after sending 301/302; Safari doesn't gracefully leave. But this should be handled. I have to let it run longer to see if TLSSockets are still leaking. All servers are v16.14+.
Edit2: To explain why I feel I have a leak somewhere: my Active Handles creep up little by little. The red one is one client after 3 months (just rebooted it now). |
@rahbari Do you have a If you're just tracking I catch the Also, not 100% sure, but if you never try to send anything, you'll never trigger a timeout. That means the |
Maybe @jasnell could help |
@clshortfuse |
@rahbari I see. You're not considering the possibility that a connection never goes as far as creating a session. I believe
I believe your code doesn't account for the possibility of a timeout between the time a client establishes a connection and a stream is received. Http1 doesn't have this intermediary session, so it goes Connect => Request, whereas HTTP2 is Connect => Session => Stream (Request). IIRC Http1Server handles connections with timed-out requests with the
I'm not 100% sure adding this session timeout is required, but the authors would know. IIRC, it was mandatory and not well documented.
Edit: Sorry, I see that you do have |
@clshortfuse I've tested setTimeout previously with an EventSource and it works fine on an established connection. I think it is called when the socket itself is idle for the defined time. |
Going back to the original comment, I no longer appear to have a TLSSocket leak. All the methods I have appear to have plugged it. I don't know if my Http2 setup is overkill, watching for everything, but it does indeed work.
@rahbari I used to use a similar setup for SSE (EventSource) over HTTP2 by using a WeakRef, but I moved to just storing the req/res as an object and keeping them in a set. I also use
/** @type {Set<HttpTransaction>} */
export const eventStreamSet = new Set();
/** @type {MiddlewareFunction} */
async function onEventStream(transaction) {
if (!isAuthenticated(transaction)) {
return 401;
}
const { request, response, httpVersion, locals, socket } = transaction;
const { headers, searchParams } = request;
locals.eventChannels = [
...searchParams.getAll('channel'),
...(searchParams.get('channels')?.split(',') ?? [])]
.map((type) => type.trim())
.filter((t) => t);
const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;
logger.log(`Event stream subscription via ${identity} from ${headers?.['user-agent'] ?? '(anonymous)'} for ${locals.eventChannels} by ${JSON.stringify(locals.auth)}`);
response.status = 200;
response.headers['content-type'] = 'text/event-stream';
response.headers['cache-control'] = 'no-store';
if (httpVersion?.charAt(0) === '1') {
response.headers.connection = 'keep-alive';
}
let keepAliveInterval;
try {
await new Promise((resolve, reject) => {
eventStreamSet.add(transaction);
const stream = response.getPipeline((err) => {
if (err) reject(err);
else resolve();
});
stream.write(buildSSEPacket('ping', Date.now()));
keepAliveInterval = setInterval(() => {
try {
if (!stream.writable) return;
stream.write(buildSSEPacket('ping', Date.now()));
} catch {}
}, 15_000);
});
logger.log(`Event stream via ${identity} closed successfully.`);
} catch (e) {
switch (e?.code) {
case 'ERR_STREAM_PREMATURE_CLOSE':
logger.log(`Event stream via ${identity} closed prematurely.`);
break;
default:
logger.log(`Event stream via ${identity} closed unexpectedly.`, e);
}
}
clearInterval(keepAliveInterval);
eventStreamSet.delete(transaction);
return HttpHandler.END;
}
/**
* @param {string} channel event channel
* @param {string} type event type
* @param {string|Object} data string or JSON.stringify(object)
* @return {number}
*/
export function emitSSE(channel, type, data) {
let count = 0;
let output = null;
for (const { response, locals } of eventStreamSet) {
// console.log(res.locals.eventChannels, channel);
if (!locals.eventChannels?.includes(channel)) continue;
if (output == null) {
output = buildSSEPacket(type, data);
}
// if (id) res.socket.write(`id: ${id}`);
// if (retry != null) res.socket.write(`retry: ${retry.toString(10)}`);
const sseStream = response.getPipeline();
sseStream.write(output);
count++;
}
// if (count) console.log('writing SSE', output, count);
return count;
}
The web framework is custom, but close to fastify or koa. I do suggest you dump your snapshot to get a good look at what's still holding references: I can share my dump middleware:
/** @type {MiddlewareFunction} */
function onDumpHeapSnapshot(transaction) {
if (!isAuthenticated(transaction)) return 401;
if (!transaction.locals.auth.isAdmin) return 403;
transaction.response.status = 200;
transaction.response.headers['content-type'] = 'application/octet-stream';
transaction.response.headers['cache-control'] = 'no-store';
transaction.response.headers['content-disposition'] = `attachment; filename=${Date.now()}.heapsnapshot`;
if (!transaction.response.stream.writable) throw new Error('NOT_WRITABLE');
return getHeapSnapshot();
}
I will opt not to close this, because I'd be interested to know what the minimum error handling needed for H2 connections is. Hopefully the HTTP2 team can direct us. |
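For reference, the getHeapSnapshot() used in the middleware above presumably comes from Node's built-in v8 module, which returns a readable stream; the same module can also write a snapshot straight to disk:

```js
import v8 from 'node:v8';

// Writes <timestamp>.heapsnapshot to the working directory and returns the filename.
const filename = v8.writeHeapSnapshot();
console.log(`heap snapshot written to ${filename}`);

// v8.getHeapSnapshot() instead returns a Readable stream, which is what makes it
// convenient to return directly from an HTTP handler like the middleware above.
```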
For what it is worth in this discussion: I'm using the latest node LTS at the moment ( It's tough to reproduce, and the stack trace only goes as far as node_internals, making it hard to debug. I wonder if anyone has any direction on where to put the right handlers to at least avoid the crash (assuming some memory leaks might remain...) |
@diervo I've recently started using https://github.com/myndzi/wtfnode to trace possible leaks. In my case, with the HTTP code from above, I'm not leaking. The http2 ping interval at 15_000ms ensures an idle connection is dropped. It'll trigger a timeout error (or maybe connection reset).
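A minimal way to wire that up, assuming wtfnode is installed and exposes the dump() helper its README describes, is to print open handles on a signal:

```js
import wtf from 'wtfnode';

// Print currently open sockets, servers, and timers to the console.
// Useful for spotting TLSSockets that should already have been released.
process.on('SIGUSR2', () => {
  wtf.dump();
});
```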
|
Are those hard crashes or logs? |
These are the logs right before a hard crash. I have fastify logging set to true and graceful-shutdown. I'm unclear if fastify is managing the error handler and rethrowing somewhere, or if it's just node logging and crashing. When running locally, node stops; in prod, it makes the k8s container restart. Another weird observation: this only happens after some time of inactivity. I will let the server run for a while locally, and only after some long idle time will it crash. |
If you can, open a bug in Fastify, ideally with a reproduction, as it is a bug there and not related to this issue: Fastify should handle that condition. |
After a bit more code cleanup, I added another line to my server logic:
this.http2Server.addListener('stream', (stream, headers) => {
stream.setTimeout(300_000, () => {
stream.destroy(new Error('SOCKET_TIMEOUT'));
});
/* snip */
});
Theoretically, a session may stay active and respond to pings, but a stream (eg: a resource request) may have timed out. It appears as though
Another note: using the VSCode debugger (or probably any debugger) won't allow garbage collection if you do |
Well, I found my memory leak. It's
This is after about 90 minutes of running. Apparently the
It's only the
Generally, if a stream is "stuck" while destroyed, it'll get cleaned up when the session gets destroyed. I checked the NodeJS code, and I find it strange that it doesn't garbage-collect the last stream on a session after the stream gets destroyed, but knowing that cleanup will happen when the session ends is fine, I guess. Maybe there's a leak here? A stream shouldn't have to wait for its parent session to be destroyed before garbage collection. I color-coded some entries, and at least there are no orphaned streams with my handler. For now, I'll work on plugging the |
If that's the case, it makes sense to report this issue to the socket.io repository. |
Tweaked timeouts for HTTP2 connections, in the hopes of working around an apparent HTTP2-related memory leak in the Node core library. This is [issue #42710](nodejs/node#42710) in the Node repo.
BTW, while investigating what (I think) turned out to be this issue in my system, I noticed an inconsistency between RSS reported by GC tracing vs. |
In light of the previous comments, it sounds like this wasn't an issue with node after all? I'll go ahead and close this, but let me know if there is reason to reopen... but, future commenters, please file new issues if your bug only has the general shape of what's being discussed here. Exact matches only. |
For people who come here later: you can try the fix in #55966. |
Version
v16.14.2
Platform
Linux host.ec2.internal 4.14.243-185.433.amzn2.x86_64 #1 SMP Mon Aug 9 05:55:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
http2 streams http tls
What steps will reproduce the bug?
Load up an HTTP2 server with SSL certificate. Perform requests, but don't complete them.
eg:
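The inline example was lost in extraction; a hypothetical client along these lines (server URL and counts are made up for illustration) exercises the same pattern of starting requests and never completing them:

```js
import http2 from 'node:http2';

// Hypothetical reproduction: open many sessions, start a request on each,
// then tear the session down before anything completes.
for (let i = 0; i < 100; i++) {
  const client = http2.connect('https://localhost:8443', { rejectUnauthorized: false });
  client.on('error', () => {}); // ignore the errors we cause on purpose
  const req = client.request({ ':method': 'POST', ':path': '/' }); // body never sent or ended
  req.on('error', () => {});
  // Abruptly destroy the session and its underlying socket without a graceful close.
  setTimeout(() => client.destroy(), 50);
}
```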
After a while, ServerHttp2Session, nghttp2_memory, and Http2Session vastly outnumber the number of Http2ServerRequest or Http2ServerResponse references.
How often does it reproduce? Is there a required condition?
I believe it's from connection reset.
What is the expected behavior?
Objects should be cleared.
What do you see instead?
I have numerous memory snapshots and about 15 production servers exhibiting this issue. I can live dump a heap snapshot as needed. One shows the issue clearly with 102 ServerHttp2Session objects, but only 18 Http2ServerRequest objects. TLSSocket appears to clearly show that the references are internal. 21 out of 122 have a distance to window. The rest appear to leak.
After rewriting code to eliminate all stream/session listeners to avoid leaking and only using getPipeline(), I have a recent example of 40 sessions, 24 streams, and 21 req/res. The leaking ServerHttp2Session objects look like this:
An active req/res (with middleware) looks like this:
Additional information
I have more snapshots and can pull them. The HttpResponse and HttpRequest objects are custom middleware. The other 16 Http2ServerRequest objects are mostly socket.io objects, but they don't appear to leak. If you need the actual files, I'm on Slack and can share there (since snapshots have private information).