
TLSSocket appears to be leaking, possibly on Connection Reset with HTTP2 #42710

Closed

clshortfuse opened this issue Apr 12, 2022 · 30 comments

@clshortfuse (Contributor)

Version

v16.14.2

Platform

Linux host.ec2.internal 4.14.243-185.433.amzn2.x86_64 #1 SMP Mon Aug 9 05:55:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

http2 streams http tls

What steps will reproduce the bug?

Load up an HTTP2 server with an SSL certificate. Perform requests, but don't complete them.

e.g.:

  • disconnect before the stream can be read
  • send bad SSL data
  • disconnect before the response can be written

After a while, ServerHttp2Session, nghttp2_memory, and Http2Session references vastly outnumber the Http2ServerRequest and Http2ServerResponse references.
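A rough sketch of the kind of interaction described above (the certificate paths, port, and the abrupt client.destroy() are illustrative placeholders; this shows the shape of the scenario, not a verified minimal reproduction):

// Sketch: an HTTP/2 secure server plus a client that opens a stream and then
// tears the connection down instead of closing cleanly.
import fs from 'node:fs';
import http2 from 'node:http2';

const server = http2.createSecureServer({
  key: fs.readFileSync('server-key.pem'),
  cert: fs.readFileSync('server-cert.pem'),
});
server.on('stream', (stream) => {
  stream.respond({ ':status': 200 });
  // Intentionally never end the stream; the client drops first.
});
server.listen(8443, () => {
  const client = http2.connect('https://localhost:8443', { rejectUnauthorized: false });
  const req = client.request({ ':path': '/' });
  req.once('response', () => {
    // Tear down the session and socket immediately instead of a graceful close().
    client.destroy();
  });
  req.on('error', () => {}); // expect ERR_HTTP2_STREAM_CANCEL / ECONNRESET-style errors
});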

How often does it reproduce? Is there a required condition?

I believe it's caused by connection resets.

What is the expected behavior?

Objects should be cleared.

What do you see instead?

I have numerous memory snapshots and about 15 production servers exhibiting this issue. I can live-dump a heap snapshot as needed. One shows the issue clearly, with 102 ServerHttp2Session objects but only 18 Http2ServerRequest objects.

The TLSSocket entries clearly show that the references are internal. Only 21 out of 122 have a distance to window; the rest appear to leak.

[screenshot]

[screenshot]


After rewriting the code to eliminate all stream/session listeners (to avoid leaking) and only use getPipeline(), I have a recent example of 40 sessions, 24 streams, and 21 req/res. The leaking ServerHttp2Session objects look like this:

[screenshot]

An active req/res (with middleware) looks like this:

[screenshot]

Additional information

I have more snapshots and can pull them. The HttpResponse and HttpRequest objects are custom middleware. The other 16 Http2ServerRequest objects are mostly socket.io objects, but they don't appear to leak. If you need the actual files, I'm on Slack and can share there (since snapshots have private information).

@clshortfuse (Contributor, Author) commented Apr 12, 2022

Forgot to mention, I've logged the following errors:

HTTP/2 client error [Error: 140455755077568:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:332:
] {
  library: 'SSL routines',
  function: 'ssl3_get_record',
  reason: 'wrong version number',
  code: 'ERR_SSL_WRONG_VERSION_NUMBER'
}
HTTP/2 client error [Error: 140455755077568:error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
  library: 'SSL routines',
  function: 'tls_early_post_process_client_hello',
  reason: 'unsupported protocol',
  code: 'ERR_SSL_UNSUPPORTED_PROTOCOL'
}
HTTP client error [Error: Parse Error: Invalid method encountered] {
  bytesParsed: 0,
  code: 'HPE_INVALID_METHOD',
  reason: 'Invalid method encountered',
  rawPacket: <Buffer 00 0e 38 a6 27 6b 99 23 84 88 be 00 00 00 00 00>
}
HTTP/2 client error [Error: 140455755077568:error:1417A0C1:SSL routines:tls_post_process_client_hello:no shared cipher:../deps/openssl/openssl/ssl/statem/statem_srvr.c:2313:
] {
  library: 'SSL routines',
  function: 'tls_post_process_client_hello',
  reason: 'no shared cipher',
  code: 'ERR_SSL_NO_SHARED_CIPHER'
}
HTTP/2 client error [Error: 140455755077568:error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
  library: 'SSL routines',
  function: 'tls_early_post_process_client_hello',
  reason: 'unsupported protocol',
  code: 'ERR_SSL_UNSUPPORTED_PROTOCOL'
}
HTTP/2 client error [Error: 140455755077568:error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
  library: 'SSL routines',
  function: 'tls_early_post_process_client_hello',
  reason: 'unsupported protocol',
  code: 'ERR_SSL_UNSUPPORTED_PROTOCOL'
}
HTTP/2 client error [Error: 140455755077568:error:1420918C:SSL routines:tls_early_post_process_client_hello:version too low:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1714:
] {
  library: 'SSL routines',
  function: 'tls_early_post_process_client_hello',
  reason: 'version too low',
  code: 'ERR_SSL_VERSION_TOO_LOW'
}
HTTP/2 client error [Error: 140455755077568:error:142090FC:SSL routines:tls_early_post_process_client_hello:unknown protocol:../deps/openssl/openssl/ssl/statem/statem_srvr.c:1689:
] {
  library: 'SSL routines',
  function: 'tls_early_post_process_client_hello',
  reason: 'unknown protocol',
  code: 'ERR_SSL_UNKNOWN_PROTOCOL'
}

I also get general connection reset errors. I ensure the socket is destroyed, so I don't think it's that:

this.http2Server.addListener('clientError', (err, socket) => {
  // No clear way to signal back to client why a socket has failed because
  // it's unsure if it's HTTP1 or HTTP2. Just destroy and log server-side.
  if (!socket.destroyed) socket.destroy(err);
  if (err?.code === 'ECONNRESET') {
    console.warn('HTTP/2 client connection reset.');
    return;
  }
  console.error('HTTP/2 client error', err);
});

@clshortfuse (Contributor, Author)

More diving. Each lingering TLSSocket is referenced by HttpSession. The readableState is reading.

HttpSession[kState] has goawayCode and goawayLastStreamID equal to null. streams is empty. I can't tell if a stream was never set or it was removed. Either way, it seems the leaking TLSSockets are always related to ServerHttp2Session objects with no streams (either already cleaned up or never initiated).
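For anyone who wants to dig the same way: a sketch of how a heap snapshot can be captured on a live process (not necessarily the exact tooling used here) and then opened in Chrome DevTools to walk the TLSSocket / ServerHttp2Session retainers:

// Sketch: write a V8 heap snapshot on demand (e.g. on SIGUSR2) and inspect the
// resulting .heapsnapshot file in Chrome DevTools. Available since Node 11.13.
import v8 from 'node:v8';

process.on('SIGUSR2', () => {
  const file = v8.writeHeapSnapshot(); // returns the generated filename
  console.log(`heap snapshot written to ${file}`);
});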

@clshortfuse (Contributor, Author) commented Apr 13, 2022

Pinging at a set interval seems to fix it:

/** @type {Set<import('node:http2').Http2Session>} */
const sessions = new Set();
this.http2Server.addListener('session', (session) => {
  sessions.add(session);
});

let max = 0;
let previous = 0;
setInterval(() => {
  max = Math.max(max, sessions.size);
  let removals = 0;
  const added = sessions.size - previous;
  for (const session of sessions) {
    if (session.destroyed) {
      removals++;
      sessions.delete(session);
    } else {
      session.ping(Buffer.from('pingpong'), (err) => {
        if (err) {
          console.error(err);
        }
      });
    }
  }

  const count = sessions.size;
  console.log(`HTTP2 Sessions: ${count} (+${added}/-${removals}) / ${max}`);
  previous = count;
}, 30_000);

It forces the session to try to communicate with the client and then realize the connection has been dropped, with an ERR_HTTP2_PING_CANCEL error in the ping callback and/or ETIMEDOUT in 'sessionError'. No leaking ServerHttp2Session or TLSSocket so far. I'll leave it running overnight and check again tomorrow, but it seems like a solid workaround. I'm trying it on both v16.14.0 and v16.14.2.


8 hours later and no leaks if you use ping.

@clshortfuse (Contributor, Author)

So... apparently we have to set a timeout on Http2 server objects, but not Http1. The documentation contradicts itself:

The 'timeout' event is emitted when there is no activity on the Server for a given number of milliseconds set using http2secureServer.setTimeout(). Default: 2 minutes.

https://nodejs.org/api/http2.html#event-timeout_3

And yet:

server.timeout
v13.0.0 | The default timeout changed from 120s to 0 (no timeout).
v8.4.0 | Added in: v8.4.0
Timeout in milliseconds. Default: 0 (no timeout)

https://nodejs.org/api/http2.html#servertimeout_1

Apparently it was changed in v13. This also means you need to track sessions and close them manually, which is prone to memory leaks because it's unusual for a user to track sessions; they'd likely want to track streams instead. We need a better solution to address the original issue.

#27556
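A minimal sketch of what that looks like in practice (restoring something like the pre-v13 default; the 120-second value and the destroy-on-timeout behavior are my own choices, not documented requirements, and `server` is an existing Http2SecureServer):

// Sketch: give the Http2SecureServer an explicit idle timeout again and tear
// the socket down when it fires.
server.setTimeout(120_000, (socket) => {
  // The callback may be invoked without a socket in some cases (see later in
  // this thread), so guard before destroying.
  if (socket && !socket.destroyed) socket.destroy(new Error('SOCKET_TIMEOUT'));
});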

@rahbari commented Aug 10, 2022

We've had this socket leak since v16 and still see it in 18.7. This is a frustrating problem because Node's memory usage constantly increases until it crashes and the process must be restarted.
This is part of the lsof output:
[screenshot]
There are many of these unhandled sockets with http2; the same code works fine with plain http.

Memory usage with http after one day:
[screenshot]
h2 after one day:
[screenshot]
h2 after 3 days:
[screenshot]

@ywave620 (Contributor)

Could you provide a code snippet to reproduce the problem?

@rahbari commented Sep 25, 2022

@ywave620 it happens with a simple http2 server, but only for some connections, not all of them. Maybe, as @clshortfuse mentioned:

  • disconnect before the stream can be read
  • send bad SSL data
  • disconnect before the response can be written

I tried to simulate these situations with tools like h2i, but they can't do this. Maybe @clshortfuse knows.
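For what it's worth, the first and third cases can be approximated without a dedicated h2 debugger using raw sockets; a hedged sketch (host and port are placeholders, and this is not a confirmed reproduction of the leak):

// Sketch: simulate "bad SSL data" and an abrupt disconnect against a local
// HTTP/2 + TLS server on port 8443.
import net from 'node:net';
import tls from 'node:tls';

// 1. Garbage instead of a TLS ClientHello (produces the ssl3_get_record /
//    "wrong version number" style errors logged earlier in this thread).
const bad = net.connect(8443, 'localhost', () => {
  bad.write('not a TLS handshake\r\n');
  bad.destroy();
});
bad.on('error', () => {});

// 2. Complete the TLS handshake, send only the HTTP/2 connection preface,
//    then reset the connection before any stream completes.
const sock = tls.connect(
  { port: 8443, host: 'localhost', ALPNProtocols: ['h2'], rejectUnauthorized: false },
  () => {
    sock.write('PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n');
    setTimeout(() => sock.destroy(), 100);
  },
);
sock.on('error', () => {}); // ignore client-side reset errors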

@ywave620 (Contributor) commented Sep 27, 2022

@rahbari
I tried the latest node, and I can't reproduce this issue, either.

@rahbari commented Sep 27, 2022

@ywave620 This is not a common thing, and I don't know exactly how it happens. As you can see in the attached picture, out of 467665 connections on a cluster worker, almost 2670 (0.5%) are not garbage collected even though they are destroyed and not writable, yet they still hold an output buffer. They are from several days ago, although I have set the http2 server timeout to 2 minutes. Most of them are from similar IPs, which makes me think there is a problem with their client or connection that causes Node not to garbage-collect their sessions and streams. This leads to a gradual increase in the Node.js RSS until it halts.

I keep a WeakRef to each stream, and each row is a valid one that has not been garbage collected yet:
[screenshot]
I tried to find an http2 debugger to mimic different connection behaviors, but I couldn't find one.
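In case it helps others do the same bookkeeping, a sketch of the WeakRef-style tracking described above (the names and the FinalizationRegistry part are my own additions, and `server` is an existing Http2SecureServer):

// Sketch: register each stream so V8 reports when (or whether) it is ever
// collected, and keep a WeakRef plus some metadata for later inspection.
const collected = new FinalizationRegistry((label) => console.log(`collected: ${label}`));

/** @type {Set<{ref: WeakRef<import('node:http2').ServerHttp2Stream>, url: string, time: number}>} */
const trackedStreams = new Set();

server.on('stream', (stream, headers) => {
  trackedStreams.add({ ref: new WeakRef(stream), url: headers[':path'], time: Date.now() });
  collected.register(stream, `${headers[':path']} @ ${new Date().toISOString()}`);
});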

@ovflowd (Member) commented Sep 27, 2022

Hmm, this is definitely intriguing. And if true, definitely worth investigation.

cc @nodejs/http2

@mcollina (Member)

Unfortunately, tracking this down without a reproduction and/or access to your production systems will be extremely hard.

@rahbari commented Sep 27, 2022

@mcollina Is there any h2 debugger that can simulate some of these situations:

  • disconnect before the stream can be read
  • send bad SSL data
  • disconnect before the response can be written
  • unnecessary CONTINUATION frames
  • request bodies (DATA frames)
  • send invalid frames

so I can test what causes this on our production server? I searched a lot but couldn't find one.

@clshortfuse (Contributor, Author) commented Sep 27, 2022

Reproducing is also a bit hard because almost all the tests I've built emit ECONNRESET, which already invalidates the test. The HTTP2 connections need to time out, but by default HTTP2 does not set a timeout. The difficulty lies in tracking down all the possible ways it can time out. I haven't considered bad packets, because I think those are handled.

I'm sharing my code because it's given me the best results. I do have a leak somewhere, but I no longer think it's from the listener. It's probably somewhere else in the code (I think my CDN proxy), or it's in my HTTP1 server.

  /** @return {Promise<import('node:http2').Http2SecureServer>} */
  startHttp2Server() {
    return new Promise((resolve, reject) => {
      this.createHttp2Server();
      this.http2Server.listen({
        port: this.securePort,
        host: this.secureHost,
      }, () => {
        this.http2Server.removeListener('error', reject);

        this.http2Server.setTimeout(120_000, (socket) => {
          if (!socket) {
            console.warn('HTTP2 socket (unknown) timed out.');
            return;
          }
          const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;
          console.warn(`HTTP2 socket ${identity} timed out.`);
          socket.destroy(new Error('SOCKET_TIMEOUT'));
        });

        // Error Handlers
        this.http2Server.addListener('error', (err) => {
          console.error('HTTP/2 server error', err);
        });
        this.http2Server.addListener('clientError', (err, socket) => {
          // No clear way to signal back to client why a socket has failed because
          // it's unsure if it's HTTP1 or HTTP2. Just destroy and log server-side.
          if (!socket.destroyed) socket.destroy(err);
          if (err?.code === 'ECONNRESET') {
            console.warn('HTTP/2 client connection reset.');
            return;
          }
          console.error('HTTP/2 client error', err);
        });
        this.http2Server.addListener('sessionError', (err, session) => {
          if (!session.destroyed) session.destroy(err);
          if (err?.code === 'ECONNRESET') {
            console.warn('HTTP/2 client connection reset.');
            return;
          }
          console.error('HTTP/2 sessionError error', err);
        });

        this.http2Server.addListener('session', (session) => {
          const identity = `${session.socket.remoteFamily}:${session.socket.remoteAddress}:${session.socket.remotePort}`;
          session.setTimeout(60_000, () => {
            console.warn(`HTTP/2 session ${identity} timed out.`);
            session.destroy(new Error('SESSION_TIMEOUT'));
          });
          const pingInterval = setInterval(() => {
            if (session.destroyed) {
              clearInterval(pingInterval);
            } else {
              session.ping((err) => {
                if (!err) return;
                if (session.destroyed) return;
                if (err.code === 'ERR_HTTP2_PING_CANCEL') return;
                console.error(`Ping to ${identity} failed.`, err);
              });
            }
          }, 15_000);
        });

        // Logic handlers
        this.http2Server.addListener('stream', (stream, headers) => {
          stream.addListener('error', (err) => {
            if (err?.code === 'ECONNRESET') {
              console.warn('HTTP/2 stream connection reset.', headers[':path']);
            } else {
              console.error('HTTP/2 stream error', headers, err);
            }
          });
          this.httpHandler.handleHttp2Stream(stream, headers).catch((err) => {
            console.error('HTTP2 handler failed.', err);
          });
        });
        this.http2Server.addListener('request', (req, res) => {
          if (req.httpVersionMajor >= 2) return;
          // @ts-ignore Ignore typings
          req.setTimeout(300_000, (socket) => {
            if (!socket) {
              console.warn('HTTP1 in HTTP2 request (unknown) timed out.');
              return;
            }
            const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;
            console.warn(`HTTP1 in HTTP2 request ${identity} timed out.`);
            socket.destroy(new Error('SOCKET_TIMEOUT'));
          });
          res.setTimeout(300_000, (socket) => {
            if (!socket) {
              console.warn('HTTP1 in HTTP2 response (unknown) timed out.');
              return;
            }
            const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;
            console.warn(`HTTP1 in HTTP2 response ${identity} timed out.`);
            socket.destroy(new Error('SOCKET_TIMEOUT'));
          });
          req.addListener('error', (err) => {
            if (err?.code === 'ECONNRESET') {
              console.warn('Request stream connection reset.', req.url);
            } else {
              console.error('Request stream error.', req.url, req.headers, err);
            }
          });
          res.addListener('error', (err) => {
            if (err?.code === 'ECONNRESET') {
              console.warn('Response stream connection reset.', req.url);
            } else {
              console.error('Response stream error', req.url, req.headers, err);
            }
          });
          this.httpHandler.handleHttp1Request(req, res).catch((err) => {
            console.error('HTTP1 in HTTP2 handler failed.', err);
          });
        });

        resolve(this.http2Server);
      });
      this.http2Server.addListener('error', reject);
    });
  }

I do have a way to pull a v8.getHeapSnapshot() on my server, and I'll double check later on to see if any TLSSocket are still leaking in production.

Edit: The most common output is the following three lines together:

HTTP2 socket undefined:undefined:undefined timed out.
HTTP/2 session IPv4:xxx.xxx.xxx.xxx:62938 timed out.
HTTP/2 sessionError error Error: SOCKET_TIMEOUT

I don't know why clients leave without a connection reset, but I can imagine that if you don't handle session timeouts, you will leak. 99% of my connections are mobile, and many are asset tracking (moving vehicles). Disconnecting on one IP and reconnecting on another because they switch cell towers is a likely explanation. The other is the use of WebSockets that go idle when users switch away from the app (PWAs).

We did see a higher number of connection resets with iOS 15 after sending 301/302; Safari doesn't leave gracefully. But that should be handled. I have to let it run longer to see if TLSSockets are still leaking. All servers are on v16.14+.

Edit 2: The reason I feel I have a leak somewhere is that my Active Handles count creeps up little by little:

[screenshot]

The red one is one client after 3 months (just rebooted it now).

@clshortfuse (Contributor, Author)

@rahbari Do you have a sessionError listener? You might need it.

If you're just tracking streams (essentially children of sessions), you may not be catching session-level events. Clients timing out would be a session issue. It's possible that the stream closes properly but the session hasn't yet (e.g. a session with 0 streams). You might have to force-destroy, which seems to be the main cleanup method in my code above.

I catch the sessionError (timeout) and destroy. That's why the log trace is in that order (destroy first, which bubbles, then log the timeout). I think streams don't time out on their own, so you have to go a level higher.

Also, not 100% sure, but if you never try to send anything, you'll never trigger a timeout. That means the session.ping() is necessary (forcing traffic).
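Condensed from the larger snippet I posted above, the session-level handling I'm describing boils down to roughly this (the 60s/15s values are just what I happen to use, and `server` is the Http2SecureServer):

// Sketch: minimum session-level handling — destroy on sessionError, give each
// session its own timeout, and ping periodically so dead connections surface.
server.addListener('sessionError', (err, session) => {
  if (!session.destroyed) session.destroy(err);
});
server.addListener('session', (session) => {
  session.setTimeout(60_000, () => session.destroy(new Error('SESSION_TIMEOUT')));
  const interval = setInterval(() => {
    if (session.destroyed) {
      clearInterval(interval);
      return;
    }
    session.ping((err) => {
      // ERR_HTTP2_PING_CANCEL just means the session went away mid-ping.
      if (err && err.code !== 'ERR_HTTP2_PING_CANCEL' && !session.destroyed) {
        session.destroy(err);
      }
    });
  }, 15_000);
});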

@mcollina (Member)

Maybe @jasnell could help

@rahbari commented Sep 27, 2022

@clshortfuse
I've set the timeout of the h2 server to 2 minutes:
setTimeout(120 * 1000, s => s.destroy())
I keep a WeakRef to all streams and assign each a WeakRef to its session (along with an index, URL, and time). The leaking streams have destroyed === true and their internal session field is undefined, but the WeakRef to the session (s) is still valid and shows that it is also destroyed; yet none of them are garbage collected.
[screenshot]
[screenshot]
I will add sessionError to see whether there is any error regarding those streams.
The exact same code, serving only http on another server, works without any problem.

@clshortfuse (Contributor, Author) commented Sep 27, 2022

@rahbari I see.

You're not considering the possibility that a session never gets as far as creating a stream. I believe Http2SecureServer.setTimeout() only applies while establishing a connection, not once it's made. Because a session can have multiple streams due to multiplexing, one stream ending doesn't mean the client is done. So there is an idle period that I'm not sure auto-destroys. It might stay in memory because its Http2Session holds it in memory, not your code. I don't believe an Http2Stream can dangle without an Http2Session, since it's a parent/child relationship. So, walking through pseudocode, I believe it works like this:

  1. HTTP2Server receives a TCP connection request (can timeout)
  2. Create HTTP2Session on Connection
  3. Listen for Request (can timeout) <=== HERE
  4. Create HTTP2Stream on Http2Session on Request
  5. Listen for Request Stream on HTTP2Stream (can timeout)
  6. Send Response Stream on HTTP2Stream (can timeout)
  7. End Http2Stream
  8. Go to 3 (continue waiting for new request)

I believe your code doesn't account for the possibility of a timeout between the time a client establishes a connection and a stream is received. Http1 doesn't have this intermediary session, so it goes Connect => Request, whereas HTTP2 is Connect => Session => Stream (Request). IIRC, Http1Server handles connections whose requests time out via HTTPServer.setTimeout, but Http2Server leaves that up to the session.

I'm not 100% sure adding this session timeout is required, but the authors would know. IIRC, it was mandatory and not well documented.

Edit: Sorry, I see that you do have session destroyed: true in the picture. I remember I had a leak somewhere where buffer remained on the stream and session. Because of the way I was piping data, I didn't properly destroy all references to the stream. I believe you can hold the entire chain (transform => stream => session) in limbo because it can't flush. The 'destroy' from the session wasn't bubbling back to the transform stream (e.g. gzip) that was responding to the stream. Using stream.pipeline solved that for me.
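A sketch of what that pipeline fix looked like (the gzip transform and the `getBodyStream` helper are illustrative, not my actual middleware; `server` is the Http2SecureServer):

// Sketch: wiring the response through stream.pipeline() so a destroyed
// Http2Stream tears down the gzip Transform (and its buffers) instead of
// leaving the chain in limbo. getBodyStream is a hypothetical helper that
// returns a Readable for the requested resource.
import { pipeline } from 'node:stream';
import zlib from 'node:zlib';

server.on('stream', (stream, headers) => {
  stream.respond({ ':status': 200, 'content-encoding': 'gzip' });
  pipeline(getBodyStream(headers[':path']), zlib.createGzip(), stream, (err) => {
    // Invoked on completion or on teardown from either end; nothing dangles.
    if (err && err.code !== 'ERR_STREAM_PREMATURE_CLOSE') {
      console.error('response pipeline failed', err);
    }
  });
});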

@rahbari commented Sep 27, 2022

@clshortfuse I've tested setTimeout previously with an EventSource and it works fine on an established connection. I think it is called when the socket itself has been idle for the defined time.
The problem is that even though these streams and their sessions are all destroyed, they are not garbage collected. This happens everywhere, not just for pipes, even for streams that simply call end('data');
I will set an error listener to investigate this more.

@clshortfuse (Contributor, Author)

Going back to the original comment: I no longer appear to have a TLSSocket leak. The measures I put in place appear to have plugged it.

[screenshot]

[screenshot]

I don't know if my Http2 setup is overkill, watching for everything, but it does indeed work.


@rahbari I used to use a similar setup for SSE (EventSource) over HTTP2 with a WeakRef, but I moved to just storing the req/res as an object and keeping them in a Set. I also use pipeline. I can share my production EventSource code in the hope that it helps track down your issue:

/** @type {Set<HttpTransaction>} */
export const eventStreamSet = new Set();

/** @type {MiddlewareFunction} */
async function onEventStream(transaction) {
  if (!isAuthenticated(transaction)) {
    return 401;
  }

  const { request, response, httpVersion, locals, socket } = transaction;

  const { headers, searchParams } = request;
  locals.eventChannels = [
    ...searchParams.getAll('channel'),
    ...(searchParams.get('channels')?.split(',') ?? [])]
    .map((type) => type.trim())
    .filter((t) => t);

  const identity = `${socket.remoteFamily}:${socket.remoteAddress}:${socket.remotePort}`;

  logger.log(`Event stream subscription via ${identity} from ${headers?.['user-agent'] ?? '(anonymous)'} for ${locals.eventChannels} by ${JSON.stringify(locals.auth)}`);

  response.status = 200;
  response.headers['content-type'] = 'text/event-stream';
  response.headers['cache-control'] = 'no-store';
  if (httpVersion?.charAt(0) === '1') {
    response.headers.connection = 'keep-alive';
  }

  let keepAliveInterval;
  try {
    await new Promise((resolve, reject) => {
      eventStreamSet.add(transaction);
      const stream = response.getPipeline((err) => {
        if (err) reject(err);
        else resolve();
      });
      stream.write(buildSSEPacket('ping', Date.now()));
      keepAliveInterval = setInterval(() => {
        try {
          if (!stream.writable) return;
          stream.write(buildSSEPacket('ping', Date.now()));
        } catch {}
      }, 15_000);
    });
    logger.log(`Event stream via ${identity} closed successfully.`);
  } catch (e) {
    switch (e?.code) {
      case 'ERR_STREAM_PREMATURE_CLOSE':
        logger.log(`Event stream via ${identity} closed prematurely.`);
        break;
      default:
        logger.log(`Event stream via ${identity} closed unexpectedly.`, e);
    }
  }
  clearInterval(keepAliveInterval);
  eventStreamSet.delete(transaction);
  return HttpHandler.END;
}

/**
 * @param {string} channel event channel
 * @param {string} type event type
 * @param {string|Object} data string or JSON.stringify(object)
 * @return {number}
 */
export function emitSSE(channel, type, data) {
  let count = 0;
  let output = null;
  for (const { response, locals } of eventStreamSet) {
    // console.log(res.locals.eventChannels, channel);
    if (!locals.eventChannels?.includes(channel)) continue;
    if (output == null) {
      output = buildSSEPacket(type, data);
    }
    // if (id) res.socket.write(`id: ${id}`);
    // if (retry != null) res.socket.write(`retry: ${retry.toString(10)}`);
    const sseStream = response.getPipeline();
    sseStream.write(output);
    count++;
  }
  // if (count) console.log('writing SSE', output, count);
  return count;
}

The web framework is custom, but close to fastify or koa. I do suggest you dump a snapshot to get a good look at what's still holding references. I can share my dump middleware:

/** @type {MiddlewareFunction} */
function onDumpHeapSnapshot(transaction) {
  if (!isAuthenticated(transaction)) return 401;
  if (!transaction.locals.auth.isAdmin) return 403;

  transaction.response.status = 200;

  transaction.response.headers['content-type'] = 'application/octet-stream';
  transaction.response.headers['cache-control'] = 'no-store';
  transaction.response.headers['content-disposition'] = `attachment; filename=${Date.now()}.heapsnapshot`;
  if (!transaction.response.stream.writable) throw new Error('NOT_WRITABLE');
  return getHeapSnapshot();
}

I will opt not to close this, because I'd be interested to know what the proper minimum error handling for H2 connections is. Hopefully the HTTP2 team can direct us.

@diervo commented Nov 15, 2022

For what it is worth in this discussion:

I'm using the latest node LTS at the moment (18.12.1) and the latest fastify (4.9.2).
I started getting all sorts of Error: read ECONNRESET once I enabled http2 + tls certs. Breaks both locally and in production.

It's tough to reproduce, and the stack trace only goes as far as node_internals, making it hard to debug.

I wonder if anyone has any direction on where to put the right handlers to at least avoid the crash (assuming some memory leaks might remain...)

@clshortfuse (Contributor, Author)

@diervo I've recently started using https://github.com/myndzi/wtfnode to trace possible leaks. In my case, with the HTTP code from above, I'm not leaking. The http2 ping interval at 15_000ms ensures an idle connection is dropped. It'll trigger a timeout error (or maybe connection reset).

fastify should be tracking ECONNRESET generally. But if you attach an upstream (pipe to res) or downstream (read from req), you always run the risk of that stream dropping because the client left. Then that error will bubble up to your code. Connection resets are normal, but not handling the error they spawn can be dangerous.
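A minimal sketch of how wtfnode can be wired in (the interval-based dump is an assumption, not necessarily how I run it):

// Sketch: periodically dump open handles (sockets, timers, servers) to see
// what is keeping the process alive between requests.
import wtf from 'wtfnode';

setInterval(() => wtf.dump(), 600_000).unref(); // log open handles every 10 minutes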

@mcollina (Member)

I'm using the latest node LTS at the moment (18.12.1) and the latest fastify (4.9.2).
I started getting all sorts of Error: read ECONNRESET once I enabled http2 + tls certs. Breaks both locally and in production.

Are those hard crashes or logs?

@diervo commented Nov 16, 2022

These are the logs right before a hard crash. I have fastify logging set to true and graceful-shutdown.

I'm unclear whether fastify is managing the error handler and rethrowing somewhere, or whether it's just node logging and crashing.

When running locally, node stops; in prod, it makes the k8s container restart.

Another weird observation: this only happens after some period of inactivity. I can let the server run for a while locally, and only after some long idle time will it crash.

@mcollina (Member)

If you can, open a bug in Fastify, ideally with a reproduction, as it is a bug there and not related to this issue: Fastify should handle that condition.

@clshortfuse (Contributor, Author)

After a bit more code cleanup, I added another line to my server logic:

this.http2Server.addListener('stream', (stream, headers) => {
  stream.setTimeout(300_000, () => {
    stream.destroy(new Error('SOCKET_TIMEOUT'));
  });
  /* snip */
});

Theoretically, a session may stay active and respond to pings, but a stream (e.g. a resource request) may have timed out. It appears as though Http2Session.timeout defaults to undefined. The TypeScript typings are out of date and say the prop doesn't exist, but it does.


Another note: using the VSCode debugger (or probably any debugger) won't allow garbage collection if you do console.log(stream). Custom loggers may hold references as well, so be careful. Right now I'm testing with --expose-gc, global.gc(), and Set&lt;WeakRef&lt;Http2Stream&gt;&gt; objects to see if references are being held. I had a client memory dump appear with many NgHttp2StreamWrite references. It might very well have been the lack of a stream timeout.
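The leak check itself is roughly this (run with node --expose-gc; the counting logic is a sketch of what I described, not verbatim production code, and `server` is the Http2SecureServer):

// Sketch: force a collection, then count how many tracked streams are still
// reachable. Anything that stays reachable long after its session closed is a
// candidate leak.
/** @type {Set<WeakRef<import('node:http2').Http2Stream>>} */
const streamRefs = new Set();
server.on('stream', (stream) => streamRefs.add(new WeakRef(stream)));

setInterval(() => {
  global.gc(); // only available with --expose-gc
  let reachable = 0;
  for (const ref of streamRefs) {
    if (ref.deref() === undefined) streamRefs.delete(ref);
    else reachable++;
  }
  console.log(`streams still reachable after gc(): ${reachable}`);
}, 60_000).unref();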

@clshortfuse (Contributor, Author) commented Nov 16, 2022

Well, I found my memory leak. It's socket.io:

[screenshot]

This is after about 90 minutes of running. Apparently the polling method doesn't get garbage collected. In the screenshot, streams that have been up since 2283ms after application load are still in memory, whereas other sessions and streams do eventually get garbage collected (e.g. 90 minutes after application load).

It's only the socket.io streams that appear orphaned. All streams in my handler have a matching session.

Generally, if a stream is "stuck" while destroyed, it'll get cleaned up when the session gets destroyed. I checked the Node.js code and find it strange that it doesn't garbage-collect the last stream on a session after that stream gets destroyed, but knowing that cleanup will happen when the session ends is fine, I guess. Maybe there's a leak here? A stream shouldn't have to wait for its parent session to be destroyed before garbage collection. I color-coded some entries, and at least there are no orphaned streams with my handler.

For now, I'll work on plugging the socket.io leak and then collect more data.

@ovflowd (Member) commented Feb 27, 2023

If that's the case, it makes sense to report this issue to the socket.io repository.

danfuzz added a commit to danfuzz/lactoserv that referenced this issue Mar 20, 2023
Tweaked timeouts for HTTP2 connections, in the hopes of working around
an apparent HTTP2-related memory leak in the Node core library. This is
[issue #42710](nodejs/node#42710) in the Node
repo.
@danfuzz commented Mar 20, 2023

BTW, while investigating what (I think) turned out to be this issue in my system, I noticed an inconsistency between the RSS reported by GC tracing and by process.memoryUsage() (at least on Node 18.x, which is what I'm currently running in production), with the latter seeming to under-report the amount of memory in use.
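The comparison was along these lines (a sketch of the methodology, not my exact code): run the process with --trace-gc and log process.memoryUsage() beside it, then compare the two RSS figures.

// Sketch: periodically log process.memoryUsage() while the process runs with
// --trace-gc, so the RSS figure here can be compared against the RSS printed
// in the GC trace lines.
setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  console.log(JSON.stringify({ rss, heapTotal, heapUsed, external }));
}, 60_000).unref();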

@bnoordhuis (Member)

In light of the previous comments it sounds like this wasn't an issue with node after all? I'll go ahead and close this but let me know if there is reason to reopen...

...but, future commenters, file new issues if your bug only has the general shape of what's being discussed here. Exact matches only.

bnoordhuis closed this as not planned on Mar 21, 2023
@ywave620 (Contributor)

For people who end up here: you can try the fix in #55966.
