
Exceptions with stacktraces kept alive for long cause pools not to be destroyed #3082

Open · joaoe opened this issue Jun 29, 2023 · 0 comments

joaoe commented Jun 29, 2023

Subject

Hi.
This is a tricky subject to understand, but it is a regression introduced by commit a80c248, present in urllib3>=1.26.16.

That commit makes the PoolManager clear its state (namely its connections) only when the garbage collector gets around to collecting it. However, there is a pretty simple scenario where references to the PoolManager are kept alive too long, and code that used to reliably close all connections and clear all state no longer works.
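To illustrate the difference (a simplified model of GC-driven cleanup, not urllib3's actual code): cleanup registered to run at collection time only fires once the object becomes unreachable, so any lingering reference silently postpones it.

import gc
import weakref

class Resource:  # stand-in for a connection pool
    pass

res = Resource()
# Cleanup registered to run at collection time, instead of an explicit close():
weakref.finalize(res, print, "cleanup ran")

lingering_ref = res  # e.g. a reference kept alive by a saved traceback
del res
gc.collect()  # prints nothing: `lingering_ref` still keeps the object alive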

I stumbled upon this with a test file that exercises error handling using the requests library, which internally uses urllib3.
Basically, I trigger errors inside the urllib3 code with a local mock HTTP server that misbehaves.
The urllib3 code then gets a RemoteDisconnected() exception raised from http/client.py, line 287, in _read_status.
The exception carries its traceback, and with it the outer stack frames; the traceback keeps every object in the entire call chain pinned in memory.
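A minimal standalone demonstration of that pinning effect (not urllib3 code; names are illustrative):

class Pool:
    pass

def handler():
    pool = Pool()  # a local in this frame, like `self` in urlopen()
    raise RuntimeError("boom")

try:
    handler()
except RuntimeError as exc:
    saved = exc  # keep the exception around, e.g. inside a Retry object

# saved.__traceback__ still references handler()'s frame, and that frame's
# locals still hold `pool`, so the pool cannot be garbage collected even
# though handler() has long returned.
frame = saved.__traceback__.tb_next.tb_frame
print(frame.f_locals["pool"])  # the Pool instance is still reachable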

This exception is then saved inside a Retry object in connectionpool.py, here:

            retries = retries.increment(
                method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
            )

Another location is in urllib3/util/retry.py, where a new Retry object is created with the history of failed requests, which also saves the exceptions:

        new_retry = self.new(
            [...]
            history=history,
        )

Basically, any exception raised during a request triggers this problem, as the sketch below demonstrates.
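A hedged sketch of that mechanism against urllib3's public Retry API (the ProtocolError here is simulated; values are illustrative):

import sys

from urllib3.exceptions import ProtocolError
from urllib3.util.retry import Retry

retry = Retry(total=2)
try:
    raise ProtocolError("Connection aborted.")
except ProtocolError as err:
    # increment() appends a RequestHistory entry whose `error` field keeps
    # the exception object alive, traceback and all.
    retry = retry.increment(
        method="GET", url="/", error=err, _stacktrace=sys.exc_info()[2]
    )

entry = retry.history[-1]
assert entry.error is not None
assert entry.error.__traceback__ is not None  # the whole call chain stays pinned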

To put the cherry on top, these retries are logged. When using pytest (or another testing framework), all log messages (LogRecord objects) are happily collected for reporting after the test finishes, and those records keep the Retry objects alive inside LogRecord.args. So during a unit test, nothing gets garbage collected.
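For example (the handler below is a hypothetical stand-in for pytest's log capture; the quoted log call is roughly what urllib3 emits on a broken connection):

import logging

captured = []

class KeepRecords(logging.Handler):
    def emit(self, record):
        captured.append(record)  # pytest keeps LogRecords until reporting

logging.getLogger("urllib3").addHandler(KeepRecords())

# urllib3 logs roughly:
#     log.warning("Retrying (%r) after connection broken by '%r': %s",
#                 retries, err, url)
# so every captured record's .args references the Retry object and the
# exception, which in turn pin tracebacks, frames, and the pool itself.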

Actual Behavior

Below is a smaller version of my test, which gets stuck indefinitely. It hangs in the HTTPServer.shutdown() call, which waits for the serve_forever() loop to quit and set a threading.Event; that loop never exits because the socket and connection are still open, since the session and connections have not been garbage collected.

Expected Behavior

Although exceptions may be saved, their tracebacks should be cleared. At every location where an exception is saved as a property of some object, the stacktrace should be dropped with ex.with_traceback(None), as sketched below.
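A minimal sketch of that pattern (illustrative, not a patch; risky_call is a stand-in for the failing request):

def risky_call():
    raise OSError("connection broken")

saved_error = None
try:
    risky_call()
except Exception as exc:
    # Keep the exception for later reporting, but drop the traceback so the
    # frames in the call chain (and their locals) become collectable.
    saved_error = exc.with_traceback(None)

assert saved_error.__traceback__ is None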

Steps to Reproduce

This test hangs.

import logging
import threading

import pytest
import requests.adapters

logger = logging.getLogger(__name__)


class HttpServerControl:
    force_error: str = ""
    base_url: str = ""
    server_handled_request: threading.Event


@pytest.fixture(scope="module")
def http_server() -> HttpServerControl:
    from http.server import BaseHTTPRequestHandler, HTTPServer, HTTPStatus, _get_best_family

    # The code in this function was adapted from http.server.test()

    protocol = "HTTP/1.1"  # HTTP 1.1 supports automatic Keep-Alive
    port = 0  # Use port 0 so a free random port is allocated
    bind = "127.0.0.1"  # Bind to localhost so nothing else can connect.

    class LocalHttpServer(HTTPServer):
        thread = None
        socket = None

        def shutdown_request(self, request):
            super().shutdown_request(request)
            server_ctl.server_handled_request.set()

        def shutdown(self, *_):
            logger.info("HTTPServer shutdown")
            if self.thread:
                super().shutdown()
                self.thread.join()
            self.socket = self.thread = None

        def start(self):
            assert not self.thread
            self.thread = threading.Thread(target=self.serve_forever)
            self.thread.start()

        def __enter__(self):
            return self

        __del__ = __exit__ = shutdown

    class RequestHandler(BaseHTTPRequestHandler):
        protocol_version = protocol

        def do_GET(self):
            self.log_message("GET %s HTTP/%s", self.path, self.protocol_version)
            force_error, server_ctl.force_error = server_ctl.force_error, ""
            if force_error:
                self.close_connection = True
                self.log_message("Mark close connection: forced %s [%s]", force_error, self.path)
                return

            contents = b"{}"
            try:
                self.log_message("GET OK %s [start response]", self.path)
                self.send_response(HTTPStatus.OK)
                self.send_header("Content-type", "application/json")
                self.send_header("Content-Length", str(len(contents)))
                self.end_headers()
                self.wfile.write(contents)
                self.log_message("GET OK %s [end response]", self.path)

            except OSError as e:  # Host stops reading too early.
                self.log_message("GET OK %s [failed response]\n%s", self.path, str(e))

        def log_message(self, msg, *args):
            logger.info(f"[http.server] {msg}", *args)

        def log_error(self, msg, *args):
            logger.error(f"[http.server] {msg}", *args)

    server_ctl = HttpServerControl()
    server_ctl.server_handled_request = threading.Event()

    LocalHttpServer.address_family, addr = _get_best_family(bind, port)
    with LocalHttpServer(addr, RequestHandler) as httpd:
        host, port = httpd.socket.getsockname()[:2]
        url_host = f"[{host}]" if ":" in host else host
        server_ctl.base_url = f"http://{url_host}:{port}/"
        logger.info("HTTPServer at %s", server_ctl.base_url)
        httpd.start()

        yield server_ctl


def http_get(http_server, session, param):
    http_server.server_handled_request.clear()

    logger.info("[client] GET %s/param?%s", http_server.base_url, param)
    with session.get(http_server.base_url, params=dict(param=param)) as response:
        response.raise_for_status()
        assert len(response.text) == 2  # read everything
        logger.info("[client] %s %r", response.status_code, response.text)

    http_server.server_handled_request.wait()


def test_http_connection_errors(http_server: HttpServerControl, monkeypatch):
    logger_urllib = logging.getLogger("urllib3")
    monkeypatch.setattr(logger_urllib, "level", logging.DEBUG)

    http_server.force_error = "abort_response"

    session = requests.Session()
    max_retries = requests.adapters.Retry(total=2, connect=1, read=1)
    adapter = requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1, max_retries=max_retries)
    session.mount("http://", adapter)

    with session:
        http_get(http_server, session, "A")

Environment


OS Windows-10-10.0.19044-SP0
Python 3.10.11
OpenSSL 1.1.1t  7 Feb 2023
urllib3 2.0.3
joaoe changed the title from "Cyclic references with exceptions cause pools not to be destroyed" to "Exceptions with stacktraces kept alive for long cause pools not to be destroyed" on Jun 30, 2023