Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncio.sslproto._SSLProtocolTransport can experience invalid state, leading to silent failures. #118950

Open
cjavad opened this issue May 11, 2024 · 3 comments
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@cjavad
Copy link

cjavad commented May 11, 2024

Bug report

Bug description:

TL;DR

_SSLProtocolTransport.is_closing should match its inner _SelectorTransport.is_closing, indicating to the user that the transport is actually closed instead of silently logging an error.

Description

I've been using the aio-libs library aiohttp in production together with its WebSocket client implementation, and found an interesting issue that sometimes occured on certain devices (Specifically M-series macbooks).

The logs i've been seeing looks something like this:

( Indication that we are sending messages over websocket successfully )
...
[2024-05-07 09:34:13,403] WARNING asyncio.write: socket.send() raised exception.
...
( Sucessfully sent a message over websocket ) 
...
[2024-05-07 09:34:13,553] WARNING asyncio.write: socket.send() raised exception.

( No more indication that we're sending or recieving messages over websocket )

Digging deeper the issue occurs when the connection has been lost due to an exception when invoking socket.send, this normally will result in the Transports is_closing() function returning True.

The issue occurs when using TLS, which now uses the transport _SSLProtocolTransport which implements its own is_closing logic.

When _SocketSelectorTransport.write gets an OSError such as Broken Pipe (which is the issue i've experienced in the wild) it sets its inner transport state as closed but when a library such as aiohttp checks its transport is_closing it returns False leading to it silently assuming that it is still connected.

I've been able to recreate the flow by raising a different exception (by manually closing the socket) but the error source and flow is the same in both cases as far as i can tell.

Full example (out of the box + SSL cert generation)

import asyncio
import contextlib
import logging
import socket
import ssl
import subprocess
import tempfile


@contextlib.contextmanager
def server_ssl_context(host):
    with tempfile.NamedTemporaryFile() as keyfile, tempfile.NamedTemporaryFile() as certfile:
        subprocess.run([
            'openssl', 'req', '-new', '-newkey', 'ec', '-pkeyopt', 'ec_paramgen_curve:prime256v1',
            '-keyout', keyfile.name, '-nodes', '-x509', '-days', '365', '-subj', f'/CN={host}',
            '-out', certfile.name,
        ], check=True, shell=False)

        context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
        context.load_cert_chain(certfile.name, keyfile.name)

        yield context


@contextlib.contextmanager
def client_ssl_context():
    try:
        context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

        yield context
    finally:
        pass


async def client_handle(reader, writer):
    ...


async def main(host, port):
    with server_ssl_context(host) as server_context, client_ssl_context() as client_context:
        await asyncio.start_server(client_handle, host, port, ssl=server_context)
        reader, writer = await asyncio.open_connection(host, port, ssl=client_context)

        transport = writer._transport

        from asyncio.sslproto import _SSLProtocolTransport
        assert isinstance(transport, _SSLProtocolTransport)

        inner_transport = transport._ssl_protocol._transport

        from asyncio.selector_events import _SelectorSocketTransport
        assert isinstance(inner_transport, _SelectorSocketTransport)

        sock: socket.socket = inner_transport._sock
        assert isinstance(sock, socket.socket)

        # Simulate a broken pipe, this invokes the OS error "Bad file descriptor"
        # but triggers the same flow as "Broken Pipe"
        sock.close()

        # Invoke write so socket returns an OSError
        print('[Client] Sending x6: %r' % 'Hello, world!')

        # Increment _conn_lost to more than 5 to trigger the logging.
        # This silently fails, but the logging is triggered.
        for i in range(6):
            writer.write('Hello, world!'.encode())
            await writer.drain()

        print(f"{inner_transport._conn_lost=}, {transport.is_closing()=}, {inner_transport.is_closing()=}")


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    asyncio.run(main('localhost', 8443), debug=True)

CPython versions tested on:

3.12

Operating systems tested on:

Linux, macOS

Linked PRs

@cjavad cjavad added the type-bug An unexpected behavior, bug, or error label May 11, 2024
@cjavad
Copy link
Author

cjavad commented May 11, 2024

The current "mitigation" i've implemented looks something like this:

  def _ensure_internal_ssl_proto_state(self) -> bool:
    valid = True

    try:
        if not self.socket or not self.socket._writer:
            return valid

        transport = self.socket._writer.transport
        inner_transport = transport._ssl_protocol._transport

        from asyncio.sslproto import _SSLProtocolTransport, _SSLProtocol
        from asyncio.selector_events import _SelectorTransport

        assert isinstance(transport, _SSLProtocolTransport)
        assert isinstance(inner_transport, _SelectorTransport)

        if inner_transport.is_closing() and not transport.is_closing():
            self.logger.warning(
                "The internal SSL protocol state is invalid, manually marking top level SSL Transport as closed.")
            valid = False
            self.socket._closed = True
            transport._force_close(None)

    except (ImportError, AssertionError, AttributeError):
        pass
    except Exception as e:
        self.logger.warning("An exception occurred while ensuring the internal SSL protocol state", exc_info=e)
    finally:
        return valid

I am not sure what is intended when handling the error, it seems like it should propegate the issue with the connection_lost callback on the protocol, but that seems to not be the SSLProtocol class, and how the SSL transport should deal with a broken pipe for instance is beyond me at this time, hence the issue and not a pull request.

@cjavad cjavad changed the title asyncio.sslproto._SSLProtocolTransport can experience invalid state, leading to silent failures. asyncio.sslproto._SSLProtocolTransport can experience invalid state, leading to silent failures. May 11, 2024
@cjavad cjavad changed the title asyncio.sslproto._SSLProtocolTransport can experience invalid state, leading to silent failures. asyncio.sslproto._SSLProtocolTransport can experience invalid state, leading to silent failures. May 11, 2024
@cjavad cjavad changed the title asyncio.sslproto._SSLProtocolTransport can experience invalid state, leading to silent failures. asyncio.sslproto._SSLProtocolTransport can experience invalid state, leading to silent failures. May 11, 2024
@gvanrossum
Copy link
Member

Hi Javad, thanks for the detailed bug report, and especially for providing a stand-alone repro. Unfortunately SSL support in asyncio is an esoteric area where I don't seem to have much luck finding experts. So it may be some time before we have a fix. If you come up with a PR, don't be shy, and ping me directly on the PR.

@cjavad
Copy link
Author

cjavad commented May 12, 2024

After some further investigation the code is technically sane, the connection_lost does end up setting the top level transport as closed when the next async context switch happens and it is allowed to run the connection_lost callback, simply adding a await asyncio.sleep(0) before my last prints resolves the issue. Something in the end library application code must be interfering with this, or some other condition is required to be met preventing the connection_lost function from working as intended.

Here is another example, using the aiohttp library directly, the difference maker between the issue here and it working as intended is the await asyncio.sleep(0)

import asyncio

import aiohttp
from aiohttp import web

from issue_118950 import server_ssl_context, client_ssl_context


async def websocket_handler(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)

    async for msg in ws:
        print("Message received from client:", msg.data)

    return ws


async def server(host, port):
    with server_ssl_context(host) as context:
        app = web.Application()
        app.add_routes([web.get('/', websocket_handler)])
        runner = web.AppRunner(app)
        await runner.setup()
        site = web.TCPSite(runner, host, port, ssl_context=context)
        await site.start()


async def client(host, port):
    await asyncio.sleep(1)  # Wait for server to start

    with client_ssl_context() as context:
        session = aiohttp.ClientSession()
        async with session.ws_connect(f'wss://{host}:{port}', ssl_context=context) as ws:
            while True:
                await ws.send_str("Hello, server!")

                # This line fixes the issue.
                # await asyncio.sleep(0)

                try:
                    ws._writer.transport._ssl_protocol._transport._sock.close()  # Simulate a broken pipe
                except AttributeError:
                    ...


async def main(conn=('localhost', 8765)):
    await asyncio.gather(server(*conn), client(*conn))


if __name__ == '__main__':
    asyncio.run(main())

I can see a similar approach is used in asyncio/streams.py with a comment:

            # Wait for protocol.connection_lost() call
            # Raise connection closing error if any,
            # ConnectionResetError otherwise
            # Yield to the event loop so connection_lost() may be
            # called.  Without this, _drain_helper() would return
            # immediately, and code that calls
            #     write(...); await drain()
            # in a loop would never call connection_lost(), so it
            # would not see an error when the socket is closed.
            await sleep(0)

So something similar to this might be required. This logic is only triggered when the top transport is is_closing so it would require it to already be marked as closed before it would wait and run the function that sets it to close. So either it would have to respect the inner transport in the SSL Transport, or this logic should also exist another place.

cjavad added a commit to cjavad/cpython that referenced this issue May 12, 2024
…ocol internal transport closing state so that StreamWriter.drain will invoke sleep(0) which calls connection_lost and correctly notifies waiters of connection lost. (python#118950)
cjavad added a commit to cjavad/cpython that referenced this issue May 12, 2024
…ocol internal transport closing state so that StreamWriter.drain will invoke sleep(0) which calls connection_lost and correctly notifies waiters of connection lost. (python#118950)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

3 participants