Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support socketio auth option in connect handler #1497

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

nethi
Copy link

@nethi nethi commented Oct 31, 2024

socketio by default uses ["polling", "websocket"] as transports in that order. When we have multiple instances of chainlit server with a load balancer in the front, this creates a problem. The service that responds with sessionId is not necessarily the same instance when subsequent request is made. While there are other complex strategies such as an external cache (redis etc), given websocket is supported broadly, a simple strategy is to use just "websocket" transport alone.

However, passing Authorization header based token mechanism doesn't work with browser based socketio clients, as described here, and here

const socket = io(uri, {

      const socket = io(uri, {
        path,
        extraHeaders: {
          Authorization: accessToken || '',
          'X-Chainlit-Client-Type': client.type,
          'X-Chainlit-Session-Id': sessionId,
          'X-Chainlit-Thread-Id': idToResume || '',
          'user-env': JSON.stringify(userEnv),
          'X-Chainlit-Chat-Profile': chatProfile
            ? encodeURIComponent(chatProfile)
            : ''
        }
      });

Recommended solution with socketio is to use "auth" option

const socket = io(server, {
    transports: ['websocket', 'polling', 'flashsocket'],
    auth: {
        token: accessToken
    }
});

However, doing this in a custom client, throws following error in chainlit backend.

Traceback (most recent call last):
  File "/test/.pyenv/versions/generic-agent/lib/python3.10/site-packages/engineio/async_server.py", line 494, in run_async_handler
    return await self.handlers[event](*args)
  File "/test/pyenv/versions/generic-agent/lib/python3.10/site-packages/socketio/async_server.py", line 672, in _handle_eio_message
    await self._handle_connect(eio_sid, pkt.namespace, pkt.data)
  File "/test/.pyenv/versions/generic-agent/lib/python3.10/site-packages/socketio/async_server.py", line 551, in _handle_connect
    success = await self._trigger_event(
  File "/test/.pyenv/versions/generic-agent/lib/python3.10/site-packages/socketio/async_server.py", line 637, in _trigger_event
    ret = await handler(*args)
TypeError: connect() takes 2 positional arguments but 3 were given
ERROR:engineio.server:message async handler error
Traceback (most recent call last):
  File "/test/.pyenv/versions/generic-agent/lib/python3.10/site-packages/engineio/async_server.py", line 494, in run_async_handler
    return await self.handlers[event](*args)
  File "/test/.pyenv/versions/generic-agent/lib/python3.10/site-packages/socketio/async_server.py", line 672, in _handle_eio_message
    await self._handle_connect(eio_sid, pkt.namespace, pkt.data)
  File "/test/.pyenv/versions/generic-agent/lib/python3.10/site-packages/socketio/async_server.py", line 551, in _handle_connect
    success = await self._trigger_event(
  File "/test/.pyenv/versions/generic-agent/lib/python3.10/site-packages/socketio/async_server.py", line 637, in _trigger_event
    ret = await handler(*args)
TypeError: connect() takes 2 positional arguments but 3 were given

Basically, if auth option is enabled in client, python_socketio calls "connect" callback with a third argument "auth". But chainlit callback handler doesn't define a third argument.

async def connect(sid, environ):

@sio.on("connect")
async def connect(sid, environ):
    if (
        not config.code.on_chat_start
        and not config.code.on_message
        and not config.code.on_audio_chunk
    ):
        logger.warning(
            "You need to configure at least one of on_chat_start, on_message or on_audio_chunk callback"
        )
        return False
    user = None
    token = None
    login_required = require_login()

A simple solution is to add the third argument, auth, with default value of None and use auth["token"] if HTTP_AUTHORIZTION is not present. This is exactly done by this PR and works great.

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. auth Pertaining to authentication. backend Pertains to the Python backend. labels Oct 31, 2024
@nethi nethi changed the title support socketio auth option support socketio auth option in connect handler Oct 31, 2024
@nethi
Copy link
Author

nethi commented Oct 31, 2024

Here is a test html I used to test with a stand-alone browser based client.
index.html.zip

@dokterbob
Copy link
Collaborator

Thanks for the contrib @nethi. We're looking into adding HTTP only cookie support, and making that the default auth for the frontend. That would significantly improve security and potentially resolve other issues as well.

Might it work for your use case, too?

@nethi
Copy link
Author

nethi commented Oct 31, 2024

Thanks for the contrib @nethi. We're looking into adding HTTP only cookie support, and making that the default auth for the frontend. That would significantly improve security and potentially resolve other issues as well.

Might it work for your use case, too?

Can you share some details as to how that would look ? Assume that would work with websockets too - haven't tried.

@dokterbob
Copy link
Collaborator

I don't know about socketio specifically, but in general on the FastAPI side it seems cookie-based websocket auth is a pretty standard thing: https://fastapi.tiangolo.com/advanced/websockets/#using-depends-and-others

AFAIK WebSockets start out with a normal HTTP request.

The advantage of HTTP only cookies is that JS never has access to the auth token, so XSS attacks which are normally a pain to deal with disappear entirely (though CSRF attacks require proper config server-side).

@sandangel
Copy link
Contributor

Is it possible to use Server Send Event and remove socket.io? So we can have simpler architecture and more stable, scalable with stateless servers.

@dokterbob
Copy link
Collaborator

chainlit/libs/react-client/src/useChatSession.ts

No, we're not gonna do that. SSE's are, like you said, server-sent events. Websockets are bidirectional. The connection persistence are what allows a smooth UX, the socketio bus is basically the core of our functionality.

@dokterbob
Copy link
Collaborator

@nethi I'm looking to migrate to httponly based cookies as the default frontend -> backend token bearer over the next 2 weeks or so. This removes XSS attack vectors and will allow us to properly secure files, which are currently served, effectively, unauthenticated.

That having been said, your patch looks like a good solution and perhaps we would like to have it in as a fixup (also for the 1.3 branch, which might not get the cookie-based auth).

In the description, you're suggesting patching useChatSession. I know for a fact there's others in your situation, so I think this change should be part of this fix. That will also enable E2E tests for it.

So, could you please add your suggested changes to useChatSession to the PR? :)

@dokterbob dokterbob added the blocked Awaiting update or feedback from user after initial review/comments. label Nov 6, 2024
@nethi
Copy link
Author

nethi commented Nov 6, 2024

@dokterbob It turns out that for websocket only transport to work with chainlit, needs more than this fix. Initially, I thought Authorization header is the only problem so sending it as Auth option is a good workaround. But today I have been debugging this for few hours - it seems any header in extraHeaders will not come through websocket transport for browser based clients as browser websocket APi doesn't have any APIs to send these headers.
https://socket.io/docs/v4/client-options/#extraheaders

In addition to Authorization header, Chainlit depends on few X-Chainlit-* headers to be sent - key among them is the session id. I am thinking of using query parameters instead and ofcourse that means, more custom changes to chainlit code :-(

In summary, you mayn't want "Auth" option to be passed by default in useChatSession.ts as it mayn't work for all. Probably, leave it as a cookbook.

Given all this, I think Cookie approach looks to be a good option. Please consider extending that to X-Chainlit-* headers as well. Otherwise, it will be an incomplete solution.

Being able to deploy websocket only transport is key to run in lot of multi-replica environments (K8s, reverse proxy, load balancers etc).

@dokterbob
Copy link
Collaborator

@nethi How about #1339?

@nethi
Copy link
Author

nethi commented Nov 7, 2024

@dokterbob #1339? only seems to enable sending cookies and other CORS related headers to the backend. By itself, this won't address the issues I mentioned. Btw, I would be looking at these fixes for my copilot enablement in the next week or two.

I have been able to get everything working end to end with these additional changes - Here, I am passing additional headers as query parameters. Cookies could be another way to pass these chainlit headers in websocket only mode.

This is example working code and not PR

Test Client

    accessToken = "Bearer eyJhbXXX-XXXX";
    const socket = io('http://localhost:8080', {
      transports: ['websocket'],
      query: {
          'X-Chainlit-Client-Type': "browser",
          'X-Chainlit-Session-Id': sessionId,
          'X-Chainlit-Thread-Id': '',
          'X-Chainlit-Chat-Profile': "profile1",
      },
      path: '/api/generic-agent/agent1/ws/socket.io',
      auth: {
        token: accessToken
      },
      extraHeaders: {
          Authorization:  accessToken,
          'X-Chainlit-Client-Type': "browser",
          'X-Chainlit-Session-Id': sessionId,
          'X-Chainlit-Thread-Id': '',
          'X-Chainlit-Chat-Profile': "profile1",
        } 
    });

To have headers retrieved either form Query or from headers, I made these changes locally in chainlit backend

socket.py

def get_ws_header_parameter(environ, header_name):
    from urllib.parse import parse_qs
    query_string = environ.get('QUERY_STRING', '')
    query_params = parse_qs(query_string)
    value =  query_params.get(header_name, [None])[0]
    if value is None:
        value = environ.get(f"HTTP_{header_name.replace('-', '_').upper()}")

    return value

@sio.on("connect")
async def connect(sid, environ, auth=None):
....
....
    session_id =  get_ws_header_parameter(environ, "X-Chainlit-Session-Id")

    if restore_existing_session(sid, session_id, emit_fn, emit_call_fn):
        return True

    user_env_string = get_ws_header_parameter(environ, "user-env")
    user_env = load_user_env(user_env_string)

    client_type = get_ws_header_parameter(environ, "X-Chainlit-Client-Type")
    http_referer = environ.get("HTTP_REFERER")
    url_encoded_chat_profile = get_ws_header_parameter(environ, "X-Chainlit-Chat-Profile") 
    chat_profile = (
        unquote(url_encoded_chat_profile) if url_encoded_chat_profile else None
    )

    WebsocketSession(
        id=session_id,
        socket_id=sid,
        emit=emit_fn,
        emit_call=emit_call_fn,
        client_type=client_type,
        user_env=user_env,
        user=user,
        token=token,
        chat_profile=chat_profile,
        thread_id= get_ws_header_parameter(environ, "X-Chainlit-Thread-Id"),
        languages=environ.get("HTTP_ACCEPT_LANGUAGE"),
        http_referer=http_referer,
        http_path=environ.get("PATH_INFO"),
    )

    trace_event("connection_successful")
    return True

@dokterbob
Copy link
Collaborator

Thanks for the update!

@dokterbob #1339? only seems to enable sending cookies and other CORS related headers to the backend. By itself, this won't address the issues I mentioned. Btw, I would be looking at these fixes for my copilot enablement in the next week or two.

There's a LOT going on in the code and the community -- so I'm not always able to go fully in-depth on every issue. But other community members involved in the same part can! I noticed that the issues which both of you seem to be experiencing are related to load balancing and websockets. So I figured they might be related and, as such, you talking directly to the other person might be conducive to solving these issues, regardless of whether they're the same..

I have been able to get everything working end to end with these additional changes - Here, I am passing additional headers as query parameters. Cookies could be another way to pass these chainlit headers in websocket only mode.

At one point, hopefully, we'll thoroughly review the full API. There's currently some parts stateful (e.g. with context and/or session) and some parts stateless, then there's also the authentication layer. I will start going deep into this over the coming 2/3 weeks, in part to improve Chainlit's security.

This is example working code and not PR

Any idea why?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auth Pertaining to authentication. backend Pertains to the Python backend. blocked Awaiting update or feedback from user after initial review/comments. size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants