
reduce bifurcation of library internals by using low-level API in high-level API #1723

Open
cpu opened this issue Jan 4, 2024 · 8 comments

Comments

@cpu
Member

cpu commented Jan 4, 2024

Is your feature request related to a problem? Please describe.

The gradual introduction of a lower-level API that grants callers more control over allocation has somewhat bifurcated the internal structure of Rustls. This can be problematic for test coverage and maintenance: e.g. most of our existing test coverage exercises the higher-level API.

Describe the solution you'd like

We should make a concerted effort to implement the higher-level API on top of the lower-level API, reducing the bifurcation and ensuring that our existing test coverage also applies to the lower-level API without needing to duplicate test cases.

cpu self-assigned this Jan 23, 2024
@cpu
Member Author

cpu commented Jan 25, 2024

I've spent some time thinking about this. Here's my thought process so far. @ctz, @djc does this match your expectations?

It seems like ClientConnection and ServerConnection should have their ConnectionCommon<Data> member replaced with an adapter layer that holds an UnbufferedConnectionCommon<Data> and additional buffers (ChunkVecBuffer instances?) that can augment the UnbufferedConnectionCommon state for use when handling each UnbufferedStatus we encounter while processing.
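To make that concrete, here's a rough sketch of the kind of adapter I have in mind, using plain Vec<u8> stand-ins where the real thing would presumably use ChunkVecBuffer; none of the names below are actual rustls internals:

```rust
// Illustrative placeholders only, not the real rustls types.
struct UnbufferedCore; // stands in for UnbufferedConnectionCommon<Data>

/// Hypothetical adapter holding the unbuffered core plus the buffers
/// the buffered API needs to keep between calls.
#[allow(dead_code)]
struct BufferedAdapter {
    core: UnbufferedCore,
    /// TLS bytes accepted by read_tls but not yet processed.
    incoming_tls: Vec<u8>,
    /// TLS bytes produced while handling UnbufferedStatus events,
    /// drained by write_tls.
    outgoing_tls: Vec<u8>,
    /// Decrypted application data, drained via reader().
    received_plaintext: Vec<u8>,
}
```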

I'm imagining that the loop over UnbufferedConnectionCommon<Data>'s process_tls_records_common() results (e.g. this code in the client example) needs to be done in the adapter layer's process_new_packets() fn, but with a single call to process_tls_records_common() rather than a while open_connection { ... } loop. Each call would handle one UnbufferedStatus state, discarding from the adapter's internal buffer at the end.

I think the reader, writer, read_tls, write_tls, and set_buffer_limit fns would be implemented on top of the adapter's internal buffers. Since complete_io is already implemented in terms of the other fns, I think it would be unchanged. Similarly, I think the export_keying_material and dangerous_extract_secrets fns are unchanged.

I think the majority of the work would be in adapting process_new_packets to handle each of the UnbufferedStatus states. I need some assistance figuring out how to map the states to actions on the connection/buffers. I have some initial thoughts but I'll hold off until the general idea of trying to handle one UnbufferedStatus per call to process_new_packets is confirmed as sensible - it might be that I'm jammed up here because it's not the right approach.

WDYT?

@djc
Member

djc commented Jan 26, 2024

It seems like ClientConnection and ServerConnection should have their ConnectionCommon<Data> member replaced with an adapter layer that holds an UnbufferedConnectionCommon<Data> and additional buffers (ChunkVecBuffer instances?) that can augment the UnbufferedConnectionCommon state for use when handling each UnbufferedStatus we encounter while processing.

I think my high-level goal would be for ConnectionCommon<Data> to contain an UnbufferedConnectionCommon<Data> plus fields that hold the buffers. I think ServerConnection and ClientConnection types should still hold ConnectionCommon<Data> because buffered/unbuffered is orthogonal to client/server AIUI.

Other than this detail, that all sounds good!

@ctz
Member

ctz commented Jan 26, 2024

I have some initial thoughts but I'll hold off until the general idea of trying to handle one UnbufferedStatus per call to process_new_packets is confirmed as sensible - it might be that I'm jammed up here because it's not the right approach.

I think one call of process_new_packets should handle potentially many UnbufferedStatuses, until it gets ConnectionState::BlockedHandshake or has a completely empty received-data buffer. The motivation for that is to maintain the existing API behaviour, where one read_tls call can read many individual TLS messages in one go, and process_new_packets then has to work through every one to make further progress.
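Roughly speaking (with made-up placeholder types rather than the real UnbufferedStatus/ConnectionState), I mean a loop shaped like this:

```rust
// Placeholder types only; the real code would use the unbuffered API's
// UnbufferedStatus / ConnectionState.
enum Status {
    BlockedHandshake,
    EncodeTlsData(Vec<u8>),
    AppDataAvailable(Vec<u8>),
    Closed,
}

#[allow(dead_code)]
struct Adapter {
    incoming_tls: Vec<u8>,
    outgoing_tls: Vec<u8>,
    plaintext: Vec<u8>,
}

impl Adapter {
    /// Placeholder for one step of the unbuffered processing logic.
    fn next_status(&mut self) -> Status {
        if self.incoming_tls.is_empty() {
            Status::BlockedHandshake
        } else {
            Status::AppDataAvailable(self.incoming_tls.split_off(0))
        }
    }

    /// One call works through everything read_tls has already buffered,
    /// mirroring the existing API contract.
    fn process_new_packets(&mut self) {
        loop {
            match self.next_status() {
                // Stop when we need more data from the peer or are done.
                Status::BlockedHandshake | Status::Closed => break,
                Status::EncodeTlsData(tls) => self.outgoing_tls.extend(tls),
                Status::AppDataAvailable(plain) => self.plaintext.extend(plain),
            }
            // Stop once the received-data buffer is fully consumed.
            if self.incoming_tls.is_empty() {
                break;
            }
        }
    }
}
```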

Remainder sounds good!

@cpu
Member Author

cpu commented Jan 26, 2024

I think ServerConnection and ClientConnection types should still hold ConnectionCommon because buffered/unbuffered is orthogonal to client/server AIUI.

I think we're on the same page here. Agreed for sure that the buffered/unbuffered part is orthogonal to whether it's a client or server conn.

I think one call of process_new_packets should handle potentially many UnbufferedStatuses, until it gets ConnectionState::BlockedHandshake or has a completely empty received-data buffer. The motivation for that is to maintain the existing API behaviour, where one read_tls call can read many individual TLS messages in one go, and process_new_packets then has to work through every one to make further progress.

Ah ok, yes, that makes sense and I think it resolves some of the uncertainty I had with regard to how to map some of the UnbufferedStatus variants to operations within process_new_packets.

Thank you both for the input 🙇

@cpu
Member Author

cpu commented Feb 6, 2024

I'm still struggling to find a way to make the strategy I described above work with the existing buffered and unbuffered API designs. The biggest issue I'm wrangling right now is how to reconcile the design of a client like tlsclient-mio with the way I was trying to structure the adapter between the buffered API and the unbuffered API.

Currently, for both APIs, the guts of the library put writable TLS data into the core.common_state.sendable_tls buffer. In the existing buffered API the write_tls function on ConnectionCommon drains directly out of that common state buffer to the writer. In the unbuffered API that core.common_state buffer is popped from in UnbufferedConnectionCommon's process_tls_records_common fn, yielding UnbufferedStatus events in the EncodeTlsData state that the caller has to handle, producing bytes that get written to the caller's own outbound TLS buffer, which it must write to a socket once the TransmitTlsData or WriteTraffic states are encountered.

My idea for how to replace the buffered API with an adapter to the unbuffered API was to have the adapter layer maintain that second outbound TLS buffer, writing to it as dictated by the UnbufferedStatus events it processes, and draining from it in write_tls. The process_new_packets fn would be responsible for handling EncodeTlsData states by encoding the data that was popped from the common state buffer, writing the contents into the adapter's outbound TLS buffer. After processing all of the unbuffered events until a connection close or a blocked handshake state, the adapter's process_new_packets fn would yield an IoState with its tls_bytes_to_write field set to the length of the adapter's sendable TLS buffer.
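For reference, the write side of what I'm picturing looks roughly like this (placeholder types and signatures, not the real ones):

```rust
use std::io::{self, Write};

// Placeholder types; the real IoState and buffers look different.
struct IoState {
    tls_bytes_to_write: usize,
}

struct Adapter {
    /// Filled while process_new_packets handles EncodeTlsData states.
    outgoing_tls: Vec<u8>,
}

impl Adapter {
    /// Report how much TLS data is queued for the next write_tls.
    fn io_state(&self) -> IoState {
        IoState {
            tls_bytes_to_write: self.outgoing_tls.len(),
        }
    }

    /// Drain the adapter's own outbound buffer rather than
    /// core.common_state.sendable_tls.
    fn write_tls(&mut self, wr: &mut dyn Write) -> io::Result<usize> {
        let n = wr.write(&self.outgoing_tls)?;
        self.outgoing_tls.drain(..n);
        Ok(n)
    }
}
```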

The issue I'm seeing with tlsclient-mio is that it doesn't call process_new_packets until it finds the socket it's polling is readable. It calls ConnectionCommon::write_tls() up front, which with the existing bifurcated API is enough to push the client hello to the socket (since it's draining directly from the common state buffer). Then, once the server writes its response, the socket polls as readable and only then does the client begin processing TLS packets. With the adapter-based API, calling write_tls before process_new_packets is fruitless: the adapter's sendable TLS buffer is always empty at this stage. The client hello bytes won't be present in that buffer until we process the unbuffered EncodeTlsData event from a call to process_new_packets. Since there are no bytes for write_tls to write before processing, we stall here - the socket never becomes readable because we never sent a hello.

Is there something obvious I'm missing or is this approach a non-starter?

@cpu
Member Author

cpu commented Feb 6, 2024

Is there something obvious I'm missing or is this approach a non-starter?

Maybe the updated ConnectionCommon's write_tls should call process_new_packets itself before trying to write from the adapter's sendable TLS buffer? I'm not sure if that will break any other assumptions 🤔

@ctz
Member

ctz commented Feb 7, 2024

How about doing process_new_packets inside ClientConnection::new? That would maintain the invariant that a client starts out life with something to send.
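Sketched with placeholder types (the real constructor takes a config and server name and can fail), something like:

```rust
// Placeholder sketch only, not the real ClientConnection.
struct ClientConnection {
    outgoing_tls: Vec<u8>,
}

impl ClientConnection {
    fn new() -> Self {
        let mut conn = ClientConnection {
            outgoing_tls: Vec::new(),
        };
        // Handle the already-queued EncodeTlsData status so the encoded
        // hello is waiting for the first write_tls call.
        conn.process_new_packets();
        conn
    }

    fn process_new_packets(&mut self) {
        // Placeholder for the bridge processing logic described above.
        self.outgoing_tls.extend_from_slice(b"encoded client hello");
    }
}
```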

@cpu
Member Author

cpu commented Feb 15, 2024

Continuing my practice of rubber ducking long comments into this issue.

How about doing process_new_packets inside ClientConnection::new? That would maintain the invariant that a client starts out life with something to send.

That was a helpful suggestion, but the same general problem crops up elsewhere.

APIs that queue data for the next write_tls

The existing API offers methods that are described as queuing TLS data to be sent on the next write (e.g. send_close_notify). With the buffered->unbuffered bridge approach, calling one of them queues deframer buffer data, but as before, without a call to the bridge processing logic to handle the pending EncodeTlsData unbuffered statuses there's nothing for write_tls to write immediately after the call. This breaks the described API and was stalling the tlsserver-mio example. I experimented with my idea of having write_tls call the bridge processing logic before trying to write, and that seems to work, but it's a blunt solution.
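For concreteness, the blunt version looks roughly like this (placeholder types, not the actual bridge):

```rust
use std::io::{self, Write};

// Placeholder types only.
struct Bridge {
    outgoing_tls: Vec<u8>,
    pending_close_notify: bool,
}

impl Bridge {
    /// Queues the intent; nothing is encoded yet, which is the problem
    /// described above.
    fn send_close_notify(&mut self) {
        self.pending_close_notify = true;
    }

    /// Placeholder for handling the pending EncodeTlsData statuses.
    fn process_pending(&mut self) {
        if self.pending_close_notify {
            self.outgoing_tls.extend_from_slice(b"encoded close_notify");
            self.pending_close_notify = false;
        }
    }

    /// The blunt fix: run the bridge processing before draining, so the
    /// "queued for the next write" contract still holds.
    fn write_tls(&mut self, wr: &mut dyn Write) -> io::Result<usize> {
        self.process_pending();
        let n = wr.write(&self.outgoing_tls)?;
        self.outgoing_tls.drain(..n);
        Ok(n)
    }
}
```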

I have a local branch that works for all of the examples/ programs, but I had to do some unpleasant refactoring to get there and I don't think it's the right solution. The challenge I keep bumping into is that the unbuffered API's processing logic isn't as generic over whether a connection is a client connection or a server connection as the classic API's is.

Loss of "side" generality in conn types

In the existing API most of the interesting processing-logic fns (e.g. process_new_packets and complete_io) are defined on ConnectionCommon<Data> or ConnectionCore<Data>, generic over both client and server data. In turn, the Stream and StreamOwned types can be generic over anything that derefs to ConnectionCommon<S>. In the new unbuffered API there isn't an equivalent to the Connection enum, and the processing-logic entry point (process_tls_records) is implemented per side (on UnbufferedConnectionCommon<ClientConnectionData> and UnbufferedConnectionCommon<ServerConnectionData> respectively). Unbuffered client and server connections process unbuffered status events differently to account for early data: clients can send early data, servers can read early data.

I tried to centralize as much of the processing logic as I could into generic BufferedConnectionCommon<Side> impls, but ultimately the process_new_packets equivalent needed to be implemented on BufferedConnectionCommon<ClientConnectionData> and BufferedConnectionCommon<ServerConnectionData> rather than the generic type. This in turn spreads like an infection backwards through the API. We need to be able to process packets from write_tls to address the issues highlighted at the start of my comment, so that gets lifted to the per-side impl. We need to be able to process packets from complete_io, so that gets lifted to the per-side impl. Now Stream can't be generic over any ConnectionCommon<S> because it too relies on process_new_packets and complete_io. To avoid creating ClientStream/ClientStreamOwned/ServerStream/ServerStreamOwned-like types, I re-implemented Stream in terms of the Connection enum, which can then have general fns that match over itself and dispatch to the correct per-side version, but it's all very awkward.
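Heavily simplified, the shape I ended up with looks something like this (placeholder types only):

```rust
use std::io::{Read, Write};

// Placeholder per-side connection types.
struct ClientConn;
struct ServerConn;

impl ClientConn {
    fn process_new_packets(&mut self) { /* client-specific handling */ }
}

impl ServerConn {
    fn process_new_packets(&mut self) { /* server-specific handling */ }
}

enum Connection {
    Client(ClientConn),
    Server(ServerConn),
}

impl Connection {
    /// The enum carries the shared surface and dispatches per side, so
    /// Stream can be written against Connection rather than a generic
    /// ConnectionCommon<S>.
    fn process_new_packets(&mut self) {
        match self {
            Connection::Client(c) => c.process_new_packets(),
            Connection::Server(s) => s.process_new_packets(),
        }
    }
}

#[allow(dead_code)]
struct Stream<'a, T: Read + Write> {
    conn: &'a mut Connection,
    sock: &'a mut T,
}
```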

I'm going to try to think about this more, but I thought that if I shared the issues I've encountered, someone might have some input.
