Added protection for cases when client drives sockets itself after HTTP protocol upgrade #12761
base: master
Conversation
Added crash protection for cases when the client drives sockets itself after upgrading an HTTP libcurl transfer to a different protocol and steps on socket errors. In such cases, it is possible that the client performs read/write operations and calls libcurl socket filters after libcurl has cleared the transfer easy handle.
Ref: https://curl.se/mail/lib-2024-01/0038.html

As I said in that thread, I am not sure this is correct; I would expect data to always be != NULL. /cc @icing
Can you elaborate more on how you get this problem? Maybe an example reproducing it? To me it feels like this is just hiding the problem when maybe it could be fixed earlier...
As I mentioned in the e-mail thread, this problem happens when some client code uses a curl HTTP transfer for "upgrading" to a bi-directional protocol like WebSocket. After the "upgrade", the client code extracts the SSL handle and drives the read/write operations itself, performing the protocol operations. It works OK unless there is some socket/network error. In that case, it looks like the curl easy handle that was used to start the protocol transfer gets cleared by the error (maybe the socket was closed or something like that), and the subsequent client SSL read operations crash in the libcurl socket filter function with a null data upon socket read failure.

It worked OK with 7.84, but back then there were no socket filters, and read/write operations simply failed on socket/network errors, indicating that the WebSocket was done. The main factor here seems to be that the "data" gets cleared by some socket error, but subsequent client read/write socket operations still go through the libcurl socket filter and thus crash on the null data.
@bagder As far as I can see, these changes didn't get into 8.6.0. I am wondering if it is possible to get them into the next release?
I still want more information on how to reproduce this, and possibly @icing's comment on this. It does not feel right to just add this check. To me, it seems to indicate a previous problem that we should deal with rather than paint over with this extra condition. I think we want:
As I mentioned, this problem happens when some client drives the sockets itself after "upgrading" to some protocol. Maybe you are right and the problem is bigger, but it happens because the 'data' gets null as a result of a socket error before the socket filter is unregistered, so when the client does its read/write it steps on it. My change specifically handles the socket error paths in the read/write functions, and I guess the other places where 'data' can be null should be checked/handled differently.

I guess the main question to @icing (as I understand he is the author of the socket filter code) is: was any thought given to cases where a client performing socket operations itself may do reads/writes after the socket was closed because of an error (that's when we get a null 'data' in the filter)?

Unfortunately, I don't know how to reliably reproduce these crashes, because they happen randomly on remote devices with limited debugging resources and inside 3rd party code, but they happen after the client calls SSL_read() some time after a socket error has occurred.
@dkarpov1970 If I understand you correctly, the application does a "connect only", getting the socket from curl and then reading/writing to that socket directly. When it is done, it calls curl_easy_cleanup(). My question is then: how do connection filters get involved after the application has taken over? If you see a crash, can you share the stack trace?

Thanks for your help in understanding this.
@icing Yes, the application does this kind of thing: "connect only", then doing SSL reads/writes to run its own WebSocket implementation, and curl_easy_cleanup() at the end of the WebSocket's life. Everything works smoothly until some socket error occurs, and the next SSL read after it crashes. The call stack of the crash is the following:

Program terminated with signal SIGSEGV, Segmentation fault.
I think I found a not overly complicated sequence that can reproduce the crash with an HTTPS transfer done via curl_easy_perform(), socket closure and a post-transfer SSL read. Transfer setup:
Steps:
Re Step 3: you cannot use any …
@icing I think I should've clarified that step 3 should be done after the transfer has been performed (by curl_easy_perform()). In this case, the post-transfer SSL_read() will still work because the SSL context and SSL handle are still alive. If I don't close the sockets before calling SSL_read(), then the post-transfer read works fine. But if I close the sockets, then the SSL_read() crashes in the socket filter.

The real application which stepped on the crash, as far as I can tell, uses a multi-handle and drives multiple transfers, but the crash is caused by the same or a similar sequence: SSL_read() after a socket error.
@dkarpov1970 Thanks for the reminder. Question: would … I still don't like the idea that apps do I/O on an SSL handle they obtained from a callback. Maybe @bagder has an idea how this is supposed to work.
Is the …?
Yes, the … But the app developers told me that they can't use the … They are now facing a crashing regression because everything worked fine for many years in previous curl releases.

While I agree that this case is probably not quite aligned with the curl usage model, it is still possible that "connect-only" clients extract SSL handles and pass them to some 3rd party libraries which work with SSL handles internally and can't be modified to use … I think my change provides a very simple protection for such cases and helps to avoid crashes which may occur in the scenarios that I described.
I already explained how it is not a complete fix.
I don't think this is a convincing argument. They extract internal data from the libcurl engine and run with that, but libcurl makes no promises that this works, nor will we make any such promises going forward. libcurl owns that handle and sets up stuff for it in ways that have been tested and work assuming the API is used as documented. If someone goes beyond that, then we cannot guarantee behavior or functionality.
This sort of use is not part of curl's API. Even if it worked in the past, it is not a use case we want libcurl to accommodate. While your proposed patch solves the current issue, we are not able to guarantee this will continue to work in the future. The …
While the change is simple, you ask us to "make this use case work" in future releases as well, and that is what we do not want to do. For example, your application might one day use an HTTP proxy setup, and that will drag in other parts of libcurl, which then also need to be fixed. And the …
I would argue that fixing the crash is not really opening a can of worms here. And the scenario that I described is not fully artificial, because 3rd party libraries driving high-level protocols after the "connect only" phase may require low-level read/write handles like the SSL handle or sockets to make the protocol functional, and they can't be modified to use …

I understand that this scenario is not supported and may not work in the future, but preventing the crash doesn't create a claim that from now on such scenarios are fully supported. It will still be the responsibility of the client to take the risk of the unsupported case, use this mechanism properly, and take care of the issues, but at least crashes when reporting read/write socket errors will be avoided. So, this change is just a basic null pointer protection and not a requirement to add full support for such a use case in the future.

Also, libcurl provides a way to extract the SSL context and sockets via …
Another point. While I agree that using … Such libraries may be written without capabilities to pass … These libraries may be designed to work with different HTTP clients, not just libcurl, and they take care of all the nuances between their protocol client and backend. This is a valid use case, and the only way libcurl clients can work with such libraries is by extracting the sockets/SSL handle and passing them to the libraries. This approach used to work just fine, and it still works OK even if it isn't officially supported. My simple change doesn't add anything new; it just allows this approach to work without crashes.
I have to agree that this is not something we can support.
Your change is covering up something that should not happen. |
@jay I don't ask for full support of such scenarios. My change is a trivial null pointer check that just avoids the crash and lets it work. And they may have no other choice if some 3rd party high-level protocol library needs low-level IO read/write handles.
This scenario has been working fine for many years, and it is possible that not only my client uses some 3rd party libraries requiring sockets/SSL handles. So, it may happen even if it shouldn't. And as I suggested in my previous comments, we can add a debug warning message about null data as an indication that it is not a supported scenario, but at least we can help to avoid the crashes.
I want to talk a bit more about using direct … The scenario works OK because … And if we look at the … That's why direct … So, if we provide a trivial protection for a null easy data in the … And the …
I don't think anyone denies that there might be such code bases. I think what we might be proposing is that the appropriate fix here should rather be to make such users use the correct API instead of letting them keep walking blind-folded on the edge of a cliff.
The missing words from that sentence are: for the moment. It is quite possible that these functions will be modified going forward to do other things involving the easy handles.
It has probably worked for decades, but I don't think that is a strong argument. It is undocumented behavior.

Covering up for cases in code that can't happen is generally a bad idea. It makes the code harder to read, and it also tends to trick static code analyzers (and humans) into believing that a NULL pointer is okay in that variable at that time. It also risks becoming a cat-and-mouse game of further similar changes: if we do this change for a case that can't happen, why should we turn down additional checks for other situations that can only trigger if the API is "abused" in other ways?

I would argue that we should assert on NULLs instead, because they should not be NULL, and if they are, that's a sign something is terribly wrong.
@bagder But what would be the appropriate fix here? The problem is that some client app needs to use some 3rd party library which implements a high-level protocol on top of HTTP upgrade and which internally uses low-level IO read/write primitives like sockets or the SSL handle. And if it doesn't provide a way to "inject" … I don't see any viable options except extracting the sockets/SSL handle in libcurl callbacks and passing them to these libraries, even though it is not a supported scenario.

And the argument that it has been working for decades, and still works, is a strong one: it indicates that there is nothing fundamentally wrong with that use case, and there may be a significant number of client apps that had to use it for the reasons I mentioned above. Don't get me wrong, I fully understand the hesitation about any change that will support an unsupported scenario that just happens to work somehow.
I am not sure that my "not normal" case is the only case in which the crash that my change is trying to prevent may happen. I suspect that there may be some other "normal" cases where the "data" can be null, besides my "not normal" scenario, and that's why the … So, my change may potentially prevent crashes in some "normal" cases.
Added crash protection for cases when the client drives sockets itself after upgrading a libcurl HTTP transfer to a different protocol (e.g. WebSockets) and steps on socket errors.
In such cases, it is possible that the client performs read/write operations and calls libcurl socket filters after libcurl has cleared the transfer easy handle.