-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add streaming multipart/form-data encoder #3361
base: main
Are you sure you want to change the base?
[WIP] Add streaming multipart/form-data encoder #3361
Conversation
I initially implemented this for Requests in the requests-toolbelt but this was already pretty generic because Requests just passes file-like objects (which is how the streaming encoder behaves) directly to urllib3. All that needed to change was what we were relying on from the requests namespace and imports and such. This also adds the decoder in ther same breath because it's easier to ensure that's all working together properly in one and it all fits together nicely. One thing we _could_ do is consolidate a bunch of the logic too and make `encode_multipart_formdata` rely on the streaming encoder and call `getall` instead so that we don't have 2 implementations of the same logic.
It's incorrect in the context of the streaming encoder changes.
…a-encoder-2024 # Conflicts: # docs/conf.py # src/urllib3/connection.py # src/urllib3/fields.py # src/urllib3/filepost.py # test/with_dummyserver/test_connectionpool.py
Previously this test was failing with a warning that was being raised as an error: _____________________________________________ TestMultipartEncoder.test_regression_1 ______________________________________________ Traceback (most recent call last): File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 155, in _multicall teardown[0].send(outcome) File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/unraisableexception.py", line 88, in pytest_runtest_call yield from unraisable_exception_runtest_hook() File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/unraisableexception.py", line 78, in unraisable_exception_runtest_hook warnings.warn(pytest.PytestUnraisableExceptionWarning(msg)) pytest.PytestUnraisableExceptionWarning: Exception ignored in: <_io.FileIO [closed]> Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/unittest/case.py", line 589, in _callTestMethod if method() is not None: ^^^^^^^^ ResourceWarning: unclosed file <_io.BufferedReader name='/Users/alexwlchan/repos/urllib3/test/multipart/test_encoder.py'> During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 341, in from_call result: Optional[TResult] = func() ^^^^^^ File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 262, in <lambda> lambda: ihook(item=item, **kwds), when=when, reraise=reraise ^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_hooks.py", line 501, in __call__ return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_manager.py", line 119, in _hookexec return self._inner_hookexec(hook_name, methods, kwargs, firstresult) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 159, in _multicall _warn_teardown_exception(hook_name, teardown[1], e) File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 49, in _warn_teardown_exception warnings.warn(PluggyTeardownRaisedWarning(msg), stacklevel=5) pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown. Plugin: unraisableexception, Hook: pytest_runtest_call PytestUnraisableExceptionWarning: Exception ignored in: <_io.FileIO [closed]> Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/unittest/case.py", line 589, in _callTestMethod if method() is not None: ^^^^^^^^ ResourceWarning: unclosed file <_io.BufferedReader name='/Users/alexwlchan/repos/urllib3/test/multipart/test_encoder.py'>
def test_from_response_needs_content_type(self) -> None: | ||
response = mock.NonCallableMagicMock(spec=urllib3.response.HTTPResponse) | ||
response.headers = {} | ||
response.data = b"" | ||
|
||
with pytest.raises( | ||
ValueError, match="Cannot determine content-type header from response" | ||
): | ||
MultipartDecoder.from_response(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is new, to add coverage for this error.
def test_content_length(self) -> None: | ||
assert self.instance.content_length == str(len(self.instance)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is new, to add coverage for the content_length
property.
8b6cbfe
to
5dc22c2
Compare
def test_accepts_custom_content_type_as_bytes(self) -> None: | ||
"""Verify that the Encoder handles custom content-types which | ||
are bytes. | ||
|
||
See https://github.com/requests/toolbelt/issues/52 | ||
""" | ||
fields = [ | ||
( | ||
b"test".decode("utf-8"), | ||
( | ||
b"filename".decode("utf-8"), | ||
b"filecontent", | ||
b"application/json", | ||
), | ||
) | ||
] | ||
m = MultipartEncoder(fields=fields) | ||
output = m.read().decode("utf-8") | ||
assert output.index("Content-Type: application/json\r\n") > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is new, to add coverage for a couple of lines in the multipart encoder:
if isinstance(file_type, bytes):
file_type = file_type.decode("utf-8")
I'm getting a CI failure for Python 3.13 on Windows, but I don't think it's related to this patch?
I can see a similar failure in the latest build on main. |
elif isinstance(data, typing.BinaryIO): | ||
body.write(data.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line isn't currently tested, and I need to think more about what the intention is here, because I haven't found a way to trigger this branch.
In particular neither an open file nor an io.BytesIO
return True here, so I wonder if a different check is what's meant (if this branch is even needed at all?):
import io
import typing
with open(__file__, 'rb') as f:
print(isinstance(f, typing.BinaryIO)) # False
bytes_io = io.BytesIO(b'Hello, World!')
print(isinstance(bytes_io, typing.BinaryIO)) # False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe bytearray? I can't remember honestly
print(resp.json()["form"]) | ||
# {"field": "value", "myfile": "..."} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note to investigate: if I run this locally, I just get {"field": "value"}
as the output; filename.txt
is uploaded as a file rather than a form-encoded field. 🤔
❤️ Thank you @alexwlchan ! I don't have the time with the kiddo and hopefully you can claim the bounty (since I couldn't even if I got the PR over the line) |
(Having picked this up I've immediately stalled due to some stuff in my personal life, but I do want to continue this after this week is over! |
By the way, requests/toolbelt#316 is a reasonable addition to the enclosed but will complicate this a smidge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a more thorough read of decoder.py
and left some comments, which I'll push a commit to address shortly.
for item in ct_info[1:]: | ||
attr, _, value = item.partition("=") | ||
if attr.lower() == "boundary": | ||
self.boundary = encode_with(value.strip('"'), self.encoding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what happens if you receive a value which isn't properly quoted – e.g. if value
starts with an open quote but there's no matching close quote.
from urllib3.multipart.encoder import MultipartEncoder, encode_with | ||
|
||
|
||
class TestBodyPart(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the test case in test_encoder.py
are the only tests in the project that subclass unittest.TestCase
.
I'm going to leave them as-is for now, but I wonder if they should be rewritten to use pytest fixtures or similar, to match the rest of the project.
I've made a couple more tweaks, and I think the decoder is now working. In particular I created a small Flask server that uses requests-toolbelt to send multipart-encoded responses, so I can test the code in my urllib3 fork: from flask import Flask, Response
from requests_toolbelt import MultipartEncoder
app = Flask(__name__)
@app.route('/downloads')
def downloads():
m = MultipartEncoder(
fields={'field0': 'value', 'field1': 'value',
'field2': ('filename', open('file.py', 'rb'), 'text/plain')}
)
s = m.to_string()
return Response(s, mimetype=m.content_type, headers={"content-type": m.content_type})
app.run(debug=True) Next up is diving into the encoder code and seeing what else needs to be done there. |
I've had a look through |
This is an attempt to get the streaming multipart/form-data encoder landed again (for #51, #624). This is something that was previously worked on in #2331 and has stalled; I'd like to have a go at getting it across the line.
Status: still fairly experimental. I wanted to open a PR to get it running in the proper CI environment, but I'm still wrapping my head around the code – a bunch of the comments are me talking to myself saying "hmm, look at this further". But I think I'm gradually moving forward! 🥰
So far:
I've merged the latest main into sigmavirus24’s branch. This was mostly clean; the only issues were:
import typing; typing.X
rather thanfrom typing import X; X
, so I had to change all of those (but reading the comments on the original PR, I think that was the preference anyway?)_TYPE_BODY
has moved fromconnection.py
to_base_connection.py
Otherwise I haven't touched the original code.
Run
nox -rs lint
,nox -rs mypy
andnox -rs docs
to fix various style/linting issues. Again I don't think these are especially meaningful.I've added some new tests to increase the coverage. I think I'm up to 100% for the decoder, but not the encoder.
I read the review comments on the original PR. I couldn't see anything outstanding, but if I've missed something I'm happy to incorporate those changes into this new branch.
This code comes from requests-toolbelt. I had a quick look at the code in the requests-toolbelt repo to see if there are any changes between the original PR and now that we should incorporate into this urllib3 copy, but there aren't any.
Still to do:
Because the original code already had several rounds of review, I'll add some inline comments to highlight my new changes, to make it easier for reviewers.