Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool/-T: perform non-chunked transfer for stdin if it is a regular file #12178

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

emanuele6
Copy link
Contributor

@emanuele6 emanuele6 commented Oct 21, 2023

curl will now also compute the content-length of the transfer if stdin is the file to upload and stdin is a regular file, using its file size.

Since, while being a regular file, stdin could not have its offset at the start of the file, curl will now also get the current offset into the upload file's file descriptor and use (filesize - offset) as content-length for transfer instead of just using the full filesize. This also fixes a bug on BSDs where open("/dev/fd/N") behaves like dup(N), so, if N is a file descriptor to a regular file, the file offset of the file descriptor returned by open() may not have been at the start of the file despite curl's previous assumption.

Since I don't know anything about VMS systems, I left the behaviour for VMS unchanged; on VMS, curl will still perform a chunked transfer if the upload file is stdin.

Fixes #12171
Fixes #12177


$ wc -c ~/.bashrc
782
$ sed 1d ~/.bashrc | wc -c
767
$ # src/curl -T - localhost:40400 < ~/.bashrc
$ nc -l 40400
PUT / HTTP/1.1
Host: localhost:40400
User-Agent: curl/8.4.1-DEV
Accept: */*
Content-Length: 782

#!/bin/bash --
#  _               _
# | |__   __ _ ___| |__  _ __ ___
# | '_ \ / _` / __| '_ \| '__/ __|
# | |_) | (_| \__ \ | | | | | (__
# |_.__/ \__,_|___/_| |_|_|  \___|
[...]
$ # cat ~/.bashrc | src/curl -T - localhost:40400
$ nc -l 40400
PUT / HTTP/1.1
Host: localhost:40400
User-Agent: curl/8.4.1-DEV
Accept: */*
Transfer-Encoding: chunked
Expect: 100-continue

30e
#!/bin/bash --
#  _               _
# | |__   __ _ ___| |__  _ __ ___
# | '_ \ / _` / __| '_ \| '__/ __|
# | |_) | (_| \__ \ | | | | | (__
# |_.__/ \__,_|___/_| |_|_|  \___|
[...]
$ # { read -r x && src/curl -T - localhost:40400 ;} < ~/.bashrc
$ nc -l 40400
PUT / HTTP/1.1
Host: localhost:40400
User-Agent: curl/8.4.1-DEV
Accept: */*
Content-Length: 767

#  _               _
# | |__   __ _ ___| |__  _ __ ___
# | '_ \ / _` / __| '_ \| '__/ __|
# | |_) | (_| \__ \ | | | | | (__
# |_.__/ \__,_|___/_| |_|_|  \___|
[...]

I didn't add a test to the test suite because I don't know how to write test cases that have an open file descriptor to a regular file with a non-zero offset.

@emanuele6
Copy link
Contributor Author

oops, I forgot to rebase

{
/* When the upload file is stdin, or when the upload file is
/dev/std{in,out,err} or /dev/fd/N on BSDs, the offset may not
be 0 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give an example of this happening

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!
For stdin as upload file (-T -), it is the third example in the PR description: { read -r x && src/curl -T - localhost:40400 ;} < ~/.bashrc.
For an example of -T /dev/fd/N etc on BSDs, see #12177.

@bagder
Copy link
Member

bagder commented Oct 22, 2023

TESTFAIL: These test cases failed: 60 98 1068 1069 1073 

These tests are assumed to not be able to get the size from stdin, so probably we need to change how runtests.pl invokes those with data on stdin. Right now they are made to run < log/stdin , which thus with this PR is treated differently now.

@emanuele6
Copy link
Contributor Author

Hmm, yeah, the test runner is doing $cmdargs .= " <$stdinfile"; to run curl with a specified stdin, and $stdinfile is always a regular file, but those tests assume stdin will be a pipe basically.

src/tool_operate.c Outdated Show resolved Hide resolved
@emanuele6
Copy link
Contributor Author

emanuele6 commented Oct 24, 2023

I was reading this source file for another reason, and I found this block of code, which is doing pretty much what I am doing in this PR, but for -F@, using stdio.h.

curl/src/tool_formparse.c

Lines 104 to 123 in b6e6d4f

else { /* Standard input. */
int fd = fileno(stdin);
char *data = NULL;
curl_off_t size;
curl_off_t origin;
struct_stat sbuf;
set_binmode(stdin);
origin = ftell(stdin);
/* If stdin is a regular file, do not buffer data but read it
when needed. */
if(fd >= 0 && origin >= 0 && !fstat(fd, &sbuf) &&
#ifdef __VMS
sbuf.st_fab_rfm != FAB$C_VAR && sbuf.st_fab_rfm != FAB$C_VFC &&
#endif
S_ISREG(sbuf.st_mode)) {
size = sbuf.st_size - origin;
if(size < 0)
size = 0;
}

I think that that sbuf.st_fab_rfm != FAB$C_VAR && sbuf.st_fab_rfm != FAB$C_VFC && for VMS is supposed to tell you when it is ok to use statsize - offset for stdin on VMS, but I am not sure if I should add that to make curl not need to do a chunked transfer for stdin on VMS in some cases, since I have no idea what that actually does.

Can someone familiar with VMS please tell me if this looks correct?

diff --git a/src/tool_operate.c b/src/tool_operate.c
index 5728736c8..ef8072337 100644
--- a/src/tool_operate.c
+++ b/src/tool_operate.c
@@ -307,7 +307,7 @@ static CURLcode pre_transfer(struct GlobalConfig *global,
     }
 
 #ifdef __VMS
-    if(per->infd == -1)
+    if(per->infd == -1 || (isstdin && fstat(per->infd, &fileinfo)))
 #else
     if(per->infd == -1 || fstat(per->infd, &fileinfo))
 #endif
@@ -324,8 +324,11 @@ static CURLcode pre_transfer(struct GlobalConfig *global,
     /* we ignore file size for char/block devices, sockets, etc. */
     if(S_ISREG(fileinfo.st_mode))
 #ifdef __VMS
-      uploadfilesize = fileinfo.st_size;
-#else
+      if(!isstdin)
+        uploadfilesize = fileinfo.st_size;
+      else if(fileinfo.st_fab_rfm != FAB$C_VAR &&
+              fileinfo.st_fab_rfm != FAB$C_VFC)
+#endif
     {
       /* When the upload file is stdin, or when the upload file is
          /dev/std{in,out,err} or /dev/fd/N on BSDs, the offset may not
@@ -338,7 +341,6 @@ static CURLcode pre_transfer(struct GlobalConfig *global,
               per->infd, per->uploadfile);
       }
     }
-#endif
 
 #ifdef DEBUGBUILD
     /* allow dedicated test cases to override */

cc @monnerat Since you wrote that snippet of code, =)

@@ -47,7 +47,7 @@ which is impossible in HTTP/1.0
# Verify data after the test has been "shot"
<verify>
<errorcode>
25
65
Copy link
Contributor Author

@emanuele6 emanuele6 Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to change the error code to 65 because I think now the problem is that curl tries to rewind the file descriptor to resend it after the redirect, but it can't because stdin is not seekable if it is a pipe

@emanuele6
Copy link
Contributor Author

@bagder Now you can specify <stdin pipe="yes"> if you want stdin to be a pipe in a test.

@@ -53,7 +53,7 @@ which is impossible in HTTP/1.0
# Verify data after the test has been "shot"
<verify>
<errorcode>
25
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@emanuele6
Copy link
Contributor Author

If I want to add a test that checks that -T - (with regular file stdin) does not perform a chunky upload, what number should it be?

@bagder
Copy link
Member

bagder commented Oct 24, 2023

If I want to add a test that checks that -T - (with regular file stdin) does not perform a chunky upload, what number should it be?

You can pick any available number. I usually pick one that is the next available counting upward from the one I clone when I made a new one... But you can do whatever. Don't go crazy and pick one way outside of the current range though. 😁

If there still ends up a number collision at merge time because another PR has had the bad idea and picked the same number, renumbering a single test case should be just a matter of renaming it as we use variables for the test number within the files.

@emanuele6
Copy link
Contributor Author

emanuele6 commented Oct 24, 2023

I added documentation for <stdin pipe="yes">, and the test (268).

Do I need to somehow disable the test on VMS or is that not necessary because runtests.pl can only run on UNIX-like systems?

curl will now also compute the content-length of the transfer if stdin
is the file to upload and stdin is a regular file, using its file size.

Since, while being a regular file, stdin could not have its offset at
the start of the file, curl will now also get the current offset into
the upload file's file descriptor and use (filesize - offset) as
content-length for transfer instead of just using the full filesize.
This also fixes a bug on BSDs where open("/dev/fd/N") behaves like
dup(N), so, if N is a file descriptor to a regular file, the file offset
of the file descriptor returned by open() may not have been at the start
of the file despite curl's previous assumption.

Since I don't know anything about VMS systems, I left the behaviour for
VMS unchanged; on VMS, curl will still perform a chunked transfer if the
upload file is stdin.

I updated tests that were testing chunked transfers for stdin to use
<stdin pipe="yes"> since now curl only performs a chunked transfer by
default if the upload file (stdin or not) is not a regular file.
And I added a new test (268) that verifies that stdin is uploaded with
content-length (not transfer-encoding:chunked) if it is a regular file.

Fixes curl#12171
Fixes curl#12177
@bagder
Copy link
Member

bagder commented Oct 24, 2023

Do I need to somehow disable the test on VMS or is that not necessary because runtests.pl can only run on UNIX-like systems?

runtests should run on most systems, but I am not sure if it works/worked on VMS. We generally don't have any special stuff for VMS in the test suite, and I suspect that is because it is never executed on that OS...

Since we don't have any regulars around using VMS, we have to do as as good as we can and simply guess and then let people on VMS sweep up the pieces later and fix that we broke.

@monnerat
Copy link
Contributor

I think that that sbuf.st_fab_rfm != FAB$C_VAR && sbuf.st_fab_rfm != FAB$C_VFC && for VMS is supposed to tell you when it is ok to use statsize - offset for stdin on VMS, but I am not sure if I should add that to make curl not need to do a chunked transfer for stdin on VMS in some cases, since I have no idea what that actually does.

As you say, I wrote the MIME api version of tool_formparse.c, but I've never put my hands on a VMS keyboard: I just came to the same conclusion as you by reading function VmsSpecialSize() in file tool_operate.c as I needed to know the file size.

This should probably be confirmed/adjusted/refined/completed by some VMS guy, but there hasn't been any complaint about it in 6 years :-)

If it can help you, I just found this: http://odl.sysworks.biz/disk$axpdocmar021/opsys/vmsos73/vmsos73/4523/4523pro_006.html.

@emanuele6
Copy link
Contributor Author

@monnerat Hmm, ok. Thanks anyway.

@emanuele6
Copy link
Contributor Author

emanuele6 commented Oct 24, 2023

The windows CIs are failing because 1072 and 1073 are still returning 25 on windows; they return 65 on linux. :/

@bagder
Copy link
Member

bagder commented Nov 6, 2023

The windows CIs are failing because 1072 and 1073 are still returning 25 on windows; they return 65 on linux. :/

Is this because Windows does not treat the file the same thing when this is used? We need to manage this somehow...

@emanuele6
Copy link
Contributor Author

I think the problem may have something to do with --anyauth always printing a warning when the upload file is stdin; it should rather only warn if the file is a non-regular file in general, not just only stdin always. I will try to take a look later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants