-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use copy_file_range(2)
when available
#7908
base: master
Are you sure you want to change the base?
Conversation
CT Test ResultsNo tests were run for this PR. This is either because the build failed, or the PR is based on a branch without GH actions tests configured. Results for commit 91b4350 To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts
// Erlang/OTP Github Action Bot |
6021d44
to
d05f253
Compare
Hello! Thanks for the PR! I haven't done any review of this yet, but I noticed that you have placed the nif in kernel. All file related nifs are placed in |
@garazdawi Hello! If this NIF and its related file should be placed in |
7a22b67
to
f65b6b2
Compare
@garazdawi I moved nif and file to |
4771d9d
to
1ddca56
Compare
2c63f44
to
91b4350
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
We appreciate your efforts, however the issue you've tried to fix is very ambitious for a first contribution as the file subsystem has many unexpected wrinkles, and guiding you through them will take considerable time.
That's not to say that we reject this, just that you need to be prepared that this may take a long time and we might not always be available to respond in a timely manner as we've got a lot to finish before the first release candidate for OTP 27. I hope that's alright, I've left a few comments to get started :-)
@@ -102,6 +102,9 @@ static ERL_NIF_TERM read_file_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM a | |||
static ERL_NIF_TERM open_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]); | |||
static ERL_NIF_TERM close_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]); | |||
|
|||
static ERL_NIF_TERM copy_file_range_available_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want runtime tests for whether APIs exist, please move this over to using a configure
test. You can put it near the test for writev
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking availability in configure.ac
now. Kind of a dumb question: How do you regenerate the configure file? Running autoconf configure.ac
inside the erts folder gives me an error: /usr/bin/m4:configure.ac:28: cannot open 'otp.m4': No such file or directory
.
} | ||
} | ||
|
||
static ERL_NIF_TERM copy_file_range_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like exposing copy_file_range(2)
directly like this, I'd prefer it to be a plain "copy file" operation, ideally of a whole file because the alternative is so rare and it would let us use CopyFileExW
on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to copy_file_nif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment, RabbitMQ (for example) could greatly benefit from a copy_file_range API that didn't copy the entire file. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's add both, keeping the specifics (fallback etc) in the platform-specific parts. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhogberg Should we treat the third argument (length) as optional in the existing function, or create a separate function called for example copy_part_file/3
? In Windows, we utilize CopyFileW(3)
, but for partial copying, It will require a different approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a separate function, and think copy_range
would be a good name since file
is a bit redundant given the module name (file:copy_part_file
...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhogberg I have implemented a new function called copy_range/3
. Additionally, I've added efile_copy_range_int/3
, which contains code utilized by both efile_copy_file/2
and efile_copy_range/3
. I haven't found any similar internal shared functions in the primary file. Therefore, if this approach is incorrect, I can redo it without using the internal function. Implemented copy_range/3
for Windows API, but not sure if implementation is correct, tried to do in same style al other functions in win_prim_file.c
. Also will recheck for error handling, have a feeling that I forget to check something.
lib/kernel/src/file.erl
Outdated
@@ -907,7 +907,10 @@ copy_int(Source, {_DestName, DestOpts} = Dest, Length) | |||
%% Both must be bare filenames. If they are not, | |||
%% the filename check in the copy operation will yell. | |||
copy_int(Source, Dest, Length) -> | |||
copy_int({Source, []}, {Dest, []}, Length). | |||
case prim_file:copy_file_range_available() of | |||
true -> prim_file:copy_file_range(Source, Dest, Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks when copying files that have already been opened (which do not necessarily have to be opened through this module), and does not honor the file server at all (which may live on another node).
When passed two file names, this should delegate the copy operation to the file server (which should in turn delegate it to another process so as to not block all other file server operations). Otherwise, I think it should just use the current fallback because not having to care about weird edge cases in the NIF like copying a named file to an io_device()
simplifies things a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I removed this code from file.erl
in order to understand how file server works, and should work in this case
efile_close(d_in, &ignored); | ||
efile_close(d_out, &ignored); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing without moving the file into CLOSED
state will crash the debug emulator.
efile_close(d_in, &ignored); | |
efile_close(d_out, &ignored); | |
erts_atomic32_set_acqb(&d_in->state, EFILE_STATE_CLOSED); | |
efile_close(d_in, &ignored); | |
erts_atomic32_set_acqb(&d_out->state, EFILE_STATE_CLOSED); | |
efile_close(d_out, &ignored); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked all file close cases and moved all of them to the 'CLOSED' state before the actual closure.
efile_unix_t *u_in = (efile_unix_t*)d_in; | ||
efile_unix_t *u_out = (efile_unix_t*)d_out; | ||
|
||
if (!u_in || !u_out || u_in->fd < 0 || u_out->fd < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant since we're entirely in control of the arguments, these things are better expressed with ASSERT(...)
.
@jhogberg Thanks for reviewing my code. I'm aware this isn't as straightforward as I initially thought, but I'm committed to fixing all the issues and learning how this should be done in OTP. I hope to get my PR merged someday when the code is as close to done as possible. It would be great if someone from the contributors could check my comments from time to time as well. |
f959551
to
99296b8
Compare
len = (off64_t)info.size; | ||
|
||
if (len < 0) { | ||
return enif_make_badarg(env); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be EINVAL
, and the source should be closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code no more, moved to unix_prim_file.c
with closing in case this fd_out and returning errno
.
/* We need to open Source file in order to get file size*/ | ||
if ((posix_errno = efile_open(&file_path_in, EFILE_MODE_READ, efile_resource_type, &d_in))) { | ||
return posix_error_to_tuple(env, posix_errno); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if this was entirely moved to the platform-specific implementations, e.g. efile_copy_file
taking paths instead of file handles as arguments. You would then be able to use plain open(2)
instead of the more clunky efile_open
, or CopyFileW
on Windows.
As a fallback for when copy_file_range(2)
is unavailable you could have a simple read(2)+write(2)
loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all platform-specific implementations to unix_prim_file.c
and added fallback function with simple read/write loop.
99296b8
to
31cec63
Compare
#ifdef HAVE_COPY_FILE_RANGE | ||
/* Copy file contents using unix SYS_copy_file_range syscall */ | ||
if ((posix_errno = efile_copy_file_range(&file_path_in, &file_path_out, &ret))) { | ||
return posix_error_to_tuple(env, posix_errno); | ||
} | ||
#else | ||
/* As a fallback for when copy_file_range(2) is unavailable wecould do a simple read(2)+write(2) loop. */ | ||
if ((posix_errno = efile_simple_copy_file(&file_path_in, &file_path_out, &ret))) { | ||
return posix_error_to_tuple(env, posix_errno); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be simpler if you moved this logic over to unix_prim_file
/ win_prim_file
, where the latter just uses CopyFileW
. Just have efile_copy_file
as the "canonical" interface from here.
Adding Windows support should just be a matter of copying a routine like efile_make_dir
and changing the call to CopyFileW
, all the hairy stuff with paths has already been taken care of in efile_marshal_path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for Windows
posix_errno_t efile_copy_file_range(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result); | ||
posix_errno_t efile_simple_copy_file(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
posix_errno_t efile_copy_file_range(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result); | |
posix_errno_t efile_simple_copy_file(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result); | |
posix_errno_t efile_copy_file(const efile_path_t *source, const efile_path_t *destination); |
bytes_written = write(fd_out, buffer, bytes_read); | ||
if (bytes_written != bytes_read) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that you get to write less than you read (for example if you get interrupted after having read a few bytes, the write will successfully return the number of bytes written instead of EINTR
), so this is not an error per se. Loop until all bytes have been written (and, of course, handle EINTR
on its own).
return errno; | ||
} | ||
|
||
while ((bytes_read = read(fd_in, buffer, sizeof(buffer))) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about EINTR
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exist in efile_copy_range_int
inside while look
} | ||
|
||
do { | ||
fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, 0644); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, 0644); | |
fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, FILE_MODE); |
return posix_error_to_tuple(env, posix_errno); | ||
} | ||
#endif | ||
return enif_make_tuple2(env, enif_make_atom(env, "ok"), enif_make_uint(env, (unsigned int)ret)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size is not interesting here, so let's just return ok
on success.
return enif_make_tuple2(env, enif_make_atom(env, "ok"), enif_make_uint(env, (unsigned int)ret)); | |
return am_ok; |
Also, you truncated ret
here which is not what you want. Sizes >4GB would have been reported as modulo 4GB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning as 'am_ok' now. However, both file:copy/2
and file:copy/3
return {ok, BytesCopied}. So, why are we only returning 'ok'? Considering the user uses copy_range/3
, we can return the same length that they passed into the function. But what should we do in the case of copy_file/2
when only Source and Destination passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I forgot that file:copy/2
is already documented as {ok, BytesCopied}
-- it wasn't introduced by this PR.
But what should we do in the case of copy_file/2 when only Source and Destination passed?
Hm, I think file:copy/2
should either do a range copy (when available) or use the fallback. The current interface of file:copy/2
means that we cannot use CopyFileW
or equivalent, so a separate function may be warranted.
I plan on refactoring the current file
interface into a new module that has all our lessons learned baked into a cleaner interface (no idea when though), so we can either live with this until then or forget about CopyFileW
for now, I'll leave that up to you. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for not using CopyFileW
in the case of file:copy/2
? When using Filename
as arguments, we can employ it. However, when using pid()
or fd()
, we can't utilize it. Nevertheless, we can fallback to the already implemented copy in Erlang, right? Moreover, for Unix systems, copy_file/2
and copy_range/3
currently handle only Filenames. There's a need to extend support to handle fd()
and pid()
as well, aligning with the functionality of file/copy/{2,3}
. While the focus isn't primarily on CopyFileW
, the objective is to enhance the performance of file copying on Unix systems. If a solution can be devised to handle pid()
, fd()
, and Filename
as parameters for copy_file/2
and copy_range/3
, then surely a solution will be explored for Windows as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for not using CopyFileW in the case of file:copy/2?
Not getting an accurate count of bytes copied. We can only check before and after, and it might have changed in the meantime.
However, when using pid() or fd(), we can't utilize it. Nevertheless, we can fallback to the already implemented copy in Erlang, right? Moreover, for Unix systems, copy_file/2 and copy_range/3 currently handle only Filenames. There's a need to extend support to handle fd() and pid() as well, aligning with the functionality of file/copy/{2,3}.
fd()
and pid()
involve communicating with other processes that might not be wrapping a file in the first place. It's better to just leave all that to the Erlang fallback.
lib/kernel/test/prim_file_SUITE.erl
Outdated
Name1 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_1.txt"), | ||
Line = "The quick brown fox jumps over a lazy dog. 0123456789\n", | ||
Len = length(Line), | ||
{ok, Handle1} = file:open(Name1, [write]), | ||
{_, Size1} = | ||
iterate({0, 0}, | ||
done, | ||
fun({_, S}) when S >= 128*1024 -> | ||
done; | ||
({N, S}) -> | ||
H = integer_to_list(N), | ||
ok = file:write(Handle1, [H, " ", Line]), | ||
{N + 1, S + length(H) + 1 + Len} | ||
end), | ||
file:close(Handle1), | ||
%% Make a copy | ||
Name2 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_2.txt"), | ||
{ok, Size1} = ?PRIM_FILE:copy_file(Name1, Name2), | ||
ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit complicated and doesn't prove that the copy worked, only that the size is correct. How about something like...?
Name1 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_1.txt"), | |
Line = "The quick brown fox jumps over a lazy dog. 0123456789\n", | |
Len = length(Line), | |
{ok, Handle1} = file:open(Name1, [write]), | |
{_, Size1} = | |
iterate({0, 0}, | |
done, | |
fun({_, S}) when S >= 128*1024 -> | |
done; | |
({N, S}) -> | |
H = integer_to_list(N), | |
ok = file:write(Handle1, [H, " ", Line]), | |
{N + 1, S + length(H) + 1 + Len} | |
end), | |
file:close(Handle1), | |
%% Make a copy | |
Name2 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_2.txt"), | |
{ok, Size1} = ?PRIM_FILE:copy_file(Name1, Name2), | |
ok | |
Source = filename:join(RootDir, atom_to_list(?MODULE)++"_cf_source"), | |
Destination = filename:join(RootDir, atom_to_list(?MODULE)++"_cf_destination"), | |
Slug = <<"The quick brown fox jumps over a lazy dog. 0123456789\n">>, | |
Data = << <<N:32/native, Slug/bits>> || N <- lists:seq(1, 1 bsl 16) >>, | |
ok = file:write_file(Source, Data), | |
ok = ?PRIM_FILE:copy_file(Source, Destination), | |
{ok, Data} = file:read_file(Destination), | |
ok |
(Untested, but hopefully you get the gist 🙂)
ae125a9
to
5fe9c77
Compare
/* If this parameter is TRUE and the new file specified by lpNewFileName already exists, the function fails. | ||
* If this parameter is FALSE and the new file already exists, the function overwrites the existing file and succeeds. | ||
* For safety reason, we using TRUE as default value. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file:copy/2
behaves as if passing FALSE
. Cut this comment and pass FALSE
instead.
5fe9c77
to
0d94df1
Compare
off64_t bytes_copied = 0; | ||
ssize_t bytes_read, bytes_written; | ||
|
||
while (bytes_copied < length && (bytes_read = read(fd_in, buffer, sizeof(buffer))) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EINTR
on read? I think this is also better expressed as do { bytes_read = read( ... ) ... snip ... } while (bytes_copied < length);
return errno; | ||
} | ||
} | ||
} while (total_bytes_written < bytes_read && bytes_copied < length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you ask to read MIN(length, sizeof(buffer)
bytes, you can skip bytes_copied < length
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I hadn't thought about that. I will make sure to do so.
if (bytes_written < 0) { | ||
return errno; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've already checked this above, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, removed
if (fstat(fd_in, &file_stat) < 0) { | ||
close(fd_in); | ||
close(fd_out); | ||
return errno; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close
may set or clear errno
, giving the user the wrong reason. Consider this pattern instead:
#ifdef HAVE_FSTAT
/* Don't even bother opening file ... */
return ENOTSUP;
#else
if (fd_in == -1) {
result = errno;
} else {
/* open fd_out etc */
if (fd_out == -1) {
result = errno;
} else {
if (fstat(fd_in, &file_stat) < 0) {
result = errno;
} else {
result = efile_copy_range_int(fd_in, fd_out, length);
}
close(fd_out);
}
close(fd_in);
}
return result;
#endif
(Either that or have a error:
label at the end which the failures can goto
, but this is sufficiently shallow that if/else
may be clearer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If close
will clear errno
then it will be better approach, however ifdef
and else
statements must be swapped (in case of having fstat i should be bothered opening file). If I get it right, then it be like this (if/else for sure clearer)
#ifdef HAVE_FSTAT
do {
fd_in = open((const char*)file_path_in->data, O_RDONLY);
} while (fd_in == -1 && errno == EINTR);
if (fd_in == -1) {
result = errno;
} else {
do {
fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, FILE_MODE);
} while (fd_out == -1 && errno == EINTR);
if (fd_out == -1) {
result = errno;
} else {
if (fstat(fd_in, &file_stat) < 0) {
result = errno;
} else {
length = file_stat.st_size;
result = efile_copy_range_int(fd_in, fd_out, length);
}
close(fd_out);
}
close(fd_in);
}
return result;
#else
return ENOTSUP;
#endif
0d94df1
to
e90fd1e
Compare
Implemented
copy_file_range(2)
linux system cal when available. Github Issue