Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use copy_file_range(2) when available #7908

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

RomaniukVadim
Copy link

Implemented copy_file_range(2) linux system cal when available. Github Issue

Copy link
Contributor

github-actions bot commented Nov 27, 2023

CT Test Results

No tests were run for this PR. This is either because the build failed, or the PR is based on a branch without GH actions tests configured.

Results for commit 91b4350

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@CLAassistant
Copy link

CLAassistant commented Nov 27, 2023

CLA assistant check
All committers have signed the CLA.

@RomaniukVadim RomaniukVadim force-pushed the file_copy_file_range branch 3 times, most recently from 6021d44 to d05f253 Compare November 27, 2023 18:45
@RomaniukVadim RomaniukVadim marked this pull request as ready for review November 27, 2023 19:35
@garazdawi
Copy link
Contributor

Hello! Thanks for the PR! I haven't done any review of this yet, but I noticed that you have placed the nif in kernel. All file related nifs are placed in erts/emulator/nifs + erts/preloaded/src/prim_file and so should this be.

@RomaniukVadim
Copy link
Author

RomaniukVadim commented Nov 27, 2023

@garazdawi Hello! If this NIF and its related file should be placed in erts, I will put them there correspondingly and update my commit

@RomaniukVadim RomaniukVadim force-pushed the file_copy_file_range branch 2 times, most recently from 7a22b67 to f65b6b2 Compare November 28, 2023 17:29
@RomaniukVadim
Copy link
Author

RomaniukVadim commented Nov 28, 2023

@garazdawi I moved nif and file to erts folder. For now I see what functions already exist for file open/read/write/error handling

@RomaniukVadim RomaniukVadim force-pushed the file_copy_file_range branch 3 times, most recently from 4771d9d to 1ddca56 Compare November 28, 2023 22:38
@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Nov 29, 2023
@RomaniukVadim RomaniukVadim force-pushed the file_copy_file_range branch 3 times, most recently from 2c63f44 to 91b4350 Compare November 29, 2023 10:11
Copy link
Contributor

@jhogberg jhogberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

We appreciate your efforts, however the issue you've tried to fix is very ambitious for a first contribution as the file subsystem has many unexpected wrinkles, and guiding you through them will take considerable time.

That's not to say that we reject this, just that you need to be prepared that this may take a long time and we might not always be available to respond in a timely manner as we've got a lot to finish before the first release candidate for OTP 27. I hope that's alright, I've left a few comments to get started :-)

@@ -102,6 +102,9 @@ static ERL_NIF_TERM read_file_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM a
static ERL_NIF_TERM open_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]);
static ERL_NIF_TERM close_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]);

static ERL_NIF_TERM copy_file_range_available_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want runtime tests for whether APIs exist, please move this over to using a configure test. You can put it near the test for writev.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking availability in configure.ac now. Kind of a dumb question: How do you regenerate the configure file? Running autoconf configure.ac inside the erts folder gives me an error: /usr/bin/m4:configure.ac:28: cannot open 'otp.m4': No such file or directory.

}
}

static ERL_NIF_TERM copy_file_range_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like exposing copy_file_range(2) directly like this, I'd prefer it to be a plain "copy file" operation, ideally of a whole file because the alternative is so rare and it would let us use CopyFileExW on Windows.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to copy_file_nif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment, RabbitMQ (for example) could greatly benefit from a copy_file_range API that didn't copy the entire file. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's add both, keeping the specifics (fallback etc) in the platform-specific parts. :-)

Copy link
Author

@RomaniukVadim RomaniukVadim Dec 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhogberg Should we treat the third argument (length) as optional in the existing function, or create a separate function called for example copy_part_file/3? In Windows, we utilize CopyFileW(3), but for partial copying, It will require a different approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a separate function, and think copy_range would be a good name since file is a bit redundant given the module name (file:copy_part_file ...).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhogberg I have implemented a new function called copy_range/3. Additionally, I've added efile_copy_range_int/3, which contains code utilized by both efile_copy_file/2 and efile_copy_range/3. I haven't found any similar internal shared functions in the primary file. Therefore, if this approach is incorrect, I can redo it without using the internal function. Implemented copy_range/3 for Windows API, but not sure if implementation is correct, tried to do in same style al other functions in win_prim_file.c. Also will recheck for error handling, have a feeling that I forget to check something.

@@ -907,7 +907,10 @@ copy_int(Source, {_DestName, DestOpts} = Dest, Length)
%% Both must be bare filenames. If they are not,
%% the filename check in the copy operation will yell.
copy_int(Source, Dest, Length) ->
copy_int({Source, []}, {Dest, []}, Length).
case prim_file:copy_file_range_available() of
true -> prim_file:copy_file_range(Source, Dest, Length);
Copy link
Contributor

@jhogberg jhogberg Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks when copying files that have already been opened (which do not necessarily have to be opened through this module), and does not honor the file server at all (which may live on another node).

When passed two file names, this should delegate the copy operation to the file server (which should in turn delegate it to another process so as to not block all other file server operations). Otherwise, I think it should just use the current fallback because not having to care about weird edge cases in the NIF like copying a named file to an io_device() simplifies things a lot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I removed this code from file.erl in order to understand how file server works, and should work in this case

erts/emulator/nifs/common/prim_file_nif.c Outdated Show resolved Hide resolved
efile_close(d_in, &ignored);
efile_close(d_out, &ignored);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing without moving the file into CLOSED state will crash the debug emulator.

Suggested change
efile_close(d_in, &ignored);
efile_close(d_out, &ignored);
erts_atomic32_set_acqb(&d_in->state, EFILE_STATE_CLOSED);
efile_close(d_in, &ignored);
erts_atomic32_set_acqb(&d_out->state, EFILE_STATE_CLOSED);
efile_close(d_out, &ignored);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked all file close cases and moved all of them to the 'CLOSED' state before the actual closure.

efile_unix_t *u_in = (efile_unix_t*)d_in;
efile_unix_t *u_out = (efile_unix_t*)d_out;

if (!u_in || !u_out || u_in->fd < 0 || u_out->fd < 0) {
Copy link
Contributor

@jhogberg jhogberg Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant since we're entirely in control of the arguments, these things are better expressed with ASSERT(...).

erts/emulator/nifs/common/prim_file_nif.c Outdated Show resolved Hide resolved
erts/emulator/nifs/common/prim_file_nif.c Outdated Show resolved Hide resolved
lib/kernel/src/kernel.app.src Outdated Show resolved Hide resolved
@RomaniukVadim
Copy link
Author

@jhogberg Thanks for reviewing my code. I'm aware this isn't as straightforward as I initially thought, but I'm committed to fixing all the issues and learning how this should be done in OTP. I hope to get my PR merged someday when the code is as close to done as possible. It would be great if someone from the contributors could check my comments from time to time as well.

@RomaniukVadim RomaniukVadim force-pushed the file_copy_file_range branch 2 times, most recently from f959551 to 99296b8 Compare November 29, 2023 18:04
len = (off64_t)info.size;

if (len < 0) {
return enif_make_badarg(env);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be EINVAL, and the source should be closed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code no more, moved to unix_prim_file.c with closing in case this fd_out and returning errno.

Comment on lines 1437 to 1424
/* We need to open Source file in order to get file size*/
if ((posix_errno = efile_open(&file_path_in, EFILE_MODE_READ, efile_resource_type, &d_in))) {
return posix_error_to_tuple(env, posix_errno);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if this was entirely moved to the platform-specific implementations, e.g. efile_copy_file taking paths instead of file handles as arguments. You would then be able to use plain open(2) instead of the more clunky efile_open, or CopyFileW on Windows.

As a fallback for when copy_file_range(2) is unavailable you could have a simple read(2)+write(2) loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all platform-specific implementations to unix_prim_file.c and added fallback function with simple read/write loop.

@jhogberg jhogberg self-assigned this Dec 4, 2023
Comment on lines 1421 to 1431
#ifdef HAVE_COPY_FILE_RANGE
/* Copy file contents using unix SYS_copy_file_range syscall */
if ((posix_errno = efile_copy_file_range(&file_path_in, &file_path_out, &ret))) {
return posix_error_to_tuple(env, posix_errno);
}
#else
/* As a fallback for when copy_file_range(2) is unavailable wecould do a simple read(2)+write(2) loop. */
if ((posix_errno = efile_simple_copy_file(&file_path_in, &file_path_out, &ret))) {
return posix_error_to_tuple(env, posix_errno);
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simpler if you moved this logic over to unix_prim_file / win_prim_file, where the latter just uses CopyFileW. Just have efile_copy_file as the "canonical" interface from here.

Adding Windows support should just be a matter of copying a routine like efile_make_dir and changing the call to CopyFileW, all the hairy stuff with paths has already been taken care of in efile_marshal_path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for Windows

Comment on lines 180 to 181
posix_errno_t efile_copy_file_range(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result);
posix_errno_t efile_simple_copy_file(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
posix_errno_t efile_copy_file_range(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result);
posix_errno_t efile_simple_copy_file(const efile_path_t *file_path_in, const efile_path_t *file_path_out, off64_t *result);
posix_errno_t efile_copy_file(const efile_path_t *source, const efile_path_t *destination);

erts/emulator/nifs/unix/unix_prim_file.c Outdated Show resolved Hide resolved
Comment on lines 1165 to 1166
bytes_written = write(fd_out, buffer, bytes_read);
if (bytes_written != bytes_read) {
Copy link
Contributor

@jhogberg jhogberg Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that you get to write less than you read (for example if you get interrupted after having read a few bytes, the write will successfully return the number of bytes written instead of EINTR), so this is not an error per se. Loop until all bytes have been written (and, of course, handle EINTR on its own).

return errno;
}

while ((bytes_read = read(fd_in, buffer, sizeof(buffer))) > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about EINTR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exist in efile_copy_range_int inside while look

}

do {
fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, 0644);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, 0644);
fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, FILE_MODE);

erts/preloaded/src/prim_file.erl Outdated Show resolved Hide resolved
lib/kernel/src/Makefile Outdated Show resolved Hide resolved
return posix_error_to_tuple(env, posix_errno);
}
#endif
return enif_make_tuple2(env, enif_make_atom(env, "ok"), enif_make_uint(env, (unsigned int)ret));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size is not interesting here, so let's just return ok on success.

Suggested change
return enif_make_tuple2(env, enif_make_atom(env, "ok"), enif_make_uint(env, (unsigned int)ret));
return am_ok;

Also, you truncated ret here which is not what you want. Sizes >4GB would have been reported as modulo 4GB.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning as 'am_ok' now. However, both file:copy/2 and file:copy/3 return {ok, BytesCopied}. So, why are we only returning 'ok'? Considering the user uses copy_range/3, we can return the same length that they passed into the function. But what should we do in the case of copy_file/2 when only Source and Destination passed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I forgot that file:copy/2 is already documented as {ok, BytesCopied} -- it wasn't introduced by this PR.

But what should we do in the case of copy_file/2 when only Source and Destination passed?

Hm, I think file:copy/2 should either do a range copy (when available) or use the fallback. The current interface of file:copy/2 means that we cannot use CopyFileW or equivalent, so a separate function may be warranted.

I plan on refactoring the current file interface into a new module that has all our lessons learned baked into a cleaner interface (no idea when though), so we can either live with this until then or forget about CopyFileW for now, I'll leave that up to you. :-)

Copy link
Author

@RomaniukVadim RomaniukVadim Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for not using CopyFileW in the case of file:copy/2? When using Filename as arguments, we can employ it. However, when using pid() or fd(), we can't utilize it. Nevertheless, we can fallback to the already implemented copy in Erlang, right? Moreover, for Unix systems, copy_file/2 and copy_range/3 currently handle only Filenames. There's a need to extend support to handle fd() and pid() as well, aligning with the functionality of file/copy/{2,3}. While the focus isn't primarily on CopyFileW, the objective is to enhance the performance of file copying on Unix systems. If a solution can be devised to handle pid(), fd(), and Filename as parameters for copy_file/2 and copy_range/3, then surely a solution will be explored for Windows as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for not using CopyFileW in the case of file:copy/2?

Not getting an accurate count of bytes copied. We can only check before and after, and it might have changed in the meantime.

However, when using pid() or fd(), we can't utilize it. Nevertheless, we can fallback to the already implemented copy in Erlang, right? Moreover, for Unix systems, copy_file/2 and copy_range/3 currently handle only Filenames. There's a need to extend support to handle fd() and pid() as well, aligning with the functionality of file/copy/{2,3}.

fd() and pid() involve communicating with other processes that might not be wrapping a file in the first place. It's better to just leave all that to the Erlang fallback.

Comment on lines 637 to 645
Name1 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_1.txt"),
Line = "The quick brown fox jumps over a lazy dog. 0123456789\n",
Len = length(Line),
{ok, Handle1} = file:open(Name1, [write]),
{_, Size1} =
iterate({0, 0},
done,
fun({_, S}) when S >= 128*1024 ->
done;
({N, S}) ->
H = integer_to_list(N),
ok = file:write(Handle1, [H, " ", Line]),
{N + 1, S + length(H) + 1 + Len}
end),
file:close(Handle1),
%% Make a copy
Name2 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_2.txt"),
{ok, Size1} = ?PRIM_FILE:copy_file(Name1, Name2),
ok
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit complicated and doesn't prove that the copy worked, only that the size is correct. How about something like...?

Suggested change
Name1 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_1.txt"),
Line = "The quick brown fox jumps over a lazy dog. 0123456789\n",
Len = length(Line),
{ok, Handle1} = file:open(Name1, [write]),
{_, Size1} =
iterate({0, 0},
done,
fun({_, S}) when S >= 128*1024 ->
done;
({N, S}) ->
H = integer_to_list(N),
ok = file:write(Handle1, [H, " ", Line]),
{N + 1, S + length(H) + 1 + Len}
end),
file:close(Handle1),
%% Make a copy
Name2 = filename:join(RootDir, atom_to_list(?MODULE)++"_copy_2.txt"),
{ok, Size1} = ?PRIM_FILE:copy_file(Name1, Name2),
ok
Source = filename:join(RootDir, atom_to_list(?MODULE)++"_cf_source"),
Destination = filename:join(RootDir, atom_to_list(?MODULE)++"_cf_destination"),
Slug = <<"The quick brown fox jumps over a lazy dog. 0123456789\n">>,
Data = << <<N:32/native, Slug/bits>> || N <- lists:seq(1, 1 bsl 16) >>,
ok = file:write_file(Source, Data),
ok = ?PRIM_FILE:copy_file(Source, Destination),
{ok, Data} = file:read_file(Destination),
ok

(Untested, but hopefully you get the gist 🙂)

@RomaniukVadim RomaniukVadim force-pushed the file_copy_file_range branch 3 times, most recently from ae125a9 to 5fe9c77 Compare December 10, 2023 16:05
Comment on lines 1598 to 1601
/* If this parameter is TRUE and the new file specified by lpNewFileName already exists, the function fails.
* If this parameter is FALSE and the new file already exists, the function overwrites the existing file and succeeds.
* For safety reason, we using TRUE as default value.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file:copy/2 behaves as if passing FALSE. Cut this comment and pass FALSE instead.

off64_t bytes_copied = 0;
ssize_t bytes_read, bytes_written;

while (bytes_copied < length && (bytes_read = read(fd_in, buffer, sizeof(buffer))) > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EINTR on read? I think this is also better expressed as do { bytes_read = read( ... ) ... snip ... } while (bytes_copied < length);

return errno;
}
}
} while (total_bytes_written < bytes_read && bytes_copied < length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ask to read MIN(length, sizeof(buffer) bytes, you can skip bytes_copied < length here.

Copy link
Author

@RomaniukVadim RomaniukVadim Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I hadn't thought about that. I will make sure to do so.

Comment on lines 1118 to 1122
if (bytes_written < 0) {
return errno;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've already checked this above, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, removed

if (fstat(fd_in, &file_stat) < 0) {
close(fd_in);
close(fd_out);
return errno;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close may set or clear errno, giving the user the wrong reason. Consider this pattern instead:

#ifdef HAVE_FSTAT
    /* Don't even bother opening file ... */
    return ENOTSUP;
#else
    if (fd_in == -1) {
        result = errno;
    } else {
        /* open fd_out etc */
        if (fd_out == -1) {
            result = errno;
        } else {
            if (fstat(fd_in, &file_stat) < 0) {
                result = errno;
            } else {
                result = efile_copy_range_int(fd_in, fd_out, length);
            }
            
            close(fd_out);
        }

        close(fd_in);
    }

    return result;
#endif

(Either that or have a error: label at the end which the failures can goto, but this is sufficiently shallow that if/else may be clearer)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If close will clear errno then it will be better approach, however ifdef and else statements must be swapped (in case of having fstat i should be bothered opening file). If I get it right, then it be like this (if/else for sure clearer)

#ifdef HAVE_FSTAT
   do {
       fd_in = open((const char*)file_path_in->data, O_RDONLY);
   } while (fd_in == -1 && errno == EINTR);

    if (fd_in == -1) {
           result = errno;
       } else {
           do {
               fd_out = open((const char*)file_path_out->data, O_CREAT | O_WRONLY | O_TRUNC, FILE_MODE);
           } while (fd_out == -1 && errno == EINTR);

           if (fd_out == -1) {
               result = errno;
           } else {
               if (fstat(fd_in, &file_stat) < 0) {
                   result = errno;
               } else {
                   length = file_stat.st_size;
                   result = efile_copy_range_int(fd_in, fd_out, length);
               }
               
               close(fd_out);
           }

           close(fd_in);
       }

   return result;
#else
   return ENOTSUP;
#endif 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants