-
-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread safety #182
Comments
Do the tmp files still get cleaned up automatically on close?
Did that patch trigger a Travis CI test run? If not could you set it up so it does. I'd like to be sure all the tests are passing. |
I've put in some documentation, and a pull request to trigger Travis CI. The temporary files get cleaned when the Workbook object goes out of scope. If the workbook writing was triggered by that (e.g. the end of the programme) then nothing has changed. If the workbook writing was triggered by an explicit call to the close method, then destruction of the temporary file is deferred to when the Workbook object goes out of scope. |
Also, could you add a small program here that demonstrates the issues prior to the patch, that I can use for testing. |
This programme demonstrates that the module is not thread-safe. After a few seconds or minutes error messages will start streaming because one thread has done a chdir behind the back of another. #!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use File::Spec;
use threads;
use Thread::Queue;
my $completion_queue = Thread::Queue->new;
use Excel::Writer::XLSX;
-d 'test_output' or mkdir 'test_output' or die $!;
while (1) {
while ( threads->list(threads::running) > 5 ) { # maximum number of workers
$completion_queue->dequeue; # wait for one of them to report completion
$_->join foreach threads->list(threads::joinable);
}
threads->create( \&run_thread ); # start a new worker
}
sub make_workbook {
my ($name) = @_;
my $workbook =
Excel::Writer::XLSX->new(
File::Spec->catfile( 'test_output', $name . '.xlsx' ) );
if ($workbook) {
my $worksheet1 = $workbook->add_worksheet();
$workbook->close;
}
else {
warn "Failed to create workbook; cwd = " . getcwd() . "\n";
}
$workbook;
}
sub run_thread {
map { make_workbook( 'Thread ' . threads->tid . ' book ' . $_ ); } 1 .. 3;
$completion_queue->enqueue(1);
} |
This programme shows how Excel::Writer::XLSX can be run in a thread-safe way after my patches. Object destruction is still not thread-safe (changing File::Temp would have involved too much effort and risk), so there is a locking system using the shared variable $chdir_lock to ensure that object destruction is not run in parallel with any other task. #!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use File::Spec;
use threads;
use Thread::Queue;
my $completion_queue = Thread::Queue->new;
use threads::shared;
my $chdir_lock = 0;
share $chdir_lock;
use Excel::Writer::XLSX;
-d 'test_output' or mkdir 'test_output' or die $!;
my $running_count = 0;
while (1) {
while ( $running_count > 5 || threads->list(threads::running) > 30 ) {
my $completion_message = $completion_queue->dequeue;
--$running_count if $completion_message eq 'Workbooks complete';
$_->join foreach threads->list(threads::joinable);
}
threads->create( \&run_thread );
++$running_count;
}
sub make_workbook {
my ($name) = @_;
{
lock $chdir_lock;
++$chdir_lock;
}
my $workbook =
Excel::Writer::XLSX->new(
File::Spec->catfile( 'test_output', $name . '.xlsx' ) );
if ($workbook) {
my $worksheet1 = $workbook->add_worksheet();
$workbook->close;
}
else {
warn "Failed to create workbook; cwd = " . getcwd() . "\n";
}
{
lock $chdir_lock;
--$chdir_lock;
cond_signal $chdir_lock;
}
$workbook;
}
sub run_thread {
my @books =
grep { $_; }
map { make_workbook( 'Thread ' . threads->tid . ' book ' . $_ ); } 1 .. 3;
$completion_queue->enqueue('Workbooks complete');
{
lock $chdir_lock;
cond_wait $chdir_lock while $chdir_lock > 0;
$_->DESTROY foreach @books;
cond_signal $chdir_lock;
}
$completion_queue->enqueue('Cleanup complete');
} |
Thanks for the examples. I'll need some time to test this properly before merging but I'll get back to you. |
Mistakingly closed. Reopening. |
Hi. I'm facing this issue as well in one of the perl programs that I'm using. Could you please tell me if you were able to carry out any of the tests on this patch? |
Hi f20. Could you please explain in detail or point me to the documentation where I can clearly find out why the issue occurs in a threaded application? |
prasanthbj05, thank you for reminding me that this needed looking at. I have set up a new pull request #261 which has:
Multi-threaded workbook creation requires some additions to the calling code (see the example script) so this approach will presumably not work with Perl's fork emulation on Microsoft Windows. On Windows, use threads, not fork emulation. |
Hi f20, Thank you for setting up this pull request, providing sufficient information on the issue and also suggesting a fix for the same. I used your method of storing the temporary directory object as part of the workbook object and then deleting it later in the DESTROY method of the workbook object (I am testing it in a Perl threaded application which uses 'use threads' to create and handle the threads). I found that though the chdir() invocations have been stopped in the When the sub DESTROY {
my $self = shift;
local($., $@, $!, $^E, $?);
if ($self->unlink_on_destroy && $$ == $self->{LAUNCHPID} && !$File::Temp::KEEP_ALL)
{
if (-d $self->{DIRNAME})
{
# Some versions of rmtree will abort if you attempt to remove
# the directory you are sitting in. We protect that and turn it
# into a warning. We do this because this occurs during object
# destruction and so can not be caught by the user.
eval { rmtree($self->{DIRNAME}, $File::Temp::DEBUG, 0); };
warn $@ if ($@ && $^W);
}
}
} Now, the To get around this, I am planning to pass an additional cleanup argument while creating the temporary directory (in the _store_workbook() method) as below: This will prevent the |
Does my example script examples/multithreaded.pl work on your setup?
If it does, then the trick is the $cleanup_lock_counter shared variable in that example script, which ensures that workbook object destruction only occurs (in each thread) at a time where no other thread is using Excel::Writer::XLSX. This is what I mean by “the Workbook objects must only be destroyed or allowed to go out of scope within a critical code section that only one thread can be running at any one time” in the documentation (probably not the best wording though).
Franck.
This message's text or attachments might be confidential or privileged.
No implied warranty of fitness for any purpose, or of any authority.
… On 28 Oct 2020, at 13:58, prasanthbj05 ***@***.***> wrote:
prasanthbj05, thank you for reminding me that this needed looking at.
I have set up a new pull request #261 which has:
A formal test of multi-threaded workbook creation (which on my MacBook fails on the current release, but passes with my patches both on my machine and on various Travis CI threaded perl versions).
Better POD documentation of my proposed modification.
An example script.
Multi-threaded workbook creation requires some additions to the calling code (see the example script) so this approach will presumably not work with Perl's fork emulation on Microsoft Windows. On Windows, use threads, not fork emulation.
Hi f20,
Thank you for setting up this pull request providing sufficient information on the issue and also suggesting a fix for the same.
I used your method of storing the temporary directory object as part of the workbook object and then deleting it later in the DESTROY method of the workbook object (I am testing it in a Perl threaded application which uses 'use threads' to create and handle the threads).
I found that though the chdir() invocations have been stopped in the File::Find::find by passing an extra 'no_chdir => 1' argument, this still happens while the temporary directory object gets destroyed.
When the delete $self->{_tempdir_object}; statement is executed in the DESTROY method of Workbook.pm, it will trigger the DESTROY method of File::Temp object which looks like (Perl 5.10.1 version):
sub DESTROY { my $self = shift; local($., $@, $!, $^E, $?); if ($self->unlink_on_destroy && $$ == $self->{LAUNCHPID} && !$File::Temp::KEEP_ALL) { if (-d $self->{DIRNAME}) { # Some versions of rmtree will abort if you attempt to remove # the directory you are sitting in. We protect that and turn it # into a warning. We do this because this occurs during object # destruction and so can not be caught by the user. eval { **rmtree($self->{DIRNAME}, $File::Temp::DEBUG, 0);** }; warn $@ if ($@ && $^W); } } }
Now, the rmtree(...) command in the above subroutine again does chdir into various directories resulting in the same errors that were occurring before.
To get around this, I am planning to pass an additional argument 'CLEANUP=>0' while creating the temporary directory (in _store_workbook method) as below:
my $tempdir = File::Temp->newdir( DIR => $self->{_tempdir}, CLEANUP => 0 );
This will prevent the rmtree from getting executed, but the temporary directory still needs to be removed. I checked a little bit and found that File::Path::Tiny is a module that allows to remove directories recursively without doing a chdir(), but am yet to verify that it works. If you're aware of any better methods to remove the temporary directory, please do let me know.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Hi Franck, Yes, the multithread.pl script works in my setup and as you said, the $cleanup_lock_counter shared variable ensures that there is no error. The Perl application I'm using can sometimes have a lot of threads (>100) and hence I didn't want the deletion of the temporary directories to happen serially one at a time (ensured through the locking of the shared variable). This is why I wanted to use the cleanup => 0 method so that the directory deletion can be done simultaneously in all threads (without having to wait for other threads) using some other method which doesn't involve performing chdir. I agree that the difference in run time may not matter when the number of threads is small but it might become more important with a large number of threads. Also, I didn't want the threads to wait for all the other threads to complete creating the workbook since they need to perform other tasks after the workbook creation and hence might get delayed. But thanks a lot for proposing this solution and helping me understand it. It gave me more clarity on the root cause of the issue :) |
This makes sense, and I agree that the lock is a kludge. Unfortunately File::Path::Tiny::rm($dir, $fast) might not be the solution because it does chdir even if $fast is true (the recursive call to rm at line 70 of Tiny,pm version 0.9 does not propagate $fast). Since disabling chdir is only advised "if you know it is safe to do so in your circumstance" and I probably do not fully understand the safety issue, I was reluctant to change that. Hence the locking kludge. In my pull request I have fixed a few typos and added a second example script in which the locking system is replaced with a simpler method that postpones all clean-up to take place in the main thread after all the worker threads have completed. This second approach should save time but use more temporary disk space. |
Yeah I too tried using the File::Path::Tiny and passing the $fast argument didn't help and it continued doing chdir. I agree that the second approach of using a queue to store the references is better in terms of runtime (of the application as a whole). However, I had a doubt about the disk usage though - won't it be using the same amount of disk space ? (Since in both the approaches, the deletion of the directories begins only after all the temporary directories exist on the disk) |
You are right, there were no differences in temporary storage in my previous example. It only matters when using some kind of thread pool where each thread will generate several workbooks. Also my previous Thread::Queue method did not actually work. I have force-pushed again, this time with three examples: in multithreaded1.pl, clean-up happens under lock protection at the end of the program (implies hogging of temporary space and memory for workbook objects); in multithreaded2.pl, clean-up happens under lock protection after making each workbook (implies less efficient processor use, since threads will spend time waiting for the lock); in multithreaded3.pl, clean-up is done manually in a thread-safe way using File::Find, with no locks (probably the best performance but least tested approach). |
Excel::Writer::XLSX does not work with "use threads" or with Perl's fork emulation on Windows. The issue is the use of chdir in finding items to be included in the final ZIP file, and in destroying the temporary folder used within the _store_workbook method of Excel::Writer::XLSX::Workbook.
I have fixed this issue for my own purposes by turning on the no_chdir option to File::Find::find, and storing the File::Temp object in the workbook object so that the calling code can defer the destruction of the File::Temp object (which uses chdir) to a critical code section.
This works in my use case. But I am no expert in this so I appreciate that this might not be a complete or appropriate solution.
If it is of interest, my patch is at https://github.com/f20/excel-writer-xlsx/tree/thread_safe
The text was updated successfully, but these errors were encountered: