Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix InactiveDestroy #175

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

rovo89
Copy link

@rovo89 rovo89 commented Nov 28, 2022

These commits fix the "fast" cleanup when attribute InactiveDestroy is set.

@rovo89
Copy link
Author

rovo89 commented Nov 28, 2022

I'm still looking for a good way to avoid $sth->finish() being called on destroy when InactiveDestroy is set. This is important when trying to disconnect without fetching all pending results and possible in some cases where a child process is forked and joined while fetching results.

Without this change, the END block of DBI would call disconnect_all()
and that would call mysql_close() on all open connections. This is bad
because forks would destroy their parent's context on exit as shown in
the example for AutoInactiveDestroy in the DBI docs.

This example fails with "DBD::MariaDB::db do failed: Lost connection to
MySQL server during query" without this fix:

  $dbh->{'AutoInactiveDestroy'} = 1;
  if (my $pid = fork()) {
    waitpid($pid, 0);
    $dbh->do('SELECT 1');
  } else {
    exit(0);
  }
DBI already calls dbd_st_finish() before dbd_st_destroy() if the handle
is active. For inactive handles no finish() should be necessary. With
InactiveDestroy set, the handle is always considered inactive. Skipping
finish() could lead to "out of sync" situations and inconsistencies,
however, the attribute is explicitly meant to skip all DB-related
destroy actions, especially for the case when the connection will be
closed anyway.

The only thing which is not cleaned up by finish() is the last result.

This commit fixes the following example:

 $dbh->{'AutoInactiveDestroy'} = 1;
  my $sth = $dbh->prepare('SELECT id FROM bcwb_maint_types; SELECT SLEEP(1); SELECT text FROM bcwb_maint_types;');
  $sth->execute();
  my $i = 0;
  do {
    printf "Result %d\n", ++$i;
    while (my $id = $sth->fetchrow_array()) {
      print $id, "\n";
    }
    print "---\n";
    if ($i == 1) {
      if (my $pid = fork()) {
        waitpid($pid, 0);
      } else {
        exit(0);
      }
    }
  } while ($sth->more_results);
@rovo89 rovo89 marked this pull request as ready for review November 29, 2022 15:43
@rovo89
Copy link
Author

rovo89 commented Nov 29, 2022

@pali Could you have a look please? Thanks!

@pali
Copy link
Member

pali commented Nov 29, 2022

I have looked at it but I'm really not sure if this work correctly. There are lot of pitfalls and edge cases. I think that I had in my mind some solution for this issue but I have never implemented it. Moreover such feature must be covered by tests and currently project lacks big CI testing with different Perl versions, different mysql/mariadb client versions and server versions. Travis is already gone, so it needs to be replaced by for example Github actions. And just now I fixed AppVeyor CI for Windows builds. So I would propose to postpone this change after CI matrix testing is again working to prevent any breakage. I'm not really sure that changing code related to PL_dirty does not cause segfault in some library combination.

@rovo89
Copy link
Author

rovo89 commented Nov 30, 2022

I can understand the need for tests. Even for a single database though, it will be quite hard to verify correctness.

For a statement handle destroy, I think it's required to:

  • Bring the DB into a state where it can execute other statements again (e.g. read pending rows and results from the socket)
  • For server side prepare: Tell the database were not going to use the statement anymore (mysql_stmt_close)
  • Free structures allocated via MariaDB API (e.g. mysql_free_result, mysql_stmt_free_result)
  • Free DBI-related structures

For a database handle destroy:

  • Destroy all statements (should be ensured by Perl reference counting)
  • Tell the database we're going to disconnect (mysql_close)
  • Free structures allocated via MariaDB API (but I don't think there are any for the database handle?)
  • Free DBI-related structures

If InactiveDestroy is set, any communication with the database must be skipped. The socket is shared, so if one process implicitly destroys all DBI objects on exit, it must be ensured that it does not:

  • Read anything (!) from the socket (e.g. the pending results), otherwise the parent process will wait for the data infinitely.
    A side-effect is that the exit will be faster because it doesn't need to process potentially huge results. I had a case where I set InactiveDestroy even on the parent process. Processing millions of rows takes a while, and if the user pressed Ctrl+C it would need to fetch all pending rows from the database before destroying everything. Setting the attribute made the process finish immediately.
  • Tell the database to destroy anything statement- or database-related, because it's still in use by the parent process.
  • Tell the database we're going to disconnect, otherwise the server closes the connection and the parent process gets a "Lost connection" error.

The problem is that I don't think there is a way to check whether communication over the socket has happened (or not), whether there is still anything allocated on DB side or via the MariaDB API (or not, if we still need it) and whether DBI structures are still allocated.

I think we can only test the side-effects, i.e. connect, send a query, fork, wait for the child to exit, and see if we can still fetch the results. That's basically the two examples in my commit messages. I can try to pack that into a test case. A little problem is that with the implicit $sth->finish(), the child process has already read the pending result from the socket, and the parent process will hang polling for that exact same information. Maybe a timeout would do?

These test cases can only verify though that not too much was freed, but not if something was missed to be freed.

@rovo89
Copy link
Author

rovo89 commented Nov 30, 2022

I'm not really sure that changing code related to PL_dirty does not cause segfault in some library combination.

As far as I understood the comment, the purpose was to avoid accessing $dbh during global destruction as it might have already been destroyed. However the only remaining code doesn't need the database handle, it just frees the memory allocated by the MariaDB API. It wouldn't hurt much to keep the check because the whole process will anyway be destroyed milliseconds later, but I don't think it's necessary.

DBI already takes care to skip the $sth->finish() call when it's not safe/required:

  • PL_dirty
  • database handle inactive
  • statement handle inactive
  • InactiveDestroy set on the statement handle (which effectively turns the active flag off during destroy)

Point of the commit is not to duplicate this work, but only free what might still be allocated.

@pali
Copy link
Member

pali commented Dec 4, 2022

  • Tell the database we're going to disconnect (mysql_close)

This is an issue. You must call mysql_close (or similar function) to release memory allocated for $dbh even when InactiveDestroy is set. Otherwise you introduce memory leak.

And I think this is the reason why InactiveDestroy was not properly implemented yet.

@bugfood
Copy link

bugfood commented Dec 28, 2022

Thank you for your work on this. I ran into this bug with my own code and wrote a test script (thinking I was going to write a new bug report).

My test script was showing two different types of failure, apparently at random:

  • The child closes the connection to the server, resulting in the parent failing, as reported in this PR.
  • The child reports an error instead: panic: DBI active kids (-1) < 0 or > kids (0)

I tested this PR (cherry-picked on top of current master), and the child no longer closes the connection, which is good. I still get the error from the child about half the time, though. I don't know if this is a separate bug or if it is somehow related. It doesn't happen at all with DBD::mysql.

Here is my test script (renamed as .txt to get around github's filtering):
forktest.pl.txt

Here is a good example run, using patched DBD::MariaDB::

$ strace -f -o /tmp/strace.ok.log ./forktest.pl -d MariaDB -f -a
connect to: DBI:MariaDB:;host=localhost
child 480267 : initiating normal exit
parent 480266 : child process complete
parent 480266 : exit

strace.ok.log

Here is a failing example run, also using patched DBD::MariaDB:

$ strace -f -o /tmp/strace.bad.log ./forktest.pl -d MariaDB -f -a
connect to: DBI:MariaDB:;host=localhost
child 480286 : initiating normal exit
panic: DBI active kids (-1) < 0 or > kids (0) at /usr/lib/x86_64-linux-gnu/perl5/5.36/DBI.pm line 759.
END failed--call queue aborted.
parent 480285 : child process complete
parent 480285 : exit

strace.bad.log

I'm using perl v5.36.0 (Debian package 5.36.0-6) with DBI version 1.643 (Debian package 1.643-4).

Thanks,
Corey

@pali
Copy link
Member

pali commented Aug 26, 2023

@rovo89 you can use a hack/trick with invalidating file descriptor before calling mysql_close() when you need connection to stay open but want to free mysql client resources. Current master branch already contains similar hack/trick for Windows builds due to dichotomy between file descriptor and socket handle in Perl API and Windows API. This could be a way how to properly implement InactiveDestroy.

@cbrandtbuffalo
Copy link

We use this feature in Request Tracker, so we can add some possible additional test cases. We are working on a branch to support running with DBD::MariaDB and have gotten most of our tests passing. One remaining failure is one that relies on InactiveDestroy. We have some code that uses it here:

https://github.com/bestpractical/rt/blob/6b3a2158b5d502a9115a59efb5c550050f9b72f2/lib/RT/Util.pm#L60

And tests here:

https://github.com/bestpractical/rt/blob/5.0/db-type-mariadb/t/api/safe-run-child-util.t

Was InactiveDestroy working in the past in DBD::mysql and something changed in MariaDB?

@choroba
Copy link
Member

choroba commented Feb 12, 2024

@cbrandtbuffalo: The code comments suggest InactiveDestroy never worked properly in DBD::mysql, either. There were resource leaks preventing the connection from closing (regardless of InactiveDestroy). The leaks were fixed in DBD::MariaDB, which means the connection is now always closed.

@choroba
Copy link
Member

choroba commented Feb 12, 2024

@cbrandtbuffalo
Copy link

Yes, I can take a look. All of our tests run in github actions, so the environment is available in the RT github repo. To test, it looks like I'll need to create a new build of the base Debian docker image using your PR. I'll comment if I'm able to run a test.

@cbrandtbuffalo
Copy link

The RT tests pass for me when running your branch locally on a Mac. For some reason, the tests still fail in github actions using this docker image: https://hub.docker.com/layers/bpssysadmin/rt-base-debian/RT-5.0.3-buster-testmariadb2-20240215/images/sha256-838a7cb36dad1a2883c30dab9ac6e94cf30d354efdabe2214cf122d9fb5241a5?context=repo I'm still working on debugging. It's possible the build isn't correctly grabbing and building the branch.

@cbrandtbuffalo
Copy link

We fixed the docker image and can confirm the branch allows our RT tests to pass. You can see the green checkmark here: https://github.com/bestpractical/rt/tree/5.0/db-type-mariadb. As noted before, we also see passing tests running locally on Mac OS X. We also confirmed it passed on Ubuntu.

@cbrandtbuffalo
Copy link

@pali Do you have other thoughts on this branch based on the testing noted above? Our use case for this feature is we have a running web server process (RT) which can fork other processes. When the forked process exits, we don't want it to disconnect the active DB connection the main RT process is using, so we use this flag.

@cbrandtbuffalo
Copy link

I noticed some activity with a few commits from @Tux and @choroba so I wanted to ping this PR just in case there might be a release in the future. I know people have had limited time, so I'm happy to help if there's anything do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants