Skip to content

MDEV-31949 III. Innodb flush_log_later for XA commit,rollback #4054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

andrelkin
Copy link
Contributor

This commit carries changes to Innodb for the user XA's binlog-coordinated
ordered commit. It's the part III of the series aimed at
improving on parallel slave performance and providing crash-recovery.

As the part I has unified binlog coordinated completion (commit and
rollback) of the XA transactions with the normal ones a remained
task is to optimize flush to disk call away from the XA completion execution
in engine like it's done to the normal transaction.
Namely innobase_{rollback,commit}_by_xid()^\footnote{Unlike the normal
transaction case, XA-rollback also needs a similar addressing} now
raise a flag blocking the flush before innobase_commit_low().
The flag is restored upon that, which of course what the normal
transaction does in its execution path past the low-commit.

Tests demonstrate the changes are effectual, e.g
binlog_xa_prepared_disconnect.test " failed ", to require it to
adapted to possible loss of the commit/rollback upon crash.
The actual binlog recovery in such scenarios is only going to be recovered in the part
IV of MDEV-33168.

Feasibility of successful recovery of
XA-"COMPLETE" is based on a plain observation, that in order to decide
whether to commit an XA or not it's sufficient to follow
the normal transaction recovery rule: when xid is found in binlog (that's MDEV-33168)
the transaction gets completed (committed in the normal transaction case).

andrelkin and others added 9 commits May 10, 2025 15:49
…replication/recovery

This commit is the part I of the series of four that addresses
MDEV-31949 in two main directions which are xa parallel slave
performance and xa transaction crash-recovery.

This one improves upon MDEV-742 design's XA binlogging to
facilitate to the crash-recovery (the actual binlog-based recovery
is coming in the part IV of MDEV-33168 et al).

With the refactoring changes, when binlog is ON, handling of execution of
a XA transaction, including binlogging, is made conceptually uniform
with the normal BEGIN-COMMIT transaction.
That is At XA-PREPARE the transaction first is prepared in engines and
after that accumulated replication events are written to the binary
log, naturally without any completion event as it's unknown yet.
When later XA-"COMPLETE" that is XA-COMMIT and XA-PREPARE follows up,
the binary logging of respective Query event takes place first.
One can perceive such scheme as if a normal transaction logging
is split in the middle into two parts (and nothing really happens
in between of them but time passed by). And after the second chunk is sent
to binlog the transaction gets committed (or rolled back) in engine.

With binlog is enabled both phases' loggings go through binlog-group-commit,
where XA-PREPARE "sub-transaction" merely groups for binary logging
so skips the engine action while XA-"COMPLETE" does both, that is the logging
and an ordered "complete". This behavior is also consistent between
completions from the native and external connections.
Being a participant of binlog-group-commit designates either XA phase
is recoverable (not implemented here) from active binlogs determined
by binlog-checkpoint.
For the latter specifically this patch removes custom unlogging of XA-prepare
group. See
   entry.need_unlog= 0
et al in MYSQL_BIN_LOG::write_transaction_to_binlog().

In addition to the above a corner case of engine read-only XA transaction
is addressed. Previously it was streamlined with logging an empty XA-PREPARE
group of binlog events concluded by XA-"COMPLETE" query-event.
Now when a preparing XA transaction is found to have only read-only engine
branches or none it is marked for rollback as XA_RDONLY optimization:
- nothing gets logged at the prepare time an XA_RDONLY note is generated and
- it's rolled back at disconnect
For XA-COMPLETE to tell whether the prepare phase was logged or
not the XID state object is extended with a boolean flag which is a
part internal interface for recovery implementation.
The flag is normally raised by XA-prepare at flushing to binlog and
also at binlog recovery (will be done so it's fully implemented).

Notable changes:

sql/handler.cc
  - ha_prepare()
    a. is ensured to execute binlog_hton::prepare() as
       the last XA's branch for preparing;
    b. engine read-only is marked in the xid state to rollback and
       ER_XA_RDONLY *note* is generated.
  - conversely ha_rollback_trans() executes binlog hton::rollback() as first
    branch (the commit method was already equipped to do so)
  - ditto to the external completion of XA via ha_commit_or_rollback_by_xid();
    the function is made a sort of recursive. It may be first be invoked
    on a top level to take on the binlog hton "completion" to be
    called from its stack once again now having is_xap_binlogged() false,
    so to carry out the engine commit.
  - xarecover_handlerton() now only simulates successful find of the user
    xid in binlog.
sql/log.cc
  - binlog_commit,rollback() et al are simplified and cleaned up (like
    binlog_complete_by_xid() introduction).

    In particular binlog_{commit,rollback}() are rendered to retain
    just a single piece of XA footprint in either. The methods recognize
    naturally empty transaction caches at XA completion to proceed anyway
    into binlog-group-commit thickness;
    the binlog_commit's binlog_commit_flush_trx_cache() decides which
    type of transaction and which XA phase is being handled so a proper
    group event closure is computed.
  - MYSQL_BIN_LOG::trx_group_commit_with_engines() takes care to
    raise or drop XID::binlogged flag via xid_cache_update_xa_binlog_state().
  - the new run_xa_complete_ordered() encapsulates XA specifics at
    execution of the engine ordered commit. It's defined with asserts due to
    MDEV-32455.
    It's not done as TC_LOG::member because of the scope of this work is
    limited. The new function mirrors the logic of the normal run_commit_ordered()
    in that it skips engine completion for those that lack
    hton::commit_ordered() in favor of doing that on the top level
    of ha_commit,rollback_trans().
sql/log_event_server.cc
  - use the transaction cache at XA-PREPARE handling (specifically
    when XA-END Query event is logged).
sql/xa.cc
  - XID_cache_element extended with is-binlogged meaning flag and
    few rating functions added to use by binlogging and recovery
    (xid_cache_update_xa_binlog_state());
  - trans_xa_commit,rollback() external action branches are converted
    into calls of a new largely common function.
sql/xa.h
  - xid_cache_insert(XID *xid, bool is_binlogged) parameter list is
    extended for xa binlog recovery.
Affected binlog,main,rpl suites tests/results due to the empty
XA's rollback optimization.
…_ordered ..

.. otherwise it asserts expecting a non-zero value of ha_info::m_ht in the
function's loop.
However since it's a rollback operation such state is possible as a top-level
resets the ha_info:s of transaction in a loop so some of them can get reset
before binlog_rollback() is called.

Such transaction that is "spawned" to binlog-and-rollback (because may contain
unrollable side-effects) should not have visited TC_LOG::run_prepare_ordered()
in the first place. And it did so 'cos of incorrect cache_mngr->using_xa value.

This is fixed now.
MDEV-32830 base patch missed out supporting XA-COMPLETE
via hton::"complete"_by_xid with multiple engines some of them
are and some are not defined ordered_commit().
The base patch fails with an assert in a path that should have been
unreachable in the first place.
The assert path revealed a second commit-by-xid capable engine (Spider)
was running commit_by_xid() as if it is ordered_commit()-type of
while Spider does not define ordered_commit.
For engines featuring the ordered-commit undefined and commit-by-xid defined
commit_by_xid() is considered as a general so cannot be invoked out of
binlog-group-commit (run_prepare_ordered()).
It would be a full commit similarly to the normal transaction case be invoked
after binlog-group-commit engine loop is done^\footnote{%
also specified in p.5 low-level design points of MDEV-32830 description}.

For that this patch

- extends  commit_or_rollback_xa_engine() with a hint
  for xarollback_handlerton() argument that enables to understand
  which phase of XA completion it's being run.
- elaborates in xacommit_handlerton().
  Specifically to fix this bug, not invoke a defined hton->complete_by_xid when
  the method is to run as ordered-"complete" by engine lacking the
  ordered_commit.
…lling back..

XA in XA_ROLLBACK_ONLY state.

The assert was too strict having not anticipated to face a XA-PREPARE
ending in rolling back.
That's a correct behaviour of read-only disconnecting XA.

Fixed to relax the assert.
A ha_rollback_trans()
 DBUG_ASSERT(thd->lex->sql_command != SQLCOM_XA_ROLLBACK ||
                thd->transaction->xid_state.is_explicit_XA())
expected an XA sql_command match a XID state. However Spider engine can
visit this function even when Spider is not the xa transaction
participant (sic!) through XA ROLLBACK 'xid' from an external connection.
The assert is refined, at least temporarily to satisfy MDEV-32830 testing,
to let the function run an internal Spider's transaction.
CACHE INDEX t PARTITION after XA-PREPARE also this a line previously
assert for another "exceptional" sql command:

...

This is really a variety of MDEV-32455.
The assert did not expect another use case of implicit rollback with
XA transaction while the sql command should have been (I think) rejected with

  XAER_RMFAIL: The command cannot be executed when global transaction is in the  ACTIVE state

The assert is refined and its another instance removed being redundant.
An observed assert in rpl_group_info::unmark_start_commit

 DBUG_ASSERT(!gco->next_gco ||
              gco->next_gco->wait_count > e->count_committing_event_groups ||
              allow_unmark_after_complete);

was overreaction. The reason that execution got to that point was
MDEV-33921 underfixed that issue for the slave parallel mode.

That's done now. The former test is extended to hit the assert by
the (buggy) BASE of this commit.
This commit carries changes to Innodb for the user XA's binlog-coordinated
 ordered commit. It's the part III of the series aimed at
improving on parallel slave performance and providing crash-recovery.

As the part I has unified binlog coordinated completion (commit and
rollback) of the XA transactions with the normal ones a remained
task is to optimize flush to disk call away from the XA completion execution
in engine like it's done to the normal transaction.
Namely innobase_{rollback,commit}_by_xid()^\footnote{Unlike the normal
transaction case, XA-rollback also needs a similar addressing} now
raise a flag blocking the flush before innobase_commit_low().
The flag is restored upon that, which of course what the normal
transaction does in its execution path past the low-commit.

Tests demonstrate the changes are effectual, e.g
binlog_xa_prepared_disconnect.test " failed ", to require it to
adapted to possible loss of the commit/rollback upon crash.
Binlog recovery in such scenarios is going to be recovered in the part
IV of MDEV-33168. The feasibility of successful recovery of
XA-"COMPLETE" is based on a plain observation, that in order to decide
whether to commit an XA or not it's sufficient to follow
the normal transaction recovery rule: when xid is found in binlog
the transaction gets completed (committed in the normal transaction case).
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ andrelkin
❌ emoonrain
You have signed the CLA already but the status is still pending? Let us recheck it.

@andrelkin andrelkin requested review from bnestere, dr-m and knielsen and removed request for bnestere May 21, 2025 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants