
When testing Master failover and reattaching failed master getting ReadTopologyInstance error #284

Open
leeparayno opened this issue Nov 3, 2016 · 5 comments


leeparayno commented Nov 3, 2016

I have 3 Percona MySQL 5.6.29-76.2-log instances in separate VirtualBox VMs running CentOS 7.0.

The prior replication configuration was:

mysql56b-2
+ mysql56b-1
+ mysql56b-3

Upon blocking port 3306 on mysql56b-2, failover initiated and mysql56b-1 was made a slave of mysql56b-3. mysql56b-2 no longer appeared in the "mysql56b-2 cluster" topology shown in the Orchestrator UI.

I was attempting to let the old master (mysql56b-2) rejoin the cluster.

I set gtid_purged to the values that mysql56b-2 was showing in gtid_executed.
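For reference, that step usually looks like the following on MySQL 5.6 (a sketch, not taken verbatim from my session; on 5.6, gtid_purged can only be set while gtid_executed is empty, hence the RESET MASTER first):

```sql
-- Sketch: reset binary logs so gtid_executed is empty, then seed
-- gtid_purged with the set the old master had already executed.
RESET MASTER;
SET GLOBAL gtid_purged = '3d83956c-e8a3-11e5-ba83-080027da8259:1-5,
743902dd-97cf-11e6-b0c9-080027a97f61:1-9,
d1da7519-fdb9-11e5-8407-08002720ea52:1-11';
```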

After repointing to mysql56b-3 with CHANGE MASTER, replication appeared, for some reason, to be re-running transactions that had already been executed.
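The repointing step was roughly the following (a sketch; the exact options I used are not reproduced here, but with GTID replication MASTER_AUTO_POSITION lets the slave request whatever it is missing):

```sql
-- Sketch: repoint the old master at the new master using GTID
-- auto-positioning rather than explicit binlog file/position.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'mysql56b-3',
  MASTER_PORT = 3306,
  MASTER_AUTO_POSITION = 1;
START SLAVE;
```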

For all the UUID/GTID combinations in gtid_executed, I created empty transactions, setting gtid_next up to the maximum transaction value that each UUID showed as already executed on that slave. This should essentially leave it ready to connect to the new master, retrieve any new transactions as necessary, and catch up to the other replicas and the new master.
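Per GTID, the empty-transaction injection looks like this (a sketch; the specific GTID here is just one value from the set above). Note that forgetting the final SET back to AUTOMATIC leaves the session in exactly the state the Error 1837 below complains about:

```sql
-- Sketch: mark one GTID as executed by committing an empty transaction
-- under it, then return the session to automatic GTID assignment.
SET gtid_next = 'd1da7519-fdb9-11e5-8407-08002720ea52:11';
BEGIN;
COMMIT;
SET gtid_next = 'AUTOMATIC';
```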

However, Orchestrator was stuck with this error:

ERROR ReadTopologyInstance(mysql56b-2:3306) show global status like 'Uptime': Error 1837: When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'd1da7519-fdb9-11e5-8407-08002720ea52:111'.

mysql56b-2 was showing gtid_next as 'AUTOMATIC' and gtid_purged as the current set of transactions:

3d83956c-e8a3-11e5-ba83-080027da8259:1-5,
743902dd-97cf-11e6-b0c9-080027a97f61:1-9,
d1da7519-fdb9-11e5-8407-08002720ea52:1-11

Note: I tried a few failovers to different nodes and ran transactions, which is why there are received/executed transactions from each of the 3 nodes.

mysql56b-2 was showing now issues with replication in "show slave status" and all appeared to be in sync after reattaching to mysql56b-3.

I couldn't get Orchestrator to refresh the current state until I "forgot" mysql56b-2 and restarted Orchestrator to let it be rediscovered.
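The forget/rediscover step was done through the UI; from the command line it would look roughly like this (assuming the standard orchestrator CLI; the exact invocation below is my sketch, not what I ran):

```shell
# Sketch: drop the stale instance record, then ask orchestrator
# to re-read the instance from scratch.
orchestrator -c forget -i mysql56b-2:3306
orchestrator -c discover -i mysql56b-2:3306
```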

@shlomi-noach
Contributor

I'm not sure I understand if this is an orchestrator problem or a GTID problem. You say orchestrator said:

ERROR ReadTopologyInstance(mysql56b-2:3306) show global status like 'Uptime': Error 1837: When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'd1da7519-fdb9-11e5-8407-08002720ea52:111'.

and then that message only went away when you forgot and rediscovered the host? Or were there further steps in between?

s/mysql56b-2 was showing now issues/mysql56b-2 was showing no issues/g -- correct?

@leeparayno
Author

After I reassigned the old master back as a slave of the new master, I originally got this error in SHOW SLAVE STATUS, but fixed the replication issue by creating empty transactions for all the transactions that had already been executed.

So at the time I was still seeing the ReadTopologyInstance errors, the show slave status on mysql56b-2 was no longer showing any issues.


Correct, there were no more issues with replication.

This makes it look like Orchestrator was caching a previous error and maintaining that state.


@shlomi-noach
Contributor

Thank you. I've never witnessed this kind of behavior before. I will do some digging.

@shlomi-noach
Contributor

Looking slightly more into this, a couple more questions:

  • I assume you saw this error on the orchestrator log, correct? And likely this also showed at the GUI's instance dialog?
  • Other than this error showing up, did orchestrator fail to read the instance? To show the topology?

@leeparayno
Author

Yes it was in the orchestrator log.

In the GUI, it was reporting the old replication error on the instance. It looked like orchestrator was failing to read the instance's current state: the topology was updated to show the new position as a slave of the new master, but the instance was not shown as replicating correctly.

