We are running Chronos (master branch) on several clusters. On one of them, we use ZooKeeper 3.5 with authentication and Mesos 1.6.2 with SSL and authentication.
We hit a race condition at startup: if the ZooKeeper auth handshake has not completed yet, Chronos does not even try to register with Mesos.
We first noticed this while testing through a VPN with ~30 ms latency between ZooKeeper, Mesos, and Chronos, but we also see it quite often at lower latencies.
A workaround while testing through the VPN was to set a breakpoint at https://github.com/mesos/chronos/blob/master/src/main/scala/org/apache/mesos/chronos/scheduler/jobs/JobScheduler.scala#L522, wait a few seconds at startup for the ZooKeeper auth to complete, then resume. This works 100% of the time.
How should this be fixed properly? (Wait for the ZooKeeper handshake to finish before starting the election? Restart the election on ZooKeeper (re)auth? Something else?)
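To make the first option concrete, here is a minimal sketch of gating the election start on auth completion. It uses only the JDK and is not Chronos code: the class `AuthGatedElection` and the methods `onZkAuthenticated` and `startElectionWhenReady` are hypothetical names, and how the "authenticated" signal is obtained is an assumption (for example, with SASL the ZooKeeper watcher reports a `SaslAuthenticated` state that could trigger it).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Chronos: delays the leader election
// until the ZooKeeper auth handshake has been reported as complete.
class AuthGatedElection {
    private final CountDownLatch authenticated = new CountDownLatch(1);
    private final AtomicBoolean electionStarted = new AtomicBoolean(false);

    // Assumption: called from whatever callback reports that ZooKeeper auth is done.
    void onZkAuthenticated() {
        authenticated.countDown();
    }

    // Blocks up to the timeout waiting for auth, then runs startElection at most once.
    // Returns false if auth was not reported in time or the wait was interrupted.
    boolean startElectionWhenReady(long timeout, TimeUnit unit, Runnable startElection) {
        try {
            if (!authenticated.await(timeout, unit)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        if (electionStarted.compareAndSet(false, true)) {
            startElection.run();
        }
        return true;
    }
}
```

The breakpoint workaround amounts to the same thing done by hand: the election call waits until auth is done. The second option (restarting the election on re-auth) would need a resettable gate instead of a one-shot latch, which this sketch does not cover.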