-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[client][fix] Bookie WatchTask may be stuck #4481
[client][fix] Bookie WatchTask may be stuck #4481
Conversation
@merlimat @eolivelli @hangc0276 @dlg99 @zymap @shoothzj PTAL. This problem has certain harmful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
great catch
I have changed the title, this is not an "improvement", but actually a "bugfix"
can we consider to prevent outside call of |
|
According to @hangc0276 suggestion, change the execution thread pool name to: |
@@ -424,6 +427,8 @@ public BookKeeper(ClientConfiguration conf, ZooKeeper zk, EventLoopGroup eventLo | |||
|
|||
// initialize resources | |||
this.scheduler = OrderedScheduler.newSchedulerBuilder().numThreads(1).name("BookKeeperClientScheduler").build(); | |||
this.highPriorityTaskExecutor = | |||
OrderedScheduler.newSchedulerBuilder().numThreads(1).name("BookKeeperWatchTaskScheduler").build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to change the thread pool name.
// Close the watchTask scheduler | ||
highPriorityTaskExecutor.shutdown(); | ||
if (!highPriorityTaskExecutor.awaitTermination(10, TimeUnit.SECONDS)) { | ||
LOG.warn("The highPriorityTaskExecutor for WatchTask did not shutdown cleanly"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need call highPriorityTaskExecutor.shutdownNow()
Motivation
Before understanding the problem solved by this PR, you can try to use this Test method to execute it in the existing master branch, and the unit test will fail.
Next, let me tell you my problem:
Since the thread has been performing this operation, the watch event triggered by ZK has been in the queue of the scheduler thread and has not been executed, and no new watch listeners will be registered in ZK. Finally, the online Bookie node will no longer be updated in the cache of the bk client.
···
BookKeeperClientScheduler-OrderedScheduler-0-0
at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap$Section.removeIf(Ljava/util/function/BiPredicate;)I (ConcurrentOpenHashMap.java:406)
at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap.removeIf(Ljava/util/function/BiPredicate;)I (ConcurrentOpenHashMap.java:172)
at org.apache.bookkeeper.proto.PerChannelBookieClient.checkTimeoutOnPendingOperations()V (PerChannelBookieClient.java:1015)
at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.checkTimeoutOnPendingOperations()V (DefaultPerChannelBookieClientPool.java:132)
at org.apache.bookkeeper.proto.BookieClientImpl.monitorPendingOperations()V (BookieClientImpl.java:572)
at org.apache.bookkeeper.proto.BookieClientImpl.lambda$new$0()V (BookieClientImpl.java:131)
at org.apache.bookkeeper.proto.BookieClientImpl$$Lambda$77.run()V (Unknown Source)
at org.apache.bookkeeper.util.SafeRunnable$1.safeRun()V (SafeRunnable.java:43)
at org.apache.bookkeeper.common.util.SafeRunnable.run()V (SafeRunnable.java:36)
at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run()V (MoreExecutors.java:705)
at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset()Z (FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)Z (ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:617)
at io.netty.util.concurrent.FastThreadLocalRunnable.run()V (FastThreadLocalRunnable.java:30)
at java.lang.Thread.run()V (Thread.java:748)
···
So I suggest that WatchTask provide an internal independent thread pool.
Changes
Add inner thread for WatchTask.