Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] ConcurrentSnapshotsIT testQueuedOperationsOnMasterDisconnect failing #108237

Open
iverase opened this issue May 3, 2024 · 1 comment
Open
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs medium-risk An open issue or test failure that is a medium risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@iverase
Copy link
Contributor

iverase commented May 3, 2024

Build scan:
https://gradle-enterprise.elastic.co/s/4bfkfgbtnqwxy/tests/:server:internalClusterTest/org.elasticsearch.snapshots.ConcurrentSnapshotsIT/testQueuedOperationsOnMasterDisconnect

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.snapshots.ConcurrentSnapshotsIT.testQueuedOperationsOnMasterDisconnect" -Dtests.seed=254788E6CC2276E3 -Dtests.locale=de-CH -Dtests.timezone=America/Adak -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
Failure dashboard for org.elasticsearch.snapshots.ConcurrentSnapshotsIT#testQueuedOperationsOnMasterDisconnect

Failure excerpt:

java.util.concurrent.ExecutionException: org.elasticsearch.repositories.RepositoryException: [_all] Failed to update cluster state during repository operation

  at __randomizedtesting.SeedInfo.seed([254788E6CC2276E3:E690D22FFFA1BFCA]:0)
  at org.elasticsearch.action.support.PlainActionFuture$Sync.getValue(PlainActionFuture.java:287)
  at org.elasticsearch.action.support.PlainActionFuture$Sync.get(PlainActionFuture.java:274)
  at org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:93)
  at org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture.get(AbstractClient.java:433)
  at org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture.get(AbstractClient.java:413)
  at org.elasticsearch.snapshots.ConcurrentSnapshotsIT.testQueuedOperationsOnMasterDisconnect(ConcurrentSnapshotsIT.java:703)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

  Caused by: org.elasticsearch.repositories.RepositoryException: [_all] Failed to update cluster state during repository operation

    at org.elasticsearch.snapshots.SnapshotsService.failAllListenersOnMasterFailOver(SnapshotsService.java:2575)
    at org.elasticsearch.snapshots.SnapshotsService$RemoveSnapshotDeletionAndContinueTask.onFailure(SnapshotsService.java:2634)
    at org.elasticsearch.cluster.service.MasterService$ExecutionResult.notifyFailure(MasterService.java:986)
    at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:224)
    at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1656)
    at org.elasticsearch.action.ActionListener.run(ActionListener.java:356)
    at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1653)
    at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1248)
    at org.elasticsearch.action.ActionListener.run(ActionListener.java:356)
    at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1227)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.lang.Thread.run(Thread.java:1583)

    Caused by: org.elasticsearch.cluster.NotMasterException: no longer master


@iverase iverase added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels May 3, 2024
@elasticsearchmachine elasticsearchmachine added Team:Distributed Meta label for distributed team needs:risk Requires assignment of a risk label (low, medium, blocker) labels May 3, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen henningandersen added medium-risk An open issue or test failure that is a medium risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs medium-risk An open issue or test failure that is a medium risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

3 participants