Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] SingleNodeDiscoveryIT testCannotJoinNodeWithSingleNodeDiscovery failing #106425

Closed
davidkyle opened this issue Mar 18, 2024 · 8 comments · Fixed by #108206
Closed

[CI] SingleNodeDiscoveryIT testCannotJoinNodeWithSingleNodeDiscovery failing #106425

davidkyle opened this issue Mar 18, 2024 · 8 comments · Fixed by #108206
Assignees
Labels
:Core/Infra/Core Core issues without another label low-risk An open issue or test failure that is a low risk to future releases Team:Core/Infra Meta label for core/infra team >test-failure Triaged test failures from CI

Comments

@davidkyle
Copy link
Member

Build scan:
https://gradle-enterprise.elastic.co/s/a5dvvhunctzui/tests/:server:internalClusterTest/org.elasticsearch.discovery.single.SingleNodeDiscoveryIT/testCannotJoinNodeWithSingleNodeDiscovery

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.discovery.single.SingleNodeDiscoveryIT.testCannotJoinNodeWithSingleNodeDiscovery" -Dtests.seed=EDFBA5FF9920092C -Dtests.locale=ga-IE -Dtests.timezone=Pacific/Saipan -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
Failure dashboard for org.elasticsearch.discovery.single.SingleNodeDiscoveryIT#testCannotJoinNodeWithSingleNodeDiscovery

Failure excerpt:

java.lang.AssertionError: 
Expected: (an empty collection or iterable containing [a string starting with "JNDI lookup class is not available because this JRE does not support JNDI. JNDI string lookups will not be available, continuing configuration.", a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "] or iterable containing [a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "])
     but: was <[Attempted to append to non-started appender mock]>

  at __randomizedtesting.SeedInfo.seed([EDFBA5FF9920092C:120F4759CD49A567]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
  at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2147)
  at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:718)
  at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:524)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

@davidkyle davidkyle added :Distributed/Network Http and internode communication implementations >test-failure Triaged test failures from CI labels Mar 18, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added blocker Team:Distributed Meta label for distributed team labels Mar 18, 2024
@pxsalehi
Copy link
Member

This seems like a test issue which may or may not be related to #102516. But I think this is for delivery team maybe. Feel free to relabel if that's not the case.

@pxsalehi pxsalehi added :Delivery/Build Build or test infrastructure and removed :Distributed/Network Http and internode communication implementations labels Mar 26, 2024
@elasticsearchmachine elasticsearchmachine added Team:Delivery Meta label for Delivery team and removed Team:Distributed Meta label for distributed team labels Mar 26, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@mark-vieira
Copy link
Contributor

I don't have much insight into the internal test cluster fixtures. This is an area with fuzzy ownership to be honest but I think core/infra is the most appropriate.

@mark-vieira mark-vieira added the :Core/Infra/Core Core issues without another label label Mar 26, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Mar 26, 2024
@rjernst
Copy link
Member

rjernst commented Apr 11, 2024

I investigated this a bit, but could not come to any conclusions. The assertion is cryptic, but it results from a MockLogAppender not being started when a message is appended to it. Yet that check happens within log4j (AppenderControl), and goes to the status logger. So we don't have any control or ability to fail harder when this happens to find where it is actually occurring.

My only thought for a potential fix is to always start MockLogAppender instances within the constructor. I don't see a downside to this, but perhaps I'm missing something.

@DaveCTurner
Copy link
Contributor

I think it might just not be threadsafe to remove appenders from loggers and stop the appender while other threads might be logging stuff. The following test pretty reliably fails for me:

public void testConcurrentLogAndLifecycle() throws Exception {
    final var keepGoing = new AtomicBoolean(true);
    final var logThread = new Thread(() -> {
        while (keepGoing.get()) {
            logger.info("test");
        }
    });
    logThread.start();

    final var appender = new MockLogAppender();
    for (int i = 0; i < 1000; i++) {
        try (var ignored = appender.capturing(MockLogAppenderTests.class)) {
            Thread.yield();
        }
    }

    keepGoing.set(false);
    logThread.join();
}

@mark-vieira mark-vieira removed the :Delivery/Build Build or test infrastructure label Apr 23, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:Delivery Meta label for Delivery team label Apr 23, 2024
@rjernst rjernst added low-risk An open issue or test failure that is a low risk to future releases and removed blocker labels Apr 24, 2024
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 30, 2024
Many uses of MockLogAppender predate the addition of the helper method
to create a Releasable for restoring logging, which allows the use of
try-with-resource blocks. This commit converts current existing uses of
MockLogAppender to use capturing.

relates elastic#106425
@rjernst
Copy link
Member

rjernst commented Apr 30, 2024

Thanks @DaveCTurner, you are right, it appears modifying appenders is simply not threadsafe. I have an idea which seems to work: adding a single appender during test static init (much like some other appenders we add in ESTestCase). That single appender internally keeps track of mapping between loggers and expections in a threadsafe way. Witht hat change I was able to get your example test above to pass consistently over many runs.

However, there are a lot of places adding the MockLogAppender to appenders directly which needs to be addressed first. I've opened #108114 to do that.

rjernst added a commit that referenced this issue May 2, 2024
Many uses of MockLogAppender predate the addition of the helper method
to create a Releasable for restoring logging, which allows the use of
try-with-resource blocks. This commit converts current existing uses of
MockLogAppender to use capturing.

relates #106425
rjernst added a commit to rjernst/elasticsearch that referenced this issue May 2, 2024
Adding and removing appenders in Log4j is not threadsafe. Yet some tests
rely on capturing logging by adding an in memory appender,
MockLogAppender. This commit makes the mock logging threadsafe by
creating a new, singular appender for mock logging that delegates, in a
threadsafe way, to the existing appenders created. Confusingly
MockLogAppender is no longer really an appender, but I'm leaving
clarifying that for a followup so as to limit the scope of this PR.

closes elastic#106425
rjernst added a commit that referenced this issue May 8, 2024
Adding and removing appenders in Log4j is not threadsafe. Yet some tests
rely on capturing logging by adding an in memory appender,
MockLogAppender. This commit makes the mock logging threadsafe by
creating a new, singular appender for mock logging that delegates, in a
threadsafe way, to the existing appenders created. Confusingly
MockLogAppender is no longer really an appender, but I'm leaving
clarifying that for a followup so as to limit the scope of this PR.

closes #106425
markjhoy pushed a commit to markjhoy/elasticsearch that referenced this issue May 9, 2024
Adding and removing appenders in Log4j is not threadsafe. Yet some tests
rely on capturing logging by adding an in memory appender,
MockLogAppender. This commit makes the mock logging threadsafe by
creating a new, singular appender for mock logging that delegates, in a
threadsafe way, to the existing appenders created. Confusingly
MockLogAppender is no longer really an appender, but I'm leaving
clarifying that for a followup so as to limit the scope of this PR.

closes elastic#106425
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label low-risk An open issue or test failure that is a low risk to future releases Team:Core/Infra Meta label for core/infra team >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants