[BUG] Tag Mismatch error on VisualizationsToolIT.testVisualizationFound Windows Test #3347

Open
brianf-aws opened this issue Jan 8, 2025 · 11 comments
Labels
bug Something isn't working

Comments

@brianf-aws
Contributor

What is the bug?

public void testVisualizationNotFound() throws IOException, ParseException {
    String requestBody = "{\"parameters\":{\"question\":\"can you show me RAM info with visualization?\"}}";
    Response response = TestHelper
        .makeRequest(client(), "POST", "/_plugins/_ml/agents/" + agentId + "/_execute", null, requestBody, null);
    String responseStr = TestHelper.httpEntityToString(response.getEntity());
    String toolOutput = extractAdditionalInfo(responseStr);
    Assert.assertEquals("No Visualization found", toolOutput);
}

A retry is enabled on the VisualizationsToolIT.testVisualizationFound test, but the retry seems to have a flaw: it does not help if the underlying problem is a different one. Here the problem appears to be an encryption issue. This might be the source of all of our flaky tests.
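
To illustrate the point about the retry: retrying only helps when the failure is transient. The sketch below is a hypothetical helper (`SelectiveRetry` is not part of ml-commons, and it is not how the existing retry is implemented). It retries only the failures that a caller-supplied predicate marks as transient and rethrows everything else on the first attempt, so an error such as the `AEADBadTagException` in the log below would surface immediately instead of being hidden behind repeated attempts. Which errors count as transient is an assumption left to the caller.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper (not part of ml-commons): retries only failures judged transient. */
final class SelectiveRetry {
    private SelectiveRetry() {}

    /**
     * Runs the action up to maxAttempts times.
     * A failure that isTransient rejects is rethrown at once, without further attempts.
     * If every attempt fails with a transient error, the last one is rethrown.
     */
    static <T> T call(Callable<T> action, int maxAttempts, Predicate<Exception> isTransient) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isTransient.test(e)) {
                    // Not worth retrying: report the root cause right away.
                    throw e;
                }
                last = e; // transient: remember it and try again
            }
        }
        throw last; // non-null here, because at least one attempt ran and failed
    }
}
```

A caller could pass, for example, `e -> e instanceof java.io.IOException` as the predicate, so that connection problems are retried while any other exception fails the test on the first attempt with its original stack trace.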

VisualizationsToolIT > testVisualizationFound FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:54529/], URI [/_plugins/_ml/agents/bLjaRJQB515KRnslfdUv/_execute], status line [HTTP/1.1 500 Internal Server Error]
    {"status":500,"error":{"type":"AEADBadTagException","reason":"System Error","details":"Tag mismatch"}}
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:182)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:155)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:144)
        at app//org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound(VisualizationsToolIT.java:74)

    java.lang.AssertionError: The response failed to meet condition after 5 attempts. Attempted to perform GET : /_plugins/_ml/models/arjaRJQB515KRnsleNWv
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.waitResponseMeetingCondition(ToolIntegrationWithLLMTest.java:103)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.checkForModelUndeployedStatus(ToolIntegrationWithLLMTest.java:89)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.deleteModel(ToolIntegrationWithLLMTest.java:74)
        at

... 

2> REPRODUCE WITH: gradlew ':opensearch-ml-plugin:integTest' --tests "org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound" -Dtests.seed=AD7A0603B7C68274 -Dtests.security.manager=false -Dtests.locale=fr-GN -Dtests.timezone=America/Argentina/Buenos_Aires -Druntime.java=21
  2> org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:54529/], URI [/_plugins/_ml/agents/bLjaRJQB515KRnslfdUv/_execute], status line [HTTP/1.1 500 Internal Server Error]
    {"status":500,"error":{"type":"AEADBadTagException","reason":"System Error","details":"Tag mismatch"}}
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:182)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:155)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:144)
        at app//org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationFound(VisualizationsToolIT.java:74)

    java.lang.AssertionError: The response failed to meet condition after 5 attempts. Attempted to perform GET : /_plugins/_ml/models/arjaRJQB515KRnsleNWv
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.waitResponseMeetingCondition(ToolIntegrationWithLLMTest.java:103)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.checkForModelUndeployedStatus(ToolIntegrationWithLLMTest.java:89)
        at org.opensearch.ml.tools.ToolIntegrationWithLLMTest.deleteModel(ToolIntegrationWithLLMTest.java:74)
        at

How can one reproduce the bug?
This was discovered in a build failure.

What is the expected behavior?
This test should pass or time out, but it should not hit this encryption issue.

@brianf-aws added the bug (Something isn't working) and untriaged labels on Jan 8, 2025
@mingshl
Collaborator

mingshl commented Jan 9, 2025

Seeing VisualizationIT fail again, but with a different error: #3353

@brianf-aws
Contributor Author

> Seeing VisualizationIT fail again, but with a different error: #3353

Hmm, this is confusing. I would have thought the retry would help, but as you said, it didn't. It is clearly failing even though the number of retries is based on how many nodes there are. If only there were some way to dump all available info and configuration when this happens.

@brianf-aws
Contributor Author

Hey @Hailong-am do you mind taking a look? Thanks

@krisfreedain
Member

Catch All Triage - 1, 2, 3

@Hailong-am
Contributor

Hailong-am commented Jan 28, 2025

> Hey @Hailong-am do you mind taking a look? Thanks

Do you have the link or the logs for this failure?

@brianf-aws
Contributor Author

Hey Hailong, we are trying to paste the stack traces together with the reproduction line as well. Thankfully this build failure log hasn't expired. Can you take a look?

@brianf-aws
Contributor Author

Adding the log here in txt format so it doesn't expire

6_Build and Test MLCommons Plugin on linux (21).txt

Here is another example of another build failure.
Linking the txt file here as well to make sure it does not expire.

7_Build and Test MLCommons Plugin on linux (21).txt

@pyek-bot
Contributor

@Hailong-am did you get a chance to look at this? any update?

@Zhangxunmt moved this to Backlog in ml-commons projects on Feb 11, 2025
@Hailong-am
Contributor

> @Hailong-am did you get a chance to look at this? any update?

Looking at the attached logs:
[testVisualizationNotFound] The 6-th attempt on GET:/_plugins/_ml/models/UNT-S5QBXi7OW4I7mZRp . response: Response{requestLine=GET /_plugins/_ml/models/UNT-S5QBXi7OW4I7mZRp HTTP/1.1, host=http://[::1]:38269, response=HTTP/1.1 200 OK}

The Tag mismatch error happened in the model deploy phase, not in the get model API, so I assume the Tag mismatch error is not the cause of the flaky test this time.

We may need to add some logs to see what the actual response body of the get model API is.

Suppressed: javax.crypto.AEADBadTagException: Tag mismatch
2025-01-09T16:56:16.8490050Z »  		at java.base/com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt.doFinal(GaloisCounterMode.java:1545) ~[?:?]
2025-01-09T16:56:16.8491643Z »  		at java.base/com.sun.crypto.provider.GaloisCounterMode.engineDoFinal(GaloisCounterMode.java:417) ~[?:?]
2025-01-09T16:56:16.8492770Z »  		at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2244) ~[?:?]
2025-01-09T16:56:16.8494108Z »  		at com.amazonaws.encryptionsdk.internal.JceKeyCipher.decryptKey(JceKeyCipher.java:129) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8495801Z »  		at com.amazonaws.encryptionsdk.jce.JceMasterKey.decryptDataKey(JceMasterKey.java:165) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8497882Z »  		at com.amazonaws.encryptionsdk.DefaultCryptoMaterialsManager.decryptMaterials(DefaultCryptoMaterialsManager.java:118) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8500076Z »  		at com.amazonaws.encryptionsdk.internal.DecryptionHandler.readHeaderFields(DecryptionHandler.java:621) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8502066Z »  		at com.amazonaws.encryptionsdk.internal.DecryptionHandler.<init>(DecryptionHandler.java:111) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8503830Z »  		at com.amazonaws.encryptionsdk.internal.DecryptionHandler.create(DecryptionHandler.java:302) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8505549Z »  		at com.amazonaws.encryptionsdk.AwsCrypto.decryptData(AwsCrypto.java:511) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8507014Z »  		at com.amazonaws.encryptionsdk.AwsCrypto.decryptData(AwsCrypto.java:502) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8508505Z »  		at com.amazonaws.encryptionsdk.AwsCrypto.decryptData(AwsCrypto.java:476) ~[aws-encryption-sdk-java-2.4.1.jar:?]
2025-01-09T16:56:16.8510156Z »  		at org.opensearch.ml.engine.encryptor.EncryptorImpl.decrypt(EncryptorImpl.java:97) ~[opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8512230Z »  		at org.opensearch.ml.engine.algorithms.remote.RemoteModel.lambda$initModel$0(RemoteModel.java:104) ~[opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8514467Z »  		at org.opensearch.ml.common.connector.HttpConnector.decrypt(HttpConnector.java:366) ~[opensearch-ml-common-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8516333Z »  		at org.opensearch.ml.engine.algorithms.remote.RemoteModel.initModel(RemoteModel.java:104) [opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
2025-01-09T16:56:16.8518000Z »  		at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:139) [opensearch-ml-algorithms-2.19.0.0-SNAPSHOT.jar:?]
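
For context on the exception itself: the trace above ends in `Cipher.doFinal` inside the JDK's `GaloisCounterMode`, which is where AES-GCM verifies the authentication tag during decryption. The sketch below uses only the standard `javax.crypto` API (it does not use the AWS Encryption SDK or any ml-commons class, and `GcmTagMismatchDemo` is a hypothetical name). It shows one way this exception can arise: decrypting with a key other than the one used for encryption. Corrupted ciphertext or mismatched additional authenticated data produce the same exception, so this is not a claim about the actual cause of the failure in this issue.

```java
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical demo class: AES-GCM decryption rejects data encrypted under another key. */
final class GcmTagMismatchDemo {
    private static final int TAG_BITS = 128; // GCM authentication tag length
    private static final int IV_BYTES = 12;  // commonly recommended GCM nonce size

    private GcmTagMismatchDemo() {}

    static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    static byte[] newIv() {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(plaintext); // ciphertext with the tag appended
    }

    static byte[] decrypt(SecretKey key, byte[] iv, byte[] ciphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(ciphertext); // throws AEADBadTagException if the tag check fails
    }

    /** Returns true if decrypting with the given key fails the GCM tag check. */
    static boolean failsTagCheck(SecretKey key, byte[] iv, byte[] ciphertext) throws Exception {
        try {
            decrypt(key, iv, ciphertext);
            return false;
        } catch (AEADBadTagException e) {
            return true;
        }
    }
}
```

Encrypting with one key from `newKey()` and calling `failsTagCheck` with a second, independently generated key returns `true`, while the original key decrypts normally.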

@pyek-bot
Contributor

Thanks for the update! Would you be willing to take that up? (adding logs)

@Hailong-am
Contributor

Hailong-am commented Feb 13, 2025

> Thanks for the update! Would you be willing to take that up? (adding logs)

Sure, I will do two things: first, add some logs to record the response body; second, keep trying locally to see whether I can reproduce the error.
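
A minimal sketch of what logging the response body could look like, under stated assumptions: `PollWithDiagnostics` is a hypothetical stand-in, not the existing `waitResponseMeetingCondition` in `ToolIntegrationWithLLMTest`, and the HTTP call is abstracted as a `Supplier<String>` that returns the response body. The idea is to print the body on every failed attempt and to include the last body in the final assertion message, so a failed build shows what the get model API actually returned.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical polling helper that keeps the last response body for the failure message. */
final class PollWithDiagnostics {
    private PollWithDiagnostics() {}

    /**
     * Calls fetchBody until condition accepts the body or maxAttempts is reached.
     * Returns the accepted body; otherwise throws an AssertionError that contains the last body seen.
     */
    static String waitFor(Supplier<String> fetchBody, Predicate<String> condition,
                          int maxAttempts, long sleepMillis) throws InterruptedException {
        String lastBody = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            lastBody = fetchBody.get();
            if (condition.test(lastBody)) {
                return lastBody;
            }
            // Log every unsuccessful attempt so the CI log shows how the response evolved.
            System.out.printf("Attempt %d did not meet the condition. Response body: %s%n", attempt, lastBody);
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        throw new AssertionError("The response failed to meet condition after " + maxAttempts
            + " attempts. Last response body: " + lastBody);
    }
}
```

In a test, `fetchBody` could wrap the GET on `/_plugins/_ml/models/<model id>` and `condition` could check the model state in the returned JSON; the exact field to check is an assumption and should be taken from the real response.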

Projects
Status: Backlog
Development

No branches or pull requests

5 participants