Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

application start failed when muti javaagent is provided in cmd and opentelemetry agent is placed at the back #12534

Open
pepeshore opened this issue Oct 30, 2024 · 4 comments
Labels
bug Something isn't working needs triage New issue that requires triage

Comments

@pepeshore
Copy link

Describe the bug

When starting an application like this java -javaagent:/path/to/another/agent.jar -javaagent:/path/to/opentelemetry-javaagent.jar -jar /path/to/app.jar, In some case, there would be an Exception like this
java.lang.NoClassDefFoundError: io/opentelemetry/javaagent/instrumentation/internal/classloader/BootDelegationInstrumentation$Holder at com.taobao.csp.ahas.starter.SandboxClassLoader.loadClass(SandboxClassLoader.java:44) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520) at com.taobao.csp.ahas.starter.initializer.LogbackInitializer.init(LogbackInitializer.java:44) at com.taobao.csp.ahas.starter.AgentLauncher.launch(AgentLauncher.java:79) at com.taobao.csp.ahas.starter.AgentStarter.premain(AgentStarter.java:41) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:491) at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:503) Caused by: java.lang.ClassNotFoundException: io.opentelemetry.javaagent.instrumentation.internal.classloader.BootDelegationInstrumentation$Holder at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520) ... 11 more

different case has different exception stack but all them says that io/opentelemetry/javaagent/instrumentation/internal/classloader/BootDelegationInstrumentation$Holder can not be loaded

Steps to reproduce

I can't provide the code for the other agent and app.jar since it's not open source, but I have identified the cause of the bug.

When another agent is placed before the OpenTelemetry agent, it initializes first. This leads to the loading of a large number of classes, such as those from the Netty and OkHttp frameworks, some of which are exactly the classes the OpenTelemetry agent needs to enhance. By the time the OpenTelemetry agent initializes, it has to perform a retransform operation on these classes. Normally, it would only need to retransform some internal JDK classes, but in this scenario, it also needs to retransform the mentioned framework classes.

ByteBuddy handles this at a low level through the java.lang.instrument.Instrumentation#retransformClasses method, which is invoked once with all the classes that need to be retransformed. The issue is that if any single class fails to retransform, the subsequent classes will not be retransformed. In this particular scenario, I noticed an error occurred during the retransform of the io.netty.channel.DefaultChannelPipeline class, resulting in a VerifyError. Consequently, no retransform was performed on the SystemClassLoader (jdk.internal.loader.BuiltinClassLoader).

In summary, the opentelemetry agent successfully enhances com.taobao.csp.ahas.starter.SandboxClassLoader by io.opentelemetry.javaagent.instrumentation.internal.classloader.BootDelegationInstrumentation, but due to the reason mentioned above, the enhancement of jdk.internal.loader.BuiltinClassLoader is not triggered. When the loadClass method of com.taobao.csp.ahas.starter.SandboxClassLoader is called, it attempts to load the class BootDelegationInstrumentation$Holder. Since com.taobao.csp.ahas.starter.SandboxClassLoader is loaded by the SystemClassLoader (jdk.internal.loader.BuiltinClassLoader), it attempts to be loaded by this class loader. However, because this class loader was not enhanced, it cannot load the BootDelegationInstrumentation$Holder class, leading to an exception.

I personally believe that if the retransform on the SystemClassLoader (jdk.internal.loader.BuiltinClassLoader) is not effective, it is a significant issue that could lead to many unexpected behaviors. The proper functioning of almost all enhancements relies on this working correctly. At worst, if it doesn’t function properly, other enhancements should also not work, otherwise it could lead to application startup failures.

Expected behavior

no exception is thrown

Actual behavior

an ClassNotFound exception is thrown

Javaagent or library instrumentation version

2.9.0

Environment

JDK:
OS:

Additional context

No response

@pepeshore pepeshore added bug Something isn't working needs triage New issue that requires triage labels Oct 30, 2024
@pepeshore
Copy link
Author

In normal case, SandBoxClassLoader and BuiltinClassLoader are all instrumented by opentelemetry java agent
image

when SandboxClassLoader.loadClass is invoked, class BootDelegationInstrumentation$Holder in line 5 will be trigger to loaded by BuiltinClassLoader.loadClass and line 42 will return the expected value。

In bad case, Only SandBoxClassLoader is instrumented by opentelemetry java agent
image

when SandboxClassLoader.loadClass is invoked, class BootDelegationInstrumentation$Holder in line 5 will be trigger to loaded by BuiltinClassLoader.loadClass too, but this time, no class named BootDelegationInstrumentation$Holder can be found

@pepeshore
Copy link
Author

I can fix this case by two ways

  1. place openetelemetry java agent at before
  2. at the hook method io.opentelemetry.javaagent.tooling.AgentInstaller.RedefinitionLoggingListener.onError retranform all the classes one by one to avoid one failure leads to all failures

@trask
Copy link
Member

trask commented Oct 30, 2024

hi @pepeshore! I'd recommend going with (1) if that resolves your issue

you can see our general policy on supporting OpenTelemetry Java agent alongside other Java agents at #1534 (comment)

@laurit
Copy link
Contributor

laurit commented Nov 1, 2024

As far as I understand the root cause of the failure is the VerifyError. Did you investigate why that happens?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New issue that requires triage
Projects
None yet
Development

No branches or pull requests

3 participants