YARN-11823: add new endpoints for getting jstacks of application and nodes #7726
base: trunk
💔 -1 overall
This message was automatically generated.
@@ -271,6 +273,35 @@ public ContainerInfo getNodeContainer(@javax.ws.rs.core.Context
}

@GET
@Path("/jstack")
I think this name can be a bit misleading, because we already have a /stacks API for jstack.
Also, how does it differ from /stacks?
Stack and JStack are quite different from each other. JStack is run against a currently running Java process to see what each thread is actually doing, while Stack is just a list of the active methods that have been called.
Here is an example of JStack:
2025-05-06 14:43:45
Full thread dump OpenJDK 64-Bit Server VM (25.232-b09 mixed mode):
"Attach Listener" #36 daemon prio=9 os_prio=0 tid=0x00007f8cf6288800 nid=0x5601e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"shuffle-client-4-1" #35 daemon prio=5 os_prio=0 tid=0x00007f8cd86e6800 nid=0x55f04 runnable [0x00007f8ccfc0e000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000c1402a30> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c1402a48> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c14029e8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at
Here is an example of Stack:
Process Thread Dump:
263 active threads
Thread 5938 (qtp2085713965-5938):
State: RUNNABLE
Blocked count: 2
Waited count: 6
Stack:
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:169)
org.apache.hadoop.http.HttpServer2$StackServlet.doGet(HttpServer2.java:1563)
javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89)
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
Thread 5937 (qtp2085713965-5937):
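For comparison, here is a minimal sketch of how a thread dump that includes lock information can be produced in-process with the standard `java.lang.management` API, without spawning an external tool. The class name `InProcessThreadDump` and the output layout are hypothetical and are not part of this patch; the sketch only illustrates what the JVM exposes through `ThreadMXBean`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the patch under review.
final class InProcessThreadDump {
  private InProcessThreadDump() {}

  /** Returns a plain-text dump of all live threads in the current JVM. */
  static String dump() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // Request lock details only when the JVM supports them; passing true
    // for an unsupported feature throws UnsupportedOperationException.
    ThreadInfo[] infos = bean.dumpAllThreads(
        bean.isObjectMonitorUsageSupported(),
        bean.isSynchronizerUsageSupported());

    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : infos) {
      sb.append('"').append(info.getThreadName()).append('"')
          .append(" id=").append(info.getThreadId())
          .append(" state=").append(info.getThreadState())
          .append('\n');
      for (StackTraceElement frame : info.getStackTrace()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append("    locked monitors: ")
          .append(info.getLockedMonitors().length)
          .append(", locked synchronizers: ")
          .append(info.getLockedSynchronizers().length)
          .append("\n\n");
    }
    return sb.toString();
  }
}
```

This only covers the JVM it runs in (for example the NodeManager itself); dumping threads of separate container processes still needs an external mechanism such as the jstack command discussed here.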
public Response getNodeJStack() {
  try {
    return Response.status(Status.OK)
        .entity(DiagnosticJStackService.collectNodeJStack()) // Make sure the NodeManager have python3 install
What will happen if python3 is not present?
The behavior is quite ambiguous when python3 is not installed. The exception is only shown when I execute the script manually. If I access the endpoint on the RM without python3 installed on the NM, it just returns 'Internal Server Error 500', and the user has to check the corresponding NM to see the error. I will work on making the error less ambiguous.
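One possible way to make the failure less ambiguous, sketched under assumptions: check for the interpreter before launching the script, and put a descriptive message in the response instead of a bare 500. `ExecutableLocator`, `find`, and `describeMissing` are hypothetical names, not part of the patch, and the message wording is only an example.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper, not part of the patch under review.
final class ExecutableLocator {
  private ExecutableLocator() {}

  /**
   * Searches a PATH-style string (directories separated by
   * File.pathSeparator) for an executable file with the given name.
   */
  static Optional<File> find(String name, String pathValue) {
    if (pathValue == null || pathValue.isEmpty()) {
      return Optional.empty();
    }
    for (String dir : pathValue.split(File.pathSeparator)) {
      if (dir.isEmpty()) {
        continue; // skip empty PATH entries
      }
      File candidate = new File(dir, name);
      if (candidate.isFile() && candidate.canExecute()) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }

  /** Builds an error message that names the missing executable. */
  static String describeMissing(String name) {
    return "Cannot collect thread dump: '" + name
        + "' was not found on the NodeManager PATH";
  }
}
```

The endpoint could call `ExecutableLocator.find("python3", System.getenv("PATH"))` first and return the `describeMissing` text as the response entity when the result is empty, so the caller does not have to read the NM logs.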
private static final String PYTHON_COMMAND = "python3";
private static String scriptLocation = null;

static {
This static block will block the NM from starting up until it completes.
In my testing, accessing the JStack endpoint is very fast. Do you happen to have a better idea for getting the script file from the /resources folder?
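One option, sketched under assumptions: resolve the script location lazily on the first request rather than in a static initializer, so that any work involved happens outside class initialization. `LazyValue` is a hypothetical helper, not part of the patch, and the loader passed to it stands in for whatever logic currently lives in the static block.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the patch under review.
final class LazyValue<T> {
  private final Supplier<T> loader;
  private volatile T value;

  LazyValue(Supplier<T> loader) {
    this.loader = loader;
  }

  /**
   * Returns the cached value, running the loader at most once for a
   * non-null result. A null result is not cached, so the loader is
   * retried on the next call.
   */
  T get() {
    T result = value;
    if (result == null) {
      synchronized (this) {
        result = value;
        if (result == null) {
          result = loader.get();
          value = result;
        }
      }
    }
    return result;
  }
}
```

A field such as `new LazyValue<>(() -> resolveScriptLocation())`, where `resolveScriptLocation` is a placeholder for the existing lookup, would then be read with `get()` inside the request handler.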
  }
}

public static String collectNodeJStack()
At first I read this as NodeJS. Can we use another name here?
Sure, I am thinking of changing it to collectNodeThreadDump().
protected static ProcessBuilder createProcessBuilder() {
  List<String> commandList =
      new ArrayList<>(Arrays.asList(PYTHON_COMMAND, scriptLocation));
Why do we need an ArrayList?
Because the ProcessBuilder constructor accepts 'command' as a list :)
public ProcessBuilder(List<String> command) {..}
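For context on the ArrayList question, a small sketch of the two `ProcessBuilder` constructors in the JDK: one takes a `List<String>`, the other takes `String...`. The extra `ArrayList` copy only matters if arguments are appended to the command later, because `Arrays.asList` returns a fixed-size list and the list constructor keeps a reference to the list it is given. `CommandBuilders` and the script path are hypothetical; no process is started here.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the patch under review.
final class CommandBuilders {
  private CommandBuilders() {}

  /** Varargs constructor: the builder keeps its own growable list. */
  static ProcessBuilder viaVarargs(String interpreter, String script) {
    return new ProcessBuilder(interpreter, script);
  }

  /**
   * List constructor with Arrays.asList: the builder uses the
   * fixed-size list directly, so later additions to command() fail.
   */
  static ProcessBuilder viaFixedList(String interpreter, String script) {
    return new ProcessBuilder(Arrays.asList(interpreter, script));
  }
}
```

If the command is never modified after construction, either form works without wrapping in `ArrayList`; if extra arguments (such as a dump count) are added afterwards, the varargs form or an explicit `ArrayList` is needed.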
NUMBER_OF_JSTACK = 3

def get_nodemanager_pid():
I believe that, from a security perspective, these should not be available in the REST API on a non-secure cluster, and we should do authorization on secured clusters.
Hmm, why is that? The script only finds the Java processes of the active containers and runs the JStack command on them. It is not as if the user could modify the script or do something malicious, is it?
import subprocess
import sys

NUMBER_OF_JSTACK = 3
This can be passed through REST.
Good idea, it would be nice to make that number configurable from the REST API.
I will work on that. Thanks!
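A minimal sketch of how such a parameter might be validated on the Java side before it is handed to the script, assuming it arrives as a raw query-string value. `JStackCountParam`, the parameter handling, and the upper bound of 10 are assumptions for illustration; only the default of 3 comes from `NUMBER_OF_JSTACK` in the script.

```java
// Hypothetical helper, not part of the patch under review.
final class JStackCountParam {
  static final int DEFAULT_COUNT = 3; // mirrors NUMBER_OF_JSTACK = 3
  static final int MAX_COUNT = 10;    // assumed upper bound

  private JStackCountParam() {}

  /**
   * Parses the raw query value. Missing or blank input yields the
   * default; non-numeric or out-of-range input is rejected.
   */
  static int parse(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_COUNT;
    }
    final int count;
    try {
      count = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "jstack count must be an integer, got: " + raw, e);
    }
    if (count < 1 || count > MAX_COUNT) {
      throw new IllegalArgumentException(
          "jstack count must be between 1 and " + MAX_COUNT
              + ", got: " + count);
    }
    return count;
  }
}
```

Bounding the value keeps a single request from triggering an arbitrary number of dumps, and rejecting bad input early lets the endpoint answer with a client error rather than a generic server error.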