Zombie simulated datanode #81
Comments
Thanks for reporting this @fengnanli! I think I asked before, but I don't remember your answer: was this running in a secure environment using LinuxContainerExecutor / cgroups? I think that is what prevents this from happening in our environment.
After running start-dynamometer-cluster.sh and replaying the production audit log for some time, some simulated DataNodes (containers) lost their connection to the RM. When the YARN application is killed, these containers keep running and continue reporting their blocks to the NameNode.
Because these DataNodes' block state was changed by the earlier replay, while the NameNode of the next run starts from a fresh fsimage, the errors below show up in the NameNode web UI after the NameNode starts up.
Checking the Datanodes tab in the NameNode web UI also shows a list of a couple of these DataNodes.
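To confirm which leftover processes belong to a killed application on a NodeManager host, something like the sketch below could help. It is a minimal, hypothetical helper, not part of Dynamometer or YARN: the names `container_prefix`, `orphan_pids`, and `linux_processes` are made up for illustration. It assumes the application ID or one of its container IDs appears in each orphaned process's command line (for example via NodeManager local or log directory paths), which may not hold for every launch setup. It relies on the standard YARN ID formats `application_<clusterTimestamp>_<seq>` and `container_[e<epoch>_]<clusterTimestamp>_<seq>_<attempt>_<id>`, and it only lists PIDs; it does not kill anything.

```python
import os
import re
from typing import Iterable, List, Tuple


def container_prefix(app_id: str) -> str:
    """Map application_<clusterTs>_<seq> to the '<clusterTs>_<seq>_' token
    shared by all container IDs of that application."""
    m = re.fullmatch(r"application_(\d+)_(\d+)", app_id)
    if m is None:
        raise ValueError(f"not a YARN application ID: {app_id}")
    return f"{m.group(1)}_{m.group(2)}_"


def orphan_pids(procs: Iterable[Tuple[int, str]], app_id: str) -> List[int]:
    """Return PIDs whose command line mentions app_id or one of its containers.

    Hypothetical helper; assumes those IDs show up in the command line."""
    # (?!\d) stops application_X_1000 from matching application_X_10001.
    app_re = re.compile(re.escape(app_id) + r"(?!\d)")
    # Container IDs may carry an optional RM epoch segment such as e05_.
    container_re = re.compile(
        r"container_(?:e\d+_)?" + re.escape(container_prefix(app_id))
    )
    return sorted(
        pid for pid, cmd in procs if app_re.search(cmd) or container_re.search(cmd)
    )


def linux_processes() -> List[Tuple[int, str]]:
    """Read (pid, command line) pairs from /proc (Linux only), skipping this
    process so a script invoked with the app ID does not report itself."""
    result = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmd = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        result.append((int(entry), cmd))
    return result
```

On each NodeManager host, `orphan_pids(linux_processes(), app_id)` would then list candidate zombie processes for the killed application. Matching on the container-ID token, rather than on the hostname, matters because many simulated DataNodes share one container host.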