[FEAT] Add documentation on /etc/machine-id volume mapping to support Folding At Home client v8.x #23
Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.
What you are calling "web client" is and always has been called "web control". The name of the repository is a historical issue and is more of a tag soup. There was an experimental "web client" that ran in Chrome using Google Native Client code.
It is true that fah-client acts as both client and server. It is also true that the web app connects to a fah-client and a couple of servers and is therefore a client of sorts. However, the user guide and other docs call it web control, and I hope to reduce confusion with users by restricting use of "client" to mean fah-client.
@kbernhagen understood, I am still fairly new to FaH, so I appreciate the clarification.
Note that the client dumping work units after a machine-id change is not a problem if one always sets finish on work. I realize there is no easy way to do this yet. Ironically, the behavior was requested by someone using cloned containers, I believe.
Mapping it wouldn't be the correct way to handle it. If machine-id is required, we can generate one and store it in /config for persistence.
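A minimal sketch of what that could look like, assuming a persistent /config volume. The function name and the path parameters are purely illustrative; a real init script would operate on /config/machine-id and /etc/machine-id directly:

```shell
#!/bin/sh
# Hypothetical sketch: generate a machine-id once, keep it on the
# persistent /config volume, and copy it into place on every container
# start, so recreating the container keeps the same machine identity.
ensure_machine_id() {
    config_dir="$1"   # persistent volume, e.g. /config
    target="$2"       # e.g. /etc/machine-id
    if [ ! -s "$config_dir/machine-id" ]; then
        # systemd machine-ids are 32 lowercase hex chars; the kernel's
        # random uuid with its hyphens removed matches that shape
        tr -d '-' < /proc/sys/kernel/random/uuid > "$config_dir/machine-id"
    fi
    cp "$config_dir/machine-id" "$target"
}
```

Because the generated ID lives on the /config volume, destroying and recreating the container reuses the same identity instead of minting a fresh one.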
That sounds much better, thanks!
I can't reproduce the issue as described. When I recreate the container with the same env vars and config folder, it is detected as the same machine by the web app.
My containers contain a blank /etc/machine-id.
Is this issue only affecting resuming partial jobs?
I can't reproduce it with resuming either. I recreate the container and it picks up from where it left off.
TL;DR - it's a "me issue", root cause is some form of missing dependency (on my side?) for Core 23 WUs. I apologize for the false alarm.
My FaH containers have been generating new machine-ids, but your comment helped me dig more into why, thank you @aptalca. When I was running the old containers prior to the major v8 changes, I was getting stuck in a bad state. I remember doing a lot of digging into the issue at the time; tons of forum threads mentioned it, but no solutions worked for me. I somehow found that installing an extra package via an init script worked around it. Removing that init script today fixes the problem.
Nvidia just needs the container toolkit and the Nvidia drivers on the host. All the OpenCL bits should be injected via the nvidia docker runtime.
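For reference, a sketch of that setup from the host side, assuming the NVIDIA Container Toolkit is installed; the image name and config path are placeholders:

```shell
# Host side: NVIDIA driver + nvidia-container-toolkit installed.
# --gpus all asks the NVIDIA runtime to inject the driver/OpenCL bits
# into the container, so nothing NVIDIA-specific lives in the image.
docker run -d --gpus all \
  -v /path/to/config:/config \
  linuxserver/foldingathome
```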
I have the package installed. Inside the containers I can get the following output:
Is it possible that package is the culprit? I notice this was changed from a different package at some point. I am not familiar with OpenCL/GPU libs, but it seems like something I have seen before.
The dev package has the headers that are necessary to build other packages dependent on this; it shouldn't be needed at runtime. When I tested Nvidia with this image a while back, all that was needed inside the image was this pointer file:
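If memory serves, that pointer file is the standard OpenCL ICD registration. A sketch of creating it; the directory is a parameter only so the snippet is self-contained, the real location is /etc/OpenCL/vendors:

```shell
#!/bin/sh
# Sketch of the OpenCL ICD "pointer file": the ICD loader scans the
# vendors directory for *.icd files, each containing the name of a
# vendor OpenCL library to load. For NVIDIA that library is
# libnvidia-opencl.so.1, injected by the nvidia docker runtime.
write_nvidia_icd() {
    vendors_dir="$1"   # normally /etc/OpenCL/vendors
    mkdir -p "$vendors_dir"
    printf 'libnvidia-opencl.so.1\n' > "$vendors_dir/nvidia.icd"
}
```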
I thought the same, that it shouldn't be needed at runtime. Could the core be doing some build/compilation that Core 22 wasn't doing? I tried running the debug FAHClient, but it didn't provide anything useful. Everything I see points me at that package for some reason, though. It still doesn't explain why something like systemd would've also solved my problem; maybe some kind of shared dependency? Any idea what Nvidia driver / CUDA version combo you were using in your testing? I can try to give those a shot.
I think I've further narrowed it down to a single package. When installing just that one, the issue goes away.
@tylerbrockett built from PR #24 |
@aptalca - I tried on that branch as well as the specific image version mentioned, and both are working as expected now for Core 23 WUs. Thank you so much! (PPD is gradually increasing and should stabilize around 4.3m or so) |
Is this a new feature request?
Wanted change
There should be a blurb discussing the need to map /etc/machine-id from the Docker host into the Docker container for the Folding At Home client v8.x to work properly (or to copy it in the Dockerfile, but that is less flexible if we ever wanted to mock a different machine ID).
Reason for change
Folding at Home client v8.x requires creating an account and associating your machine with that account to be able to view folding progress. This is due to the way the FaH team changed web control to be a public server that the folders/contributors/machines report to, instead of each machine running its own web control.
The machine ID is indirectly linked to each work unit in such a way that if the machine ID changes, it breaks all associations. This is a problem because each new Docker container gets a randomly generated /etc/machine-id. Each time the container is recreated, web control would no longer see my machine (it would show "Disconnected"). See the following log entries:
I traced this machine ID to this line of code in the FaH client, which calls this line of code in the cbang/os/SystemInfo library to get the machine ID from /etc/machine-id on Linux systems.
By mapping /etc/machine-id into the container, I am able to destroy and recreate the container as needed, and it continues to work with web control as expected.
Proposed code change
No response
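For illustration, the mapping requested in this issue would be a one-line addition to the run command; the image name and config path are placeholders:

```shell
# Bind-mount the host's machine-id read-only so the FaH v8 client
# reports a stable machine identity across container recreations.
docker run -d \
  -v /etc/machine-id:/etc/machine-id:ro \
  -v /path/to/config:/config \
  linuxserver/foldingathome
```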