authenticate_user;Hosts do not match regularly found in server log #414

Open
dvandok opened this issue Feb 28, 2017 · 4 comments

@dvandok
Contributor

dvandok commented Feb 28, 2017

We're running TORQUE 4.2.10 and are seeing communication failures at sporadic intervals. The server log shows errors of the following type:

02/28/2017 01:14:19;0004;PBS_Server.15205;Svr;authenticate_user;Hosts do not match: Requested host korf.nikhef.nl: credential host: stremsel.nikhef.nl

There is no rhyme or reason to the host names involved; they can be hosts from which jobs are submitted, the TORQUE server itself, or any of the worker nodes.
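To look for a pattern more systematically, the mismatches can be tallied per (requested host, credential host) pair across the server logs. The sketch below is a hypothetical helper, not part of TORQUE: the regular expression is copied from the single log line quoted above and assumes that exact wording, so other TORQUE versions may need a different pattern.

```python
import re
import sys
from collections import Counter

# Pattern based on the authenticate_user message quoted above (assumed wording).
HOST_MISMATCH = re.compile(
    r"authenticate_user;Hosts do not match: "
    r"Requested host ([^:\s]+): credential host: ([^:\s]+)"
)


def tally_mismatches(lines):
    """Count (requested host, credential host) pairs in PBS_Server log lines."""
    pairs = Counter()
    for line in lines:
        m = HOST_MISMATCH.search(line)
        if m:
            pairs[(m.group(1), m.group(2))] += 1
    return pairs


def main(paths):
    counts = Counter()
    for path in paths:
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with open(path, errors="replace") as f:
            counts.update(tally_mismatches(f))
    for (requested, credential), n in counts.most_common():
        print(f"{n:6d}  requested={requested}  credential={credential}")


if __name__ == "__main__":
    # Hypothetical usage: python tally_mismatches.py <server log files...>
    main(sys.argv[1:])
```

If the same pairs keep recurring, that points towards specific hosts; an even spread across all pairs would support the impression that the host names themselves are not the cause.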

We know that this error can be reproduced consistently when the clock on one of the nodes is wrong: the message apparently carries a timestamp (signed by trqauthd?), so clock skew causes a mismatch in the identity/credential check. We have since made sure all our hosts are using NTP.

I have had a quick look at the code where the checks are done, but found the caching algorithm hard to follow.

@dbeer

dbeer commented Mar 1, 2017

Once you fixed the time skew, did the problem persist?

@dvandok
Contributor Author

dvandok commented Mar 1, 2017 via email

@dbeer

dbeer commented Mar 1, 2017

I wasn't sure what you meant by your reply.

Is there a way that you can reproduce this? FWIW, I think it's very likely that upgrading will fix this issue, but I can't point to a specific changeset. It has been years since we've checked anything other than security fixes into the 4.2-dev tree.

@dvandok
Contributor Author

dvandok commented Mar 2, 2017

It's not easy to reproduce because it's intermittent; however, I do see the same behaviour on our test bed, which is easier for me to debug without disrupting the production system.
