authenticate_user;Hosts do not match regularly found in server log #414

dvandok · 2017-02-28T10:18:56Z

We're running torque 4.2.10 and we're seeing communication failures at sporadic intervals. The server log shows the following type of error:

02/28/2017 01:14:19;0004;PBS_Server.15205;Svr;authenticate_user;Hosts do not match: Requested host korf.nikhef.nl: credential host: stremsel.nikhef.nl

There is no rhyme or reason to the names of the hosts; they could be hosts from which jobs are submitted, the torque server itself, or any one of the worker nodes.
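One way to check whether there really is no pattern is to tally the mismatches per host pair and per hour. The following is a minimal sketch, not part of torque: the regular expression is derived only from the single sample line above, so it assumes other mismatch lines share that layout, and the function name `tally_mismatches` is made up for illustration.

```python
import re
from collections import Counter

# Built from the one sample log line quoted above; other torque versions
# may format this message differently (assumption).
MISMATCH_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2});"
    r".*;authenticate_user;Hosts do not match: "
    r"Requested host (?P<requested>\S+): credential host: (?P<credential>\S+)"
)


def tally_mismatches(lines):
    """Count 'Hosts do not match' errors per host pair and per hour.

    `lines` is any iterable of log lines, e.g. an open file object.
    Returns two Counters: one keyed by (requested, credential) host,
    one keyed by 'MM/DD/YYYY HH'.
    """
    pairs = Counter()
    hours = Counter()
    for line in lines:
        match = MISMATCH_RE.match(line.strip())
        if match is None:
            continue  # not a host-mismatch line
        pairs[(match["requested"], match["credential"])] += 1
        hours[f'{match["date"]} {match["time"][:2]}'] += 1
    return pairs, hours
```

Calling `tally_mismatches(open(path))` on a server log and printing `pairs.most_common()` and `sorted(hours.items())` should show whether particular hosts or times of day dominate; where the log lives depends on the installation.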

We know that this error reproduces consistently when the clock on one of the nodes is wrong: apparently the message is signed (by trqauthd?) with a timestamp, and the skew causes a mismatch in the identity/credential check. We have since made sure all our hosts are using NTP.

I have had a sidelong glance at the code where the checks are done, but I found the caching algorithm hard to understand.

dbeer · 2017-03-01T16:49:36Z

Once you fixed the timeskew, does the problem persist?

dvandok · 2017-03-01T19:47:04Z

On 01-03-17 17:49, David Beer wrote:
> Once you fixed the timeskew, does the problem persist?

Yes, that's the point; the timeskew *always* shows the problem, but without the timeskew it happens *intermittently*.

dbeer · 2017-03-01T20:56:17Z

I wasn't sure what you'd meant from what you said.

Is there a way that you can reproduce this? FWIW, I think it's very likely that upgrading will fix this issue, but I can't point to a specific changeset. It has been years since we've checked anything other than security fixes into the 4.2-dev tree.

dvandok · 2017-03-02T13:00:23Z

It's not easy to reproduce as it's intermittent; however, I do see the same behaviour on our test bed, which is easier for me to debug without causing disruptions to the production system.
