
Validation of Ceph cluster fails due to Unexpected playbook failure. Check ansible-runner-service directory #85

Open
aasraoui opened this issue Jan 8, 2021 · 7 comments

aasraoui commented Jan 8, 2021

Ceph Installer - Cockpit-ceph-installer.pdf

pcuzner commented Jan 11, 2021

Could you drop a screenshot into the issue instead of a PDF, please? PDFs don't render, and they could be mangled to do nasty stuff.

Until then, some basic checks:

  • I think current ceph-ansible has a validate role which requires ansible 2.9 ... is that in place?
  • Are you using master of the installer?
  • Playbooks write to /usr/share/ansible-runner-service/artifacts, so if you hit an error you can pick the playbook's run directory out of the error message and look at the stdout file in that folder (this will be the regular ansible output from the playbook run - unless things have gone really wrong!). See the sketch below.
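
A minimal sketch of that check (the <run-uuid> placeholder is whatever directory name shows up in your error message):

# each playbook run gets its own directory under artifacts/
ls /usr/share/ansible-runner-service/artifacts/
# the plain-text ansible output for a run lives in its stdout file
cat /usr/share/ansible-runner-service/artifacts/<run-uuid>/stdout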

@aasraoui

Below is a capture of the stdout log:

Identity added: /usr/share/ansible-runner-service/artifacts/6e68835e-51b8-11eb-8c06-080027191e45/ssh_key_data (/usr/share/ansible-runner-service/artifacts/6e68835e-51b8-11eb-8c06-080027191e45/ssh_key_data)
[WARNING]: log file at /root/ansible/ansible.log is not writeable and we cannot create it, aborting

PLAY [Validate hosts against desired cluster state] ****************************

TASK [CEPH_CHECK_ROLE] *********************************************************
Friday 08 January 2021  13:50:18 +0000 (0:00:00.274)       0:00:00.274 ********
ok: [Metrics]
ok: [Rgw]
ok: [Mds]
ok: [Osd]
[WARNING]: Unhandled error in Python interpreter discovery for host Mon:
Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
fatal: [Mon]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"Mon\". Make sure this host can be reached over ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}

PLAY RECAP *********************************************************************
Mds     : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
Metrics : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
Mon     : ok=0  changed=0  unreachable=1  failed=0  skipped=0  rescued=0  ignored=0
Osd     : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
Rgw     : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0

Friday 08 January 2021  13:56:20 +0000 (0:06:01.955)       0:06:02.230 ********

CEPH_CHECK_ROLE ------------------------------------------------------- 361.95s


aasraoui commented Jan 11, 2021

I can ssh to the Mon node; not sure why it is not reachable!
[root@Cockpit-ceph-installer ceph-ansible]# ssh Mon
Last login: Sun Jan 10 20:27:27 2021 from 10.0.0.113
[root@Mon ~]#

pcuzner commented Jan 11, 2021

What's strange is that you added the host first. The act of adding a host confirms that the ssh key the installer uses is in the authorized_keys file on the target. So at some point 'mon' was accessible using the installer's public key, but right now it doesn't appear to be. Checking with a root login to mon is misleading, since the installer uses its own key - unless you provided your keys to the installer.

Next steps (see the sketch below):
  • compare your authorized_keys file on mon to one of the osd or rgw hosts
  • try connecting manually using the private key in /usr/share/ansible-runner-service/env/ssh_key (i.e. use -i /usr/share/ansible-runner-service/env/ssh_key)
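
A sketch of both checks, assuming root ssh from the installer host works against the other nodes:

# 1. compare the installer's key entry on a healthy node against mon
diff <(ssh root@Osd cat /root/.ssh/authorized_keys) <(ssh root@Mon cat /root/.ssh/authorized_keys)

# 2. connect with the installer's own private key
ssh -i /usr/share/ansible-runner-service/env/ssh_key root@Mon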

@aasraoui

The authorized_keys on the Osd node is different from the Mon node; a manual connection to Mon with the private key works:
[root@Cockpit-ceph-installer .ssh]# ssh root@Mon -i /usr/share/ansible-runner-service/env/ssh_key
root@mon's password:
Last login: Mon Jan 11 04:35:12 2021 from 10.0.0.113
[root@Mon ~]#

aasraoui commented Jan 12, 2021

I have updated the Mon node with the same authorized key as the other nodes; now validation is failing because there are no OSDs on the cluster!
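
For reference, one way to push the installer's public key onto a node (a sketch - the ssh_key.pub path is an assumption that the public half sits next to the private key):

ssh-copy-id -f -i /usr/share/ansible-runner-service/env/ssh_key.pub root@Mon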

[screenshot: host roles selected in the installer]

pcuzner commented Jan 12, 2021

And the problem is?

The installer expects you to have nodes with disks for OSDs, so the osd role can be applied to them. Looking at your screenshot, you've ticked the osd role too, so from my perspective this is working as expected.

For a storage cluster, you need storage.

Also, just for awareness: when you see errors and warnings, clicking the triangle icon expands the row to show the error text.

If you're just kicking the tyres, you could use just 2 machines - one for Ceph and the other for monitoring. Just make sure you have free disks on the node you want to deploy Ceph to (a quick check is sketched below), and use the container mode deployment (not rpm).
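
A quick way to confirm a node has free disks (a sketch; device names will vary):

lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
# a disk with no partitions, filesystem, or mountpoint is a candidate OSD device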
