-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slurm changes for omnia 2.0 #2479
base: pub/new_architecture
Are you sure you want to change the base?
Slurm changes for omnia 2.0 #2479
Conversation
slurm controller slurm compute login
Can one of the admins verify this patch? |
scheduler/scheduler.yml
Outdated
gather_facts: true | ||
any_errors_fatal: true | ||
vars: | ||
share_mounted_path: /opt/omnia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be user defined and need to come from storage_config.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i followed the local_repo for utils/include_input_dir, but soemhow was not able to access the input_project_dir var. Will look into storage_config and fix this
scheduler/scheduler.yml
Outdated
- name: Include vars omnia.yml | ||
ansible.builtin.include_vars: | ||
file: "{{ project_input_path }}/omnia_config.yml" | ||
- name: debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is required here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omnia_config has to be read, but need some clarity on the nfs share mounted on nodes and mounted on oim. This is a temporary work around, will fix it
input/omnia_config.yml
Outdated
@@ -31,6 +31,18 @@ ansible_config_file_path: "/etc/ansible" | |||
# The password must not contain -,\, '," | |||
mariadb_password: "password" | |||
|
|||
# Host and port of the db. | |||
db_host: hn2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Cypher-Miller keep these(all db_) vars blank and commented. "hn2" :)
scheduler/roles/slurm/tasks/main.yml
Outdated
@@ -8,6 +8,10 @@ | |||
ansible.builtin.include_tasks: validation.yml | |||
run_once: true | |||
|
|||
- name: Create DB tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Cypher-Miller - how are you making sure this runs on the exact db server? Probably you need to use delegate_to:
scheduler/roles/slurm/vars/main.yml
Outdated
@@ -40,6 +40,14 @@ slurm_login_packages: | |||
slurm_dbd_packages: | |||
RedHat: | |||
- slurm-slurmdbd | |||
db_packages: | |||
RedHat: | |||
common: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the python3-PyMySQL under the mysql and maridb keys. No need of the new "common" key.
…nto pub/new_architecture
scheduler/roles/slurm/tasks/main.yml
Outdated
@@ -10,7 +10,7 @@ | |||
|
|||
- name: Create DB tasks | |||
ansible.builtin.include_tasks: db.yml | |||
run_once: true | |||
when: inventory_hostname == db_host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the db would not be mentioned in the inventory file - what if the db is a node which is not in the inventory provided
- name: Is db_host defined | ||
ansible.builtin.set_fact: | ||
db_host: "{{ groups['slurm_dbd'][0] }}" | ||
when: db_host is not defined and '"slurm_node" in group_names' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it 'slurm_node' in group_names or 'slurm_dbd' ??
scheduler/roles/slurm/tasks/dbd.yml
Outdated
- name: Is db_host defined | ||
ansible.builtin.set_fact: | ||
db_host: "{{ groups['slurm_dbd'][0] }}" | ||
when: db_host is not defined and '"slurm_node" in group_names' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slurm_node??
Proper input reading for scheduler, add_node and remove_node
no more ignore_errors
…nto pub/new_architecture
scheduler/roles/slurm/tasks/dbd.yml
Outdated
ansible.builtin.include_tasks: db.yml | ||
args: | ||
apply: | ||
delegate_to: "{{ db_host }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
directly delegate_to without apply not sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very familiar with the syntax, but it seems to fail with other configurations. I know for certain that include_tasks does not support delegate_to. From my understanding, args allows extra vars to be passed but apply is needed if we want to set an already existing value.
input/omnia_config.yml
Outdated
db_username: slurm | ||
|
||
# Type of db to be used. Options are mysql or mariadb. | ||
db_type: mysql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mariadb be the default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
- name: Remove nodes Slurm | ||
hosts: slurm_control_node, slurm_node | ||
any_errors_fatal: true | ||
vars: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are having separate utility for remove node, we should handle there
input/omnia_config.yml
Outdated
# The Length of the password should be at least 8. | ||
# The password must not contain -,\, '," | ||
mariadb_password: "password" | ||
db_username: ansible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have slurm as prefix for all variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Cypher-Miller Please update all the db_<> to slurm_db_<>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -0,0 +1,104 @@ | |||
--- | |||
- name: Append share path if NFS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure copyright in all files
- name: Create DB tasks | ||
ansible.builtin.include_tasks: db.yml | ||
args: | ||
apply: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do run any tasks on external dbd host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sorry I do not understand your comment. Can you explain in more detail?
ansible.builtin.assert: | ||
that: | ||
- groups['slurm_control_node'] | intersect(groups['slurm_node']) | length | int == 0 | ||
- groups['slurm_control_node'] | intersect(groups['login'] | default([])) | length | int == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validation should be part of cluster_validation role
when: restart_slurm_services and ('slurm_control_node' in group_names) | ||
|
||
- name: Assemble all slurm_nodes | ||
ansible.builtin.debug: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if its not mandatory debug, it can be run on verbosity 2
|
||
- name: UnSet env variable | ||
community.general.ini_file: | ||
path: "/etc/environment" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all path should be variable
- name: Configure chrony to sync time with NTP servers | ||
ansible.builtin.template: | ||
src: chrony.conf.j2 | ||
dest: /etc/chrony/chrony.conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all path and mode should be part of variable
…tricted; Added default db username/password
…nto pub/new_architecture
Slurm install role
@abhishek-sa1 @Cypher-Miller