Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slurm changes for omnia 2.0 #2479

Open
wants to merge 30 commits into
base: pub/new_architecture
Choose a base branch
from

Conversation

jagadeeshnv
Copy link

@jagadeeshnv jagadeeshnv commented Feb 14, 2025

Slurm install role

@abhishek-sa1 @Cypher-Miller

@Omnia-svc
Copy link
Collaborator

Can one of the admins verify this patch?

gather_facts: true
any_errors_fatal: true
vars:
share_mounted_path: /opt/omnia
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be user defined and need to come from storage_config.yml

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i followed the local_repo for utils/include_input_dir, but soemhow was not able to access the input_project_dir var. Will look into storage_config and fix this

- name: Include vars omnia.yml
ansible.builtin.include_vars:
file: "{{ project_input_path }}/omnia_config.yml"
- name: debug
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is required here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omnia_config has to be read, but need some clarity on the nfs share mounted on nodes and mounted on oim. This is a temporary work around, will fix it

@@ -31,6 +31,18 @@ ansible_config_file_path: "/etc/ansible"
# The password must not contain -,\, ',"
mariadb_password: "password"

# Host and port of the db.
db_host: hn2
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cypher-Miller keep these(all db_) vars blank and commented. "hn2" :)

@@ -8,6 +8,10 @@
ansible.builtin.include_tasks: validation.yml
run_once: true

- name: Create DB tasks
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cypher-Miller - how are you making sure this runs on the exact db server? Probably you need to use delegate_to:

@@ -40,6 +40,14 @@ slurm_login_packages:
slurm_dbd_packages:
RedHat:
- slurm-slurmdbd
db_packages:
RedHat:
common:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the python3-PyMySQL under the mysql and maridb keys. No need of the new "common" key.

@@ -10,7 +10,7 @@

- name: Create DB tasks
ansible.builtin.include_tasks: db.yml
run_once: true
when: inventory_hostname == db_host
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the db would not be mentioned in the inventory file - what if the db is a node which is not in the inventory provided

- name: Is db_host defined
ansible.builtin.set_fact:
db_host: "{{ groups['slurm_dbd'][0] }}"
when: db_host is not defined and '"slurm_node" in group_names'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it 'slurm_node' in group_names or 'slurm_dbd' ??

- name: Is db_host defined
ansible.builtin.set_fact:
db_host: "{{ groups['slurm_dbd'][0] }}"
when: db_host is not defined and '"slurm_node" in group_names'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slurm_node??

ansible.builtin.include_tasks: db.yml
args:
apply:
delegate_to: "{{ db_host }}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly delegate_to without apply not sufficient?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very familiar with the syntax, but it seems to fail with other configurations. I know for certain that include_tasks does not support delegate_to. From my understanding, args allows extra vars to be passed but apply is needed if we want to set an already existing value.

db_username: slurm

# Type of db to be used. Options are mysql or mariadb.
db_type: mysql
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let mariadb be the default

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

- name: Remove nodes Slurm
hosts: slurm_control_node, slurm_node
any_errors_fatal: true
vars:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are having separate utility for remove node, we should handle there

# The Length of the password should be at least 8.
# The password must not contain -,\, ',"
mariadb_password: "password"
db_username: ansible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have slurm as prefix for all variables

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cypher-Miller Please update all the db_<> to slurm_db_<>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -0,0 +1,104 @@
---
- name: Append share path if NFS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensure copyright in all files

- name: Create DB tasks
ansible.builtin.include_tasks: db.yml
args:
apply:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do run any tasks on external dbd host

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sorry I do not understand your comment. Can you explain in more detail?

ansible.builtin.assert:
that:
- groups['slurm_control_node'] | intersect(groups['slurm_node']) | length | int == 0
- groups['slurm_control_node'] | intersect(groups['login'] | default([])) | length | int == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validation should be part of cluster_validation role

when: restart_slurm_services and ('slurm_control_node' in group_names)

- name: Assemble all slurm_nodes
ansible.builtin.debug:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its not mandatory debug, it can be run on verbosity 2


- name: UnSet env variable
community.general.ini_file:
path: "/etc/environment"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all path should be variable

- name: Configure chrony to sync time with NTP servers
ansible.builtin.template:
src: chrony.conf.j2
dest: /etc/chrony/chrony.conf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all path and mode should be part of variable

@jagadeeshnv jagadeeshnv marked this pull request as ready for review March 3, 2025 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants