Slurm changes for omnia 2.0 #2479
Open: wants to merge 31 commits into base branch pub/new_architecture.
(The diff shown below contains the changes from 1 commit.)

Commits (31):
896a6c2  Slurm in a single role (jagadeeshnv, Feb 14, 2025)
f01658e  Lint fixes (jagadeeshnv, Feb 14, 2025)
977f880  Clean up flag reset (jagadeeshnv, Feb 14, 2025)
80ff376  Remove unused (jagadeeshnv, Feb 14, 2025)
f9244a8  pytorch python version fix (jagadeeshnv, Feb 18, 2025)
4680b9e  [OMN01B-171]: Install slurmdbd with existing database (Cypher-Miller, Feb 27, 2025)
d8f04d9  [OMN01B-171]: Uncommented K8 code (Cypher-Miller, Feb 27, 2025)
05e8b27  [OMN01B-208]: Add slurm ansible code to create new DB if (Cypher-Miller, Feb 27, 2025)
3ba242b  [OMN01B-208]: Conditional added to only add db when db_host is provided (Cypher-Miller, Feb 27, 2025)
6136805  Add node complete (jagadeeshnv, Feb 27, 2025)
b414b25  Dbd code uncommented (jagadeeshnv, Feb 27, 2025)
b673280  Remove Node done (jagadeeshnv, Feb 28, 2025)
2e2a860  Add node fixed scontrol error (jagadeeshnv, Feb 28, 2025)
55e1646  Moved db.yml task entry point, added default behavior for db_port and… (Cypher-Miller, Feb 28, 2025)
66fad29  Merge branch 'pub/new_architecture' of github.com:jagadeeshnv/omnia i… (Cypher-Miller, Feb 28, 2025)
4ec9c79  Updated some slurm var descriptions (Cypher-Miller, Feb 28, 2025)
a2a0be7  Cleanup of _config_files.yml (jagadeeshnv, Mar 2, 2025)
6fd0a03  Cleanup of cleanll (jagadeeshnv, Mar 2, 2025)
9a28e22  Share dir creation synchronized (jagadeeshnv, Mar 2, 2025)
a2e6c8c  Debug statements cleaned (jagadeeshnv, Mar 2, 2025)
9e00ce9  Fixed Add db user tasks to successfully connect to mariadb db (Cypher-Miller, Mar 3, 2025)
c16d6d2  Merge branch 'pub/new_architecture' of github.com:jagadeeshnv/omnia i… (Cypher-Miller, Mar 3, 2025)
fe9355f  Added create new db user; Moved slurmdbd.conf creation code (Cypher-Miller, Mar 3, 2025)
9e70e82  Fixed typo causing error when creating slurmdbd.conf (Cypher-Miller, Mar 3, 2025)
9fb669a  benchmark tools openmpi command simplified (jagadeeshnv, Mar 3, 2025)
7d433fe  Changed db_ to slurm_db_; Made slurmdbd db user's privileges more ris… (Cypher-Miller, Mar 3, 2025)
358ecc2  Merge branch 'pub/new_architecture' of github.com:jagadeeshnv/omnia i… (Cypher-Miller, Mar 3, 2025)
bf73c31  Added support for db ports other than 3306 (Cypher-Miller, Mar 3, 2025)
a30ec2c  Fixed issue where slurmctld service would sometimes not restart when … (Cypher-Miller, Mar 3, 2025)
e387f57  Reverted restart logic for slurmdbd and slurmctld (Cypher-Miller, Mar 4, 2025)
5376453  additional check for specific slurmd restarts (jagadeeshnv, Mar 4, 2025)
scheduler/roles/slurm/tasks/main.yml (2 changes: 1 addition & 1 deletion)
@@ -10,7 +10,7 @@

- name: Create DB tasks

Review comment (PR author):
@Cypher-Miller, how are you making sure this runs on the exact DB server? You probably need to use `delegate_to`.
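A minimal sketch of the `delegate_to` idea, not tested against this role. It assumes `db_host` holds a hostname or IP that Ansible can reach over SSH. It swaps `include_tasks` for `import_tasks`, because keywords set on a static import are copied onto every task in `db.yml`, while `include_tasks` does not accept `delegate_to` directly (it would have to go through its `apply` option). It also drops the `inventory_hostname == db_host` guard: Ansible's `run_once` documentation notes that a `when` condition is evaluated with the first host's variables, so that guard can skip the DB tasks unless the first host happens to be `db_host`.

```yaml
# Sketch only: run every task in db.yml a single time, on the DB server.
# Assumption: db_host is a hostname/IP reachable by Ansible over SSH.
- name: Create DB tasks
  ansible.builtin.import_tasks: db.yml
  # Static import: delegate_to, run_once and when are applied to each task in db.yml.
  delegate_to: "{{ db_host }}"
  run_once: true
  # Only set up the DB when a DB host was actually provided.
  when: db_host | default('') | length > 0
```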

  ansible.builtin.include_tasks: db.yml
  run_once: true
  when: inventory_hostname == db_host

Review comment (PR author):
But the DB server would not be listed in the inventory file. What if the DB is on a node that is not in the provided inventory?
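One possible answer, sketched under assumptions rather than taken from this PR: `ansible.builtin.add_host` adds a host to the in-memory inventory for the rest of the playbook run, and it bypasses the play's host loop, so it executes once per play. The DB tasks can then be delegated to that host. The group name `slurm_db_server` is hypothetical, and SSH connection details for the DB server are assumed to come from the user's existing configuration.

```yaml
# Sketch only: make an out-of-inventory DB server addressable by later tasks.
# "slurm_db_server" is a hypothetical group name.
- name: Add the external Slurm DB server to the in-memory inventory
  ansible.builtin.add_host:
    name: "{{ db_host }}"
    groups: slurm_db_server
  # add_host bypasses the host loop, so this runs a single time per play.
  when: db_host | default('') | length > 0

- name: Create DB tasks on the DB server
  ansible.builtin.import_tasks: db.yml
  delegate_to: "{{ db_host }}"
  run_once: true
  when: db_host | default('') | length > 0
```

If `db.yml` only talks to MariaDB over the network, another option worth weighing is to skip delegation and run the `community.mysql` modules once from an inventory host using their `login_host` and `login_port` options; delegation is only needed if the tasks require shell access on the DB server itself.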


- name: Include common tasks
  ansible.builtin.include_tasks: common.yml