
Commit 8c435ef: Release 0.1.7
2 parents: e3be834 + 88fe045

74 files changed: +2058 / -1208 lines

.github/workflows/integration.yml (10 additions, 9 deletions)

@@ -26,13 +26,14 @@ jobs:
     runs-on: ubuntu-22.04
     steps:
     - uses: actions/checkout@v3
-    - name: Dependency Install
-      run: ./ci/deps_install.sh
-    - name: Batch Scheduler Install and Start
-      run: ./ci/batch_scheduler.sh
-    - name: BEE Install
-      run: ./ci/bee_install.sh
-    - name: BEE Config
-      run: ./ci/bee_config.sh
+    - name: Install and Configure
+      run: |
+        . ./ci/env.sh
+        ./ci/deps_install.sh
+        ./ci/batch_scheduler.sh
+        ./ci/bee_install.sh
+        ./ci/bee_config.sh
     - name: Integration Test
-      run: ./ci/integration_test.sh
+      run: |
+        . ./ci/env.sh
+        ./ci/integration_test.sh

.github/workflows/unit-tests.yml (10 additions, 9 deletions)

@@ -24,13 +24,14 @@ jobs:
     runs-on: ubuntu-22.04
     steps:
     - uses: actions/checkout@v3
-    - name: Dependency Install
-      run: ./ci/deps_install.sh
-    - name: Slurm Setup and Install
-      run: ./ci/slurm_start.sh
-    - name: BEE Install
-      run: ./ci/bee_install.sh
-    - name: BEE Config
-      run: ./ci/bee_config.sh
+    - name: Install and Configure
+      run: |
+        . ./ci/env.sh
+        ./ci/deps_install.sh
+        ./ci/batch_scheduler.sh
+        ./ci/bee_install.sh
+        ./ci/bee_config.sh
     - name: Unit tests
-      run: ./ci/unit_tests.sh
+      run: |
+        . ./ci/env.sh
+        ./ci/unit_tests.sh

.gitignore (3 additions, 0 deletions)

@@ -5,6 +5,9 @@ poetry.lock
 *.pyc
 *.egg-info
 *.out
+*.tgz
+*.tar.gz
+*.log
 .python-version
 .DS_Store
 .idea

HISTORY.md (new file, 57 additions)

0.1.0
Initial release of hpc-beeflow published on PyPI

0.1.3
BEE now accepts stdout and stderr CWL specifications to direct those outputs for each task.

0.1.4
What's Changed
- Scheduler options added for time limit, account, and partitions as CWL extensions
- Fixes for MPI
- Jinja file no longer required
- Merged submit and start commands
- Improved usability of 'beecfg new'
- Combined gdbs
- Added restart code to beeflow
- Checkpoint restart fix
- Allow absolute/relative paths for main CWL and YAML files
- Minimum required version of Charliecloud is now 0.32

0.1.5
- Combined beeflow, beeclient, and beecfg commands; all commands are now invoked via beeflow.
- Fixed an obscure dependency issue between tasks
- Simplified config file; removed duplicated bee_workdir entries
- CWL parser was moved to the client
- CwlParser is now instantiated in bee_client.py
- CwlParser no longer invokes the Workflow Interface; it now returns Workflow and Task objects
- Allows verification of a CWL specification without running the workflow
- Added support for the Flux scheduler

0.1.6
Cleanup of processes, logs, and directory space
- Eliminates extraneous Neo4j instances from cancelled/failed tasks
- Cleans up log entries for query
- Improves start time for Celery
- Makes start time configurable
- Decreases the number of Celery processes
- Fixes the capability to specify a main CWL file and/or YAML file outside the CWL directory
- Parses CWL after packaging the directory
- Moves temporary files for unit tests out of $HOME

0.1.7

Major features: adds the capability to include pre- and post-processing scripts for tasks, fixes the Checkpoint/Restart capability, increases logging, and adds some features to the client.
- Initial task manager resiliency and error handling (#789)
- Add pre/post script support (#788)
- Fix LOCALE error on systems where the Redis container failed to start
- Add logging to the workflow interface (#764)
- Enable logging in neo4j_cypher.py, neo4j_driver.py, and gdb_driver.py
- Add ``beeflow remove`` command to the client
- Enables removal of archived or cancelled workflows and their associated artifacts
- Update minimum Charliecloud version to 0.36
- CI refactor to allow running jobs on runners other than GitHub
- Add submit command options to the workflow artifact for archival purposes
- Increase maximum supported Python version to 3.12
- Fix Checkpoint/Restart capability
- Add testing for Checkpoint/Restart
- Adds the capability to reset the beeflow files (deletes all artifacts), especially useful for developers

README.rst (2 additions, 2 deletions)

@@ -42,6 +42,7 @@ Contributors:
 * Paul Bryant - `paulbry <https://github.com/paulbry>`_
 * Rusty Davis - `rstyd <https://github.com/rstyd>`_
 * Jieyang Chen - `JieyangChen7 <https://github.com/JieyangChen7>`_
+* Krishna Chilleri - `Krishna Chilleri <https://github.com/kchilleri>`_
 * Patricia Grubel - `pagrubel <https://github.com/pagrubel>`_
 * Qiang Guan - `guanxyz <https://github.com/guanxyz>`_
 * Ragini Gupta - `raginigupta6 <https://github.com/raginigupta6>`_

@@ -85,9 +86,8 @@ License can be found `here <https://github.com/lanl/BEE/blob/master/LICENSE>`_
 Publications
 ==========================

+- An HPC-Container Based Continuous Integration Tool for Detecting Scaling and Performance Issues in HPC Applications, IEEE Transactions on Services Computing, 2024, `DOI: 10.1109/TSC.2023.3337662 <https://doi.ieeecomputersociety.org/10.1109/TSC.2023.3337662>`_
 - BEE Orchestrator: Running Complex Scientific Workflows on Multiple Systems, HiPC, 2021, `DOI: 10.1109/HiPC53243.2021.00052 <https://doi.org/10.1109/HiPC53243.2021.00052>`_
 - "BeeSwarm: Enabling Parallel Scaling Performance Measurement in Continuous Integration for HPC Applications", ASE, 2021, `DOI: 10.1109/ASE51524.2021.9678805 <https://www.computer.org/csdl/proceedings-article/ase/2021/033700b136/1AjTjgnW2pa#:~:text=10.1109/ASE51524.2021.9678805>`_
 - "BeeFlow: A Workflow Management System for In Situ Processing across HPC and Cloud Systems", ICDCS, 2018, `DOI: 10.1109/ICDCS.2018.00103 <https://ieeexplore.ieee.org/abstract/document/8416366>`_
 - "Build and execution environment (BEE): an encapsulated environment enabling HPC applications running everywhere", IEEE BigData, 2018, `DOI: 10.1109/BigData.2018.8622572 <https://ieeexplore.ieee.org/document/8622572>`_
-
-
RELEASE.rst (6 additions, 3 deletions)

@@ -1,13 +1,16 @@
 Publishing a new release
 ************************

-1. Change the version in pyproject.toml and verify docs build;
-   and get this change merged into develop. (You may want to set the bypass as in step 2)
+Verify all current changes in develop run correctly on nightly tests.
+
+1. Start a branch named Release-0.x.x. Change the version in pyproject.toml, verify the docs build,
+   add an entry to HISTORY.md for this release,
+   and get this change merged into develop. (You may want to set the bypass as in step 2 on develop.)
+
 2. On the github site go to Settings; on the left under Code and Automation
    click on Branches; under Branch protection rules edit main;
    check Allow specified actors to bypass required pull requests; add yourself
    and don't forget to save the setting
-3 Make sure documentation will be published upon push to main.
+3. Make sure documentation will be published upon push to main.
    See: .github/workflows/docs.yml
 4. Checkout develop and pull for latest version then
    checkout main and merge develop into main. Verify documentation was published.

beeflow/client/bee_client.py (97 additions, 45 deletions)
@@ -26,6 +26,7 @@
 from beeflow.common.parser import CwlParser
 from beeflow.common.wf_data import generate_workflow_id
 from beeflow.client import core
+from beeflow.wf_manager.resources import wf_utils

 # Length of a shortened workflow ID
 short_id_len = 6 #noqa: Not a constant
@@ -105,15 +106,25 @@ def _resource(tag=""):
     return _url() + str(tag)


+def get_wf_list():
+    """Get the list of all workflows."""
+    try:
+        conn = _wfm_conn()
+        resp = conn.get(_url(), timeout=60)
+    except requests.exceptions.ConnectionError:
+        error_exit('Could not reach WF Manager.')
+
+    if resp.status_code != requests.codes.okay:  # pylint: disable=no-member
+        error_exit('WF Manager did not return workflow list')
+
+    logging.info('List Jobs: {resp.text}')
+    return jsonpickle.decode(resp.json()['workflow_list'])
+
+
 def check_short_id_collision():
     """Check short workflow IDs for collisions; increase short ID length if detected."""
     global short_id_len #noqa: Not a constant
-    conn = _wfm_conn()
-    resp = conn.get(_url(), timeout=60)
-    if resp.status_code != requests.codes.okay:  # pylint: disable=no-member
-        error_exit(f"Checking for ID collision failed: {resp.status_code}")
-
-    workflow_list = jsonpickle.decode(resp.json()['workflow_list'])
+    workflow_list = get_wf_list()
     if workflow_list:
         while short_id_len < MAX_ID_LEN:
             id_list = [_short_id(job[1]) for job in workflow_list]
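For reference, the prefix-growing logic that check_short_id_collision() applies can be sketched in isolation. The function name and bounds below are hypothetical illustrations, not part of BEE's API:

```python
def shortest_unique_prefix(ids, start=6, max_len=32):
    """Return the smallest prefix length >= start at which all IDs stay distinct."""
    length = start
    while length < max_len:
        prefixes = {wf_id[:length] for wf_id in ids}
        if len(prefixes) == len(ids):  # no two IDs collapsed to the same prefix
            return length
        length += 1
    return max_len
```

Two workflow IDs sharing a six-character prefix force the short-ID length up by one, which is exactly why the client re-checks after every new submission.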
@@ -132,18 +143,7 @@ check_short_id_collision():
 def match_short_id(wf_id):
     """Match user-provided short workflow ID to full workflow IDs."""
     matched_ids = []
-
-    try:
-        conn = _wfm_conn()
-        resp = conn.get(_url(), timeout=60)
-    except requests.exceptions.ConnectionError:
-        error_exit('Could not reach WF Manager.')
-
-    if resp.status_code != requests.codes.okay:  # pylint: disable=no-member
-        error_exit(f'Could not match ID: {wf_id}. Code {resp.status_code}')
-    # raise ApiError("GET /jobs".format(resp.status_code))
-
-    workflow_list = jsonpickle.decode(resp.json()['workflow_list'])
+    workflow_list = get_wf_list()
     if workflow_list:
         for job in workflow_list:
             if job[1].startswith(wf_id):
@@ -162,11 +162,25 @@ match_short_id(wf_id):
             long_wf_id = matched_ids[0]
             return long_wf_id
     else:
-        print("There are currently no workflows.")
+        sys.exit("There are currently no workflows.")

     return None


+def get_wf_status(wf_id):
+    """Get workflow status."""
+    try:
+        conn = _wfm_conn()
+        resp = conn.get(_resource(wf_id), timeout=60)
+    except requests.exceptions.ConnectionError:
+        error_exit('Could not reach WF Manager.')
+
+    if resp.status_code != requests.codes.okay:  # pylint: disable=no-member
+        error_exit('Could not successfully query workflow manager')
+
+    return resp.json()['wf_status']
+
+
 app = typer.Typer(no_args_is_help=True, add_completion=False, cls=NaturalOrderGroup)
 app.add_typer(core.app, name='core')
 app.add_typer(config_driver.app, name='config')
@@ -281,6 +295,19 @@ def is_parent(parent, path):
     if tarball_path:
         os.remove(tarball_path)

+    # Store provided arguments in text file for future reference
+    wf_dir = wf_utils.get_workflow_dir(wf_id)
+    sub_wf_dir = wf_dir + "/submit_command_args.txt"
+
+    f_name = open(sub_wf_dir, "w", encoding="utf-8")
+    f_name.write(f"wf_name: {wf_name}\n")
+    f_name.write(f"wf_path: {wf_path}\n")
+    f_name.write(f"main_cwl: {main_cwl}\n")
+    f_name.write(f"yaml: {yaml}\n")
+    f_name.write(f"workdir: {workdir}\n")
+    f_name.write(f"wf_id: {wf_id}")
+    f_name.close()
+
     return wf_id
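The new submit-time record above opens, writes, and closes the file by hand; a `with` block is the more idiomatic form and closes the file even on error. A standalone sketch of the same "key: value per line" format, with hypothetical names and a temp directory standing in for the workflow directory:

```python
import os
import tempfile


def record_submit_args(path, **args):
    """Write each submit argument as a 'key: value' line, one per line."""
    with open(path, "w", encoding="utf-8") as fp:
        for key, value in args.items():
            fp.write(f"{key}: {value}\n")


# Hypothetical stand-in for the workflow directory used by the real client.
workdir = tempfile.mkdtemp()
record_submit_args(os.path.join(workdir, "submit_command_args.txt"),
                   wf_name="demo", wf_id="abcdef")
```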

@@ -340,6 +367,36 @@ def package(wf_path: pathlib.Path = typer.Argument(...,
     return package_path


+@app.command()
+def remove(wf_id: str = typer.Argument(..., callback=match_short_id)):
+    """Remove cancelled, paused, or archived workflow with a workflow ID."""
+    long_wf_id = wf_id
+
+    wf_status = get_wf_status(wf_id)
+    print(f"Workflow Status is {wf_status}")
+    if wf_status in ('Cancelled', 'Archived', 'Paused'):
+        verify = f"All stored information for workflow {_short_id(wf_id)} will be removed."
+        verify += "\nContinue to remove? yes(y)/no(n): "
+        response = input(verify)
+        if response in ("n", "no"):
+            sys.exit("Workflow not removed.")
+        elif response in ("y", "yes"):
+            try:
+                conn = _wfm_conn()
+                resp = conn.delete(_resource(long_wf_id), json={'option': 'remove'}, timeout=60)
+            except requests.exceptions.ConnectionError:
+                error_exit('Could not reach WF Manager.')
+            if resp.status_code != requests.codes.accepted:  # pylint: disable=no-member
+                error_exit('WF Manager could not remove workflow.')
+            typer.secho("Workflow removed!", fg=typer.colors.GREEN)
+            logging.info(f'Remove workflow: {resp.text}')
+    else:
+        print(f"{_short_id(wf_id)} may still be running.")
+        print("The workflow must be cancelled before attempting removal.")
+
+    sys.exit()
+
+
 def unpackage(package_path, dest_path):
     """Unpackage a workflow tarball for parsing."""
     package_str = str(package_path)
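The new remove command gates first on workflow state and then on user confirmation. A condensed sketch of that decision logic, with hypothetical names; note the diff leaves unrecognized prompt responses unhandled, while this sketch conservatively treats anything but an explicit yes as "keep":

```python
REMOVABLE_STATES = ("Cancelled", "Archived", "Paused")


def removal_decision(wf_status, response):
    """Gate removal on workflow state, then on the user's yes/no answer."""
    if wf_status not in REMOVABLE_STATES:
        return "refuse"  # workflow may still be running; must cancel first
    if response.strip().lower() in ("y", "yes"):
        return "remove"
    return "keep"
```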
@@ -363,17 +420,7 @@ def unpackage(package_path, dest_path):
 @app.command('list')
 def list_workflows():
     """List all workflows."""
-    try:
-        conn = _wfm_conn()
-        resp = conn.get(_url(), timeout=60)
-    except requests.exceptions.ConnectionError:
-        error_exit('Could not reach WF Manager.')
-
-    if resp.status_code != requests.codes.okay:  # pylint: disable=no-member
-        error_exit('WF Manager did not return workflow list')
-
-    logging.info('List Jobs: {resp.text}')
-    workflow_list = jsonpickle.decode(resp.json()['workflow_list'])
+    workflow_list = get_wf_list()
     if workflow_list:
         typer.secho("Name\tID\tStatus", fg=typer.colors.GREEN)

@@ -401,11 +448,9 @@ def query(wf_id: str = typer.Argument(..., callback=match_short_id)):

     tasks_status = resp.json()['tasks_status']
     wf_status = resp.json()['wf_status']
-    if tasks_status == 'Unavailable':
-        typer.echo(wf_status)
-    else:
-        typer.echo(wf_status)
-        typer.echo(tasks_status)
+    typer.echo(wf_status)
+    for _task_id, task_name, task_state in tasks_status:
+        typer.echo(f'{task_name}--{task_state}')

     logging.info('Query workflow: {resp.text}')
     return wf_status, tasks_status
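The reworked query output prints the workflow state and then one "name--state" line per task tuple. A minimal standalone sketch of that formatting (hypothetical function name; the tuple shape `(task_id, name, state)` is taken from the diff):

```python
def format_query_output(wf_status, tasks_status):
    """One line for the workflow state, then 'name--state' per task tuple."""
    lines = [wf_status]
    lines += [f"{name}--{state}" for _task_id, name, state in tasks_status]
    return lines
```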
@@ -463,17 +508,24 @@ def resume(wf_id: str = typer.Argument(..., callback=match_short_id)):

 @app.command()
 def cancel(wf_id: str = typer.Argument(..., callback=match_short_id)):
-    """Cancel a workflow."""
+    """Cancel a paused or running workflow."""
     long_wf_id = wf_id
-    try:
-        conn = _wfm_conn()
-        resp = conn.delete(_resource(long_wf_id), timeout=60)
-    except requests.exceptions.ConnectionError:
-        error_exit('Could not reach WF Manager.')
-    if resp.status_code != requests.codes.accepted:  # pylint: disable=no-member
-        error_exit('WF Manager could not cancel workflow.')
-    typer.secho("Workflow cancelled!", fg=typer.colors.GREEN)
-    logging.info(f'Cancel workflow: {resp.text}')
+    wf_status = get_wf_status(wf_id)
+    if wf_status in ('Running', 'Paused'):
+        try:
+            conn = _wfm_conn()
+            resp = conn.delete(_resource(long_wf_id), json={'option': 'cancel'}, timeout=60)
+        except requests.exceptions.ConnectionError:
+            error_exit('Could not reach WF Manager.')
+        if resp.status_code != requests.codes.accepted:  # pylint: disable=no-member
+            error_exit('WF Manager could not cancel workflow.')
+        typer.secho("Workflow cancelled!", fg=typer.colors.GREEN)
+        logging.info(f'Cancel workflow: {resp.text}')
+    elif wf_status == "Intializing":
+        print(f"Workflow is {wf_status}, try cancel later.")
+    else:
+        print(f"Workflow is {wf_status} cannot cancel.")


 @app.command()