
Commit 6d9b86a

Merge pull request #28 from ksuderman/release-1.1.0
Release 1.1.0
- New YAML runtime configuration
- Run multiple workflows from a single file
- Separate reference and input datasets
2 parents c6560a0 + 5c582f9 commit 6d9b86a

4 files changed

Lines changed: 146 additions & 55 deletions

File tree

README.md

Lines changed: 36 additions & 24 deletions
@@ -67,35 +67,47 @@ When a workflow is run with the `run` command the invocation details will be sav
6767

6868
## Runtime Configuration
6969

70-
The runtime parameters for a benchmarking run are specified in a YAML file. This file can be stored anywhere, but several examples are included in the `config` directory. The configuration YAML must include:
70+
The runtime parameters for benchmarking runs are specified in a YAML configuration file. The configuration file can contain more than one runtime configuration specified as a YAML list. This file can be stored anywhere, but several examples are included in the `config` directory.
7171

72-
- **workflow_id**
73-
The ID of the workflow to run.
74-
- **inputs**
75-
A list of dictionaries that specify:
76-
1. **name** the name of the input as specified in the workflow editor.
77-
2. **dataset_id**: the ID of the dataset to be used as input. This dataset can be located in any publicly accessible history.
78-
- **output_history_name**
79-
A new history with this name will be created and all processed datasets will be stored into this history.
80-
81-
#### Example
72+
The YAML configuration for a single run looks like:
8273

8374
```
84-
workflow_id: b94314cb9cb46380
85-
inputs:
86-
- name: FASTQ Dataset
87-
dataset_id: e49d4a2f705b9571
88-
output_history_name: Example Paired DNA Test
75+
- workflow_id: d6d3c2119c4849e4
76+
  output_history_base_name: RNA-seq
77+
  reference_data:
78+
    - name: Reference Transcript (FASTA)
79+
      dataset_id: 50a269b7a99356aa
80+
  runs:
81+
    - history_name: 1
82+
      inputs:
83+
        - name: FASTQ RNA Dataset
84+
          dataset_id: 28fa757e56346a34
85+
    - history_name: 2
86+
      inputs:
87+
        - name: FASTQ RNA Dataset
88+
          dataset_id: 1faa2d3b2ed5c436
8989
```
9090

91-
92-
93-
## Obtaining Results
94-
95-
TBD.
96-
97-
Scrape the results of a workflow invocation and output in a format suitable for importing into a spreadsheet or database. See issue [#3](../../issues/3).
98-
91+
- **workflow_id**
92+
The ID of the workflow to run.
93+
94+
- **output_history_base_name** (optional)
95+
Name to use as the basis for histories created. If the *output_history_base_name* is not specified then the *workflow_id* is used.
96+
97+
- **reference_data** (optional)
98+
Input data that is the same for all benchmarking runs and only needs to be set once. See the section on *inputs* below for a description of the fields.
99+
100+
- **runs**
101+
Input definitions for each benchmarking run. Each run definition should contain:
102+
103+
- **history_name** (optional)
104+
The name of the history created for the output. The final output history name is generated by concatenating the *output_history_base_name* from above and the *history_name*, separated by a space. If the *history_name* is not specified, `run N` is used in its place, where N is an incrementing integer counter.
105+
- **inputs**
106+
One or more input datasets for the workflow. Each input specification consists of:
107+
1. **name** the input name as specified in the workflow editor
108+
2. **dataset_id** the History API ID as displayed in the workflow editor or with the `./workflow.py histories` command.
109+
110+
99111
### Contributing
100112

101113
Fork this repository and then create a working branch for yourself from the `dev` branch. All pull requests should target `dev` and not the `master` branch.

config/example.yml

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
# DO NOT RUN
2+
# This is a non-working example used for discussion purposes only.
3+
- workflow_id: b94314cb9cb46380
4+
comments: DNA testing
5+
output_history_base_name: DNA Testing 10 CPU
6+
reference_data:
7+
- name: FASTQ Reference
8+
dataset_id: badfood
9+
- name: GTF
10+
dataset_id: more
11+
runs:
12+
- history_name: run one
13+
inputs:
14+
- name: input one
15+
dataset_id: 1f04e612d8649780
16+
- name: input two
17+
dataset_id: 1f04e612d8649780
18+
- history_name: run two
19+
inputs:
20+
- name: FASTQ Dataset
21+
dataset_id: 1f04e612d8649780
22+
- name: GTF index
23+
dataset_id: 1f04e612d8649780
24+
- workflow_id: b94314cb9cb46380
25+
comments: RNA testing
26+
output_history_base_name: RNA Testing 10 CPU
27+
runs:
28+
- hostory_name: FASTQ Dataset
29+
inputs:
30+
- name: FASTQ Dataset
31+
dataset_ids: 1f04e612d8649780
32+
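
The example file above is marked as non-working, and keys such as `hostory_name` and `dataset_ids` do not match the key names that `workflow.py` reads. A small check like the one below could flag such keys before a run is started. This is only a sketch: the function `find_unknown_keys` and the allowed-key sets are assumptions for illustration, and treating `comments` as an allowed key is a guess based on its use in this example file.

```python
# Allowed keys per level (assumed; 'comments' is not read by workflow.py).
WORKFLOW_KEYS = {'workflow_id', 'comments', 'output_history_base_name',
                 'reference_data', 'runs'}
RUN_KEYS = {'history_name', 'inputs'}
SPEC_KEYS = {'name', 'dataset_id'}


def find_unknown_keys(config):
    """Return a list of messages, one per unrecognized key in the configuration."""
    problems = []

    def check(mapping, allowed, where):
        for key in mapping:
            if key not in allowed:
                problems.append(f"{where}: unknown key '{key}'")

    for w, workflow in enumerate(config):
        check(workflow, WORKFLOW_KEYS, f"workflow {w}")
        for s, spec in enumerate(workflow.get('reference_data', [])):
            check(spec, SPEC_KEYS, f"workflow {w}, reference_data {s}")
        for r, run in enumerate(workflow.get('runs', [])):
            check(run, RUN_KEYS, f"workflow {w}, run {r}")
            for s, spec in enumerate(run.get('inputs', [])):
                check(spec, SPEC_KEYS, f"workflow {w}, run {r}, input {s}")
    return problems


# The second definition from the example file, written as parsed YAML.
config = [{
    'workflow_id': 'b94314cb9cb46380',
    'comments': 'RNA testing',
    'output_history_base_name': 'RNA Testing 10 CPU',
    'runs': [{
        'hostory_name': 'FASTQ Dataset',
        'inputs': [{'name': 'FASTQ Dataset', 'dataset_ids': '1f04e612d8649780'}],
    }],
}]
for problem in find_unknown_keys(config):
    print(problem)
```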

config/rna-seq.yml

Lines changed: 28 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,29 @@
1-
workflow_id: 3606d3101a772650
2-
inputs:
3-
- name: Reference FASTA
4-
id: '3947ba9ca107312f'
5-
- name: GTF
6-
id: '048a970701a6dc44'
7-
- name: FASTA Dataset
8-
id: 'ca5081d2c8f1088a'
9-
output_history_name: RNA seq test results
1+
- workflow_id: 8557135ce1bff84d
2+
output_history_base_name: PairRNA 16C 58G-MEM
3+
reference_data:
4+
- name: Reference Transcript (FASTA)
5+
dataset_id: d61e7f405474c541
6+
runs:
7+
- history_name: SRS9276533
8+
inputs:
9+
- name: FASTQ RNA Dataset
10+
dataset_id: 28fa757e56346a34
11+
- history_name: SRS9276520
12+
inputs:
13+
- name: FASTQ RNA Dataset
14+
dataset_id: 1faa2d3b2ed5c436
15+
- history_name: SRS9276534
16+
inputs:
17+
- name: FASTQ RNA Dataset
18+
dataset_id: ec8c5112d867eb82
19+
- workflow_id: 69906830c7478863
20+
output_history_base_name: RNA 16C 58G-MEM
21+
reference_data:
22+
- name: Reference Transcript (FASTA)
23+
dataset_id: d61e7f405474c541
24+
runs:
25+
- history_name: SRS9551191
26+
inputs:
27+
- name: FASTQ RNA Dataset
28+
dataset_id: 0aedafdec1eb4aeb
1029

workflow.py

Lines changed: 50 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@
1515

1616
from pprint import pprint
1717

18-
VERSION='1.0.0'
18+
VERSION='1.1.0'
1919

2020
BOLD = '\033[1m'
2121
CLEAR = '\033[0m'
@@ -31,6 +31,16 @@
3131
# The directory where the workflow invocation data will be saved.
3232
INVOCATIONS_DIR = 'invocations'
3333

34+
class Keys:
35+
    NAME = 'name'
36+
    RUNS = 'runs'
37+
    INPUTS = 'inputs'
38+
    REFERENCE_DATA = 'reference_data'
39+
    WORKFLOW_ID = 'workflow_id'
40+
    DATASET_ID = 'dataset_id'
41+
    HISTORY_BASE_NAME = 'output_history_base_name'
42+
    HISTORY_NAME = 'history_name'
43+
3444

3545
def workflows():
3646
"""
@@ -81,33 +91,51 @@ def run(args):
8191
with open(name, 'r') as stream:
8292
try:
8393
config = yaml.safe_load(stream)
94+
print(f"Loaded {name}")
8495
except yaml.YAMLError as exc:
8596
print(exc)
8697

8798
gi = bioblend.galaxy.GalaxyInstance(url=GALAXY_SERVER, key=API_KEY)
8899
print(f"Connected to {GALAXY_SERVER}")
89100

90-
workflow = config['workflow_id']
91-
inputs = {}
92-
for spec in config['inputs']:
93-
input = gi.workflows.get_workflow_inputs(workflow, spec['name'])
94-
if input is None or len(input) == 0:
95-
print('ERROR: Invalid input specification')
96-
sys.exit(1)
97-
inputs[input[0]] = {'id': spec['dataset_id'], 'src': 'hda'}
98-
99-
if 'output_history_name' in config:
100-
print(f"Saving output to a history named {config['output_history_name']}")
101-
invocation = gi.workflows.invoke_workflow(workflow, inputs=inputs, history_name=config['output_history_name'])
102-
else:
103-
invocation = gi.workflows.invoke_workflow(workflow, inputs=inputs)
104-
105-
pprint(invocation)
106-
107-
output_path = os.path.join(INVOCATIONS_DIR, invocation['id'] + '.json')
108-
with open(output_path, 'w') as f:
109-
json.dump(invocation, f, indent=4)
110-
print(f"Wrote {output_path}")
101+
    print(f"Found {len(config)} workflow definitions")
102+
    for workflow in config:
103+
        wfid = workflow['workflow_id']
104+
        inputs = {}
105+
        history_base_name = wfid
106+
        if Keys.HISTORY_BASE_NAME in workflow:
107+
            history_base_name = workflow[Keys.HISTORY_BASE_NAME]
108+
109+
        if Keys.REFERENCE_DATA in workflow:
110+
            for spec in workflow[Keys.REFERENCE_DATA]:
111+
                input = gi.workflows.get_workflow_inputs(wfid, spec[Keys.NAME])
112+
                if input is None or len(input) == 0:
113+
                    print(f'ERROR: Invalid input specification for {spec[Keys.NAME]}')
114+
                    sys.exit(1)
115+
                inputs[input[0]] = {'id': spec[Keys.DATASET_ID], 'src': 'hda'}
116+
117+
        count = 0
118+
        for run in workflow[Keys.RUNS]:
119+
            count += 1
120+
            if Keys.HISTORY_NAME in run:
121+
                output_history_name = f"{history_base_name} {run[Keys.HISTORY_NAME]}"
122+
            else:
123+
                output_history_name = f"{history_base_name} run {count}"
124+
            for spec in run[Keys.INPUTS]:
125+
                input = gi.workflows.get_workflow_inputs(wfid, spec[Keys.NAME])
126+
                if input is None or len(input) == 0:
127+
                    print(f'ERROR: Invalid input specification for {spec[Keys.NAME]}')
128+
                    sys.exit(1)
129+
130+
                inputs[input[0]] = {'id': spec[Keys.DATASET_ID], 'src': 'hda'}
131+
132+
            invocation = gi.workflows.invoke_workflow(wfid, inputs=inputs, history_name=output_history_name)
133+
            pprint(invocation)
134+
            # output_path = os.path.join(INVOCATIONS_DIR, invocation['id'] + '.json')
135+
            output_path = os.path.join(INVOCATIONS_DIR, output_history_name.replace(' ', '_') + '.json')
136+
            with open(output_path, 'w') as f:
137+
                json.dump(invocation, f, indent=4)
138+
            print(f"Wrote {output_path}")
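
In the loop above, the `inputs` dictionary is created once per workflow definition, so entries added for one run stay in place for later runs unless a later run overwrites the same key. If that carry-over is not intended, one option is to start each run from a copy of the reference inputs. The sketch below assumes this approach; `build_run_inputs` is a hypothetical helper, and `resolve` is a stand-in for the `gi.workflows.get_workflow_inputs(wfid, name)` call so that the example runs without a Galaxy server.

```python
def build_run_inputs(reference_inputs, run_specs, resolve):
    """Combine shared reference inputs with the inputs of a single run.

    reference_inputs: dict already built from the reference_data section.
    run_specs: list of {'name': ..., 'dataset_id': ...} dicts for this run.
    resolve: callable mapping an input name to a list of workflow input keys
             (hypothetical stand-in for get_workflow_inputs).
    """
    # Copy so that one run's inputs never leak into the next run.
    inputs = dict(reference_inputs)
    for spec in run_specs:
        matches = resolve(spec['name'])
        if not matches:
            raise ValueError(f"Invalid input specification for {spec['name']}")
        inputs[matches[0]] = {'id': spec['dataset_id'], 'src': 'hda'}
    return inputs


# A fake resolver with made-up input keys, used only for illustration.
fake_keys = {'input one': ['0'], 'input two': ['1'], 'GTF index': ['2']}
resolve = lambda name: fake_keys.get(name, [])

reference = {'9': {'id': 'ref-dataset', 'src': 'hda'}}
first = build_run_inputs(reference, [{'name': 'input one', 'dataset_id': 'aaa'}], resolve)
second = build_run_inputs(reference, [{'name': 'GTF index', 'dataset_id': 'bbb'}], resolve)
print(sorted(first), sorted(second))
```

Raising an exception rather than calling `sys.exit(1)` is a design choice made for the sketch so that the caller can decide how to report the error.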
111139

112140

113141
def histories(args):
