Module Usage in Projects

Stephan Reichl edited this page Aug 7, 2024 · 9 revisions

As a concrete example, we will apply the unsupervised_analysis module to a dataset called MyData, stored in data/MyData.

Data

Code

First, we register the configuration file that applies the unsupervised_analysis module to MyData in the project's config/config.yaml, using the following predefined structure.

#### Datasets and Workflows to include ###
workflows:
    MyData:
        unsupervised_analysis: "config/MyData/MyData_unsupervised_analysis_config.yaml"

Tip

Recommended folder and naming scheme for config files: config/{dataset_name}/{dataset_name}_{module}_config.yaml.
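
The recommended scheme can also be generated programmatically, for example when scripting the setup of a new dataset. The following is only a minimal sketch; the helper name module_config_path is hypothetical and not part of any module.

```python
from pathlib import PurePosixPath


def module_config_path(dataset_name: str, module: str) -> str:
    """Build config/{dataset_name}/{dataset_name}_{module}_config.yaml.

    Hypothetical helper for illustration; it only assembles the path
    string and does not create or check any file.
    """
    file_name = f"{dataset_name}_{module}_config.yaml"
    return str(PurePosixPath("config", dataset_name, file_name))


print(module_config_path("MyData", "unsupervised_analysis"))
# prints: config/MyData/MyData_unsupervised_analysis_config.yaml
```

The result matches the path used for MyData in config/config.yaml above.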

Second, within the main Snakefile (workflow/Snakefile), we have to do three things:

  • load and parse all configurations into a structured dictionary.
    # imports needed by the snippets in this Snakefile
    import os
    import yaml

    # load configs for all workflows and datasets
    config_wf = dict()

    for ds in config["workflows"]:
        for wf in config["workflows"][ds]:
            # each config is stored under the key "{dataset}_{workflow}"
            with open(config["workflows"][ds][wf], 'r') as stream:
                try:
                    config_wf[ds+'_'+wf]=yaml.safe_load(stream)
                except yaml.YAMLError as exc:
                    print(exc)
  • include the MyData analysis snakefile from the rules subfolder (see last step).
    ##### load rules (one per dataset) #####
    include: os.path.join("rules", "MyData.smk")
  • require all outputs from the used module as inputs to the target rule all.
    #### Target Rule ####
    rule all:
        input:
            #### MyData Analysis
            rules.MyData_unsupervised_analysis_all.input,
            ...
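
To make the key naming in config_wf explicit, the sketch below mirrors the nested loop from the first step without reading any files. It is an illustration only: the function name workflow_config_keys is hypothetical, and the workflows dictionary is a hand-written stand-in for config["workflows"].

```python
def workflow_config_keys(workflows: dict) -> dict:
    """Map each '{dataset}_{workflow}' key to its config file path.

    Hypothetical helper: it reproduces only the key construction
    (ds + '_' + wf) used when filling config_wf, not the YAML parsing.
    """
    keys = {}
    for ds, modules in workflows.items():
        for wf, path in modules.items():
            keys[f"{ds}_{wf}"] = path
    return keys


# stand-in for config["workflows"] as defined in config/config.yaml
workflows = {
    "MyData": {
        "unsupervised_analysis": "config/MyData/MyData_unsupervised_analysis_config.yaml",
    },
}

print(list(workflow_config_keys(workflows)))
# prints: ['MyData_unsupervised_analysis']
```

The resulting key, MyData_unsupervised_analysis, is the one used to look up the module's configuration in config_wf.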

Finally, within the dedicated snakefile for the analysis of MyData, workflow/rules/MyData.smk, we load the specified version of the unsupervised_analysis module directly from GitHub, pass it the previously loaded configuration, and add a prefix to all (*) loaded rules.

# MyData Analysis

### MyData - Unsupervised Analysis ####
module MyData_unsupervised_analysis:
    snakefile:
        github("epigen/unsupervised_analysis", path="workflow/Snakefile", tag="v2.0.0")
    config:
        config_wf["MyData_unsupervised_analysis"]

use rule * from MyData_unsupervised_analysis as MyData_unsupervised_analysis_*
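
The renaming done by the use rule statement explains why the target rule refers to rules.MyData_unsupervised_analysis_all: the * in the pattern stands for each original rule name, so the module's rule all is imported under the prefixed name. The plain-Python sketch below only imitates this name mapping; prefixed_rule_name is a hypothetical helper and not a Snakemake function.

```python
def prefixed_rule_name(pattern: str, rule_name: str) -> str:
    """Substitute an original rule name for the '*' in a rename pattern.

    Hypothetical helper imitating the name mapping of
    'use rule * from <module> as <pattern>'.
    """
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    return pattern.replace("*", rule_name)


# the module's rule 'all' becomes the name referenced in the target rule
print(prefixed_rule_name("MyData_unsupervised_analysis_*", "all"))
# prints: MyData_unsupervised_analysis_all
```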

Tip

Recommended file name for the analysis-specific snakefile: workflow/rules/{dataset_name}.smk.

Recommended prefix for the loaded rules: {dataset_name}_{module}_.

Results

====================== COMING SOON ======================