Data processing pipeline

Methods for building a data processing pipeline in MATLAB. Intended for use with MRI data in general and diffusion MRI in particular.

General structure

A node (dp_node_base.m) is a class that executes a single processing step. It has the following key methods:

  • input = po2i(obj, previous_output), which takes an output structure from a previous node and converts it into an input to the present node. This method may, for example, rename fields.
  • output = i2o(obj, input), which builds the output structure from the input structure. This declares which files are expected to be generated by the node.
  • output = execute(obj, input, output), which executes the code that generates the output from the input.

Apart from these methods, the class has some important properties:

  • previous_node, which links the node to the previous processing step.
  • output_test, which determines which fields of the output structure will be checked for file existence before the node is executed.

These properties should be set in the constructor.

Apart from the methods and properties mentioned here, there are additional ones to help the execution of the data processing.
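Putting these methods and properties together, a custom node might be sketched as follows. This is a minimal illustration, not a node from the repository; field names such as mask_fn and nii_fn are hypothetical, and the actual processing call is left as a placeholder.

```matlab
% Minimal sketch of a custom node (hypothetical field names, for illustration)
classdef my_mask_node < dp_node

    methods
        function obj = my_mask_node()
            obj.previous_node = dp_node_dcm2nii(); % link to the previous step
            obj.output_test   = {'mask_fn'};       % check this file before executing
        end

        function input = po2i(obj, previous_output)
            % adapt the previous node's output to this node's expected input
            input         = previous_output;
            input.nii_fn  = previous_output.nii_fn;
        end

        function output = i2o(obj, input)
            % declare which files this node is expected to generate
            output         = input;
            output.mask_fn = fullfile(input.op, 'mask.nii.gz');
        end

        function output = execute(obj, input, output)
            % the actual processing that creates output.mask_fn goes here
        end
    end
end
```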

Node types

There are different types of nodes:

  • dp_node_primary, which only generates output structures for later nodes.
  • dp_node, which is intended to act on image data, e.g., NIfTI files.
  • dp_node_workflow, which glues multiple nodes together into one.
  • dp_node_items, which acts on an unstructured set of items, e.g., imaging data that has not yet been identified.

In addition, there are nodes that only deal with inputs and outputs:

  • dp_node_io_rename, which only renames fields.
  • dp_node_io_append, which appends fields.
  • dp_node_io, which appends a single field.

These nodes take as input a translation table of the form { {'field_1'}, {value} }, where value can be a function handle, e.g. @(x) x.field, which will be passed the input structure. In this example, output.field_1 would be set to input.field. Finally, dp_node_io takes only a single pair as input, e.g. dp_node_io('field_1', value), which achieves the same thing as above. The rename node outputs only field_1 (on top of the standard fields), whereas the other two append to the input structure.
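As a concrete sketch of the translation-table form described above (the field names dmri_fn and nii_fn are hypothetical here):

```matlab
% Translation table: set output.dmri_fn from input.nii_fn
% (field names are hypothetical, for illustration)
node = dp_node_io_rename({ {'dmri_fn'}, {@(x) x.nii_fn} });

% Equivalent single-pair form
node = dp_node_io('dmri_fn', @(x) x.nii_fn);
```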

Examples of nodes with more specific functions are:

  • dp_node_dcm2nii.m, which converts a folder of DICOM files to a NIfTI file.
  • dp_node_dmri_denoise.m, which applies denoising via MRtrix.

To use the more specific nodes in your project, you have two options. First, start from scratch: inherit from dp_node and implement, at minimum, i2o and execute. Second, inherit from an existing, more specialized node and customize it by overloading po2i and possibly i2o. See the examples.
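The second option might look like the following sketch, which only overloads po2i to adapt the previous node's output; the field name raw_fn is hypothetical.

```matlab
% Sketch: customize an existing node by overloading po2i
% ('raw_fn' is a hypothetical field name from the previous node)
classdef my_denoise < dp_node_dmri_denoise

    methods
        function input = po2i(obj, previous_output)
            input         = previous_output;
            input.dmri_fn = previous_output.raw_fn; % rename to the expected field
        end
    end
end
```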

Diffusion nodes

Nodes for processing dMRI data are prefixed with dp_node_dmri. They assume that the input structure has a field called dmri_fn. For nodes that need metadata, it is assumed that an xps structure can be loaded from a correspondingly named xps.mat file. See the mdm framework for details.

Data processing with different modes

A node can support one or more data processing modes, which are accessed via the run method. For example, my_node().run(mode) would start the data processing in the given mode. Examples of modes are:

  • report, which prints a report showing which input and output files exist, along with an example of an output structure.
  • iter, which generates a list of outputs of the present node.
  • execute, which runs the execute method on all outputs generated by the previous_node of the present node.
  • debug, which is identical to execute except that errors are not enclosed in a try/catch structure.
  • visualize, which saves visualizations of the data managed by the node.
  • mgui, which opens the output of the node in a graphical user interface.

Normally, you would use iter and report to troubleshoot a pipeline under development.
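A typical development session using these modes might look as follows; my_node is a placeholder for your own node class.

```matlab
% 'my_node' is a placeholder for your own node class
node = my_node();

node.run('report');          % check which input and output files exist
outputs = node.run('iter');  % inspect the generated output structures
node.run('execute');         % run the processing
```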

Options

An options structure can be supplied to the data processing, as in my_node().run(mode, opt), where opt is a structure with one or more of the following fields:

  • do_try_catch, a boolean that determines whether errors in the data processing are caught or rethrown.
  • verbose, which tells the pipeline how much information to display (range 0-3).
  • do_overwrite, a boolean that determines whether existing files will be overwritten (note: output files older than input files will always be overwritten).

See dp_node_base.m for a full list (static method: default_opt).

Input structure

Mandatory fields

  • id, which holds the identity of the data being processed

Optional, but near mandatory fields

  • op, the output path (this is where, e.g., dp_node_dmri_denoise puts its output).
  • bp, the base path from which paths are created (e.g. bp/id/nii/file.nii.gz).

These three fields will always be carried over from the input structure to the output structure, even if they are not mentioned in your code. However, the framework will not overwrite changes made within a node.
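For illustration, the mandatory and near-mandatory fields might be set up like this in a primary node; the id and paths are hypothetical.

```matlab
% Hypothetical example of the mandatory and near-mandatory fields
output.id = 'subject_001';                         % identity of the data
output.bp = '/data/my_study';                      % base path
output.op = fullfile(output.bp, output.id, 'nii'); % output path
```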

Output structure

All fields are optional, but some rules apply:

  • Fields ending with _fn are assumed to refer to files, meaning they will be part of the input/output checks.
  • Setting output.tmp.bp to a temporary path and output.tmp.do_delete = 1 allows the execute method to put data in a temporary path that will be deleted once the execution is done.
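Inside an i2o method, these rules might be applied as in the following sketch; the field and file names are hypothetical.

```matlab
% Declared output file: the _fn suffix makes it part of the existence checks
output.denoised_fn = fullfile(input.op, 'dwi_denoised.nii.gz');

% Temporary working folder, deleted automatically once execute finishes
output.tmp.bp        = fullfile(input.op, 'tmp');
output.tmp.do_delete = 1;
```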

Tips and tricks

When setting up a new node, start by defining the po2i, i2o, and execute methods as empty functions. Run the node in report mode. Set a breakpoint within the empty functions, then write and test your code. Once you can run report mode without errors, you are ready to try the execute mode.

Running the node without catching errors may cause it to stop early on a subject with input/output errors that you may wish to ignore. To deal with this, run the node for a subject where all preceding nodes work correctly. For example:

```matlab
my_node().run('report', struct('do_try_catch', 0, 'verbose', 3))
```

Stand alone use

The nodes can also be used in a stand-alone fashion. For example, denoising:

```matlab
input.dmri_fn = 'my_path/your_dwi_volume.nii.gz';
input.op = msf_fileparts(input.dmri_fn);

a = dp_node_dmri_denoise();
a.execute(input, a.i2o(input));
```

Dependencies

Acknowledgements

If you use this in your project, please acknowledge it by citing this repository and its author: Markus Nilsson at Lund University.
