
1. Introduction

Abdurrahman Abul-Basher edited this page Nov 1, 2021 · 13 revisions

About

reMap (relabeling Metabolic pathway data with groups) is a simple framework that relabels examples with a different set of labels, called "pathway groups", where each group comprises a subset of correlated pathways. This group-based approach is intended to overcome the low sensitivity scores associated with the pathway prediction task. Relabeling in reMap alternates between 1) assigning groups to each sample (feed-forward) and 2) updating reMap's parameters (feed-backward).

We demonstrated reMap's effectiveness on metabolic pathway prediction using leADS: on organismal genomes, the resulting performance metrics equaled or exceeded those of other prediction methods, with improved sensitivity scores.

Fig. 1: reMap workflow

A short description of the two phases in the workflow (shown in Fig. 1) is given below:

Phase 1 - Feed-Forward:

During this phase, a minimal subset of groups is picked to tag each example in a three-step process:

1. Constructing Pathway Groups: In this step, pathways are organized into non-disjoint (possibly overlapping) pathway groups using a correlated model. Any model from the CBT package can be employed to obtain the groups. Note that this step is applied only once, during the initial round.

2. Building Pathway Group Centroids: In this step, reMap computes a centroid for each pathway group to capture the relative association of each pathway with its group's centroid. This ensures that pathways within a pathway group are semantically close to the center of that group, while also ensuring similar semantics among groups with overlapping pathways. The semantic information represented by pathways is extracted using pathway2vec, which automatically generates features for pathway inference. At the end of this step, reMap produces the maximum set of expected hypothetical groups for each example.

3. Re-assigning Pathway Labels to Pathway Groups: In this step, each example in a given dataset is re-annotated with an optimal set of multi-label pathway groups, using the centroids and a history profile that stores the group probabilities obtained from previous iterations.
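
To make steps 2 and 3 more concrete, here is a minimal sketch in plain Python. It is not reMap's implementation: it assumes that each pathway has an embedding vector (such as one produced by pathway2vec), that a centroid is the plain mean of its members' embeddings, and that an example is scored against a group by blending cosine similarity with a stored history probability. The function names, the `alpha` blending weight, and the `threshold` cut-off are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_centroids(embeddings, groups):
    """Step 2 (assumed form): centroid = mean embedding of a group's pathways."""
    return {
        g: mean_vector([embeddings[p] for p in sorted(members)])
        for g, members in groups.items()
    }


def assign_groups(example, embeddings, groups, centroids, history,
                  alpha=0.5, threshold=0.5):
    """Step 3 (assumed form): keep candidate groups whose blended score passes."""
    # Candidate groups: every group sharing at least one pathway with the example.
    candidates = [g for g, members in groups.items() if members & example]
    profile = mean_vector([embeddings[p] for p in sorted(example)])
    scores = {
        g: alpha * cosine(profile, centroids[g])
        + (1 - alpha) * history.get(g, 0.0)
        for g in candidates
    }
    return {g for g, s in scores.items() if s >= threshold}, scores


# Toy data (made up for illustration only).
embeddings = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
groups = {"G1": {"P1", "P3"}, "G2": {"P2", "P3"}}  # overlapping groups
history = {"G1": 0.9, "G2": 0.0}  # group probabilities from earlier rounds

centroids = group_centroids(embeddings, groups)
assigned, scores = assign_groups({"P3"}, embeddings, groups, centroids, history)
print(assigned, scores)
```

In this toy run both groups are equally close to the example in embedding space, so the history profile decides which one is kept. This mirrors the role the history plays in step 3, although the actual scoring rule in reMap may differ.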

Phase 2 - Feed-Backward:

After the feed-forward phase, reMap's internal parameters are updated by enforcing four constraints: i) the weights of each group should be similar to the weights of its associated pathways, ii) pathways enclosed in a group should share similar weights, iii) the input space (i.e., enzymes) and the pathway space should share similar statistical properties, and iv) none of reMap's parameters should be too large or too small.

The two phases are repeated for all examples until a predefined number of rounds (τ ∈ ℤ, τ > 1) is reached. In the end, a new dataset is constructed, which can be used to train a multi-label algorithm (e.g., leADS) for pathway prediction.
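
The alternation between the two phases can be sketched as a simple loop. This is only a skeleton under stated assumptions: `relabel`, `toy_assign`, and `toy_update` are hypothetical names, and the toy update rule (a group's weight is the fraction of examples tagged with it) stands in for reMap's constrained parameter update, which is not reproduced here.

```python
def relabel(examples, assign, update, params, rounds):
    """Alternate feed-forward and feed-backward steps for a fixed number of rounds."""
    if rounds < 2:
        raise ValueError("rounds (tau) must be an integer greater than 1")
    labels = [set() for _ in examples]
    for _ in range(rounds):
        # Phase 1 (feed-forward): tag every example using the current parameters.
        labels = [assign(example, params) for example in examples]
        # Phase 2 (feed-backward): update the parameters from the new tags.
        params = update(examples, labels, params)
    return labels, params


# Toy stand-ins for the two phases (illustration only).
GROUPS = {"G1": {"P1", "P3"}, "G2": {"P2", "P3"}}


def toy_assign(example, params):
    # Keep groups that overlap the example and currently have weight >= 0.5.
    return {g for g, members in GROUPS.items()
            if members & example and params[g] >= 0.5}


def toy_update(examples, labels, params):
    # New weight of a group: fraction of examples currently tagged with it.
    return {g: sum(g in tagged for tagged in labels) / len(examples)
            for g in params}


examples = [{"P1"}, {"P3"}, {"P2", "P3"}]
labels, params = relabel(examples, toy_assign, toy_update,
                         {"G1": 1.0, "G2": 1.0}, rounds=3)
print(labels, params)
```

The returned `labels` list plays the role of the new, group-annotated dataset that would then be passed to a multi-label learner.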

For more information about reMap, please see our paper.

Citing

If you find reMap useful in your research, please consider citing the following paper:

Contact information

For any inquiries, please contact Steven Hallam and Abdurrahman Abul-Basher at: [email protected] and [email protected]
