Background
Our Hobot pipeline is long, with four stages:

low-level -> expert -> agent -> deployment

Each stage can be called a 'task'. For any task, the tasks to its left are its 'upstream' tasks. Generally, a task needs the config files and model weights of its upstream tasks for both training and deployment (deployment additionally requires the task's own configs and model weights).
A desirable scenario
In a brief discussion with @Haichao-Zhang on Friday, we agreed that in order to easily manage and use the training results of upstream tasks, two important properties are desired:
- The model weights are always stored as one ckpt file, regardless of where the task is in the pipeline. For example, an agent's ckpt contains the model weights for low-level, expert, and itself.
- We only need to look at one stage to get all needed configurations. For example, agent training or deployment needs only the expert's job dir, not the low-level's. Similarly, deployment needs only the agent's job dir.
These two properties simplify ckpt and conf management, because we don't want multiple training dirs passed to a downstream task.
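The first property can be sketched with plain pickled dicts. This is only an illustration under assumed names (a hypothetical `save_bundled_ckpt` helper, not ALF's actual ckpt format): each stage's ckpt nests the full upstream ckpt, so a single file always carries the weights of every earlier stage.

```python
import pickle


def save_bundled_ckpt(path, own_state, upstream_ckpt_path=None):
    """Save one ckpt that also bundles all upstream weights.

    Because the upstream ckpt itself already bundles *its* upstream
    weights, nesting it once carries every earlier stage transitively.
    """
    ckpt = {"own": own_state}
    if upstream_ckpt_path is not None:
        with open(upstream_ckpt_path, "rb") as f:
            # Nested dict holding the weights of all earlier stages.
            ckpt["upstream"] = pickle.load(f)
    with open(path, "wb") as f:
        pickle.dump(ckpt, f)
```

For example, an agent ckpt produced this way looks like `{"own": agent_weights, "upstream": {"own": expert_weights, "upstream": {"own": low_level_weights}}}`, so downstream code never needs more than one file.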
Solution
For model weights, it's straightforward to store everything as one ckpt. Conf management, however, is a little trickier. Below is a simple hack for that purpose.
```python
import os
import pathlib


def save_upstream_confs(upstream_task_root_dir: str):
    """When training the current task B, copy all upstream task (C, D, ...) confs
    to './.upstream_confs', and then add them to ``_CONF_FILES``.

    This makes them further copied to 'config_files' under the TB directory of B
    when ALF later writes the config. So when one wants to use the ckpt of B for
    a new downstream task A, they don't need the trained dirs of C, D, ...,
    because their conf files have been included in B.

    To use any cached upstream conf ``x_conf.py``, one only needs to do

    .. code-block:: python

        alf.import_config('./.upstream_confs/x_conf.py')

    This also works if ``x_conf.py`` itself imports some upstream conf
    ``y_conf.py``, provided that inside ``x_conf.py`` it's written as

    .. code-block:: python

        alf.import_config('./.upstream_confs/y_conf.py')

    A general template for saving/using upstream confs:

    .. code-block:: python

        if is_training:
            save_upstream_confs(upstream_task_root_dir)
        # import conf files of the current task
        alf.import_config('x_conf.py')
        alf.import_config('y_conf.py')
        # import conf files of upstream tasks
        alf.import_config('./.upstream_confs/z_conf.py')

    Args:
        upstream_task_root_dir: the root dir of the upstream task
    """
    root_dir = upstream_task_root_dir
    dst = pathlib.Path(__file__).parent / ".upstream_confs"
    os.system(f"mkdir -p {dst}")
    # Copy the upstream task's config files, along with its own cached
    # upstream conf files if they exist.
    if os.path.isdir(f"{root_dir}/config_files/.upstream_confs"):
        os.system(f"cp -r {root_dir}/config_files/.upstream_confs {dst}")
    os.system(f"cp {root_dir}/config_files/*.py {dst}")
    # Use rglob instead of glob.glob('**/*.py', recursive=True): glob's '**'
    # does not descend into the hidden '.upstream_confs' subdirectory.
    for f in dst.rglob("*.py"):
        _add_conf_file(str(f))
```

Generally, we copy all files under `config_files` of an upstream `root_dir`, recursively, to the directory of the current conf file, under a special dir called `.upstream_confs`. We then recursively add all files in this special dir to ALF's `_CONF_FILES`, which ALF copies to `config_files` of the training root dir after one training iteration of the current task.
This satisfies the second property, provided that any conf file `x_conf.py` of the current task imports a conf file `y_conf.py` of the immediate upstream task via

```python
alf.import_config('./.upstream_confs/y_conf.py')
```

This works for both the training and deployment modes of the task. We only call `save_upstream_confs` in the training mode:

```python
if is_training:
    save_upstream_confs(upstream_task_root_dir)
```