OmniSafe's Navigation Benchmark for Model-based Algorithms

The OmniSafe Navigation Benchmark for model-based algorithms evaluates the effectiveness of OmniSafe's model-based algorithms across two different environments from the Safety-Gymnasium task suite. For each supported algorithm and environment, we offer the following:

  • Default hyperparameters used for the benchmark and scripts that enable result replication.
  • Graphs and raw data that can be utilized for research purposes.
  • Detailed logs obtained during training.
  • Suggestions and hints on fine-tuning the algorithm for achieving optimal results.

Supported algorithms are listed below:

  • PETS
  • LOOP
  • SafeLOOP
  • CCEPETS
  • RCEPETS
  • CAPPETS

Safety-Gymnasium

We highly recommend using Safety-Gymnasium to run the following experiments. To install it on a Linux machine, run:

pip install safety_gymnasium
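
To quickly verify the installation, you can create one of the benchmark environments directly. The snippet below is a minimal sanity check, assuming the standard safety_gymnasium.make entry point and the six-tuple step return described in the Safety-Gymnasium documentation:

# Minimal sanity check that Safety-Gymnasium is installed and the benchmark
# tasks are registered (API assumed from the Safety-Gymnasium docs).
import safety_gymnasium

env = safety_gymnasium.make('SafetyPointGoal1-v0')
obs, info = env.reset(seed=0)
action = env.action_space.sample()
# Safety-Gymnasium returns a cost signal alongside the reward.
obs, reward, cost, terminated, truncated, info = env.step(action)
env.close()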

Run the Benchmark

You can set the main function of examples/benchmarks/experiment_grid.py as:

if __name__ == '__main__':
    eg = ExperimentGrid(exp_name='Model-Based-Benchmarks')

    # set up the algorithms.
    model_based_base_policy = ['LOOP', 'PETS']
    model_based_safe_policy = ['SafeLOOP', 'CCEPETS', 'CAPPETS', 'RCEPETS']
    eg.add('algo', model_based_base_policy + model_based_safe_policy)

    # you can use wandb to monitor the experiment.
    eg.add('logger_cfgs:use_wandb', [False])
    # you can use tensorboard to monitor the experiment.
    eg.add('logger_cfgs:use_tensorboard', [True])
    eg.add('train_cfgs:total_steps', [1000000])

    # set up the environment.
    eg.add('env_id', [
        'SafetyPointGoal1-v0-modelbased',
        'SafetyCarGoal1-v0-modelbased',
        ])
    eg.add('seed', [0, 5, 10, 15, 20])

    # the total number of experiments (6 algorithms x 2 environments x 5 seeds = 60)
    # must be divisible by num_pool; choose num_pool according to your machine.
    eg.run(train, num_pool=5)

After that, you can run the following commands to launch the benchmark:

cd examples/benchmarks
python run_experiment_grid.py

You can set the path in examples/benchmarks/experiment_grid.py, for example:

path ='/home/username/omnisafe/omnisafe/examples/benchmarks/exp-x/Model-Based-Benchmarks'

You can also plot the results by running the following command:

cd examples
python analyze_experiment_results.py

For detailed usage of the OmniSafe statistics tool, please refer to this tutorial.
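
If you prefer to call the statistics tool from your own script, examples/analyze_experiment_results.py is essentially a thin wrapper around OmniSafe's StatisticsTools. The sketch below outlines that usage; the import path, draw_graph arguments, and the results path are assumptions based on the example file and may differ in your OmniSafe version:

# Sketch of plotting benchmark results with OmniSafe's statistics tool.
# The path below is a placeholder for your own experiment-grid directory,
# and the draw_graph arguments are assumptions; consult the tutorial if
# your OmniSafe version differs.
from omnisafe.common.statistics_tools import StatisticsTools

path = '/home/username/omnisafe/examples/benchmarks/exp-x/Model-Based-Benchmarks'

st = StatisticsTools()
st.load_source(path)
# 'algo' was the parameter swept in the experiment grid above.
st.draw_graph(parameter='algo', values=None, compare_num=6, cost_limit=None)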

OmniSafe Benchmark

To demonstrate the high reliability of the algorithms implemented, OmniSafe offers performance insights within the Safety-Gymnasium environment. Note that all data is collected under the constraint cost_limit=1.00. The results are presented in Table 1 and Figure 1.

Performance Table

Table 1: The performance of OmniSafe's model-based algorithms, covering both reward and cost, assessed within the Safety-Gymnasium environments. Note that all model-based algorithms were evaluated after 1e6 training steps.

| Environment | PETS (Reward) | PETS (Cost) | LOOP (Reward) | LOOP (Cost) | SafeLOOP (Reward) | SafeLOOP (Cost) |
|---|---|---|---|---|---|---|
| SafetyCarGoal1-v0 | 33.07 ± 1.33 | 61.20 ± 7.23 | 25.41 ± 1.23 | 62.64 ± 8.34 | 22.09 ± 0.30 | 0.16 ± 0.15 |
| SafetyPointGoal1-v0 | 27.66 ± 0.07 | 49.16 ± 2.69 | 25.08 ± 1.47 | 55.23 ± 2.64 | 22.94 ± 0.72 | 0.04 ± 0.07 |

| Environment | CCEPETS (Reward) | CCEPETS (Cost) | RCEPETS (Reward) | RCEPETS (Cost) | CAPPETS (Reward) | CAPPETS (Cost) |
|---|---|---|---|---|---|---|
| SafetyCarGoal1-v0 | 27.60 ± 1.21 | 1.03 ± 0.29 | 29.08 ± 1.63 | 1.02 ± 0.88 | 23.33 ± 6.34 | 0.48 ± 0.17 |
| SafetyPointGoal1-v0 | 24.98 ± 0.05 | 1.87 ± 1.27 | 25.39 ± 0.28 | 2.46 ± 0.58 | 9.45 ± 8.62 | 0.64 ± 0.77 |

Performance Curves

Figure 1: Training curves in Safety-Gymnasium environments, covering the unconstrained and safe model-based algorithms listed in Table 1.


Panels: SafetyCarGoal1-v0, SafetyPointGoal1-v0.

Some Hints

In our experiments, we found that several hyperparameters are important for the performance of these algorithms:

  • action_repeat: The number of times each action is repeated per environment step.
  • init_var: The initial variance of the Gaussian distribution used for sampling actions.
  • temperature: The temperature factor for rescaling rewards in planning.
  • cost_temperature: The temperature factor for rescaling costs in planning.
  • plan_horizon: The planning horizon.

We have run experiments to study the effect of these hyperparameters, and we log the best configuration for each algorithm in each environment. You can check them in omnisafe/configs/model_based.
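
If you want to sweep these hyperparameters yourself, you can add them to the same ExperimentGrid used in the benchmark script above. The nested keys below are assumptions for illustration; check the files under omnisafe/configs/model_based for the exact section each key lives under before running:

# Hypothetical sketch: sweeping planner hyperparameters with the ExperimentGrid
# ('eg') from the benchmark script above. The 'algo_cfgs:' prefix is an
# assumption; verify the real nested keys in omnisafe/configs/model_based.
eg.add('algo_cfgs:action_repeat', [1, 5])
eg.add('algo_cfgs:init_var', [0.01, 4.0])
eg.add('algo_cfgs:plan_horizon', [7])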

In our experiments, we found that action_repeat=5 consistently performs better than action_repeat=1 in the navigation tasks for the CEM-based methods. The change in reward or observation per action in a navigation task may be too small; action_repeat=5 enlarges these changes and makes the dynamics model easier to train.
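
As a rough illustration of what action repetition does, the wrapper below applies each planner action several times and accumulates the resulting reward and cost. This is a generic Gymnasium-style sketch, not OmniSafe's actual implementation, and the cost-in-info convention is an assumption:

# Illustrative action-repeat wrapper (not OmniSafe's implementation).
# Repeating each action makes consecutive model inputs differ more,
# which can make the dynamics model easier to train.
import gymnasium as gym

class ActionRepeat(gym.Wrapper):
    def __init__(self, env, repeat=5):
        super().__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward, total_cost = 0.0, 0.0
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            total_cost += info.get('cost', 0.0)  # assumed cost key
            if terminated or truncated:
                break
        info['cost'] = total_cost
        return obs, total_reward, terminated, truncated, info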

Importantly, we found that a high initial variance such as init_var=4.0 performs better than a low one such as init_var=0.01 in the PETS-based algorithms, while the opposite holds for policy-guided algorithms such as LOOP: LOOP needs a low variance such as init_var=0.01 to keep the planning distribution close to the neural policy.

Besides, the hyperparameters temperature and cost_temperature are also important. LOOP and SafeLOOP should fine-tune these two parameters for each environment, since they control how strongly the magnitude of the reward (or cost) influences the planner's action mean and variance.
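
As a conceptual sketch of that effect (not OmniSafe's planner code), an exponentially weighted CEM/MPPI-style update uses the temperature to turn returns into sample weights; a larger temperature makes high-return action sequences dominate the new mean and variance, and cost_temperature would play the analogous role for costs in the safe variants:

# Conceptual sketch of a temperature-weighted distribution update
# (illustrative only; names and details are made up, not OmniSafe's code).
import numpy as np

def update_distribution(actions, returns, temperature):
    # actions: (num_samples, horizon, act_dim); returns: (num_samples,)
    weights = np.exp(temperature * (returns - returns.max()))
    weights /= weights.sum()
    # Weighted mean and variance of the sampled action sequences.
    mean = np.einsum('n,nha->ha', weights, actions)
    var = np.einsum('n,nha->ha', weights, (actions - mean) ** 2)
    return mean, var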

Moreover, non-policy-guided algorithms such as PETS need a long plan_horizon, while policy-guided algorithms such as LOOP only need a short plan_horizon in MuJoCo environments; for a fair comparison, however, we use the same planning horizon for all algorithms in the navigation tasks.

If you find that other hyperparameters perform better, please feel free to open an issue or pull request.

| Algorithm | action_repeat | init_var | plan_horizon |
|---|---|---|---|
| PETS | 5 | 4.0 | 7 |
| LOOP | 5 | 0.01 | 7 |
| SafeLOOP | 5 | 0.075 | 7 |
| CCEPETS | 5 | 4.0 | 7 |
| CAPPETS | 5 | 4.0 | 7 |
| RCEPETS | 5 | 4.0 | 7 |

However, some settings differ between these algorithms and environments; we list them below:

LOOP

| Environment | temperature |
|---|---|
| SafetyPointGoal1-v0 | 10.0 |
| SafetyCarGoal1-v0 | 10.0 |

SafeLOOP

| Environment | temperature | cost_temperature |
|---|---|---|
| SafetyPointGoal1-v0 | 10.0 | 100.0 |
| SafetyCarGoal1-v0 | 10.0 | 100.0 |