The OmniSafe Navigation Benchmark for model-based algorithms evaluates the effectiveness of OmniSafe's model-based algorithms across two different environments from the Safety-Gymnasium task suite. For each supported algorithm and environment, we offer the following:
- Default hyperparameters used for the benchmark and scripts that enable result replication.
- Graphs and raw data that can be utilized for research purposes.
- Detailed logs obtained during training.
- Suggestions and hints on fine-tuning the algorithm for achieving optimal results.
Supported algorithms are listed below:
- [NeurIPS 2018] Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (PETS)
- [CoRL 2021] Learning Off-Policy with Online Planning (LOOP and SafeLOOP)
- [AAAI 2022] Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)
- [ICML 2022 Workshop] Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method (RCE)
- [NeurIPS 2018] Constrained Cross-Entropy Method for Safe Reinforcement Learning (CCE)
We highly recommend using Safety-Gymnasium to run the following experiments. To install it on a Linux machine, run:

```bash
pip install safety_gymnasium
```
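After installation, you can quickly check that the environments load correctly. The snippet below is a minimal sanity check using the Gymnasium-style API that Safety-Gymnasium exposes; note that `env.step` returns the safety cost alongside the reward.

```python
import safety_gymnasium

# Create one of the benchmark environments and run a single random step.
env = safety_gymnasium.make('SafetyPointGoal1-v0')
obs, info = env.reset(seed=0)
action = env.action_space.sample()
# Safety-Gymnasium returns the safety cost in addition to the reward.
obs, reward, cost, terminated, truncated, info = env.step(action)
env.close()
```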
You can set the main function of examples/benchmarks/experiment_grid.py as:

```python
if __name__ == '__main__':
    eg = ExperimentGrid(exp_name='Model-Based-Benchmarks')

    # Set up the algorithms.
    model_based_base_policy = ['LOOP', 'PETS']
    model_based_safe_policy = ['SafeLOOP', 'CCEPETS', 'CAPPETS', 'RCEPETS']
    eg.add('algo', model_based_base_policy + model_based_safe_policy)

    # You can use wandb to monitor the experiment.
    eg.add('logger_cfgs:use_wandb', [False])
    # You can use tensorboard to monitor the experiment.
    eg.add('logger_cfgs:use_tensorboard', [True])
    eg.add('train_cfgs:total_steps', [1000000])

    # Set up the environments.
    eg.add('env_id', [
        'SafetyPointGoal1-v0-modelbased',
        'SafetyCarGoal1-v0-modelbased',
    ])
    eg.add('seed', [0, 5, 10, 15, 20])

    # The total number of experiments must be divisible by num_pool;
    # choose num_pool according to your machine's resources.
    eg.run(train, num_pool=5)
```
After that, you can run the following commands to run the benchmark:

```bash
cd examples/benchmarks
python run_experiment_grid.py
```
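If you only want to reproduce a single algorithm rather than the whole grid, a minimal sketch using OmniSafe's high-level `Agent` interface looks like the following (the algorithm and environment names are the same ones used in the grid above):

```python
import omnisafe

# Train one model-based algorithm on one navigation environment.
env_id = 'SafetyPointGoal1-v0-modelbased'
agent = omnisafe.Agent('SafeLOOP', env_id)
agent.learn()
```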
You can set the `path` variable to the output directory produced by examples/benchmarks/experiment_grid.py, for example:

```python
path = '/home/username/omnisafe/omnisafe/examples/benchmarks/exp-x/Model-Based-Benchmarks'
```
You can also plot the results by running the following commands:

```bash
cd examples
python analyze_experiment_results.py
```
For a detailed usage of OmniSafe statistics tool, please refer to this tutorial.
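If you prefer to analyze the results programmatically, a sketch using OmniSafe's statistics tool is shown below. The exact keyword arguments of `draw_graph` may differ between OmniSafe versions, so treat them as assumptions and check the tutorial above.

```python
from omnisafe.common.statistics_tools import StatisticsTools

st = StatisticsTools()
# Point the tool at the experiment-grid output directory, e.g. the `path` above.
st.load_source('./exp-x/Model-Based-Benchmarks')
# Group the curves by algorithm and draw the reward/cost figures.
st.draw_graph(parameter='algo', values=None, compare_num=6, cost_limit=None)
```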
To demonstrate the high reliability of the algorithms implemented, OmniSafe offers performance insights within the Safety-Gymnasium environments. It should be noted that all data are procured under the constraint `cost_limit=1.00`. The results are presented in Table 1 and Figure 1.
Table 1: The performance of OmniSafe model-based algorithms, encompassing both reward and cost, was assessed within the Safety-Gymnasium environments. It is crucial to highlight that all model-based algorithms underwent evaluation following 1e6 training steps.
| Environment | PETS Reward | PETS Cost | LOOP Reward | LOOP Cost | SafeLOOP Reward | SafeLOOP Cost |
|---|---|---|---|---|---|---|
| SafetyCarGoal1-v0 | 33.07 ± 1.33 | 61.20 ± 7.23 | 25.41 ± 1.23 | 62.64 ± 8.34 | 22.09 ± 0.30 | 0.16 ± 0.15 |
| SafetyPointGoal1-v0 | 27.66 ± 0.07 | 49.16 ± 2.69 | 25.08 ± 1.47 | 55.23 ± 2.64 | 22.94 ± 0.72 | 0.04 ± 0.07 |

| Environment | CCEPETS Reward | CCEPETS Cost | RCEPETS Reward | RCEPETS Cost | CAPPETS Reward | CAPPETS Cost |
|---|---|---|---|---|---|---|
| SafetyCarGoal1-v0 | 27.60 ± 1.21 | 1.03 ± 0.29 | 29.08 ± 1.63 | 1.02 ± 0.88 | 23.33 ± 6.34 | 0.48 ± 0.17 |
| SafetyPointGoal1-v0 | 24.98 ± 0.05 | 1.87 ± 1.27 | 25.39 ± 0.28 | 2.46 ± 0.58 | 9.45 ± 8.62 | 0.64 ± 0.77 |
Figure 1: Training curves in Safety-Gymnasium environments, covering classical reinforcement learning algorithms and safe learning algorithms mentioned in Table 1.
(Figure panels: SafetyCarGoal1-v0 and SafetyPointGoal1-v0.)
In our experiments, we found that some hyperparameters are important for the performance of the algorithms:

- `action_repeat`: the number of times each action is repeated in the environment.
- `init_var`: the initial variance of the Gaussian distribution used to sample actions.
- `temperature`: the temperature factor for rescaling rewards in planning.
- `cost_temperature`: the temperature factor for rescaling costs in planning.
- `plan_horizon`: the planning horizon.
We have run some experiments to show the effect of these hyperparameters, and we log the best configuration for each algorithm in each environment. You can check them in `omnisafe/configs/model_based`.
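If you want to try different values without editing the YAML files, you can override them through `custom_cfgs` when constructing an agent. The nesting below (`algo_cfgs`/`planner_cfgs`) is an assumption for illustration; check the files in `omnisafe/configs/model_based` for the exact keys.

```python
import omnisafe

# Hypothetical override of the hyperparameters discussed above; verify the
# exact config keys against omnisafe/configs/model_based before running.
custom_cfgs = {
    'train_cfgs': {'total_steps': 1000000},
    'algo_cfgs': {'action_repeat': 5},
    'planner_cfgs': {'plan_horizon': 7, 'init_var': 4.0},
}
agent = omnisafe.Agent('PETS', 'SafetyPointGoal1-v0-modelbased', custom_cfgs=custom_cfgs)
agent.learn()
```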
In our experiments, we found that `action_repeat=5` always performs better than `action_repeat=1` in the navigation tasks for the CEM-based methods. The change in reward or observation per single action in a navigation task may be too small; `action_repeat=5` enlarges these changes and makes the dynamics model easier to train.
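The sketch below shows what action repetition amounts to (it is not OmniSafe's internal wrapper): the same action is applied several times and the rewards and costs are accumulated, so each transition the dynamics model sees is larger.

```python
def step_with_repeat(env, action, repeat=5):
    """Apply the same action `repeat` times, accumulating reward and cost."""
    total_reward, total_cost = 0.0, 0.0
    for _ in range(repeat):
        obs, reward, cost, terminated, truncated, info = env.step(action)
        total_reward += reward
        total_cost += cost
        if terminated or truncated:
            break
    return obs, total_reward, total_cost, terminated, truncated, info
```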
Importantly, we found that a high initial variance such as `init_var=4.0` performs better than a low one such as `init_var=0.01` in the PETS-based algorithms, whereas the opposite holds for policy-guided algorithms such as LOOP: LOOP needs a low variance such as `init_var=0.01` to keep the planning policy close to the neural policy.
Besides, the hyperparameters `temperature` and `cost_temperature` are also important. LOOP and SafeLOOP should fine-tune these two parameters for each environment, since they control how strongly the reward (and cost) magnitudes contribute to the planner's action mean and variance.
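To make that effect concrete, here is a minimal NumPy sketch (not OmniSafe's implementation) of an exponentially weighted CEM-style update: the temperature rescales the returns before they are turned into sample weights, so a larger temperature lets high-return action sequences dominate the new mean and variance, while a smaller temperature averages more uniformly.

```python
import numpy as np

def weighted_cem_update(returns, actions, temperature=10.0):
    """Temperature-weighted refit of the planner's sampling distribution.

    returns: array of shape (num_samples,)
    actions: array of shape (num_samples, horizon, act_dim)
    """
    # Subtract the max before exponentiating for numerical stability.
    scores = temperature * (returns - returns.max())
    weights = np.exp(scores)
    weights /= weights.sum()
    # Weighted mean and variance over the sampled action sequences.
    mean = np.einsum('n,nha->ha', weights, actions)
    var = np.einsum('n,nha->ha', weights, (actions - mean) ** 2)
    return mean, var
```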
Moreover, non-policy-guided algorithms such as PETS need a high `plan_horizon`, while policy-guided algorithms such as LOOP need only a low `plan_horizon` in MuJoCo environments; for a fair comparison, we use the same planning horizon for all algorithms in the navigation tasks.
If you find that other hyperparameters perform better, please feel free to open an issue or pull request.
| Algorithm | `action_repeat` | `init_var` | `plan_horizon` |
|---|---|---|---|
| PETS | 5 | 4.0 | 7 |
| LOOP | 5 | 0.01 | 7 |
| SafeLOOP | 5 | 0.075 | 7 |
| CCEPETS | 5 | 4.0 | 7 |
| CAPPETS | 5 | 4.0 | 7 |
| RCEPETS | 5 | 4.0 | 7 |
However, there are some differences between these algorithms; the temperature settings are listed below.

LOOP:

| Environment | `temperature` |
|---|---|
| SafetyPointGoal1-v0 | 10.0 |
| SafetyCarGoal1-v0 | 10.0 |

SafeLOOP:

| Environment | `temperature` | `cost_temperature` |
|---|---|---|
| SafetyPointGoal1-v0 | 10.0 | 100.0 |
| SafetyCarGoal1-v0 | 10.0 | 100.0 |