<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Feasibility Tutorial</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 0;
padding: 0;
}
header {
background-color: #333;
color: #fff;
padding: 20px;
text-align: center;
}
.container {
max-width: 800px;
margin: 20px auto;
padding: 0 20px;
}
section {
background-color: #fff;
padding: 10px;
}
h2 {
color: #333;
}
p {
color: #666;
line-height: 1.6;
text-align: justify;
}
a {
color: inherit;
}
.author-info p {
margin-top: 30px;
color: #ccc;
text-align: center;
}
img {
display: block;
margin: auto;
margin-top: 10px;
margin-bottom: 10px;
}
.img-wrapper {
display: flex;
}
</style>
</head>
<body>
<header>
<h1>The Feasibility of Constrained Reinforcement Learning Algorithms: A Tutorial Study</h1>
<div class="author-info">
<p>
Yujie Yang*<sup>1</sup>,
Zhilong Zheng*<sup>1</sup>,
Shengbo Eben Li<sup>†1</sup>,
Masayoshi Tomizuka<sup>2</sup>,
Changliu Liu<sup>3</sup>
</p>
<p>
*Equal contribution<br>
<sup>†</sup>Corresponding author. Email: <a href="mailto:[email protected]">[email protected]</a><br>
<sup>1</sup>School of Vehicle and Mobility and State Key Lab of Intelligent Green Vehicle and Mobility, Tsinghua University<br>
<sup>2</sup>Department of Mechanical Engineering, University of California, Berkeley<br>
<sup>3</sup>Robotics Institute, Carnegie Mellon University
</p>
<p>
<a href="https://arxiv.org/abs/2404.10064">paper</a> |
<a href="https://github.com/yangyujie-jack/Feasibility-Tutorial">code</a>
</p>
</div>
</header>
<div class="container">
<br>
<img src="assets/images/intro.png" width="650px"/>
<section>
<h2>Abstract</h2>
<p>Satisfying safety constraints is a priority concern when solving optimal control problems (OCPs). Because of the infeasibility phenomenon, in which no constraint-satisfying solution can be found, it is necessary to identify a feasible region before implementing a policy. Existing feasibility theories built for model predictive control (MPC) only consider the feasibility of the optimal policy. However, reinforcement learning (RL), as another important control method, solves for the optimal policy iteratively, which produces a series of non-optimal intermediate policies. Feasibility analysis of these non-optimal policies is also necessary for iteratively improving constraint satisfaction, but it is not available under existing MPC feasibility theories. This paper proposes a feasibility theory that applies to both MPC and RL by filling in the missing part of feasibility analysis for an arbitrary policy. The basis of our theory is to decouple policy solving and implementation into two temporal domains: the virtual-time domain and the real-time domain. This allows us to separately define initial and endless, state and policy feasibility, and their corresponding feasible regions. Based on these definitions, we analyze the containment relationships between different feasible regions, which enables us to describe the feasible region of an arbitrary policy. We further provide virtual-time constraint design rules along with a practical design tool called the feasibility function, which helps to achieve the maximum feasible region. We review most existing constraint formulations and point out that they are essentially applications of feasibility functions in different forms. We demonstrate our feasibility theory by visualizing different feasible regions under both MPC and RL policies in an emergency braking control task.</p>
</section>
<section>
<h2>Containment Relationships of Feasible Regions</h2>
<div class="img-wrapper">
<img src="assets/images/containment_1.png" width="300px"/>
<img src="assets/images/containment_2.png" width="300px"/>
</div>
<img src="assets/images/containment_3.png" width="600px"/>
<img src="assets/images/containment_4.png" width="600px"/>
</section>
<section>
<h2>Example of Emergency Braking Control</h2>
<br>
<img src="assets/images/emergency_braking.png" width="500px">
<h3>MPC with pointwise constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_MPC_PW_(2,).png" width="25%"/>
<img src="assets/images/trajectory_MPC_PW_(4,).png" width="25%"/>
<img src="assets/images/trajectory_MPC_PW_(6,).png" width="25%"/>
<img src="assets/images/trajectory_MPC_PW_(10,).png" width="25%"/>
</div>
<div class="img-wrapper">
<img src="assets/images/feasibility_MPC_PW_(2,).png" width="25%"/>
<img src="assets/images/feasibility_MPC_PW_(4,).png" width="25%"/>
<img src="assets/images/feasibility_MPC_PW_(6,).png" width="25%"/>
<img src="assets/images/feasibility_MPC_PW_(10,).png" width="25%"/>
</div>
<h3>RL with pointwise constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_RL_PW_10.png" width="25%"/>
<img src="assets/images/trajectory_RL_PW_50.png" width="25%"/>
<img src="assets/images/trajectory_RL_PW_100.png" width="25%"/>
<img src="assets/images/trajectory_RL_PW_10000.png" width="25%"/>
</div>
<div class="img-wrapper">
<img src="assets/images/feasibility_RL_PW_10.png" width="25%"/>
<img src="assets/images/feasibility_RL_PW_50.png" width="25%"/>
<img src="assets/images/feasibility_RL_PW_100.png" width="25%"/>
<img src="assets/images/feasibility_RL_PW_10000.png" width="25%"/>
</div>
<h3>MPC with control barrier function (CBF) constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_MPC_CBF_(0.5,).png" width="25%"/>
<img src="assets/images/trajectory_MPC_CBF_(0.2,).png" width="25%"/>
<img src="assets/images/trajectory_MPC_CBF_(0.1,).png" width="25%"/>
<img src="assets/images/trajectory_MPC_CBF_(0.05,).png" width="25%"/>
</div>
<div class="img-wrapper">
<img src="assets/images/feasibility_MPC_CBF_(0.5,).png" width="25%"/>
<img src="assets/images/feasibility_MPC_CBF_(0.2,).png" width="25%"/>
<img src="assets/images/feasibility_MPC_CBF_(0.1,).png" width="25%"/>
<img src="assets/images/feasibility_MPC_CBF_(0.05,).png" width="25%"/>
</div>
<h3>RL with CBF constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_RL_CBF_10.png" width="25%"/>
<img src="assets/images/trajectory_RL_CBF_50.png" width="25%"/>
<img src="assets/images/trajectory_RL_CBF_100.png" width="25%"/>
<img src="assets/images/trajectory_RL_CBF_10000.png" width="25%"/>
</div>
<div class="img-wrapper">
<img src="assets/images/feasibility_RL_CBF_10.png" width="25%"/>
<img src="assets/images/feasibility_RL_CBF_50.png" width="25%"/>
<img src="assets/images/feasibility_RL_CBF_100.png" width="25%"/>
<img src="assets/images/feasibility_RL_CBF_10000.png" width="25%"/>
</div>
<h3>MPC with safety index (SI) constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_MPC_SI_(2, 5).png" width="25%"/>
<img src="assets/images/trajectory_MPC_SI_(1, 1).png" width="25%"/>
<img src="assets/images/trajectory_MPC_SI_(0.5, 0.5).png" width="25%"/>
<img src="assets/images/trajectory_MPC_SI_(0.5, 0.23).png" width="25%"/>
</div>
<div class="img-wrapper">
<img src="assets/images/feasibility_MPC_SI_(2, 5).png" width="25%"/>
<img src="assets/images/feasibility_MPC_SI_(1, 1).png" width="25%"/>
<img src="assets/images/feasibility_MPC_SI_(0.5, 0.5).png" width="25%"/>
<img src="assets/images/feasibility_MPC_SI_(0.5, 0.23).png" width="25%"/>
</div>
<h3>RL with SI constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_RL_SI_10.png" width="25%"/>
<img src="assets/images/trajectory_RL_SI_50.png" width="25%"/>
<img src="assets/images/trajectory_RL_SI_100.png" width="25%"/>
<img src="assets/images/trajectory_RL_SI_10000.png" width="25%"/>
</div>
<div class="img-wrapper">
<img src="assets/images/feasibility_RL_SI_10.png" width="25%"/>
<img src="assets/images/feasibility_RL_SI_50.png" width="25%"/>
<img src="assets/images/feasibility_RL_SI_100.png" width="25%"/>
<img src="assets/images/feasibility_RL_SI_10000.png" width="25%"/>
</div>
<h3>MPC with Hamilton-Jacobi (HJ) reachability constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_MPC_HJR_().png" width="25%"/>
<img src="assets/images/feasibility_MPC_HJR_().png" width="25%"/>
</div>
<h3>RL with HJ reachability constraint</h3>
<div class="img-wrapper">
<img src="assets/images/trajectory_RL_HJR_10.png" width="25%"/>
<img src="assets/images/trajectory_RL_HJR_50.png" width="25%"/>
<img src="assets/images/trajectory_RL_HJR_100.png" width="25%"/>
<img src="assets/images/trajectory_RL_HJR_10000.png" width="25%"/>
</div>
<div class="img-wrapper">
<img src="assets/images/feasibility_RL_HJR_10.png" width="25%"/>
<img src="assets/images/feasibility_RL_HJR_50.png" width="25%"/>
<img src="assets/images/feasibility_RL_HJR_100.png" width="25%"/>
<img src="assets/images/feasibility_RL_HJR_10000.png" width="25%"/>
</div>
<h3>SI synthesis via evolutionary optimization</h3>
<img src="assets/images/cma_es.png" width="700px">
<h3>SI synthesis via reinforcement learning</h3>
<img src="assets/images/joint_synthesis.png" width="600px">
<h3>SI synthesis via adversarial optimization</h3>
<img src="assets/images/adversarial.png" width="600px">
</section>
</div>
</body>
</html>