<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="ReViT: Rotational-equivariant Vision Transformers for Neural PDE Solvers">
<meta name="keywords" content="ReViT, Equivariant, Vision Transformer, PDE, Rotational Equivariance">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>ReViT: Rotational-equivariant Vision Transformers for Neural PDE Solvers</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css?v=tikz-diagrams-1">
<link rel="icon" href="./static/images/logo_mark_previous.svg">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js?v=shift-window-clarity-2"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</head>
<body>
<nav class="navbar" role="navigation" aria-label="main navigation">
<div class="navbar-brand">
<a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
<span aria-hidden="true"></span><span aria-hidden="true"></span><span aria-hidden="true"></span>
</a>
</div>
<div class="navbar-menu">
<div class="navbar-start" style="flex-grow: 1; justify-content: center;">
<a class="navbar-item" href="https://howw-way.github.io/personal-web/">
<span class="icon"><i class="fas fa-home"></i></span>
</a>
<div class="navbar-item has-dropdown is-hoverable">
<a class="navbar-link">More Research</a>
<div class="navbar-dropdown">
<a class="navbar-item" href="https://ge.in.tum.de/publications/">Physics-based simulation group</a>
</div>
</div>
</div>
</div>
</nav>
<!-- HERO -->
<section class="hero revit-hero">
<div class="hero-body">
<div class="container is-max-widescreen">
<div class="columns is-centered">
<div class="column has-text-centered">
<div class="revit-title-row">
<img src="./static/images/logo_mark_previous.svg" alt="ReViT Logo" class="revit-title-logo">
<h1 class="title is-1 publication-title">ReViT: Rotational-equivariant Vision Transformers for Neural PDE Solvers</h1>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://howw-way.github.io/personal-web/">Hao Wei</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://ge.in.tum.de/about/bjorn-list/">Bjoern List</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://ge.in.tum.de/about/n-thuerey/">Nils Thuerey</a><sup>1</sup></span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Technical University of Munich</span>
</div>
<div class="is-size-6 publication-venue" style="margin: 0.5rem auto;">Oral at ICML 2026</div>
<div class="column has-text-centered">
<div class="publication-links">
<span class="link-block">
<a href="https://ge.in.tum.de/download/wei-icml2026-revit.pdf" class="external-link button is-normal is-rounded is-dark">
<span class="icon"><i class="fas fa-file-pdf"></i></span><span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://github.com/tum-pbs/revit" class="external-link button is-normal is-rounded is-dark">
<span class="icon"><i class="fab fa-github"></i></span><span>Code</span>
</a>
</span>
</div>
</div>
<div class="revit-hero-showcase" aria-label="ReViT 3D predictions across 24 rotations">
<figure class="revit-showcase-panel revit-showcase-main">
<video autoplay loop muted playsinline preload="metadata" poster="./static/images/revit_3d_mhd_velocity_poster.jpg?v=ref-pair-1" aria-label="Animated ReViT MHD velocity prediction across 24 rotations">
<source src="./static/images/revit_3d_mhd_velocity.mp4?v=ref-pair-1" type="video/mp4">
<img src="./static/images/revit_3d_mhd_velocity.gif?v=ref-pair-1" alt="Animated ReViT MHD velocity prediction across 24 rotations.">
</video>
<figcaption><span>MHD velocity</span></figcaption>
</figure>
<div class="revit-showcase-stack">
<figure class="revit-showcase-panel">
<video autoplay loop muted playsinline preload="metadata" poster="./static/images/revit_3d_mhd_magnetic_poster.jpg?v=ref-pair-1" aria-label="Animated ReViT MHD magnetic-field prediction across 24 rotations">
<source src="./static/images/revit_3d_mhd_magnetic.mp4?v=ref-pair-1" type="video/mp4">
<img src="./static/images/revit_3d_mhd_magnetic.gif?v=ref-pair-1" alt="Animated ReViT MHD magnetic-field prediction across 24 rotations.">
</video>
<figcaption><span>MHD magnetic</span></figcaption>
</figure>
<figure class="revit-showcase-panel">
<video autoplay loop muted playsinline preload="metadata" poster="./static/images/revit_3d_tcf_velocity_poster.jpg?v=ref-pair-1" aria-label="Animated ReViT turbulent-channel-flow velocity prediction across 24 rotations">
<source src="./static/images/revit_3d_tcf_velocity.mp4?v=ref-pair-1" type="video/mp4">
<img src="./static/images/revit_3d_tcf_velocity.gif?v=ref-pair-1" alt="Animated ReViT turbulent-channel-flow velocity prediction across 24 rotations.">
</video>
<figcaption><span>TCF velocity</span></figcaption>
</figure>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<!-- TEASER: Fig 1 from paper -->
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column is-full-width">
<div class="content has-text-justified">
<p><strong>ReViT</strong> is the first Vision Transformer framework that enforces strict rotational equivariance on grid-based physical fields. By mapping scalar and vector inputs into locally invariant representations derived from physics-based canonical bases, ReViT can use standard self-attention without violating the symmetry, which yields significant accuracy gains across 2D and 3D PDE benchmarks.</p>
</div>
</div>
</div>
</div>
</section>
<hr class="section-divider">
<!-- ABSTRACT -->
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>Physical systems obey strict symmetries such as rotational equivariance. However, the standard Transformer architectures widely used in physics foundation models do not enforce these constraints by construction. We introduce <strong>ReViT</strong>, a Vision Transformer framework for neural PDE solvers on grid-based physical fields that strictly enforces rotational equivariance.</p>
<p>ReViT maps scalar and vector inputs into locally invariant representations derived from physics-based canonical bases, enabling the use of standard self-attention without symmetry violations. Built on a hierarchical Swin-style backbone with a precomputed reference basis pyramid, ReViT preserves equivariance across multi-scale operations.</p>
<p>We evaluate ReViT on a wide range of 2D and 3D PDE benchmarks, such as magnetohydrodynamics and turbulent channel flow, demonstrating significant gains over state-of-the-art baselines. ReViT exhibits strong generalization and <strong>reduces MSE by up to 65%</strong> compared with the best-performing alternatives.</p>
</div>
</div>
</div>
</div>
</section>
<hr class="section-divider">
<!-- MOTIVATION: CHALLENGES -->
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Theoretical Analysis: Why ViTs Fail</h2>
<div class="content has-text-justified">
<p>We consider a physical field \(\mathbf{f}: \Omega \rightarrow \mathcal{V}\) on a spatial domain \(\Omega \subset \mathbb{R}^d\). For a <em>vector field</em> \(\mathbf{u}\), the rotation group acts on both the domain and the value: \([L_g \mathbf{u}](x) = \mathbf{R}(g)\mathbf{u}(\mathbf{R}(g)^{-1}x)\). A neural network \(\Phi\) is <em>equivariant</em> if \(\Phi(L_g \mathbf{u}) = L_g \Phi(\mathbf{u})\) for every rotation \(g\). We identify <strong>three distinct mechanisms</strong> by which standard ViTs violate rotational equivariance:</p>
</div>
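<p>As a minimal illustration of the definition above (our own sketch, not code from the ReViT repository), the snippet below implements \([L_g \mathbf{u}](x)\) for a \(90^\circ\) rotation of a 2D vector field sampled on an \(n \times n\) grid and checks \(\Phi(L_g \mathbf{u}) = L_g \Phi(\mathbf{u})\) numerically. The array layout <code>field[y][x]</code> and the helpers <code>act</code>, <code>pointwise</code> and <code>is_equivariant</code> are hypothetical. Scaling every vector by 2 commutes with the action, while swapping the two vector components (a reflection of the values) does not.</p>

```python
def rot90_vec(v):
    """R(g) for a +90 degree rotation: (x, y) -> (-y, x)."""
    return (-v[1], v[0])

def act(field):
    """[L_g u](x) = R(g) u(R(g)^{-1} x) on an n x n grid indexed field[y][x].

    Positions are measured from the grid centre; R(g)^{-1} maps the output
    position (xp, yp) back to the source position (x, y) = (yp, n - 1 - xp).
    """
    n = len(field)
    return [[rot90_vec(field[n - 1 - xp][yp]) for xp in range(n)] for yp in range(n)]

def pointwise(f):
    """Lift a per-vector map f to a map that acts on every pixel of a field."""
    return lambda field: [[f(v) for v in row] for row in field]

def is_equivariant(phi, field):
    """Check Phi(L_g u) == L_g Phi(u) for this particular field."""
    return phi(act(field)) == act(phi(field))

n = 3
field = [[(x + 2 * y + 1, 3 * x - y) for x in range(n)] for y in range(n)]

scale = pointwise(lambda v: (2 * v[0], 2 * v[1]))   # W = 2I
swap = pointwise(lambda v: (v[1], v[0]))            # W = [[0, 1], [1, 0]]

print(is_equivariant(scale, field))  # True
print(is_equivariant(swap, field))   # False
```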
<div class="challenge-item challenge-illustrated">
<div class="challenge-text">
<strong>C1. The Tokenization Barrier.</strong>
Standard patch embeddings flatten the vector field within a patch \(P_i\) of side length \(K\) (\(K^d\) grid points) into \(\mathbf{v} \in \mathbb{R}^{K^d \cdot d}\) and apply a learnable linear map \(\mathbf{E}\). Under a rotation \(g\), the pixels permute to \(\pi_g \mathbf{v}\), but in general \(\mathbf{E}(\pi_g \mathbf{v}) \neq \mathbf{E}(\mathbf{v})\), so rotated copies of the same pattern are projected into different regions of the latent space.
</div>
<div class="challenge-diagram">
<img src="./static/images/fig_c1.png" alt="C1: Tokenization barrier — flattening a rotated patch maps it to a different raster vector.">
</div>
</div>
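<p>The tokenization barrier can be reproduced with a toy 2×2 scalar patch. In this sketch (assumed helper names <code>flatten</code>, <code>rot90_patch</code>, <code>linear_embed</code>; arbitrary weights standing in for \(\mathbf{E}\)), a quarter turn permutes the raster vector and changes the embedding, whereas a permutation-invariant reduction such as a sum is unaffected. For vector fields the values also rotate, which is what the local canonicalization in the methodology addresses.</p>

```python
def flatten(patch):
    """Raster-order flattening, as used by a standard patch embedding."""
    return [value for row in patch for value in row]

def rot90_patch(patch):
    """Quarter turn of a square patch (counter-clockwise in array view, like numpy.rot90)."""
    k = len(patch)
    return [[patch[j][k - 1 - i] for j in range(k)] for i in range(k)]

def linear_embed(v, weights):
    """A fixed linear map E with a single output channel: E v = sum_j w_j v_j."""
    return sum(w * x for w, x in zip(weights, v))

patch = [[1.0, 2.0],
         [3.0, 4.0]]
weights = [0.5, -1.0, 2.0, 0.25]   # arbitrary stand-in for learned weights

e_orig = linear_embed(flatten(patch), weights)
e_rot = linear_embed(flatten(rot90_patch(patch)), weights)
print(e_orig, e_rot)   # 5.5 -0.25: the rotated patch lands elsewhere in latent space

# A permutation-invariant reduction (here a plain sum) ignores the pixel order.
print(sum(flatten(patch)) == sum(flatten(rot90_patch(patch))))  # True
```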
<div class="challenge-item challenge-illustrated">
<div class="challenge-text">
<strong>C2. Loss of Spatial Equivariance.</strong>
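Absolute positional encodings (PEs) break equivariance since \(\mathbf{P}(\mathbf{R}(g)x) \neq \boldsymbol{\pi}_g \mathbf{P}(x)\). Relative PEs depend on \(\boldsymbol{\delta}_{ij} = x_j - x_i\) and restore translation equivariance, but \(\boldsymbol{\delta}_{ij}\) itself rotates with the input, so a directional encoding of it is not rotation-invariant. Making it invariant requires an <em>isotropic</em> function such as \(\|x_i - x_j\|\), which discards all directional information.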
</div>
<div class="challenge-diagram">
<img src="./static/images/fig_c2.png" alt="C2: PE dilemma — existing PEs are either directional or equivariant, never both.">
</div>
</div>
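<p>A small sketch of this positional-encoding dilemma, assuming a hypothetical lookup table of learned biases keyed by integer displacement: after a quarter turn the same token pair has displacement \((0, 1)\) instead of \((1, 0)\), so the directional bias changes, while the isotropic distance stays fixed but can no longer tell the two directions apart.</p>

```python
import math

def rot90(v):
    """R(g) for a quarter turn: (x, y) -> (-y, x)."""
    return (-v[1], v[0])

def directional_bias(delta, table):
    """Directional relative PE: one learned scalar per displacement (hypothetical table)."""
    return table[delta]

def isotropic_bias(delta):
    """Isotropic relative PE: depends only on the distance |x_i - x_j|."""
    return math.hypot(delta[0], delta[1])

table = {(1, 0): 0.9, (0, 1): -0.3, (-1, 0): 0.1, (0, -1): 0.4}   # made-up biases

delta = (1, 0)            # x_j - x_i for one token pair
delta_rot = rot90(delta)  # the same pair after rotating the input: (0, 1)

print(directional_bias(delta, table), directional_bias(delta_rot, table))  # 0.9 -0.3
print(isotropic_bias(delta), isotropic_bias(delta_rot))                    # 1.0 1.0
```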
<div class="challenge-item">
<strong>C3. The Representational Mismatch.</strong>
For vector fields, rotations act on both coordinates and values, so a linear map \(\mathbf{W}\) applied to vector values must satisfy \(\mathbf{W}(\mathbf{R}(g)\mathbf{u}) = \mathbf{R}(g)(\mathbf{W}\mathbf{u})\) for all \(g\). By Schur's lemma, in 3D any such map must be \(\mathbf{W} = \lambda \mathbf{I}\); in 2D it is restricted to a scaled rotation. Standard ViT projections are dense and unconstrained, so they violate this condition and break equivariance.
</div>
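<p>The constraint on \(\mathbf{W}\) can be checked directly with integer matrices. This sketch (helper names are ours) tests \(\mathbf{W}\mathbf{R}(g) = \mathbf{R}(g)\mathbf{W}\) for quarter turns: in 2D both \(\lambda\mathbf{I}\) and a scaled rotation \(a\mathbf{I} + b\mathbf{J}\) pass while a generic dense matrix fails, and in 3D a scaled rotation about the \(z\)-axis fails as soon as a quarter turn about the \(x\)-axis is included.</p>

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def commutes(w, r):
    """True if W R(g) == R(g) W, i.e. the per-vector map W is equivariant to R(g)."""
    return matmul(w, r) == matmul(r, w)

# 2D: the quarter turn J generates the rotations of the square grid.
J = [[0, -1], [1, 0]]
scalar_2d = [[2, 0], [0, 2]]          # lambda * I
scaled_rot_2d = [[2, -3], [3, 2]]     # a*I + b*J with a = 2, b = 3
dense_2d = [[1, 2], [3, 4]]           # an unconstrained dense projection
print(commutes(scalar_2d, J), commutes(scaled_rot_2d, J), commutes(dense_2d, J))
# True True False

# 3D: quarter turns about the z- and x-axes (both elements of the octahedral group).
Rz = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
Rx = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
scalar_3d = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
scaled_rot_z = [[2, -3, 0], [3, 2, 0], [0, 0, 2]]   # survives Rz, fails Rx
for w in (scalar_3d, scaled_rot_z):
    print(commutes(w, Rz), commutes(w, Rx))
# True True
# True False
```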
</div>
</div>
</div>
</section>
<hr class="section-divider">
<!-- METHOD -->
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Methodology of ReViT</h2>
<div class="content has-text-justified">
<p>By adapting local canonicalization to ViTs, we decouple basis transformations from feature learning, solving challenges C1–C3. The model alternates between <strong>invariant processing</strong> (features in local canonical frames) and <strong>global transitions</strong> (physical basis transformations). The architecture consists of three stages: (1) Local Canonicalization, (2) Invariant Transformer Processing, and (3) Equivariant Decoding.</p>
</div>
<div class="arch-diagram-container">
<img src="./static/images/structure.png" alt="ReViT architecture overview.">
<p class="is-size-7 has-text-grey has-text-centered" style="margin-top: 0.75rem;">
<strong>Figure 2.</strong> Overview of the ReViT architecture. The hierarchical encoder-decoder alternates between <strong>invariant processing</strong> (blue) and <strong>global transitions</strong> (orange). The <em>Reference Basis Pyramid</em> (purple) supplies local bases \(\mathcal{B}^{(l)}\) to mediate resolution changes (<em>Merge</em>, <em>Expand</em>) and the final <em>Equivariant Rebase</em>.
</p>
</div>
<div class="rebase-interactive" id="aov-rebasing">
<div class="rebase-header">
<p class="rebase-kicker">Interactive Local Rebasing</p>
<h3 class="title is-4">Rotate Globally, Learn Locally</h3>
<p>The local basis is built from a patch-aggregated vector and used to rebase global vectors into local coordinates. When the field rotates, the basis co-rotates, so projecting vectors with \(\mathbf{B}_i^{\mathsf T}\) leaves the local representation unchanged.</p>
</div>
<div class="rebase-control-bar" role="group" aria-label="Local-basis global rotation">
<div class="rebase-angle-buttons" aria-label="Preset rotations">
<button class="rebase-angle is-active" type="button" data-angle="0">0°</button>
<button class="rebase-angle" type="button" data-angle="90">90°</button>
<button class="rebase-angle" type="button" data-angle="180">180°</button>
<button class="rebase-angle" type="button" data-angle="270">270°</button>
</div>
<label class="rebase-slider-label" for="aov-angle">Input rotation</label>
<input id="aov-angle" class="rebase-slider" type="range" min="0" max="270" step="1" value="0">
<output id="aov-angle-output" for="aov-angle">0°</output>
</div>
<div class="rebase-figure-guide">
<span><strong>Red box:</strong> selected patch used to show the patch-level basis.</span>
<span><strong>Global vector</strong> \(\mathbf{u}\): orange, drawn in the fixed \(X\)-\(Y\) basis.</span>
<span><strong>Rebased vector</strong> \(\mathbf{u}^{\text{local}}=\mathbf{B}_i^{\mathsf T}\mathbf{u}\): green, drawn in the co-rotating local basis.</span>
</div>
<div class="rebase-panels">
<figure class="rebase-panel">
<figcaption class="rebase-panel-title">
<span>Global field</span>
<span>fixed \(X\)-\(Y\) basis</span>
</figcaption>
<canvas id="aov-global-canvas" class="rebase-canvas" aria-label="Global vector field under rotation"></canvas>
</figure>
<figure class="rebase-panel">
<figcaption class="rebase-panel-title">
<span>Rebased field</span>
<span id="aov-local-basis-label">local basis</span>
</figcaption>
<canvas id="aov-local-canvas" class="rebase-canvas" aria-label="Rebased field under the same rotation"></canvas>
</figure>
</div>
<div class="rebase-token-flow" aria-label="Local rebasing sequence">
<figure class="rebase-stage">
<canvas id="aov-stage-spatial" aria-label="Patch field stage"></canvas>
<figcaption>Global patch field</figcaption>
</figure>
<figure class="rebase-stage">
<canvas id="aov-stage-sequence" aria-label="Token sequence stage"></canvas>
<figcaption>Global token sequence (co-rotating)</figcaption>
</figure>
<figure class="rebase-stage">
<canvas id="aov-stage-basis" aria-label="Local basis stage"></canvas>
<figcaption>Local basis (co-rotating)</figcaption>
</figure>
<figure class="rebase-stage">
<canvas id="aov-stage-rebased" aria-label="Rebased feature stage"></canvas>
<figcaption>Rebased token sequence (invariant to rotation)</figcaption>
</figure>
</div>
<div class="rebase-equation-row">
<span>\(\bar{\mathbf{u}}_i = \frac{1}{|P_i|}\sum_{k\in P_i}\mathbf{u}_k\)</span>
<span>\(\mathbf{b}_{1,i} = \bar{\mathbf{u}}_i / \|\bar{\mathbf{u}}_i\|\)</span>
<span>\(\mathbf{u}^{\text{local}}_{i,k} = \mathbf{B}_i^{\mathsf T}\mathbf{u}_{i,k}\)</span>
</div>
</div>
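<p>The three equations above can be sketched for a single 2D patch as follows. This assumes \(\mathbf{b}_{2,i}\) is \(\mathbf{b}_{1,i}\) rotated by \(+90^\circ\) (a natural choice in 2D, not stated on this page) and only rotates the vector values, so patch membership is unchanged; the function names are hypothetical, and the patch mean is assumed to be non-zero.</p>

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def local_basis_2d(patch_vectors):
    """B_i = [b1 | b2]: b1 is the normalised patch mean, b2 is b1 turned by +90 degrees."""
    n = len(patch_vectors)
    mx = sum(v[0] for v in patch_vectors) / n
    my = sum(v[1] for v in patch_vectors) / n
    norm = math.hypot(mx, my)   # assumes a non-zero mean; degenerate patches need a fallback
    b1 = (mx / norm, my / norm)
    b2 = (-b1[1], b1[0])
    return b1, b2

def rebase(patch_vectors):
    """u_local = B_i^T u: the coordinates of each vector along b1 and b2."""
    b1, b2 = local_basis_2d(patch_vectors)
    return [(b1[0] * u[0] + b1[1] * u[1], b2[0] * u[0] + b2[1] * u[1]) for u in patch_vectors]

patch = [(1.0, 0.5), (0.8, 0.2), (1.2, -0.1), (0.9, 0.4)]
R = rotation(math.radians(37.0))          # any global rotation angle
rotated_patch = [apply(R, u) for u in patch]

local = rebase(patch)
local_rot = rebase(rotated_patch)
err = max(abs(a - b) for p, q in zip(local, local_rot) for a, b in zip(p, q))
print(err)   # round-off only
```

<p>Rotating the input rotates the mean, and hence \(\mathbf{b}_{1,i}\) and \(\mathbf{b}_{2,i}\), by the same angle, so the rebased coordinates agree to machine precision for any angle, not only the preset ones.</p>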
<div class="method-box-illustrated">
<div class="method-text">
<h4 class="title is-5" style="margin-bottom: 0.5rem;">1. Local Canonicalization <span style="color:#999;font-size:0.85rem;font-weight:400;">— solves C3</span></h4>
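<p>For each patch \(P_i\), we compute a <em>Local Canonical Basis</em> \(\mathbf{B}_i \in SO(d)\) deterministically from the field values. Vectors are projected into the local frame: \(\mathbf{u}^{\text{local}}_{i,k} = \mathbf{B}_i^T \mathbf{u}_{i,k}\). Because the basis co-rotates with the field (rotating the input by \(g\) turns \(\mathbf{B}_i\) into \(\mathbf{R}(g)\mathbf{B}_i\)), this projection is provably invariant:</p>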
<div class="equation-highlight">
$$\begin{aligned}
(\mathbf{R}(g)\mathbf{B}_i)^{T} (\mathbf{R}(g)\mathbf{u}_{i,k})
&= \mathbf{B}_i^{T} \mathbf{R}(g)^{T} \mathbf{R}(g) \mathbf{u}_{i,k} \\
&= \mathbf{B}_i^{T} \mathbf{u}_{i,k} \\
&= \mathbf{u}^{\text{local}}_{i,k}
\end{aligned}$$
</div>
<p>In 3D, the basis is derived from the mean velocity \(\bar{\mathbf{u}}_i\) and mean vorticity \(\bar{\boldsymbol{\omega}}_i\) using a stabilized analytical orthogonalization based on sequential cross-products.</p>
</div>
<div class="method-diagram">
<img src="./static/images/fig_sol3.png" alt="S3: Local canonical basis — project vectors via B_i^T from global to local frame.">
</div>
</div>
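<p>One way to build a 3D frame from \(\bar{\mathbf{u}}_i\) and \(\bar{\boldsymbol{\omega}}_i\) with sequential cross-products is sketched below. The exact order of the products and the stabilization used by ReViT are not given on this page, so <code>local_basis_3d</code> is an assumed construction without any safeguard for (near-)parallel inputs. Because \(\mathbf{R}(\mathbf{a} \times \mathbf{b}) = (\mathbf{R}\mathbf{a}) \times (\mathbf{R}\mathbf{b})\) for proper rotations (so the pseudovector vorticity transforms like a vector here), the resulting basis co-rotates with the input, which the script checks numerically.</p>

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def local_basis_3d(u_mean, w_mean):
    """Assumed right-handed orthonormal frame from mean velocity and mean vorticity."""
    b1 = normalize(u_mean)                  # along the mean velocity
    b3 = normalize(cross(u_mean, w_mean))   # normal to the velocity-vorticity plane
    b2 = cross(b3, b1)                      # completes a right-handed frame
    return b1, b2, b3

def matvec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

u_mean = (1.0, 0.2, -0.3)             # illustrative patch-mean velocity
w_mean = (0.1, 0.7, 0.4)              # illustrative patch-mean vorticity
R = matmul(rot_z(0.4), rot_x(1.1))    # an arbitrary proper rotation

B = local_basis_3d(u_mean, w_mean)
B_rot = local_basis_3d(matvec(R, u_mean), matvec(R, w_mean))

# Co-rotation: each basis vector built from the rotated input equals R times the original.
err = max(abs(x - y) for b, b_r in zip(B, B_rot) for x, y in zip(matvec(R, b), b_r))
print(err)  # round-off only
```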
<div class="method-box-illustrated">
<div class="method-text">
<h4 class="title is-5" style="margin-bottom: 0.5rem;">2. Invariant Patch Aggregation <span style="color:#999;font-size:0.85rem;font-weight:400;">— solves C1</span></h4>
<p>Instead of flattening patches, which breaks under rotation-induced permutations (C1), we treat each patch as an unordered set and aggregate it with a <strong>permutation-invariant</strong> function:</p>
<p><strong>Step 1.</strong> Map each local vector through an MLP to obtain invariant features:</p>
<div class="equation-highlight">
$$\mathcal{X}_i = \{\mathbf{h}_{i,k} \mid k \in P_i\}, \quad \mathbf{h}_{i,k} = \text{MLP}(\mathbf{u}^{\text{local}}_{i,k})$$
</div>
<p><strong>Step 2.</strong> Aggregate with a Set-Transformer: concatenate a learnable query token \(\mathbf{h}_{\text{query}}\) with the set, apply self-attention, and extract the query output:</p>
<div class="equation-highlight">
$$\mathbf{H} = [\mathbf{h}_{\text{query}},\, \mathbf{h}_{i,1}, \dots, \mathbf{h}_{i,K^d}], \quad \mathbf{z}_i = \text{SA}(\mathbf{H})[0]$$
</div>
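<p><strong>Result.</strong> Self-attention without positional encodings is permutation-equivariant, and the query token always sits at index 0, so its output does not depend on the pixel ordering within the patch. The aggregation is therefore strictly permutation-invariant:</p>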
<div class="equation-highlight">
$$\text{Agg}(\pi(\mathcal{X}_i)) = \text{Agg}(\mathcal{X}_i)$$
</div>
</div>
<div class="method-diagram">
<img src="./static/images/fig_sol1.png" alt="S1: Set-Transformer aggregation — patch vectors → MLP → set → query token → z_i.">
</div>
</div>
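<p>The query-token readout of Step 2 can be sketched with a single attention head in plain Python. The weights, feature values and the name <code>query_readout</code> are hypothetical, and the MLP of Step 1 is omitted (the tokens stand in for \(\mathbf{h}_{i,k}\)). Reversing the order of the pixel tokens leaves the readout \(\text{SA}(\mathbf{H})[0]\) unchanged up to floating-point round-off.</p>

```python
import math

def matvec(m, v):
    return [math.fsum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def query_readout(h_query, tokens, wq, wk, wv):
    """Single-head self-attention over [h_query, *tokens]; return the output at index 0.

    No positional encoding is used, so the result cannot depend on token order.
    """
    seq = [h_query] + list(tokens)
    q0 = matvec(wq, h_query)                     # query of the prepended token only
    keys = [matvec(wk, h) for h in seq]
    values = [matvec(wv, h) for h in seq]
    scale = math.sqrt(len(q0))
    scores = [math.fsum(a * b for a, b in zip(q0, k)) / scale for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # numerically stable softmax
    total = math.fsum(exps)
    weights = [e / total for e in exps]
    return [math.fsum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]

# Hypothetical projection weights and features.
wq = [[0.5, -0.2], [0.1, 0.3]]
wk = [[0.4, 0.0], [-0.3, 0.6]]
wv = [[1.0, 0.2], [0.0, -0.5]]
h_query = [0.3, -0.1]
tokens = [[1.0, 0.2], [0.5, -0.4], [-0.2, 0.8], [0.9, 0.9]]

z = query_readout(h_query, tokens, wq, wk, wv)
z_perm = query_readout(h_query, tokens[::-1], wq, wk, wv)   # reorder the pixels
diff = max(abs(a - b) for a, b in zip(z, z_perm))
print(z, diff)   # diff is zero up to floating-point round-off
```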
<div class="method-box-illustrated">
<div class="method-text">
<h4 class="title is-5" style="margin-bottom: 0.5rem;">3. Rebased Relative Positional Encoding <span style="color:#999;font-size:0.85rem;font-weight:400;">— solves C2</span></h4>
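<p>We project displacement vectors into the local basis of token \(i\), the attending query: \(\mathbf{p}_{ij \to i} = \mathbf{B}_i^T(x_j - x_i)\). Under a global rotation, both \(x_j - x_i\) and \(\mathbf{B}_i\) are multiplied by \(\mathbf{R}(g)\), so \(\mathbf{p}_{ij \to i}\) is unchanged, while its direction relative to \(\mathbf{B}_i\) preserves local anisotropy. The positional bias \(\mathbf{P}\), built from the rebased displacements \(\mathbf{p}_{ij \to i}\), enters the self-attention as:</p>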
<div class="equation-highlight">
$$\text{SA}(\mathbf{X}) = \text{softmax}\left(\frac{\mathbf{X}\mathbf{W}_Q (\mathbf{X}\mathbf{W}_K)^T}{\sqrt{d_k}}+\mathbf{P}\right)\mathbf{X}\mathbf{W}_V$$
</div>
</div>
<div class="method-diagram">
<img src="./static/images/fig_sol2.png" alt="S2: Rebased PE — project displacement δ_ij into local basis B_i^T for directional yet equivariant encoding.">
</div>
</div>
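<p>A minimal numeric check of the rebased displacement, assuming a 2D local basis given by its columns \((\mathbf{b}_1, \mathbf{b}_2)\); the helper names and coordinates are illustrative. Rotating positions and basis together leaves \(\mathbf{p}_{ij \to i}\) unchanged, while two neighbours at the same distance but in different directions still receive different offsets, unlike the isotropic encoding discussed in C2.</p>

```python
import math

def rot(theta, v):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def rebased_offset(basis_i, x_i, x_j):
    """p_{ij->i} = B_i^T (x_j - x_i), with B_i given by its columns (b1, b2)."""
    b1, b2 = basis_i
    d = (x_j[0] - x_i[0], x_j[1] - x_i[1])
    return (b1[0] * d[0] + b1[1] * d[1], b2[0] * d[0] + b2[1] * d[1])

b1 = (math.cos(0.3), math.sin(0.3))   # assumed local basis of token i
b2 = (-b1[1], b1[0])
x_i, x_j, x_k = (2.0, 1.0), (3.0, 1.0), (2.0, 2.0)

p_ij = rebased_offset((b1, b2), x_i, x_j)
p_ik = rebased_offset((b1, b2), x_i, x_k)   # same distance, different direction

# Rotate positions and the basis together, as happens when the input field rotates.
theta = math.radians(90.0)
basis_rot = (rot(theta, b1), rot(theta, b2))
p_ij_rot = rebased_offset(basis_rot, rot(theta, x_i), rot(theta, x_j))

print(p_ij, p_ij_rot)   # equal up to round-off: invariant to the global rotation
print(p_ij, p_ik)       # different: direction information is kept
```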
<div class="method-box-illustrated">
<div class="method-text">
<h4 class="title is-5" style="margin-bottom: 0.5rem;">4. Equivariant Decoder</h4>
<p>The transformer outputs invariant tokens \(\mathbf{h}_i^{(L)}\). A local query decoder uses a canonical grid \(\mathcal{G} = \{\boldsymbol{\xi}_m \in [-1,1]^d\}\), encoded with Fourier features, as spatial queries. Cross-attention then reconstructs dense spatial details:</p>
<div class="equation-highlight">
$$\mathbf{z}^{\text{local}}_{i} = \text{CrossAttn}\left(\mathbf{Q}_{\text{grid}},\, \mathbf{h}_i^{(L)}\right)$$
</div>
<p>Predictions are lifted back to global coordinates: \(\mathbf{u}'_{i} = \mathbf{B}_i \cdot \mathbf{z}^{\text{local}}_{i}\). Since \(\mathbf{z}^{\text{local}}\) is invariant and \(\mathbf{B}_i\) co-rotates with the input, the output is strictly equivariant.</p>
</div>
<div class="method-diagram">
<img src="./static/images/fig_decoder.png" alt="Equivariant Decoder: canonical grid → Fourier features → CrossAttn → z^local (invariant) → B_i lift → u' (equivariant).">
</div>
</div>
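<p>The decoder's final step can be sketched in 2D as below. The Fourier encoding (sines and cosines at two assumed frequencies) and all helper names are hypothetical, and the cross-attention is replaced by a fixed invariant output \(\mathbf{z}^{\text{local}}\). The queries are built on the canonical grid only, so they are identical for every input rotation, and lifting with a co-rotated basis yields \(\mathbf{R}(g)\mathbf{u}'\).</p>

```python
import math

def canonical_grid(k):
    """k x k query points xi_m in [-1, 1]^2 (patch-local and rotation-independent)."""
    ticks = [-1.0 + 2.0 * t / (k - 1) for t in range(k)]
    return [(x, y) for y in ticks for x in ticks]

def fourier_features(xi, freqs=(1.0, 2.0)):
    """Assumed encoding: [sin(pi f c), cos(pi f c)] for each frequency f and coordinate c."""
    feats = []
    for f in freqs:
        for c in xi:
            feats += [math.sin(math.pi * f * c), math.cos(math.pi * f * c)]
    return feats

def lift(basis, z_local):
    """u' = B_i z_local = z1 * b1 + z2 * b2, with B_i given by its columns."""
    b1, b2 = basis
    return (z_local[0] * b1[0] + z_local[1] * b2[0],
            z_local[0] * b1[1] + z_local[1] * b2[1])

def rot(t, v):
    c, s = math.cos(t), math.sin(t)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

queries = [fourier_features(xi) for xi in canonical_grid(3)]   # same for every rotation
print(len(queries), len(queries[0]))  # 9 8

z_local = (0.7, -0.2)                       # invariant decoder output at one query point
b1 = (math.cos(0.5), math.sin(0.5))
B = (b1, (-b1[1], b1[0]))                   # local basis B_i of the token
t = 1.2                                     # a global rotation angle
B_rot = (rot(t, B[0]), rot(t, B[1]))        # B_i co-rotates with the input

u = lift(B, z_local)
u_rot = lift(B_rot, z_local)
err = max(abs(a - b) for a, b in zip(rot(t, u), u_rot))
print(err)  # round-off only: the lifted output co-rotates
```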
<div class="method-box">
<h4 class="title is-5" style="margin-bottom: 0.5rem;">5. Reference Basis Pyramid</h4>
<p>A pyramid of local canonical bases \(\mathcal{B}^{(l)} = \{\mathbf{B}_k^{(l)}\}\) is pre-computed from the input field at each resolution \(l\). Standard patch merging is invalid for invariant tokens, because averaging vectors expressed in different local bases breaks physical consistency. Resolution changes therefore use a <strong>“globalize–resample–localize”</strong> procedure: features are projected back to the global frame, undergo valid spatial operations (pooling/interpolation), and are re-projected into the local bases of the target resolution, which keeps the tokens rotation-invariant at every scale.</p>
</div>
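<p>The need for the globalize–resample–localize step can be seen with four fine tokens that each store \((1, 0)\), one unit along their own \(\mathbf{b}_1\). In this sketch (hypothetical basis angles and coarse basis; mean pooling as the resampling operation), naively averaging the local coordinates still gives \((1, 0)\), which ignores that the four frames differ, while globalizing, pooling and re-localizing gives a different vector that is also unchanged when all bases rotate together.</p>

```python
import math

def basis(angle):
    """2D local basis as columns (b1, b2) at the given orientation."""
    return (math.cos(angle), math.sin(angle)), (-math.sin(angle), math.cos(angle))

def to_global(b, z):
    b1, b2 = b
    return (z[0] * b1[0] + z[1] * b2[0], z[0] * b1[1] + z[1] * b2[1])

def to_local(b, u):
    b1, b2 = b
    return (b1[0] * u[0] + b1[1] * u[1], b2[0] * u[0] + b2[1] * u[1])

def mean(vs):
    return (sum(v[0] for v in vs) / len(vs), sum(v[1] for v in vs) / len(vs))

def merge(fine_bases, fine_local, coarse_basis):
    """Globalize -> pool (mean) -> localize into the coarse-level basis."""
    pooled = mean([to_global(b, z) for b, z in zip(fine_bases, fine_local)])
    return to_local(coarse_basis, pooled)

angles = [0.1, 0.9, 1.7, 2.5]          # hypothetical fine-level basis orientations
fine_local = [(1.0, 0.0)] * 4          # each token: one unit along its own b1
fine_bases = [basis(a) for a in angles]
coarse = basis(0.6)                    # hypothetical coarse-level basis from the pyramid

naive = mean(fine_local)               # mixes coordinates from different frames
correct = merge(fine_bases, fine_local, coarse)
print(naive, correct)

# Rotating the input rotates every basis by the same angle; the merged token is unchanged.
t = 0.7
rotated = merge([basis(a + t) for a in angles], fine_local, basis(0.6 + t))
drift = max(abs(a - b) for a, b in zip(correct, rotated))
print(drift)  # round-off only
```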
<div class="shift-window-interactive" id="shift-window-demo">
<div class="shift-window-header">
<p class="rebase-kicker">Interactive Shifted Windows</p>
<h3 class="title is-4">Same Tokens, Same Windows — Across Rotation</h3>
<p>ReViT relies on the <em>permutation equivariance</em> of self-attention within each window, so for global rotation equivariance the shifted-window pipeline only needs to guarantee one thing: <strong>the set of tokens that share a window is the same in the normal lane and in the rotated lane</strong>. No attention is drawn here; the placeholder numbers label token identity so you can trace them through the cycle. Step through the stages to see which tokens land in which window at each point where attention would run.</p>
</div>
<div class="shift-control-bar">
<div class="shift-control-group">
<span class="shift-control-label">Rotated lane</span>
<div class="shift-button-row" id="shift-rotation-buttons" role="group" aria-label="Rotated lane angle">
<button class="shift-button" type="button" data-rotation="90">90°</button>
<button class="shift-button" type="button" data-rotation="180">180°</button>
<button class="shift-button" type="button" data-rotation="270">270°</button>
</div>
</div>
<div class="shift-control-group shift-stage-control">
<span class="shift-control-label">Pipeline step</span>
<div class="shift-button-row" id="shift-stage-buttons" role="group" aria-label="Shifted-window pipeline step">
<button class="shift-button" type="button" data-stage="input">1. Input 8×8</button>
<button class="shift-button" type="button" data-stage="shiftedInput">2. Shift 8×8</button>
<button class="shift-button" type="button" data-stage="shiftedBackInput">3. Shift back 8×8</button>
<button class="shift-button" type="button" data-stage="merged">4. Merge → 4×4</button>
<button class="shift-button" type="button" data-stage="shiftedMerge">5. Shift 4×4</button>
<button class="shift-button" type="button" data-stage="shiftedBackMerge">6. Shift back 4×4</button>
<button class="shift-button" type="button" data-stage="expanded">7. Expand → 8×8</button>
</div>
</div>
</div>
<div class="shift-window-panels">
<figure class="shift-window-panel">
<figcaption>
<span>Normal lane</span>
<span id="shift-normal-caption">Input 8×8</span>
</figcaption>
<canvas id="shift-normal-canvas" aria-label="Normal shifted-window pipeline"></canvas>
</figure>
<figure class="shift-window-panel">
<figcaption>
<span id="shift-rotated-title">Rotated lane</span>
<span id="shift-rotated-caption">Input 8×8</span>
</figcaption>
<canvas id="shift-rotated-canvas" aria-label="Rotated shifted-window pipeline"></canvas>
</figure>
</div>
<div class="shift-proof-row">
<div class="shift-proof-card">
<span>Tracked identity set</span>
<strong id="shift-tracked-values">36, 37, 44, 45</strong>
</div>
<div class="shift-proof-card">
<span>Window-set invariance at this stage</span>
<strong id="shift-window-invariance">windows match across lanes</strong>
</div>
<div class="shift-proof-card">
<span>Tracked tokens in …</span>
<strong id="shift-tracked-window">—</strong>
</div>
</div>
<p class="shift-stage-note" id="shift-stage-note">Stage 1 — Input 8×8 partitioned into four 4×4 windows.</p>
</div>
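<p>The window-set condition can be checked exhaustively for the 8×8 grid of the demo. This sketch assumes 4×4 windows, a Swin-style cyclic shift by \(s\) pixels and token ids \(8r + c\); it ignores the attention mask used for wrapped-around regions, and the helper names are ours. For each quarter turn it compares the set of per-window id sets in the normal and rotated lanes.</p>

```python
def grid_ids(n):
    """Token identities id = n * row + col on an n x n grid."""
    return [[n * r + c for c in range(n)] for r in range(n)]

def rot90_grid(g):
    """Quarter turn of a square grid of token ids (counter-clockwise in array view)."""
    n = len(g)
    return [[g[j][n - 1 - i] for j in range(n)] for i in range(n)]

def cyclic_shift(g, s):
    """Swin-style cyclic shift: the entry at (i, j) comes from (i + s, j + s) mod n."""
    n = len(g)
    return [[g[(i + s) % n][(j + s) % n] for j in range(n)] for i in range(n)]

def window_sets(g, w):
    """Partition the grid into w x w windows and return them as a set of id sets."""
    n = len(g)
    return frozenset(
        frozenset(g[i][j] for i in range(r0, r0 + w) for j in range(c0, c0 + w))
        for r0 in range(0, n, w) for c0 in range(0, n, w)
    )

def windows_match(n, w, s, quarter_turns):
    """Do the normal and rotated lanes group the same tokens into windows?"""
    g = grid_ids(n)
    rg = g
    for _ in range(quarter_turns):
        rg = rot90_grid(rg)
    return window_sets(cyclic_shift(g, s), w) == window_sets(cyclic_shift(rg, s), w)

for s in (0, 1, 2):
    print(s, [windows_match(8, 4, s, q) for q in (1, 2, 3)])
# 0 [True, True, True]
# 1 [False, False, False]
# 2 [True, True, True]
```

<p>With a shift of half a window (\(s = 2\)) the shifted row partition \(\{2,3,4,5\}, \{6,7,0,1\}\) is symmetric under \(r \mapsto 7 - r\), so every quarter turn maps windows onto windows; a shift of one pixel breaks this symmetry.</p>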
<div class="insight-box">
<p class="has-text-weight-semibold">Equivariance Analysis:</p>
<p>ReViT achieves <strong>exact equivariance to the chiral octahedral group \(O\)</strong> (the 24 rotations that map a cubic grid onto itself) and approximate \(SO(3)\) equivariance. Quarter turns about the grid axes, and their compositions, map the grid onto itself, whereas arbitrary rotations do not, so the gap stems from grid-based constraints: resampling introduces interpolation bias, and fixed patch/window boundaries introduce discretization artifacts. Data augmentation helps dampen these artifacts.</p>
</div>
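<p>The group \(O\) can be enumerated directly: its 24 elements are exactly the \(3 \times 3\) signed permutation matrices with determinant \(+1\). The sketch below (plain Python, helper names ours) builds them and confirms that each maps an integer grid point to a distinct integer grid point, which is why these rotations need no interpolation.</p>

```python
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def chiral_octahedral_group():
    """All signed 3x3 permutation matrices with determinant +1."""
    group = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[0, 0, 0] for _ in range(3)]
            for row, (col, sign) in enumerate(zip(perm, signs)):
                m[row][col] = sign
            if det3(m) == 1:
                group.append(m)
    return group

def matvec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

O = chiral_octahedral_group()
print(len(O))  # 24

# Each element sends the grid point (1, 2, 3) to a distinct integer grid point.
images = {matvec(m, (1, 2, 3)) for m in O}
print(len(images))  # 24
```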
</div>
</div>
</div>
</section>
<hr class="section-divider">
<!-- RESULTS -->
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Results</h2>
<!-- RotMNIST -->
<h3 class="title is-4">Classification: RotMNIST</h3>
<div class="content has-text-justified">
<p>ReViT achieves <strong>SOTA accuracy (98.26%)</strong> while delivering a ~4× speedup and a ~53× memory reduction (1.81 GB vs. 95.5 GB) compared with the lifted \(R_{12}\) baselines (GSA-Nets and GE-ViT). The baselines' inefficiency stems from the lifting operation, which expands the self-attention complexity to \(\mathcal{O}(N^2 |\mathcal{H}|^2)\).</p>
</div>
<div class="model-table-wrapper">
<table class="table is-fullwidth">
<thead><tr><th>Model</th><th>Acc (%)</th><th>Train (ms)</th><th>Infer (ms)</th><th>Mem (GB)</th></tr></thead>
<tbody>
<tr><td>GSA-Nets(\(R_4\))</td><td>97.46</td><td>298.8±0.9</td><td>110.0±0.1</td><td>5.27</td></tr>
<tr><td>GSA-Nets(\(R_8\))</td><td>97.90</td><td>144.2±2.1</td><td>65.2±0.2</td><td>29.9</td></tr>
<tr><td>GSA-Nets(\(R_{12}\))</td><td>97.97</td><td>272.6±0.6</td><td>118.7±0.5</td><td>95.5</td></tr>
<tr><td>GE-ViT(\(R_{12}\))</td><td>98.01</td><td>281.0±0.7</td><td>118.9±0.3</td><td>95.5</td></tr>
<tr style="font-weight:bold; background:#f0f7ff;"><td>ReViT (Ours)</td><td><strong>98.26</strong></td><td><strong>67.7</strong>±0.2</td><td><strong>31.0</strong>±0.7</td><td><strong>1.81</strong></td></tr>
</tbody>
</table>
</div>
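<p>The speedup and memory factors can be recomputed from the table. This short script divides each \(R_{12}\) baseline's cost by ReViT's, using only the mean values shown above.</p>

```python
# Mean timings (ms) and memory (GB) copied from the RotMNIST table above.
revit = {"train_ms": 67.7, "infer_ms": 31.0, "mem_gb": 1.81}
baselines = {
    "GSA-Nets(R12)": {"train_ms": 272.6, "infer_ms": 118.7, "mem_gb": 95.5},
    "GE-ViT(R12)": {"train_ms": 281.0, "infer_ms": 118.9, "mem_gb": 95.5},
}

factors = {}
for name, cost in baselines.items():
    # How many times more time or memory the baseline needs than ReViT.
    factors[name] = {key: round(cost[key] / revit[key], 1) for key in revit}
    print(name, factors[name])
# GSA-Nets(R12) {'train_ms': 4.0, 'infer_ms': 3.8, 'mem_gb': 52.8}
# GE-ViT(R12) {'train_ms': 4.2, 'infer_ms': 3.8, 'mem_gb': 52.8}
```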
<!-- 2D Advection -->
<h3 class="title is-4">2D Advection (Adv)</h3>
<div class="content has-text-justified">
<p>ReViT achieves the lowest MSE (\(\approx 10^{-4}\)) among all compared methods. It consistently outperforms the non-equivariant PDETrans, highlighting the contribution of the equivariant mechanisms. Its computational cost is comparable to that of PDETrans, only 11.6% higher, yet it delivers a <strong>37.2% MSE reduction</strong>.</p>
</div>
<!-- KF Robustness -->
<h3 class="title is-4">Robustness on Arbitrary Angles (2D Kolmogorov Flow)</h3>
<div class="content has-text-justified">
<p>We analyze prediction accuracy over 20 rollout steps at angular intervals of \(\frac{\pi}{12}\) within the range \((0, \pi)\), focusing on orthogonal angle pairs (\(\theta\) and \(\theta + \frac{\pi}{2}\)). ReViT is <strong>perfectly equivariant</strong> for all orthogonal pairs, with a relative error of exactly <strong>+0.0%</strong> regardless of the input angle \(\theta\). PDETrans, by contrast, varies strongly on unseen angles (up to <strong>+162.8%</strong>).</p>
</div>
<!-- 3D MHD -->
<h3 class="title is-4">3D Magnetohydrodynamics (MHD)</h3>
<div class="content has-text-justified">
<p>ReViT achieves the <strong>lowest MSE (\(0.82 \times 10^{-2}\))</strong> and <strong>highest \(R^2\) (0.98)</strong>, reducing MSE by about <strong>63%</strong> relative to the strongest baseline, AViT (\(2.20 \times 10^{-2}\)). ReViT preserves sharp, high-frequency structures of the magnetic field and velocity eddies, remaining virtually indistinguishable from the reference.</p>
</div>
<!-- 3D TCF -->
<h3 class="title is-4">3D Turbulent Channel Flow (TCF)</h3>
<div class="content has-text-justified">
<p>TCF represents a symmetry-starved regime with severe spatial anisotropy. ReViT performs best, with an MSE of \(0.21 \times 10^{-2}\) and an \(R^2\) of 0.96, a <strong>65% error reduction</strong> compared with the next-best models (Swin3D and AViT, both \(0.60 \times 10^{-2}\)).</p>
</div>
<!-- Combined Results Table -->
<h3 class="title is-4">3D Quantitative Results</h3>
<div class="content has-text-justified">
<p>Metrics are computed over all 24 rotations of the chiral octahedral group \(O\) for three different seeds and reported as mean ± std:</p>
</div>
<div class="model-table-wrapper">
<table class="table is-fullwidth">
<thead>
<tr>
<th rowspan="2">Model</th>
<th colspan="2" style="text-align:center;">MHD</th>
<th colspan="2" style="text-align:center;">TCF</th>
</tr>
<tr>
<th>MSE (\(\times 10^{-2}\)) ↓</th><th>\(R^2\) ↑</th>
<th>MSE (\(\times 10^{-2}\)) ↓</th><th>\(R^2\) ↑</th>
</tr>
</thead>
<tbody>
<tr><td>AFNO</td><td>16.40 ± 42.30</td><td>0.60 ± 1.00</td><td>28.40 ± 56.40</td><td>-3.79 ± 9.49</td></tr>
<tr><td>P3D</td><td>10.20 ± 6.24</td><td>0.73 ± 0.15</td><td>5.72 ± 2.94</td><td>0.04 ± 0.05</td></tr>
<tr><td>UNet3D</td><td>3.64 ± 0.93</td><td>0.90 ± 0.03</td><td>7.12 ± 3.67</td><td>-0.20 ± 0.62</td></tr>
<tr><td>Swin3D</td><td>3.58 ± 1.19</td><td>0.90 ± 0.03</td><td>0.60 ± 0.22</td><td>0.90 ± 0.04</td></tr>
<tr><td>AViT</td><td>2.20 ± 0.36</td><td>0.94 ± 0.01</td><td>0.60 ± 0.23</td><td>0.90 ± 0.04</td></tr>
<tr style="font-weight:bold; background:#f0f7ff;"><td>ReViT-3D (Ours)</td><td><strong>0.82 ± 0.00</strong></td><td><strong>0.98 ± 0.00</strong></td><td><strong>0.21 ± 0.00</strong></td><td><strong>0.96 ± 0.00</strong></td></tr>
</tbody>
</table>
</div>
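<p>The relative MSE reductions can be recomputed from the mean values in this table. The script below takes, for each task, the lowest baseline MSE and reports \(1 - \text{MSE}_{\text{ReViT}} / \text{MSE}_{\text{best baseline}}\); standard deviations are ignored.</p>

```python
# Mean MSE values (x 1e-2) copied from the 3D table above.
mse = {
    "MHD": {"AFNO": 16.40, "P3D": 10.20, "UNet3D": 3.64, "Swin3D": 3.58, "AViT": 2.20, "ReViT-3D": 0.82},
    "TCF": {"AFNO": 28.40, "P3D": 5.72, "UNet3D": 7.12, "Swin3D": 0.60, "AViT": 0.60, "ReViT-3D": 0.21},
}

reductions = {}
for task, scores in mse.items():
    best_baseline = min(v for name, v in scores.items() if name != "ReViT-3D")
    reductions[task] = 1.0 - scores["ReViT-3D"] / best_baseline
    print(task, f"{100 * reductions[task]:.1f}%")
# MHD 62.7%
# TCF 65.0%
```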
<!-- Ablation -->
<h3 class="title is-4">Ablation Study</h3>
<div class="content has-text-justified">
<p>A systematic ablation study confirms that each ReViT component is necessary: removing any single component measurably degrades both accuracy and equivariance.</p>
</div>
</div>
</div>
</div>
</section>
<hr class="section-divider">
<!-- BIBTEX -->
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@inproceedings{ReViT2026,
title = {{ReViT}: Rotational-equivariant Vision Transformers for Neural {PDE} Solvers},
author = {Hao Wei and Bjoern List and Nils Thuerey},
booktitle = {Forty-third International Conference on Machine Learning},
year = {2026},
}</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<p>Page template borrowed from <a href="https://nerfies.github.io/">Nerfies</a>.</p>
</div>
</div>
</footer>
</body>
</html>