Serial implementation for ParaDiag with collocation methods #530

brownbaerchen · 2025-02-14T16:05:16Z

In order to keep the PRs as short as possible, I thought now would be a good time to do the first one on ParaDiag.
It is still pretty long, and I do apologize for that.. Feel free to comment @colinjcotter and @JHopeCollins, but I don't expect you have time for reading the whole thing 🙈

The ParaDiag algorithm

Like PFASST, ParaDiag is a solver for the composite collocation problem or all-at-once problem in ParaDiag lingo. In the implementation here, the composite collocation problem is solved via predonditioned Picard iteration, where the preconditioner is a circularized version of the composite collocation problem that can be diagonalized and inverted in parallel.

Features

Sweepers

The sweepers are responsible for solving collocation problems in pySDC. I implemented two new ones for ParaDiag

Sweeper that solves collocation problem with diagonalization of Q.
"IMEX" Sweeper that solves only the implicit part of the collocation problem, but evaluates the whole residual

In principle, the first one can be used within SDC, but the default configuration is unsuitable. I put some documentation to this regard and clearly labeled them as ParaDiag sweepers to avoid confusion.

ParaDiag Controller

The non-MPI ParaDiag controller contains a lot of copied code from the non-MPI SDC controller. It has an option for averaging the Jacobian. If you have a nonlinear problem, you have to do that, but for linear problems, this just adds communication. Defaults to True. The controller therefore supports linear problems, nonlinear problems with IMEX splitting and nonlinear problems with averaged Jacobians.
Does no support multiple levels at the moment. Since one advantage ParaDiag is not use any coarsening, I am not sure this will ever be needed.

Tutorials

Added step 9 with three parts:

A: linear ParaDiag. Constructs global numpy arrays and matrices and solves the problem sequentially and then "in parallel"
B: nonlinear ParaDiag. Adds communication of the averaged solution for contstructing the averaged Jacobian
C: Running ParaDiag in pySDC. Compares to single level PFASST and serial timestepping for advection and van der Pol.

Peculiar implementation details

Weird choice 1: Swapping the solution for the residual

The natural formulation to implement in pySDC would be to solve $C_\alpha u^{k+1} = (C_\alpha - C) u^k + u_0$, where $C$ is the composite collocation problem operator and $C_\alpha$ the preconditioner. However, you want to avoid evaluating $C_\alpha u^k$ because it involves multiplication by an averaged Jacobian for non-linear problems, which is not implemented in pySDC.
Therefore, we instead split the iteration into $C_\alpha \delta = u_0 - Cu^k, u^{k+1} = u^k + \delta$. Now we only need to evaluate the residual of the composite collocation problem in the right hand side.
The disadvantage is that now we need to swap the solution values for the residual values before solving. This is a bit counter intuitive from a software point of view, where a variable called u now holds the solution outside of the ParaDiag iteration, but inside of the ParaDiag iteration holds the residual. Anyways, I don't see a better way.

Weird choice 2: Computing the Fourier transform via matrix-vector multiplication

The circularised preconditioner can be diagonalized via Fourier transforms. The default way of doing this would be to use a distributed FFT library with space-time-global data that is distributed in the time and possibly in further space directions. Before and after the solves, the data would be transposed to be aligned in the time direction, the FFTs in time would be executed in parallel, the data would be transposed back to be distributed in time and then the solves can proceed.

However, this does not fit great with the way we treat datatypes in pySDC. So as a first implementation, I have something else: Avoid the transposes by computing the transforms as distributed matrx-vector products.
The advantage is clearly that we do not need to worry about generating intermediate numpy arrays in the correct shape that allow to do the transforms, but we can operate on the problem datatypes directly. I am not sure how slow the transforms are because while they require $N^2$ operations instead of $N\log N$ in the FFT, they are computed in Reduce operations, which typically use tree algorithms afaik. So the individual transforms may take similar amounts of time. The main disadvantage is, however, that the transforms are performed in serial. All tasks are involved in the reduce operation to compute each datapoint in the transform.

For few parallel steps, this should not matter too much. And especially not for a serial implementation where all this pretty hyphotical anyways. Eventually, it may be worth to implement this with proper ditributed FFTs, but I suggest we worry about this later when doing the parallel implementation.

Tests

Multiplication by FT matrices yields the same result as numpy FFT implementation
Q diagonalization sweeper yields the same result as solving the collocation problem directly for a Dahlquist problem
ParaDiag converges for Dahlquist, IMEX Dahlquist, IMEX heat equation and van der Pol oscillator
ParaDiag gives the same result as PFASST controller for Dahlquist, IMEX Dahlquist and van der Pol oscillator
ParaDiag has expected order of accuracy for Dahlquist problem
Tutorials

Caveats

In the controller:

Number of steps has to be divisible by number of tasks. Otherwise, some matrices need to be recomputed. Should be added in the future. There is a warning when the controller continues beyond what you would expect from the PFASST controllers.
The residual is computed before the ParaDiag iteration and then used as stopping criterion afterwards. This is because of the increment formulation where we swap the solution for the residual before the solves. Makes sense to compute this only once. You may be a bit confused why the residual at the end is way below the tolerance, but that's all.

Outside the controller:

This PR contains the same IMEX Dahlquist implementation from Added IMEX Dahlquist equation #529 because that is used in the tests..
Petsc is still not working in the pipeline. It is working locally, so I don't know what's going on...

Up next

MPI parallel implementation of sweepers and controller
Allow swapping between PFASST and ParaDiag controller to allow things like using ParaDiag as "coarse" method in PFASST.
Implement Gaya's adaptive selection of $\alpha$
Reproduce Gaya's results. (First I need to figure out some details of what she did...)
Fix FFTs

pancetta · 2025-02-15T11:54:30Z

Please merge master here.

problem

Added serial implementation for ParaDiag for collocation methods

773bc26

brownbaerchen added 5 commits February 15, 2025 14:25

Merge remote-tracking branch 'upstream/master' into merge_paradiag

6e0dd72

Removed some old comments

bdfa5c9

Added test that ParaDiag converges within the bounds for Dahlquist

d313aef

problem

Fix test on older Python versions

8254c83

Added smaller values of alpha to the tests

9cddb17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Serial implementation for ParaDiag with collocation methods #530

Serial implementation for ParaDiag with collocation methods #530

brownbaerchen commented Feb 14, 2025 •

edited

Loading

pancetta commented Feb 15, 2025

Serial implementation for ParaDiag with collocation methods #530

Are you sure you want to change the base?

Serial implementation for ParaDiag with collocation methods #530

Conversation

brownbaerchen commented Feb 14, 2025 • edited Loading

The ParaDiag algorithm

Features

Sweepers

ParaDiag Controller

Tutorials

Peculiar implementation details

Weird choice 1: Swapping the solution for the residual

Weird choice 2: Computing the Fourier transform via matrix-vector multiplication

Tests

Caveats

Up next

pancetta commented Feb 15, 2025

brownbaerchen commented Feb 14, 2025 •

edited

Loading