Serial implementation for ParaDiag with collocation methods #530

Open · wants to merge 6 commits into master

Conversation

@brownbaerchen (Contributor) commented Feb 14, 2025

In order to keep the PRs as short as possible, I thought now would be a good time to do the first one on ParaDiag.
It is still pretty long, and I do apologize for that. Feel free to comment @colinjcotter and @JHopeCollins, but I don't expect you to have time for reading the whole thing 🙈

The ParaDiag algorithm

Like PFASST, ParaDiag is a solver for the composite collocation problem, or all-at-once problem in ParaDiag lingo. In the implementation here, the composite collocation problem is solved via preconditioned Picard iteration, where the preconditioner is a circularized version of the composite collocation problem that can be diagonalized and inverted in parallel. A minimal sketch of this iteration is shown below.
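
To make this concrete, here is a minimal, self-contained sketch of the iteration (my own illustration, not code from this PR). It uses implicit Euler on the Dahlquist problem $u' = \lambda u$ instead of full collocation blocks, so the all-at-once matrix $C$ and the circularized preconditioner $C_\alpha$ are small dense matrices; all variable names are made up.

```python
import numpy as np

# Toy all-at-once problem: L steps of implicit Euler for u' = lam * u.
# The structure of the ParaDiag iteration is the same as for full collocation blocks.
L, dt, lam, alpha = 4, 0.1, -1.0, 1e-2
u0 = 1.0

I = np.eye(L)
E = np.diag(np.ones(L - 1), -1)  # passes step l on to step l + 1
C = (1 - dt * lam) * I - E       # all-at-once ("composite collocation") matrix
C_alpha = C.copy()
C_alpha[0, -1] = -alpha          # circularized preconditioner: alpha-weighted wrap-around term

b = np.zeros(L)
b[0] = u0                        # initial condition only enters the first step

u = np.zeros(L)                  # initial iterate
for k in range(10):
    # preconditioned Picard iteration: u^{k+1} = u^k + C_alpha^{-1} (u_0 - C u^k)
    u = u + np.linalg.solve(C_alpha, b - C @ u)

# converges to the sequential implicit Euler solution
u_sequential = u0 / (1 - dt * lam) ** np.arange(1, L + 1)
assert np.allclose(u, u_sequential)
```

In the actual method, $C_\alpha$ is of course not inverted with a dense solve but diagonalized via Fourier transforms in time, so the solves on the diagonal can be done in parallel.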

Features

Sweepers

The sweepers are responsible for solving collocation problems in pySDC. I implemented two new ones for ParaDiag:

  • Sweeper that solves the collocation problem with diagonalization of Q.
  • "IMEX" sweeper that solves only the implicit part of the collocation problem, but evaluates the whole residual.

In principle, the first one can be used within SDC, but the default configuration is unsuitable. I added some documentation to this effect and clearly labeled them as ParaDiag sweepers to avoid confusion. A sketch of the Q-diagonalization idea follows below.
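
For illustration, here is a hedged, self-contained sketch of what the Q diagonalization amounts to for a scalar Dahlquist problem (not the sweeper code itself). The hard-coded Q corresponds to two Gauss-Legendre nodes on $[0, 1]$; in pySDC it would come from the collocation class.

```python
import numpy as np

# Collocation problem for u' = lam * u over one step of size dt:
#   (I - dt * lam * Q) u = u0 * ones,
# where Q holds the integrals of the Lagrange basis polynomials up to each node.
# Q below is hard-coded for two Gauss-Legendre nodes on [0, 1].
lam, dt, u0 = -1.0, 0.5, 1.0
Q = np.array([[1 / 4, 1 / 4 - np.sqrt(3) / 6],
              [1 / 4 + np.sqrt(3) / 6, 1 / 4]])
M = Q.shape[0]
rhs = u0 * np.ones(M)

# direct solve of the coupled collocation problem
u_direct = np.linalg.solve(np.eye(M) - dt * lam * Q, rhs)

# solve via diagonalization of Q: with Q = V diag(w) V^{-1}, the coupled system
# decouples into M independent scalar problems in the eigenbasis
w, V = np.linalg.eig(Q)                      # complex eigenvalues for Gauss-Legendre Q
x = np.linalg.solve(V, rhs.astype(complex))  # transform the right hand side to the eigenbasis
x = x / (1 - dt * lam * w)                   # decoupled scalar "solves"
u_diag = V @ x                               # transform back

assert np.allclose(u_direct, u_diag)
```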

ParaDiag Controller

The non-MPI ParaDiag controller contains a lot of code copied from the non-MPI SDC controller. It has an option for averaging the Jacobian, which defaults to True. For nonlinear problems you have to do that, but for linear problems it just adds communication. The controller therefore supports linear problems, nonlinear problems with IMEX splitting, and nonlinear problems with averaged Jacobians (see the sketch below).
It does not support multiple levels at the moment. Since one advantage of ParaDiag is that it does not need any coarsening, I am not sure this will ever be needed.
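
To make the averaged Jacobian option concrete, here is a rough sketch of the idea (my own illustration with made-up names, not the controller code), using the van der Pol right-hand side: the current iterate is averaged over the time direction, and a single Jacobian evaluated at that average is then reused by all steps and nodes, which is what requires the extra communication in parallel.

```python
import numpy as np

mu = 5.0  # van der Pol parameter

def jacobian(u):
    # Jacobian of the van der Pol right-hand side f(x, v) = (v, mu * (1 - x^2) * v - x)
    x, v = u
    return np.array([[0.0, 1.0],
                     [-2 * mu * x * v - 1.0, mu * (1 - x**2)]])

# u_all: current iterate on all steps and collocation nodes
# (rows: unknowns in time, columns: degrees of freedom of the problem)
u_all = np.random.rand(8, 2)

# average the solution over the time direction (a single Allreduce in the MPI version) ...
u_mean = u_all.mean(axis=0)

# ... and linearize once; every step/node then uses the same J_avg in the preconditioner,
# which keeps the preconditioner diagonalizable in time
J_avg = jacobian(u_mean)
```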

Tutorials

Added step 9 with three parts:

  • A: linear ParaDiag. Constructs global numpy arrays and matrices and solves the problem sequentially and then "in parallel"
  • B: nonlinear ParaDiag. Adds communication of the averaged solution for constructing the averaged Jacobian
  • C: Running ParaDiag in pySDC. Compares to single level PFASST and serial timestepping for advection and van der Pol.

Peculiar implementation details

Weird choice 1: Swapping the solution for the residual

The natural formulation to implement in pySDC would be to solve $C_\alpha u^{k+1} = (C_\alpha - C) u^k + u_0$, where $C$ is the composite collocation problem operator and $C_\alpha$ the preconditioner. However, you want to avoid evaluating $C_\alpha u^k$ because it involves multiplication by an averaged Jacobian for non-linear problems, which is not implemented in pySDC.
Therefore, we instead split the iteration into $C_\alpha \delta = u_0 - C u^k$, $u^{k+1} = u^k + \delta$. Now we only need to evaluate the residual of the composite collocation problem on the right hand side.
The disadvantage is that we now need to swap the solution values for the residual values before solving. This is a bit counterintuitive from a software point of view: a variable called u holds the solution outside of the ParaDiag iteration, but holds the residual inside of it. Anyway, I don't see a better way.
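
Just to note that both formulations produce the same iterate and only differ in which operator is applied to $u^k$, here is a quick sanity check with stand-in matrices (illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
C = rng.random((n, n)) + n * np.eye(n)   # stand-in for the composite collocation operator
C_alpha = C + 1e-2 * rng.random((n, n))  # stand-in for the circularized preconditioner
u = rng.random(n)                        # current iterate u^k
b = rng.random(n)                        # right hand side u_0

# natural formulation: C_alpha u^{k+1} = (C_alpha - C) u^k + u_0
u_natural = np.linalg.solve(C_alpha, (C_alpha - C) @ u + b)

# increment formulation: C_alpha delta = u_0 - C u^k, u^{k+1} = u^k + delta
# (only the residual of the composite collocation problem is evaluated here)
u_increment = u + np.linalg.solve(C_alpha, b - C @ u)

assert np.allclose(u_natural, u_increment)
```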

Weird choice 2: Computing the Fourier transform via matrix-vector multiplication

The circularised preconditioner can be diagonalized via Fourier transforms. The default way of doing this would be to use a distributed FFT library with space-time-global data that is distributed in the time and possibly in further space directions. Before and after the solves, the data would be transposed to be aligned in the time direction, the FFTs in time would be executed in parallel, the data would be transposed back to be distributed in time and then the solves can proceed.

However, this does not fit well with the way we treat datatypes in pySDC. So as a first implementation, I do something else: avoid the transposes by computing the transforms as distributed matrix-vector products.
The advantage is clearly that we do not need to worry about generating intermediate numpy arrays in the correct shape for the transforms, but can operate on the problem datatypes directly. I am not sure how slow the transforms are, because while they require $N^2$ operations instead of the $N\log N$ of the FFT, they are computed in Reduce operations, which typically use tree algorithms afaik. So the individual transforms may take similar amounts of time. The main disadvantage is, however, that the transforms are performed in serial: all tasks are involved in the reduce operation to compute each datapoint in the transform.

For few parallel steps, this should not matter too much, and especially not for a serial implementation, where all of this is pretty hypothetical anyway. Eventually, it may be worth implementing this with proper distributed FFTs, but I suggest we worry about that later when doing the parallel implementation.
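
The correspondence between matrix-vector products and FFTs that is tested below boils down to the following sketch (plain DFT matrix here; the actual implementation applies the $\alpha$-weighted version to pySDC datatypes). Each entry of the transform is a dot product, which is what turns into one Reduce per datapoint in the parallel version.

```python
import numpy as np

L = 8
x = np.random.rand(L) + 1j * np.random.rand(L)

# DFT matrix: row j computes the j-th Fourier coefficient as a dot product
jk = np.outer(np.arange(L), np.arange(L))
F = np.exp(-2j * np.pi * jk / L)

assert np.allclose(F @ x, np.fft.fft(x))                   # forward transform
assert np.allclose(np.conj(F).T @ x / L, np.fft.ifft(x))   # backward transform
```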

Tests

  • Multiplication by FT matrices yields the same result as numpy FFT implementation
  • Q diagonalization sweeper yields the same result as solving the collocation problem directly for a Dahlquist problem
  • ParaDiag converges for Dahlquist, IMEX Dahlquist, IMEX heat equation and van der Pol oscillator
  • ParaDiag gives the same result as PFASST controller for Dahlquist, IMEX Dahlquist and van der Pol oscillator
  • ParaDiag has expected order of accuracy for Dahlquist problem
  • Tutorials

Caveats

In the controller:

  • The number of steps has to be divisible by the number of tasks. Otherwise, some matrices would need to be recomputed, which is not implemented yet and should be added in the future. There is a warning when the controller continues beyond what you would expect from the PFASST controllers.
  • The residual is computed before the ParaDiag iteration and then used as stopping criterion afterwards. This is because of the increment formulation, where we swap the solution for the residual before the solves; it makes sense to compute the residual only once. You may be a bit confused why the residual at the end is way below the tolerance, but that's all there is to it.

Outside the controller:

  • This PR contains the same IMEX Dahlquist implementation as Added IMEX Dahlquist equation #529 because that is used in the tests.
  • PETSc is still not working in the pipeline. It is working locally, so I don't know what's going on...

Up next

  • MPI parallel implementation of sweepers and controller
  • Allow swapping between the PFASST and ParaDiag controllers to enable things like using ParaDiag as a "coarse" method in PFASST.
  • Implement Gaya's adaptive selection of $\alpha$
  • Reproduce Gaya's results. (First I need to figure out some details of what she did...)
  • Fix FFTs

@pancetta (Member) commented:

Please merge master here.
