Serial implementation for ParaDiag with collocation methods #530
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In order to keep the PRs as short as possible, I thought now would be a good time to do the first one on ParaDiag.
It is still pretty long, and I do apologize for that.. Feel free to comment @colinjcotter and @JHopeCollins, but I don't expect you have time for reading the whole thing 🙈
The ParaDiag algorithm
Like PFASST, ParaDiag is a solver for the composite collocation problem or all-at-once problem in ParaDiag lingo. In the implementation here, the composite collocation problem is solved via predonditioned Picard iteration, where the preconditioner is a circularized version of the composite collocation problem that can be diagonalized and inverted in parallel.
Features
Sweepers
The sweepers are responsible for solving collocation problems in pySDC. I implemented two new ones for ParaDiag
In principle, the first one can be used within SDC, but the default configuration is unsuitable. I put some documentation to this regard and clearly labeled them as ParaDiag sweepers to avoid confusion.
ParaDiag Controller
The non-MPI ParaDiag controller contains a lot of copied code from the non-MPI SDC controller. It has an option for averaging the Jacobian. If you have a nonlinear problem, you have to do that, but for linear problems, this just adds communication. Defaults to
True
. The controller therefore supports linear problems, nonlinear problems with IMEX splitting and nonlinear problems with averaged Jacobians.Does no support multiple levels at the moment. Since one advantage ParaDiag is not use any coarsening, I am not sure this will ever be needed.
Tutorials
Added step 9 with three parts:
Peculiar implementation details
Weird choice 1: Swapping the solution for the residual
The natural formulation to implement in pySDC would be to solve$C_\alpha u^{k+1} = (C_\alpha - C) u^k + u_0$ , where $C$ is the composite collocation problem operator and $C_\alpha$ the preconditioner. However, you want to avoid evaluating $C_\alpha u^k$ because it involves multiplication by an averaged Jacobian for non-linear problems, which is not implemented in pySDC.$C_\alpha \delta = u_0 - Cu^k, u^{k+1} = u^k + \delta$ . Now we only need to evaluate the residual of the composite collocation problem in the right hand side.
Therefore, we instead split the iteration into
The disadvantage is that now we need to swap the solution values for the residual values before solving. This is a bit counter intuitive from a software point of view, where a variable called
u
now holds the solution outside of the ParaDiag iteration, but inside of the ParaDiag iteration holds the residual. Anyways, I don't see a better way.Weird choice 2: Computing the Fourier transform via matrix-vector multiplication
The circularised preconditioner can be diagonalized via Fourier transforms. The default way of doing this would be to use a distributed FFT library with space-time-global data that is distributed in the time and possibly in further space directions. Before and after the solves, the data would be transposed to be aligned in the time direction, the FFTs in time would be executed in parallel, the data would be transposed back to be distributed in time and then the solves can proceed.
However, this does not fit great with the way we treat datatypes in pySDC. So as a first implementation, I have something else: Avoid the transposes by computing the transforms as distributed matrx-vector products.$N^2$ operations instead of $N\log N$ in the FFT, they are computed in Reduce operations, which typically use tree algorithms afaik. So the individual transforms may take similar amounts of time. The main disadvantage is, however, that the transforms are performed in serial. All tasks are involved in the reduce operation to compute each datapoint in the transform.
The advantage is clearly that we do not need to worry about generating intermediate numpy arrays in the correct shape that allow to do the transforms, but we can operate on the problem datatypes directly. I am not sure how slow the transforms are because while they require
For few parallel steps, this should not matter too much. And especially not for a serial implementation where all this pretty hyphotical anyways. Eventually, it may be worth to implement this with proper ditributed FFTs, but I suggest we worry about this later when doing the parallel implementation.
Tests
Caveats
In the controller:
Outside the controller:
Up next