Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose parallel path #4

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
137 changes: 137 additions & 0 deletions decisions/0004-parallel-path.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
# Print Formatting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title needs to be updated.


- Status: **Proposed**
- Deciders: Adam Washington, Dan Nixon, Tristan Youngs
- Date: 16-11-2020

## Context and Problem Statement

The need for a comprehensive parallelism solution throughout the code
to enable large simulations to be completed in a reasonable time while
running on diverse architectures.


## Decision Drivers

- Current MPI solution does not easily support multiple platforms
- Need to support multiple hardware architectures (e.g. multi-core, GPU)


## Considered Options

1. [MPI](https://www.mpi-forum.org)
2. [Parallel STL](https://en.cppreference.com/w/cpp/header/execution)
3. [OpenMP](https://openmp.org)
4. [CUDA](https://developer.nvidia.com/cuda-zone)
5. [OpenCL](https://www.khronos.org/opencl/)
6. [CyCL](https://www.khronos.org/sycl/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
6. [CyCL](https://www.khronos.org/sycl/)
6. [SYCL](https://www.khronos.org/sycl/)


## Decision Outcome

We ultimately have a 4 stage path forward

### 1) Modernisation

We continue moving the code base toward standard C++ containers. This
provides more information to the compiler and should provide
performance benefits under all circumstances. Even if the roadmap for
parallelism changes, it is likely that this will still provide benefits
and highly unlikely that it will be a hindrance.

### 2) STL Parallelism

Using standard containers will enable us to use standard algorithms,
many of which now support parallelism as of C++17. This will involve
converting many of our loops into algorithms and lambdas. Beyond
immediately providing a speed-up on multi-core machines, this process
moves the code base into a position that will be easier to parallelize
through one of the GPU libraries

### 3) CyCL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### 3) CyCL
### 3) SYCL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And throughout the following paragraphs.


Once the parallel STL components are in place, we can then start
moving some of these pieces into CyCL. This will involve more
marshalling of arrays than straight C++ STL, but will enable GPU
access that should hopefully provide speed ups of orders of magnitude.

### 4) MPI removal

If the CyCL transition is successful and is able to perform faster on
a single GPU enabled node than the MPI version is able to perform
across multiple nodes in a cluster, then it would be possible to
remove the MPI code from dissolve. This would simplify multiple
sections of the code and the build systems, as well as eliminating
much of our use of the pre-processor.



### Positive Consequences

- No point in this process should involve a performance regression for the users.
- This should enable code speed-ups on everything from a laptop to a cluster.
- Should new technologies emerge, using algorithms will enable shifting the code faster than the current for-loops
- Intel's support of CyCL ensures that the library should continue to be available and supported for the foreseeable future.


### Negative Consequences

- It will not be possible to benchmark the CyCL code until late in the process. Should the CyCL code outperform expectation and the STL underperform, it's possible that excessive time may be spent on stage 2 instead of skipping straight to stage 3.
- CyCL is a relatively new library, so the avaiable knowledge and best
practices aren't as well defined.


## Pros and Cons of the Options

### MPI

- Good, already written
- Good, mature technology
- Good, supported by computing clusters at many institutions
- Bad, poor workstation support
- Bad, memory usage scales with processor count
- Bad, communication overhead between nodes
- Bad, no support for advanced hardware

### Parallel STL

- Good, supported on all platforms with a C++17 compiler
- Good, will likely be supported as long as C++ exists
- Good, support for multiple cores and vector instructions
- Good, support for standard C++ containers and algorithms
- Bad, requires algorithms to be specifically labelled (no just for loops)
- Bad, no support for advanced hardware (GPU code)

### OpenMP

- Good, supported by most common C++ compilers
- Good, support for multiple cores and vector instructions
- Good, requires simple annotations of existing for-loops
- Bad, weak support for advanced hardware (GPU code)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the case, OpenMP can target GPUs. It is only in the latest spec IIRC but there is support for at least some Nvidia GPUs from what I recall.

- Bad, no support for C++ algorithms

### CUDA

- Good, support for NVIDIA GPUs
- Good, significant tooling support available
- Good, plentiful libraries
- Good, single C++ code base
- Bad, no support for non-NVIDIA GPUs

### OpenCL

- Good, support for GPUs from multiple vendors
- Good, support for multi-core systems
- Bad, notoriously heavy boiler plate
- Bad, splits code base into two languages
- Bad, poor vendor support

### CyCL

- Good, support for GPUs from multiple vendors
- Good, support for multi-core systems
- Good, single C++ code base
- Good, tooling support available from Intel
- Good, multiple implementations
- Bad, newer technology with few best practices established