Extend GPU documentation (#1073)
* Extend GPU documentation

* fix

* GPU doc up
mfherbst authored Mar 4, 2025
1 parent d22f8ae commit 7cce302
Showing 7 changed files with 101 additions and 19 deletions.
2 changes: 2 additions & 0 deletions docs/Project.toml
@@ -1,4 +1,5 @@
[deps]
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
ASEconvert = "3da9722f-58c2-4165-81be-b4d7253e8fd2"
Artifacts = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
AtomsBase = "a963bdd2-2df7-4f54-a1ee-49d51e6be12a"
@@ -7,6 +8,7 @@
AtomsCalculators = "a3e0e189-c65a-42c1-833c-339540406eb1"
AtomsIO = "1692102d-eeb4-4df9-807b-c9517f998d44"
AtomsIOPython = "9e4c859b-2281-48ef-8059-f50fe53c37b0"
Brillouin = "23470ee3-d0df-4052-8b1a-8cbd6363e7f0"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
DFTK = "acf6eb54-70d9-11e9-0013-234b7a5f5337"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
1 change: 1 addition & 0 deletions docs/make.jl
@@ -62,6 +62,7 @@ PAGES = [
# reliability of calculations.
"tricks/achieving_convergence.md",
"tricks/parallelization.md",
"tricks/gpu.jl",
"tricks/scf_checkpoints.jl",
"tricks/compute_clusters.md",
],
1 change: 1 addition & 0 deletions docs/src/features.md
@@ -16,6 +16,7 @@
- Direct minimization, Newton solver
- Multi-level threading (``k``-points, eigenvectors, FFTs, linear algebra)
- MPI-based distributed parallelism (distribution over ``k``-points)
- [Using DFTK on GPUs](@ref): Nvidia *(mostly supported)* and AMD GPUs *(preliminary support)*
- Treat systems of 1000 electrons

* Ground-state properties and post-processing:
14 changes: 14 additions & 0 deletions docs/src/guide/tutorial.jl
@@ -116,3 +116,17 @@ plot_dos(bands; temperature=1e-3, smearing=Smearing.FermiDirac())
# Note that directly employing the `scfres` also works, but the results
# are much cruder:
plot_dos(scfres; temperature=1e-3, smearing=Smearing.FermiDirac())

# !!! info "Where to go from here"
#     - **Background on DFT:**
#       * [Periodic problems](@ref periodic-problems),
#       * [Introduction to density-functional theory](@ref),
#       * [Self-consistent field methods](@ref)
#     - **Running calculations:**
#       * [Temperature and metallic systems](@ref metallic-systems)
#       * [Performing a convergence study](@ref)
#       * [Geometry optimization](@ref)
#     - **Tips and tricks:**
#       * [Using DFTK on compute clusters](@ref),
#       * [Using DFTK on GPUs](@ref),
#       * [Saving SCF results on disk and SCF checkpoints](@ref)
78 changes: 78 additions & 0 deletions docs/src/tricks/gpu.jl
@@ -0,0 +1,78 @@
# # Using DFTK on GPUs
#
# In this example we will look at how DFTK can be used on
# Graphics Processing Units (GPUs).
# In its current state, runs on Nvidia GPUs
# using the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) Julia
# package are better supported and have considerably fewer rough
# edges.
#
# !!! info "GPU parallelism not supported everywhere"
#     GPU support is still a relatively new feature in DFTK.
#     While basic SCF computations and e.g. forces are supported,
#     this is not yet the case for all parts of the code.
#     In most cases there is no intrinsic limitation, and typically
#     only minor code modifications are needed to make things work on GPUs.
#     If you require GPU support in one of our routines where this is not
#     yet available, feel free to open an issue on GitHub or otherwise get in touch.
#

using AtomsBuilder
using DFTK
using PseudoPotentialData

# **Model setup.** The first step is to set up a [`Model`](@ref) in DFTK.
# This proceeds exactly as in the standard CPU case
# (see also our [Tutorial](@ref)).

silicon = bulk(:Si)

model = model_DFT(silicon;
functionals=PBE(),
pseudopotentials=PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf"))
nothing # hide

# Next is the selection of the computational architecture.
# This effectively determines whether the computation will be run
# on the CPU or on a GPU.
#
# **Nvidia GPUs.**
# Supported via [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl).
# Right now `Libxc` only supports CUDA 11,
# so we need to explicitly request the 11.8 CUDA runtime:
using CUDA
CUDA.set_runtime_version!(v"11.8") # Note: This requires a restart of Julia
architecture = DFTK.GPU(CuArray)

# **AMD GPUs.** Supported via [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl).
# Here you need to [install ROCm](https://rocm.docs.amd.com/) manually.
# With that in place you can then select:

using AMDGPU
architecture = DFTK.GPU(ROCArray)

# **Portable architecture selection.**
# To make sure this script runs on the GitHub CI (where we don't have GPUs
# available) we check for the availability of GPUs before selecting an
# architecture:

architecture = has_cuda() ? DFTK.GPU(CuArray) : DFTK.CPU()
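The same idea extends to both GPU backends. As a sketch (assuming both CUDA.jl and AMDGPU.jl are installed; `AMDGPU.functional()` is the availability check provided by AMDGPU.jl), one could pick the first available backend and fall back to the CPU:

```julia
using CUDA
using AMDGPU
using DFTK

# Pick the first available GPU backend, falling back to the CPU.
architecture = if has_cuda()
    DFTK.GPU(CuArray)
elseif AMDGPU.functional()
    DFTK.GPU(ROCArray)
else
    DFTK.CPU()
end
```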

# **Basis and SCF.**
# Based on the `architecture` we construct a [`PlaneWaveBasis`](@ref) object
# as usual:

basis = PlaneWaveBasis(model; Ecut=30, kgrid=(5, 5, 5), architecture)
nothing # hide

# ... and run the SCF and some post-processing:

scfres = self_consistent_field(basis; tol=1e-6)
compute_forces(scfres)
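Note that when running on a GPU, arrays stored inside `scfres` (such as the density) live in device memory. A minimal sketch for bringing such data back to the host, e.g. for plotting or custom analysis (assuming the `scfres` from above):

```julia
# Copy the density into a plain CPU Array; on a CPU run this is just a copy.
ρ_host = Array(scfres.ρ)
```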

# !!! warning "GPU performance"
#     Our current (February 2025) benchmarks show DFTK to have reasonable performance
#     on Nvidia / CUDA GPUs, with a 50-fold to 100-fold speed-up over single-threaded
#     CPU execution. However, AMD GPUs have been benchmarked less and
#     there are likely rough edges. Overall this feature is relatively new
#     and we appreciate any experience or bug reports.
10 changes: 5 additions & 5 deletions docs/src/tricks/parallelization.md
@@ -80,7 +80,7 @@ and secondly multiprocessing using MPI
(via the [MPI.jl](https://github.com/JuliaParallel/MPI.jl) Julia interface).
MPI-based parallelism is currently only over ``k``-points,
such that it cannot be used for calculations with only a single ``k``-point.
Otherwise combining both forms of parallelism is possible as well.
There is also support for [Using DFTK on GPUs](@ref).
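Such an MPI-parallel run is typically launched from the shell. A sketch (assuming MPI.jl's `mpiexecjl` wrapper has been installed and a DFTK script named `myscript.jl`, a hypothetical file name):

```shell
# Run the script on 4 MPI processes; k-points are distributed among them.
mpiexecjl -n 4 julia --project myscript.jl
```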

The scaling of both forms of parallelism for a number of test cases
is demonstrated in the following figure.
@@ -126,10 +126,10 @@ DFTK.mpi_master() || (redirect_stdout(); redirect_stderr())
```
at the top of your script to disable printing on all processes but one.

!!! info "MPI-based parallelism not fully supported"
While standard procedures (such as the SCF or band structure calculations)
fully support MPI, not all routines of DFTK are compatible with MPI yet
and will throw an error when being called in an MPI-parallel run.
!!! info "MPI-based parallelism not supported everywhere"
    While most standard procedures are now supported in combination with MPI,
    some functionality is still missing and may error out when called
    in an MPI-parallel run.
    In most cases there is no intrinsic limitation; the feature simply has not
    yet been implemented. If you require MPI support in one of our routines
    where it is not yet available, feel free to open an issue on GitHub
    or otherwise get in touch.
14 changes: 0 additions & 14 deletions examples/cuda.jl

This file was deleted.
