diff --git a/docs/aurora/compiling-and-linking/continuous-integration-aurora.md b/docs/aurora/compiling-and-linking/continuous-integration-aurora.md
deleted file mode 100644
index f4b23cf4d..000000000
--- a/docs/aurora/compiling-and-linking/continuous-integration-aurora.md
+++ /dev/null
@@ -1 +0,0 @@
-# Continuous Integration Aurora
diff --git a/docs/aurora/data-management/moving_data_to_aurora/globus.md b/docs/aurora/data-management/moving_data_to_aurora/globus.md
index b2bc571a9..7ffd38a3f 100644
--- a/docs/aurora/data-management/moving_data_to_aurora/globus.md
+++ b/docs/aurora/data-management/moving_data_to_aurora/globus.md
@@ -1,12 +1,7 @@
 # Transferring Files through Globus
+For transfers to/from [Flare](../lustre/flare.md), you may use the Globus collection `alcf#dtn_flare`.
 
-## During Acceptance Testing Period
-
-We have set up a temporary Globus endpoint for Flare that you can use to transfer data out. The endpoint is called "alcf#dtn_flare_at" and is set up for read-only access. The endpoint will be available throughout the acceptance testing (AT) period and will be shut down after AT concludes.
-
-## Before Acceptance Testing
-
-Currently, only Globus Personal is supported on Aurora. Perform the following steps to transfer data to/from the Aurora login nodes.
+Currently, for transfers to/from Aurora `/home`, only Globus Connect Personal is supported. Perform the following steps to transfer data to/from your home directory:
 
 1. On a fresh connection to the login nodes, ensure no proxies are being set (which may require commenting out the proxy settings in the `~/.bashrc` or `~/.bash_profile` files), and execute:
 
@@ -15,10 +10,11 @@ Currently, only Globus Personal is supported on Aurora. Perform the following st
 ```
 ```
 
 2. Paste the link provided by the above command into a browser and follow the instructions to set up a personal endpoint:
-    * When requested, input your ALCF username and one-time password from your CRYPTOCard/MobilePASS+ token.
-    * Select the Allow button.
-    * Enter the authentication code generated back into the terminal.
-    * Enter a name for the endpoint (e.g., `aurora_login_uan11`).
+
+    - When requested, input your ALCF username and one-time password from your CRYPTOCard/MobilePASS+ token.
+    - Select the Allow button.
+    - Enter the authentication code generated back into the terminal.
+    - Enter a name for the endpoint (e.g., `aurora_login_uan11`).
 
 3. On the same terminal, execute:
@@ -26,7 +22,8 @@ Currently, only Globus Personal is supported on Aurora. Perform the following st
 /soft/tools/proxychains/bin/proxychains4 -f /soft/tools/proxychains/etc/proxychains.conf /soft/tools/globusconnect/globusconnect -start &
 ```
 
-    * By default, the command only gives access to your home directory.
-    * You can add `-restrict-paths /lus/flare/projects/YOURPROJECT` to access your project directory.
+    - By default, the command only gives access to your home directory.
+    - You can add `-restrict-paths /lus/flare/projects/YOURPROJECT` to access your project directory.
+
+4. Open the [Globus web app](https://app.globus.org/file-manager?destination_id=05d2c76a-e867-4f67-aa57-76edeb0beda0) and search for the endpoint name defined above. You will now see your home directory (and project directory, if requested) on Aurora and can initiate transfers with other endpoints (e.g., the Eagle file system on Polaris at `alcf#dtn_eagle`).
 
-4. Open the [Globus web app](https://app.globus.org/file-manager?destination_id=05d2c76a-e867-4f67-aa57-76edeb0beda0) and search for the endpoint name defined above.
-You will now see your home directory (and project directory, if requested) on Aurora and can initiate transfers with other endpoints (e.g., the Eagle file system on Polaris at `alcf#dtn_eagle`).
\ No newline at end of file
diff --git a/docs/aurora/data-science/applications/gpt-neox.md b/docs/aurora/data-science/applications/gpt-neox.md
deleted file mode 100644
index f4452d9f0..000000000
--- a/docs/aurora/data-science/applications/gpt-neox.md
+++ /dev/null
@@ -1 +0,0 @@
-# Instruction for gpt-neox on Aurora
diff --git a/docs/aurora/data-science/frameworks/jax.md b/docs/aurora/data-science/frameworks/jax.md
deleted file mode 100644
index 57f25f6d1..000000000
--- a/docs/aurora/data-science/frameworks/jax.md
+++ /dev/null
@@ -1 +0,0 @@
-# Jax on Aurora
diff --git a/docs/aurora/data-science/julia.md b/docs/aurora/data-science/julia.md
deleted file mode 100644
index 407233540..000000000
--- a/docs/aurora/data-science/julia.md
+++ /dev/null
@@ -1 +0,0 @@
-# Julia on Aurora
diff --git a/docs/aurora/getting-started-on-aurora.md b/docs/aurora/getting-started-on-aurora.md
index 94d60e267..9ef398461 100644
--- a/docs/aurora/getting-started-on-aurora.md
+++ b/docs/aurora/getting-started-on-aurora.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token.
 
 ## Hardware Overview
 
-An overview of the Aurora system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
+An overview of the Aurora system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.
 
 ## Compiling Applications
 
diff --git a/docs/aurora/hardware-overview/machine-overview.md b/docs/aurora/machine-overview.md
similarity index 98%
rename from docs/aurora/hardware-overview/machine-overview.md
rename to docs/aurora/machine-overview.md
index 938aab3d5..f758380dd 100644
--- a/docs/aurora/hardware-overview/machine-overview.md
+++ b/docs/aurora/machine-overview.md
@@ -2,7 +2,7 @@
 Aurora is a 10,624-node HPE Cray-Ex based system. It has 166 racks with 21,248 CPUs and 63,744 GPUs. Each node consists of 2 Intel Xeon CPU Max Series (codename Sapphire Rapids or SPR) with on-package HBM and 6 Intel Data Center GPU Max Series (codename Ponte Vecchio or PVC). Each Xeon CPU has 52 physical cores supporting 2 hardware threads per core and 64 GB of HBM. Each CPU socket has 512 GB of DDR5 memory. The GPUs are connected all-to-all with Intel X^e^ Link interfaces. Each node has 8 HPE Slingshot-11 NICs, and the system is connected in a Dragonfly topology. The GPUs may send messages directly to the NIC via PCIe, without the need to copy into CPU memory.
 
-![Aurora Node Diagram](../images/aurora_node_dataflow.png)
+![Aurora Node Diagram](./images/aurora_node_dataflow.png)
 /// caption
 Figure 1: Summary of the compute, memory, and communication hardware contained within a single Aurora node.
 
@@ -34,4 +34,4 @@ The Intel Data Center GPU Max Series is based on X^e^ Core. Each X^e^ core consi
 | L1 cache | | | 128 KiB |
 | Last Level cache | a.k.a. RAMBO cache | | 384 MiB per GPU |
 
-See [Aurora Overview](https://www.alcf.anl.gov/sites/default/files/2024-11/Overview-of-Aurora-Oct-2024.pdf) for more information.
\ No newline at end of file
+See [Aurora Overview](https://www.alcf.anl.gov/sites/default/files/2024-11/Overview-of-Aurora-Oct-2024.pdf) for more information.
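For reference, the workflow described in the revised `globus.md` above condenses to the sketch below. The paths and commands are taken from that page; the step comments, the choice to pass `-restrict-paths`, and the `YOURPROJECT` placeholder are illustrative rather than prescriptive.

```bash
# 1. On an Aurora login node, with any proxy settings in ~/.bashrc or
#    ~/.bash_profile commented out, run the one-time endpoint setup.
/soft/tools/globusconnect/globusconnect -setup
# Open the printed link in a browser, authenticate with your ALCF username
# and CRYPTOCard/MobilePASS+ one-time password, paste the authentication
# code back into the terminal, and give the endpoint a name.

# 2. Start the endpoint through proxychains; the optional -restrict-paths
#    flag grants access to a project directory (YOURPROJECT is a placeholder).
/soft/tools/proxychains/bin/proxychains4 \
    -f /soft/tools/proxychains/etc/proxychains.conf \
    /soft/tools/globusconnect/globusconnect -start \
    -restrict-paths /lus/flare/projects/YOURPROJECT &
```

Transfers are then initiated from the Globus web app between this personal endpoint and another collection such as `alcf#dtn_eagle` or `alcf#dtn_flare`.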
diff --git a/docs/aurora/programming-models/raja-aurora.md b/docs/aurora/programming-models/raja-aurora.md
deleted file mode 100644
index ff58d1702..000000000
--- a/docs/aurora/programming-models/raja-aurora.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Raja
-
-Placeholder
\ No newline at end of file
diff --git a/docs/aurora/services/jupyterhub.md b/docs/aurora/services/jupyterhub.md
deleted file mode 100644
index e8255bbf4..000000000
--- a/docs/aurora/services/jupyterhub.md
+++ /dev/null
@@ -1 +0,0 @@
-# JupyterHub
diff --git a/docs/aurora/workflows/deephyper.md b/docs/aurora/workflows/deephyper.md
deleted file mode 100644
index c817f6cad..000000000
--- a/docs/aurora/workflows/deephyper.md
+++ /dev/null
@@ -1 +0,0 @@
-# DeepHyper
diff --git a/docs/aurora/workflows/libensemble.md b/docs/aurora/workflows/libensemble.md
deleted file mode 100644
index 901cceb50..000000000
--- a/docs/aurora/workflows/libensemble.md
+++ /dev/null
@@ -1 +0,0 @@
-# libEnsemble on Aurora
diff --git a/docs/crux/compiling-and-linking/compiling-and-linking-overview.md b/docs/crux/compiling-and-linking/compiling-and-linking-overview.md
index 38676dbcd..0d1c9d3d3 100644
--- a/docs/crux/compiling-and-linking/compiling-and-linking-overview.md
+++ b/docs/crux/compiling-and-linking/compiling-and-linking-overview.md
@@ -1,7 +1,7 @@
 # Compiling and Linking on Crux
 
 ## Overview
-Crux has AMD processors on the login nodes (crux-login-01,02) and AMD processors on the compute nodes (see [Machine Overview](../hardware-overview/machine-overview.md) page). The login nodes can be used to compile software, create containers, and launch jobs. For larger, parallel builds, it will be beneficial to compile those directly on the compute nodes.
+Crux has AMD processors on the login nodes (crux-login-01,02) and AMD processors on the compute nodes (see [Machine Overview](../machine-overview.md) page). The login nodes can be used to compile software, create containers, and launch jobs. For larger, parallel builds, it will be beneficial to compile those directly on the compute nodes.
 
 To launch an interactive job and acquire a compute node for compiling, use:
 
@@ -41,4 +41,4 @@ To load new modules, use:
 
 ```bash
 module load <module_name>
-```
\ No newline at end of file
+```
diff --git a/docs/crux/getting-started.md b/docs/crux/getting-started.md
index a4ba020dd..a946f5f51 100644
--- a/docs/crux/getting-started.md
+++ b/docs/crux/getting-started.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token. Once logged i
 
 ## Hardware Overview
 
-An overview of the Crux system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
+An overview of the Crux system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.
 
 ## Compiling Applications
 
@@ -73,4 +73,4 @@ export ftp_proxy="http://proxy.alcf.anl.gov:3128"
 Please direct all questions, requests, and feedback to [support@alcf.anl.gov](mailto:support@alcf.anl.gov).
 
 ---
----
\ No newline at end of file
+---
diff --git a/docs/crux/hardware-overview/machine-overview.md b/docs/crux/machine-overview.md
similarity index 100%
rename from docs/crux/hardware-overview/machine-overview.md
rename to docs/crux/machine-overview.md
diff --git a/docs/polaris/getting-started.md b/docs/polaris/getting-started.md
index 9eaa3b26d..582e2fa7a 100644
--- a/docs/polaris/getting-started.md
+++ b/docs/polaris/getting-started.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token.
 
 ## Hardware Overview
 
-An overview of the Polaris system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
+An overview of the Polaris system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.
 
 ## Compiling Applications
 
@@ -52,4 +52,4 @@ export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,po
 
 ## Getting Assistance
 
-Please direct all questions, requests, and feedback to [support@alcf.anl.gov](mailto:support@alcf.anl.gov).
\ No newline at end of file
+Please direct all questions, requests, and feedback to [support@alcf.anl.gov](mailto:support@alcf.anl.gov).
diff --git a/docs/polaris/hardware-overview/files/.DS_Store b/docs/polaris/hardware-overview/files/.DS_Store
deleted file mode 100644
index 5008ddfcf..000000000
Binary files a/docs/polaris/hardware-overview/files/.DS_Store and /dev/null differ
diff --git a/docs/polaris/hardware-overview/files/Aries1.gif b/docs/polaris/hardware-overview/files/Aries1.gif
deleted file mode 100644
index a9952496f..000000000
Binary files a/docs/polaris/hardware-overview/files/Aries1.gif and /dev/null differ
diff --git a/docs/polaris/hardware-overview/files/Aries2.gif b/docs/polaris/hardware-overview/files/Aries2.gif
deleted file mode 100644
index 9636b497f..000000000
Binary files a/docs/polaris/hardware-overview/files/Aries2.gif and /dev/null differ
diff --git a/docs/polaris/hardware-overview/machine-overview.md b/docs/polaris/machine-overview.md
similarity index 90%
rename from docs/polaris/hardware-overview/machine-overview.md
rename to docs/polaris/machine-overview.md
index c230b73e7..ccc2648b4 100644
--- a/docs/polaris/hardware-overview/machine-overview.md
+++ b/docs/polaris/machine-overview.md
@@ -1,4 +1,4 @@
-# Polaris Machine Overview 
+# Polaris Machine Overview
 
 Polaris is a 560-node HPE Apollo 6500 Gen 10+ based system. Each node has a single 2.8 GHz AMD EPYC Milan 7543P 32-core CPU with 512 GB of DDR4 RAM, four NVIDIA A100 GPUs connected via NVLink, a pair of local 1.6TB SSDs in RAID0 for user use, and a pair of Slingshot 11 network adapters. There are two nodes per chassis, seven chassis per rack, and 40 racks for a total of 560 nodes. More detailed specifications are as follows:
 ## Polaris Compute Nodes
@@ -10,7 +10,7 @@ Polaris is a 560-node HPE Apollo 6500 Gen 10+ based system. Each node has a sing
 | GPUs | NVIDIA A100 | 4 | 2,240 |
 | Local SSD | 1.6 TB | 2/3.2 TB | 1,120/1.8 PB |
 
-Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
+Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
 Note 2: 8 memory channels rated at 204.8 GiB/s
 
 ## Polaris A100 GPU Information
@@ -39,13 +39,13 @@ Note 2: 8 memory channels rated at 204.8 GiB/s
 ### Legend:
 
-**X** = Self
-**SYS** = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
-**NODE** = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
-**PHB** = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
-**PXB** = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
-**PIX** = Connection traversing at most a single PCIe bridge
-**NV#** = Connection traversing a bonded set of # NVLinks
+**X** = Self
+**SYS** = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
+**NODE** = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
+**PHB** = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
+**PXB** = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
+**PIX** = Connection traversing at most a single PCIe bridge
+**NV#** = Connection traversing a bonded set of # NVLinks
 
 Links to detailed NVIDIA A100 documentation:
 - [NVIDIA A100 Tensor Core GPU Architecture](https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf)
@@ -64,12 +64,12 @@ All users share the same login nodes, so please be courteous and respectful of y
 | GPUs (Note 3) | No GPUs | 0 | 0 |
 | Local SSD | None | 0 | 0 |
 
-Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
-Note 2: 8 memory channels rated at 204.8 GiB/s per socket
+Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
+Note 2: 8 memory channels rated at 204.8 GiB/s per socket
 Note 3: If your build requires the physical presence of a GPU, you will need to build on a compute node.
 
 ## Gateway Nodes
 There are 50 gateway nodes. These nodes are not user-accessible but are used transparently for access to the storage systems. Each node has a single 200 Gbps HDR IB card for access to the storage area network. This gives a theoretical peak bandwidth of 1,250 GB/s, which is approximately the aggregate bandwidth of the global file systems (1,300 GB/s).
 
 ## Storage
-Polaris has access to the ALCF global file systems. Details on storage can be found [here](../../data-management/filesystem-and-storage/data-storage.md).
\ No newline at end of file
+Polaris has access to the ALCF global file systems. Details on storage can be found [here](../data-management/filesystem-and-storage/data-storage.md).
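The `SYS`/`NODE`/`PHB`/`PXB`/`PIX`/`NV#` legend retained in the renamed Polaris machine overview matches the connection codes reported by NVIDIA's topology query. Assuming an interactive job on a compute node (the login nodes have no GPUs), the matrix and basic per-GPU details can be reproduced with something like:

```bash
# Print the GPU/NIC connectivity matrix; the SYS/NODE/PHB/PXB/PIX/NV# codes
# in the legend come directly from this output.
nvidia-smi topo -m

# List the four A100s on the node with their total memory.
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```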
diff --git a/docs/sophia/compiling-and-linking/compiling-and-linking-overview.md b/docs/sophia/compiling-and-linking/compiling-and-linking-overview.md
index 551714a1b..d301531f3 100644
--- a/docs/sophia/compiling-and-linking/compiling-and-linking-overview.md
+++ b/docs/sophia/compiling-and-linking/compiling-and-linking-overview.md
@@ -1,7 +1,7 @@
 # Compiling and Linking on Sophia
 
 ## Overview
-Sophia has AMD processors on the login nodes (`sophia-login-01,02`) and AMD processors and NVIDIA A100 GPUs on the compute nodes (see [Machine Overview](../hardware-overview/machine-overview.md) page). The login nodes can be used to create containers and launch jobs.
+Sophia has AMD processors on the login nodes (`sophia-login-01,02`) and AMD processors and NVIDIA A100 GPUs on the compute nodes (see [Machine Overview](../machine-overview.md) page). The login nodes can be used to create containers and launch jobs.
 
 !!! warning inline end "Must compile on a compute node"
 
@@ -76,4 +76,4 @@ elif [ -f /etc/bash.bashrc ]
 then
     . /etc/bash.bashrc
 fi
-```
\ No newline at end of file
+```
diff --git a/docs/sophia/getting-started.md b/docs/sophia/getting-started.md
index 700c189f0..4c5aecf7c 100644
--- a/docs/sophia/getting-started.md
+++ b/docs/sophia/getting-started.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token. Once logged i
 
 ## Hardware Overview
 
-An overview of the Sophia system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
+An overview of the Sophia system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.
 
 ## Compiling Applications
 
@@ -53,4 +53,4 @@ export ftp_proxy="http://proxy.alcf.anl.gov:3128"
 Please direct all questions, requests, and feedback to [support@alcf.anl.gov](mailto:support@alcf.anl.gov).
 
----
\ No newline at end of file
+---
diff --git a/docs/sophia/hardware-overview/machine-overview.md b/docs/sophia/machine-overview.md
similarity index 100%
rename from docs/sophia/hardware-overview/machine-overview.md
rename to docs/sophia/machine-overview.md
diff --git a/docs/sophia/queueing-and-running-jobs/running-jobs.md b/docs/sophia/queueing-and-running-jobs/running-jobs.md
index 1695a1efb..e9346547f 100644
--- a/docs/sophia/queueing-and-running-jobs/running-jobs.md
+++ b/docs/sophia/queueing-and-running-jobs/running-jobs.md
@@ -16,7 +16,7 @@ There are three production queues you can target in your `qsub` command (`-q