
Leftover redesign; remove some placeholder pages #662

Draft · wants to merge 21 commits into main


25 changes: 11 additions & 14 deletions docs/aurora/data-management/moving_data_to_aurora/globus.md
@@ -1,12 +1,7 @@
# Transferring Files through Globus
For transfers to/from [Flare](../lustre/flare.md), you may use the Globus collection `alcf#dtn_flare`.
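As an illustrative sketch only (assuming the `globus-cli` client; the collection UUIDs, paths, and label below are placeholders), a transfer involving this collection can also be driven from the command line:

```bash
# Look up the Flare collection UUID by its display name.
globus endpoint search "alcf#dtn_flare"

# Transfer a file from Flare to another collection using the UUIDs found above (placeholder values).
globus transfer <flare_collection_uuid>:/lus/flare/projects/YOURPROJECT/results.tar \
    <destination_collection_uuid>:/path/on/destination/results.tar \
    --label "flare-transfer"
```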

## During Acceptance Testing Period

We have set up a temporary Globus endpoint for Flare that you can use to transfer data out. The endpoint is called "alcf#dtn_flare_at" and is set up for read-only access. The endpoint will be available throughout the acceptance testing (AT) period and will be shut down after AT concludes.

## Before Acceptance Testing

Currently, only Globus Personal is supported on Aurora. Perform the following steps to transfer data to/from the Aurora login nodes.
Currently, for transfers to/from Aurora `/home`, only Globus Connect Personal is supported. Perform the following steps to transfer data to/from there:

1. On a fresh connection to the login nodes, ensure no proxies are being set (which may require commenting out the proxy settings in the `~/.bashrc` or `~/.bash_profile` files), and execute:

@@ -15,18 +10,20 @@ Currently, only Globus Personal is supported on Aurora. Perform the following st
```

2. Paste the link provided by the above command into a browser and follow the instructions to set up a personal endpoint:
* When requested, input your ALCF username and one-time password from your CRYPTOCard/MobilePASS+ token.
* Select the Allow button.
* Enter the authentication code generated back into the terminal.
* Enter a name for the endpoint (e.g., `aurora_login_uan11`).

- When requested, input your ALCF username and one-time password from your CRYPTOCard/MobilePASS+ token.
- Select the Allow button.
- Enter the authentication code generated back into the terminal.
- Enter a name for the endpoint (e.g., `aurora_login_uan11`).

3. On the same terminal, execute:

```bash
/soft/tools/proxychains/bin/proxychains4 -f /soft/tools/proxychains/etc/proxychains.conf /soft/tools/globusconnect/globusconnect -start &
```

* By default, the command only gives access to your home directory.
* You can add `-restrict-paths /lus/flare/projects/YOURPROJECT` to access your project directory.
- By default, the command only gives access to your home directory.
- You can add `-restrict-paths /lus/flare/projects/YOURPROJECT` to access your project directory (see the sketch after this list).

4. Open the [Globus web app](https://app.globus.org/file-manager?destination_id=05d2c76a-e867-4f67-aa57-76edeb0beda0) and search for the endpoint name defined above. You will now see your home directory (and project directory, if requested) on Aurora and can initiate transfers with other endpoints (e.g., the Eagle file system on Polaris at `alcf#dtn_eagle`).

4. Open the [Globus web app](https://app.globus.org/file-manager?destination_id=05d2c76a-e867-4f67-aa57-76edeb0beda0) and search for the endpoint name defined above. You will now see your home directory (and project directory, if requested) on Aurora and can initiate transfers with other endpoints (e.g., the Eagle file system on Polaris at `alcf#dtn_eagle`).
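For step 3, a hedged sketch of the same start command with a project directory added via `-restrict-paths` (the project path is a placeholder):

```bash
# Start Globus Connect Personal behind proxychains with access to $HOME and a project directory.
# /lus/flare/projects/YOURPROJECT is a placeholder; point it at your own project.
/soft/tools/proxychains/bin/proxychains4 -f /soft/tools/proxychains/etc/proxychains.conf \
    /soft/tools/globusconnect/globusconnect -start \
    -restrict-paths ~/,/lus/flare/projects/YOURPROJECT &
```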
1 change: 0 additions & 1 deletion docs/aurora/data-science/applications/gpt-neox.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/aurora/data-science/frameworks/jax.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/aurora/data-science/julia.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/aurora/getting-started-on-aurora.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token.
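As a hedged illustration of the login step referenced above (the hostname is an assumption based on the usual ALCF naming; substitute your ALCF username):

```bash
# Assumed login alias for Aurora; replace <username> with your ALCF username.
ssh <username>@aurora.alcf.anl.gov
```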

## Hardware Overview

An overview of the Aurora system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
An overview of the Aurora system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.

## Compiling Applications

@@ -2,7 +2,7 @@

Aurora is a 10,624-node HPE Cray EX based system. It has 166 racks with 21,248 CPUs and 63,744 GPUs. Each node consists of 2 Intel Xeon CPU Max Series (codename Sapphire Rapids or SPR) with on-package HBM and 6 Intel Data Center GPU Max Series (codename Ponte Vecchio or PVC). Each Xeon CPU has 52 physical cores supporting 2 hardware threads per core and 64 GB of HBM. Each CPU socket has 512 GB of DDR5 memory. The GPUs are connected all-to-all with Intel X^e^ Link interfaces. Each node has 8 HPE Slingshot-11 NICs, and the system is connected in a Dragonfly topology. The GPUs may send messages directly to the NIC via PCIe, without the need to copy into CPU memory.

![Aurora Node Diagram](../images/aurora_node_dataflow.png)
![Aurora Node Diagram](./images/aurora_node_dataflow.png)

/// caption
Figure 1: Summary of the compute, memory, and communication hardware contained within a single Aurora node.
@@ -34,4 +34,4 @@ The Intel Data Center GPU Max Series is based on X^e^ Core. Each X^e^ core consi
| L1 cache | | | 128 KiB |
| Last Level cache | a.k.a. RAMBO cache | | 384 MiB per GPU |

See [Aurora Overview](https://www.alcf.anl.gov/sites/default/files/2024-11/Overview-of-Aurora-Oct-2024.pdf) for more information.
See [Aurora Overview](https://www.alcf.anl.gov/sites/default/files/2024-11/Overview-of-Aurora-Oct-2024.pdf) for more information.
3 changes: 0 additions & 3 deletions docs/aurora/programming-models/raja-aurora.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/aurora/services/jupyterhub.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/aurora/workflows/deephyper.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/aurora/workflows/libensemble.md

This file was deleted.

@@ -1,7 +1,7 @@
# Compiling and Linking on Crux

## Overview
Crux has AMD processors on the login nodes (crux-login-01,02) and AMD processors on the compute nodes (see [Machine Overview](../hardware-overview/machine-overview.md) page). The login nodes can be used to compile software, create containers, and launch jobs. For larger, parallel builds, it will be beneficial to compile those directly on the compute nodes.
Crux has AMD processors on the login nodes (crux-login-01,02) and AMD processors on the compute nodes (see [Machine Overview](../machine-overview.md) page). The login nodes can be used to compile software, create containers, and launch jobs. For larger, parallel builds, it will be beneficial to compile those directly on the compute nodes.

To launch an interactive job and acquire a compute node for compiling, use:
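A minimal PBS sketch of such an interactive request, with placeholder queue and project names (the exact resource options required on Crux may differ):

```bash
# Placeholders: <queue_name> and <ProjectName>; adjust node count and walltime as needed.
qsub -I -l select=1 -l walltime=1:00:00 -q <queue_name> -A <ProjectName>
```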

@@ -41,4 +41,4 @@ To load new modules, use:

```bash
module load <module_name>
```
```
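A few related commands that may help when exploring the software stack (this assumes the standard Environment Modules/Lmod interface; `<module_name>` is a placeholder):

```bash
module avail                  # list modules available to load
module list                   # show currently loaded modules
module load <module_name>     # load a module
module unload <module_name>   # remove a previously loaded module
```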
4 changes: 2 additions & 2 deletions docs/crux/getting-started.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token. Once logged i

## Hardware Overview

An overview of the Crux system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
An overview of the Crux system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.

## Compiling Applications

@@ -73,4 +73,4 @@ export ftp_proxy="http://proxy.alcf.anl.gov:3128"
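A hedged sketch of the proxy variables this section configures, mirroring the `ftp_proxy` value shown above (the `http_proxy`/`https_proxy` lines are assumptions that follow the same pattern):

```bash
# Assumed sketch: route outbound HTTP/HTTPS/FTP traffic through the ALCF proxy.
export http_proxy="http://proxy.alcf.anl.gov:3128"
export https_proxy="http://proxy.alcf.anl.gov:3128"
export ftp_proxy="http://proxy.alcf.anl.gov:3128"
```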
Please direct all questions, requests, and feedback to [[email protected]](mailto:[email protected]).

---
---
---
4 changes: 2 additions & 2 deletions docs/polaris/getting-started.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token.

## Hardware Overview

An overview of the Polaris system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
An overview of the Polaris system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.

## Compiling Applications

@@ -52,4 +52,4 @@ export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,po

## Getting Assistance

Please direct all questions, requests, and feedback to [[email protected]](mailto:[email protected]).
Please direct all questions, requests, and feedback to [[email protected]](mailto:[email protected]).
Binary file removed docs/polaris/hardware-overview/files/.DS_Store
Binary file not shown.
Binary file removed docs/polaris/hardware-overview/files/Aries1.gif
Binary file not shown.
Binary file removed docs/polaris/hardware-overview/files/Aries2.gif
Binary file not shown.
@@ -1,4 +1,4 @@
# Polaris Machine Overview
# Polaris Machine Overview
Polaris is a 560-node HPE Apollo 6500 Gen 10+ based system. Each node has a single 2.8 GHz AMD EPYC Milan 7543P 32-core CPU with 512 GB of DDR4 RAM, four NVIDIA A100 GPUs connected via NVLink, a pair of local 1.6TB SSDs in RAID0 for user use, and a pair of Slingshot 11 network adapters. There are two nodes per chassis, seven chassis per rack, and 40 racks for a total of 560 nodes. More detailed specifications are as follows:

## Polaris Compute Nodes
@@ -10,7 +10,7 @@ Polaris is a 560-node HPE Apollo 6500 Gen 10+ based system. Each node has a sing
| GPUs | NVIDIA A100 | 4 | 2,240 |
| Local SSD | 1.6 TB | 2/3.2 TB | 1,120/1.8 PB |

Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 2: 8 memory channels rated at 204.8 GiB/s

## Polaris A100 GPU Information
@@ -39,13 +39,13 @@ Note 2: 8 memory channels rated at 204.8 GiB/s

### Legend:

**X** = Self
**SYS** = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
**NODE** = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
**PHB** = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
**PXB** = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
**PIX** = Connection traversing at most a single PCIe bridge
**NV#** = Connection traversing a bonded set of # NVLinks
**X** = Self
**SYS** = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
**NODE** = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
**PHB** = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
**PXB** = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
**PIX** = Connection traversing at most a single PCIe bridge
**NV#** = Connection traversing a bonded set of # NVLinks
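This legend matches the interconnect matrix reported by `nvidia-smi`; on a compute node it can be reproduced with:

```bash
# Print the GPU/NIC topology matrix whose legend is described above.
nvidia-smi topo -m
```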

Links to detailed NVIDIA A100 documentation:
- [NVIDIA A100 Tensor Core GPU Architecture](https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf)
@@ -64,12 +64,12 @@ All users share the same login nodes, so please be courteous and respectful of y
| GPUs (Note 3) | No GPUs | 0 | 0 |
| Local SSD | None | 0 | 0 |

Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 2: 8 memory channels rated at 204.8 GiB/s per socket
Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 2: 8 memory channels rated at 204.8 GiB/s per socket
Note 3: If your build requires the physical presence of a GPU, you will need to build on a compute node.

## Gateway Nodes
There are 50 gateway nodes. These nodes are not user-accessible but are used transparently for access to the storage systems. Each node has a single 200 Gbps HDR IB card for access to the storage area network. This gives a theoretical peak bandwidth of 1,250 GB/s (50 nodes × 200 Gbps = 10,000 Gbps, or 1,250 GB/s), which is approximately the aggregate bandwidth of the global file systems (1,300 GB/s).

## Storage
Polaris has access to the ALCF global file systems. Details on storage can be found [here](../../data-management/filesystem-and-storage/data-storage.md).
Polaris has access to the ALCF global file systems. Details on storage can be found [here](../data-management/filesystem-and-storage/data-storage.md).
@@ -1,7 +1,7 @@
# Compiling and Linking on Sophia

## Overview
Sophia has AMD processors on the login nodes (`sophia-login-01,02`) and AMD processors and NVIDIA A100 GPUs on the compute nodes (see [Machine Overview](../hardware-overview/machine-overview.md) page). The login nodes can be used to create containers and launch jobs.
Sophia has AMD processors on the login nodes (`sophia-login-01,02`) and AMD processors and NVIDIA A100 GPUs on the compute nodes (see [Machine Overview](../machine-overview.md) page). The login nodes can be used to create containers and launch jobs.

!!! warning inline end "Must compile on a compute node"

@@ -76,4 +76,4 @@ elif [ -f /etc/bash.bashrc ]
then
. /etc/bash.bashrc
fi
```
```
4 changes: 2 additions & 2 deletions docs/sophia/getting-started.md
@@ -10,7 +10,7 @@ Then, type in the password from your CRYPTOCard/MobilePASS+ token. Once logged i

## Hardware Overview

An overview of the Sophia system, including details on the compute node architecture, is available on the [Machine Overview](./hardware-overview/machine-overview.md) page.
An overview of the Sophia system, including details on the compute node architecture, is available on the [Machine Overview](./machine-overview.md) page.

## Compiling Applications

@@ -53,4 +53,4 @@ export ftp_proxy="http://proxy.alcf.anl.gov:3128"

Please direct all questions, requests, and feedback to [[email protected]](mailto:[email protected]).

---
---
2 changes: 1 addition & 1 deletion docs/sophia/queueing-and-running-jobs/running-jobs.md
@@ -16,7 +16,7 @@ There are three production queues you can target in your `qsub` command (`-q <qu
| `by-node` | 1 Node | 8 Nodes | 5 min | 12 hr |
| `bigmem` | 1 Node | 1 Node | 5 min | 12 hrs |

!!! note
!!! note

For all Sophia queues, `MaxQueued` will be 20 queued or running jobs (per project) and `MaxRunning` will be 5 concurrent jobs (per project).
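A hedged example of targeting one of these queues (the project name, script, and resource selection are placeholders; Sophia's exact `select` syntax may differ):

```bash
# Submit a 2-node, 1-hour job to the by-node queue; <ProjectName> and the script are placeholders.
qsub -q by-node -l select=2 -l walltime=01:00:00 -A <ProjectName> ./my_job_script.sh
```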
