ATLAHS Simulator Toolchain

An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage

Warning

This repository is still under active development: the code is not yet stable, and the documentation is not yet finalized. We highly recommend waiting to use the toolchain until the paper is published and the documentation is mostly complete.

Overview


This repository contains the source code for ATLAHS, a network simulator toolchain for AI, HPC, and storage applications. It contains the following components:

  • GOAL (Group Operation Assembly Language) generators that trace AI, HPC, and storage applications and convert the traces into network workloads usable by network simulators
  • Various backends for simulating network workloads, including LogGOPSim, HTSim, and NS-3.
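To give a feel for what a GOAL workload looks like, here is a minimal, hypothetical sketch of a toy "generator". It does not read a trace. It simply emits a ring exchange in which every rank sends one message to its right-hand neighbour, receives one from its left-hand neighbour, and then runs a compute block that depends on the receive. The syntax (`num_ranks`, `rank N { ... }`, `send`/`recv`/`calc`, `requires`) is modelled on the textual GOAL format used by LogGOPSim. The function name `emit_ring_goal` and the chosen message size and compute value are illustrative assumptions. The exact format consumed by the ATLAHS backends may differ, and the real generators derive schedules from application traces.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy generator (not part of ATLAHS): print a GOAL-style
# schedule for a ring of NRANKS ranks exchanging BYTES-sized messages.
emit_ring_goal() {
  local nranks=$1 bytes=$2 r left right
  printf 'num_ranks %d\n\n' "$nranks"
  for ((r = 0; r < nranks; r++)); do
    right=$(( (r + 1) % nranks ))
    left=$(( (r - 1 + nranks) % nranks ))
    printf 'rank %d {\n' "$r"
    printf 'l1: send %db to %d tag 0\n' "$bytes" "$right"
    printf 'l2: recv %db from %d tag 0\n' "$bytes" "$left"
    # Fixed compute block; it may only start after the receive (l2) completes.
    printf 'l3: calc 100\n'
    printf 'l3 requires l2\n'
    printf '}\n\n'
  done
}

emit_ring_goal 4 1024
```

Because `l1` and `l2` have no dependency between them, a simulator may overlap the send and the receive. Only the computation is ordered after the receive.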

The paper describing this work is available on arXiv: https://arxiv.org/pdf/2505.08936. It has been accepted at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC25).

Along with the source code, we release all the traces used in the paper (both raw files and the converted GOAL traces) as the ATLAHS Trace Collection. The collection covers a wide range of AI and HPC applications and is still growing, and we encourage the community to contribute more traces to it.

Docker Environment

To make the results published in the paper easy to reproduce, we provide a Docker image that contains all the dependencies required to run the ATLAHS toolchain.

To build the Docker image, run the following command:

docker build -t atlahs .

To compile the components required to reproduce the results in the paper, run:

docker run --user "$(id -u):$(id -g)" -v "$(pwd)":/workspace atlahs build -r

This mounts the project directory to /workspace inside the container and invokes the build.py script in the scripts directory.
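Because the build and quick-test commands share the same `docker run` prefix, a small wrapper can save retyping it. The following is a hypothetical sketch, not a script shipped with ATLAHS. The name `atlahs_docker` and the `ATLAHS_DRY_RUN` variable are our own assumptions. It quotes the mount path so that project directories containing spaces still work, and it adds Docker's `--rm` flag, which the original commands do not use, so that finished containers are cleaned up. Any arguments (such as `build -r` or `run -q`) are forwarded unchanged to the image, exactly as in the commands in this README.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the `docker run` commands in this README.
# Runs the `atlahs` image as the current user with the current directory
# mounted at /workspace; extra arguments are passed through to the image.
# Set ATLAHS_DRY_RUN=1 to print the command instead of executing it.
atlahs_docker() {
  local cmd=(docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/workspace" atlahs "$@")
  if [[ "${ATLAHS_DRY_RUN:-0}" == 1 ]]; then
    printf '%q ' "${cmd[@]}"
    printf '\n'
  else
    "${cmd[@]}"
  fi
}

# Usage (from the project root):
#   atlahs_docker build -r   # compile the components used in the paper
#   atlahs_docker run -q     # run the quick test
```

The dry-run mode makes it easy to check the exact command line before running a long build.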

Running a quick test

To run a quick test, run the following command:

docker run --user "$(id -u):$(id -g)" -v "$(pwd)":/workspace atlahs run -q

This fetches a small subset of the ATLAHS traces from the SPCL storage server and tests the functionality of the ATLAHS toolchain: it converts the raw traces of AI applications (nsys-reports) and HPC applications (PMPI traces) into the GOAL format, and simulates the resulting workloads with different ATLAHS backends (e.g., LogGOPSim, htsim).
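After traces have been converted to GOAL, it can help to sanity-check a schedule before simulating it. The helper below is a hypothetical sketch. It assumes a textual, LogGOPSim-style GOAL file with lines such as `l1: send 8b to 1 tag 0`, and it does not handle binary GOAL files. This README does not say where the quick test writes its GOAL output or in which encoding, so the name `goal_summary` and its input are illustrative only. The helper counts the declared ranks and the `send`, `recv`, and `calc` operations.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: summarise a textual GOAL schedule.
# Reads the file given as $1, or standard input if no argument is given.
goal_summary() {
  awk '
    /^num_ranks[ \t]/                      { ranks = $2 }
    /^[ \t]*[A-Za-z0-9_]+:[ \t]*send[ \t]/ { sends++ }
    /^[ \t]*[A-Za-z0-9_]+:[ \t]*recv[ \t]/ { recvs++ }
    /^[ \t]*[A-Za-z0-9_]+:[ \t]*calc[ \t]/ { calcs++ }
    END { printf "ranks=%d sends=%d recvs=%d calcs=%d\n", ranks, sends, recvs, calcs }
  ' "${1:--}"
}
```

For a point-to-point-only schedule, the send and receive counts should normally match. A mismatch is a quick hint that a conversion step went wrong.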
