Releases: ROCm/aomp

AOMP Release 21.0-0

03 Apr 21:39

These are the release notes for AOMP 21.0-0. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-staging". This branch is found in a mirror of upstream LLVM found at https://github.com/ROCm/llvm-project. The amd-staging branch is constantly changing as it merges the upstream development trunk with its downstream development updates. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-staging at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (amdgpu-dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 21.0-0, the last LLVM trunk commit is 9cdab16da99ad9fdb823853fbc634008229e284f on March 31, 2025. The last amd-only commit is e9b040d02cd3f5e5dae032e7d15d934ea6486d18 on April 1, 2025. These commits form a frozen branch now called "aomp-21.0-0". See https://github.com/ROCm/llvm-project/tree/aomp-21.0-0.
The integrated ROCm components for this AOMP release were built with ROCM 6.3.3 sources.
This is the 1st AOMP release based on upstream LLVM 21 development.

Changes since AOMP 20.0-2:

  • In this release, the FORTRAN flang-classic compiler is replaced with the new LLVM compiler (flang-new). Flang-new is built using the LLVM 21 trunk plus changes in the amd-staging branch. In addition to improved performance, flang-new supports print and write statements in the target region to support user diagnostics. The existence of any print or write statement in the target region will trigger a service thread that could impact performance, even if the print or write statements are not executed.
  • The hipfort component built with flang-new has returned to aomp. Hipfort provides FORTRAN module interfaces to the HIP API and to many other hip math libraries. There are new examples in the examples directory to demonstrate hipfort.
  • Improved performance on min and max reductions using fmin and fmax functions to define the reduction.
  • Replacement of the amd-staging hostexec infrastructure with the upstream offload rpc mechanism.
  • A new infrastructure for executing host APIs in target regions called "Emissary APIs". Emissary APIs use the offload rpc mechanism to transparently execute functions called from a target region on the host. Emissary APIs exist for print, FORTRAN runtime, MPI, and HDF5. MPI and HDF5 are currently placeholders requiring more development to make them functional. The Emissary API for print includes printf, fprintf, and asan exception reporting. The Emissary API for the FORTRAN runtime supports print, write, stop, and abort FORTRAN statements.
  • In this release, all OpenMP toolchains (C, C++, and FORTRAN) use a tool called clang-linker-wrapper as the default. This is a single command generated for host and device linking. Previously a multi-step process was used by the LLVM command driver. This multi-step process is still available with the --opaque-offload-linker command line option. Since clang-linker-wrapper obscures the process of device linking, --opaque-offload-linker can be used to see the transformations from heterogeneous objects to fully linked device and host executable.
  • This release uses the sources from ROCM 6.3 components for non-compiler components. All llvm-project compiler components were built using the amd-staging branch with the above-mentioned commit hash.
  • In this release, we started a process to clean up the examples for the different programming models supported by the ROCm compiler. The new examples are 100% driven by Makefiles so that users can see the compiler commands and environment that they are run in. Since the examples are typically in a read-only installation directory, they can now be executed from an out-of-tree directory to avoid the need to copy them. For example, "make -f /usr/lib/aomp/examples/openmp/reduction/Makefile run" will build and run the example.
  • A significant number of changes to the AOMP build infrastructure were done to both add flang-new build and remove flang-classic build.
  • Merging non-upstream changes into the amd-staging branch now uses github pull requests. We no longer use gerrit for this purpose. Merging of github PRs still requires successful passing of psdb tests. Merging from upstream trunk is still possible and preferred.

Errata:

  • The hip/lib_device example currently fails to build with a link error.

AOMP Release 20.0-2

10 Feb 20:19

These are the release notes for AOMP 20.0-2. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-staging". This branch is found in a mirror of upstream LLVM found at https://github.com/ROCm/llvm-project. The amd-staging branch is constantly changing as it merges the upstream development trunk with its downstream development updates. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-staging at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (amdgpu-dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 20.0-2, the last LLVM trunk commit is c8c2574832ed2064996389e4259eaf0bea0fa7951 on January 29, 2025. The last amd-only commit is c273851a8de71cf3001ad8fdc5abcc829b591b45 on January 29, 2025. These commits form a frozen branch now called "aomp-20.0-2". See https://github.com/ROCm/llvm-project/tree/aomp-20.0-2.
The integrated ROCm components for this AOMP release were built with ROCM 6.3.2 sources.
This is the 3rd AOMP release based on upstream LLVM 20 development.

Changes since AOMP 20.0-1:

  • Added build of math rocmlibs (aomp-hip-libraries). Currently only the following architectures are supported: gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201
  • Added optional aomp-hip-libraries package. This contains libraries for rocBLAS, rocPRIM, rocSPARSE, rocSOLVER, and hipBLAS.
  • Added preproduction flang-new executable. Flang-classic is still default with a flang to flang-classic symbolic link.
  • Moved to ROCm 6.3.2 sources for non-compiler related repositories.

Errata:

  • flang-classic failures seen in fbabelstream and Nekbone.

AOMP Release 20.0-1

17 Dec 21:51

THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.

These are the release notes for AOMP 20.0-1. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-staging". This branch is found in a mirror of upstream LLVM found at https://github.com/ROCm/llvm-project. The amd-staging branch is constantly changing as it merges the upstream development trunk with its downstream development updates. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-staging at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (amdgpu-dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 20.0-1, the last LLVM trunk commit is 151901c762b724ef6ffe6f3db163475071e7b215 on December 11, 2024. The last amd-only commit is e82d86c7c81631754d1af5cb72ceef2385d215e3 on December 12, 2024. These commits form a frozen branch now called "aomp-20.0-1". See https://github.com/ROCm/llvm-project/tree/aomp-20.0-1.
The integrated ROCm components for this AOMP release were built with ROCM 6.3.0 sources.
This is the 2nd AOMP release based on upstream LLVM 20 development.

While Linux distros usually have the amdgpu kernel module, we strongly recommend using the ROCm 6.3 amdgpu-dkms and amdgpu-dkms-firmware packages, which resolve a long-standing SDMA firmware issue.

In this release of AOMP, we disabled the OpenMP workaround of the SDMA firmware issue. The OpenMP workaround for the SDMA issue was to not chain automatic asynchronous data transfers to the kernel completion signal. The workaround synchronously initiated data transfers after kernel completion was detected by the host CPU. This resulted in some loss of performance.
The environment variable LIBOMPTARGET_SYNC_COPY_BACK is the trigger to use the workaround. Before AOMP 20.0-1 it had a default value of true to force synchronous copy backs. In this release we set the default to false which will improve performance for kernels with lots of return maps. But if your machine does not have the ROCm 6.3 firmware, you should set LIBOMPTARGET_SYNC_COPY_BACK=true to avoid potential errors.
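
To make "return maps" concrete, here is a minimal sketch of a target region with two map(from: ...) arrays, which is the kind of kernel whose copy-back behavior LIBOMPTARGET_SYNC_COPY_BACK affects. The function name and its arguments are illustrative assumptions, not part of AOMP.

```c
/* Illustrative helper (hypothetical name): fills sq and cu with the
   squares and cubes of in[0..n-1]. Both output arrays use map(from: ...),
   so the runtime copies them back to the host after the kernel finishes. */
void squares_and_cubes(const double *in, double *sq, double *cu, int n) {
    #pragma omp target teams distribute parallel for \
        map(to : in[0:n]) map(from : sq[0:n], cu[0:n])
    for (int i = 0; i < n; ++i) {
        sq[i] = in[i] * in[i];
        cu[i] = in[i] * in[i] * in[i];
    }
}
```

On a machine without the ROCm 6.3 firmware, a program containing such a kernel would be launched with the workaround enabled, for example LIBOMPTARGET_SYNC_COPY_BACK=true ./app_exec, where app_exec stands for the application binary.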

Changes since AOMP 20.0-0:

  • Changed default LIBOMPTARGET_SYNC_COPY_BACK=false
  • Dropped support for CentOS 7/8/9, Ubuntu 20.04, SLES15-SP4
  • Added support for RHEL 8/9, Ubuntu 24.04, SLES15-SP5
  • Updated to ROCm 6.3 sources
  • Added new component, SPIRV-LLVM-Translator. This is initial support for spirv JIT offloading. This includes a spirv to LLVM IR translation tool installed in the compiler bin directory lib/llvm/bin/amd-llvm-spirv. Toolchain support for SPIRV is still in development.
  • Added a new release file showing the summary of relevant git commits since the last release. See llvm-project-20-0-1-gitlog-summary.txt
  • Upgraded cmake to 3.25.2
  • Changed the commands for OpenMP offload linking to use the clang-linker-wrapper command. The old method was a set of intermediate commands that passed files between various steps of the heterogeneous linking process. The default command line option before 20.0-1 was --opaque-offload-linker. The default is now --no-opaque-offload-linker. While both methods performed similar GPU linking, IR optimizations and backend, there were minor differences in the final offloading image that caused issues that have been resolved. One can still see the commands from the old method with the command line options "-v -save-temps --opaque-offload-linker".
  • Corrected the installation lib-debug directories to contain debug builds of various runtime libraries. The sources of all debug runtimes are also installed so that gdbtui will automatically find the sources.
  • Merged roct and rocr into a single aomp build COMPONENT.
  • Renamed flang-legacy binary to flang-classic as it is better known by the flang community. Yes, this will be deprecated in the future for the new llvm flang. Currently "flang" is a symbolic link to the flang-classic binary.

Errata:

  • Potential data corruption as a result of an SDMA issue when AOMP generated binaries are run without ROCm 6.3 amdgpu-dkms-firmware. Set LIBOMPTARGET_SYNC_COPY_BACK=true to avoid this problem with OpenMP.
  • THIS RELEASE CANNOT BE BUILT FROM SOURCE EXTERNALLY. This is because there is a new AMD repository that is not yet available. In the next release this repository will be made public and put in the aomp manifest for cloning to support source build of aomp.

AOMP Release 20.0-0

15 Oct 16:41

THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.

These are the release notes for AOMP 20.0-0. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-staging". This branch is found in a mirror of upstream LLVM found at https://github.com/ROCm/llvm-project. The amd-staging branch is constantly changing as it merges the upstream development trunk with its downstream development updates. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-staging at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 20.0-0, the last LLVM trunk commit is 7fa0d05a04056aac4365c69c4b515f613a43e454 on October 8, 2024. The last amd-only commit is 5809bc885c815fa281320094be6549458e15cf14 on October 10, 2024. These commits form a frozen branch now called "aomp-20.0-0". See https://github.com/ROCm/llvm-project/tree/aomp-20.0-0.

The integrated ROCm components for this AOMP release were built with ROCM 6.2.2 sources.

This is the 1st AOMP release based on upstream LLVM 20 development.

Changes since AOMP 19.0-3:

  • Switched to ROCm 6.2.2 sources. This introduced a new component called rocprofiler-register.
  • Move the install of llvm to lib/llvm, which is where ROCm installs llvm.
  • AOMP now creates and uses rocm.cfg, clang.cfg clang++.cfg, etc.
  • Add support for multiple devices (-md option) to gpurun utility.
  • AOMP example updates:
    • Use a common include file to set LLVM_INSTALL_DIR and LLVM_GPU_ARCH using amdgpu-arch and nvptx-arch.
    • Remove mygpu dependency from every example.
    • Create a new category stress for complex examples not in CI.
    • Build Kokkos with a make file instead of script.
  • Added build support for gfx90c, gfx1103, gfx1150, gfx1151, and gfx1152.
  • Add ROCm SMI and AMD SMI as AOMP components.

Errata for AOMP 20.0-0:

  • amdflang-new symbolic link should not exist as there is no flang-new binary.

AOMP Release 19.0-3

02 Aug 14:34

These are the release notes for AOMP 19.0-3. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-staging". This branch is found in a mirror of upstream LLVM found at https://github.com/ROCm/llvm-project. The amd-staging branch is constantly changing as it merges the upstream development trunk with its downstream development updates. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-staging at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 19.0-3, the last LLVM trunk commit is 40954d7f9bb38b2407fe48a524befc5216f13ccc on July 22, 2024. This was the last trunk commit before the trunk forked to LLVM-20. The last amd-only commit is baa883c3ad5d70e1f4da5b6f80f6d06c00b73c3a on July 22, 2024. These commits form a frozen branch now called "aomp-19.0-3". See https://github.com/ROCm/llvm-project/tree/aomp-19.0-3.

The integrated ROCm components for this AOMP release were built with ROCM 6.1.2 sources.

This is the 3rd AOMP release based on upstream LLVM 19 development. Since the LLVM trunk has moved to development of LLVM 20, the next AOMP release will be based on LLVM-20.

Changes since AOMP 19.0-2

  • Support for the atomic_default_mem_order clause of the requires directive was added.
  • OMPT no longer falls back into synchronous execution mode when profiler is attached.
  • OMPT now supports callbacks for omp_target_associate_ptr and omp_target_disassociate_ptr.
  • Xteam Reduction enabled by default at all opt levels.
  • Some HIP interoperability issues with tracking HIP memory allocations on MI200 were resolved.
  • Remove deprecated utility offload-arch. This was replaced with amdgpu-arch or nvptx-arch.

AOMP Release 19.0-2

26 Jun 21:24

These are the release notes for AOMP 19.0-2. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-staging". This branch is found in a mirror of upstream LLVM found at https://github.com/ROCm/llvm-project. The amd-staging branch is constantly changing as it merges the upstream development trunk with its downstream development updates. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-staging at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 19.0-2, the last trunk commit is 6012de2b4ec24826574fe9f2d74c7d2ff2b52f23 on June 20, 2024. The last amd-only commit is c3a455408b118b8c22f23c7a65d2b5dbf491ab56 on June 20, 2024. These commits form a frozen branch now called "aomp-19.0-2". See https://github.com/ROCm/llvm-project/tree/aomp-19.0-2.

The integrated ROCm components for this AOMP release were built with ROCM 6.1.2 sources.
This is the 2nd AOMP release based on LLVM 19 development.

AOMP 19.0-1 was tagged, but will not be released.

Changes since AOMP 19.0-0:

  • Significant runtime features to support zero-copy for CPU-GPU unified shared memory. See subsections below.
  • Merge of the LLVM upstream relocation of libomptarget into the high level "offload" directory. This establishes the long term objective of the LLVM community to unify offload support for different offloading programming models.
  • The integrated ROCm components (non-compiler) were built from ROCM 6.1.2 sources.
  • Significant enhancements to the gpurun utility including: support for multiple devices, heterogeneous devices, malloc control inherited from numactl -m -l options, and CPU core binding to same numa node as selected GPU. These changes preserve gpurun's ability to oversubscribe (run multiple processes per GPU) by segmenting a GPU's CUs to different processes. Known issue, to be fixed in 19.0-3: gpurun fails in TPX mode on MI300X.
  • Changes in runtime library locations unique to CPU target triple including fixes for lib64 in Red Hat package.
  • Support for fp16 and bfloat16 reductions
  • Removed long deprecated utilities mygpu, mymcpu, aompcc, aompExtractRegion, clang-ocl, and cloc.sh.

Errata for AOMP 19.0-2

  • gpurun fails in TPX mode for MI300X
  • LIBOMPTARGET_SYNC_COPY_BACK default is still true. This is to circumvent a long-standing SDMA problem where signal values appear incorrect to SDMA engines.
  • Failure in dynamic_module_load which impacts application termination that uses offloading in shared objects.

Implicit Zero-Copy behavior on MI300A

OpenMP provides a relaxed shared memory model. Map clauses provided in the source code indicate how data is used and copied to and from the GPU device for each target region. On GPUs that provide unified shared memory like the MI300A, these clauses are optional but provide portability to discrete memory GPUs. There is an OpenMP pragma called "requires unified_shared_memory" which tells the compiler and runtime that the code is NOT portable to discrete memory GPUs, and must be compiled and executed on GPUs such as the MI300A. The MI300A is one of several AMD GPUs that has a feature to disable/enable page migration between CPU and GPU called xnack. In this release of the compiler and runtime, we set the runtime behavior depending on the status of xnack and existence of the pragma "requires unified_shared_memory".

MI300A         | NO requires unified_shared_memory | requires unified_shared_memory
XNACK enabled  | Implicit Zero-Copy                | Zero-Copy
XNACK disabled | Copy                              | Runtime warning (*)

(*) The runtime warning when running an application using #pragma omp requires unified_shared_memory in XNACK disabled mode can be turned into a runtime error by setting environment variable OMPX_STRICT_SANITY_CHECKS to true (e.g., OMPX_STRICT_SANITY_CHECKS=true ./app_exec).
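
As an illustration of the right-hand column of the table, the sketch below declares requires unified_shared_memory and then uses a host pointer inside a target region with no map clause. The function name scale_in_place is an assumption made for this example.

```c
/* Declares that this translation unit depends on unified shared memory,
   so it is not portable to discrete memory GPUs. */
#pragma omp requires unified_shared_memory

/* Illustrative helper (hypothetical name): multiplies x[0..n-1] by factor.
   No map clause is given; the target region dereferences the host
   pointer directly, which is what the requires pragma permits. */
void scale_in_place(double *x, int n, double factor) {
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
        x[i] *= factor;
}
```

Per the table, running such a program on MI300A with XNACK enabled gives Zero-Copy behavior, while XNACK disabled produces the runtime warning, or an error when OMPX_STRICT_SANITY_CHECKS=true is set.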

Implicit Zero-Copy on MI200 and MI300X and any other discrete GPU:

  • On discrete memory GPUs, for applications not using #pragma omp requires unified_shared_memory, turn on implicit zero-copy behavior by running applications in XNACK enabled environment and setting to true the environment variable OMPX_APU_MAPS (e.g. HSA_XNACK=1 OMPX_APU_MAPS=1 ./app_exec)
  • All other configurations, for applications not using #pragma omp requires unified_shared_memory, will be run in copy behavior.

MI200, MI300X, etc.               | not unified_shared_memory | unified_shared_memory
XNACK enabled and OMPX_APU_MAPS=1 | Implicit Zero-Copy        | Zero-Copy
XNACK enabled                     | Copy                      | Zero-Copy
XNACK disabled                    | Copy                      | Runtime warning (*)

MI300A host memory pre-faulting in Zero-Copy modes

On MI300A, host memory TLB prefaulting applies when running in Implicit Zero-Copy and when using #pragma omp requires unified_shared_memory.

  • By default, for all memory copies with size greater than or equal to 1MB, the OpenMP runtime makes the copied host memory visible to the target device agent before calling the copy function.
  • The environment variable LIBOMPTARGET_APU_PREFAULT_MEMCOPY controls this behavior and it is set to true by default. Setting it to false will disable prefaulting for all memory copy sizes (e.g., disable prefaulting with LIBOMPTARGET_APU_PREFAULT_MEMCOPY=false ./app_exec)
  • The environment variable LIBOMPTARGET_APU_PREFAULT_MEMCOPY_SIZE controls the minimum size at which prefaulting is performed. It is currently set to 1MB, meaning that all memory copies of at least 1MB that are performed in a synchronous way will have the host memory prefaulted first. Changing this value moves the threshold (e.g., to prefault all memory copies larger than 1KB, run with LIBOMPTARGET_APU_PREFAULT_MEMCOPY_SIZE=1024 ./app_exec)
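
The interaction of the two variables can be summarized as a small decision rule. The function below is a hypothetical model written for this note, not code from the OpenMP runtime, and it assumes the threshold comparison is "greater than or equal to", as the default description states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the documented rule (not the runtime's code).
   prefault_enabled mirrors LIBOMPTARGET_APU_PREFAULT_MEMCOPY and
   min_bytes mirrors LIBOMPTARGET_APU_PREFAULT_MEMCOPY_SIZE.
   Assumption: a copy is prefaulted when its size is >= min_bytes. */
bool would_prefault(size_t copy_bytes, bool prefault_enabled, size_t min_bytes) {
    return prefault_enabled && copy_bytes >= min_bytes;
}
```

With the defaults (enabled, 1MB), a 1MB copy would be prefaulted and a copy one byte smaller would not. Disabling the first variable turns prefaulting off regardless of size.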

AOMP Release 19.0-0

21 Mar 15:20

These are the release notes for AOMP 19.0-0. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-staging". This branch is found in a mirror of upstream LLVM found at https://github.com/ROCm/llvm-project. The amd-staging branch is constantly changing as it merges the upstream development trunk with its downstream development updates. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-staging at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 19.0-0, the last trunk commit is 601e102bdb55e12a2f791e0d68fd6f81ffc21e21 on March 17, 2024. The last amd-only commit is a3c2cd57a6f99709d61d35d17527c84d1af0c780 on March 16, 2024. These commits form a frozen branch now called "aomp-19.0-0". See https://github.com/ROCm/llvm-project/tree/aomp-19.0-0.

The integrated ROCm components for this AOMP release were built with ROCM 6.0.2 sources.
This is the 1st AOMP release based on LLVM 19 development.

These are the changes since AOMP 18.0-1:

  • Now use ROCM 6.0.2 sources for non compiler components.
  • Default to not use multiple SDMA engines. This could be changed for testing by setting:
    LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true
  • Updates to gpurun utility
    • uses preset HSA_CU_MASK and ROCR_VISIBLE_DEVICES.
  • OpenMP lit testing pre-install, 100% pass rate.
  • [Perf] Implement the loop directive as used in the metadirective for spec.
  • Support for testing QMCpack NiO performance.
  • ROCgdb support for gfx1103.
  • Circumvented driver timing issue with complex signal chains. Post kernel copies occasionally started before kernel completion. The temporary fix forces the wait for kernel completion before scheduling post-kernel copies. For testing purposes, this behavior could be reverted by setting LIBOMPTARGET_SYNC_COPY_BACK=false

Fix regressions:
#616 llvm-addr2line does not work on AOMP-generated binary.

Known Failures:

  • Smoke
    • targetid_multi_image
    • get_mapped_ptr

AOMP Release 18.0-1

15 Jan 21:11

These are the release notes for AOMP 18.0-1. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 18.0-1, the last trunk commit is 5f71aa9270c3d680babfbc6e766773d113c2a79a on January 9, 2024. The last amd-only commit is 71a82c97d882bfece529db41130ef7aaf9696a6e on January 9, 2024. These commits form a frozen branch now called "aomp-18.0-1". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-18.0-1.

The integrated ROCm components for this AOMP release were built with ROCM 6.0 sources.
This is the 2nd AOMP release based on LLVM 18 development.

These are the changes since AOMP 18.0-0:

  • Move to ROCm 6.0 sources.

  • Python 3.8 minimum required for building of ompd.

  • Change the default optimization level to -O2 if user does not specify any -O option on the command line. Before 18.0-1, the default was -O0 if none was specified.

  • In this release we are requesting users to install the amdgpu kernel driver found in the ROCm 6.0 amdgpu-dkms package. Nothing prevents the use of an older driver, but for support with current and new problems, we ask users to upgrade to this kernel driver. Run this command to test that your amdgpu kernel driver is on the correct version "modinfo -F version amdgpu". That command should return the string "6.3.6".

  • Remove dependency on ROCm libraries for rocgdb. Reminder: A goal of aomp is to be isolated from ROCm stack with the exception of rocm-dkms which installs the correct kernel driver.

  • The following issues were resolved in this release:
    • #601 AOMP 17.0.3 crashes at compile time with conjunction in if clause

  • While the source build of aomp supports ASAN (address sanitizer), the release build does not due to complexities in packaging asan. Therefore, the aomp release does not support ASAN.

  • This release has many build enhancements meant for developers that build aomp from source. These changes have little impact on users of AOMP.

    • The build scripts for this release now build openmp host and device libraries using LLVM_ENABLE_RUNTIMES=openmp with the COMPONENT build_project.sh. This was formerly done in the COMPONENT build_openmp.sh. This change allows lit testing of openmp. Extra libraries found in lib-debug and lib-perf directories are still built with the COMPONENT build_openmp.sh. ROCm-device-libs are also built as an external llvm project, which was formerly done by build_libdevice.sh.
    • Developers can now use ccache to speed rebuild process. The use of ccache is the default. To turn this off set AOMP_USE_CCACHE=0.
    • The build scripts have added a hidden directory in the installation called .aomp_component_stats showing the files added to the installation by each build COMPONENT. The current list of build COMPONENTs for AOMP are: prereq project roct rocr openmp extras comgr rocminfo flang-legacy pgmath flang flang_runtime hipcc hipamd rocdbgapi rocgdb roctracer rocprofiler
    • Changes to build_supp.sh, which builds prerequisite and supplemental components needed for the build and/or testing, include: change to version 6.0 of hsa-amd-aqlprofile, update wget location of silo tarball, change to version 6.0.x for rocmsmilib.
    • Build optimizations to reduce the size of the installation. Previously release binaries had llvm libraries statically linked in. We are now linking in the shared library build of llvm, and this reduces overall size by ~ 50%.
    • Changes to the build of libomptarget DeviceRTL to be code object version agnostic. These changes reduce the number of installed files to one per architecture for the opaque-linker plus the fat archive for the linker wrapper, and also improve build time for the libraries.
  • The following enhancements were made to the AOMP test infrastructure:

    • Support for openmp lit testing.
    • New testing of UMT.
    • babelstream updates to support USM.
    • babelstream updates to support FORTRAN babelstream.

AOMP Release 18.0-0

12 Sep 19:32

These are the release notes for AOMP 18.0-0. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental while under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and the use of RPATH for runtime libraries.

For AOMP 18.0-0, the last trunk commit is c3b979e6512b00a5bd9c3e0d4ed986cf500630 on Sept 8, 2023. The last amd-only commit is def7057717b5098f6a9f773fc6e7b2a7f59cdd50 on Sept 11, 2023. These commits form a frozen branch now called "aomp-18.0-0". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-18.0-0.

The integrated ROCm components for this AOMP release were built with ROCM 5.6.1 sources.
This is the 1st AOMP release based on LLVM 18 development.

The changes from 17.0-3 to 18.0-0 include:

  • New driver default (opaque offload linker)
    • This driver uses clang-offload-packager to create and extract heterogeneous objects.
    • For amdgpu, the final link phase steps through a series of commands instead of making a single call to clang-linker-wrapper. clang-linker-wrapper obscures the process of linking and embedding offload and host objects. To use clang-linker-wrapper, use command line option --no-opaque-offload-linker.
    • Fix support for multi-arch.
    • Optimizations to remove initial hostexec malloc.
  • Zero copy support for MI300A.
  • Fixed data_share2 smoke test regression.
  • Fixed an intermittent schedule clause failure with the new DeviceRTL.
  • Support HIP bundles.
  • Upstream convergence (3490 lines removed)
    • Remove old plugin code.
    • Remove the hostRPC code.
    • DeviceRTL cleanup - Synchronized threads
  • Set default OpenMP to 5.1.
  • Restore safe buffer usage warnings for MIOpen GTest.
  • Fix build to use LLVM-project mono-repo components, ROCm devicelibs and comgr.

Errata:

  • Smoke tests flang-272343-3 and flang-299043 fail with segmentation faults. Both have PARALLEL DO with ENTER MAP and EXIT MAP.
  • fprintf fails intermittently (~15%) when writing to an open file descriptor. There are no problems with fprintf to stderr.
  • The non-default option --no-opaque-offload-linker often fails because of problems with clang-linker-wrapper.

AOMP Release 17.0-3

17 Jul 23:39

These are the release notes for AOMP 17.0-3. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called "amd-stg-open". This branch is found in a mirror of upstream LLVM at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges the upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build because it does not use or require ROCm, with the exception of the kernel module (dkms) and libdrm, which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and by the use of RPATH for runtime libraries.

For AOMP 17.0-3, the last trunk commit is ec6b40ab9b577e6e9bf000ccd19d85a9753b6ca8 on July 13, 2023. The last amd-only commit is f959ea5d8d1e5aef4b6d06727a9698316d3d33cd on July 14, 2023. These commits form a frozen branch now called "aomp-17.0-3". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-3.

The integrated ROCm components for this AOMP release were built with ROCM 5.6.0 sources.
This is the 4th AOMP release based on LLVM 17 development.
The changes from 17.0-2 to 17.0-3 include:

  • Non-compiler components are built with ROCm 5.6.0 sources
  • Support code object version 5. The libomptarget device library is now generated for both code object version 4 and code object version 5.
  • flang is no longer a symbolic link to clang, because the clang driver support for flang is going away. A new driver binary called flang-legacy replaces the link. It uses a frozen set of driver support from ROCm 5.6, now found in the flang repository.
  • Enabled Big Jump Loop by default.
  • Improved target teams loop transform.
  • Implemented dynamic LDS accesses from non-kernel functions.
  • Performance improvements for small kernels via lazy HSA queue creation and tracking of busy queues.
  • Restored GPU_MAX_HW_QUEUES in AMDGPU nextgen plugin.
  • Extended environment variable ompx_apu_maps to MI200.
  • Added --archive to the clang-offload-packager which repackages the extracted files into a new static library. This allows a fat binary static library to become a static library for a single architecture.
  • Disabled PIE in llvm until build issues in centos and sles are resolved.

Errata:

  • A bug in the hip 5.6.0 sources causes programs to crash when using code object v5 with -O0.
  • flang compilations require -fPIC (a fix is needed in flang-legacy for 17.0-4).
  • Smoke test failures:
    • fprintf (non-deterministic)
    • complex_reduction (non-deterministic)
    • schedule (non-deterministic)
    • flang-274983
    • flang-274983-2
    • xteamr