@@ -1707,6 +1707,35 @@ \subsubsection{CUDA}
To stop profiling, call the \texttt{TracyCUDAStopProfiling(ctx)} macro.
\subsubsection{ROCm}
On Linux, if rocprofiler-sdk is installed, Tracy can automatically trace GPU dispatches and collect
performance counter values. If CMake can't find rocprofiler-sdk, set the CMake variable
\texttt{rocprofiler-sdk\_DIR} to point it at the correct module directory. To control which values
are collected, set the \texttt{TRACY\_ROCPROF\_COUNTERS} environment variable to a comma-separated
list of the desired counters. The results appear for each dispatch in the tooltip and zone detail
window. Results are summed across dimensions. You can get a list of the counters available for your
hardware with this command:
\begin{lstlisting}[language=sh]
rocprofv3 -L
\end{lstlisting}
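As an illustration, the configuration and capture steps described above might look as follows. This
is only a sketch: the \texttt{/opt/rocm} path, the \texttt{build} directory, the \texttt{my\_app}
binary, and the two counter names are assumptions, so substitute a path that exists on your system
and counters reported by \texttt{rocprofv3 -L} for your hardware.
\begin{lstlisting}[language=sh]
# Hypothetical location of the rocprofiler-sdk CMake package; adjust as needed.
cmake -B build -Drocprofiler-sdk_DIR=/opt/rocm/lib/cmake/rocprofiler-sdk
cmake --build build

# Example counter names only; pick ones listed by "rocprofv3 -L".
# my_app is a placeholder for your Tracy-instrumented application.
TRACY_ROCPROF_COUNTERS=SQ_WAVES,GRBM_GUI_ACTIVE ./build/my_app
\end{lstlisting}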
\subparagraph{Troubleshooting}
\begin{itemize}
\item If you are taking very long captures, you may see drift between the GPU and
CPU timelines. This may be mitigated by setting the CMake variable
\texttt{TRACY\_ROCPROF\_CALIBRATION}, which will refresh the time synchronization about every
second.
\item Network time synchronization can also contribute to the timeline drift. In that case,
disabling it reduces the drift and, unlike calibration, has no application performance cost.
\item On some GPUs, you will need to change the performance level to see non-zero results from
some counters. Use this command:
\begin{lstlisting}[language=sh]
sudo amd-smi set -g 0 -l stable_std
\end{lstlisting}
\end{itemize}
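The two drift mitigations above could be applied roughly as shown below. This is a sketch under
stated assumptions: it treats \texttt{TRACY\_ROCPROF\_CALIBRATION} as a boolean CMake option, uses a
placeholder \texttt{build} directory, and assumes a systemd-based distribution where
\texttt{timedatectl} controls network time synchronization.
\begin{lstlisting}[language=sh]
# Assumption: the variable is a boolean option, enabled with ON.
cmake -B build -DTRACY_ROCPROF_CALIBRATION=ON
cmake --build build

# Assumption: systemd manages time sync on this machine.
sudo timedatectl set-ntp false   # disable before the capture
# ... run the long capture here ...
sudo timedatectl set-ntp true    # restore afterwards
\end{lstlisting}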
\subsubsection{Multiple zones in one scope}
Putting more than one GPU zone macro in a single scope causes the same issue as with the \texttt{ZoneScoped} macros, described in section~\ref{multizone} (but this time the variable name is \texttt{\_\_\_tracy\_gpu\_zone}).