Add a new tutorial for compile time caching #3277
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3277
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure. As of commit 8f5fd59 with merge base a543d05, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Inductor Cache Settings
----------------------------
Most of these caches are in-memory, only used within the same process, and are transparent to the user. An exception is caches that store compiled FX graphs (FXGraphCache, AOTAutogradCache). These caches allow Inductor to avoid recompilation across process boundaries when it encounters the same graph with the same Tensor input shapes (and the same configuration). The default implementation stores compiled artifacts in the system temp directory. An optional feature also supports sharing those artifacts within a cluster by storing them in a Redis database.
Please put all the names of methods/APIs in double backticks:
Suggested change:
Most of these caches are in-memory, only used within the same process, and are transparent to the user. An exception is caches that store compiled FX graphs (``FXGraphCache``, ``AOTAutogradCache``). These caches allow Inductor to avoid recompilation across process boundaries when it encounters the same graph with the same Tensor input shapes (and the same configuration). The default implementation stores compiled artifacts in the system temp directory. An optional feature also supports sharing those artifacts within a cluster by storing them in a Redis database.
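To make the cross-process behavior above concrete, here is a minimal sketch (not part of the tutorial under review; the function and shape are illustrative): running the same script in two separate processes lets the second process reuse the on-disk FX graph cache instead of recompiling, provided the input shapes and configuration match.

.. code-block:: python

    import torch

    @torch.compile
    def f(x):
        return x.sin() + x.cos()

    # First process: compiles and populates the on-disk caches.
    # A second process running the same script can reuse the cached
    # compiled graph rather than recompiling from scratch.
    f(torch.randn(8, 8))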
* ``torch.compiler.save_cache_artifacts()``
* ``torch.compiler.load_cache_artifacts()``
The intented use case is after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()`` which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to prepopulate the ``torch.compile`` caches in order to jump-start their cache.
Suggested change:
The intended use case is after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()`` which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to prepopulate the ``torch.compile`` caches in order to jump-start their cache.
It is important to note that caching validates that the cache artifacts are used with the same PyTorch and Triton versions, as well as the same GPU when the device is set to CUDA.
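For example, when storing artifacts externally it can help to key them by the environment they were produced in; the fingerprint below is purely illustrative bookkeeping, not part of the caching API:

.. code-block:: python

    import torch

    # Illustrative only: record the environment alongside saved artifacts,
    # since the cache validates PyTorch/Triton versions and the GPU model.
    env_key = (
        torch.__version__,
        torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu",
    )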
``torch.compile`` end-to-end caching (a.k.a. ``Mega-Cache``)
Suggested change:
``torch.compile`` end-to-end caching
End-to-end caching, from here onwards referred to as Mega-Cache, is the ideal solution for users looking for a portable caching solution that can be stored in a database and can later be fetched, possibly on a separate machine.
``Mega-Cache`` provides two compiler APIs
Suggested change:
``Mega-Cache`` provides two compiler APIs:
The intended use case is after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()`` which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to pre-populate the ``torch.compile`` caches in order to jump-start their cache.
An example to this is as follows. First, compile and save the cache artifacts.
Suggested change:
Consider the following example. First, compile and save the cache artifacts:
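The tutorial's code block is elided in this view; a minimal sketch of this first step might look like the following, assuming a CUDA device and that ``save_cache_artifacts()`` returns a ``(bytes, cache_info)`` pair (the function and shapes are illustrative):

.. code-block:: python

    import torch

    @torch.compile
    def fn(x, y):
        return x.sin() @ y

    a = torch.rand(100, 100, device="cuda")
    b = torch.rand(100, 100, device="cuda")

    # Compile and execute once so the caches are populated.
    fn(a, b)

    # Serialize everything torch.compile cached in this process.
    artifacts = torch.compiler.save_cache_artifacts()
    assert artifacts is not None
    artifact_bytes, cache_info = artifacts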
# Now, potentially store these artifacts in a database
Later, the user can jump-start their cache by the following.
Suggested change:
Later, you can jump-start the cache by the following:
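The corresponding code block is also elided here; a sketch, assuming ``artifact_bytes`` has been fetched back from wherever it was stored:

.. code-block:: python

    # Potentially on a different machine: pre-populate the torch.compile
    # caches before the first compilation.
    torch.compiler.load_cache_artifacts(artifact_bytes)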
TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This setting enables the remote FX graph cache feature. The current implementation uses Redis. ``1`` enables caching, and any other value disables it. The following environment variables configure the host and port of the Redis server:
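As a sketch, enabling this cache could look like the following, using the ``TORCHINDUCTOR_REDIS_HOST`` and ``TORCHINDUCTOR_REDIS_PORT`` variables listed later in this section (the host value is a placeholder):

.. code-block:: python

    import os

    # Must be set before the first torch.compile call in the process.
    os.environ["TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE"] = "1"
    os.environ["TORCHINDUCTOR_REDIS_HOST"] = "redis.example.com"  # placeholder
    os.environ["TORCHINDUCTOR_REDIS_PORT"] = "6379"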
The above described MegaCache is also compromised of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for ``TorchDynamo``, ``TorchInductor``, and ``Triton``. These caches are as following.
Suggested change:
The aforementioned ``MegaCache`` is composed of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for ``TorchDynamo``, ``TorchInductor``, and ``Triton``. These caches include:
TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Like ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE``, this setting enables the remote ``AOTAutogradCache`` feature. The current implementation uses Redis. ``1`` enables caching, and any other value disables it. The following environment variables configure the host and port of the ``Redis`` server:
Suggested change:
Similar to ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE``, this setting enables the remote ``AOTAutogradCache`` feature. The current implementation uses Redis. Setting it to ``1`` enables caching, while any other value disables it. The following environment variables are used to configure the host and port of the Redis server:
``TORCHINDUCTOR_REDIS_HOST`` (defaults to ``localhost``)
Suggested change:
* ``TORCHINDUCTOR_REDIS_HOST`` (defaults to ``localhost``)
``TORCHINDUCTOR_REDIS_PORT`` (defaults to ``6379``)
Suggested change:
* ``TORCHINDUCTOR_REDIS_PORT`` (defaults to ``6379``)
`TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE`` depends on ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE`` to be enabled to work. The same Redis server can store both AOTAutograd and FXGraph cache results.
Suggested change:
``TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE`` requires ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE`` to be enabled in order to function. The same Redis server can be used to store both AOTAutograd and FXGraph cache results.
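A sketch of enabling the AOTAutograd remote cache, which, per the dependency noted above, only takes effect when the FX graph remote cache is also enabled:

.. code-block:: python

    import os

    # AOTAutograd remote caching depends on the FX graph remote cache,
    # so the two settings are enabled together.
    os.environ["TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE"] = "1"
    os.environ["TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE"] = "1"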
TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This setting enables a remote cache for ``TorchInductor``’s autotuner. As with the remote FX graph cache, the current implementation uses Redis. ``1`` enables caching, and any other value disables it. The same host / port environment variables listed above apply to this cache.
Suggested change:
This setting enables a remote cache for ``TorchInductor``’s autotuner. Similar to the remote FX graph cache, the current implementation uses Redis. Setting it to ``1`` enables caching, while any other value disables it. The same host and port environment variables mentioned above apply to this cache.
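Similarly, a sketch of enabling the autotune remote cache; it reuses the same ``TORCHINDUCTOR_REDIS_HOST`` and ``TORCHINDUCTOR_REDIS_PORT`` settings described above:

.. code-block:: python

    import os

    # Enable the Redis-backed autotuning cache; host/port come from the
    # same TORCHINDUCTOR_REDIS_* variables as the other remote caches.
    os.environ["TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE"] = "1"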
The previous tutorial was not fleshed out and primarily talked about configurations.
cc @williamwen42 @msaroufim @anijain2305