
Add a new tutorial for compile time caching #3277

Open

wants to merge 6 commits into main

Conversation

oulgen (Contributor) commented Feb 26, 2025

The previous tutorial was not fleshed out and primarily talked about configurations.

cc @williamwen42 @msaroufim @anijain2305

pytorch-bot bot commented Feb 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3277

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 8f5fd59 with merge base a543d05:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@oulgen marked this pull request as draft February 26, 2025 21:37
@svekars added the torch.compile (Torch compile and other relevant tutorials), new tutorial, and skip-link-check (Will allow you to skip linkcheck on a PR. Should only be used when a link can't be fixed.) labels Feb 26, 2025
Inductor Cache Settings
----------------------------

Most of these caches are in-memory, only used within the same process, and are transparent to the user. An exception is caches that store compiled FX graphs (FXGraphCache, AOTAutogradCache). These caches allow Inductor to avoid recompilation across process boundaries when it encounters the same graph with the same Tensor input shapes (and the same configuration). The default implementation stores compiled artifacts in the system temp directory. An optional feature also supports sharing those artifacts within a cluster by storing them in a Redis database.
Contributor

Please put all the names of methods/APIs in double backticks:

Suggested change
Most of these caches are in-memory, only used within the same process, and are transparent to the user. An exception is caches that store compiled FX graphs (FXGraphCache, AOTAutogradCache). These caches allow Inductor to avoid recompilation across process boundaries when it encounters the same graph with the same Tensor input shapes (and the same configuration). The default implementation stores compiled artifacts in the system temp directory. An optional feature also supports sharing those artifacts within a cluster by storing them in a Redis database.
Most of these caches are in-memory, only used within the same process, and are transparent to the user. An exception is caches that store compiled FX graphs (``FXGraphCache``, ``AOTAutogradCache``). These caches allow Inductor to avoid recompilation across process boundaries when it encounters the same graph with the same Tensor input shapes (and the same configuration). The default implementation stores compiled artifacts in the system temp directory. An optional feature also supports sharing those artifacts within a cluster by storing them in a Redis database.
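As a concrete illustration of the local on-disk cache, the sketch below redirects Inductor's cache directory away from the system temp location. This assumes the ``TORCHINDUCTOR_CACHE_DIR`` environment variable, which is consulted before the first compilation in the process; the path is illustrative.

import os

# Assumption: TORCHINDUCTOR_CACHE_DIR overrides the default temp-directory
# location of Inductor's on-disk caches; set it before compiling anything.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/path/to/shared/inductor_cache"

import torch

@torch.compile
def fn(x):
    return x.sin() + x.cos()

# The first run compiles and populates the cache; a later process with the
# same setting, graph, and input shapes can reuse the stored artifacts.
fn(torch.randn(16))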

* ``torch.compiler.save_cache_artifacts()``
* ``torch.compiler.load_cache_artifacts()``

The intented use case is after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()`` which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to prepopulate the ``torch.compile`` caches in order to jump-start their cache.
Contributor

Suggested change
The intented use case is after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()`` which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to prepopulate the ``torch.compile`` caches in order to jump-start their cache.
The intended use case is after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()`` which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to prepopulate the ``torch.compile`` caches in order to jump-start their cache.

@oulgen marked this pull request as ready for review February 26, 2025 21:59

It is important to note that caching validates that the cache artifacts are used with the same PyTorch and Triton versions, as well as the same GPU when the device is set to ``cuda``.

``torch.compile`` end-to-end caching (a.k.a. ``Mega-Cache``)
Contributor

Suggested change
``torch.compile`` end-to-end caching (a.k.a. ``Mega-Cache``)
``torch.compile`` end-to-end caching


End-to-end caching, from here onwards referred to as ``Mega-Cache``, is the ideal solution for users looking for a portable caching solution that can be stored in a database and later fetched, possibly on a separate machine.

``Mega-Cache`` provides two compiler APIs
Contributor

Suggested change
``Mega-Cache`` provides two compiler APIs
``Mega-Cache`` provides two compiler APIs:


The intended use case is after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()`` which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to pre-populate the ``torch.compile`` caches in order to jump-start their cache.

An example to this is as follows. First, compile and save the cache artifacts.
Contributor

Suggested change
An example to this is as follows. First, compile and save the cache artifacts.
Consider the following example. First, compile and save the cache artifacts:
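A minimal sketch of that first step follows. The toy function is illustrative; ``torch.compiler.save_cache_artifacts()`` is assumed to return the portable artifact bytes together with a cache-info summary, or ``None`` when there is nothing to save.

import torch

@torch.compile
def fn(x, y):
    return x.sin() @ y

# Compile and run once so the torch.compile caches are populated.
fn(torch.randn(8, 8), torch.randn(8, 8))

# Serialize everything torch.compile cached in this process.
artifacts = torch.compiler.save_cache_artifacts()
assert artifacts is not None
artifact_bytes, cache_info = artifacts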


# Now, potentially store these artifacts in a database

Later, the user can jump-start their cache by the following.
Contributor

Suggested change
Later, the user can jump-start their cache by the following.
Later, you can jump-start the cache by the following:
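A matching sketch of the load step, assuming ``artifact_bytes`` has been fetched back (for example from that database) on the new machine:

import torch

# artifact_bytes is the byte blob produced by torch.compiler.save_cache_artifacts()
# earlier, possibly on a different machine with the same PyTorch, Triton,
# and GPU setup.
torch.compiler.load_cache_artifacts(artifact_bytes)

# Subsequent torch.compile calls on the same graphs now start from warm caches.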

TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This setting enables the remote FX graph cache feature. The current implementation uses Redis. ``1`` enables caching, and any other value disables it. The following environment variables configure the host and port of the Redis server:
The above described MegaCache is also compromised of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for ``TorchDynamo``, ``TorchInductor``, and ``Triton``. These caches are as following.
Contributor

Suggested change
The above described MegaCache is also compromised of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for ``TorchDynamo``, ``TorchInductor``, and ``Triton``. These caches are as following.
The aforementioned ``MegaCache`` is composed of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for ``TorchDynamo``, ``TorchInductor``, and ``Triton``. These caches include:


TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Like ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE``, this setting enables the remote ``AOTAutogradCache`` feature. The current implementation uses Redis. ``1`` enables caching, and any other value disables it. The following environment variables configure the host and port of the ``Redis`` server:
Contributor

Suggested change
Like ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE``, this setting enables the remote ``AOTAutogradCache`` feature. The current implementation uses Redis. ``1`` enables caching, and any other value disables it. The following environment variables configure the host and port of the ``Redis`` server:
Similar to ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE``, this setting enables the remote ``AOTAutogradCache`` feature. The current implementation uses Redis. Setting it to ``1`` enables caching, while any other value disables it. The following environment variables are used to configure the host and port of the Redis server:

``TORCHINDUCTOR_REDIS_HOST`` (defaults to ``localhost``)
Contributor

Suggested change
``TORCHINDUCTOR_REDIS_HOST`` (defaults to ``localhost``)
* ``TORCHINDUCTOR_REDIS_HOST`` (defaults to ``localhost``)

``TORCHINDUCTOR_REDIS_PORT`` (defaults to ``6379``)
Contributor

Suggested change
``TORCHINDUCTOR_REDIS_PORT`` (defaults to ``6379``)
* ``TORCHINDUCTOR_REDIS_PORT`` (defaults to ``6379``)


`TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE`` depends on ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE`` to be enabled to work. The same Redis server can store both AOTAutograd and FXGraph cache results.
Contributor

Suggested change
`TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE`` depends on ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE`` to be enabled to work. The same Redis server can store both AOTAutograd and FXGraph cache results.
``TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE`` requires ``TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE`` to be enabled in order to function. The same Redis server can be used to store both AOTAutograd and FXGraph cache results.
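Putting the pieces above together, a minimal sketch that turns on both remote caches against a shared Redis instance. The host value is illustrative, and the variables are assumed to be read when the process first compiles.

import os

# Enable the remote FX graph cache and, on top of it, the remote AOTAutograd cache.
os.environ["TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE"] = "1"
os.environ["TORCHINDUCTOR_AUTOGRAD_REMOTE_CACHE"] = "1"

# Point both caches at the shared Redis server (illustrative host).
os.environ["TORCHINDUCTOR_REDIS_HOST"] = "redis.example.internal"
os.environ["TORCHINDUCTOR_REDIS_PORT"] = "6379"

import torch  # compile as usual; cache lookups now go through Redis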


TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This setting enables a remote cache for ``TorchInductor``’s autotuner. As with the remote FX graph cache, the current implementation uses Redis. ``1`` enables caching, and any other value disables it. The same host / port environment variables listed above apply to this cache.
Contributor

Suggested change
This setting enables a remote cache for ``TorchInductor``’s autotuner. As with the remote FX graph cache, the current implementation uses Redis. ``1`` enables caching, and any other value disables it. The same host / port environment variables listed above apply to this cache.
This setting enables a remote cache for ``TorchInductor``’s autotuner. Similar to the remote FX graph cache, the current implementation uses Redis. Setting it to ``1`` enables caching, while any other value disables it. The same host and port environment variables mentioned above apply to this cache.
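As a sketch, enabling the autotuning cache looks just like the other remote caches and is assumed to reuse the same Redis host and port settings shown above.

import os

# Assumption: read before the first compilation, like the other remote cache settings.
os.environ["TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE"] = "1"
# Reuses TORCHINDUCTOR_REDIS_HOST / TORCHINDUCTOR_REDIS_PORT configured earlier.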

Labels
cla signed, new tutorial, skip-link-check, torch.compile

Projects
None yet

4 participants