torch.compile caching is slow and incomplete, causing long warm-up times

6/10 Medium

Several gaps in PyTorch's compilation caching pipeline add significant overhead even on warm-cache runs: Triton cache artifacts load slowly because the Triton code is still re-parsed before the cache lookup, remote caches issue many small network requests (rather than one batched request) when a model has many small graphs, and the AOTAutograd cache is not yet fully rolled out.
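To make the remote-cache point concrete, here is a minimal sketch of why many small graphs hurt: one lookup per graph costs one round trip each, while a batched lookup costs a single round trip for the same keys. `FakeRemoteCache`, `get`, `get_many`, and the `graph_N` key names are hypothetical stand-ins for illustration only. They are not PyTorch's remote cache API, and the round-trip counter only models the network cost.

```python
from typing import Dict, Iterable, List, Optional


class FakeRemoteCache:
    """Hypothetical in-memory stand-in for a remote cache (not a PyTorch API).

    It counts simulated network round trips so the two access patterns
    can be compared.
    """

    def __init__(self, store: Dict[str, bytes]) -> None:
        self._store = dict(store)
        self.round_trips = 0

    def get(self, key: str) -> Optional[bytes]:
        # One simulated round trip per key.
        self.round_trips += 1
        return self._store.get(key)

    def get_many(self, keys: Iterable[str]) -> Dict[str, Optional[bytes]]:
        # One simulated round trip for the whole batch of keys.
        self.round_trips += 1
        return {key: self._store.get(key) for key in keys}


def load_per_graph(cache: FakeRemoteCache, keys: List[str]) -> Dict[str, Optional[bytes]]:
    """Look up each graph's artifact separately (the pattern described above)."""
    return {key: cache.get(key) for key in keys}


def load_batched(cache: FakeRemoteCache, keys: List[str]) -> Dict[str, Optional[bytes]]:
    """Look up all graph artifacts in a single batched request."""
    return cache.get_many(keys)
```

With N small graphs, `load_per_graph` pays N simulated round trips and `load_batched` pays one, while both return the same hits and misses. In a real deployment the per-request latency, not the payload size, would likely dominate for small artifacts.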

PyTorch torch.compile Triton

Sources

New Years resolutions for PyTorch in 2025

Collection History

Query: “What are the most common pain points with PyTorch for developers in 2025?” (4/4/2026)

loading Triton cache artifacts takes a long time because we still re-parse the Triton code before doing a cache lookup... if you have a lot of small graphs, remote cache ends up having to do lots of small network requests, instead of one batched network request... AOTAutograd cache is not fully rolled out yet.

Created: 4/4/2026 | Updated: 4/4/2026