torch.compile caching is slow and incomplete, causing long warm-up times

6/10 Medium

Several gaps in PyTorch's compilation caching pipeline add significant overhead even on warm-cache runs: Triton cache artifacts load slowly because the Triton code is re-parsed before the cache lookup, remote caches issue many small network requests (rather than one batched request) when a model has many small graphs, and the AOTAutograd cache is not yet fully rolled out.
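
As a partial mitigation, the local caches can at least be enabled explicitly and pointed at a persistent directory. The following is a minimal sketch, assuming the environment variable names documented in PyTorch's compile-time caching tutorial apply to the installed version. The helper name `compile_cache_env` and the directory path are hypothetical. It does not address the Triton re-parsing or the unbatched remote requests.

```python
import os


def compile_cache_env(cache_dir: str) -> dict[str, str]:
    """Build environment settings that enable the local torch.compile caches.

    Variable names are taken from PyTorch's caching tutorial; whether each
    one is honored depends on the installed PyTorch version (assumption).
    """
    return {
        # Root directory for Inductor's on-disk caches.
        "TORCHINDUCTOR_CACHE_DIR": cache_dir,
        # Local FX graph cache.
        "TORCHINDUCTOR_FX_GRAPH_CACHE": "1",
        # Local AOTAutograd cache (not yet fully rolled out, per the report).
        "TORCHINDUCTOR_AUTOGRAD_CACHE": "1",
    }


# Set these before importing torch so the settings are picked up.
# The path below is a placeholder.
os.environ.update(compile_cache_env("/var/cache/torch_compile"))
```

Keeping the cache directory on storage that survives between jobs is what makes a later run a warm-cache run at all.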

Category: performance
Workaround: partial
Stage: build
Freshness: persistent
Scope: single_lib
Upstream: open
Recurring: Yes
Buyer Type: team
Maintainer: active

Sources

Collection History

Query: “What are the most common pain points with PyTorch for developers in 2025?” (4/4/2026)

loading Triton cache artifacts takes a long time because we still re-parse the Triton code before doing a cache lookup... if you have a lot of small graphs, remote cache ends up having to do lots of small network requests, instead of one batched network request... AOTAutograd cache is not fully rolled out yet.
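
The remote-cache point in the excerpt can be illustrated with a toy model. This is only a sketch of why per-graph lookups scale poorly, not PyTorch's remote cache implementation: `FakeRemoteCache`, `get`, and `get_many` are hypothetical names, and the cache is an in-memory dictionary that simply counts round trips.

```python
from typing import Dict, Iterable, Optional


class FakeRemoteCache:
    """In-memory stand-in for a remote cache that counts round trips."""

    def __init__(self, entries: Dict[str, bytes]) -> None:
        self._entries = dict(entries)
        self.round_trips = 0

    def get(self, key: str) -> Optional[bytes]:
        # One network round trip per key.
        self.round_trips += 1
        return self._entries.get(key)

    def get_many(self, keys: Iterable[str]) -> Dict[str, Optional[bytes]]:
        # One network round trip for the whole batch of keys.
        self.round_trips += 1
        return {k: self._entries.get(k) for k in keys}


# A model split into many small graphs, each with its own cache key.
keys = [f"graph_{i}" for i in range(200)]
entries = {k: b"artifact" for k in keys}

per_key_cache = FakeRemoteCache(entries)
unbatched = {k: per_key_cache.get(k) for k in keys}

batched_cache = FakeRemoteCache(entries)
batched = batched_cache.get_many(keys)

print(per_key_cache.round_trips, batched_cache.round_trips)  # prints "200 1"
```

Both strategies return the same artifacts, but with per-request latency dominating small payloads, the unbatched path pays that latency once per graph.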

Created: 4/4/2026
Updated: 4/4/2026