PyTorch MPS backend silently fails on non-contiguous tensor operations, causing phantom training bugs

9/10 Critical

On Apple Silicon (MPS backend, PyTorch <2.4), the `addcmul_` and `addcdiv_` GPU kernels silently fail when writing to non-contiguous output tensors. As a result, the optimizer's updates never took effect on the encoder weights. This produced a loss plateau that was indistinguishable from a hyperparameter problem and took days to diagnose.
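A quick way to check whether a given PyTorch build and device show this behavior is to run both in-place ops on a deliberately non-contiguous tensor and compare the result with an out-of-place CPU reference. The sketch below is not the upstream reproduction. `inplace_ops_match_reference` is a hypothetical helper, and using a transposed view as the non-contiguous output is an assumption about what triggers the bug. On an affected setup you would expect it to return `False` for `"mps"`. On CPU it should always return `True`.

```python
import torch

def inplace_ops_match_reference(device: str, atol: float = 1e-6) -> bool:
    """Hypothetical helper: compare in-place addcmul_/addcdiv_ on a
    non-contiguous tensor on `device` against an out-of-place CPU
    reference. True means the device results agree with the reference."""
    torch.manual_seed(0)
    base = torch.randn(64, 32)
    t1 = torch.randn(32, 64)
    t2 = torch.rand(32, 64) + 0.5  # keep divisors away from zero for addcdiv_
    value = 0.5

    # Out-of-place CPU references computed from contiguous inputs.
    expected_mul = base.t() + value * t1 * t2
    expected_div = base.t() + value * t1 / t2

    results = []
    for op, expected in (("addcmul_", expected_mul), ("addcdiv_", expected_div)):
        # .t() on a (64, 32) tensor gives a (32, 64) non-contiguous view,
        # so the in-place op has to write through non-unit strides.
        out = base.clone().to(device).t()
        assert not out.is_contiguous()
        getattr(out, op)(t1.to(device), t2.to(device), value=value)
        results.append(torch.allclose(out.cpu(), expected, rtol=1e-5, atol=atol))
    return all(results)

if torch.backends.mps.is_available():
    print("MPS in-place ops agree with CPU:", inplace_ops_match_reference("mps"))
```

Comparing against a CPU result, rather than checking for NaNs or exceptions, matters here because the failure mode is silent: the op "succeeds" but the output tensor is simply not updated correctly.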

Category
other
Workaround
solid
Stage
debug
Freshness
declining
Scope
framework
Upstream
open
Recurring
No
Buyer Type
individual
Maintainer
active

Sources

Collection History

Query: “What are the most common pain points with PyTorch for developers in 2025?” (4/4/2026)

PyTorch's MPS (Apple Silicon GPU) backend had a kernel bug in which `addcmul_` and `addcdiv_` silently failed when writing to non-contiguous output tensors. The model appeared to be learning because the decoder trained normally, but progress stalled because the encoder stayed frozen. The result was a subtle plateau that looked exactly like a hyperparameter issue.
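This entry does not spell out the workaround, so the following is only a sketch of two habits that target this class of bug. The first is a direct check that every trainable parameter actually changes after an optimizer step, which distinguishes "encoder frozen" from "learning rate too low". The second makes parameters contiguous before training. It assumes the non-contiguous tensors originate from the parameters themselves, since optimizers such as Adam allocate their state with the same layout as each parameter. That assumption may not hold for every model. Both `make_params_contiguous` and `find_frozen_params` are hypothetical helpers, not PyTorch APIs.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def make_params_contiguous(model: nn.Module) -> None:
    """Hypothetical helper: give every parameter a contiguous layout.
    Call before the optimizer's first step so lazily created state
    (e.g. Adam's moment buffers) is allocated contiguous as well."""
    for p in model.parameters():
        if not p.is_contiguous():
            p.data = p.data.contiguous()

def find_frozen_params(model, optimizer, loss_fn, batch):
    """Hypothetical helper: run one optimizer step and return the names of
    trainable parameters whose values did not change at all."""
    before = {
        name: p.detach().clone()
        for name, p in model.named_parameters()
        if p.requires_grad
    }
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    optimizer.step()
    return [
        name
        for name, p in model.named_parameters()
        if name in before and torch.equal(before[name], p.detach())
    ]

# Toy encoder/decoder on CPU; on a Mac, move model and batch to "mps"
# before creating the optimizer.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 8))
make_params_contiguous(model)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 8)
frozen = find_frozen_params(
    model, optimizer, lambda m, b: nn.functional.mse_loss(m(b), b), x
)
print("parameters that did not move:", frozen)
```

An empty list means every trainable parameter moved on that step. Parameters that receive no gradient (for example, an unused branch) will also be listed. If encoder parameters show up while the decoder's do not, that points at an update that is not being applied rather than at the hyperparameters.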

Created: 4/4/2026
Updated: 4/4/2026