PyTorch MPS backend silently fails on non-contiguous tensor operations, causing phantom training bugs

9/10 Critical

On Apple Silicon (MPS backend, PyTorch <2.4), the in-place `addcmul_` and `addcdiv_` GPU kernels silently fail when writing to non-contiguous output tensors. As a result, the optimizer never updated the encoder weights, producing a loss plateau that was indistinguishable from a hyperparameter issue and took days to diagnose.
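One way to probe for this class of failure is to run the in-place op on a non-contiguous output and compare it against an out-of-place reference computed on CPU. The sketch below is illustrative only: the helper names `pick_device` and `inplace_addcmul_matches_reference` are hypothetical, and using a transposed view as the non-contiguous output is an assumption about what triggers the bug on an affected version.

```python
import torch


def pick_device() -> torch.device:
    # Prefer MPS when present; fall back to CPU so the sketch still runs elsewhere.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def inplace_addcmul_matches_reference(device: torch.device) -> bool:
    """Return True if addcmul_ on a non-contiguous output matches a CPU reference."""
    torch.manual_seed(0)
    t1 = torch.randn(3, 4)
    t2 = torch.randn(3, 4)

    # Reference: out-of-place addcmul on contiguous CPU tensors.
    expected = torch.addcmul(torch.zeros(3, 4), t1, t2, value=0.5)

    # Transposed view of a (4, 3) buffer: shape (3, 4), non-contiguous strides.
    # Assumption: this layout is representative of the failing case.
    out = torch.zeros(4, 3).to(device).t()
    assert not out.is_contiguous()

    out.addcmul_(t1.to(device), t2.to(device), value=0.5)
    return torch.allclose(out.cpu(), expected, atol=1e-5)


if __name__ == "__main__":
    device = pick_device()
    print(device, inplace_addcmul_matches_reference(device))
```

On CPU the check should pass. If the description above holds, an affected MPS build would be expected to report a mismatch, because the write to `out` would be silently dropped.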

PyTorch Apple Silicon

Sources

the bug that taught me more about PyTorch than years of using it

Collection History

Query: “What are the most common pain points with PyTorch for developers in 2025?” 4/4/2026

PyTorch's MPS (Apple Silicon GPU) backend had a kernel bug in which `addcmul_` and `addcdiv_` operations silently failed when writing to non-contiguous output tensors. The model appeared to be learning because the decoder was training normally, but overall progress stalled because the encoder stayed frozen. The result was a subtle plateau that looked exactly like a hyperparameter issue.
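A frozen encoder of this kind can be surfaced by checking which parameters actually change after one optimizer step, and which parameters are stored non-contiguously. The following is a minimal diagnostic sketch, not the original author's method: the helper names, the two-layer stand-in model, and the choice of Adam are all assumptions made for illustration.

```python
import torch
from torch import nn


def noncontiguous_parameters(model: nn.Module) -> list:
    # Names of parameters whose storage layout is non-contiguous.
    return [name for name, p in model.named_parameters() if not p.is_contiguous()]


def unchanged_after_step(model: nn.Module, optimizer, loss_fn) -> list:
    # Snapshot every parameter, take one optimizer step, then report
    # parameters that received a gradient but did not move at all.
    before = {name: p.detach().clone() for name, p in model.named_parameters()}
    optimizer.zero_grad()
    loss_fn().backward()
    optimizer.step()
    return [
        name
        for name, p in model.named_parameters()
        if p.grad is not None and torch.equal(before[name], p.detach())
    ]


torch.manual_seed(0)
# Hypothetical stand-in for an encoder/decoder pair.
model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
inputs = torch.randn(8, 4)

suspects = noncontiguous_parameters(model)
frozen = unchanged_after_step(model, optimizer, lambda: model(inputs).pow(2).mean())
print("non-contiguous:", suspects)
print("unchanged after step:", frozen)
```

A parameter that has a gradient yet is bit-for-bit unchanged after a step is a strong hint that the update itself was dropped, which a loss curve alone would not reveal.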

Created: 4/4/2026
Updated: 4/4/2026