PyTorch hardware-specific backend bugs cause failures across MPS, CUDA, and ONNX

8/10 High

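Several hardware-specific issues affect PyTorch across its backends. LayerNorm and BatchNorm fail to compile on the Apple M4 GPU with the MPS backend. Conv2d runs slower on macOS CPUs because the MKLDNN backend is missing. FP8 lowering tests fail on certain NVIDIA devices because of hardware constraints. CUDA CI tests crash with SIGIOT stack-smashing errors that indicate memory corruption. ONNX exports with dynamic inputs regressed between versions. Each of these has to be debugged separately on the platform where it occurs.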
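When triaging reports like these, a useful first step is to record which backends the affected machine actually exposes. The sketch below is a hypothetical `backend_report()` helper, not part of PyTorch. It queries existing PyTorch flags (`torch.cuda.is_available()`, `torch.backends.mps.is_built()` and `is_available()`, `torch.backends.mkldnn.is_available()`) and still returns a partial report if PyTorch is not installed. It only reports availability. It does not detect the specific compile, performance, or export bugs described above.

```python
import platform


def backend_report():
    """Hypothetical helper: summarize which PyTorch backends this machine exposes.

    Always reports the OS and CPU architecture. If PyTorch is not
    installed, it reports torch_installed=False and stops there.
    """
    report = {"os": platform.system(), "machine": platform.machine()}
    try:
        import torch
    except ImportError:
        report["torch_installed"] = False
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA: needs an NVIDIA GPU plus a CUDA-enabled build.
    report["cuda_available"] = torch.cuda.is_available()
    # MPS: "built" means the binary includes MPS support;
    # "available" also requires a supported macOS/Apple GPU at runtime.
    report["mps_built"] = torch.backends.mps.is_built()
    report["mps_available"] = torch.backends.mps.is_available()
    # MKLDNN (oneDNN) CPU kernels, which accelerate ops such as Conv2d.
    report["mkldnn_available"] = torch.backends.mkldnn.is_available()
    return report


if __name__ == "__main__":
    for key, value in backend_report().items():
        print(f"{key}: {value}")
```

Attaching this output to a bug report makes it clear whether a slowdown or failure comes from a missing backend (for example, `mkldnn_available: False` on a macOS CPU) or from a bug in a backend that is present.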

PyTorch CUDA ONNX Apple Silicon

Sources

Weekly GitHub Report for Pytorch: December 01, 2025

Collection History

Query: “What are the most common pain points with PyTorch for developers in 2025?” (4/4/2026)

Issues highlight hardware-specific limitations. LayerNorm and BatchNorm fail to compile on the Apple M4 GPU with the MPS backend, Conv2d is slower on macOS CPUs because the MKLDNN backend is missing, and FP8 lowering tests fail on certain NVIDIA devices because of hardware constraints. CUDA CI tests also hit SIGIOT stack-smashing errors, which indicate memory corruption.
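SIGIOT is the historical alias for SIGABRT (signal 6 on Linux). When glibc detects stack smashing, it prints a "stack smashing detected" message and calls `abort()`, so the process dies with that signal. This is a minimal sketch, assuming a POSIX system, of how to decode such a crash from a test runner. `exit_signal_name` is a hypothetical helper. The child process calls `os.abort()` to stand in for a crashing CI test, and it does not reproduce the PyTorch bug itself.

```python
import signal
import subprocess
import sys


def exit_signal_name(returncode):
    """Hypothetical helper: map a subprocess return code to the signal that killed it.

    On POSIX, subprocess reports death-by-signal as a negative return code.
    Returns None for a normal exit.
    """
    if returncode >= 0:
        return None
    return signal.Signals(-returncode).name


# Stand-in for a crashing test: the child aborts, as glibc does after stack smashing.
child = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
print(child.returncode, exit_signal_name(child.returncode))
```

A reported SIGIOT or SIGABRT, or a return code of -6, means the process aborted itself rather than being killed from outside. For stack smashing, that points the search at native (C++/CUDA) code rather than Python.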

Created: 4/4/2026
Updated: 4/4/2026