PyTorch hardware-specific backend bugs cause failures across MPS, CUDA, and ONNX

8/10 High

Several hardware-specific issues affect PyTorch across backends: LayerNorm and BatchNorm fail to compile on the Apple M4 GPU with the MPS backend, Conv2d runs slower on macOS CPUs because the MKLDNN backend is missing, CUDA CI tests show memory corruption (SIGIOT stack-smashing errors), and ONNX exports with dynamic inputs regressed between versions. Together they force recurring per-platform debugging.
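Because these failures depend on which backend is active, a first triage step is often to record what the current machine exposes. The following is a minimal sketch, not a fix for any of the issues above: `backend_report` and `pick_device` are hypothetical helper names, and the `torch` import is treated as optional so the snippet still runs where PyTorch is not installed.

```python
import platform

try:
    import torch  # optional: the sketch still runs without PyTorch installed
except ImportError:
    torch = None


def backend_report():
    """Collect backend facts that are useful when triaging a per-platform bug.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    report = {
        "os": platform.system(),
        "machine": platform.machine(),
        "torch": None,
        "cuda": False,
        "mps": False,
        "mkldnn": False,
    }
    if torch is not None:
        report["torch"] = torch.__version__
        report["cuda"] = bool(torch.cuda.is_available())
        # Guard the lookups: older builds may not ship every backend module.
        mps = getattr(torch.backends, "mps", None)
        report["mps"] = bool(mps is not None and mps.is_available())
        mkldnn = getattr(torch.backends, "mkldnn", None)
        report["mkldnn"] = bool(mkldnn is not None and mkldnn.is_available())
    return report


def pick_device(report):
    """Prefer CUDA, then MPS, then CPU (an assumed ordering, not a PyTorch rule)."""
    if report["cuda"]:
        return "cuda"
    if report["mps"]:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    info = backend_report()
    for key, value in info.items():
        print(f"{key}: {value}")
    print("selected device:", pick_device(info))
```

Attaching this output to a bug report makes it easier to tell whether a failure is tied to MPS, to a CPU build without MKLDNN, or to a particular CUDA setup.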

Category
compatibility
Workaround
none
Stage
build
Freshness
persistent
Scope
cross_platform
Upstream
open
Recurring
Yes
Buyer Type
team
Maintainer
active

Sources

Collection History

Query: “What are the most common pain points with PyTorch for developers in 2025?” 4/4/2026

Issues highlight hardware-specific limitations: LayerNorm and BatchNorm fail to compile on the Apple M4 GPU with the MPS backend, Conv2d is slower on macOS CPUs because the MKLDNN backend is missing, and FP8 lowering tests fail on certain NVIDIA devices due to hardware constraints. SIGIOT stack-smashing errors in CUDA CI tests indicate memory corruption.

Created: 4/4/2026
Updated: 4/4/2026