Apple Silicon vs NVIDIA for local LLMs
Unified memory vs VRAM, MLX vs CUDA, M-series Ultra vs RTX 5090. Here's how the two architectures actually compare on real submitted runs.
No model yet has submitted runs on both an Apple chip and an NVIDIA GPU, so the architectural comparison is currently empty. Submit a run on either side and this section will pair the data automatically.
No data submitted for this task yet.
Run the suite to submit the first benchmark for this guide:
$ pipx install llm-speed && llm-speed bench
Apple Silicon and NVIDIA's discrete GPUs solve the local-LLM problem from opposite directions. Apple's M-series Ultra exposes up to 192 GB of unified memory at ~800 GB/s bandwidth, enough to fit any open-weights model the community has shipped. But decode speed is bandwidth-bound — each generated token streams the full set of model weights through memory once — so a smaller model often runs faster on a 1 TB/s+ NVIDIA card. NVIDIA's RTX 5090 ships 32 GB of GDDR7 at ~1.8 TB/s, the H100 SXM ships 80 GB of HBM3 at ~3 TB/s, and CUDA remains the deepest software stack for local inference. Below: every model where we have a submitted run on both an Apple chip and an NVIDIA GPU, side by side.
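The bandwidth-bound rule of thumb above can be sketched as a back-of-envelope estimator: if every generated token must read all model weights from memory, the ceiling on decode speed is roughly bandwidth divided by model size. The function name, the 60% efficiency factor, and the per-device bandwidth figures below are illustrative assumptions, not measured numbers from submitted runs.

```python
def decode_tokens_per_s(params_b: float, bytes_per_param: float,
                        bandwidth_gb_s: float, efficiency: float = 0.6) -> float:
    """Rough upper bound on single-stream decode speed for a dense model.

    Assumes decoding is memory-bandwidth-bound: each token streams the full
    weight set through memory once. `efficiency` is an assumed fraction of
    peak bandwidth actually achieved (hypothetical, not measured).
    """
    model_gb = params_b * bytes_per_param          # GB of weights read per token
    return bandwidth_gb_s * efficiency / model_gb  # tokens per second

# A 70B dense model quantized to ~4 bits (~0.5 bytes/param, ~35 GB of weights)
# on three memory systems at their approximate peak bandwidths. Note this
# estimate ignores whether the weights actually fit: 35 GB exceeds the
# RTX 5090's 32 GB of VRAM, which is exactly the capacity-vs-speed trade-off
# discussed above.
for name, bw in [("M-series Ultra (~800 GB/s)", 800),
                 ("RTX 5090 (~1800 GB/s)", 1800),
                 ("H100 SXM (~3000 GB/s)", 3000)]:
    print(f"{name}: ~{decode_tokens_per_s(70, 0.5, bw):.0f} tok/s ceiling")
```

The estimator explains both halves of the comparison: more bandwidth raises the ceiling linearly, while a smaller or more aggressively quantized model raises it by shrinking the bytes read per token.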
Side-by-side comparisons
See also: All hardware · All models · Methodology