NVIDIA DIGITS vs. AMD Ryzen AI Max+ 395 “Strix Halo” for LLM Performance

By | February 23, 2025

With Strix Halo already here, Local LLM runners are now waiting for the next big thing in local i.e the NVIDIA DIGITS, dedicated hardware to run local models, it will be based on Ubuntu 22/24 and will come bundled with NVIDIA software ecosystem already installed. In this post we will compare their architectures, specifications, and projected LLM speeds to help you decide which might better suit your needs.

Next big thing: NVIDIA DIGITS

NVIDIA DIGITS is an Arm-based APU aimed at LLM creators, blending NVIDIA’s GPU expertise with a custom 20 core CPU design by Mediatek. It is expected to ship with RTX 5070 class high-performance GPU and up to 128 GB of unified LPDDR5X memory. NVIDIA claims it delivers up to 1 petaflop of FP4 precision performance, targeting local execution of large AI models, this performance is no slouch either, NVIDIA CEO Jensen in one of recent interviews told a youtuber that in 2017 it costed $75,000 of NVIDIA hardware to get similar performance.

Its memory bandwidth is speculated to range between 256 GB/s and 512 GB/s, a critical factor for LLM inference where memory speed often bottlenecks performance. We are expecting it to be 500+ otherwise it will be just too slow for anything. NVIDIA is marketing it as:

World’s Smallest AI Supercomputer Capable of Running 200B-Parameter Models

So hopefully it won’t be too slow in that.

DIGITS is marketed as a competitor to Apple Silicon and AMD’s APUs, with a focus on power efficiency and AI optimization. However, without official benchmarks or a confirmed release, its real-world LLM performance remains speculative.

Overview of AMD Ryzen AI Max+ 395 “Strix Halo”

The AMD Ryzen AI Max+ 395, part of the “Strix Halo” lineup, is a beastly APU launched in early 2025. It boasts 16 Zen 5 CPU cores (32 threads), a Radeon 8060S iGPU with 40 RDNA 3.5 compute units, and a 50 TOPS XDNA 2 NPU. Its standout feature is a quad-channel (256-bit) LPDDR5X-6400 memory interface, delivering 256 GB/s of bandwidth, with up to 128 GB of RAM—96 GB of which can be allocated to the GPU.

AMD claims it can run a 70 billion parameter LLM locally, outperforming the NVIDIA RTX 4090 by 2.2x in tokens per second at an 87% lower TDP (55W vs. 450W).

Designed for high-end laptops and mini PCs, the Ryzen AI Max+ 395 targets gamers, content creators, and AI enthusiasts. Its integrated design eliminates the need for a discrete GPU, making it a compelling all-in-one solution.

Architectural Comparison

  • CPU: The Ryzen AI Max+ 395 features 16 Zen 5 cores with a boost clock up to 5.1 GHz, offering robust multi-threaded performance for tasks beyond AI, like rendering or gaming. NVIDIA DIGITS is expected to use an Arm-based CPU (20 cores), optimized for efficiency rather than raw compute, though its clock speeds and core count remain unconfirmed.
  • GPU: The Radeon 8060S in the Ryzen AI Max+ 395 delivers RTX 4070 Mobile-level gaming performance and contributes significant compute for AI workloads. DIGITS is rumored to feature a cut-down Blackwell GPU, potentially offering superior FP4/FP8 performance due to NVIDIA’s Tensor Core heritage, though its CU count and architecture are unclear.
  • NPU: AMD’s 50 TOPS XDNA 2 NPU is tailored for AI acceleration, while NVIDIA DIGITS may include a next-gen NPU (possibly 70-100 TOPS), leveraging NVIDIA’s AI software ecosystem (e.g., TensorRT) for optimized LLM inference.
  • Memory: Both support up to 128 GB of LPDDR5X, but the Ryzen AI Max+ 395’s confirmed 256 GB/s bandwidth is a known quantity. DIGITS’ bandwidth is uncertain—estimates range from 250 GB/s (matching Strix Halo) to 512 GB/s (if NVIDIA opts for a wider bus or faster RAM), which could drastically affect LLM performance.

LLM Performance Factors

LLM speed depends heavily on memory bandwidth, GPU compute, and software optimization. For large models (e.g., 70B parameters), unified memory capacity is also critical, as it determines whether the model fits entirely in RAM/VRAM without offloading to slower storage. Here’s how the two stack up:

  • Ryzen AI Max+ 395: AMD’s benchmarks suggest it achieves 2.2x the tokens/second of the RTX 4090 (24 GB VRAM) on a 70B model. Independent tests are scarce, but community estimates on forums like Reddit’s r/LocalLLaMA peg its performance at 3-5 tokens/second for a 70B model (quantized to 4-bit), limited by its 256 GB/s bandwidth. With 96 GB allocatable to the GPU, it can handle massive models locally.
  • NVIDIA DIGITS: Without concrete data, we can extrapolate from NVIDIA’s ecosystem. The RTX 4090, with 1 TB/s bandwidth, achieves around 10-15 tokens/second on a 70B Q4 model (per LM Studio benchmarks). If DIGITS’ bandwidth is 250 GB/s, it might match or slightly exceed the Ryzen AI Max+ 395 (4-6 tokens/second). At 512 GB/s, it could reach 8-12 tokens/second, leveraging NVIDIA’s superior AI software stack.

Use Case Suitability

  • Gaming and Multitasking: The Ryzen AI Max+ 395 excels here, with its 16-core CPU and proven 1080p gaming prowess (up to 68% faster than RTX 4070 Mobile). DIGITS may compete in gaming but likely prioritizes AI efficiency over CPU grunt.
  • AI Workloads: DIGITS could edge out in raw LLM inference speed if its bandwidth and NPU deliver as promised, especially with NVIDIA’s software optimizations. The Ryzen AI Max+ 395, however, offers a balanced package for mixed workloads (e.g., RAG with large datasets).
  • Portability and Cost: Strix Halo laptops (e.g., ASUS ROG Flow Z13) start at $2,200-$2,700(for 128GB RAM), reflecting its premium APU design.
  • DIGITS’ pricing starts at $3000.

Digits vs Strix Halo Comparison Table: Estimated LLM Speeds

Model Size (Quantized)NVIDIA DIGITS (Est.)AMD Ryzen AI Max+ 395Notes
13B (Q4)10-15 tokens/s8-12 tokens/sDIGITS assumes 256-512 GB/s; Strix Halo based on 256 GB/s and AMD claims.
32B (Q4)8-12 tokens/s6-10 tokens/sMemory bandwidth limits both; DIGITS may benefit from TensorRT.
70B (Q4)4-12 tokens/s3-5 tokens/sStrix Halo confirmed for 70B; DIGITS range reflects bandwidth uncertainty.

Notes:

  • Speeds are estimates based on AMD’s claims, RTX 4090 baselines, and community speculation (e.g., r/LocalLLaMA).
  • DIGITS’ performance varies with assumed bandwidth (256 GB/s low-end, 512 GB/s high-end).
  • Real-world results await independent testing.

Conclusion

The AMD Ryzen AI Max+ 395 “Strix Halo” is a versatile powerhouse, delivering confirmed LLM capability alongside top-tier gaming and CPU performance. Its 256 GB/s bandwidth is a known limit, but its 128 GB memory capacity and low 55W TDP make it ideal for portable, all-in-one AI workstations. NVIDIA DIGITS, while promising up to 1 petaflop of FP4 performance, remains a wildcard until its release. If it achieves 500+ GB/s bandwidth at a low power and leverages NVIDIA’s software edge, it could outpace Strix Halo in pure LLM inference—though likely at a higher cost and with less CPU flexibility.

DIGITS might redefine local LLM performance in a couple of months and hopefully bring some excitement for the newly branded GPU poors. Are you one?

Leave a Reply