Nvidia RTX 5090 and 5080 Inference Performance Numbers

Nvidia’s new RTX GPUs are here. They are expensive and hard to find, and almost no one is talking about the AI features and inference speed they offer. So Nvidia itself has published a post comparing three top GPUs, including the AMD 7900XTX, a competing card with 24GB of VRAM.

RTX 5090 and AMD 7900XTX Inference Speed Comparison

Model	Radeon 7900XTX (Tokens/sec)	NVIDIA RTX 4090 (Tokens/sec)	NVIDIA RTX 5090 (Tokens/sec)	NVIDIA RTX 5080 (Tokens/sec, approx.)
DeepSeek R1 Distill Qwen 7b	~110	160 (+46%)	200+ (+103%)	~120
DeepSeek R1 Distill Llama 8b	~98	145 (+47%)	200+ (+106%)	~115
DeepSeek R1 Distill Qwen 32b	~17	~30 (+47%)	50+ (+124%)	~6-10 (spills to system RAM)

Nvidia didn’t test the RTX 5080; the values given for it are approximate. The test is fair even though the RTX 5090 has the VRAM advantage at 32GB, because the 32b model is a 4-bit quantization (~20GB) that fits entirely inside the VRAM of every card except the 16GB 5080. On the 5080 it spills over into system RAM and speeds slow to a crawl.
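The fit argument can be checked with a bit of arithmetic. The snippet below is a rough sketch, not a measurement: the VRAM capacities come from the cards' public spec sheets, the ~20GB model size is the figure quoted above, and the 2GB of headroom for context/KV cache is purely an assumption for illustration.

```python
# VRAM capacity per card in GB (public spec-sheet values, not from Nvidia's post).
VRAM_GB = {
    "Radeon 7900XTX": 24,
    "RTX 4090": 24,
    "RTX 5090": 32,
    "RTX 5080": 16,
}

MODEL_GB = 20.0  # approximate size of the 4-bit 32b model, as quoted above


def fits_in_vram(model_gb, vram_gb, overhead_gb=2.0):
    """Return True if the model plus some headroom fits in VRAM.

    overhead_gb is an assumed allowance for context/KV cache, not a measured value.
    """
    return model_gb + overhead_gb <= vram_gb


for card, vram in VRAM_GB.items():
    verdict = "fits in VRAM" if fits_in_vram(MODEL_GB, vram) else "spills to system RAM"
    print(f"{card} ({vram}GB): {verdict}")
```

With these numbers only the 5080 fails the check, which matches the slowdown described for the 32b model.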

Judging by the inference speeds, the 7900XTX isn’t bad at all: it delivers respectable speeds and is readily available at a fair price. I think NVIDIA is marketing an AMD card here.

The exact model used in the test is available below:

https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF

DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf
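
If you want to compare your own card against the table, a tokens/sec figure is just generated tokens divided by wall-clock time. Below is a minimal sketch of that calculation. The names `tokens_per_second` and `fake_generate` are hypothetical helpers made up for this example, and the stand-in generator only sleeps; to get a real number you would swap in a call to whatever runtime you use to load the GGUF file. Nvidia's exact benchmark setup is not described here, so don't expect identical results.

```python
import time


def tokens_per_second(generate, prompt, max_tokens=128):
    """Time one generation call and return tokens generated per second.

    `generate` is any callable taking (prompt, max_tokens) and returning
    the number of tokens it actually produced.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


def fake_generate(prompt, max_tokens):
    """Stand-in for a real model call (assumption: replace with your runtime)."""
    time.sleep(0.05)  # pretend the model took 50 ms
    return max_tokens


rate = tokens_per_second(fake_generate, "Why is the sky blue?", max_tokens=100)
print(f"{rate:.0f} tokens/sec")
```

Note that timing the whole call this way also counts prompt processing, so short prompts and longer outputs give a figure closer to pure generation speed.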