NVIDIA RTX 5000 series CUDA cores and memory bandwidth comparison table

January 17, 2025

In another entry in our "let's compare" series, we compare the upcoming NVIDIA RTX 5090, 5080, 5070 Ti, and 5070 (launching in two weeks) with older generations, based on CUDA cores and memory bandwidth alone. Both of these metrics matter for a buying decision if you want to run AI locally on your computer. NVIDIA's CUDA cores and VRAM are the gold of the current AI industry: they determine whether your PC can handle tasks like local LLMs and image generation with Flux. Given the state of current models, you will need a lot more VRAM in the days ahead.
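
If you are unsure where your current GPU stands on these metrics, you can query it directly. Here is a minimal sketch using PyTorch (assuming a CUDA-enabled build is installed); note that PyTorch reports SM count rather than CUDA cores, since cores-per-SM depends on the architecture:

```python
import torch

# Minimal sketch: query the current GPU's specs with PyTorch.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU:  {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
    # PyTorch exposes the SM count; total CUDA cores = SMs * cores per SM
    # (128 per SM on recent architectures).
    print(f"SMs:  {props.multi_processor_count}")
else:
    print("No CUDA-capable GPU detected.")
```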

CUDA cores and memory bandwidth comparison for NVIDIA GPUs:

| Graphics Card | CUDA Cores | Memory Bandwidth (GB/s) | Memory Type | Memory Interface (bits) | Memory Size (GB) |
|---|---|---|---|---|---|
| RTX 5090 | 21760 | 1792 | GDDR7 | 512 | 32 |
| RTX 5080 | 10752 | 960 | GDDR7 | 256 | 16 |
| RTX 5070 Ti | 8960 | 896 | GDDR7 | 256 | 16 |
| RTX 5070 | 6144 | 672 | GDDR7 | 192 | 12 |
| RTX 4090 | 16384 | 1008 | GDDR6X | 384 | 24 |
| RTX 4080 Super | 10240 | 736 | GDDR6X | 256 | 16 |
| RTX 4070 Ti Super | 8448 | 672 | GDDR6X | 256 | 16 |
| RTX 4070 Super | 7168 | 504 | GDDR6X | 192 | 12 |
| RTX 3090 | 10496 | 936 | GDDR6X | 384 | 24 |
| RTX 3080 Ti | 10240 | 912 | GDDR6X | 320 | 12 |
| RTX 3070 | 5888 | 448 | GDDR6 | 256 | 8 |
| RTX 3060 Ti | 4864 | 448 | GDDR6 | 256 | 8 |
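
A practical way to read the bandwidth column: LLM token generation is usually memory-bandwidth-bound, so a rough ceiling on decode speed is bandwidth divided by the bytes read per token (approximately the model's size in memory). A back-of-the-envelope sketch, assuming an ~18 GB q4 32B model; real throughput will be lower due to KV-cache reads and kernel overhead:

```python
# Back-of-the-envelope decode ceiling: tokens/s ≈ bandwidth / model size.
# The ceiling only applies when the model fits entirely in VRAM.
CARDS = {  # name: (bandwidth GB/s, VRAM GB), from the table above
    "RTX 5090":    (1792, 32),
    "RTX 5080":    (960, 16),
    "RTX 5070 Ti": (896, 16),
    "RTX 5070":    (672, 12),
    "RTX 4090":    (1008, 24),
    "RTX 3090":    (936, 24),
}
MODEL_GB = 18  # assumption: ~32B parameters at ~4.5 bits/weight

for name, (bw, vram) in CARDS.items():
    note = "fits in VRAM" if vram >= MODEL_GB else "spills to system RAM"
    print(f"{name}: ~{bw / MODEL_GB:.0f} tok/s ceiling ({note})")
```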

Which one to buy?

Which NVIDIA 5000 series GPU should you buy if you want to run a Qwen2.5-Coder-32B q4 model (or similar) locally?

Based on my experience, buy only a graphics card with 16 GB of VRAM or more. For LLMs alone, all future purchases, if you are buying new, should be above 16 GB. For example, I can run the above model locally on a 3080 Ti that has only 12 GB of VRAM by letting the q4 model spill over into system memory, which slows generation to about 3-4 tokens/second.
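
To see why, here is a rough VRAM estimate for a q4 model. The bits-per-weight and overhead figures are assumptions (q4 formats store scales alongside weights, and the KV cache grows with context length), so treat this as a sketch rather than an exact requirement:

```python
# Rough VRAM estimate for a quantized LLM. Figures are illustrative
# assumptions; actual usage varies by runtime, format, and context length.
def model_vram_gb(params_billions: float,
                  bits_per_weight: float = 4.5,  # q4 incl. scales (assumed)
                  overhead_gb: float = 2.0) -> float:  # KV cache + buffers (assumed)
    weights_gb = params_billions * bits_per_weight / 8  # gigabits -> GB
    return weights_gb + overhead_gb

# Qwen2.5-Coder-32B at q4: ~18 GB of weights plus overhead -> ~20 GB,
# which is why it spills on a 12 GB card and is tight even at 16 GB.
print(f"~{model_vram_gb(32):.0f} GB")
```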

The upcoming GPUs also add hardware support for 4-bit (FP4) data in their Tensor Cores, which should significantly speed up running quantized AI models locally.
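
For context, this is what 4-bit quantization looks like in software today. Below is a simplified blockwise sketch with NumPy (illustrative only, not any runtime's exact format); the dequantize step is the kind of work that native 4-bit hardware support would offload:

```python
import numpy as np

# Simplified blockwise 4-bit quantization, similar in spirit to the
# q4 formats used by local runtimes (illustrative, not an exact format).
def quantize_q4(w: np.ndarray, block: int = 32):
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7  # int4 range: -8..7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q4(w)
err = np.abs(w - dequantize_q4(q, s)).mean()
print(f"mean abs quantization error: {err:.4f}")
```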
