In this comparison series, let's compare the upcoming (launching in two weeks) NVIDIA RTX 5090, 5080, 5070 Ti and 5070 against the older generations on CUDA cores and memory bandwidth alone; both metrics matter for a buying decision if you want to run AI locally on your computer. NVIDIA's CUDA cores and VRAM are the gold of the current AI industry: they determine whether your PC can handle tasks like local LLMs, image generation using Flux, etc. Given the state of current models, you will need a lot more VRAM in the days ahead.
CUDA cores and bandwidth comparison for NVIDIA GPUs:
Graphics Card | CUDA Cores | Memory Bandwidth (GB/s) | Memory Type | Memory Interface (bits) | Memory Size (GB) |
---|---|---|---|---|---|
RTX 5090 | 21760 | 1792 | GDDR7 | 512 | 32 |
RTX 5080 | 10752 | 960 | GDDR7 | 256 | 16 |
RTX 5070 Ti | 8960 | 896 | GDDR7 | 256 | 16 |
RTX 5070 | 6144 | 672 | GDDR7 | 192 | 12 |
RTX 4090 | 16384 | 1008 | GDDR6X | 384 | 24 |
RTX 4080 | 9728 | 717 | GDDR6X | 256 | 16 |
RTX 4070 Ti | 7680 | 504 | GDDR6X | 192 | 12 |
RTX 4070 | 5888 | 504 | GDDR6X | 192 | 12 |
RTX 3090 | 10496 | 936 | GDDR6X | 384 | 24 |
RTX 3080 Ti | 10240 | 912 | GDDR6X | 384 | 12 |
RTX 3070 | 5888 | 448 | GDDR6 | 256 | 8 |
RTX 3060 Ti | 4864 | 448 | GDDR6 | 256 | 8 |
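To make the table actionable, here is a back-of-the-envelope sketch (my own heuristic, not an official sizing rule) that checks whether a 4-bit quantized model fits entirely in each card's VRAM. The ~0.5 bytes per weight and the 1.2x overhead factor are assumptions; real GGUF file sizes vary by quant variant.

```python
# Rough VRAM check for a 4-bit quantized model.
# Assumption (mine): a q4 model needs ~0.5 bytes per parameter for weights,
# plus ~1.2x overhead for KV cache, context and CUDA buffers.

CARDS_GB = {  # VRAM per card, from the table above
    "RTX 5090": 32, "RTX 5080": 16, "RTX 5070 Ti": 16, "RTX 5070": 12,
    "RTX 4090": 24, "RTX 3090": 24, "RTX 3080 Ti": 12, "RTX 3070": 8,
}

def q4_footprint_gb(params_billions: float, overhead: float = 1.2) -> float:
    """Approximate VRAM needed to fully load a q4 model."""
    return params_billions * 0.5 * overhead  # 0.5 bytes per weight at 4-bit

need = q4_footprint_gb(32)  # a 32B model at q4 -> ~19 GB
for card, vram in CARDS_GB.items():
    fit = "fits fully" if vram >= need else "spills to system RAM"
    print(f"{card}: {vram} GB -> {fit} (need ~{need:.0f} GB)")
```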
Which one to buy?
Which NVIDIA 5000-series GPU should you buy if you want to run a Qwen2.5-Coder-32B q4 model (or a similar one) locally?
Based on my experience, I suggest buying only a graphics card with 16GB of VRAM or above. If you are buying new hardware for LLMs, all future purchases should be above 16GB. For example, I am able to run the above model locally on a 3080 Ti that has only 12GB of VRAM by letting the q4 model spill over into system memory, which slows token generation to about 3-4 tokens/second.
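For reference, here is a minimal sketch of that kind of partial-offload setup using the llama-cpp-python bindings (my choice of tooling, not something prescribed above). The model path is a hypothetical placeholder; `n_gpu_layers` controls how many transformer layers live in VRAM, and the rest spill to system RAM.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-coder-32b-instruct-q4_k_m.gguf",  # hypothetical local GGUF path
    n_gpu_layers=40,  # tune down until the model loads; -1 offloads everything
    n_ctx=4096,       # context window; larger contexts need more VRAM for KV cache
)

out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```

On a 12GB card you keep reducing `n_gpu_layers` until it fits; the layers left in system RAM are what drag generation down to the 3-4 tokens/second mentioned above.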
Also, the upcoming GPUs have hardware support for 4-bit precision (FP4) built into their tensor cores, which should significantly boost local runs of quantized AI models.
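FP4 hardware aside, one reason quantization already matters so much is memory bandwidth: token generation is typically bandwidth-bound, since each new token streams essentially all the model weights from VRAM once. Here is a rough upper-bound estimate (my simplification; it ignores KV cache traffic and compute limits, and assumes the model fits entirely in VRAM) using the bandwidth column from the table above.

```python
# Upper-bound decode speed: tokens/s ~= memory bandwidth / model size,
# because each generated token reads roughly all weights once.

def tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

model_q4_gb = 19   # ~32B parameters at 4-bit, with overhead (assumed)
model_f16_gb = 64  # the same model at 16-bit

for card, bw in {"RTX 5090": 1792, "RTX 4090": 1008, "RTX 3090": 936}.items():
    print(f"{card}: ~{tokens_per_sec(bw, model_q4_gb):.0f} tok/s at q4 "
          f"vs ~{tokens_per_sec(bw, model_f16_gb):.0f} tok/s at fp16 (ceiling)")
```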