Running LLMs locally on an AMD GPU (RX 7900 XTX, RX 9070 XT, etc.) is possible with Lemonade, AMD-funded open-source software that you can download for Windows and Linux. Lemonade is a local LLM desktop server: its components and runtimes give access to the NPU on supported CPUs and to the GPU for local LLM inference, and it offers both a GUI and a command-line interface. Read on to see how to download the GUI installer for Lemonade and run LLMs without sending your data to the cloud.

Download Lemonade exe (AMD LLM software) for Windows 10
Lemonade_Server_Installer.exe link


Currently Trending LLM models for local use
Model Name | Repository | Description | Download Link |
---|---|---|---|
Qwen3-30B-A3B-Instruct-2507-GGUF | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF | Updated version of Qwen3-30B-A3B non-thinking mode with enhancements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. Supports a context length of up to 262,144 tokens. | Hugging Face |
Qwen3-Coder-30B-A3B-Instruct-GGUF | unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | Coding-focused variant of Qwen3-30B-A3B-Instruct, optimized for programming tasks with support for long outputs up to 65,536 tokens. Features improved reasoning and tool usage. | Hugging Face |
gpt-oss-120b-GGUF | unsloth/gpt-oss-120b-GGUF | Open-weight Mixture-of-Experts (MoE) model from OpenAI with 117B total parameters (5.1B active per token), excelling in coding, math, health, and agentic tool use. Apache 2.0 licensed. | Hugging Face |
gpt-oss-20b-GGUF | unsloth/gpt-oss-20b-GGUF | Smaller open-weight MoE model from OpenAI with 21B total parameters (3.6B active per token), suitable for fine-tuning on consumer hardware. Strong performance in reasoning tasks. | Hugging Face |
GLM-4.5-Air-UD-Q4_K_XL-GGUF | unsloth/GLM-4.5-Air-GGUF | Quantized (UD-Q4_K_XL) version of GLM-4.5-Air, a high-performance MoE model praised for fast tool calls, reasoning, and agentic capabilities. Supports up to 32K context. | Hugging Face |
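All of the GGUF files above are hosted on Hugging Face, whose direct-download URLs follow a fixed `resolve/main` pattern. A small helper for building such a URL; the quantization filename below is illustrative only, so browse the repo's "Files" tab for the exact name:

```python
def hf_gguf_url(repo_id: str, filename: str) -> str:
    """Direct-download URL for a file in a Hugging Face repo (main branch)."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

# Example: a quantized file from the gpt-oss-20b repo in the table above.
# NOTE: the exact filename is an assumption -- check the repo's file list.
url = hf_gguf_url("unsloth/gpt-oss-20b-GGUF", "gpt-oss-20b-Q4_K_M.gguf")
print(url)
```

Lemonade, Jan, and LM Studio can all pull GGUF models from Hugging Face directly, so this is mainly useful if you want to fetch a file yourself with `curl` or `wget`.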
Alternatives to Ryzen AI LLM Software:
If Lemonade is not your cup of tea 😀 then we have some other recommendations that work with both Nvidia and AMD graphics cards and are quite popular too:
Jan AI – The one I currently use; open source like Lemonade
LM Studio – Closed source, but another brilliant alternative
App | Description | Download Link |
---|---|---|
Jan AI | Open-source ChatGPT alternative that runs 100% offline on your computer, supporting local LLMs like those from Hugging Face. | Official Download |
LM Studio | Desktop app to discover, download, and run local LLMs (e.g., Llama, Gemma, Qwen) privately on your machine. | Official Download |
Lemonade Intro Video
Lemonade software supported AMD GPU Series
GPU Series | Architecture | Key Examples | Notes |
---|---|---|---|
Radeon 7000 Series | RDNA 3 | RX 7900 XTX, RX 7800 XT | Discrete GPUs, strong Vulkan support for high-end LLM inference, hybrid capable with Ryzen AI. |
Radeon 9000 Series | RDNA 4 | RX 9070 XT, RX 9070 | Latest gen (2025), enhanced AI/RT features; primary focus for new deployments. |
Ryzen AI Integrated GPUs (7000/8000/300 Series) | RDNA 2/3/3.5 | Ryzen 7 7840HS iGPU, Ryzen AI 9 HX 370 iGPU | Integrated in APUs, Vulkan-accelerated; essential for hybrid NPU+iGPU on laptops/desktops. |
Older AMD GPUs with Vulkan drivers (e.g., the RX 6000 series on RDNA 2) can also work, but performance is best on the hardware above. ROCm support (on Linux) targets similar modern hardware.