Running LLMs locally on an AMD GPU (RX 7900 XTX, RX 9070 XT, etc.) is possible with Lemonade, AMD-funded open-source software that you can download for Windows and Linux. Lemonade is a local LLM desktop server: its components and runtimes give access to the NPU on supported CPUs and to the GPU for local LLM inference, and it offers both a GUI and a command-line interface. Read on to see how to download the GUI installer for Lemonade and run LLMs without sending your data to the cloud.

Download Lemonade exe (AMD LLM software) for Windows 10
Lemonade_Server_Installer.exe link


Currently Trending LLM models for local use
Model Name | Repository | Description | Download Link |
---|---|---|---|
Qwen3-30B-A3B-Instruct-2507-GGUF | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF | Updated version of Qwen3-30B-A3B non-thinking mode with enhancements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. Supports a context length of up to 262,144 tokens. | Hugging Face |
Qwen3-Coder-30B-A3B-Instruct-GGUF | unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | Coding-focused variant of Qwen3-30B-A3B-Instruct, optimized for programming tasks with support for long outputs up to 65,536 tokens. Features improved reasoning and tool usage. | Hugging Face |
gpt-oss-120b-GGUF | unsloth/gpt-oss-120b-GGUF | Open-weight Mixture-of-Experts (MoE) model from OpenAI with 117B total parameters (5.1B active per token), excelling in coding, math, health, and agentic tool use. Apache 2.0 licensed. | Hugging Face |
gpt-oss-20b-GGUF | unsloth/gpt-oss-20b-GGUF | Smaller open-weight MoE model from OpenAI with 21B total parameters (3.6B active per token), suitable for fine-tuning on consumer hardware. Strong performance in reasoning tasks. | Hugging Face |
GLM-4.5-Air-UD-Q4_K_XL-GGUF | unsloth/GLM-4.5-Air-GGUF | Quantized (UD-Q4_K_XL) version of GLM-4.5-Air, a high-performance MoE model praised for fast tool calls, reasoning, and agentic capabilities. Supports up to 32K context. | Hugging Face |
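All of the GGUF files above are hosted on Hugging Face, whose direct-download URLs follow a fixed `resolve/main` pattern. A small helper for building such a URL; the quantization filename below is illustrative only, so browse the repo's "Files" tab for the exact name:

```python
def hf_gguf_url(repo_id: str, filename: str) -> str:
    """Direct-download URL for a file in a Hugging Face repo (main branch)."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

# Example: a quantized file from the gpt-oss-20b repo in the table above.
# NOTE: the exact filename is an assumption -- check the repo's file list.
url = hf_gguf_url("unsloth/gpt-oss-20b-GGUF", "gpt-oss-20b-Q4_K_M.gguf")
print(url)
```

Lemonade, Jan, and LM Studio can all pull GGUF models from Hugging Face directly, so this is mainly useful if you want to fetch a file yourself with `curl` or `wget`.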
Alternatives to Ryzen AI LLM Software:
If Lemonade is not your cup of tea 😀 then we have some other recommendations that work with both Nvidia and AMD graphics cards and are quite popular too:
Jan AI – The one I currently use; open source like Lemonade
LM Studio – Closed source, but another brilliant alternative
App | Description | Download Link |
---|---|---|
Jan AI | Open-source ChatGPT alternative that runs 100% offline on your computer, supporting local LLMs like those from Hugging Face. | Official Download |
LM Studio | Desktop app to discover, download, and run local LLMs (e.g., Llama, Gemma, Qwen) privately on your machine. | Official Download |
Lemonade Intro Video
Lemonade software supported AMD GPU Series
GPU Series | Architecture | Key Examples | Notes |
---|---|---|---|
Radeon 7000 Series | RDNA 3 | RX 7900 XTX, RX 7800 XT | Discrete GPUs, strong Vulkan support for high-end LLM inference, hybrid capable with Ryzen AI. |
Radeon 9000 Series | RDNA 4 | RX 9070 XT, RX 9070 | Latest gen (2025), enhanced AI/RT features; primary focus for new deployments. |
Ryzen AI Integrated GPUs (7000/8000/300 Series) | RDNA 2/3/3.5 | Ryzen 7 7840HS iGPU, Ryzen AI 9 HX 370 iGPU | Integrated in APUs, Vulkan-accelerated; essential for hybrid NPU+iGPU on laptops/desktops. |
Older AMD GPUs with Vulkan drivers (e.g., the RX 6000 series on RDNA 2) can also work, but performance is best on the hardware above. ROCm support (on Linux) targets similar modern hardware.