NPU GenAI
The Developer Hub for On-Device AI
Resources, guides, and community for developers building with NPU-accelerated generative AI — across AMD, Intel, Qualcomm, and beyond.
Featured Topics
Start here — popular guides and resources from the community
Getting Started with NPU Development
From zero to your first NPU-accelerated inference. Set up your environment, install the right SDK, and run a model on-device in under 30 minutes.
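If you want a preview before the full walkthrough, the sketch below checks which ONNX Runtime execution providers your machine exposes and runs one inference. The model file name and input shape are placeholders; substitute your own exported ONNX model.

```python
# Minimal sketch: list the execution providers ONNX Runtime can see, then
# run a single inference. "model.onnx" and the input shape are placeholders.
import numpy as np
import onnxruntime as ort

print(ort.get_available_providers())
# e.g. ['DmlExecutionProvider', 'CPUExecutionProvider'] with onnxruntime-directml

session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],  # CPU as fallback
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = session.run(None, {input_name: dummy})
print("Output shape:", outputs[0].shape)
```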
Running LLaMA 3.2 on NPU
Deploy Meta's LLaMA 3.2 with INT4 quantization on your NPU using ONNX Runtime and Olive. Includes benchmark results across AMD, Intel, and Qualcomm hardware.
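As a rough sketch of the deployment side, the loop below generates text from an INT4 ONNX model with the onnxruntime-genai package. The model folder name and prompt are placeholders, and the generation API shown matches recent onnxruntime-genai releases; check it against the version you install.

```python
# Sketch: token-by-token generation from a quantized ONNX LLM using
# onnxruntime-genai. "llama-3.2-1b-int4" is a placeholder for the folder
# produced by the Olive export/quantization pipeline the guide describes.
import onnxruntime_genai as og

model = og.Model("llama-3.2-1b-int4")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Explain what an NPU is in one sentence."))

while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```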
ONNX Runtime NPU Execution Provider
How to configure ONNX Runtime to target your NPU with DirectML and platform-specific execution providers. Covers model export, EP selection, and optimization flags.
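The core idea is a provider priority list: ONNX Runtime assigns each graph node to the first listed provider that supports it and falls back down the list. A minimal sketch, assuming the platform-specific EP packages are installed (each ships in its own wheel, such as onnxruntime-qnn or onnxruntime-directml):

```python
import onnxruntime as ort

# Platform-specific NPU execution providers (each requires its own
# ONNX Runtime build/wheel):
#   QNNExecutionProvider      - Qualcomm Hexagon
#   OpenVINOExecutionProvider - Intel NPU
#   VitisAIExecutionProvider  - AMD Ryzen AI
#   DmlExecutionProvider      - DirectML (Windows, vendor-agnostic)
preferred = [
    "QNNExecutionProvider",
    "OpenVINOExecutionProvider",
    "VitisAIExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",  # always keep a CPU fallback last
]

available = set(ort.get_available_providers())
providers = [p for p in preferred if p in available]

session = ort.InferenceSession("model.onnx", providers=providers)
print("Session is using:", session.get_providers())
```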
Model Quantization with Olive
Microsoft Olive makes it easy to quantize and optimize models for NPU targets. This guide walks through INT4/INT8 quantization pipelines for common LLMs and vision models.
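Under the hood, Olive chains optimization passes such as ONNX Runtime's own quantizers. As a standalone illustration of the INT8 step (not Olive's API itself), here is ONNX Runtime's dynamic quantizer, which Olive can invoke as a pass; in a real Olive workflow you would declare this in the workflow config rather than calling it directly.

```python
# Illustration of the INT8 weight quantization step that an Olive pipeline
# orchestrates, using ONNX Runtime's built-in dynamic quantizer directly.
# "model.onnx" is a placeholder for your exported model.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,  # INT4 requires the newer block-wise quantizers
)
```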
Whisper Speech Recognition on NPU
Real-time transcription with OpenAI Whisper running entirely on your NPU. Latency comparisons between NPU, CPU, and GPU across multiple device categories.
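If you want to reproduce the comparison on your own machine, a minimal timing harness looks like the sketch below. The encoder file name and the 80 x 3000 log-mel input shape are placeholders for an exported Whisper encoder.

```python
import time
import numpy as np
import onnxruntime as ort

# Sketch: average per-inference latency of the same exported model under
# different execution providers. File name and input shape are placeholders.
def bench(providers: list[str], runs: int = 20) -> float:
    sess = ort.InferenceSession("whisper_encoder.onnx", providers=providers)
    name = sess.get_inputs()[0].name
    mel = np.zeros((1, 80, 3000), dtype=np.float32)  # placeholder log-mel input
    sess.run(None, {name: mel})  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {name: mel})
    return (time.perf_counter() - start) / runs * 1000.0

for providers in (["DmlExecutionProvider"], ["CPUExecutionProvider"]):
    print(f"{providers[0]}: {bench(providers):.1f} ms/inference")
```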
OpenVINO on Intel NPU Deep Dive
Intel's OpenVINO toolkit for deploying models on Intel NPUs, from model conversion to async inference pipelines, with coverage of the Core Ultra 100H and 200V series.
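A condensed sketch of the flow that deep dive walks through, assuming a recent OpenVINO Python install and a machine that exposes an NPU device; the model path and input shape are placeholders.

```python
import numpy as np
import openvino as ov

core = ov.Core()
print(core.available_devices)  # expect "NPU" listed on Core Ultra machines

# compile_model accepts OpenVINO IR (.xml) or ONNX paths directly.
compiled = core.compile_model("model.xml", device_name="NPU")
request = compiled.create_infer_request()

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
request.start_async({0: dummy})  # queue work without blocking the caller
request.wait()                   # block until results are ready

print("Output shape:", request.get_output_tensor(0).data.shape)
```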
Latest Updates
What's new in the NPU + GenAI ecosystem
ONNX Runtime 1.20 Adds AMD XDNA 3 Support
The latest ONNX Runtime release brings native execution provider support for AMD Ryzen AI Max series chips, with up to 40% throughput improvement over the previous NPU EP implementation.
Qualcomm AI Hub Expands Model Library to 200+ Models
Qualcomm's AI Hub now offers over 200 pre-optimized models for Snapdragon X NPU, including Phi-3.5 Mini, Stable Diffusion XL, and Whisper Large v3 — all ready for Hexagon deployment.
Intel OpenVINO 2025.1 Improves NPU Throughput by 35%
OpenVINO 2025.1 ships with a revamped NPU compiler backend delivering significant latency and throughput gains on Core Ultra 200V series, particularly for transformer-based models.
Join the Community
Connect with NPU developers around the world
Discord Server
Real-time discussions, help channels for each platform, and weekly Q&A sessions with NPU engineers.
Join Discord →
GitHub Organization
Open-source NPU tools, model optimization scripts, and community benchmark contributions. PRs welcome.
View on GitHub →
Weekly Newsletter
SDK release notes, curated tutorials, and NPU hardware news — delivered every Friday.
Subscribe →
Developer Forum
Searchable Q&A, code snippets, and long-form technical discussions indexed for future reference.
Browse Forum →