Resource Hub

Curated SDKs, models, guides, tools, and papers for NPU-accelerated generative AI development.

⚙️
SDK

ONNX Runtime

Cross-platform inference engine with NPU execution providers: DirectML for AMD XDNA, OpenVINO for Intel NPU, and QNN for Qualcomm Hexagon. The primary runtime for portable NPU deployment.
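A minimal sketch of provider selection with ONNX Runtime. The execution-provider names below are real ONNX Runtime identifiers; the preference order and the `model.onnx` path are illustrative assumptions, and the session creation is shown commented since it requires `onnxruntime` and a model file:

```python
def select_providers(available):
    """Pick NPU execution providers first, with CPU as the guaranteed fallback."""
    preferred = [
        "QNNExecutionProvider",       # Qualcomm Hexagon
        "OpenVINOExecutionProvider",  # Intel NPU
        "DmlExecutionProvider",       # DirectML (AMD XDNA and others on Windows)
        "CPUExecutionProvider",       # always-available fallback
    ]
    return [p for p in preferred if p in available]

# With onnxruntime installed, usage would look like (path is a placeholder):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",
#     providers=select_providers(ort.get_available_providers()))
```

Keeping `CPUExecutionProvider` last means the same code runs on machines without an NPU, just without acceleration.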

🔷
SDK

Intel OpenVINO Toolkit

Intel's open-source inference toolkit optimized for Intel NPU. Supports model conversion from PyTorch, TensorFlow, and ONNX with automatic NPU acceleration via the AUTO device plugin.

🔴
SDK

AMD Ryzen AI SDK

AMD's developer SDK for Ryzen AI series NPUs. Provides Vitis AI integration, ONNX Runtime DirectML extensions, and profiling tools targeting XDNA and XDNA 2 architectures.

🌐
SDK

Qualcomm AI Engine (QNN SDK)

The Qualcomm Neural Network SDK for Snapdragon NPU (Hexagon DSP). Supports model conversion, runtime execution, and hardware-specific optimizations for Snapdragon X Elite and X Plus.

🪟
SDK

DirectML

Microsoft's low-level ML API for Windows hardware acceleration. DirectML is the backbone of NPU support in ONNX Runtime and Windows AI APIs — essential reading for Windows NPU developers.

🤖
SDK

Windows AI APIs (WinML + Phi Silica)

Windows' built-in AI APIs for NPU-accelerated inference, including Phi Silica (Microsoft's on-device LLM), OCR, and image description APIs. Available in Windows 11 24H2+ on Copilot+ PCs.

🦙
Model

LLaMA 3.2 (NPU-Optimized)

Meta's LLaMA 3.2 1B and 3B variants, pre-quantized to INT4 with ONNX export, ready for deployment via ONNX Runtime on AMD XDNA and Intel NPU. Available on Hugging Face in GGUF and ONNX formats.

💎
Model

Phi-3.5 Mini INT4

Microsoft's Phi-3.5 Mini language model, pre-optimized with Olive for NPU targets. At 3.8B parameters with INT4 quantization, it achieves excellent quality-speed tradeoffs on all supported NPUs.

🎤
Model

Whisper Large v3 (NPU ONNX)

OpenAI Whisper Large v3 exported to ONNX with INT8 quantization for NPU-accelerated real-time speech recognition. Includes a ready-to-run Python demo with DirectML execution provider.

🎨
Model

Stable Diffusion ONNX (NPU)

Stable Diffusion XL Turbo and SDXL-Lightning pipelines exported to ONNX with DirectML NPU acceleration. Achieves 1-4 step inference on Copilot+ PC hardware with competitive quality.

📖
Guide

Getting Started: Your First NPU Inference

Step-by-step guide from environment setup to running your first model on NPU hardware. Covers Windows 11 setup, driver installation, ONNX Runtime configuration, and a working Phi-3.5 demo.

📐
Guide

INT4 Quantization with Olive: Full Pipeline

Complete walkthrough for quantizing any HuggingFace LLM to INT4 using Microsoft Olive, with NPU-specific calibration, accuracy evaluation, and ONNX export ready for deployment.
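To make the core step of the pipeline concrete, here is a toy sketch of group-wise symmetric INT4 quantization in pure Python. This is a simplified illustration of the idea, not Olive's implementation; the group size and the example weights are arbitrary assumptions:

```python
def int4_group_quantize(weights, group_size=32):
    """Group-wise symmetric INT4: one float scale per group of weights.

    Signed INT4 spans [-8, 7]; symmetric schemes typically clamp to +/-7.
    """
    qmax = 7
    quantized, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / qmax or 1.0  # avoid 0 for all-zero groups
        scales.append(scale)
        quantized.append([round(w / scale) for w in group])
    return quantized, scales

def dequantize(quantized, scales):
    """Reconstruct float weights: integer code times its group's scale."""
    return [q * s for grp, s in zip(quantized, scales) for q in grp]

# Example: quantize a small synthetic weight tensor and check reconstruction.
weights = [0.01 * i for i in range(-64, 64)]
q, s = int4_group_quantize(weights)
recovered = dequantize(q, s)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

Smaller groups give each scale less dynamic range to cover, so reconstruction error drops at the cost of storing more scales; real pipelines like Olive's additionally calibrate on activation data.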

🔧
Guide

Profiling NPU Workloads on Windows

How to use AMD uProf, Intel VTune, and Windows Performance Analyzer to profile NPU utilization, identify bottlenecks, and optimize your inference pipeline.

📊
Tool

NPU Benchmark

Universal NPU performance testing tool that benchmarks AMD XDNA, Intel NPU, and Qualcomm Hexagon with a unified score. Real AI workloads, transparent methodology, global leaderboard.

🫒
Tool

Microsoft Olive

Hardware-aware model optimization toolchain for NPU targets. Automates quantization, pruning, and ONNX export pipelines with NPU-specific optimizations. Open source on GitHub.

🧪
Tool

Netron — ONNX Model Visualizer

Browser-based visualizer for ONNX, TensorFlow, and CoreML models. Essential for inspecting model graphs, verifying NPU-compatible operator sets, and debugging export issues.

📄
Paper

LLM Inference on Mobile NPUs: Survey 2024

Comprehensive survey of LLM inference techniques optimized for mobile and edge NPUs, covering quantization strategies, memory bandwidth constraints, and deployment frameworks across leading platforms.

📄
Paper

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

MIT HAN Lab's AWQ paper — the quantization method behind many of the most efficient INT4 NPU deployments. Key reading for understanding why INT4 quantization works well for on-device LLM inference.
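AWQ's central observation can be demonstrated in a few lines: weight channels that see large activations matter most, and scaling them up before quantization (folding the inverse scale into the activations) preserves their precision. A toy sketch, with made-up weights, activations, and a hand-picked scale rather than AWQ's search procedure:

```python
def quantize_dequantize(weights, n_bits=4):
    """Symmetric round-to-nearest quantization of one weight group."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

# Toy linear layer: channel 0 sees much larger activations (it is "salient").
weights = [0.1, 1.0, -0.8, 0.05]
acts = [100.0, 1.0, 1.0, 1.0]
y_true = sum(w * a for w, a in zip(weights, acts))

# Naive INT4: the salient channel's rounding error is amplified 100x.
y_naive = sum(w * a for w, a in zip(quantize_dequantize(weights), acts))

# AWQ idea: scale the salient weight up before quantizing, divide it back out
# afterwards (equivalently, fold 1/s into the activation path).
s = [4.0, 1.0, 1.0, 1.0]  # per-channel scales; AWQ derives these from activation stats
wq = quantize_dequantize([w * si for w, si in zip(weights, s)])
y_awq = sum((wqi / si) * a for wqi, si, a in zip(wq, s, acts))

err_naive = abs(y_true - y_naive)
err_awq = abs(y_true - y_awq)
```

Both variants store INT4 weights; the activation-aware version simply spends the quantization grid where the activations say it matters, which is why the output error shrinks.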

Add a Resource

Know a great SDK, model, guide, or tool that belongs here? Submit it to the community resource hub. We review and publish within 48 hours.