Microsoft and its hardware partners recently launched the first Copilot+ PCs, powered by Arm CPUs with built-in neural processing units. They’re an interesting departure from the previous mainstream x64 platforms, focused initially on Qualcomm’s Snapdragon X Arm processors and running the latest builds of Microsoft’s Windows on Arm. Buy one now and it’s already running the 24H2 build of Windows 11, at least a couple of months before 24H2 reaches other hardware.
Out of the box, a Copilot+ PC is fast, with all the features we’ve come to expect from a modern laptop. Battery life is excellent, and Arm-native benchmarks are as good as, or in some cases better than, those of most Intel- or AMD-based hardware. They even give Apple’s M2 and M3 Arm processors a run for their money. That makes these machines ideal for most common development tasks using Visual Studio and Visual Studio Code. Both have Arm64 builds, so they don’t need the added complexity of Windows on Arm’s Prism emulation layer.
Arm PCs for Arm development
With GitHub or another version control system to manage code, developers working on Arm versions of applications can quickly clone a repository, create a new branch, build, test, and make local changes before pushing the branch back to the main repository, ready to use pull requests to merge any changes. This approach should speed up developing Arm versions of existing applications, with capable hardware now part of the software development life cycle.
To be honest, that’s not much of a change from any of the earlier Windows on Arm hardware. If that’s all you need, this new generation simply brings a wider choice of suppliers: if you have a purchasing agreement with Dell, HP, or Lenovo, you can quickly add Arm hardware to your fleet without being locked into Microsoft’s Surface line.
The most interesting feature of the new devices is the built-in neural processing unit (NPU). Offering at least 40 TOPS of additional compute capability, the NPU brings advanced local inference capabilities to PCs, supporting small language models and other machine learning features. Microsoft is initially showcasing these with a live captioning tool and a selection of different real-time video filters in the device camera processing path. (The planned Recall AI indexing tool is being redeveloped to address security concerns.)
Build your own AI on AI hardware
The bundled AI apps are interesting and potentially useful, but perhaps they’re better thought of as pointers to the capabilities of the hardware. As always, Microsoft relies on its developers to deliver more complex applications that can push the hardware to its limits. That’s what the Copilot Runtime is for, with support for the ONNX inference runtime and, though not yet in the shipping Windows release, a version of its DirectML inferencing API for Copilot+ PCs and their Qualcomm NPUs.
Although DirectML support would simplify building and running AI applications, Microsoft has already started shipping some of the necessary tools to build your own AI applications. Don’t expect it to be easy, though; many pieces are still missing, leaving an end-to-end AI development workflow hard to assemble.
Where do you start? The obvious place is the AI Toolkit for Visual Studio Code. It’s designed to help you try out and tune small language models that can run on PCs and laptops, using CPU, GPU, and NPU. The latest builds support Arm64, so you can install the AI Toolkit and Visual Studio Code on your development devices.
Working with AI Toolkit for Visual Studio Code
Installation is quick, using the built-in Marketplace tools. If you’re planning on building AI applications, it’s worth installing both the Python and C# tools, as well as tools for connecting to GitHub or other source code repositories. Other useful features to add include Azure support and the necessary extensions to work with the Windows Subsystem for Linux (WSL).
Once installed, you can use AI Toolkit to evaluate a library of small language models that are intended to run on PCs and edge hardware. Five are currently available: four different versions of Microsoft’s own Phi-3 and an instance of Mistral 7B. They all download locally, and you can use AI Toolkit’s model playground to experiment with context instructions and user prompts.
Unfortunately, the model playground doesn’t use the NPU, so you can’t get a feel for how a model will perform on that hardware. Even so, it’s good to experiment with developing the context for your application and see how the model responds to user inputs. It would be nice to have a way to build a fuller-featured application around the model—for example, implementing Prompt Flow or a similar AI orchestration tool to experiment with grounding your small language model in your own data.
Don’t expect to be able to fine-tune a model on a Copilot+ PC. They meet most of the requirements, with support for the correct Arm64 WSL builds of Ubuntu, but the Qualcomm hardware doesn’t include an Nvidia GPU. Its NPU is designed for inference only, so it doesn’t provide the capabilities needed by fine-tuning algorithms.
That doesn’t stop you from using an Arm device as part of a fine-tuning workflow, as it can still be used with a cloud-hosted virtual machine that has access to a whole or fractional GPU. Both Microsoft Dev Box and GitHub Codespaces have GPU-enabled virtual machine options, though these can be expensive if you’re running a large job. Alternatively, you can use a PC with an Nvidia GPU if you’re working with confidential data.
Once you have a model you’re happy with, you can start to build it into an application. This is where there’s a big hole in the Copilot+ PC AI development workflow, as you can’t go directly from AI Toolkit to code editing. Instead, start by finding the hidden directory that holds the local copy of the model you’ve been testing (or download a tuned version from your fine-tuning service of choice), set up an ONNX runtime that supports the PC’s NPU, and use that to start building and testing code.
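The toolkit doesn’t surface that directory in its UI, so a short script helps track the files down. Here’s a minimal sketch, assuming the models are cached under a .aitk folder in your home directory; that folder name is an assumption that may vary between AI Toolkit builds, so adjust the root to wherever your install stores its downloads:

```python
# Minimal sketch: finding locally cached model files. The .aitk cache
# location is an assumption and may differ between AI Toolkit builds.
from pathlib import Path

root = Path.home() / ".aitk" / "models"  # assumed cache root; adjust as needed
for model_file in root.rglob("*.onnx"):
    size_mb = model_file.stat().st_size / (1024 * 1024)
    print(f"{model_file} ({size_mb:,.0f} MB)")
```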
Building an AI runtime for Qualcomm NPUs
Although you could build an Arm ONNX environment from source, all the pieces you need are already available, so all you have to do is assemble your own runtime environment. AI Toolkit does include a basic web server endpoint for a loaded model, and you can use this with tools like Postman to see how it works with REST inputs and outputs, as if you were using it in a web application.
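In practice, that endpoint behaves like an OpenAI-style chat completion API, so a few lines of Python are enough to exercise it. Here’s a minimal sketch; the port (5272) and route are assumptions based on current AI Toolkit builds, and the model name is a placeholder, so check the extension’s output for the address and identifier it actually reports:

```python
# Minimal sketch: querying AI Toolkit's local REST endpoint.
# The port, route, and model name below are assumptions; use the
# values the AI Toolkit extension reports for your loaded model.
import requests

response = requests.post(
    "http://127.0.0.1:5272/v1/chat/completions",  # assumed local endpoint
    json={
        "model": "Phi-3-mini-4k-instruct",  # placeholder model identifier
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Explain what an NPU does in one sentence."},
        ],
        "max_tokens": 200,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```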
If you prefer to build your own code, there is an Arm64 build of Python 3 for Windows, as well as a prebuilt version of the ONNX execution provider for Qualcomm’s QNN NPUs. This should allow you to build and test Python code from within Visual Studio Code once you’ve validated your model using CPU inference inside AI Toolkit. Although it’s not an ideal approach, it does give you a route to using a Copilot+ PC as your AI development environment. You could even use this with the Python version of Microsoft’s Semantic Kernel AI agent orchestration framework.
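As a rough illustration of what that looks like, here’s a sketch of loading a model with ONNX Runtime’s QNN execution provider. It assumes the onnxruntime-qnn package is installed in an Arm64 Python environment; the model path, backend library name, and input shape are placeholders for your own model:

```python
# Minimal sketch: running a quantized ONNX model on the Qualcomm NPU
# through ONNX Runtime's QNN execution provider. Paths and shapes
# are placeholders; adjust them for your model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.qdq.onnx",  # a quantized, QDQ-format model
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "QnnHtp.dll"}],  # HTP backend targets the NPU
)

# Build a dummy input matching the model's first input; a real
# application would feed preprocessed data here.
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # example image-model shape

outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```

If the QNN provider isn’t available, ONNX Runtime falls back to CPU execution, which makes it easy to validate the same code on other hardware before moving to the NPU.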
C# developers aren’t left out. There’s a .NET build of the QNN ONNX tool available on NuGet, so you can quickly take local models and include them in your code. You can use AI Toolkit and Python to validate models before embedding them in .NET applications.
It’s important to understand the limitations of the QNN ONNX tool: it’s designed only for quantized models, so you need to ensure that any model you use is quantized to 8-bit or 16-bit integers. Check the documentation before using an off-the-shelf model to see whether you need to make any changes before including it in your applications.
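Quantization itself can be done with ONNX Runtime’s own tooling. The following sketch uses its static quantization API to produce an 8-bit QDQ model; the calibration reader feeds random data purely for illustration, where a real workflow needs representative samples, and all paths and input names are placeholders:

```python
# Minimal sketch: statically quantizing a float ONNX model to 8-bit
# integers so it can run on the NPU. The random calibration data is
# illustrative only; real calibration needs representative inputs.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantType,
    quantize_static,
)

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a handful of random samples as calibration data."""
    def __init__(self, input_name: str, samples: int = 8):
        self._data = iter(
            {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(samples)
        )

    def get_next(self):
        return next(self._data, None)

quantize_static(
    "model.fp32.onnx",                 # placeholder: source float model
    "model.qdq.onnx",                  # placeholder: quantized output
    RandomCalibrationReader("input"),  # replace "input" with your model's input name
    activation_type=QuantType.QUInt8,  # 8-bit activations
    weight_type=QuantType.QUInt8,      # 8-bit weights
)
```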
So close, and yet so far
Although the Copilot+ PC platform (and the associated Copilot Runtime) shows a lot of promise, the toolchain is still fragmented. As it stands, it’s hard to go from model to code to application without stepping out of your IDE. However, it’s possible to see how a future release of the AI Toolkit for Visual Studio Code could bundle the QNN ONNX runtimes, as well as make them available to use through DirectML for .NET application development.
That future release needs to be sooner rather than later, as devices are already in developers’ hands. Getting AI inference onto local devices is an important step in reducing the load on Azure data centers.
Yes, the current state of Arm64 AI development on Windows is disappointing, but that’s more because it’s possible to see what the platform could be than because of any lack of tools. Many of the necessary elements are here; what’s needed is a way to bundle them into an end-to-end AI application development platform so we can get the most out of the hardware.
For now, it might be best to stick with the Copilot Runtime and the built-in Phi-Silica model with its ready-to-use APIs. After all, I’ve bought one of the new Arm-powered Surface laptops and want to see it fulfill its promise as the AI development hardware I’ve been hoping to use. Hopefully, Microsoft (and Qualcomm) will fill the gaps and give me the NPU coding experience I want.
Copyright © 2024 IDG Communications, Inc.