<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>ZML - Model to Metal</title><link>https://zml.ai/</link><description>Recent content on ZML - Model to Metal</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 30 Mar 2026 16:00:00 +0100</lastBuildDate><atom:link href="https://zml.ai/index.xml" rel="self" type="application/rss+xml"/><item><title>Introducing zml-smi</title><link>https://zml.ai/posts/zml-smi/</link><pubDate>Mon, 30 Mar 2026 16:00:00 +0100</pubDate><guid>https://zml.ai/posts/zml-smi/</guid><description>&lt;p&gt;&lt;code&gt;zml-smi&lt;/code&gt; is a universal diagnostic and monitoring tool for GPUs, TPUs and NPUs.
It provides real-time insights into the performance and health of your hardware.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://zml.ai/img/posts/zml-smi/1.png" alt="zml-smi terminal output"&gt;&lt;/p&gt;
&lt;p&gt;It is a cross between &lt;code&gt;nvidia-smi&lt;/code&gt; and &lt;code&gt;nvtop&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It transparently supports every platform ZML supports: NVIDIA and AMD GPUs, Google TPUs,
and AWS Trainium devices. More platforms will be added as ZML
continues to expand its hardware support.&lt;/p&gt;</description></item><item><title>Introducing ZML/v2</title><link>https://zml.ai/posts/zml-v2/</link><pubDate>Tue, 24 Mar 2026 16:00:00 +0100</pubDate><guid>https://zml.ai/posts/zml-v2/</guid><description>&lt;p&gt;ZML is an inference stack built close to the hardware. It lowers models directly onto NVIDIA, AMD, TPU, and Trainium
targets from a single codebase, without depending on the Python-heavy runtime layers that most of the
ecosystem is built around.&lt;/p&gt;
&lt;p&gt;The guiding idea behind zml/v1 was simplicity: give ZML a model and its weights, and the system would take care of
compilation, placement, and execution for you. That made the first version approachable and effective for standard
deployments, but it also baked too much behavior into implicit global state. As the project pushed into partial
compilation, custom passes, sharding, quantization, and more backend-specific execution paths, those implicit shortcuts
became constraints. ZML/v2 is the rewrite that makes those concepts explicit: platform ownership, compilation, memory,
IO, and placement are now first-class, so advanced use cases can be expressed directly instead of forced through
workarounds.&lt;/p&gt;</description></item><item><title>About</title><link>https://zml.ai/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://zml.ai/about/</guid><description>&lt;p&gt;The ZML Blog is a technical publication about running modern AI systems in production.&lt;/p&gt;
&lt;p&gt;We write about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;inference systems&lt;/li&gt;
&lt;li&gt;compiler architecture&lt;/li&gt;
&lt;li&gt;hardware portability&lt;/li&gt;
&lt;li&gt;deployment ergonomics&lt;/li&gt;
&lt;li&gt;observability and operating discipline&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The editorial bias is simple: practical speed, maintainable systems, and fewer hidden compromises.&lt;/p&gt;</description></item></channel></rss>