Posts | ZML - Model to Metal

Introducing ZML/v2

March 24, 2026

ZML is an inference stack built close to the hardware. It lowers models directly onto NVIDIA, AMD, TPU, and Trainium targets from a single codebase, without relying on, or inheriting the drawbacks of, the Python-heavy runtime layers that most of the ecosystem is built around.

compilers / inference / performance