ZML - Model to Metal

ZML is a production inference stack, purpose-built to decouple AI workloads from proprietary hardware.

Any model, many hardwares, one codebase, peak performance.

Compiled directly to NVIDIA, AMD, TPU, Trainium for peak hardware performance on any accelerator. No rewriting.

Python runtimes? Non, merci.
Hidden state? Non, merci.
Abstractions overhead? Non, merci.

We build inference from model to the metal because this is where performance lies.

Explicit over implicit.
Composability over systems.
Predicatibility over magic.