Supporting AI innovation from ideation to results.
AI frameworks optimized for rapid research iteration do not seamlessly transition into the infrastructure required for large-scale training. This fragmented pipeline creates redundant engineering effort and slows iteration cycles. Zigrad provides a path to performance that preserves the natural development workflow researchers prefer, bridging research and engineering. Using Zigrad you can (see the sketch after this list):
- Experiment using high-level, PyTorch-like abstractions
- Gradually opt into fine-grained control and performance optimizations
- Access low-level primitives and assert control without switching frameworks, translating code, or building complex extensions
- Quickly transition research code to high-performance training
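As a rough sketch of what that workflow can look like in practice (note: the `zigrad` import and everything reached through `zg.` here, `nn.Linear`, `optim.SGD`, `Tensor.rand`, `loss.mse`, are illustrative placeholders, not necessarily Zigrad's actual API):

```zig
// Illustrative sketch only: the zigrad module layout and these type/function
// names are assumptions made for this example, not the library's real API.
const std = @import("std");
const zg = @import("zigrad");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    // High-level, PyTorch-like prototyping: a layer, an optimizer, some data.
    var model = try zg.nn.Linear(f32).init(alloc, 784, 10);
    var opt = zg.optim.SGD(f32).init(model.params(), .{ .lr = 1e-2 });

    const x = try zg.Tensor(f32).rand(alloc, &[_]usize{ 32, 784 });
    const target = try zg.Tensor(f32).rand(alloc, &[_]usize{ 32, 10 });

    // The usual eager training step: forward, loss, backward, update.
    const y = try model.forward(x);
    const loss = try zg.loss.mse(y, target);
    try loss.backward();
    try opt.step();
}
```

When a hot path needs more control, the same program can drop to lower-level primitives and explicit memory management without rewriting the model in another framework.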
(Demo video: zigrad-demo.mp4)
Fast
2.5x+ speedup over a compiled PyTorch model on Apple Silicon and 1.5x on x86. Expect similar performance gains across more architectures and platforms as MKL/CUDA support improves and Zigrad's ML graph compiler becomes operational.
*TensorFlow excluded to keep the chart scale readable (too slow).
Flexible
Zigrad supports research workflows with high-level abstractions for rapid prototyping and integrations such as TensorBoard and MuJoCo, and it supports the transition of research code to training infrastructure.
Zigrad supports research through:
- Easy-to-use, PyTorch-like ergonomics
- A general-purpose automatic differentiation system for n-dimensional data
- Eager execution and a dynamic computation graph by default (see the sketch after this list)
- Computation graph tracing and visualization
- A design that naturally allows for custom differentiable operations
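For instance, the eager autograd workflow looks roughly like the sketch below; the constructor and op names (`Tensor.from`, `mul`, `sum`, `backward`, `grad`) are stand-ins for illustration, not necessarily the real identifiers:

```zig
// Sketch only: tensor constructors and op names are placeholders.
const std = @import("std");
const zg = @import("zigrad");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    // Eager execution: each op runs immediately and records itself in the
    // dynamic computation graph.
    const x = try zg.Tensor(f32).from(alloc, &[_]f32{ 1.0, 2.0, 3.0 }, &[_]usize{3});
    const w = try zg.Tensor(f32).from(alloc, &[_]f32{ 0.5, 0.5, 0.5 }, &[_]usize{3});
    const y = try x.mul(w); // elementwise product
    const s = try y.sum(); // reduce to a scalar

    // Reverse-mode pass over the recorded graph; d(s)/dx == w elementwise.
    try s.backward();
    std.debug.print("grad x = {any}\n", .{x.grad()});
}
```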
Zigrad supports engineering through:
- An architecture that enables deep control and customization through opt-in complexity
- Flexible tradeoffs between performance characteristics such as latency vs. throughput
- Hardware-aware optimizations tailored to specific use cases and system requirements
- Fine-grained memory management and allocation control (see the sketch after this list)
- Cross-platform compatibility without compromising performance
- A streamlined design that avoids abstraction layers or build systems that hinder aggressive optimizations
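On the memory side, the natural pattern is the standard Zig one of passing an explicit allocator. The `zg.Tensor(...).zeros` constructor below is a hypothetical name, but the `std.heap` usage is plain Zig and illustrates the kind of control this enables:

```zig
// The tensor constructor is a placeholder name; the allocator handling is
// standard Zig and shows the sort of control an explicit-allocator API gives.
const std = @import("std");
const zg = @import("zigrad");

pub fn main() !void {
    // Back a step's temporaries with a fixed buffer: no hidden heap traffic,
    // trivially reclaimed by resetting, and allocation failure is an error
    // you handle rather than a silent slowdown.
    var buf: [1 << 20]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buf);
    const alloc = fba.allocator();

    const activations = try zg.Tensor(f32).zeros(alloc, &[_]usize{ 32, 784 });
    _ = activations; // ... run the step ...

    fba.reset(); // reuse the same buffer for the next batch
}
```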
An example of tracing the computation graph generated by a fully connected neural network for MNIST (a sketch of the model definition follows the layer list):
- Input: batch of 28x28-pixel image samples
- Flatten: 28x28 -> 784
- FC1: Linear layer, 784 -> 128
- ReLU
- FC2: Linear layer, 128 -> 64
- ReLU
- FC3: Linear layer, 64 -> 10
- Output: value for each of the 10 classes
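A sketch of how that model might be written with high-level modules; the shapes mirror the list above, while `nn.Linear`, `nn.relu`, and the method names are illustrative assumptions rather than the actual Zigrad API:

```zig
// Sketch: layer types and method names are placeholders; the shapes follow
// the traced network (flatten -> 784 -> 128 -> 64 -> 10).
const zg = @import("zigrad");

pub fn Mlp(comptime T: type) type {
    return struct {
        fc1: zg.nn.Linear(T), // 784 -> 128
        fc2: zg.nn.Linear(T), // 128 -> 64
        fc3: zg.nn.Linear(T), // 64 -> 10

        const Self = @This();

        pub fn forward(self: *Self, x: zg.Tensor(T)) !zg.Tensor(T) {
            const batch = x.shape()[0];
            const flat = try x.reshape(&[_]usize{ batch, 784 }); // flatten 28x28
            var h = try self.fc1.forward(flat);
            h = try zg.nn.relu(h);
            h = try self.fc2.forward(h);
            h = try zg.nn.relu(h);
            return self.fc3.forward(h); // one logit per class
        }
    };
}
```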
We did not have to use Zigrad's modules to write this network at all, since Zigrad is backed by a capable autograd engine. Even when the same neural network is constructed dynamically through the autograd backend, Zigrad can still trace the graph and render it.
Note: since the graph is generated from the autograd information, we set the node labels by naming the tensors for the sake of the diagram.
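A rough sketch of what that autograd-only construction might look like; `setLabel`, the op names, and the render entry point are assumed names for illustration only, not Zigrad's documented API:

```zig
// Sketch only: op names, setLabel, and the render call are placeholders.
// The point is that the graph is traced from autograd records, so the node
// labels come from naming the tensors.
const std = @import("std");
const zg = @import("zigrad");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    var x = try zg.Tensor(f32).rand(alloc, &[_]usize{ 1, 784 });
    var w1 = try zg.Tensor(f32).rand(alloc, &[_]usize{ 784, 128 });
    x.setLabel("input");
    w1.setLabel("fc1.weight");

    // fc1 + ReLU built dynamically from raw autograd ops, no nn modules.
    var z1 = try x.matmul(w1);
    z1.setLabel("fc1");
    var h1 = try z1.relu();
    h1.setLabel("relu1");
    // ... fc2 and fc3 in the same way ...

    try zg.graph.render(alloc, h1, "mnist_mlp.dot"); // trace and visualize
}
```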
The only dependency is a BLAS library.
On Linux (or an Intel Mac) you have some options:
- MKL (recommended for best performance)
- See https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html
- A system installation is recommended for simplicity, although this can work with `conda`, for example; just make sure you adjust the library paths as necessary.
- OpenBLAS
- See https://github.com/OpenMathLib/OpenBLAS/wiki/Precompiled-installation-packages
- Likely available through your package manager as `libopenblas-dev` or `openblas-devel` (see the example after this list)
- Nothing :)
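For example, installing the OpenBLAS development package on common distros (package names vary slightly by distro and version):

```sh
# Debian/Ubuntu
sudo apt install libopenblas-dev

# Fedora/RHEL
sudo dnf install openblas-devel
```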
The `examples/` directory has some standalone templates you can take and modify; the zon files are pinned to commit hashes.
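For reference, a pinned dependency entry in a `build.zig.zon` file looks roughly like the fragment below; the commit and hash are placeholders, and on recent Zig versions `zig fetch --save <tarball-url>` will fill them in for you:

```zig
// Fragment of a build.zig.zon dependency table. <commit> and the hash are
// placeholders; `zig fetch --save` populates the real values.
.dependencies = .{
    .zigrad = .{
        .url = "https://github.com/Marco-Christiani/zigrad/archive/<commit>.tar.gz",
        .hash = "<hash printed by zig fetch>",
    },
},
```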
The hello-world example shows how to run a backward pass using the `GraphManager`. Note that in this very simple example the `GraphManager` is not strictly needed and the script could be simplified, but it is designed to get you familiar with the workflow.
git clone https://github.com/Marco-Christiani/zigrad/
cd zigrad/examples/hello-world
zig build run
Run the MNIST demo:
cd zigrad/examples/mnist
make help
make
A lot is planned, and we are hoping for support from the Zig community so we can accomplish some of the more ambitious goals.
- More comprehensive MKL and CUDA support (in progress)
- Support for popular formats like ONNX and ggml.
- Standardized benchmarking procedures (always an ongoing effort)
- Lazy tensors
- Static graph optimization
- Dynamic graph compiler
- MLIR
- ZML translation for inference
- Apache TVM integration
- More examples like LLMs, physics and robotic control, etc.
- Documentation. As the API stabilizes, more documentation will be added; for now, the examples are designed to be quickstart guides.
- Effort has been directed toward performant primitives, so not many layer types have been implemented
- e.g. conv, pooling, etc. are test implementations for verification; they are slow and unoptimized, and I would not recommend using them
- Join the Discord and hop into the dev channels
- Any open issue is available for development; just leave a comment mentioning your interest and I can provide support to help you get started if necessary
- Otherwise, please open an issue before working on a PR
- If you are interested in contributing but do not know where to start, open an issue or leave a comment