That’s insane. You’re basically talking about a custom implementation of pytorch autograd right? How long did it take?
Roughly one year. No GPU code however for that project as the target library is CPU-only anyway so not really comparable to PyTorch (and PyTorch is more than just the autodiff), but there was lots of SIMD vectorization. Yeah you could train a neural network on CPU with it if you want, and the expression template stuff I talked about would be somewhat equivalent to PyTorch’s operator fusion, but the target use is more quant finance code.