There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks.
However, Shampoo’s drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of…
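To make the comparison concrete, here is a minimal sketch of the standard Adam update, which keeps only two exponential running averages per parameter: one of the gradient and one of its elementwise square. The function name `adam_step`, the flat-list representation, and the toy objective are assumptions chosen for illustration; they are not taken from the SOAP paper or its repository. Shampoo's matrix preconditioners are deliberately left out.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on flat lists of floats (illustrative helper, not a library API).

    m and v are the running averages of the gradient and squared gradient;
    t is the 1-based step count used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Update the two running averages; this is all the state Adam keeps.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Correct the bias introduced by initializing both averages at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Toy usage (assumed objective): minimize f(x) = x^2, whose gradient is 2x.
x, m, v = [1.0], [0.0], [0.0]
for t in range(1, 201):
    grads = [2.0 * x[0]]
    x, m, v = adam_step(x, grads, m, v, t, lr=0.05)
```

The per-parameter state is two scalars, which is the low overhead that the comparison with Shampoo refers to.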
SOAP: Improving and Stabilizing Shampoo using Adam.
N Vyas, D Morwani, R Zhao, I Shapira…
Code: nikhilvyas/SOAP on GitHub.