Track 3
20201112T153000
20201112T160000
Accelerate Distributed Stochastic Gradient Descent for Nonconvex Optimization with Momentum
DESCRIPTION: Workshop

Accelerate Distributed Stochastic Gradient Descent for Nonconvex Optimization with Momentum

Cong, Liu

Momentum methods have been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many desirable properties. We propose a momentum method for such model-averaging approaches. At the level of each individual learner, plain stochastic gradient descent (SGD) is applied. At the meta level (the global learner), a single momentum term is applied; we call it block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training but also achieves better results.
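The two-level scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one common form of block momentum, in which each round's "block update" is the difference between the averaged learner models and the current global model, and the global model is updated as v ← beta·v + (w_avg − w), w ← w + v. The paper's exact update (for example, a separate block learning rate or a Nesterov-style variant) may differ. The toy quadratic objective, the Gaussian gradient noise, and the names `noisy_grad`, `local_sgd`, `block_momentum_train`, `curv`, `beta`, and `k_steps` are all hypothetical, and the learners are simulated sequentially rather than in parallel.

```python
import math
import random
from typing import List

Vector = List[float]


def noisy_grad(w: Vector, curv: Vector, rng: random.Random, noise: float) -> Vector:
    """Stochastic gradient of the hypothetical toy objective
    f(w) = 0.5 * sum(c_i * w_i**2), plus assumed Gaussian noise."""
    return [c * wi + rng.gauss(0.0, noise) for c, wi in zip(curv, w)]


def local_sgd(w: Vector, curv: Vector, rng: random.Random,
              lr: float, k_steps: int, noise: float) -> Vector:
    """One learner: K plain SGD steps starting from the global model."""
    w = list(w)
    for _ in range(k_steps):
        g = noisy_grad(w, curv, rng, noise)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def block_momentum_train(w0: Vector, curv: Vector, num_learners: int,
                         rounds: int, k_steps: int, lr: float,
                         beta: float, noise: float = 0.0, seed: int = 0) -> Vector:
    """K-step model averaging with one momentum term at the global level.

    With beta = 0.0 this reduces to plain K-step averaging.
    """
    rngs = [random.Random(seed + p) for p in range(num_learners)]
    w = list(w0)
    v = [0.0] * len(w)  # global ("block") momentum buffer
    for _ in range(rounds):
        # Every learner runs K local SGD steps from the same global model.
        local = [local_sgd(w, curv, rngs[p], lr, k_steps, noise)
                 for p in range(num_learners)]
        # K-step averaging: coordinate-wise mean of the learners' models.
        avg = [sum(ws) / num_learners for ws in zip(*local)]
        # Block update: how far averaging moved the global model this round.
        block = [a - wi for a, wi in zip(avg, w)]
        # Meta-level momentum: v <- beta * v + block, then w <- w + v.
        v = [beta * vi + bi for vi, bi in zip(v, block)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


if __name__ == "__main__":
    # Ill-conditioned toy problem (assumed): one steep and one flat direction.
    start, curv = [1.0, 1.0], [1.0, 0.01]
    for beta in (0.0, 0.9):
        w = block_momentum_train(start, curv, num_learners=4, rounds=50,
                                 k_steps=4, lr=0.1, beta=beta)
        print(f"beta={beta}: distance to optimum = {math.hypot(*w):.4f}")
```

On this noiseless toy problem the beta=0.9 run ends markedly closer to the optimum at the origin than plain averaging after 50 rounds, because the momentum buffer keeps accumulating progress along the flat direction. Setting `noise` above zero makes the learners' local models differ, so the averaging step does real work. The sketch illustrates the mechanism only and says nothing about the experimental results reported in the talk.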
