The specific goal of pipelining the gradient update phase with the forward pass computation is to maximize hardware utilization by eliminating idle time, a strategy known as overlapping computation and communication. In multi-GPU training, a training iteration typically follows a sequential cycle: first, a forward pass calculates the network output; second, a backward pass calculates gradients; and third, an optimizer step applies the computed gradients to update the model's weights. Run strictly back to back, these phases leave the GPUs idle whenever gradients are being exchanged between devices; overlapping hides that communication time behind ongoing computation.
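One common realization of this overlap in data-parallel training is to launch the all-reduce for each parameter's gradient asynchronously as soon as that gradient is ready, so the communication runs while the backward pass is still computing gradients for earlier layers. Below is a minimal sketch, assuming PyTorch 2.1+ with torch.distributed already initialized (e.g. via torchrun); the train_step function and its arguments are illustrative placeholders, not the API of any particular library:

```python
import torch
import torch.distributed as dist

def train_step(model, optimizer, loss_fn, inputs, targets):
    pending = []  # (async all-reduce handle, parameter) pairs

    def make_hook(param):
        # Fires once this parameter's gradient has been accumulated into
        # param.grad, so the all-reduce overlaps with the rest of backward.
        def hook(*_):
            work = dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, async_op=True)
            pending.append((work, param))
        return hook

    hooks = [p.register_post_accumulate_grad_hook(make_hook(p))
             for p in model.parameters() if p.requires_grad]

    loss = loss_fn(model(inputs), targets)
    loss.backward()  # gradient all-reduces are launched as backward proceeds

    world_size = dist.get_world_size()
    for work, param in pending:
        work.wait()                # make sure the communication has finished
        param.grad /= world_size   # convert the summed gradients into an average

    optimizer.step()               # apply the averaged gradients
    optimizer.zero_grad()

    for h in hooks:                # hooks are per-step; remove before the next call
        h.remove()
    return loss
```

Frameworks such as PyTorch's DistributedDataParallel implement this pattern internally, bucketing gradients so a few larger all-reduces overlap with the backward pass instead of issuing one per parameter.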