
In a pipeline parallel training setup, what is the specific purpose of 'micro-batching' in minimizing the time GPUs spend in a stalled, waiting state?



In pipeline parallelism, a large batch of training data is divided into smaller units called micro-batches to improve hardware utilization. If the whole batch were processed as a single unit, each pipeline stage (one GPU) would have to wait for the previous stage to finish its entire computation before it could start any work. This results in...
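The effect can be illustrated with a small simulation. This is a minimal sketch that models only the forward pass of a GPipe-style "fill and drain" schedule. It assumes that every stage takes the same time, that compute time scales linearly with micro-batch size, and that communication between GPUs is free. The backward pass and interleaved schedules such as 1F1B are not modeled. The function names are hypothetical and do not come from any framework. With p stages and m micro-batches, the schedule needs m + p - 1 steps. Each GPU is busy for m of those steps, so the idle share (the "pipeline bubble") is (p - 1) / (m + p - 1).

```python
from fractions import Fraction
from typing import List, Optional

# Hypothetical helpers (not from any framework): a GPipe-style, forward-only
# fill-and-drain schedule where every stage needs one time step per micro-batch.

Grid = List[List[Optional[int]]]


def gpipe_forward_schedule(num_stages: int, num_microbatches: int) -> Grid:
    """grid[t][s] is the micro-batch GPU s works on at step t, or None if idle."""
    total_steps = num_microbatches + num_stages - 1
    grid: Grid = [[None] * num_stages for _ in range(total_steps)]
    for s in range(num_stages):
        for j in range(num_microbatches):
            # Micro-batch j reaches stage s only after passing stages 0..s-1.
            grid[s + j][s] = j
    return grid


def bubble_fraction(num_stages: int, num_microbatches: int) -> Fraction:
    """Share of all GPU time steps spent idle (the pipeline bubble)."""
    grid = gpipe_forward_schedule(num_stages, num_microbatches)
    idle = sum(cell is None for row in grid for cell in row)
    return Fraction(idle, len(grid) * num_stages)


def pipeline_time(num_stages: int, num_microbatches: int,
                  batch_time: Fraction = Fraction(1)) -> Fraction:
    """Wall-clock time for one batch, assuming a stage needs `batch_time` for
    the full batch and each micro-batch therefore takes batch_time / m."""
    steps = num_microbatches + num_stages - 1
    return steps * batch_time / num_microbatches


def render(grid: Grid) -> None:
    """Print one row per GPU; '.' marks a step where that GPU is stalled."""
    for s in range(len(grid[0])):
        cells = ("." if step[s] is None else str(step[s]) for step in grid)
        print(f"GPU{s}: " + " ".join(cells))


if __name__ == "__main__":
    render(gpipe_forward_schedule(num_stages=4, num_microbatches=4))
    for m in (1, 4, 16):
        print(f"m={m}: idle={bubble_fraction(4, m)}, time={pipeline_time(4, m)}")
```

With 4 GPUs, processing the batch as one unit (m = 1) leaves 3/4 of all GPU time idle. It also takes 4 time units, because the stages run strictly one after another. With 4 micro-batches the idle share drops to 3/7 and the time to 7/4 units. With 16 micro-batches it drops to 3/19 and 19/16 units. In practice m cannot grow without limit. Very small micro-batches use each GPU's kernels less efficiently, and every micro-batch in flight holds activations in memory.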



