
When transitioning a neural network layer from FP32 to BF16, why is the numerical stability of the training process generally better preserved compared to a transition to standard FP16?



Neural network training relies on floating-point numbers, each consisting of a sign bit, an exponent, and a fraction (mantissa). The exponent determines the range of numbers a format can represent, while the mantissa determines the precision, i.e. how densely numbers are packed within that range. FP32 uses 32 bits, with an 8-bit exponent and a 23-bit mantissa. Standard FP16 uses 16 bits, with a 5-bit exponent and a 10-bit mantissa. BF16 also uses 16 bits, but allocates them as an 8-bit exponent and a 7-bit mantissa. Because BF16 keeps the same 8-bit exponent as FP32, it preserves FP32's dynamic range (roughly 1e-38 to 3e38), so the very small gradients and very large activations that would underflow or overflow in FP16's much narrower range (roughly 6e-5 to 65504 for normal values) remain representable. Training therefore tends to stay numerically stable without the loss scaling that FP16 typically requires; the trade-off is BF16's coarser 7-bit mantissa, i.e. lower precision, which networks generally tolerate well.
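
A minimal sketch of this trade-off, assuming PyTorch is available (the specific values are illustrative only): a tiny gradient and a large activation survive the cast to BF16 but not to FP16, while a small relative update near 1.0 is lost by BF16's coarser mantissa.

```python
import torch

samples = {
    "tiny gradient (1e-8)":   1e-8,  # below FP16's smallest subnormal (~6e-8)
    "large activation (1e5)": 1e5,   # above FP16's maximum (~65504)
}

for name, value in samples.items():
    fp16 = torch.tensor(value, dtype=torch.float16)
    bf16 = torch.tensor(value, dtype=torch.bfloat16)
    print(f"{name:24s} fp16={fp16.item():<12g} bf16={bf16.item():<12g}")

# Range: FP16 flushes 1e-8 to 0 and overflows 1e5 to inf; BF16 keeps both
# finite because its 8-bit exponent matches FP32's dynamic range.

# Precision: BF16's 7-bit mantissa is coarser than FP16's 10-bit mantissa,
# so a small relative update can round away in BF16 yet survive in FP16.
update = 1.0 + 1e-3
print("1 + 1e-3 in fp16:", torch.tensor(update, dtype=torch.float16).item())   # ~1.00098
print("1 + 1e-3 in bf16:", torch.tensor(update, dtype=torch.bfloat16).item())  # 1.0
```

In practice this is why BF16 training usually runs without loss scaling, whereas FP16 training pairs the cast with dynamic loss scaling to keep gradients inside the representable range.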



