Neural network training involves floating-point numbers, which consist of a sign bit, an exponent, and a fraction (mantissa). The exponent determines the range of numbers a format can represent, while the mantissa determines the precision, i.e. the density of representable numbers. FP32 uses 32 bits: a sign bit, an 8-bit exponent, and a 23-bit mantissa. Standard FP16 uses 16 bits: a sign bit, a 5-bit exponent, and a 10-bit mantissa. Because FP16's 5-bit exponent covers a far narrower range (normal values span roughly 6.1e-5 to 65504), small values such as gradients can underflow to zero during training.
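To make the bit layouts concrete, here is a small sketch that decomposes FP32 and FP16 values into their sign, exponent, and mantissa fields using Python's standard `struct` module (the `fp32_fields`/`fp16_fields` helper names are ours, not from any library):

```python
import struct

def fp32_fields(x: float):
    """Decompose a float32 into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent, bias 127
    mantissa = bits & 0x7FFFFF       # 23-bit mantissa
    return sign, exponent, mantissa

def fp16_fields(x: float):
    """Decompose a float16 into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">H", struct.pack(">e", x))[0]
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F   # 5-bit exponent, bias 15
    mantissa = bits & 0x3FF          # 10-bit mantissa
    return sign, exponent, mantissa

print(fp32_fields(1.0))  # (0, 127, 0): stored exponent equals the bias
print(fp16_fields(1.0))  # (0, 15, 0)

# A gradient-sized value that FP32 represents fine underflows to zero
# when round-tripped through FP16:
print(struct.unpack(">e", struct.pack(">e", 1e-8))[0])  # 0.0
```

The round trip through `">e"` (half precision) shows the range problem directly: 1e-8 is far below FP16's smallest subnormal (about 6e-8), so it silently becomes 0.0.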