Converting weights from 32-bit floating point to 8-bit integers directly reduces the memory bandwidth requirement of an inference pipeline by a factor of four. Memory bandwidth refers to the rate at which data can be read from or written to the computer's memory, such as VRAM or RAM. A 32-bit floating point number occupies four bytes of memory, while an 8-bit integer oc....
Log in to view the answer