Govur University Logo
--> --> --> -->
...

What is the core function of model quantization in an inference pipeline regarding the trade-off between network precision and hardware memory bandwidth?



The core function of model quantization is to reduce the bit-width of numerical values used to represent a neural network's parameters, typically moving from high-precision formats like 32-bit floating point numbers to lower-precision formats like 8-bit integers. A neural network's inference speed is often bottlenecked by hardware memory bandwidth, which is the rate at which data can be transferred fr....

Log in to view the answer



Redundant Elements