Dennard scaling, the observation that as transistors shrink their voltage and current scale down in proportion to their dimensions, so that power density stays roughly constant even as transistor density and clock frequency rise, broke down in the mid-2000s. This breakdown has had profound implications for GPU architecture, because performance could no longer be raised without exceeding thermal limits. The root cause is that supply voltage could not keep shrinking in proportion to transistor dimensions. Threshold voltage could not be lowered much further without a sharp rise in leakage current, so power density increased with each process generation.
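The effect of voltage failing to scale can be checked with a few lines of arithmetic. The sketch below uses the ideal Dennard rules for a linear shrink factor kappa: capacitance divided by kappa, frequency multiplied by kappa, area per transistor divided by kappa squared, and voltage divided by kappa. It compares the result with the post-2005 case where voltage stays fixed. The function name and the choice of kappa = 1.4 (roughly a 0.7x shrink per generation) are illustrative assumptions. Leakage power is ignored here.

```python
def power_density_ratio(kappa: float, voltage_scales: bool) -> float:
    """Ratio of new to old power density after shrinking linear dimensions by kappa.

    Uses dynamic power P = C * V**2 * f with the activity factor held constant.
    Ideal Dennard scaling: C -> C/kappa, f -> f*kappa, area -> area/kappa**2, V -> V/kappa.
    If voltage_scales is False, V is held fixed instead (the post-Dennard regime).
    """
    c = 1.0 / kappa                        # switched capacitance per transistor
    f = kappa                              # clock frequency
    v = 1.0 / kappa if voltage_scales else 1.0
    area = 1.0 / kappa ** 2                # area per transistor
    return (c * v ** 2 * f) / area


if __name__ == "__main__":
    KAPPA = 1.4  # assumed shrink of roughly one process generation (0.7x linear)
    print(power_density_ratio(KAPPA, voltage_scales=True))   # ~1.0: constant power density
    print(power_density_ratio(KAPPA, voltage_scales=False))  # ~1.96: grows by kappa**2
```

With voltage scaling, power density stays at about 1.0. With voltage held constant, it grows by kappa squared, about 1.96x per generation. That growth is the thermal problem described above.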
One major consequence of the breakdown of Dennard scaling is the "power wall." Because supply voltage no longer falls with each process generation, dynamic power per unit area rises as transistor density increases. Static power (leakage) also becomes a significant share of total power consumption. As transistors shrink, the gate oxide becomes thinner, which increases gate leakage, and lower threshold voltages increase subthreshold leakage. Together, these effects limit how many transistors can switch at full speed at the same time within a fixed power and thermal budget (often called "dark silicon"), which hinders performance scaling. For GPUs, which rely on massive parallelism, the power wall is a major challenge because it prevents all of the available processing cores from running at their maximum frequency simultaneously.
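Two simplified models make the paragraph's claims concrete. Both are sketches, not measurements.

The first model covers leakage. Subthreshold current is often approximated as proportional to 10^(-V_th / S), where S is the subthreshold swing in millivolts per decade of current. Under this model, lowering the threshold voltage multiplies leakage exponentially. The 90 mV/decade default is an assumption.

The second model is a hypothetical power-budget calculation. It assumes every core leaks (no power gating) and each active core adds a fixed dynamic power. It shows how leakage eats into the budget left for cores running at full speed. All function names and wattages are illustrative.

```python
def subthreshold_leakage_ratio(vth_drop_mv: float, swing_mv_per_decade: float = 90.0) -> float:
    """Factor by which subthreshold leakage grows when V_th is lowered by vth_drop_mv.

    Simplified model: I_sub is proportional to 10 ** (-V_th / S), where S is the
    subthreshold swing (mV per decade). The 90 mV/decade default is an assumed value;
    S cannot go below about 60 mV/decade at room temperature.
    """
    return 10 ** (vth_drop_mv / swing_mv_per_decade)


def max_active_cores(budget_w: float, cores: int, dynamic_w: float, leakage_w: float) -> int:
    """Hypothetical estimate of how many cores can run at full speed under a power cap.

    Assumes every core is powered (no power gating), so all cores pay leakage_w,
    and each active core additionally draws dynamic_w.
    """
    remaining = budget_w - cores * leakage_w
    if remaining <= 0:
        return 0
    return min(cores, int(remaining // dynamic_w))


if __name__ == "__main__":
    # Lowering V_th by 90 mV at an assumed 90 mV/decade raises leakage about 10x.
    print(subthreshold_leakage_ratio(90.0))
    # Hypothetical chip: 300 W cap, 100 cores, 4 W dynamic + 1 W leakage per core.
    print(max_active_cores(300.0, 100, 4.0, 1.0))  # 50 of 100 cores
    # Doubling per-core leakage halves the number of full-speed cores.
    print(max_active_cores(300.0, 100, 4.0, 2.0))  # 25 of 100 cores
```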
Another implication is the "memory wall." As the number of cores on a GPU increases, so does the demand for memory bandwidth. However, off-chip memory bandwidth has not kept pace with compute throughput, creating a bottleneck. DRAM bandwidth is constrained by the number of package pins, per-pin signaling rates, and the energy cost of moving each bit off chip, while DRAM latency has improved far more slowly than logic speed. The memory wall limits the performance of memory-bound applications. These perform few arithmetic operations per byte fetched from memory, for example streaming over large datasets or traversing pointer-heavy data structures.
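One common way to tell whether a kernel has hit the memory wall is the roofline model, a standard performance model that this text does not itself introduce. Attainable throughput is the lesser of the compute peak and arithmetic intensity (FLOPs per byte of DRAM traffic) times memory bandwidth. The sketch below assumes a hypothetical GPU with a 20 TFLOP/s peak and 1 TB/s of bandwidth. The function names are illustrative.

```python
def attainable_gflops(arith_intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(compute peak, intensity * memory bandwidth).

    arith_intensity is FLOPs per byte of DRAM traffic; GB/s * FLOP/B = GFLOP/s.
    """
    return min(peak_gflops, arith_intensity * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/B) below which a kernel is memory-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    PEAK, BW = 20_000.0, 1_000.0  # hypothetical GPU: 20 TFLOP/s, 1 TB/s
    # float32 vector add c[i] = a[i] + b[i]: 1 FLOP per 12 bytes (two loads, one store)
    print(attainable_gflops(1 / 12, PEAK, BW))  # about 83 GFLOP/s: memory-bound
    print(attainable_gflops(50.0, PEAK, BW))    # capped at the 20,000 GFLOP/s peak
    print(ridge_point(PEAK, BW))                # 20 FLOP/B
```

A streaming kernel like vector addition reaches well under 1% of peak compute on this hypothetical part. Adding more cores would not help it, which is the memory wall in a single number.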
Voltage/frequency scaling is a common technique for mitigating the power wall. It lowers the GPU's supply voltage and clock frequency to reduce power consumption. Dynamic power is approximately proportional to the square of the voltage and linearly proportional to the frequency (P ≈ αCV²f, where α is the switching activity factor and C the switched capacitance), so reducing both together cuts power substantially. The cost is performance. For example, reducing both voltage and frequency by 10% lowers dynamic power to 0.9² × 0.9 ≈ 0.73 of its original value, a saving of about 27%. The performance of a compute-bound workload also drops by about 10%.
Dynamic voltage and frequency scaling (DVFS) adjusts the voltage and frequency at run time according to the workload. Under a light workload, voltage and frequency are lowered to save power. Under a heavy workload, they are raised to maximize performance.
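The sketch below reproduces the 10%/10% arithmetic and adds a toy DVFS policy. The operating-point table, the 80% utilization target, and the function names are all hypothetical. Real GPU drivers and firmware use their own vendor-specific policies. This is a generic utilization-based governor, not any particular vendor's algorithm. Each sampling interval, it estimates demand as "busy MHz" at the current clock. It then picks the slowest operating point that can absorb that demand while staying under the target utilization.

```python
from dataclasses import dataclass


def dynamic_power_ratio(v_scale: float, f_scale: float) -> float:
    """Relative dynamic power when V and f are multiplied by the given factors (P ~ V**2 * f)."""
    return v_scale ** 2 * f_scale


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int
    voltage_v: float


# Hypothetical voltage/frequency table, sorted from slowest to fastest.
OPERATING_POINTS = [
    OperatingPoint(600, 0.70),
    OperatingPoint(1000, 0.80),
    OperatingPoint(1400, 0.90),
    OperatingPoint(1800, 1.00),
]


def choose_operating_point(utilization: float, current: OperatingPoint,
                           target_util: float = 0.8) -> OperatingPoint:
    """Pick the slowest point whose projected utilization stays at or below target_util.

    utilization is the fraction of the last sampling interval the GPU was busy
    while running at current.freq_mhz.
    """
    demand_mhz = utilization * current.freq_mhz  # work expressed as "busy MHz"
    for point in OPERATING_POINTS:
        if demand_mhz <= target_util * point.freq_mhz:
            return point
    return OPERATING_POINTS[-1]  # demand exceeds every point: run at the fastest


if __name__ == "__main__":
    print(dynamic_power_ratio(0.9, 0.9))         # ~0.729, about 27% less dynamic power
    fastest = OPERATING_POINTS[-1]
    print(choose_operating_point(0.2, fastest))  # light load -> 600 MHz point
    print(choose_operating_point(1.0, fastest))  # saturated -> stays at 1800 MHz
```

The target_util headroom is a design choice. It makes the policy step up before the GPU is fully saturated, trading a little power for responsiveness when load rises.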
Heterogeneous integration is another technique used to mitigate the power wall and the memory wall. It involves integrating different types of processing units and memory c....