When using PagedAttention to manage KV cache memory, what is the primary cause of internal fragmentation that the system prevents by partitioning memory into non-contiguous blocks?



The primary cause of internal fragmentation in traditional KV cache management is the requirement for contiguous memory allocation. In a standard setup, the system allocates a fixed, large, and sequential buffer for the maximum possible sequence length to ensure the model has enough room to store key and value states as the sequence grows. Bec....
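To make the contrast concrete, here is a minimal sketch under a toy model in which KV cache capacity is counted in token slots. The names `contiguous_waste`, `paged_waste` and `BlockAllocator` are hypothetical illustrations, not vLLM's actual API. The maximum length of 2048 and block size of 16 used below are illustrative assumptions only. A contiguous buffer sized for the maximum length leaves every unused slot reserved. Fixed-size blocks handed out on demand can waste space only in a sequence's last, partially filled block. The block table shows how a sequence's logical blocks can map to physical blocks that are not adjacent.

```python
import math


def contiguous_waste(seq_len: int, max_len: int) -> int:
    """Slots reserved but never filled when a sequence is given one
    contiguous buffer sized for the maximum possible length up front."""
    return max_len - seq_len


def paged_waste(seq_len: int, block_size: int) -> int:
    """Slots left empty when fixed-size blocks are allocated on demand;
    only the last, partially filled block can contain unused slots."""
    blocks = math.ceil(seq_len / block_size)
    return blocks * block_size - seq_len


class BlockAllocator:
    """Toy allocator (hypothetical): hands out physical block ids from a
    free list and keeps a per-sequence block table (logical -> physical)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))      # physical block ids
        self.tables: dict[int, list[int]] = {}   # seq_id -> physical blocks
        self.lengths: dict[int, int] = {}        # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> None:
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        # A new block is needed only when the current one is full (or none exists).
        if n % self.block_size == 0:
            if not self.free:
                raise MemoryError("out of KV cache blocks")
            table.append(self.free.pop())        # any free block will do
        self.lengths[seq_id] = n + 1

    def release(self, seq_id: int) -> None:
        # Finished sequences return their blocks for reuse by others.
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


if __name__ == "__main__":
    print(contiguous_waste(100, 2048))  # 1948 slots reserved but unused
    print(paged_waste(100, 16))         # 12 slots unused in the last block
```

For example, after two sequences each append six tokens in alternation with `block_size=4`, their block tables hold non-adjacent physical blocks (`{0: [7, 5], 1: [6, 4]}`). Calling `release(0)` returns blocks 7 and 5 to the free list. The design choice shown here caps per-sequence waste at `block_size - 1` slots instead of `max_len - seq_len`.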
