How does the principle of diversity in resilient system design mitigate risks that redundancy alone cannot fully address?
Redundancy in resilient system design means using multiple identical components or systems to perform the same function. Its primary purpose is to keep the system running when one component fails: an identical spare can immediately take its place, providing fault tolerance against individual, independent failures. Redundancy alone has a significant limitation, however: it is vulnerable to common-mode failures. A common-mode failure occurs when a single event, design flaw, or vulnerability affects all of the identical redundant components at once or in the same way, so that they all fail together. For example, if all redundant servers run exactly the same operating system and application software, a critical bug or security vulnerability in that software affects every server, and the redundancy provides no protection. Similarly, if all backup power generators use identical parts from the same manufacturing batch, a latent defect in that batch could cause every generator to fail at the same time.
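The server example can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a model of any real deployment: the `Replica` class, the `surviving` helper, and the stack labels `stack-A`, `stack-B`, and `stack-C` are all hypothetical names introduced here. The sketch compares a pool of identical replicas with a pool that mixes software stacks, which is the kind of diversity the question refers to, and checks which replicas remain when one stack turns out to have a critical flaw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One redundant server (hypothetical model for illustration)."""
    name: str
    software: str  # label for the OS/application stack this replica runs


def surviving(replicas, vulnerable_software):
    """Return the replicas that do not run the flawed software stack."""
    return [r for r in replicas if r.software != vulnerable_software]


# Pure redundancy: three replicas, all running the same stack.
identical = [Replica(f"server-{i}", "stack-A") for i in range(3)]

# Redundancy plus diversity: three replicas, each on a different stack.
diverse = [
    Replica("server-0", "stack-A"),
    Replica("server-1", "stack-B"),
    Replica("server-2", "stack-C"),
]

# A single critical bug in stack-A acts as the common-mode event.
identical_survivors = surviving(identical, "stack-A")
diverse_survivors = surviving(diverse, "stack-A")

print(len(identical_survivors))              # 0: every identical replica fails
print([r.name for r in diverse_survivors])   # ['server-1', 'server-2']
```

In this toy model the identical pool loses all three replicas to the one flaw, while the mixed pool loses only the replica that shares the flawed stack. Real systems are less clean, since different stacks can still share a dependency, a power source, or an operator, so the sketch shows the direction of the effect and not its size.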