Common Cause Failures, more common than you may think
Critical systems are usually designed with high redundancy and fault tolerance in order to prevent critical failures. The biggest enemy of redundancy is Common Cause Failure (CCF).
CCF is defined as the failure of multiple items, which would otherwise be considered independent of one another, resulting from a single cause.
CCF events are usually rare, but their effect may be severe. Therefore, common cause analysis is an important part of safety analysis, and is required by certain standards, e.g. in railway safety.
CCFs are more common than you may think. The following are a few common cause events that appear in many systems:
- A power failure may cause shutdown of many electrical sub-systems. Although the sub-systems did not fail themselves, they are unable to fulfill their required functionality, and therefore should be considered as failed for the analysis.
- A failure of a network communication switch may prevent many sub-systems from sending or receiving critical information. This may render the sub-systems useless.
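To see why such events are so damaging to redundancy, here is a minimal Python sketch. It assumes two redundant units that fail independently of each other, both fed by one shared power supply. The probabilities and the names `p_unit` and `p_power` are hypothetical values chosen for illustration only.

```python
from math import prod

def and_gate(probabilities):
    """Probability that all independent input events occur."""
    return prod(probabilities)

def or_gate(probabilities):
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical failure probabilities (illustrative only).
p_unit = 0.01    # each of the two redundant units
p_power = 0.001  # the shared power supply (the common cause)

# Redundancy alone: the system fails only if both units fail.
redundant_only = and_gate([p_unit, p_unit])

# With the common cause: the system fails if the power supply fails
# OR both units fail on their own.
with_common_cause = or_gate([p_power, redundant_only])

print(f"Redundant pair only:      {redundant_only:.7f}")
print(f"With shared power supply: {with_common_cause:.7f}")
```

Under these assumed numbers, the shared power supply dominates the result: the failure probability of the redundant pair rises by roughly an order of magnitude, even though the power supply is ten times more reliable than either unit.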
In simple cases it is possible to account for CCF using standard Fault Tree Analysis (FTA) gates, but in other cases, more complex analysis is required.
Consider a power supply that feeds a server and a network communication switch. Both the server and the switch are required for system operation, so failure of either one causes a system failure. Clearly, failure of the power supply also causes a system failure, therefore the following simple fault tree can be used:
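Structurally, this tree is a single OR gate over three basic events: power supply failure, server failure, and switch failure. As a rough sketch, assuming the basic events are independent and using made-up failure probabilities (none of these numbers come from a real system), the top event can be evaluated like this:

```python
from math import prod

def or_gate(probabilities):
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical basic-event probabilities (illustrative only).
basic_events = {
    "power_supply": 0.01,
    "server": 0.02,
    "switch": 0.005,
}

# Top event: the system fails if any one of the three items fails.
system_failure = or_gate(basic_events.values())
print(f"P(system failure) = {system_failure:.6f}")
```

Because the power supply appears only once, as its own input to the OR gate, its effect on both the server and the switch is counted a single time rather than being double-counted under each of them.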