Redundancy – more is not always safer
Safety analyses are critical for the design and implementation of safe systems, particularly in industries where a failure can have severe consequences (aerospace, rail, medical, oil & gas).
In many cases, redundancy is treated as the ‘magical’ cure for safety, for example:
- Backup power supplies (UPS and generators)
- Communication networks with redundant paths
- Redundant brake systems
- Redundant engines
However, adding more and more redundancy has some pitfalls:
Cost:
More units mean a higher procurement cost (CAPEX).
Moreover, more units in the field mean more unit failures, so maintenance costs (OPEX) also rise.
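The OPEX point can be made concrete with a short calculation. The Python sketch below assumes a constant failure rate per unit and immediate replacement of failed units, so the expected number of field failures grows linearly with the number of installed units. The failure rate, the operating hours and the helper name `expected_failures` are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: expected number of unit failures in the field.
# Assumes a constant failure rate per unit and immediate replacement of failed
# units; the rate and hours below are made-up example values.


def expected_failures(units: int, failure_rate_per_hour: float, operating_hours: float) -> float:
    """Expected failures for `units` identical units, each failing at a constant rate.

    With immediate replacement, each installed position contributes
    failure_rate_per_hour * operating_hours expected failures.
    """
    return units * failure_rate_per_hour * operating_hours


RATE = 1e-5     # hypothetical failure rate per unit, per operating hour
HOURS = 8760.0  # one year of continuous operation

for units in (1, 2, 3):
    print(f"{units} unit(s): {expected_failures(units, RATE, HOURS):.4f} expected failures per year")
```

Doubling the number of redundant units doubles the expected number of field repairs, even though each individual unit is just as reliable as before.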
Common cause failures:
Common cause failures are failure modes that are not independent of the failures of other items: a single event can cause several items, including redundant ones, to fail at once.
Examples:
- If a power supply shorts, it can cause other systems to fail unless it is protected by a fuse
- If an engine fails and the internal damage is not contained by the casing, objects are ejected from the engine and may cause the failure of nearby systems
One way to reduce common cause failures is to use redundant units of different types (as discussed in NASA’s “Fault Tree Handbook with Aerospace Applications” [1]).
The following are two examples from aviation:
1. Boeing KC-46 Pegasus [2]
In July 2014, Boeing recorded a $272 million pre-tax charge to cover a redesign of the Boeing KC-46 Pegasus wiring. It was found that 5-10% of the wiring bundles either lacked sufficient separation distance or were not properly shielded to meet an Air Force requirement for double- or triple-redundant wiring in some mission systems.
The rationale for requiring separation between redundant wires is to prevent common cause failures: if a loose object severs a wire and the redundant wire runs nearby, there is a high probability that it will be severed as well.
2. ETOPS
ETOPS stands for Extended Operations. In the past, four-engine airplanes were used for long-range flights, but today twin-engine airplanes can conduct similar flights. Part of the reason is the high reliability of modern power plants.
The following analysis was performed with BQR’s Fault Tree Analysis (FTA) software.
The probability of an engine in-flight shutdown (IFSD) is about 2·10⁻⁶ per flight hour for a modern PW4000 power plant [3] (for comparison, in 1952 the IFSD probability per flight hour for piston engines was 2.5·10⁻⁴ [4]).
The FAA requires the probability of a catastrophic failure to be below 10⁻⁹ per flight hour for a commuter-type airplane [5].
Assuming that a single working engine is sufficient to continue the flight and that engine failures are independent, the probability that all n engines fail is the single-engine IFSD probability raised to the power n, so more engines yield a safer flight:
Number of engines | Probability of all engines failing per flight hour |
1 | 2·10⁻⁶ |
2 | 4·10⁻¹² |
3 | 8·10⁻¹⁸ |
4 | 1.6·10⁻²³ |
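The table values can be reproduced with a few lines of Python. This is a minimal sketch assuming, as in the text, that one working engine is enough and that engine failures are independent, so the probability that all n engines fail is p to the power n with p = 2·10⁻⁶. The function name `p_all_engines_fail` is hypothetical.

```python
# Probability that all engines fail in the same flight hour, assuming
# independent engine failures (no common cause) and that one working
# engine is enough to continue the flight.

P_IFSD = 2e-6  # per-engine IFSD probability per flight hour (PW4000 figure quoted in the text)


def p_all_engines_fail(n_engines: int, p_engine: float = P_IFSD) -> float:
    """Independent failures: P(all n engines fail) = p ** n."""
    return p_engine ** n_engines


for n in range(1, 5):
    print(f"{n} engine(s): {p_all_engines_fail(n):.2g} per flight hour")
```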
However, if a common cause failure mode exists, the situation is different:
Although engine failure tests are conducted (see video [6]) to ensure that debris does not endanger nearby systems, consider the case where there is a 0.015%* chance that a single engine failure causes a common cause catastrophic failure. The following are fault tree diagrams for the cases of 2, 3 and 4 engines:
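As a rough numerical companion to these fault trees, the Python sketch below uses an assumed model, which may differ from the one behind the diagrams: each engine independently either works, fails in a contained way, or fails with a common cause effect (probability p·c, with p = 2·10⁻⁶ and c = 0.015%), and the flight is lost if any engine has a common cause failure or if all engines fail. The three-state model and the function name `p_catastrophic` are illustrative assumptions.

```python
import math

P_IFSD = 2e-6           # per-engine IFSD probability per flight hour (figure quoted in the text)
P_COMMON = 0.015 / 100  # chance that one engine failure causes a catastrophic common cause failure
FAA_TARGET = 1e-9       # catastrophic failure target per flight hour quoted in the text


def p_catastrophic(n_engines: int, p: float = P_IFSD, c: float = P_COMMON) -> float:
    """P(catastrophic) per flight hour for n engines under a simple assumed model.

    Each engine independently ends up in one of three states:
      - no failure                         probability 1 - p
      - contained failure                  probability p * (1 - c)
      - failure with common cause effects  probability p * c
    The flight is lost if any engine has a common cause failure, or if
    every engine has failed (all of them contained).
    """
    # 1 - (1 - p*c)**n, computed with log1p/expm1 to avoid cancellation for tiny p*c
    p_any_common = -math.expm1(n_engines * math.log1p(-p * c))
    # All engines fail with no common cause effects; disjoint from the event above
    p_all_contained = (p * (1.0 - c)) ** n_engines
    return p_any_common + p_all_contained


for n in (2, 3, 4):
    p_cat = p_catastrophic(n)
    verdict = "meets" if p_cat < FAA_TARGET else "exceeds"
    print(f"{n} engines: {p_cat:.3g} per flight hour ({verdict} the 1e-9 target)")
```

Under these assumptions the common cause term, roughly n·p·c, dominates the pⁿ term, so the catastrophic failure probability grows with the number of engines: about 6.0·10⁻¹⁰, 9.0·10⁻¹⁰ and 1.2·10⁻⁹ per flight hour for 2, 3 and 4 engines, the last one above the 10⁻⁹ target. Setting c = 0 recovers the independent-failure values of the table.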