Redundancy – more is not always safer

Safety analyses are critical for the design and implementation of safe systems, especially in industries where a failure can have severe consequences (aerospace, rail, medical, oil & gas).
Redundancy is often treated as the ‘magic’ cure for safety problems, for example:

  • Backup power supplies (UPS and generators)
  • Communication networks with redundant paths
  • Redundant brake systems
  • Redundant engines

There are, however, some pitfalls to adding more and more redundancy:

Cost:

More units mean a higher procurement cost (CAPEX).
Moreover, more units mean more failures in the field, so the maintenance cost (OPEX) rises as well.

Common Cause failures:

Common cause failures are failures of several items that are not independent of one another: a single event or a single item failure causes other items to fail as well.
Examples:

  • If a power supply shorts, it can cause other systems to fail unless it is protected by a fuse
  • If an engine fails and the internal damage is not contained by the casing, objects are ejected from the engine and may cause the failure of nearby systems

One way to reduce common causes is to use different types of redundant units (as discussed in NASA’s “Fault Tree Handbook with Aerospace Applications” [1]).

Following are two interesting examples from the field of aviation:

1. Boeing KC-46 Pegasus [2]

In July 2014, Boeing recorded a $272 million pre-tax charge to cover a redesign of the Boeing KC-46 Pegasus wiring. It was found that 5-10% of the wiring bundles did not have sufficient separation distance, or were not properly shielded to meet an Air Force requirement for double or triple-redundant wiring for some mission systems.
The rationale for demanding separation between redundant wires is to prevent a common cause failure. Suppose that a loose object severs a wire: if the redundant wire is nearby, it has a high probability of being severed as well.

 

2. ETOPS

ETOPS stands for Extended Operations. In the past, four-engine airplanes were used for long-range flights, but today two-engine airplanes can fly similar routes. Part of the reason is the high reliability of modern power plants.
Following is an analysis using BQR’s Fault Tree Analysis (FTA) software.

The probability of an engine in-flight shutdown (IFSD) is about 2·10⁻⁶ per flight hour for a modern PW4000 power plant [3] (for comparison, in 1952 the IFSD probability per flight hour was 2.5·10⁻⁴ for piston engines [4]).
The FAA requires a probability of catastrophic failure smaller than 10⁻⁹ per flight hour for a commuter type airplane [5].
Assuming that a single engine is sufficient during flight and that engine failures are independent, a simple analysis shows that more engines yield a safer flight:

Number of engines    Probability of all engines failing, per flight hour
1                    2·10⁻⁶
2                    4·10⁻¹²
3                    8·10⁻¹⁸
4                    1.6·10⁻²³
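
The values in the table follow from multiplying the single-engine IFSD probability by itself once per engine, which is only valid if the engine failures are independent. A minimal Python sketch of that arithmetic (the function name all_engines_fail is illustrative and is not part of any FTA tool):

```python
# Single-engine in-flight shutdown probability per flight hour, as quoted in the text.
P_IFSD = 2e-6


def all_engines_fail(n_engines: int, p_single: float = P_IFSD) -> float:
    """Probability that all engines shut down, assuming independent failures."""
    return p_single ** n_engines


for n in range(1, 5):
    print(f"{n} engine(s): {all_engines_fail(n):.1e} per flight hour")
```

Each added engine multiplies the result by another factor of 2·10⁻⁶, which is why the independent model makes extra engines look so attractive.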

 

However, if common cause failures exist, the situation is different:
Engine failure tests are conducted (see movie [6]) to ensure that debris does not endanger nearby systems. Still, consider the case where there is a 0.015%* chance that a single engine failure will create a common cause catastrophic failure. Following are fault tree diagrams for the cases of 2, 3 and 4 engines:

2 Engines

Screenshot of Fault Tree Analysis software

3 Engines

Screenshot of Fault Tree Analysis software

4 Engines

Screenshot of Fault Tree Analysis software

Comparing the three fault trees shows that 2 engines are safer than 3 or 4 engines, and that the 4-engine case does not meet the FAA requirement!
This non-trivial result shows the importance of accounting for common cause failures in Fault Tree Analyses. This analysis is required as part of the civil airborne systems safety assessment [7].
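
The ranking can be checked by hand with a rare-event approximation of the top event: the flight is lost either because all engines fail independently, or because any one engine failure triggers the common cause event. The Python sketch below is not the FTA software used for the figures, and the tree structure it assumes (an OR of those two branches) is an assumption; the function and constant names are illustrative.

```python
# Values taken from the text; the common cause fraction is a demonstration value only.
P_IFSD = 2e-6            # engine in-flight shutdown probability per flight hour
P_COMMON_CAUSE = 1.5e-4  # 0.015 % chance that one engine failure is catastrophic
FAA_LIMIT = 1e-9         # required catastrophic failure probability per flight hour


def catastrophic_probability(n_engines: int,
                             p: float = P_IFSD,
                             q: float = P_COMMON_CAUSE) -> float:
    """Rare-event approximation: sum of the two branches of the assumed OR gate."""
    all_fail_independently = p ** n_engines
    # Any of the n engines can fail and, with probability q, take the aircraft with it.
    common_cause = n_engines * p * q
    return all_fail_independently + common_cause


for n in (2, 3, 4):
    total = catastrophic_probability(n)
    verdict = "meets" if total < FAA_LIMIT else "exceeds"
    print(f"{n} engines: {total:.2e} per flight hour ({verdict} the 1e-9 limit)")
```

Under these assumptions the totals are roughly 6.0·10⁻¹⁰, 9.0·10⁻¹⁰ and 1.2·10⁻⁹ for 2, 3 and 4 engines. The independent term shrinks with every added engine, but the common cause term grows linearly with the number of engines and dominates the result.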

 

*The value of 0.015% was chosen for demonstration purposes only.

BQR provides software and professional services for safety analysis, FMEA / FMECA and complex FTA including common causes and nested common causes (common cause events whose sub-trees include additional common causes).

References:

[1] Fault Tree Handbook with Aerospace Applications, NASA, (2002)
[2] Wikipedia, https://en.wikipedia.org/wiki/Boeing_KC-46_Pegasus
[3] ICAO, EDTO workshop, https://www.icao.int/SAM/Documents/2014-EDTO/EDTO%20Module%20%204%20%E2%80%93%20Aircraft%20certification%20considerations.pdf
[4] Engines Turn or Passengers Swim: A Case Study of How ETOPS Improved Safety and Economics in Aviation, J. Angelo DeSantis (2013), https://scholar.smu.edu/cgi/viewcontent.cgi?article=1305&context=jalc
[5] SYSTEM SAFETY ANALYSIS AND ASSESSMENT FOR PART 23 AIRPLANES, FAA, (2011), https://www.faa.gov/documentLibrary/media/Advisory_Circular/AC%2023.1309-1E.pdf
[6] A380 Blade Off Test, Youtube (2006), https://www.youtube.com/watch?v=j973645y5AA
[7] ARP 4761, GUIDELINES AND METHODS FOR CONDUCTING THE SAFETY ASSESSMENT PROCESS ON CIVIL AIRBORNE SYSTEMS AND EQUIPMENT