Nomenclature in accident investigation literature varies considerably, so for our purposes, we will view root cause analysis as a structured process aimed at determining how an undesired incident occurred, what caused it and what actions are needed to prevent its recurrence. Thus, the purpose of the investigation is to clearly identify what happened, determine the causal factors that led up to the incident (the how), understand why they occurred and make changes such that a recurrence is prevented or made much less likely.
The term 'root' in root cause analysis hints at the purpose of the investigation, namely to get to the fundamental causes behind the incident so that, when these are closed off, the likelihood of this particular event and potentially many others stemming from these causes will be much reduced.
When investigating incidents, we can identify two broad types of causes:
The first of these are called the immediate or proximate causes. These relate to the causes that led directly to the incident and, if eliminated or modified, would have prevented it.
The second are called root causes and refer to the factors that contributed to or led to the immediate or proximate causes. Root causes are often organisational in nature. Again, if they were eliminated, then the incident would have been prevented.
Finally, one may identify what are called contributory factors, events or conditions that may have contributed to the occurrence of the incident but, if eliminated or modified, would not by themselves have prevented it.
There are many published approaches to root cause analysis. For our purposes we will adopt the following model:
1. Preserve the site and evidence as far as possible. Where this is not possible, identify and collect whatever data is available.
2. Determine the expertise required to conduct the investigation and set up the team, which will:
3. Define clearly the incident to be investigated
4. Gather information, conduct interviews etc. and produce a narrative
5. Identify the facts from the narrative, set them out clearly and generate a time line.
6. Identify and agree the causal factors (direct cause, contributory factors and root cause).
7. Produce a report and recommendations
8. Implement the recommendations
Within this framework a wide range of techniques is available to help in the analysis and derivation of the causal factors. To illustrate, a fragment of a root cause analysis of an incident is set out below.
On 10 June 1995 the Panamanian passenger ship Royal Majesty grounded on the Rose and Crown Shoal about 16 km (10 miles) east of Nantucket Island, Massachusetts and about 27 km from where the watch officers thought the vessel was. The Royal Majesty, with 1,509 passengers and crew members on board, was en route from St Georges, Bermuda to Boston, Massachusetts. There were no deaths or injuries, but damage to the vessel and lost revenue were estimated at about $7 million.
About an hour after leaving St Georges, the ship's global positioning system (GPS) antenna cable became partly disconnected, causing the GPS to switch to dead reckoning mode. Nobody noticed. The autopilot continued to act on the position information supplied by the GPS and so did not detect or allow for the vessel's drift off course caused by wind, current and sea conditions. The fault with the GPS, and the fact that the vessel was not in the position indicated by the integrated bridge navigational system, remained unnoticed by the watch officers during the 34 hours prior to the grounding.
The autopilot was also connected to a Loran C radio navigation unit that worked perfectly throughout the trip. When the GPS failed, it set a status bit indicating that the data it supplied was invalid and switched to dead reckoning mode. However, the autopilot failed to recognise the invalid data bit and, rather than switching to the Loran C device, continued to use the dead reckoning data supplied by the GPS which made no allowance for wind, tides or sea conditions.
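The missing behaviour described above, honouring a source's own validity flag and falling back to the next source, can be sketched in a few lines. This is a minimal illustration only, not the logic of the Royal Majesty's equipment: the `PositionReport` type, the `select_position` function and the coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PositionReport:
    """One position report from a navigation source (hypothetical type)."""
    source: str   # e.g. "GPS" or "LORAN-C"
    lat: float    # degrees, illustrative values only
    lon: float    # degrees, illustrative values only
    valid: bool   # False when the source flags its own data as invalid


def select_position(reports: Sequence[PositionReport]) -> Optional[PositionReport]:
    """Return the first report, in priority order, whose validity flag is set."""
    for report in reports:
        if report.valid:
            return report
    return None  # no trustworthy fix: the caller should raise an alarm


# GPS has dropped to dead reckoning and marks its data invalid;
# the secondary source is still reporting a valid fix.
gps = PositionReport("GPS", 41.0, -69.9, valid=False)
loran = PositionReport("LORAN-C", 41.1, -69.6, valid=True)

chosen = select_position([gps, loran])
print(chosen.source if chosen else "NO VALID FIX")
```

The design point is that the consumer of the data, not only the producer, has to check the flag; a source that marks its data invalid achieves nothing if the receiving system never reads the mark.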
Good watch keeping practice required that the officer of the watch check the GPS position against the Loran C hourly; this was not done at all until the moment of the grounding.
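As an illustration of what such a cross-check amounts to, the sketch below compares two independent position fixes and reports whether they disagree by more than a tolerance. The function names, the 1 km tolerance and the coordinates are assumptions made for the example, not values taken from the ship's procedures; the distance is computed with the standard haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fixes_disagree(primary, secondary, tolerance_km: float = 1.0) -> bool:
    """True if two (lat, lon) fixes differ by more than the tolerance (assumed value)."""
    return haversine_km(*primary, *secondary) > tolerance_km


# Illustrative coordinates only: two fixes a fraction of a degree apart.
gps_fix = (41.00, -69.90)
loran_fix = (41.00, -69.60)

if fixes_disagree(gps_fix, loran_fix):
    print("Position sources disagree: investigate before trusting either")
```

A check of this kind is only useful if it is actually run at the required interval and the result is recorded, which is a procedural matter rather than a technical one.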
Had the ship been in the correct position entering the Boston traffic lane, then buoys should have been observed: the BA buoy at the entrance to the lane and later the BB buoy. Coincidentally, on the course followed by the Royal Majesty a wreck buoy, designated AR, appeared and was confused with the BA buoy. Visual identification was precluded because of glare. The ship never encountered the BB buoy, although the second officer misinformed the master that he had seen it. Subsequently, the port lookout reported to the second officer the sighting of yellow lights on the port side and, just over an hour and a half later, the same lookout reported seeing blue and white water dead ahead. Some 10 minutes later the vessel grounded.
The ship was also equipped with an echo sounder and associated alarms, but it was discovered that the echo sounder was not turned on at the time of the accident. Standard procedure dictated that the echo sounder alarm be set so the alarm would activate when the depth of water below the keel was 3 m or less. Had the echo sounder been engaged, the alarm would have sounded some 40 minutes before the grounding.
Based on the incident narrative, a list of facts would be assembled. Table 1 identifies some of the facts collected about the incident.
Once the table of facts has been drawn up, a timeline can be produced showing what each of the actors was doing during the run up to the incident. Figure 2 shows a fragment of the timeline corresponding to the facts set out in Table 1. The full timeline extends over a number of pages and includes other parties.
The next step is to develop the causal model. Many techniques are available to determine the causes, including fault trees, event and causal factor trees, causal factor modelling and others. Figure 3 shows the top-level analysis of this incident using causal factor modelling. Here A is said to cause B if, all other matters being the same, B would not have happened had A not happened. The arrowhead in the figure points from the cause to the effect.
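The counterfactual test just described can be expressed as a small sketch: remove one factor, hold everything else the same, and see whether the outcome still occurs. The code below is a deliberately simplified toy model built for illustration; the factor names and the `grounding` rule are assumptions and do not reproduce the investigation's causal model or its findings.

```python
from typing import Callable, Dict

Conditions = Dict[str, bool]


def is_but_for_cause(factor: str, actual: Conditions,
                     outcome: Callable[[Conditions], bool]) -> bool:
    """True if the outcome occurred and would not have occurred
    had `factor` been absent, with all other conditions unchanged."""
    if not actual[factor] or not outcome(actual):
        return False
    counterfactual = dict(actual, **{factor: False})
    return not outcome(counterfactual)


def grounding(c: Conditions) -> bool:
    """Toy rule (assumed): the ship grounds if the position error goes
    undetected and the echo sounder alarm is unavailable."""
    error_undetected = c["gps_in_dead_reckoning"] and c["no_independent_position_check"]
    return error_undetected and c["echo_sounder_off"]


actual = {
    "gps_in_dead_reckoning": True,
    "no_independent_position_check": True,
    "echo_sounder_off": True,
    "glare_on_buoy": True,  # present, but not part of the toy rule
}

for factor in actual:
    print(factor, is_but_for_cause(factor, actual, grounding))
```

In this toy model the first three factors pass the test and `glare_on_buoy` does not, which mirrors the distinction drawn earlier between causes whose removal would have prevented the incident and contributory factors whose removal alone would not. A real analysis needs a far richer model and the judgement of the investigation team.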
Once the analysis has been completed, the investigation team needs to identify the immediate causes of the incident as well as the root causes and contributory factors. These are summarised below.
- Disconnection of the GPS from the integrated navigation system shortly after departure from Bermuda resulting in the integrated navigation system using incorrect data for course setting.
- Failure to set the echo sounding alarm to 3 m after leaving Bermuda.
- Failure of the watch crew to monitor the status of the GPS, sole reliance on the position-fix alarm for warning of deviation from the vessel's intended course, failure to use independent sources of evidence to check the position of the vessel regularly, failure to react to cues from the lookouts in relation to shallow water and the appearance of unexpected navigation lights, and failure to act upon the non-appearance of the second channel buoy.
- Inadequate vessel maintenance and upgrade standards that did not require power or data feeds (for example, the GPS antenna link cable) to be shielded from potential mechanical damage.
- Watch keeping skills demonstrably lacking and no apparent procedures in place, or enforced, to ensure that regular duties were carried out and recorded. The duties were poorly specified in the Royal Majesty's 'Bridge Procedures Guide' and in the Line's Circular 9 'Duties of the Officer on Watch.'
- Inadequate crew training in the technical capabilities and limitations of the integrated bridge system.
- Over-reliance of the watch standing officers on the correct operation of the automated position display of the navigation and command system.
- The configuration of the integrated bridge system, which neither recognised nor allowed for the fact that the GPS had switched to dead reckoning mode; its design did not adequately incorporate human factors engineering.
- The remoteness of the GPS receiver and the short duration of the aural alarm that sounded when it switched to the dead reckoning mode contributed to the watch keepers' failure to notice the change.
- Incompatible interface specifications between the autopilot and the GPS equipment.
The National Transportation Safety Board (NTSB) carried out the investigation of this incident and published its findings in 1997. Its conclusions about the probable causes of the grounding of the Royal Majesty were not set out as described here.
The NTSB commented that modern navigation aids can fail, sometimes without the operator noticing. A fundamental rule of safe navigation is always to check the primary method of navigation by reference to an independent source, such as radio aids, astronavigation, visual fixing, use of the echo sounder etc. The NTSB also noted that special care is needed in this regard when making a landfall, as was the Royal Majesty when it grounded.
Based on this analysis, recommendations would need to be made to address the root causes and contributory factors.
Reflecting on root cause analysis
Although the process outlined above may seem simple in concept, conducting an investigation is generally far from straightforward. Firstly, the body commissioning the inquiry needs to give clear guidance on how to undertake the investigation and root cause analysis. It needs to define clearly the investigation process along with the nomenclature and recommended methods to be used by the investigators.
Guidance also needs to be provided on the types of incident that will be subject to this form of investigation. Setting up the team and securing their time for the investigation are often not easy. Determining the aetiology of the accident and collecting evidence pose difficulties and require skills in interviewing and interpretation.
When the information is available, a clear narrative needs to be written explaining what happened, from which the facts of the case can be extracted. Determining the causes then requires a clear view of causation, an understanding of the technological, psychological and organisational factors that underpin causation and influence events, and knowledge of the barriers and defences that should be in place to prevent the incident arising again.
Identifying the direct cause is often where many investigations stop. However, direct causes are symptomatic of underlying problems which need to be uncovered through the identification of contributing factors and root causes. Once these causes have been identified, they need to be clearly written up and change recommendations made and put in place. The investigation process must eschew apportioning blame; this is best left to the courts. Finally, just deciding to conduct an investigation brings with it commitment to cost and freeing up the necessary technical expertise, often in scarce supply, and a determination to act on the recommendations.
From the perspective of those who eventually bear the cost of incidents, ensuring that investigative procedures such as root cause analysis are in place will provide added assurance that organisations will learn from their mistakes and become more resilient in the face of potential hazards.
- Roger Shaw is Head of Risk Management for City & Hackney Teaching Primary Care Trust. Email: firstname.lastname@example.org.