Davis-Besse Reactor Head Corrosion
Cause Mapping Example:
This example shows how the Cause Mapping process can be applied to a specific incident, in this case the Davis-Besse reactor head corrosion. The three steps are 1) Define the problem, 2) Conduct the analysis and 3) Identify the best solutions. Each step is discussed below.
Step 1. Define the Problem
The first step of the Cause Mapping approach is to define the problem by asking the four questions: What is the problem? When did it happen? Where did it happen? And how did it impact the goals? One person may say that the problem was steel wastage. Another person might say that the problem was the hole in the reactor head, and a third person could say that the problem was boric acid corrosion. We can write down these three “problems” on the first line. In the Cause Mapping methodology the facilitator anticipates that the group may disagree so all three responses are written down. There is no need to spend time debating the problem. The magnitude of this incident is defined by the impact to the goals.

The second question is "When?", which is the date of the incident. The "When" entry captures the timing of the issue and also has a line for what was different or unusual in this occurrence. The question of what was different is fundamental in any investigation. On the Davis-Besse issue we capture the date as March 7th, 2002, when the hole was discovered. The difference was that the plant was in a refueling outage.
The third question is "Where?". In an investigation, several pieces of information may need to be captured when specifying the location. At a minimum the physical/geographic location and the process should be captured. The physical location is where geographically the incident happened. Here it was the Davis-Besse Nuclear Power Station in Oak Harbor, Ohio, unit #1. The process is what was being done at the time: the cavity was discovered during control rod drive mechanism (CRDM) nozzle inspections.
The next section is the impact to the overall goals. For commercial nuclear power, one of the overall safety goals is to maintain the integrity of the fission product barriers, and here a principal barrier was lost. Another impact to the safety goal is that the problem was rated as a "significant" precursor to core damage. This resulted in penalties, restitution and community service projects of $28 million. One of the other goals is to have no damage to the vessel. In the case of Davis-Besse, the damage to the vessel resulted in $293 million in repairs and upgrades. The other goals affected were the customer service goal, due to reduced production of electricity (costing $348 million in purchased electricity), and the production goal, because the plant was closed for two years.
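The problem outline described above can be sketched as a small data structure. This is only an illustration: the class and field names (`ProblemOutline`, `GoalImpact`, `cost_musd`) are assumptions made for this sketch and are not part of the Cause Mapping method itself. The total is simply the sum of the three dollar figures quoted above, not a figure stated in the source.

```python
from dataclasses import dataclass, field


@dataclass
class GoalImpact:
    goal: str                # which overall goal was affected
    impact: str              # what happened to that goal
    cost_musd: float = 0.0   # quoted cost in millions of US dollars (0 if none quoted)


@dataclass
class ProblemOutline:
    what: list[str]          # every "problem" the group names; no debate needed
    when: str                # date of the incident
    different: str           # what was different or unusual this time
    where: str               # physical/geographic location
    process: str             # what was being done at the time
    impacts: list[GoalImpact] = field(default_factory=list)

    def total_cost_musd(self) -> float:
        """Sum the quoted costs across all goal impacts."""
        return sum(i.cost_musd for i in self.impacts)


outline = ProblemOutline(
    what=["Steel wastage", "Hole in the reactor head", "Boric acid corrosion"],
    when="March 7, 2002",
    different="Plant was in a refueling outage",
    where="Davis-Besse Nuclear Power Station, Oak Harbor, Ohio, unit #1",
    process="CRDM nozzle inspections",
    impacts=[
        GoalImpact("Safety", "Significant precursor to core damage", 28),
        GoalImpact("Materials", "Damage to the vessel", 293),
        GoalImpact("Customer service", "Reduced production of electricity", 348),
        GoalImpact("Production", "Plant closed for two years"),
    ],
)

print(f"Quoted costs total ${outline.total_cost_musd():.0f} million")
```

Keeping `what` as a list mirrors the point made in Step 1: all of the group's answers are recorded, and the magnitude of the incident comes from the impacts rather than from the wording of the problem.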
Step 2. Identify the Causes (The Analysis)
The analysis step is where the incident is broken down into causes which are captured on the Cause Map. The Cause Map starts by writing down the goals that were affected as defined in the problem outline. First we write down all the impacts to the goals, as discussed above. These are the first cause-and-effect relationships in the analysis.
The analysis can continue by asking Why questions and moving to the right of any of the cause-and-effect relationships above. In this example we’ll start with the precursor to core damage, which was caused by the loss of a principal fission product barrier (which was itself an impact to the safety goals). The next question is “Why did the loss of the principal fission product barrier occur?” The cavity through the entire reactor pressure vessel head (the barrier) resulted in the loss of a principal fission product barrier.
Now let's look at the other impacts to the goals. An impact to the customer service goal is the reduced production of electricity, which occurred because the plant was closed for two years, which was itself an impact to the production goal. The plant was closed because of the damage to the vessel, which was an impact to the materials goal, and which was due to the cavity through the head.
Now, why was there a cavity through the head? The cavity was due to continued boric acid corrosion, which occurred because leaking coolant evaporated into a boric acid solution (we'll talk about this later), and because the boric acid was not removed because it was not viewed as a safety concern, and because of inadequate boric acid corrosion control.
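The cause-and-effect relationships described so far can be sketched as a simple mapping from each effect to its causes. This is a minimal illustration, not the full Davis-Besse Cause Map: the node labels are paraphrased from the text, and the `why_chains` helper is a hypothetical name introduced here to show how repeatedly asking "Why?" walks the map from a goal impact toward its causes.

```python
# Each effect maps to the causes found by asking "Why?" once.
# Several causes listed under one effect are all required (an AND relationship).
cause_map = {
    "Significant precursor to core damage": ["Loss of a principal fission product barrier"],
    "Loss of a principal fission product barrier": ["Cavity through reactor head"],
    "Reduced production of electricity": ["Plant closed for two years"],
    "Plant closed for two years": ["Damage to the vessel"],
    "Damage to the vessel": ["Cavity through reactor head"],
    "Cavity through reactor head": ["Continued boric acid corrosion"],
    "Continued boric acid corrosion": [
        "Leaking coolant evaporated into boric acid solution",
        "Boric acid not removed",
        "Inadequate boric acid corrosion control",
    ],
    "Boric acid not removed": ["Not viewed as a safety concern"],
}


def why_chains(effect, cause_map):
    """Return every path from `effect` down to causes with no recorded cause yet."""
    causes = cause_map.get(effect, [])
    if not causes:
        return [[effect]]
    chains = []
    for cause in causes:
        for chain in why_chains(cause, cause_map):
            chains.append([effect] + chain)
    return chains


for chain in why_chains("Reduced production of electricity", cause_map):
    print(" <- ".join(chain))
```

Each printed line reads left to right as a series of Why questions. Both the safety chain and the production chain pass through "Cavity through reactor head", which is why the different goal impacts converge on the same part of the map.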
The boric acid corrosion control was inadequate for three reasons. Old corrosion products were not completely removed (more on this later, too). The corrosion rate was higher than expected, because less conservative corrosion rate data was used and because the rates were determined with non-representative configurations. And the corrosion went undetected.
The corrosion was undetected because early signs of corrosion were missed or ignored, and there was no full inspection of the reactor head. There was no full inspection of the reactor head because the problem was believed to be low-risk, and the inspection was difficult, as the modification to add openings to allow inspection had been delayed and the accumulation of boric acid precluded inspection.
The old corrosion products were not completely removed because they were difficult to remove. Again, the modification to add openings to allow inspection (which would have also allowed cleaning) was delayed, and the deposits were very adherent. The removal was performed on a "best-effort" basis to attempt to minimize dose. And, there was an acceptance of the boric acid accumulation because, again, the problem was believed to be low-risk.
Now we'll step way back to the beginning and look at that leaking coolant I promised you we'd get back to. The leaking coolant evaporated into boric acid solution because there was a leak path, because long-standing leaks were not resolved, and/or because the leakage was undetected. (We'll cover all these in more detail.)
Long-standing leaks were not resolved because repairs were delayed until refueling. The delay occurred because it was approved by the Nuclear Regulatory Commission (NRC), and to minimize production impact and dose. The leakage was undetected for three reasons: inspections were delayed until refueling; leakage detection methods were ineffective, because the leakage was below the minimum detection capability of the leakage systems and because the whole head was not inspected; and the leakage was masked by flange leakage, which occurred because the leaking flange was not repaired.
The leak path was a through-wall crack in control rod drive nozzle #3. This occurred because primary water stress corrosion cracking (SCC) occurred and went undetected.
The stress corrosion cracking occurred because a crack was initiated, due to exposure to high temperature primary water and tensile stress, and because the cracks propagated, due to tensile stress. The tensile stress was caused by the plant operating pressure and residual stress in the weld. The SCC was also aided by increased susceptibility to cracking, because Davis-Besse had a higher operating temperature than other plants, and because of some fabrication issues.
The cracks went undetected because they initiated earlier than expected. It was believed that the plant was too young for cracking. Additionally, the cracks were undetected because the boric acid buildup was blamed on flange leakage (which is very common) rather than searching for another cause, and because of ineffective inspections. The inspections were ineffective because cracks entirely within the weld could not be detected by ultrasonic testing (UT); because cracking was not considered a safety concern, since cracks have a low growth rate and were considered unlikely to spread; and because it was believed that looking for leakage was an effective way to find cracking.
Even more detail can be added to this Cause Map as the analysis continues. As with any investigation the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals. The specific action items from Davis-Besse can be matched to specific causes on the detailed Cause Map.
Step 3. Select the Best Solutions (Reduce the Risk)
Once the Cause Map is built to a sufficient level of detail with supporting evidence, the solutions step can be started. The Cause Map is used to identify all the possible solutions for a given issue so that the best solutions can be selected. It is easier to identify many possible solutions from the detailed Cause Map than from the oversimplified high-level analysis of "the cavity was caused by boric acid corrosion."
There are causes to every issue. The Davis-Besse problem at a high level has only one cause. At progressively more detailed levels it has 7, 19, 60 and 145 causes (shown below). All of the levels of the Cause Map are accurate; some simply have more detail than others. An issue should be worked to a sufficient level of detail to prevent the incident, meaning to reduce the risk of the incident occurring to an acceptable level. This is why solutions and work processes at a coffee shop are not as thorough or detailed as those at an airline or nuclear power facility. The risk or impact to the goals dictates how effective the solutions should be. Lower-risk incidents will have relatively less detailed investigations, while a significantly high risk to an organization’s goals requires a much more thorough analysis.
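The idea that every level of the Cause Map is accurate, with deeper levels simply adding detail, can be illustrated by limiting how many Why questions are asked. The sketch below uses a small fragment of the map with labels paraphrased from the analysis above; `causes_within` is a hypothetical helper name. The counts it produces apply only to this fragment and do not correspond to the 7, 19, 60 and 145 cause versions of the actual Cause Map.

```python
# A small fragment of the map: each effect maps to its direct causes.
cause_map = {
    "Cavity through reactor head": ["Continued boric acid corrosion"],
    "Continued boric acid corrosion": [
        "Leaking coolant evaporated into boric acid solution",
        "Boric acid not removed",
        "Inadequate boric acid corrosion control",
    ],
    "Boric acid not removed": ["Not viewed as a safety concern"],
    "Inadequate boric acid corrosion control": [
        "Old corrosion products not completely removed",
        "Corrosion rate higher than expected",
        "Corrosion undetected",
    ],
    "Corrosion undetected": [
        "Early signs missed or ignored",
        "No full inspection of the reactor head",
    ],
}


def causes_within(start, cause_map, max_depth):
    """Collect distinct causes reachable from `start` in at most `max_depth` Why steps."""
    seen = set()
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for effect in frontier:
            for cause in cause_map.get(effect, []):
                if cause not in seen:
                    seen.add(cause)
                    next_frontier.append(cause)
        frontier = next_frontier
    return seen


for depth in range(1, 5):
    found = causes_within("Cavity through reactor head", cause_map, depth)
    print(f"{depth} Why step(s): {len(found)} cause(s)")
```

At one Why step the fragment shows only "Continued boric acid corrosion", which matches the oversimplified high-level analysis. Every deeper level contains all the causes of the shallower one, so a more detailed map refines the high-level map rather than contradicting it, and it exposes more places where a solution could be applied.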
Cause Mapping Improves Problem Solving Skills
The Cause Mapping method focuses on the basics of the cause-and-effect principle so that it can be applied consistently to day-to-day issues as well as catastrophic, high-risk issues. The steps of Cause Mapping are the same, but the level of detail is different. Focusing on the basics of the cause-and-effect principle makes the Cause Mapping approach to root cause analysis a simple and effective method for investigating safety, environmental, compliance, customer, production, equipment or service issues.
Click on "Download PDF" above to download a PDF showing the high level Cause Map.