Considering how complicated it is, nuclear energy actually boils down to a very simple concept: getting a turbine to turn in a circle, like a hamster wheel. The turbine drives a generator, and from that motion we get electricity (wind power operates on the same principle). Generating the energy to make the wheel turn…now, that’s where it gets complicated. Wind energy accomplishes this by using the wind; nuclear energy, like power plants that run on fossil fuels, accomplishes this by generating heat, which turns water into steam, whose pressure turns the turbine. The difference is how that heat is generated. Nuclear power generates heat through a process called nuclear fission: splitting uranium atoms (more specifically, splitting their nuclei in two).
Conceived after World War II, nuclear power came of age in the 1970s. The United States has 104 commercial reactors (69 pressurized water reactors and 35 boiling water reactors), more than any other nation, operating at 65 nuclear power plants. All of these plants were ordered in 1974 or earlier.
In a pressurized water reactor, high pressure keeps the water in the reactor vessel (a steel container that holds the reactor’s core, moderator, coolant, and control rods) from boiling, even though it reaches about 300 degrees Celsius. This water then flows to the steam generator, which is made up of many small tubes. Heat from the tubes turns a second supply of water (kept isolated from the reactor vessel water) into steam, which turns the turbine, which drives the generator. The reactor vessel water is then pumped back to the reactor to be reheated, while the steam that has passed through the turbine is cooled in a condenser, which turns it back into water that returns to the steam generator. The condenser uses a supply of cold sea (or lake) water running through a series of pipes to cool the steam, and the two water supplies never come in contact.
Unlike energy created by burning fossil fuels (like coal, oil, or natural gas), nuclear energy is clean. In the United States, we get about 20% of our electricity from it. But because nuclear fuel and the products of fission are highly radioactive, it can be a dangerous business. It is what Root Cause Analysis terms a “high risk business”: in order to function normally, it must accept and work with a much higher degree of risk than other businesses (coffee shops, bookstores, schools) have to deal with. The problem with high risk organizations, as Root Cause Analysis has shown again and again, is that when everything functions normally, people hardly notice them…but when something goes wrong, it does so in a big, scary way. As such, safety is of paramount importance.
Nuclear power plant safety procedures are so important, in fact, that disaster need not even occur for an event to count as a “catastrophic incident” for the industry. In the case of the Davis-Besse Nuclear Power Plant, nothing actually happened: no nuclear meltdown, no three-eyed fish. Yet the plant’s safety culture had become so dangerously lax that it very nearly did permit something very bad to happen, raising the risk of a radiation release to a level the industry considers unacceptable. This near miss was such a big deal that the United States Nuclear Regulatory Commission (NRC) ranks it among the five worst nuclear incidents since 1979.
Located in Oak Harbor, Ohio, on the shores of Lake Erie, the Davis-Besse Nuclear Power Station is home to a single pressurized water reactor.
On February 16, 2002, Davis-Besse was shut down for refueling and routine inspections, which included checking for cracks in the reactor head nozzles. Operators found more than just cracks: they uncovered a major problem that could easily have turned catastrophic had refueling been scheduled just a few weeks later. On March 5, a hole with a surface area of 20-30 square inches (about the size of a football) was found in the reactor pressure vessel head (the top of the shell that holds the coolant around the reactor core). The cavity had eaten through the carbon steel of the head down to its thin internal liner of stainless steel. It was just 3/16 of an inch from going all the way through, at which point the head could easily have ruptured under the heavy pressure inside (about 1 ton per square inch).
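To put the “1 ton per square inch” figure in perspective, here is a back-of-the-envelope calculation. It is a rough sketch under two stated assumptions: a ton means a short ton (2,000 pounds), and the whole 20-30 square-inch footprint of the cavity is treated as liner exposed to full reactor pressure. The real exposed area and operating pressure differ somewhat, so read the result as an order of magnitude only.

```python
# Back-of-the-envelope: force pressing on the exposed stainless steel liner.
# Assumptions (not plant data): 1 short ton = 2,000 lb, and the full cavity
# footprint is treated as liner exposed to reactor pressure.

LB_PER_SHORT_TON = 2_000
PRESSURE_TONS_PER_SQ_IN = 1.0  # "about 1 ton per square inch", as quoted in the text
MM_PER_INCH = 25.4


def force_on_liner_tons(area_sq_in: float,
                        pressure_tons_per_sq_in: float = PRESSURE_TONS_PER_SQ_IN) -> float:
    """Force in short tons = pressure (tons per square inch) x area (square inches)."""
    return pressure_tons_per_sq_in * area_sq_in


for area in (20, 30):
    tons = force_on_liner_tons(area)
    print(f"{area} sq in -> {tons:.0f} tons ({tons * LB_PER_SHORT_TON:,.0f} lb)")

# Remaining liner thickness quoted in the text, converted to millimeters.
print(f"Liner thickness: {3 / 16 * MM_PER_INCH:.1f} mm")
```

In other words, a liner less than 5 millimeters thick was holding back something on the order of 20 to 30 tons of force.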
That thin stainless steel covering was the only thing standing in the way of a rupture of the reactor vessel head and, potentially, widespread radioactive contamination, posing a health risk to thousands in the vicinity and contaminating Lake Erie as well. That might not seem catastrophically dramatic, unless you know that Lake Erie is a source of drinking water for millions of people in the United States and Canada.
It was a very, very close call, but the problem was caught in time to avert disaster–so what’s the big deal?
Glad you asked. As it turns out, the damage to the vessel head occurred over 6 years, meaning that two earlier inspections in 1998 and 2000 failed to catch the problem. Moreover, Davis Besse had had several issues in the past, and thus plenty of opportunities to create the kind of safety culture that would have caught the problem sooner. Finally, the problem discovered on March 5, 2002 was entirely preventable.
The seriousness of a problem in an organization’s safety culture is directly proportional to the amount of risk involved in the industry. In theory, nobody should know this better than those in the industry. Not only does nuclear energy carry an inherently high risk, but in the United States, where the public remains wary of nuclear energy, negative publicity can be catastrophic for the industry. The glaring question, then, is this: being fully aware of the risks, how in the world could a preventable problem go undetected through multiple safety checks until Davis-Besse found itself on the edge of a disaster?
This is an example of how the Cause Mapping process can be applied to a specific incident, in this case the Davis-Besse reactor head corrosion. The three steps are 1) Define the problem, 2) Conduct the analysis and 3) Identify the best solutions. Each step is discussed below.
All Root Cause Analysis investigations open by defining the incident as precisely as possible in a problem outline. Root Cause Analysis problem outlines break down the incident into its essential components by defining the incident’s location in time and space, and considering how it affected the goals of the organization that encountered the problem.
What was the problem? Root Cause Analysis conceptualizes a problem as anything that negatively affects the ideal state (the goals) of an organization. There are many ways of looking at it, and before launching an investigation one cannot objectively evaluate whether one perspective is more valid than another. While different people may say that the problem was steel wastage, the hole in the reactor head, or boric acid corrosion, at this point in the Root Cause Analysis all three ideas are noted in the problem outline to be evaluated later in the process.
To define when the incident occurred, Root Cause Analysis specifies the incident’s date and also anything that was different or unusual in this instance–what differentiates this incident from every other, normal day. The problem was discovered on March 5, 2002 (the date) during a refueling outage (the difference).
Addressing the question of where the incident occurred, the problem outline notes the physical location at minimum and, where possible, the unit or equipment involved and the work or task underway at the time. With the Davis-Besse incident, we can specify all three pieces of information in the problem outline. The physical location was the Davis-Besse Nuclear Power Station in Oak Harbor, Ohio, Unit #1. The cavity was discovered during control rod drive mechanism (CRDM) nozzle inspections.
To complete the problem outline, it is important to think carefully about the affected organization’s goals that were adversely affected by the incident.
For commercial nuclear power, one of the overall goals is to maintain the integrity of the fission product barriers. This was not accomplished, adversely affecting the safety goal. Another negative impact on the safety goal was that the problem was rated as a “significant” precursor to core damage. This resulted in penalties, restitution and community service projects to the tune of $28 million.
Another industry goal is to avoid damage to the vessel. At Davis-Besse, the damage to the vessel resulted in $293 million worth of repairs and upgrades, a negative impact on the materials and labor goal.
Finally, the customer service goal was affected by the resulting reduction in electricity production (costing $348 million in purchased replacement electricity), as was the production goal, because the plant was closed for 2 years.
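The problem outline is essentially a small structured record. As a hypothetical sketch (the class and field names are illustrative, not part of any Cause Mapping tool), it can be captured in Python, which also makes it easy to total the dollar figures quoted above. The total covers only the costs listed in this outline; it is not an official figure.

```python
from dataclasses import dataclass, field


@dataclass
class GoalImpact:
    goal: str
    impact: str
    cost_millions: float = 0.0  # 0 where the outline gives no dollar figure


@dataclass
class ProblemOutline:
    what: list[str]      # candidate problem statements, evaluated later
    when: str            # date
    difference: str      # what was unusual about that day
    where: str           # physical location and unit
    task: str            # work underway at the time
    impacts: list[GoalImpact] = field(default_factory=list)

    def total_cost_millions(self) -> float:
        return sum(i.cost_millions for i in self.impacts)


outline = ProblemOutline(
    what=["Steel wastage", "Hole in the reactor head", "Boric acid corrosion"],
    when="March 5, 2002",
    difference="Refueling outage",
    where="Davis-Besse Nuclear Power Station, Oak Harbor, Ohio, Unit #1",
    task="CRDM nozzle inspections",
    impacts=[
        GoalImpact("Safety", "Integrity of fission product barriers not maintained"),
        GoalImpact("Safety", "Significant precursor to core damage (penalties, restitution, community service)", 28),
        GoalImpact("Materials and labor", "Repairs and upgrades to the damaged vessel", 293),
        GoalImpact("Customer service", "Purchased replacement electricity", 348),
        GoalImpact("Production", "Plant closed for 2 years"),
    ],
)

print(f"Total quantified impact: ${outline.total_cost_millions():,.0f} million")
```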
This is a lot of information, but the beauty of building a Cause Map to conduct Root Cause Analysis lies in its visual simplicity. All of this information fits neatly and clearly on the problem outline like so:
Now that we know what happened in broad strokes, Root Cause Analysis can address the question of why it happened. By building a Cause Map, our Root Cause Analysis is able to answer this question in detail, charting the complex chain of causes and effects that contributed to the incident. Breaking down the incident in this manner has the advantage of rendering it intelligible even to those who don’t remember enough chemistry to really know how nuclear power plants work in a technical sense, and of providing multiple opportunities to solve the problem along the chain.
In order to conduct the most thorough investigation possible, Root Cause Analysis begins with the effects and works its way backwards to the causes by asking a simple yet critical question: Why? Once we have answered that question completely, we ask “Why?” again of each cause we identified and capture the responses to the right. Repeating this process until the incident is mapped in enough detail is what makes the Root Cause Analysis Cause Map a powerful tool for finding multiple viable solutions that can be put into action to prevent the incident from happening again.
A Root Cause Analysis investigation begins by writing down the goals that were identified in the problem outline as having been adversely affected by the incident, and then asking why they were affected. These are the first cause-and-effect relationships in the Root Cause Analysis.
Let’s start with the safety goal affected by the significant precursor to core damage — in other words, a big first step on the way to damage to the nuclear reactor’s core.
The core is the part of the nuclear reactor holding the nuclear fuel components, where the nuclear reactions occur and generate heat. This is the part that holds the dangerous materials; damaging the core means that they could be released. Core damage from overheating, for example, carries a familiar and terrifying name: nuclear meltdown.
Why, in this incident, were we well on the way to damaging the core? Because a principal fission product barrier (which prevents the release of fission products, the radioactive atomic fragments left after fission) was lost. This loss was itself an impact on the safety goal.
So, why was a principal fission product barrier lost? Because of the cavity through the entire reactor pressure vessel head, which was the barrier. This is the football-sized hole we talked about earlier. On our Root Cause Analysis Cause Map, the information discussed above appears like so:
Let’s now turn to the other impacted goals. The customer service goal was affected by the reduced production in electricity due to the plant being closed for two years. This also affected the production goal. The plant was closed because of the damage to the vessel; this damage, in turn, was caused by the cavity through the vessel head, and affected the materials goal.
Once again, we have what looks like a confusing jumble of information. Once charted on the Cause Map, though, it all becomes a lot clearer.
All of the affected goals, be they safety, material, production, or customer service, lead us to the cavity through the entire reactor pressure vessel head.
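Because a Cause Map is a set of cause-and-effect links, the “ask why until you run out of answers” procedure can be sketched as a simple recursive walk. The dictionary below is a hypothetical encoding of the goal branches described above (the labels are paraphrased from the text), and `why_chains` is an illustrative helper, not part of any Cause Mapping software. It confirms that every goal traces back to the same cavity.

```python
# Each key is an effect; its value lists the causes that answer "Why?".
CAUSE_MAP: dict[str, list[str]] = {
    "Safety goal impacted": ["Significant precursor to core damage"],
    "Significant precursor to core damage": ["Principal fission product barrier lost"],
    "Principal fission product barrier lost": ["Cavity through reactor pressure vessel head"],
    "Customer service goal impacted": ["Reduced electricity production"],
    "Production goal impacted": ["Reduced electricity production"],
    "Reduced electricity production": ["Plant closed for 2 years"],
    "Plant closed for 2 years": ["Damage to the vessel"],
    "Materials goal impacted": ["Damage to the vessel"],
    "Damage to the vessel": ["Cavity through reactor pressure vessel head"],
}


def why_chains(effect: str, cause_map: dict[str, list[str]]) -> list[list[str]]:
    """Ask "Why?" repeatedly; return every path from the effect to a cause not yet analyzed."""
    causes = cause_map.get(effect, [])
    if not causes:
        return [[effect]]
    return [[effect] + chain for cause in causes for chain in why_chains(cause, cause_map)]


goals = [effect for effect in CAUSE_MAP if effect.endswith("goal impacted")]
for goal in goals:
    for chain in why_chains(goal, CAUSE_MAP):
        print(" <- ".join(chain))  # read "<-" as "because of"

endpoints = {chain[-1] for goal in goals for chain in why_chains(goal, CAUSE_MAP)}
print("Where every chain currently ends:", endpoints)
```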
Now, why was there a cavity through the head?
The cavity was caused by continued boric acid corrosion.
Nuclear power plants use boron as a neutron poison to slow the rate at which fission occurs, dissolving it in the coolant water as boric acid. A fission chain reaction is sustained by the neutrons that each fission releases, which go on to split further atoms; because boron absorbs neutrons, adding it to the reactor coolant that circulates through the reactor slows the process down.
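The idea that fission is “driven by the number of neutrons” is often expressed with a multiplication factor, k: the ratio of neutrons in one generation of fissions to the generation before. The toy model below illustrates only that textbook idea; it is not a reactor physics calculation, and both k values are invented for the example. The point is simply that a neutron absorber such as boron lowers k, so the neutron population (and with it the fission rate) grows more slowly or dies away.

```python
def neutron_population(initial: float, k: float, generations: int) -> list[float]:
    """Toy model: each generation has k times as many neutrons as the one before."""
    population = [initial]
    for _ in range(generations):
        population.append(population[-1] * k)
    return population


# Hypothetical multiplication factors, chosen only for illustration.
less_boron = neutron_population(1000, k=1.01, generations=10)
more_boron = neutron_population(1000, k=0.99, generations=10)

print(f"k=1.01 after 10 generations: {less_boron[-1]:.0f} neutrons")
print(f"k=0.99 after 10 generations: {more_boron[-1]:.0f} neutrons")
```

In normal operation a reactor is held at k = 1, which means steady power; dissolved boron is one of the ways a pressurized water reactor keeps it there.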
Boric acid had been dripping onto the reactor head, eating away about 70 pounds of carbon steel and creating a hole 6 inches deep in the shell that holds the coolant layer around the reactor core. Boric acid is highly corrosive to carbon steel, which is why it was able to eat away such a large hole. It is much less corrosive to stainless steel, which is why the liner held.
The boric acid corrosion occurred for several reasons: leaking coolant evaporated into a boric acid solution (we’ll talk about this later), the boric acid was not removed because it was not viewed as a safety concern, and there was inadequate boric acid corrosion control. In order to perform a complete Root Cause Analysis of the situation, each of these elements must be examined individually.
Why was boric acid corrosion control inadequate? Again, there are several reasons. First, old corrosion products were not completely removed (more on this later). Next, the corrosion rate was higher than expected, because less conservative corrosion rate data had been used and because those rates had been obtained with test configurations that did not represent actual conditions. Finally, boric acid corrosion control was also inadequate because the corrosion was undetected.
Why was the corrosion undetected? Because early signs of corrosion were missed or ignored, and because there was no full inspection of the reactor head. There was no full inspection because the problem was believed to be low-risk and because inspection was difficult: the cavity was hard to see because of the covering on the vessel head, a modification that would have added openings to make inspection easier had been delayed, and the accumulation of boric acid precluded inspection.
The old corrosion products (the boric acid deposits) were not completely removed because they were difficult to remove. Again, the modification to add openings to allow inspection (which would have also allowed cleaning) was delayed, and the deposits were very adherent. The removal was performed on a “best-effort” basis to attempt to minimize the dose of radiation that workers are exposed to. Finally, there was an acceptance of the boric acid accumulation because, again, the problem was believed to be low-risk.
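The corrosion-control branch from the last three paragraphs can be written down as a nested tree and printed as an indented “why” outline. This is an illustrative sketch: the labels are paraphrased from the text, and `print_tree` and `walk` are hypothetical helpers. Counting how often each cause appears highlights two causes (the delayed modification and the low-risk belief) that show up on more than one path, which makes them attractive places to act.

```python
from collections import Counter
from collections.abc import Iterator

# Nested dict: each key is a cause; its value holds the causes that answer "Why?".
BRANCH: dict = {
    "Inadequate boric acid corrosion control": {
        "Old corrosion products not completely removed": {
            "Deposits difficult to remove": {
                "Modification to add inspection openings delayed": {},
                "Deposits very adherent": {},
            },
            "Removal done on a 'best-effort' basis to minimize dose": {},
            "Boric acid accumulation accepted": {
                "Problem believed to be low-risk": {},
            },
        },
        "Corrosion rate higher than expected": {
            "Less conservative corrosion rate data used": {},
            "Rates obtained with non-representative configurations": {},
        },
        "Corrosion undetected": {
            "Early signs of corrosion missed or ignored": {},
            "No full inspection of the reactor head": {
                "Problem believed to be low-risk": {},
                "Inspection difficult": {
                    "Modification to add inspection openings delayed": {},
                    "Boric acid accumulation precluded inspection": {},
                },
            },
        },
    },
}


def print_tree(tree: dict, depth: int = 0) -> None:
    """Print each cause indented under the effect it explains."""
    for cause, subcauses in tree.items():
        print("    " * depth + "- " + cause)
        print_tree(subcauses, depth + 1)


def walk(tree: dict) -> Iterator[str]:
    """Yield every cause in the tree, including repeats on different paths."""
    for cause, subcauses in tree.items():
        yield cause
        yield from walk(subcauses)


print_tree(BRANCH)
counts = Counter(walk(BRANCH))
shared = sorted(cause for cause, n in counts.items() if n > 1)
print("Causes appearing on more than one path:", shared)
```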
Now we’ll step way back to the beginning and look at that leaking coolant. As noted earlier in the Root Cause Analysis, borated water had leaked from the reactor coolant system. The leaking coolant evaporated into a boric acid solution, and that solution is what ate the hole in the reactor head; the coolant was able to leak because there was a leak path. The leak path existed because long-standing leaks were not resolved and/or because the leakage was undetected.
Long-standing leaks were not resolved because repairs were delayed until refueling. The delay, which had been approved by the Nuclear Regulatory Commission (NRC), was intended to minimize production impact (a working nuclear power plant can earn up to $1 million per day) and dose (there is a limit on the radiation dose workers may receive, and repairs mean exposure).
The leakage was undetected for several reasons. First, inspections were delayed until refueling. Second, leakage detection methods were ineffective: the leakage was not detected by the leakage monitoring systems because it was below their minimum detection capability, the whole head was not inspected, and the leakage was masked by flange leakage. The flange leakage, in turn, occurred because the leaking flange was not repaired.
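The two detection failures in this paragraph (a leak too small for the monitoring system, and deposits blamed on a known flange leak) can be illustrated with a deliberately simplified model. Everything here is hypothetical: the `LeakSource` class, the helper functions, and all leak rates and thresholds are invented for illustration and are not Davis-Besse plant data or a description of any real monitoring system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeakSource:
    name: str
    rate_gpm: float  # leak rate in gallons per minute (hypothetical values below)
    known: bool      # True if the plant already knows about and expects this leak


def flagged_leaks(sources: list[LeakSource], min_detectable_gpm: float) -> list[str]:
    """Unknown leaks large enough for the (simplified) monitoring system to notice."""
    return [s.name for s in sources if not s.known and s.rate_gpm >= min_detectable_gpm]


def deposits_blamed_on(sources: list[LeakSource]) -> Optional[str]:
    """Boric acid deposits get attributed to a familiar, known leak if one exists."""
    known = [s.name for s in sources if s.known]
    return known[0] if known else None


# Hypothetical numbers for illustration only; not Davis-Besse plant data.
MIN_DETECTABLE_GPM = 0.1
sources = [
    LeakSource("CRDM nozzle #3 crack", rate_gpm=0.02, known=False),
    LeakSource("Leaking flange", rate_gpm=0.3, known=True),
]

print("Flagged by monitoring:", flagged_leaks(sources, MIN_DETECTABLE_GPM))
print("Deposits blamed on:", deposits_blamed_on(sources))

# With the flange repaired, the deposits could no longer be explained away.
print("After flange repair, deposits blamed on:", deposits_blamed_on(sources[:1]))
```

The small nozzle leak is never flagged, and as long as the flange keeps leaking, the deposits have a ready-made explanation; repairing the flange removes that explanation.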
The leak path was a through-wall crack in control rod drive nozzle #3. This occurred because primary water stress corrosion cracking (SCC) went undetected.
Why, then, did the stress corrosion cracking occur? A crack was initiated due to exposure to high temperature primary water and tensile stress. The cracks then propagated because of tensile stress. The tensile stress was caused by the plant operating pressure and residual stress in the weld. The SCC was also aided by increased susceptibility to cracking, because Davis-Besse had a higher operating temperature than other plants, and because of some fabrication issues.
The cracks went undetected because they initiated earlier than expected: it was believed that the plant was too young for cracking. The cracks also went undetected because the boric acid buildup was blamed on flange leakage (which is very common) rather than investigated for another cause, and because of ineffective inspections.
The inspections were ineffective for several reasons: cracks entirely within the weld could not be detected by ultrasonic testing (UT), cracking was not considered a safety concern, cracks have a low growth rate and were considered unlikely to spread, and it was believed that looking for leakage was an effective way to find cracking.
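The leak-path branch from the last several paragraphs can be encoded the same way as the goal branches, as a mapping from each effect to the causes that answer “Why?”. This is an illustrative sketch with paraphrased labels; `deepest_causes` is a hypothetical helper. Listing the causes at the end of each chain (where the map currently stops asking why) gives a rough inventory of places where a solution could act, compared with the single cause offered by a one-line analysis.

```python
# effect -> causes that answer "Why?" (leak-path branch, paraphrased from the text)
LEAK_PATH_MAP: dict[str, list[str]] = {
    "Leak path (through-wall crack in CRDM nozzle #3)": [
        "Long-standing leaks not resolved",
        "Leakage undetected",
        "Stress corrosion cracking occurred",
        "Cracking undetected",
    ],
    "Long-standing leaks not resolved": ["Repairs delayed until refueling"],
    "Repairs delayed until refueling": ["Minimize production impact", "Minimize worker dose"],
    "Leakage undetected": ["Inspections delayed until refueling", "Leakage detection methods ineffective"],
    "Leakage detection methods ineffective": [
        "Below minimum detection capability",
        "Whole head not inspected",
        "Masked by flange leakage",
    ],
    "Masked by flange leakage": ["Leaking flange not repaired"],
    "Stress corrosion cracking occurred": [
        "Crack initiated",
        "Crack propagated",
        "Increased susceptibility to cracking",
    ],
    "Crack initiated": ["High-temperature primary water", "Tensile stress"],
    "Crack propagated": ["Tensile stress"],
    "Tensile stress": ["Plant operating pressure", "Residual stress in the weld"],
    "Increased susceptibility to cracking": [
        "Higher operating temperature than other plants",
        "Fabrication issues",
    ],
    "Cracking undetected": [
        "Cracks initiated earlier than expected",
        "Boric acid buildup blamed on flange leakage",
        "Ineffective inspections",
    ],
    "Cracks initiated earlier than expected": ["Plant believed too young for cracking"],
    "Ineffective inspections": [
        "UT cannot detect cracks entirely within the weld",
        "Cracking not considered a safety concern",
        "Low crack growth rate; spread considered unlikely",
        "Looking for leakage believed to find cracking",
    ],
}

ROOT = "Leak path (through-wall crack in CRDM nozzle #3)"


def deepest_causes(effect: str, cause_map: dict[str, list[str]]) -> set[str]:
    """Follow every "Why?" chain below the effect and return the causes at the ends."""
    causes = cause_map.get(effect)
    if not causes:
        return {effect}
    found: set[str] = set()
    for cause in causes:
        found |= deepest_causes(cause, cause_map)  # a shared cause is counted once
    return found


leaves = deepest_causes(ROOT, LEAK_PATH_MAP)
print(f"{len(leaves)} end-of-chain causes in this branch alone:")
for cause in sorted(leaves):
    print(" -", cause)
```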
Even more detail can be added to this Cause Map as the Root Cause Analysis continues. As with any investigation, the level of detail in the Root Cause Analysis is based on the way the incident affected the organization’s overall goals. As we move on to the solutions phase of the Root Cause Analysis, specific action items from Davis-Besse can be matched to specific causes on the detailed Cause Map.
Once the Cause Map is built to a sufficient level of detail with supporting evidence, we can begin to select solutions. The Cause Map is used to identify all the possible solutions for a given issue so that the best among them can be selected and implemented. It is easier to identify many possible solutions from the detailed Cause Map than from an oversimplified high level analysis that begins and ends with “the cavity was caused by boric acid corrosion.”
As should be clear from the Root Cause Analysis, boric acid corrosion was a well-known issue in the industry. Solving the problem, then, must involve not only preventing boric acid corrosion itself, but also reducing the risk that it remains undetected long enough to provoke an incident of this magnitude. As is often the case in Root Cause Analysis example incidents, an important solution involves better oversight. Simply put, there must be safety nets. Humans make mistakes; accepting this, high risk organizations, be they NASA or a nuclear power plant, must ensure that mistakes are caught before they cause disaster.
The industry had long been aware of the damage that boric acid can cause to the metal parts of reactor coolant systems, and knew that any accumulation of boric acid should be removed. The design of the vessels themselves is proof: reactor vessels are made of thick carbon steel and lined on the inside with a thin layer of stainless steel to protect against boric acid corrosion (stainless steel is much more resistant to borated water corrosion than carbon steel is). Multiple incidents highlighted the point. In the early 1970s, borated water caused similar damage to a reactor in Switzerland: the water dripped onto a reactor head and evaporated, leaving behind boric acid crystals that ate away a small indentation in the metal. In the 1980s, the NRC sent out at least 6 advisories to owners of pressurized water reactors regarding the issue; when these began to seem insufficient, the NRC mandated boric acid corrosion control programs. The Davis-Besse plant itself was nearly fined in 1999 for a breach after workers replaced certain stainless steel components with carbon steel ones, which were then damaged by boric acid. Thus, as with the O-ring issue that ultimately led to the space shuttle Challenger disaster, the problem had been observed and was known but had never been corrected.
The Root Cause Analysis Cause Map shows that the corrosion went undetected, which indicates a lapse in the power plant’s safety culture. Our Root Cause Analysis can therefore suggest multiple changes to fix the problem at this level, and indeed this is where the NRC placed most of its focus in the aftermath of the incident.
The Davis-Besse plant remained shut down until 2004, during which time a number of other design flaws and safety issues were found and addressed. On top of the $28 million in fines levied by the US Department of Justice, the NRC imposed its largest fine ever (more than $5 million) against FirstEnergy, the plant’s owner, for the actions (or inactions) that contributed to the corrosion.
In the wake of the incident, a panel of experts from the NRC made 51 recommendations designed to prevent a similar event from happening again, including industry inspection requirements based on American Society of Mechanical Engineers (ASME) code, assessment and improvement of NRC procedures and processes, increased training, practices for monitoring reactor vessel and piping leaks, and improved guidance regarding stress corrosion cracking and boric acid corrosion.
The incident also served as a big warning to other nuclear power facilities to stay on their toes regarding this kind of leakage, and as a signal to the NRC to overhaul its oversight methods, which Davis-Besse revealed to have been much too lax.
Although Davis-Besse came within 3/16 of an inch of disaster, it survived and continues to operate.
The Cause Mapping method focuses on the basics of the cause-and-effect principle so that it can be applied consistently to day-to-day issues as well as catastrophic, high risk issues. The steps of Cause Mapping are the same, but the level of detail is different. Focusing on the basics of the cause-and-effect principle makes the Cause Mapping approach to root cause analysis a simple and effective method for investigating safety, environmental, compliance, customer, production, equipment or service issues.