Space Shuttle Columbia | ThinkReliability, Case Studies

The Columbia space shuttle was a seasoned veteran of space travel. The first space-worthy shuttle in NASA’s orbital fleet, Columbia was first launched in April 1981 and had successfully completed 27 missions by 2003. Columbia’s long-running success, especially in the wake of the 1986 Challenger explosion, signaled to some that human space travel was approaching the point at which it might be considered routine.

This notion came to a sudden end during the shuttle’s 28th mission (STS-107). During launch, a briefcase-sized piece of foam insulation broke off from the shuttle’s external tank and struck the left wing, damaging its thermal protection system, which protects the vehicle from the intense heat generated upon reentry.

Although foam loss was technically listed as a safety issue in the original shuttle design specifications, the same kind of foam insulation had been observed falling off on four previous flights without causing serious damage. Such occurrences were given the name “foam shedding,” which normalized them even further. By Columbia’s twenty-eighth launch, engineers had come to see foam debris as inevitable and, at worst, an acceptable risk.

After spending 16 days in space, Columbia broke apart when reentering Earth’s atmosphere on February 1, 2003, resulting in the total destruction of the orbiter and killing all seven astronauts aboard.

The Columbia disaster serves as a prime example of the value of root cause analysis precisely because every aspect of the mission had seemed so routine. Given its 27 successful prior missions, Columbia benefited from an experienced command crew at the control center and from a space vehicle that had been tested and appeared reliable. Even the errant piece of foam that struck Columbia’s wing during launch had been detected and judged not to be a safety concern well before reentry was attempted. The Columbia disaster thus came as a terrible surprise to those involved, and it carried dramatic consequences for NASA’s future missions. By investigating the incident methodically, root cause analysis illuminates not only which indications of danger were missed, but also why they were overlooked, and it proposes solutions to ensure that the same mistakes are not repeated.

Root Cause Analysis Example

The disintegration and loss of the space shuttle Columbia on re-entry serves as an example of how root cause analysis can be applied to a specific incident. As with any incident, root cause analysis involves three steps:

Define the problem
Conduct the analysis
Identify the best solutions

Each step will be discussed below in relation to the Columbia disaster.

Root Cause Analysis Step 1: Define the Problem

The first step in any root cause analysis approach is to define the problem by asking four critical questions:

What is the problem?

While the loss of all seven Columbia crew members upon re-entry may seem like the obvious problem, root cause analysis requires probing deeper to arrive at a complex chain of causality with multiple problem sources.

The more closely we examine the Columbia disaster, the more elements we can identify that culminated in a tragic loss of life. The Cause Map’s level of complexity, however, is dictated by how useful each problem element is in determining solutions. Thus, in addition to the deaths of the seven crew members, root cause analysis of the Columbia disaster treats the damage to the shuttle’s left wing, which was hit by a loose piece of foam during launch, as a problem, because this problem can lead to a viable solution consistent with industry goals. The existence of the space program could, in a way, also be considered a contributing problem, but since it leads to no useful solutions, it is not included in the Cause Map.

Our cause mapping professionals note all useful problem ideas, which are analyzed later in the root cause analysis exercise.

When did it happen?

In order to measure change, root cause analysis requires specifying a date. Here, the incident occurred on February 1, 2003, at around 8:59 AM EST.

Where did it happen?

The space shuttle Columbia disintegrated upon re-entry over north-central Texas, only 16 minutes before its scheduled landing at the Kennedy Space Center in Cape Canaveral, Florida. Technically, the ‘problem’ was in two locations: with the space shuttle and at the command center. To list a specific location for the incident, root cause analysis treats the Columbia (STS-107) facility at Cape Canaveral as the most useful and precise location to focus on, both as the origin of the problems and as the proper place to enact solutions.

How did it impact the goals?

Just as root cause analysis of any incident considers multiple problems, it also asks how those problems affected multiple goals.

In the case of the Columbia disaster, the safety goal was affected by the deaths of all seven astronauts aboard the shuttle. The total loss of the shuttle can be considered to have impacted the vehicle goal. Additionally, even if it had not led to the disintegration of the space shuttle, the damage from the foam strike had an impact on the equipment goal. As a result of the disaster, the shuttle flight schedule was significantly disrupted, which affected the mission goal. Finally, investigation and cleanup efforts had an effect on the labor goal.

[Figure: Columbia problem outline]
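The four answers above can be captured as a single structured record, which is essentially what the problem outline does. The following is a minimal Python sketch under stated assumptions: the `ProblemOutline` dataclass and its field names are hypothetical illustrations, not part of the Cause Mapping method, and the entries simply restate the answers given in this section.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemOutline:
    """Hypothetical record for Step 1: one field per problem-definition question."""
    what: list[str]  # all useful problem statements, not just the most obvious one
    when: str
    where: str
    goal_impacts: dict[str, str] = field(default_factory=dict)  # goal -> how it was impacted


columbia = ProblemOutline(
    what=[
        "Loss of all seven crew members on re-entry",
        "Left wing damaged by a foam strike during launch",
    ],
    when="February 1, 2003, around 8:59 AM EST",
    where="Columbia (STS-107), Cape Canaveral, Florida",
    goal_impacts={
        "safety": "Seven astronauts killed",
        "vehicle": "Total loss of the shuttle",
        "equipment": "Damage from the foam strike",
        "mission": "Shuttle flight schedule significantly disrupted",
        "labor": "Investigation and cleanup efforts",
    },
)

# Each impacted goal becomes a starting point for the analysis in Step 2.
for goal, impact in columbia.goal_impacts.items():
    print(f"{goal} goal impacted: {impact}")
```

Keeping the goals as keys mirrors how the analysis proceeds: each impacted goal is later traced backwards by asking "Why?".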

Root Cause Analysis Step 2: Identify the Causes (The Analysis)

Root cause analysis of any incident starts from the problem and goals defined in the first step. In the second step, the incident is broken down into causes, which are captured on the Cause Map.

First, we write down the goals that were affected as defined in the problem outline. Next, we can ask why an incident occurred and trace its causes backwards from there.

For the loss of Columbia, we begin with the human tragedy: the safety goal was impacted because all seven crew members lost their lives. This is the first cause-and-effect relationship in our analysis.

[Figure: Cause Map, part 1]

The analysis continues by asking, “Why?” and moves to the right of the cause-and-effect relationship above.

The loss of the astronauts resulted from the orbiter breaking up on reentry. Why? Because of the disintegration of the left wing. Why? Because hot gases were inside the wing.

[Figure: Cause Map, part 2]

There were two conditions necessary for the hot gases to get inside the wing. First, there was a hole in the left wing. Additionally, the shuttle was traveling at high velocity. We will look at each of these causes in turn.

[Figure: Cause Map, part 3]

The hole in the left wing was caused by a piece of foam striking the wing. Video and photographic evidence showed that the foam fell off the external tank and struck the wing during ascent (i.e., well before the shuttle’s re-entry). The subsequent investigation also found pre-existing defects in the foam.

[Figure: Cause Map, part 4]

The shuttle was traveling at high velocity because it had begun to re-enter Earth’s atmosphere. The decision was made to allow re-entry because the risk of damage was considered acceptable: the program had a long history of foam strikes, and the extent of the damage to the wing was unknown.

[Figure: Cause Map, part 5]
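To make the repeated "Why?" questioning above concrete, here is a minimal Python sketch that stores the causes described in this section as an effect-to-causes mapping. The node wording, the `CAUSE_MAP` dictionary, and the `ask_why` and `deepest_causes` helpers are hypothetical illustrations, not ThinkReliability tooling. Treating the pre-existing foam defects as the cause of the foam falling off is an assumption drawn from the investigation finding mentioned above. Where an effect lists more than one cause (hot gases in the wing, the acceptable-risk decision), all of those causes were required together.

```python
# Hypothetical sketch: each effect maps to the causes that produced it.
# Where an effect lists several causes, all of them were required together.
CAUSE_MAP: dict[str, list[str]] = {
    "Safety goal impacted": ["Seven crew members killed"],
    "Seven crew members killed": ["Orbiter broke up on re-entry"],
    "Orbiter broke up on re-entry": ["Left wing disintegrated"],
    "Left wing disintegrated": ["Hot gases inside the wing"],
    "Hot gases inside the wing": [
        "Hole in the left wing",
        "Shuttle traveling at high velocity",
    ],
    "Hole in the left wing": ["Foam struck the wing during ascent"],
    "Foam struck the wing during ascent": ["Foam fell off the external tank"],
    # Assumed link, based on the defects found in the investigation.
    "Foam fell off the external tank": ["Pre-existing defects in the foam"],
    "Shuttle traveling at high velocity": ["Shuttle began re-entry"],
    "Shuttle began re-entry": ["Decision made to allow re-entry"],
    "Decision made to allow re-entry": ["Risk of damage considered acceptable"],
    "Risk of damage considered acceptable": [
        "Long history of foam strikes",
        "Extent of wing damage unknown",
    ],
}


def ask_why(effect: str, depth: int = 0) -> list[str]:
    """Walk the map by repeatedly asking "Why?", indenting each cause under its effect."""
    lines = ["  " * depth + effect]
    for cause in CAUSE_MAP.get(effect, []):
        lines.extend(ask_why(cause, depth + 1))
    return lines


def deepest_causes(effect: str) -> list[str]:
    """Return causes with nothing recorded behind them, i.e. where this map currently stops."""
    causes = CAUSE_MAP.get(effect, [])
    if not causes:
        return [effect]
    found: list[str] = []
    for cause in causes:
        found.extend(deepest_causes(cause))
    return found


print("\n".join(ask_why("Safety goal impacted")))
print(deepest_causes("Safety goal impacted"))
```

The indented printout reads like the Cause Map from left to right. The causes returned by `deepest_causes` are not "the" root cause; they only mark where this sketch stops, and a full Cause Map would continue past them wherever evidence supports it.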

Root Cause Analysis Step 3: Select the Best Solutions (Reduce the Risk)

Once the Cause Map is built to a sufficient level of detail with supporting evidence, it can be used to develop solutions that will prevent an incident altogether or reduce the risk of an incident occurring to an acceptable level. The level of detail in root cause analysis makes it easier to identify multiple possible solutions to a given problem than an oversimplified, higher-level analysis would. The level of detail needed to investigate an incident varies with the level of risk associated with the industry: for example, solutions and work processes at a restaurant are less detailed than those needed in higher-risk industries like nuclear power, which require a much more thorough analysis. In all cases, the associated risk, or impact on the goals, dictates how effective each solution will be.


The Cause Map thus identifies all possible solutions for a given issue so that the best can be selected. These solutions are then documented directly on the Cause Map, and are typically placed in a green box directly above the cause that the solution controls.

At this stage of the root cause analysis, all solutions are considered and placed on the Cause Map.

[Figure: Cause Map with possible solutions]

After the analysis is complete, the best solutions are selected based on the organization’s goals. Shown below are the action items implemented as a result of this incident.

[Figure: Cause Map with action items]

Space Shuttle Columbia Aftermath

The Columbia disaster held far-reaching consequences for NASA and the future of space travel, beginning with the suspension of the space shuttle program in Columbia’s aftermath. The disaster also shook NASA out of the complacency it had fallen into, prompting a serious reevaluation of its notions of acceptable risk. The Columbia Accident Investigation Board (CAIB) released a multi-volume incident report that included a searing criticism of the minimization of safety issues at NASA over the years, citing “reliance on past success as a substitute for sound engineering practices.”

Aside from the shift in perspective, the Columbia disaster also provoked changes in design and procedure. The shuttle’s external tank was redesigned, more camera views were placed on the shuttle during launch to better monitor the foam shedding, and in 2005 astronauts tested a new procedure to scan the shuttle for broken tiles using cameras and a robotic arm.

Finally, in memory of the crew, seven asteroids orbiting the sun between Mars and Jupiter now bear their names.

Root Cause Analysis Improves Problem Solving Skills

Grounded in the basics of the cause-and-effect principle, root cause analysis can be applied consistently to mundane as well as catastrophic or high-risk issues. The Cause Mapping approach to root cause analysis is thus a simple yet powerful way to investigate all kinds of issues, whether they are related to safety, the environment, compliance, customers, production, equipment, or service.

