Root Cause Analysis – Challenger Explosion

Download the PDF
The Challenger space shuttle made history repeatedly. Its maiden flight in April of 1983 witnessed the first spacewalk during a space shuttle mission. Two months later, Sally Ride became the first American woman in space aboard the vessel; on its subsequent missions, the Challenger also carried the first African-American, the first Canadian, and the first Dutchman into space.

The temperature was below freezing on the morning of January 28, 1986, when the Challenger prepared for its tenth launch. This too was a momentous occasion, even for a space shuttle well versed in historic firsts: as the first voyage of the new Teacher in Space Project, it was to carry a schoolteacher, a private citizen rather than a career astronaut, into space. It is difficult to overstate the significance of NASA’s choice to allow a schoolteacher to participate in the mission, or the excitement it generated in the press and among the public. Chosen from more than 11,000 applicants, Christa McAuliffe had planned to give a lesson from space, and schools all around the country were going to watch the liftoff. The flight was supposed to get the public invested in the space program while signaling that space travel was becoming more routine, though no less awe-inspiring, and perhaps even more accessible. Accordingly, more media were present than usual on the morning of the launch, and millions of people around the world were watching live.

NASA, for its part, had every reason to be confident about this mission. At the time, Challenger was the most-flown orbiter in NASA’s fleet.

Sadly, the Challenger made history in a different way that morning, entering the books as NASA’s first space shuttle disaster. A mere 73 seconds after liftoff, the space shuttle broke apart over the Atlantic, taking the lives of all seven crewmembers. For NASA, this was a catastrophic event on several levels: seven colleagues lost their lives; NASA lost a multibillion-dollar shuttle; and the prestige that NASA was hoping to cultivate with the extra press was obliterated along with the space shuttle (news coverage was so extensive that an estimated 85% of Americans had heard about the disaster within an hour of its occurrence).

A Short History of Space Incidents

Human curiosity is a beautiful thing. We will live with unknowns for exactly as long as technology limits our ability to explore, to discover, and no longer. First we conquered the seas, then the sky, then we tackled the great beyond: space, the final frontier. No such advance has come without a cost, however, and space flight has certainly been no exception.

A total of 18 astronauts (12 of whom were Americans) have perished while on space flight missions, and many more have lost their lives while preparing or testing for them. Nobody takes such losses lightly, and each incident has been followed by an extensive investigation into its causes and into ways to prevent its recurrence.

NASA tends to learn from its setbacks. Its first astronaut fatality, for example, came in 1964, when Theodore Freeman died in a jet crash during a training exercise; in response, zero-zero ejection seats were created that would have saved his life.

At the same time, the lessons learned from some accidents led to changes that caused others. One striking example of such unintended consequences follows astronaut Gus Grissom from an accident that nearly took his life to a modified design that contributed directly to his death. The Liberty Bell 7 capsule was designed with a new explosive hatch release that would allow the astronaut to get out of the spacecraft quickly in an emergency. This safety measure nearly killed Grissom when the hatch accidentally blew open after a water landing and flooded the vessel. To prevent a recurrence, NASA opted for an inward-opening hatch design for Apollo 1, which trapped the astronauts inside the vessel when it caught fire during a launch pad test, killing the entire crew, including Grissom (see our root cause analysis of the fire aboard Apollo 1).

The Challenger would not be the last fatal incident for NASA, either. In 2003, the space shuttle Columbia broke up on re-entry over Texas after a piece of foam insulation broke off during launch and damaged its left wing. That incident is covered in more detail in a separate root cause analysis. For now, let’s turn to our investigation of what happened on the Challenger.

Root Cause Analysis – The Challenger Incident’s Stats

The first step in any root cause analysis approach is to define the problem by asking four critical questions:

What is the problem?

In investigating any incident, big or small, the process of specifying the problem is likely to elicit multiple responses. At this stage in the analysis, all potential problems are written down for later evaluation. In this example, we will begin by identifying the loss of all seven crewmembers and the loss of the space shuttle as the major problems.

When did it happen?

In order to measure change, root cause analysis specifies as precise a time as possible for a given incident. Here, the Challenger broke apart 73 seconds into its tenth mission, at 11:39:12 AM EST on January 28, 1986.

Where did it happen?

Root cause analysis also requires capturing as specific a location as possible when defining the problem; in particular, emphasis is placed on specifying the location in which solutions can be enacted. Thus, while other Cause Maps related to space disasters technically describe one incident that occurs in two locations (aboard the spacecraft and at the command center), the emphasis remains on the location that can be controlled: the command center. In this case, the space shuttle broke apart just after launch; the launch site of the Challenger mission (STS 51-L) at Cape Canaveral is thus captured as the location for the incident.

How did it impact the goals?

Root cause analysis is as specific about a given organization’s goals as it is about the definition of the problem. All organizations have multiple goals in common. It is good business to ensure the safety of employees and the public, to remain within budget, to achieve the intended purpose of the organization, to avoid damaging equipment, and to do it all as efficiently as possible. Here, these elements are understood in terms of safety, vehicle, equipment, mission, and labor goals.

Safety is clearly an important goal of NASA’s spaceflight missions: a manned mission can never be considered a success unless all of the astronauts who are launched into space come back home. In this case, the safety goal was affected by the deaths of all seven astronauts aboard the Challenger. In addition, space flight is an extremely expensive business. The Challenger was built to complete multiple missions (indeed, it had already completed nine), and its construction involved a gigantic investment of time and money; losing the space shuttle meant losing billions of dollars of equipment as well as the possibility of reusing the shuttle in other missions. The total loss of the shuttle can therefore be considered to have impacted the vehicle goal. Since the shuttle disintegrated just after launch, the mission was a complete loss; this affected the mission goal. As a result of the disaster, the solid rocket booster joints had to be redesigned, impacting the equipment goal. Finally, investigation and testing in the wake of the disaster affected the labor goal.
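The four questions above can be captured in a small record so that the problem definition stays in one place. The following Python sketch is only an illustration, not part of the Cause Mapping method itself: the class name `ProblemOutline`, its field names, and the helper `impacted_goals` are assumptions made for this example, while the values are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProblemOutline:
    """Hypothetical container for the four problem-definition questions."""
    what: List[str]                 # all problems noted for later evaluation
    when: str                       # as precise a time as possible
    where: str                      # location where solutions can be enacted
    goal_impacts: Dict[str, str] = field(default_factory=dict)


outline = ProblemOutline(
    what=["Loss of all seven crewmembers", "Loss of the space shuttle"],
    when="11:39:12 AM EST, January 28, 1986",
    where="Cape Canaveral, just after launch (STS 51-L)",
    goal_impacts={
        "safety": "Deaths of all seven astronauts",
        "vehicle": "Total loss of the shuttle",
        "mission": "Mission was a complete loss",
        "equipment": "Solid rocket booster joints had to be redesigned",
        "labor": "Investigation and testing after the disaster",
    },
)


def impacted_goals(problem: ProblemOutline) -> List[str]:
    """Return the names of the impacted goals in alphabetical order."""
    return sorted(problem.goal_impacts)


for goal in impacted_goals(outline):
    print(f"{goal}: {outline.goal_impacts[goal]}")
```

Keeping each goal impact as a separate entry mirrors the outline: every impacted goal becomes a starting point for asking "Why?" in the next step.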

cm-challenger-outline

Collecting Causes – The Next Step in Root Cause Analysis for the Challenger

The next step in root cause analysis breaks the incident down into a chain of cause and effect, with each cause documented on the Cause Map. That way we build a clean Cause Map that makes the rest of the analysis more efficient.

In this case, the safety goal was affected because seven astronauts lost their lives. This is the first cause-and-effect relationship in the analysis.

cm-columbia-1

The analysis continues by asking, “Why?”, moving to the right of the cause-effect relationship above. The astronauts’ deaths were due to the loss of Challenger, which was caused by an external tank explosion: the space shuttle broke apart because gases in the external fuel tank mixed, exploded, and tore the space shuttle apart.

cm-challenger-2

The external fuel tank exploded after a rocket booster came loose and ruptured the tank. Why? Because hot gases and flames leaking out of the rocket booster burned a hole into the external fuel tank and through the piece that held the rocket booster onto the shuttle. Why were hot gases leaking out of the rocket booster? Because the primary O-ring, the seal in one of the rocket booster’s joints, failed.

cm-challenger-3

Why did the primary O-ring fail?

o-ring

There are three reasons, the first of which was structural. There was a fundamental design flaw in the joint that engineers had grown accustomed to and had learned to live with. Although the boosters were not designed to work this way, it was not uncommon for the booster casing to balloon under the stress of ignition, causing the metal parts of the casing to bend away from each other and creating gaps through which hot gases could leak. In prior instances, the primary O-ring would shift out of its groove and form a seal. This process is called extrusion, and the escape of hot gases past the O-ring is called blow-by. The evidence of previous issues with O-ring erosion and blow-by can be captured directly on the Cause Map.

The more time it takes for extrusion to occur, however, the greater the damage to the O-rings. This brings us to the most immediate reason for the O-ring failure: the low temperatures at launch caused the O-rings to harden. On the morning of the launch, the cold hardened the primary O-ring and lengthened the time of extrusion, so the O-ring could not form a seal in time.

Finally, the third reason was a decision. The primary O-ring blow-by (the escape of hot gas) occurred not only because the O-ring hardened and did not fully seal at the low temperatures, but also because the decision was made to launch in those temperatures, despite the fact that the vehicle was never certified to operate in temperatures that low. This decision was found to have been made because of ineffective launch commit criteria.
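The chain of "Why?" questions above can be sketched as a simple mapping from each effect to its causes. This is a minimal illustration rather than the Cause Mapping tool itself: the dictionary `cause_map`, the shortened cause labels, and the helper functions `ask_why` and `leaf_causes` are assumptions made for this example, with the relationships taken from the analysis above.

```python
from typing import Dict, List

# Each effect maps to the cause(s) found by asking "Why?".
# The labels are shortened paraphrases of the analysis above.
cause_map: Dict[str, List[str]] = {
    "Safety goal impacted": ["Seven astronauts lost their lives"],
    "Seven astronauts lost their lives": ["Loss of Challenger"],
    "Loss of Challenger": ["External tank explosion"],
    "External tank explosion": ["Rocket booster came loose and ruptured the tank"],
    "Rocket booster came loose and ruptured the tank": [
        "Hot gases leaked from the rocket booster"
    ],
    "Hot gases leaked from the rocket booster": ["Primary O-ring failed to seal"],
    "Primary O-ring failed to seal": [
        "Joint design flaw",
        "O-ring hardened in low temperature",
        "Decision to launch in low temperature",
    ],
    "Decision to launch in low temperature": ["Ineffective launch commit criteria"],
}


def ask_why(effect: str, causes: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Walk the map from an effect toward its causes, indenting one level per 'Why?'."""
    lines = ["  " * depth + effect]
    for cause in causes.get(effect, []):
        lines.extend(ask_why(cause, causes, depth + 1))
    return lines


def leaf_causes(effect: str, causes: Dict[str, List[str]]) -> List[str]:
    """Return the causes for which the map records no further 'Why?'."""
    children = causes.get(effect, [])
    if not children:
        return [effect]
    leaves: List[str] = []
    for cause in children:
        leaves.extend(leaf_causes(cause, causes))
    return leaves


print("\n".join(ask_why("Safety goal impacted", cause_map)))
print(leaf_causes("Safety goal impacted", cause_map))
```

The causes at the ends of the branches are not "the" root cause; they are simply the points at which this particular map stops, and each of them can be expanded further if the level of risk warrants more detail.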

cm-challenger-4

Root Cause Analysis Step 3: Select the Best Solutions (Reduce the Risk)

Once the Cause Map is built to a sufficient level of detail with supporting evidence, the solutions component of root cause analysis can begin. The Cause Map is used to identify all possible solutions for a given issue so that the best among them can be selected. Because every cause on a detailed Cause Map is a place where a solution might be applied, the map yields more workable solutions than an oversimplified, high-level analysis can.

In root cause analysis, solutions can be documented directly on the Cause Map, and are typically placed in a green box directly above the cause that the solution controls. At this stage, all solutions are considered and documented on the Cause Map.

cm-challenger-solutions

After the analysis is complete, the best solutions are selected based on their impact on the organization’s goals. Shown below are the action items implemented as a result of the Challenger disaster.
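One way to picture this selection step is to attach each candidate solution to the cause it controls and then rank the candidates by how many goals they protect. The sketch below is purely illustrative: the `Solution` class, the `select_best` helper, the candidate list, and the goals assigned to each candidate are assumptions made for this example, not the action items NASA actually adopted (only the redesign of the solid rocket booster joints is mentioned in the text above).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Solution:
    """Hypothetical record: a solution sits directly above the cause it controls."""
    description: str
    controls_cause: str
    goals_protected: FrozenSet[str]


# Illustrative candidates only; the goal assignments are assumptions.
candidates = [
    Solution(
        "Redesign the solid rocket booster joints",
        "Joint design flaw",
        frozenset({"safety", "vehicle", "mission"}),
    ),
    Solution(
        "Add a minimum temperature to the launch commit criteria",
        "Ineffective launch commit criteria",
        frozenset({"safety", "vehicle", "mission"}),
    ),
    Solution(
        "Repeat post-flight O-ring inspections",
        "Joint design flaw",
        frozenset({"equipment"}),
    ),
]


def select_best(options: List[Solution], min_goals: int) -> List[Solution]:
    """Keep solutions protecting at least min_goals goals, broadest impact first."""
    kept = [s for s in options if len(s.goals_protected) >= min_goals]
    return sorted(kept, key=lambda s: (-len(s.goals_protected), s.description))


for solution in select_best(candidates, min_goals=2):
    print(f"{solution.description} (controls: {solution.controls_cause})")
```

Counting protected goals is a deliberately crude stand-in for judging impact; a real investigation would also weigh cost, feasibility, and how much each solution reduces the risk.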

cm-challenger-actions

Every issue has its causes, and should be worked to a sufficient level of detail to prevent the incident or to reduce the risk of its recurrence to an acceptable level. This is why solutions and work processes at a coffee shop are not as thorough or detailed as those at an airline or a nuclear power facility. The risk or impact to the goals dictates how effective the solutions need to be. Lower-risk incidents call for relatively less detailed investigations, while a significantly high risk to an organization’s goals requires a much more thorough analysis.

Click on “Download PDF” above to download a PDF showing the Root Cause Analysis Investigation.

Aftermath

As for the Teacher in Space Project, it was supplanted by the Educator Astronaut Project. In August of 2007, Barbara Morgan, the backup teacher for the Challenger flight, became the first teacher in space aboard the orbiter Endeavour on mission STS-118.

Root Cause Analysis Improves Problem Solving Skills

Grounded in the basics of the cause-effect principle, root cause analysis can be applied consistently to mundane as well as catastrophic or high-risk issues. The steps of root cause analysis are the same, but the level of detail will vary depending on the specific incident or issue under consideration. Focusing on the basics of the cause-effect principle makes the Cause Mapping approach to root cause analysis a simple and effective method for investigating safety, environmental, compliance, customer, production, equipment, or service issues.
