Mars Climate Orbiter



Background:

The Mars Climate Orbiter was launched atop a Delta II launch vehicle on December 11, 1998. Its mission was to function as an interplanetary satellite and serve as a communication relay for the Mars Polar Lander. Working together, the Mars Climate Orbiter and Mars Polar Lander were planned to map the surface of Mars, profile the structure of its atmosphere, try to detect surface ice reservoirs and dig for traces of water beneath the surface.





Nine and a half months after launch, the Mars Climate Orbiter was scheduled to begin establishing an orbit around Mars. The plan was to use a technique called aerobraking to reduce the orbiter's velocity and slowly move it from a 14-hour orbit to a 2-hour orbit. On September 23, 1999, the $125 million Mars Climate Orbiter was lost during the attempt to establish orbit around Mars.

The Cause Mapping Process

The failure of the Mars Climate Orbiter can be used as an example of how the Cause Mapping process can be applied to a specific incident. The three steps in the Cause Mapping Process are 1) Define the problem, 2) Conduct the analysis and 3) Identify the best solutions. Each step will be discussed below.

Step 1. Define the Problem

The first step of the Cause Mapping approach is to define the problem by asking four questions: What is the problem? When did it happen? Where did it happen? And how did it impact the goals? The answers to these questions are documented in an Outline. When asked "What is the problem?" people may give different answers. In this example, people may say the Mars Climate Orbiter satellite was destroyed or the mission was a failure. There is no need to debate which is the right answer because all the answers are relevant. Write down all the different answers to the question on the first line of the Outline so the investigation can continue without wasting time debating.

In answering the second question "When did this problem happen?", the date and time of the incident should be documented as well as any differences that were present. The question of what is different is fundamental to any investigation and can provide clues as to why the failures occurred. In this example, the date is September 23, 1999 and the time is 9:04am. The most significant difference was that this was a new design.

In an investigation there can be several pieces of information that need to be captured when specifying the location. At a minimum, the physical/geographic location and the process location should be captured. For this example, the physical/geographic location of the orbiter accident is the upper atmosphere of Mars. The process location would be orbital insertion. In some cases, there may also be a business location, where the name of the company and the business is listed. The Outline can be modified as needed to document any relevant location information.

The final question defines the impact to the overall goals. The overall goals reflect the ideal state of an organization. The list of goals can be modified to suit any company, but it should represent the goals of the whole company and shouldn't change for each division within it. This section should also record potential impacts to the goals in addition to actual impacts.

In the case of the Mars Climate Orbiter, the equipment failed before accomplishing any of the goals of the mission, which would be listed under the Production and Schedule goal. Under the Material and Labor goal, the cost of the orbiter would be listed. Additionally, complete failure of the Mars Climate Orbiter project would have a negative impact on public support for NASA, so that would be listed under the Customer Service goal.

The final piece of information documented on the Outline is the frequency of the incident. This indicates how often this has occurred or is likely to occur. The frequency is a multiplier that helps us to understand the total magnitude of an issue.
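As a rough illustration, the answers gathered in Step 1 could be recorded in a small data structure. The sketch below is not part of the Cause Mapping method itself: the class name, field names and the `total_magnitude` helper are hypothetical, and the frequency of 1 is an assumption made for the example. The remaining values come from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class Outline:
    """Hypothetical container for the Step 1 answers; field names are illustrative."""
    what: list[str]                 # every answer to "What is the problem?"
    when: str                       # date and time of the incident
    differences: list[str]          # what was different this time
    physical_location: str
    process_location: str
    goal_impacts: dict[str, str] = field(default_factory=dict)
    cost_per_incident: float = 0.0  # dollars
    frequency: int = 1              # assumed: how many times this has occurred

    def total_magnitude(self) -> float:
        """Frequency acts as a multiplier on the cost of a single incident."""
        return self.cost_per_incident * self.frequency


mco_outline = Outline(
    what=["Mars Climate Orbiter destroyed", "Mission was a failure"],
    when="September 23, 1999, 9:04 am",
    differences=["New design"],
    physical_location="Upper atmosphere of Mars",
    process_location="Orbital insertion",
    goal_impacts={
        "Production and Schedule": "No mission goals accomplished",
        "Material and Labor": "Cost of the orbiter",
        "Customer Service": "Negative impact on public support for NASA",
    },
    cost_per_incident=125_000_000.0,
    frequency=1,  # assumption for this sketch
)

print(f"Total magnitude: ${mco_outline.total_magnitude():,.0f}")
```

Keeping `what` as a list mirrors the advice to write down every answer to the first question rather than debating which one is right.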

Below is a completed Outline for the Mars Climate Orbiter example.




Step 2. Conduct the analysis

During the analysis step, the main question asked is "Why did this happen?". To answer the question, the incident is broken down into causes which are captured on the Cause Map. To begin creating a Cause Map, start by writing down one of the goals that was impacted. When the Cause Map is completed, all the goals with their associated causes will be listed on the Cause Map, but it is usually simplest to start building the Cause Map with a single goal. The goal is listed in a red box and the first cause will be the impact to the goal that was listed on the Outline. A Cause Map can be read left to right by putting the words "was caused by" between the boxes.

For this example, the Production and Schedule goal was selected to begin building the Cause Map. This is what the first cause-and-effect relationship would look like.



The next cause on the Cause Map is added by asking "why" the mission goals failed. Keep asking "why" questions and adding boxes to the right. This is what the Cause Map could look like after asking three more "why" questions.



While the above Cause Map is accurate, it is a simplified analysis. Detail can be added to the Cause Map in a number of ways. The Cause Map can continue to be built left to right by asking "why" questions and adding causes. Causes can be added in between existing causes by taking smaller steps when asking "why" questions. Causes can also be added vertically, because many effects require more than one cause to happen. To determine if additional causes should be added vertically, ask "Is this cause sufficient (on its own) to produce the effect?" If the answer is no, more causes should be added. To check if a cause is documented appropriately, ask "Is the cause necessary to produce the effect?" If the answer is yes, the cause is documented correctly. If the answer is no, the cause should not be included.

Below is the Mars Climate Orbiter Cause Map with additional detail added. Additional causes have been added to the right of the cause map, a cause "hit gas environment" has been added between two existing causes, and a cause "high velocity" has been added vertically. In this example hitting the gas environment and traveling at a high velocity were both needed to produce the extreme heat that destroyed the orbiter. Both causes were necessary for the effect to occur so both are listed vertically and separated with an "and".
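One possible way to picture this structure in text form is as a mapping from each effect to the causes that were jointly required to produce it. The sketch below is only an illustration: the box labels are paraphrased from the description above rather than copied from the figure, the link from "Hit gas environment" to the low trajectory is an assumption, and the `read_map` helper is hypothetical.

```python
# Each effect maps to the causes that were jointly required to produce it.
# More than one entry in a list means the causes are joined with "and".
# Labels are paraphrased from the text, not copied from the figure.
cause_map = {
    "Mission goals not accomplished": ["Orbiter destroyed"],
    "Orbiter destroyed": ["Extreme heat"],
    "Extreme heat": ["Hit gas environment", "High velocity"],
    "Hit gas environment": ["Trajectory lower than expected"],  # assumed link
}


def read_map(effect, graph, depth=0):
    """Return the map as indented lines, read left to right."""
    causes = graph.get(effect, [])
    if not causes:
        return []  # nothing further has been documented for this cause
    indent = "  " * depth
    lines = [f"{indent}{effect} was caused by " + " and ".join(causes)]
    for cause in causes:
        lines.extend(read_map(cause, graph, depth + 1))
    return lines


for line in read_map("Mission goals not accomplished", cause_map):
    print(line)
```

Each printed line reads one cause-and-effect relationship with the words "was caused by" between the boxes, and the "and" marks causes that were both needed for the effect.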


As much detail as needed can be incorporated by continuing to add causes. As with any investigation the level of detail in the analysis is based on the impact of the incident on the organization’s overall goals. The greater the impact to the goals, the more detailed the Cause Map will be.

Additionally, each impacted goal should be added to the Cause Map. Using the same method as before, each impacted goal should be listed in a red box and the Cause Map built by asking "why" questions. This should continue until each goal connects to a cause already documented on the initial Cause Map. The causes for each impacted goal should lead back to a cause already listed on the main Cause Map; otherwise the impacts to the goals may have come from two separate incidents and the Outline should be revisited.

Below is the Mars Climate Orbiter Cause Map with all impacted goals added to the analysis.



Continue to ask "why" questions and build the Cause Map until the level of detail is sufficient to understand the issue.

In this example, a few more causes need to be added at the right-hand side to understand why the trajectory was lower than expected.



This version of the Cause Map is still simplified, but it has significantly more detail than the first maps with only one or four boxes.

To help document and visualize the analysis, evidence can be documented directly on the Cause Map. The typical way to do this is to state the evidence in a pink box under the Cause the evidence supports. There can be many sources of evidence. Evidence may be a statement or testimony, diagram, historical trend, experiment or test, etc. Any piece of information that furnishes proof of a cause can be documented on the Cause Map.

During an investigation, additional evidence may disprove a particular cause. When this happens, the evidence that disproves the cause is placed below the cause and the cause is crossed out, but not removed from the Cause Map. This helps document causes that were considered, but ultimately determined not to be related to the incident.

Below is an example of how an evidence block would look on the Mars Climate Orbiter Cause Map. In this case, loss of all communications with the orbiter is evidence that the orbiter mission was a complete loss.
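In text form, the same idea can be sketched with a small record per cause. This is only an illustration under stated assumptions: the `CauseBox` class and its fields are hypothetical, and the second entry is a placeholder showing how a rejected cause stays on the map, not a finding from the NASA reports.

```python
from dataclasses import dataclass, field


@dataclass
class CauseBox:
    """Hypothetical text stand-in for one box on a Cause Map."""
    label: str
    supporting: list[str] = field(default_factory=list)
    disproving: list[str] = field(default_factory=list)

    @property
    def crossed_out(self) -> bool:
        # A cause with disproving evidence is crossed out but stays on the map.
        return bool(self.disproving)

    def render(self) -> str:
        status = "[X] " if self.crossed_out else "[ ] "
        lines = [status + self.label]
        for item in self.supporting + self.disproving:
            lines.append("    Evidence: " + item)
        return "\n".join(lines)


boxes = [
    CauseBox(
        "Orbiter mission a complete loss",
        supporting=["Loss of all communications with the orbiter"],
    ),
    # Placeholder entry, not a finding from the reports.
    CauseBox(
        "Placeholder cause that was considered",
        disproving=["Placeholder evidence that ruled it out"],
    ),
]

for box in boxes:
    print(box.render())
```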




Once the Cause Map is built to a sufficient level of detail with supporting evidence, it can be used to develop solutions. The Cause Map is used to identify all the possible solutions for a given issue so that the best solutions can be selected. It is easier to identify many possible solutions from a detailed Cause Map than from an oversimplified high-level analysis.

This is an intermediate level Cause Map of the loss of the Mars Climate Orbiter. This example shows what the Cause Map looks like with 40 causes, evidence blocks and some possible solutions added.




Why was the Mars Climate Orbiter Destroyed?

Now that the analysis is completed, the question "Why was the Mars Climate Orbiter lost?" can be answered. As the Cause Map demonstrates, a number of causes contributed to the loss. One of the most obvious is a unit error in the software used to help predict the velocity of the Mars Climate Orbiter, which in turn was used to predict the trajectory on which the orbiter would enter the Martian atmosphere. The error was a simple conversion mistake: the results were in pounds-force, while the program that predicted velocity assumed newtons, a factor of 4.45 difference. The error in the software resulted in the calculated trajectory being higher than the actual trajectory.
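The size of the unit mismatch can be checked with a few lines of arithmetic. The sketch below is not the flight software; the function name and the sample value of 10 are hypothetical and serve only to show what happens when a number produced in pounds-force is read as if it were already in newtons.

```python
LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force


def lbf_to_newton(force_lbf: float) -> float:
    """Convert a force from pounds-force to newtons."""
    return force_lbf * LBF_TO_NEWTON


# Hypothetical value produced by one program, in pounds-force.
reported = 10.0

# The consuming program assumes newtons and uses the number unchanged.
assumed_newtons = reported
actual_newtons = lbf_to_newton(reported)

print(f"assumed: {assumed_newtons:.2f} N")
print(f"actual:  {actual_newtons:.2f} N")
print(f"underestimated by a factor of {actual_newtons / assumed_newtons:.2f}")
```

Every value passed across the interface without conversion is understated by the same factor of about 4.45, so the error does not average out over time.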

There are a number of other causes as well. For example, consider the scenario where the initial conversion error was still made. An effective software validation program would have identified and corrected the error before it resulted in a complete loss of mission. The ineffectiveness of the software test program is clearly a cause of the loss of the orbiter. Additionally, even if the software error was made and the calculated trajectory was wrong, the mission might still have been saved if the lower trajectory had been found earlier. Early identification of the low trajectory of the orbiter would have allowed the team to take action and potentially raise the trajectory. The ability to raise the trajectory was included in the design, but no attempt to change the trajectory was made because the team didn't understand how low the actual trajectory was going to be, and the necessary planning wasn't done to allow quick action at the time of Mars insertion.

Another cause is the inherent difficulties associated with space travel, making measurement of the exact trajectory tricky. The difficulty in determining actual trajectory is one of the reasons that NASA relied on the calculated trajectory to make decisions during the mission of the Mars Climate Orbiter.

The NASA investigation also identified a number of areas where the project team wasn’t effective. The NASA reports weren’t particularly detailed in this area, so it is hard to clearly understand what factors contributed to the ineffective team, but it is clear that there were difficulties in several areas. One area where there were issues is communication among team members. The project team consisted of a number of different organizations in different geographic locations. Additionally, inadequate training was a cause that contributed to the loss of the orbiter. The software conversion mistake indicated that the Software Interface Specification, a document which identified what units to use in software, either wasn’t well understood or wasn’t used by the entire team. A more effective project team, with adequate staffing, adequate training and a more clearly defined organization, would have increased the likelihood that the errors that resulted in the loss of the Mars Climate Orbiter were caught earlier and corrected.

Step 3. Select the Best Solution (Reduce the Risk)

The loss of the Mars Climate Orbiter at a high level has only one cause. At a more detailed level it has 4 causes, 12 causes or even 100 causes. All of the levels of the Cause Map are accurate; some simply have more detail than others. This is analogous to zooming in and zooming out to reveal more or less detail. An issue should be worked to a sufficient level of detail to prevent the incident, meaning to reduce the risk of the incident occurring to an acceptable level. This is why solutions and work processes at a coffee shop are not as thorough or detailed as those at an airline or nuclear power facility. The risk or impact to the goals dictates how effective the solutions should be. Lower-risk incidents will have relatively less detailed investigations, while significantly high risk to an organization’s goals requires a much more thorough analysis. Possible solutions are typically documented on the Cause Map as a green box above the cause each one addresses. When proposing possible solutions, don't be concerned about limits, boundaries, schedules or financial constraints. Add all possible solutions to the Cause Map so everybody can see them and think about them. The best solutions are then selected from the possible solutions, and an action plan with owners and due dates is defined.

Cause Mapping Improves Problem Solving Skills

The Cause Mapping method focuses on the basics of the cause-and-effect principle so that it can be applied consistently to day-to-day issues as well as catastrophic, high-risk issues. The steps of Cause Mapping are the same, but the level of detail is different. Focusing on the basics of the cause-and-effect principle makes the Cause Mapping approach to root cause analysis a simple and effective method for investigating safety, environmental, compliance, customer, production, equipment or service issues.

Resources

The images used were produced by NASA. Use of these images is not meant to imply NASA endorsement of Cause Mapping. Many more images of the Climate Orbiter and detailed information on the mission are available on the NASA website.

Information used for the write-up is from:
Mars Climate Orbiter Mishap Investigation Board Phase I Report (dated November 10, 1999)
Report on Project Management in NASA by the Mars Climate Orbiter Mishap Investigation Board (dated March 13, 2000)