Collapse of the I-35 Bridge – August 1, 2007 - Cause Map
Summary:

On August 1, 2007 at 6:05 p.m., the Interstate 35 bridge over the Mississippi River in Minneapolis, Minnesota (officially Bridge 9340) collapsed with no warning, killing 13 people and injuring 145. The bridge was designed in 1961, built starting in 1964, and opened in 1967. Since 1991, the bridge had been rated "structurally deficient" due to a "poor" superstructure rating. The National Transportation Safety Board (NTSB) Highway Accident Report was finalized on November 14, 2008.
We will use the information found in the NTSB's investigation to build a Cause Map, or a visual diagram showing the root cause analysis relationships. There are three steps to the Cause Mapping process: 1) Define the problem, 2) Conduct the analysis, and 3) Identify the best solutions. We will work through each step in detail below.
Step 1. Define the Problem
The first step of the Cause Mapping approach is to define the problem. First we ask four basic questions: What is the problem? When did it happen? Where did it happen? And how did it impact the goals?
There may be differing opinions on "what" the problem is. Here, one person may say that the problem was that the bridge collapsed. Another might say that the problem was a design flaw, another that the bridge was overloaded, and a fourth that there was a loss of life. These are all "problems" and they are all related to the bridge collapse, so we write down all four "problems" on the first line. There is no need to spend time debating what the "problem" is. The magnitude of this incident is defined by the impact to the goals.
The second question is the "when", or the date and time of the incident. Thanks to security cameras and traffic monitoring, we are able to pinpoint the exact time when the bridge began to fall (6:05 p.m. on August 1, 2007). Additionally, we capture the "differences". The question of what was different is fundamental in any investigation. At the time of the collapse, it was evening rush hour, and there was roadwork underway on the bridge.
Next is the "where". Where can be defined several ways - the geographical location (Minneapolis, Minnesota), the specific unit (Bridge 9340), or the process (crossing the Mississippi River).
We've now filled out the top half of the Outline (step 1), shown below.
The second part of the outline is the impact to the overall goals. Every organization has goals, which define the ideal state. Obviously, the goals depend on whose perspective the Cause Map is based on. Here we will look at the incident from the perspective of the State of Minnesota. From any perspective, there is an overall goal to have zero injuries. The bridge collapse resulted in 13 deaths and 145 injuries.
A State also generally wants to avoid environmental impacts. It's unclear what the environmental impact of the bridge collapse was, so we'll put a question mark. (It's important to put the question mark so that people who are looking at the outline know that it wasn't inadvertently missed.)
The loss of the bridge represented the loss of a major transportation route for more than 140,000 vehicles a day. This is an impact to both the customer service goal, and the production/schedule goal of the State.
Last, but certainly not least, is the materials and labor cost goal. The loss of the bridge resulted in an additional $400,000 per day in commuting costs. There were 414 days between the bridge collapse and when the replacement bridge was opened on September 18, 2008, resulting in a total of over $165 million. The estimated loss to the economy was $60 million. Finally, the cost to replace the bridge was $234 million, for a grand total of almost half a billion dollars.
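The cost figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not part of the Cause Mapping method, and the dollar amounts are simply the ones quoted in the paragraph above.

```python
from datetime import date

# Figures quoted in the outline above; the variable names are illustrative.
collapse_date = date(2007, 8, 1)
reopening_date = date(2008, 9, 18)
commuting_cost_per_day = 400_000   # dollars per day
economic_loss = 60_000_000         # dollars
replacement_cost = 234_000_000     # dollars

# Days between the collapse and the opening of the replacement bridge.
days_closed = (reopening_date - collapse_date).days

# Added commuting cost over the whole closure, then the grand total.
commuting_cost = days_closed * commuting_cost_per_day
total_cost = commuting_cost + economic_loss + replacement_cost

print(f"Days without the bridge: {days_closed}")
print(f"Additional commuting cost: ${commuting_cost:,}")
print(f"Total impact to the materials and labor cost goal: ${total_cost:,}")
```

This gives 414 days, $165,600,000 in additional commuting costs, and a total of $459,600,000, which matches the "almost half a billion dollars" recorded in the outline.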
The last piece of information we want to record is the frequency of the incident. Catastrophic interstate bridge failures in Minnesota are not frequent. (Keep in mind we're doing this from the perspective of Minnesota, so we don't have to worry about the frequency of all bridge collapses, just the ones that affect Minnesota directly.)
Our outline is now complete.
Step 2. Conduct the Analysis
During the analysis step, we break the incident down into causes which are captured on the Cause Map. The Cause Map starts by writing down the goals that were affected as defined in the problem outline. For the I-35 bridge collapse, we'll begin with the impacts to the safety goal. The safety goal was impacted because 13 people were killed and 145 people were injured. These are the first cause-and-effect relationships on our Cause Map.
We continue the analysis by asking Why questions and moving to the right of either of the cause-and-effect relationships above. In this example we’ll start with the loss of 13 lives which was caused by the collapse of the main span of the bridge.
A simple way to create a high-level Cause Map is to continue the map linearly by asking "Why" five times. This is known as the "Five Whys" technique. For example, the main span of the bridge collapsed . . . Why? Because of the fracture of the gusset plate*. Why? Because of the insufficient load capacity of the gusset plate. Why? Because the necessary calculations were not performed. (*The gusset plate is the riveted metal plate that joins several structural members.)
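For readers who think in code, a linear chain like this one can be sketched as an ordered list in which each entry answers "Why?" for the entry before it. This is only an illustration in Python; the function name five_whys and the wording of the entries are assumptions made for the example, not part of the NTSB report.

```python
# Linear chain from the example above, effect first.
# Each later entry answers "Why?" for the entry before it.
chain = [
    "Main span of the bridge collapsed",
    "Gusset plate fractured",
    "Gusset plate had insufficient load capacity",
    "Necessary calculations were not performed",
]

def five_whys(chain):
    """Return one 'effect ... Why? cause' line per link in the chain."""
    return [f"{effect} ... Why? {cause}" for effect, cause in zip(chain, chain[1:])]

lines = five_whys(chain)
for line in lines:
    print(line)
```

Four entries give three "Why?" links, which is as far as the example above goes. The chain can be extended to the right as more is learned.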
Although the Cause Map above is accurate, it is neither complete nor very detailed. We can add more detail to the map as the investigation continues. Detail should continue being added to the map until it reaches a level of detail commensurate with the impact to the goals. Because of the fatalities associated with this bridge collapse, we are going to add all the detail we can get.
We can add more detail in between the causes. For example, 13 people were killed because they were in vehicles that fell into the water. The main span of the bridge collapsing CAUSED the vehicles to fall into the water, which caused the deaths.
We can add detail vertically. Two causes in the same vertical column joined by an "AND" are both necessary for the effect to occur. For example, the fracture of the gusset plate was not solely caused by the insufficient load capacity of the gusset plate. It also required increased load on the gusset plate for the fracture to occur.
Additional vertical causes can also be joined with "OR". This indicates that either of the causes could have caused the effect. Note that in incidents that DID happen, OR is used only where information is lacking (something we don't know) or for possibilities that have since been refuted. The actual incident contains no OR relationships; they appear on the map only because of a lack of knowledge.
Early on in the investigation several possibilities that could have caused the fracture of the gusset plate were considered, and later refuted. We want people who look at our map to know we've considered these possibilities, so we leave them on the map, but cross them off. Possible causes for the fracture of the gusset plate that were considered, and then refuted, were corrosion damage, preexisting cracking, and temperature effects.
People looking at our map may want to know how we determined certain causes did (or did not) contribute to the incident. We can record the evidence supporting (or not) a certain cause in an evidence box attached to the cause the evidence is for. For example, NTSB determined that preexisting cracking was NOT a cause of the fracture of the gusset plate because there was no evidence of cracking. We include this in the evidence box and place it below the cause.
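The pieces described so far (causes joined by AND, possibilities that were considered and crossed off, and evidence boxes) can be sketched as a small data structure. This is a hypothetical Python representation for illustration only; the Cause class and its field names are assumptions made here, not a format defined by the Cause Mapping method.

```python
from dataclasses import dataclass, field

@dataclass
class Cause:
    text: str
    refuted: bool = False                         # considered, then crossed off the map
    evidence: list = field(default_factory=list)  # contents of the evidence box
    causes: list = field(default_factory=list)    # direct causes; those not refuted are joined by AND

fracture = Cause(
    "Fracture of the gusset plate",
    causes=[
        Cause("Insufficient load capacity of the gusset plate"),
        Cause("Increased load on the gusset plate"),
        Cause("Corrosion damage", refuted=True),
        Cause("Preexisting cracking", refuted=True,
              evidence=["NTSB found no evidence of cracking"]),
        Cause("Temperature effects", refuted=True),
    ],
)

def active_causes(effect):
    """Causes still on the map: all of them were required for the effect."""
    return [c.text for c in effect.causes if not c.refuted]

def refuted_causes(effect):
    """Causes kept on the map but crossed off."""
    return [c.text for c in effect.causes if c.refuted]

print("Required (AND):", active_causes(fracture))
print("Crossed off:", refuted_causes(fracture))
```

Keeping the refuted entries in the same structure, rather than deleting them, mirrors the practice of leaving them on the map so that readers can see they were considered.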
We can also add detail to our Cause Map by adding effects (i.e. building to the left). We can add the other impacts to the goals to our map. The safety goal was impacted by the deaths and injuries, which were both caused by people in vehicles falling into the river, caused by the collapse of the main span of the bridge. There were also impacts to the customer service, production and materials/labor goals, but these are not as important as the impacts to the safety goal, which will be our focus.
Our Cause Map now looks like this:
This Cause Map is more detailed than what we began with, but there is still more information we can add. We can add more detail to the right by asking additional "Why" questions. Why was there an increased load on the gusset plate? The increased load on the gusset plate was caused by concentrated weight over the gusset plate and increased load on the bridge. We will discuss each of these causes in more detail.
The concentrated weight over the gusset plate occurred because there were construction equipment and materials on the bridge, and these materials were concentrated over the gusset plate.
The increased load on the bridge was partially due to a high volume of traffic, which resulted from rush hour, lanes being closed for the roadway work discussed above, and increased use of the bridge. Traffic on the bridge was 60,600 cars/day in 1976 (the earliest date for which there are records) and 141,000 cars/day in 2004 (the latest date for which there are records). The increased load on the bridge was also due to the increased dead load, or weight of the structure itself.
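To put the traffic figures in proportion, the growth between the two recorded years can be worked out directly. This is a small Python sketch using only the two counts quoted above; the variable names are illustrative.

```python
# Daily traffic counts quoted above; the variable names are illustrative.
cars_per_day_1976 = 60_600
cars_per_day_2004 = 141_000

growth_ratio = cars_per_day_2004 / cars_per_day_1976
percent_increase = (growth_ratio - 1) * 100

print(f"Traffic in 2004 was {growth_ratio:.2f} times the 1976 level")
print(f"That is an increase of about {percent_increase:.0f}%")
```

In other words, daily traffic had more than doubled over the period covered by the records.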
Two major increases in the bridge's dead load occurred as part of two previous bridge improvement projects. More than 3 million pounds of additional dead load was due to an increased concrete overlay to protect the bridge rebar from corrosion. An additional million pounds was added due to upgrades to the median barrier and railings because the bridge did not meet updated safety standards.
One of these causes is an unintended result from a previous decision. In 1977, it was determined that there was an impact to the material and labor cost goal due to decreased life of the bridge. The decreased life was due to rebar corrosion from the rebar interacting with road chemicals. The solution was to increase the concrete overlay to protect the rebar. This solution from 1977 has now become a cause in our map for the 2007 incident.
At this point in the investigation, somebody might point out that there's more information available about the design of the gusset plate that resulted in insufficient strength. All we have so far is:
Maybe someone will point out: the REAL problem is that the gusset plate fractured because of insufficient strength. The strength was insufficient because of an insufficient load capacity AND an increased load on the gusset plate. Good point; we are always interested in increasing the accuracy of our map. Now we have:
The load capacity was not sufficient because the design thickness of the gusset plate was not sufficient. (It was 1/2" thick instead of being 1" thick.) There were several causes for the insufficient design thickness. Chronologically, the first error was that the necessary calculations weren't performed. This is apparently because what ended up being the final design was meant to be a preliminary design. Next, the design firm's review process (which required an over check of each calculation) wasn't followed, probably because there was no procedure to ensure that all the calculations were performed. Then, there was a lack of oversight from the government, whose design review did not apply to gusset plates. Lastly, the error was never noticed in the 40+ years since the bridge was designed. We'll talk about this more.
Now, why wasn't the error noticed in the 40 years the bridge was in operation? The two ongoing checks that might have caught this error are the load rating calculations (which determine the weight limit for the bridge) and the bridge inspections. The error was not noticed during either. Why?
The error was not noticed in load rating calculations because a load rating does NOT include an evaluation of the gusset plate capacity. This simplifies the analysis, since gusset plates are not considered important to load rating. This is because it is widely assumed that gusset plates are stronger than the members they connect. This assumption is generally true . . . when the gusset plates have been properly sized. Due to the design error, this assumption was no longer accurate. The assumption was never verified because the gusset plates were left out of the analysis.
The gusset plate design error was not found during the inspections, which were inadequate. The bowing of the gusset plates (visible on photographs from 1999) was not mentioned as part of any inspection. The inspectors noted that they assumed it occurred during construction. (More unverified assumptions.) The gusset plates were not listed as separate inspection elements, and there was a lack of training on gusset plate inspections, because gusset plates were not considered important to the load rating, as discussed above. You'll notice that this assumption has become a common cause. That doesn't make it THE root cause, but it does show how a common error may be found in several locations on a Cause Map.
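A common cause of this kind can be found mechanically once the map is written down as effect-to-cause links. The sketch below is a simplified Python paraphrase of the branch discussed above. The dictionary layout, the wording of each entry, and the function name common_causes are assumptions made for illustration, and the branch is trimmed to the causes needed to show the idea.

```python
from collections import defaultdict

# effect -> direct causes, paraphrased and simplified from the discussion above
cause_map = {
    "Design error never noticed": [
        "Error not noticed in load rating calculations",
        "Error not found during inspections",
    ],
    "Error not noticed in load rating calculations": [
        "Load rating does not evaluate gusset plate capacity",
    ],
    "Load rating does not evaluate gusset plate capacity": [
        "Gusset plates not considered important to load rating",
    ],
    "Error not found during inspections": [
        "Gusset plates not listed as separate inspection elements",
        "Lack of training on gusset plate inspections",
    ],
    "Gusset plates not listed as separate inspection elements": [
        "Gusset plates not considered important to load rating",
    ],
    "Lack of training on gusset plate inspections": [
        "Gusset plates not considered important to load rating",
    ],
}

def common_causes(cause_map):
    """Return causes that feed more than one effect, sorted by name."""
    effects_by_cause = defaultdict(set)
    for effect, causes in cause_map.items():
        for cause in causes:
            effects_by_cause[cause].add(effect)
    return sorted(c for c, effects in effects_by_cause.items() if len(effects) > 1)

shared = common_causes(cause_map)
print(shared)
```

Here the only cause that appears under more than one effect is the assumption that gusset plates were not important to the load rating. As noted above, appearing in several places does not make it THE root cause.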
The full Cause Map based on the information found in the NTSB report has more than 50 causes, but even more detail can be added for clarification or if additional information arises as part of the investigation. Again, due to the fatalities associated with the event, we want an extremely detailed Cause Map.
Step 3. Select the Best Solutions (Reduce the Risk)
Once the Cause Map is built to a sufficient level of detail, and supporting evidence is included, we begin looking for solutions. The Cause Map allows us to find specific solutions for the individual causes on the map. However, each cause may not have a solution. Possible solutions are placed directly on the map above the cause they control, as shown below.
The full Cause Map with associated evidence and possible solutions is shown below.
The best solutions are selected from the Cause Map and placed in the Action Items Table, shown below. The solutions that will be implemented are based on the organization's goals. Because the impacts to the goals for this incident were so severe, including loss of life, many solutions will be implemented.
Cause Mapping Improves Problem Solving Skills
The Cause Mapping method focuses on the basics of the cause-and-effect principle so that it can be applied consistently to day-to-day issues as well as catastrophic, high risk issues. The steps of Cause Mapping are the same, but the level of detail is different. Focusing on the basics of the cause-and-effect principle makes the Cause Mapping approach to root cause analysis a simple and effective method for investigating safety, environmental, compliance, customer, production, equipment or service issues.