Root Cause Analysis focuses on process. And there is perhaps no group that relies more on process than engineers.
In fact, to function in the modern world, one must often place trust in engineers.
To cope, we are attentive to signals of disrepair and tend to ask ourselves more questions when outward appearances give us pause. We think we can expect the brand new, pristine elevator in an ultra-modern building to work just fine, but the old, musty-smelling one with the basic button layout and the door that gets stuck for a split second when open will make us consider taking the stairs. Similarly, the first elevator ever built, no matter how new and clean and shiny, must have been pretty scary to ride in; the 879th elevator ever built probably inspired a bit more confidence.
Assumptions are a double-edged sword; they allow us to function in the modern world, but at times they prove unfounded, and disaster results. This is a familiar theme in Root Cause Analysis. The consequences of assuming, for example, that all members of a large project are working with the same units can come with a price tag in the hundreds of millions of dollars (as was the case with the loss of the Mars Climate Orbiter). Other times, people pay with their lives.
This Root Cause Analysis investigation will present another tale of assumptions gone bad. Since Root Cause Analysis is a solutions-oriented enterprise, though, this Root Cause Analysis example will not only give you a clear idea of what happened, but also propose solutions to prevent disaster when engineers and designers make assumptions while building a large structure.
Our story unfolds at the Kansas City Hyatt-Regency Hotel in 1981. At the time, the Hyatt was new, modern, slick, and impressive. Construction began in May 1978, and the hotel opened on July 1, 1980. The main attraction in the hotel lobby was an atrium spanned by three pedestrian walkways at the second, third, and fourth floors, suspended from the roof. The bridge on the fourth floor was located directly above that on the second; the third floor bridge was offset. Each bridge was roughly 120 feet long, weighed 64,000 pounds, and was suspended by three pairs of hangers at the ends and at uniform intervals.
It was a pretty cool place to hold events, and so it happened that on July 17, 1981, just over a year after the hotel had first opened, roughly 1,600 people gathered in the lobby to watch (or participate in) a dance competition. The walkways offered a great location to watch from, so about 40 people gathered on the one on the second floor and roughly 20 were on the one on the fourth floor. The weight from those people, apparently, was too much for the walkways to withstand; the fourth floor walkway collapsed onto the second floor walkway, and both then fell onto the lobby floor. 111 people died immediately, 3 died later in the hospital, and 216 were injured.
Description: View of the collapsed walkways, during the first day of the investigation of the Hyatt Regency walkway collapse. Source: http://ethics.tamu.edu/ethics/hyatt/hyatt2.htm. Author: Lee L. Lowery, Jr., PhD, P.E.
View of the lobby floor, during the first day of the investigation of the Hyatt Regency walkway collapse. Source: http://ethics.tamu.edu/ethics/hyatt/hyatt2.htm.
The collapse was totally unexpected; this wasn’t an old, run-down building where one might think twice about the structural integrity of where one walks. This was a new, modern, clearly expensive place that you wouldn’t think twice about setting foot in. The walkways themselves were certainly modern and visually impressive, but their design was not particularly innovative or revolutionary; nothing about them would cause the average person to think, “be cautious, it’s the first of its kind, tread lightly.” They looked solid; they were not. The disaster at the Hyatt was the deadliest structural collapse in American history until the World Trade Center collapsed on 9/11.
In some cases, a degree of failure is expected. When implementing an innovative design or perfecting some new technology, setbacks are the price of progress.
However, this is not one of those stories. What happened at the Kansas City Hyatt-Regency Hotel was the result of a number of project management errors that combined to allow a fatal design flaw to be built into the bridges’ support system. The good news: there are relatively simple solutions to the problem. The bad news: the problem was caught only after people died.
Root Cause Analysis is no stranger to structural failure. When something this big goes this wrong, it is critical to understand exactly what happened–not just that the walkway fell or even that a miscalculation was evidently made, but also how every contributing factor came to exist and why errors were not caught earlier.
The cause of this tragic incident can be investigated by building a Cause Map, a visual Root Cause Analysis, which shows the cause and effect relationships between the different factors that contributed to the collapse. Root Cause Analysis will help us analyze the collapse and implement solutions to ensure that it never happens again.
Our Root Cause Analysis of the walkway disaster identifies the walkway collapse and the injuries sustained as the central problems. Because Root Cause Analysis investigations lead from the goals to various problems, here we are just trying to give ourselves a sketch of the incident based on known facts; we will find other “problems” in the analysis phase of our investigation, but for now we list the major, obvious problems and move on.
Root Cause Analysis also requires that we capture the date and time, as well as any differences present at the time, or what made this day and time different from any other. The disaster occurred on July 17, 1981, around 7:05 PM; the difference on this day was that there were more people than usual on the walkways, and some of them may have been dancing or swaying, creating more movement than usual on the walkways.
To continue with our Root Cause Analysis problem outline, we also must specify the location (a hotel in Kansas City, Missouri), and the process being undertaken at the time (a dance competition).
Finally, our Root Cause Analysis problem outline details the impact that the incident had on the goals of the organization in question, which in this case would be the Hyatt hotel. Root Cause Analysis involves being as specific in defining a given organization’s goals as it is in defining the problem. All organizations have multiple goals in common. It is good business to ensure the safety of employees and the public, remain within budget, achieve the intended purpose of the organization, avoid damaging property, and do it all as efficiently as possible. In Root Cause Analysis, these elements are understood in terms of safety, property, production, and labor goals.
Just as Root Cause Analysis of any incident considers multiple problems, it also asks how those problems affected multiple goals. Root Cause Analysis always thinks about “the problem” in relation to the impact the issue under investigation had on the organization’s overall goals. Customer deaths are never part of a hotel’s business plan; ideally, everyone who walks in the doors will walk out of them in much the same condition. The deaths and injuries thus can be said to have affected the safety goal. Similarly, hotels don’t like to be known for disaster. The public relations fallout that resulted from the walkway collapse clearly affected the hotel’s reputation, which our Root Cause Analysis calls the customer goal. The hotel also had to be repaired after the walkway collapse, a costly endeavor that impacted the customer service and production goals.
Now that we have a pretty good idea of exactly what happened, our Root Cause Analysis turns to analyzing the incident by asking, as often as necessary, another key question: Why?
Root Cause Analysis continues in this step by identifying the cause and effect relationships that comprise the incident. Starting with the goals that were affected by the incident, we build our Cause Map by asking, “why?” 5 times. Each time we get and note an answer we ask why again, building our Cause Map out to the right. While the Cause Map may start linearly, it will expand to provide a detailed view of the incident as more information is collected as the Root Cause Analysis continues.
Both the safety and customer goals were affected by the fatalities and injuries caused by falling and/or being crushed by the structure when the walkways collapsed.
Why did the walkways collapse? A Root Cause Analysis of the Hyatt incident reveals a number of contributing causes.
First, the structural design of the walkway was inadequate. A weld failed, allowing a support rod to pull through the box beam, causing the walkways to fall.
The weak longitudinal weld alone was not enough to produce the collapse, however. There was also higher stress than usual on the weld that day because there were more people than usual on the walkways. At the time of failure, a large crowd had gathered to watch the dance contest; about 40 people were on the second floor walkway, and roughly 20 were on the fourth floor walkway, creating a higher load that combined with the flawed structural design to produce a disaster.
When two conditions are necessary to produce an effect, our Root Cause Analysis joins both on the Cause Map with the word “and.” Here, the weak weld and the unusually heavy crowd load are joined as the two causes of the collapse.
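As a minimal sketch of that AND relationship, a Cause Map can be modeled as nodes whose listed causes were all required to produce the effect. The class and function names below are illustrative assumptions, not part of the Cause Mapping method or of any particular tool; indentation stands in for building the map out to the right.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on a Cause Map; entries in `causes` are joined by AND."""
    description: str
    causes: list[Cause] = field(default_factory=list)


def why_chain(node: Cause, depth: int = 0) -> list[str]:
    """Walk from an effect to its causes, indenting once per 'Why?' asked."""
    lines = ["  " * depth + node.description]
    for cause in node.causes:
        lines.extend(why_chain(cause, depth + 1))
    return lines


# Hypothetical encoding of the causes named in the text so far.
weld = Cause("Longitudinal weld not strong enough")
crowd = Cause("More people than usual on the walkways")
collapse = Cause("Walkways collapsed", causes=[weld, crowd])  # weld AND crowd
safety = Cause("Safety goal impacted: fatalities and injuries", causes=[collapse])

print("\n".join(why_chain(safety)))
```

Each further answer to "why?" would be added as a new `Cause` under the box it explains, so the same structure grows as the investigation collects more information.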
Having asked and answered the question, “why?” several times, our Root Cause Analysis begins to reveal a basic visual representation of what happened. Identifying the failure mechanism is important during a Root Cause Analysis investigation, and we have it: the weld and the crowd. Yet a thorough Root Cause Analysis must take the analysis further in order to better understand the causes and propose multiple solutions. If the design was inadequate, why was it built?
In this case, it appears that the design was changed without the approval of the structural engineer. How do we know? Our evidence for this assertion, logged on the Cause Map under the cause that it supports, is that the final design was different from the design concept at contract.
These design changes resulted from a communication error between the fabricator and the structural engineer. Nobody caught the problem because the design review process was ineffective. The structural engineer had sent a sketch of a proposed walkway design to the fabricator, assuming the fabricator would work out the details of the design. The fabricator, for his part, assumed that the sketch was a finalized drawing.
The original design would have been difficult to implement, as it required the long portion of the rod to be threaded, and therefore required non-standard parts. The drawings, however, allowed for “fabricator’s judgement”, and the fabricator wanted to use standard parts (it’s cheaper and easier). Assuming the drawing was final and he could, in fact, pick out standard parts to fit the sketch, he did.
This resulted in a significant change from the original design and dramatically decreased the load bearing capacity of the walkways.
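The changes in question relate to the hanger rod connections. The fabricator changed the design from a one-rod system to a two-rod system. In the original concept, a single continuous rod ran from the roof through the fourth floor walkway down to the second floor walkway, so each connection carried only its own walkway. In the revised design, one set of rods ran from the roof to the fourth floor walkway, and a second set hung the second floor walkway from the fourth floor box beams. This doubled the load on the fourth floor connector, ultimately causing the walkway collapse.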
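The doubling can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that each walkway's roughly 64,000-pound weight is shared equally among its six hanger connections (three pairs) and that the crowd's weight is ignored; the names are hypothetical, and the figures are not the investigators' calculations.

```python
WALKWAY_WEIGHT_LB = 64_000   # approximate weight of one walkway, from the text
HANGERS_PER_WALKWAY = 6      # three pairs of hangers (equal sharing is assumed)


def load_per_connection(walkways_supported: int) -> float:
    """Dead load on one fourth floor beam connection, in pounds."""
    return walkways_supported * WALKWAY_WEIGHT_LB / HANGERS_PER_WALKWAY


# Original concept: one continuous rod, so the fourth floor connection
# carries only the fourth floor walkway.
original = load_per_connection(1)

# As built: the second floor walkway hangs from the fourth floor beams,
# so the same connection carries both walkways.
as_built = load_per_connection(2)

print(f"original: {original:,.0f} lb, as built: {as_built:,.0f} lb")
print(f"ratio: {as_built / original:.1f}")
```

Whatever the actual per-connection figures were, the ratio is the point: hanging one walkway from the other puts two walkways' worth of load on a connection detailed for one.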
Now that we have a more detailed picture of what happened and why, our Root Cause Analysis turns to the matter of proposing solutions.
While most organizations consider Root Cause Analysis to be a search for the root cause or root causes of a problem, the Cause Mapping approach to Root Cause Analysis focuses on finding specific solutions to prevent problems by matching them to specific causes. Because Cause Mapping identifies the system of causes that contributed to a problem and maps that information visually and clearly, it lays the perfect groundwork to reveal all of the possible solutions. The best are selected from the possible solutions, and in the end one can select specific action items to prevent the problem from occurring.
As our Root Cause Analysis has shown, the disaster in Kansas City occurred due to a connection that was overloaded because of an ill-advised change to a badly defined structural detail. As an investigation in the aftermath showed, even if the original design had been implemented, the walkway would not have been able to hold the expected load, thereby failing to meet the requirements of the Kansas City Building Code. Checking the calculations at the design stage could have prevented this disaster. So how do we make sure it never happens again?
Our Root Cause Analysis reveals that part of the problem in this incident was that neither the designers nor the builders assumed responsibility for the final product. This is not a matter of placing blame on one group or another, but rather of determining at what point one can consider an engineering project to be finalized. Without clear guidelines and procedures, miscommunications and assumptions could strike again rather easily, putting lives in danger. This Root Cause Analysis thus suggests a rather simple procedural solution: Implement clear guidelines for determining ultimate responsibility for a design.
Even with responsibility for the designs established, another element missing from the conception and construction of the lobby was formal oversight. As our Root Cause Analysis shows, part of the reason that unapproved design changes were made is that the design review process was ineffective. Root Cause Analysis thus suggests creating a formal design review process as a possible solution.
Finally, Root Cause Analysis makes clear that the walkway collapse was also caused by relying on a longitudinal weld that wasn’t strong enough. Another solution, then, would be to strengthen the weld.
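The matching of solutions to specific causes can be sketched as a simple lookup. The causes and solutions below are the ones named in the text; the dictionary layout and the helper function are illustrative assumptions rather than an official Cause Mapping format.

```python
# Causes identified on the Cause Map in this example.
causes_on_map = [
    "No clear responsibility for the final design",
    "Design review process was ineffective",
    "Longitudinal weld not strong enough",
    "More people than usual on the walkways",
]

# Each proposed solution is attached to the specific cause it addresses.
solutions_by_cause = {
    "No clear responsibility for the final design":
        "Implement clear guidelines for determining ultimate responsibility for a design",
    "Design review process was ineffective":
        "Create a formal design review process",
    "Longitudinal weld not strong enough":
        "Strengthen the weld",
}


def causes_without_solution(causes, solutions):
    """Return the causes on the map that no proposed solution addresses yet."""
    return [cause for cause in causes if cause not in solutions]


open_causes = causes_without_solution(causes_on_map, solutions_by_cause)
print(open_causes)
```

Listing the causes that have no solution attached makes it easy to see where the analysis could still propose an action item, or where a cause is deliberately left alone.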
In the wake of the walkway collapse, the owners of the Hyatt-Regency Hotel paid over $140 million in damages to victims. The Missouri Board of Architects, Professional Engineers, and Land Surveyors found the engineers employed by Jack D. Gillum and Associates who had approved the final drawings guilty of gross negligence, misconduct and unprofessional conduct in the practice of engineering. All of them lost their engineering licenses in Missouri and Texas, as well as their American Society of Civil Engineers (ASCE) memberships. Jack D. Gillum and Associates, for its part, lost its license to be an engineering firm, but Jack Gillum himself continued to take speaking engagements at engineering conferences, so that others might learn from his mistakes.
For the engineering profession, the walkway collapse tragedy became a classic model in studying engineering ethics and errors. To begin with, the American Society of Civil Engineers established the precedent that responsibility for the building lies with the engineer’s seal, meaning whoever sets their seal of approval on a set of plans. A city engineer, moreover, must do a formal check on load-bearing calculations. In this way, potential failures will be caught at the design stage, and not by a bridge full of unwitting spectators.
The hotel itself did rebuild, and still stands today (though it has changed names a few times in the intervening years). Instead of suspended walkways, the lobby was rebuilt with a single crossing on the second floor supported by large pillars, resulting in a structurally sound construction.
Root Cause Analysis of the Hyatt-Regency disaster reveals the importance of building safety nets into the work process that do not compromise the business but do reduce the risk it operates under to an acceptable level. Grounded in the basics of the cause-effect principle, Root Cause Analysis can be applied consistently to mundane as well as catastrophic or high-risk issues. The Cause Mapping approach to Root Cause Analysis is thus a simple yet powerful way to investigate all kinds of issues, be they safety, environmental, compliance, customer, production, equipment or service related.