Incident Investigation: Rethinking the Chain of Events Analogy

The chain is both a tool and a symbol that is familiar to one and all. Almost inseparable from any thinking involving a chain is the notion that a chain will fail at its weakest link.

In the safety profession, we frequently use a chain analogy in describing incidents and their causation. This most commonly takes the form of relating an incident to a chain of events. Within this chain of events, we look for the weak link as a means of identifying what went wrong that allowed the incident to occur. We then very often go further and identify a specific human error that was made, and the person who made it. That person, and/or what they did or didn't do, is thought of as a weak link in the sense of a "performance" chain. Rigid adherence to this way of thinking can lead to some significant errors in improving safety performance. We can and should avoid them.

The traditional reading of the chain of events analogy can lead to three main problems:

1. The very notion of a chain invokes an image of a linear sequence and can lead to a failure to acknowledge the multivariable nature of outcomes in systems where people are involved. That is to say, there are, in fact, many different possible paths to an incident. A consequence of ignoring multivariable outcomes is the incorrect notion that any one change or interruption in "the chain" will prevent an incident. In reality, this is wishful thinking and is seldom the case.

2. The "weakest link" approach implies that there is only one "main" cause for a given incident, and that doing something directly to deal with that one cause will preclude a recurrence of the incident. This is very much at odds with modern thinking on multiple causation factors in virtually all incidents. It compounds the problem by tending to focus on what are commonly called direct or immediate causes, at the expense of getting to the root or underlying causes.

3. Looking almost exclusively at the weak link creates a focus on the point of failure and assumes that this is also the best and most effective point of control. In fact, the point of failure is often well removed from the best point of control. Not understanding this crucial concept is an error that can make it nearly impossible to seek out and deal with root causes of a problem and the system deficiencies that underlie those root causes. It also encourages overemphasis on behavioral approaches or on any other single-point intervention technique.

Every link in a physical chain is connected to only one other link at each end. The real-world chain of events, however, has many more "options" in terms of inputs and outputs. Breaking a single "link" will not necessarily preclude the end event from occurring.

Human actions are a combination of attitudes, beliefs, moods, training, awareness and many other factors. The point is that we may not respond to a given situation today the same way we did yesterday. The key idea here is that many sets of inputs and outputs are possibilities in incident causation. We must be very careful to avoid thinking about causation in a purely linear manner.
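The contrast between a physical chain and a real-world "chain of events" can be sketched in a few lines of Python. This is only an illustration: the causal network, its condition names, and the helper `incident_still_reachable` are all hypothetical and do not come from any actual investigation. The sketch removes one "link" at a time and checks whether some path to the incident still remains.

```python
from collections import deque

# Hypothetical causal network (illustrative names only). Each key is a
# condition or event; each value lists what it can lead to.
CAUSAL_LINKS = {
    "schedule pressure": ["skipped pre-use check", "rushed task"],
    "inadequate training": ["skipped pre-use check", "wrong tool used"],
    "skipped pre-use check": ["equipment defect missed"],
    "rushed task": ["guard left off"],
    "wrong tool used": ["guard left off"],
    "equipment defect missed": ["incident"],
    "guard left off": ["incident"],
}
STARTS = ["schedule pressure", "inadequate training"]

# A true linear chain, for comparison: each link has exactly one successor.
LINEAR_CHAIN = {"a": ["b"], "b": ["c"], "c": ["incident"]}


def incident_still_reachable(links, starts, removed, target="incident"):
    """Breadth-first search: can any starting condition still lead to the
    target once the single node `removed` is taken out of the network?"""
    queue = deque(s for s in starts if s != removed)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in links.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Break each single "link" in turn and record whether the incident can
# still occur by some other path.
results = {
    link: incident_still_reachable(CAUSAL_LINKS, STARTS, link)
    for link in CAUSAL_LINKS
}
```

In this made-up network, every entry in `results` is `True`: no single interruption closes off all paths to the incident. In `LINEAR_CHAIN`, by contrast, removing `"b"` does prevent it, which is exactly the behavior the traditional chain image leads us to expect everywhere.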

It is not that hard to find what is apparently a single weak link in almost any given incident situation, whether it is a physical problem, a human error problem or some combination of both. In fact, it is almost too easy. Too easy, that is, in the sense that once we do find the weak link, we tend to stop looking for any other sources of the problem. It is vitally important to move past the notion that there is only one cause for an incident, or the almost congruent notion that only one thing needs to be corrected to preclude a recurrence. It is also important to note that any and all immediate or direct causes are but symptoms of more serious problems, the root causes. The failed link itself is a direct cause, and so are the observable factors leading to it. Until we ask why those factors are present, we can all too easily overlook the root (or underlying) causes.

Root causes are likely to apply to a whole series of potential incidents, not just one event. These root causes are in fact the key to prevention of future incidents. And contrary to what all too many people may think, human error is not one of them! Human error itself is a symptom that there are other problems in the management of the work that is taking place. These error problems themselves have root causes. When a worker makes an error or fails to follow a procedure, there are reasons that set up the situation. These are the root causes that must be found.
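One way to picture this is to keep asking "why" along every branch, not just the first one found. The Python sketch below uses an entirely hypothetical "why" map for two made-up incidents; the cause names and the helper `root_causes` are assumptions for illustration. Treating a cause with no deeper entry as a root cause is a simplification: in a real investigation it may only mean the questioning stopped there.

```python
# Hypothetical "why" map (illustrative only): each cause points to the
# deeper causes behind it. A cause with no entry is treated as a root
# cause for the purposes of this sketch.
WHY = {
    "forklift struck racking": ["operator error"],
    "pallet dropped from height": ["operator error", "damaged pallet used"],
    "operator error": ["no refresher training", "production quota pressure"],
    "damaged pallet used": [
        "no pallet inspection procedure",
        "production quota pressure",
    ],
}


def root_causes(why, event):
    """Follow every 'why' branch from `event` down to the causes that have
    no deeper explanation recorded in the map."""
    roots = set()
    visited = set()
    stack = [event]
    while stack:
        cause = stack.pop()
        if cause in visited:
            continue
        visited.add(cause)
        deeper = why.get(cause, [])
        if deeper:
            stack.extend(deeper)  # keep asking "why"
        else:
            roots.add(cause)      # nothing deeper recorded
    return roots


roots_a = root_causes(WHY, "forklift struck racking")
roots_b = root_causes(WHY, "pallet dropped from height")
shared = roots_a & roots_b  # root causes common to both incidents
```

In this invented example, "operator error" never appears among the root causes, because there is always a recorded reason behind it, and the two different incidents share underlying causes. Correcting those shared causes would address both incidents at once.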

Corrective Action

When we look at the failed link in a chain, it can be very tempting to focus all attention on keeping that one link from failing again. How should we go about doing that? The most obvious immediate course of action might be to repair and/or strengthen that one link so that it is no longer the weak point. This sets us up for failure, for as was pointed out previously, the failure of the link is just a symptom. The difficulty is a failure to see the difference between the point of failure and the point of control. It is often necessary to design corrective actions for both places, but we need to base such a decision on careful analysis. Another way to think of this is that unless and until you are reasonably sure of all the factors leading to a problem, you can't make an effective decision on how to control the problem.

The leading writers in quality management disciplines point out that at least 85 percent of the factors leading to quality problems are the responsibility of management. Yet we in the safety profession still deal with people (particularly among the managers we work for) who believe that a similar percentage of safety incidents are the sole result of "unsafe acts" of the workers. In that view, errors or omissions on the part of executives and managers do not enter into the equation. Such thinking is then too often used as justification for over-reliance on single-point approaches to behavior-based safety (BBS), when in reality, there is no single point where the problem can be dealt with exclusively. By the way, please do not take this statement to be a denunciation of BBS; it is a proven and useful part of an overall approach to safety improvement.

What we cannot do is allow ourselves to think that the only place human error can be effectively dealt with is at the individual worker level. That approach to BBS falls prey to the problems inherent in misinterpreting the weak link problem. So where is the most effective point of control, if not at the point of failure? It is back where all aspects of the workplace are actually controlled, at the heart of the management system governing the organization.

Consider the following example. A large trucking company has a policy to fire any driver who has a "preventable" traffic accident. This in fact keeps that driver from having another accident (for that company), but does nothing to improve the performance of the company's remaining drivers. If the same company determines the root causes and points of control leading to the first accident, effective remedial actions will have a positive effect on all of the company's drivers. Which approach makes more sense? It depends on whether you want to address only the broken link, or the whole "chain system." Most of us shouldn't have much trouble making the most effective choice.

Avoiding Pitfalls

We have looked at three ways that the chain of events analogy is commonly misused. In each of these areas, we have seen that these problems can keep us from finding useful solutions to incident prevention. As in so many other situations, the very problems themselves contain the seeds of their own solutions. The chain analogy can be a useful tool in dealing with incidents, but only if we avoid the pitfalls to which its traditional interpretation can lead. There are three key aspects to consider in making sure we get it right:

Recognize the multivariable nature of incident causation. Avoid the trap of thinking that there is only one path to an incident and that any change made along the way will provide adequate protection against recurrence.

Understand the Principle of Multiple Causes. Look for all the causes of an incident, not just at the failed link in the chain. Make sure you find root causes as well as direct causes, and don't mistake human error for a root cause.

Realize the point of failure and the point of control are not necessarily the same. Seek to understand the problem as part of the overall system, and identify where the system itself can be best controlled.

When incidents are looked at in this manner, great success can be achieved. Instead of misleading us, the chain analogy can be an effective component in our toolbox, joining timelines, Ishikawa (fishbone) diagrams, and various other analysis techniques as vital ways to help solve problems. Edmund Burke, the Irish-born political philosopher, said, "Experience is the school of mankind, and they will learn at no other." Almost all of us have experience in thinking about or using the chain analogy. We need not abandon or ignore this experience, but we do need to rethink how we use it in interpreting events. When we use the chain analogy properly, it will become a more useful way to help us find the solutions to new and ever more complex problems. Effective controls for the causes of incidents can and must be found. We can't afford to do less in a world of hazards ready to lead to serious workplace incidents.

Allan T. Goldberg is a senior risk manager with International Risk Control America (IRCA). His experience as a consultant has covered a broad range of safety and loss control management specialties. These have included safety management training, conducting management systems assessments, leading accident investigations for clients, development of custom audit protocols and manuals, and direct consulting projects to facilitate program implementation. He has had numerous articles published in journals such as Occupational Hazards, Professional Safety, and Plant Engineering.