It’s debatable whether you can do Problem Management without fully understanding how to do Root Cause Analysis. After all, isn’t identifying the root cause of a problem at the very heart of Problem Management? I suppose you could limit yourself to just identifying workarounds, quick fixes and circumventions, but that isn’t real Problem Management.
There are a number of Problem Management Root Cause Analysis techniques to choose from. Some lend themselves to being more of an information gathering exercise, whilst others lend themselves to forensic analysis.
Ideally, as a Problem Manager you should develop a good understanding of most if not all of them, and you should be able to apply the appropriate technique to the problem in hand.
Here are some of my favourite techniques, which I’ll expand on in future blogs, articles and downloads. So look out for the publication of our future short ‘how to do it’ articles:
- Ishikawa Diagrams (I've already expanded this one in a previous article)
- Pareto Analysis
- Pain Value analysis
- Chronological Analysis
- Technical Observation Post
- Affinity Mapping
- Fault Tree Analysis
- Fault Isolation
- Five whys?
- Hypothesis Testing
I should point out at this stage that it isn’t really a case of just using one of these techniques; they can and should be used in combination to achieve a much greater effect.
If I have maintained your interest so far, I would like to take one step back to point out that if you are really interested in seriously doing Root Cause Analysis, you should have already established a good Problem Management policy and process. Of course, effective Incident Management should also be in place… but that’s a given…
For a moment, let’s briefly get back to basics. Firstly you need to establish a Problem Management policy that will define what is in scope of the process, i.e. who can raise a problem, when and how. You can keep it fairly simple, but it does need to be documented and more importantly agreed across all of the stakeholders.
Next, you need to establish a process (which is no mean feat). We’ve previously provided guidance on this in our ‘Top 10 Tips For Problem Management’ article, so if you haven’t read it yet, now might be a good time to review it as well.
Assuming you’ve established the Problem Management process, the next step is to establish/expand your Root Cause Analysis activities/sub-process, which provides a framework in which we can use the various RCA techniques. For want of a better title we could call it a ‘Root Cause Analysis’ or ‘RCA’ sub-process… I’m not concerned about what you want to call it.
As with all things in life, try to use the ‘KISS’ approach, i.e. ‘Keep It Simple and Structured’.
Our RCA sub-process documented here has five identifiable steps. But we shouldn’t be overly prescriptive: use what works in your organisation, and adopt and adapt it to meet your organisational needs. And don’t be afraid of the ‘ITIL® Police’.
Step One: Firstly, define the Problem. ‘What the problem is, and what the problem isn’t’… hopefully that makes sense. It can be quite valuable to clearly understand what is working correctly as well as what is not working correctly as part of your investigations for identifying the root cause.
If you are working remotely and are dependent upon other people’s eyes (in fact any or all of their senses), then ensure you ask ‘open’ questions that people can’t just answer with a ‘yes’ or ‘no’. For example:
- What do you see happening?
- What are the specific symptoms?
Once you know what the symptoms are you can start to delve deeper…
Step Two: Then collect the relevant data
You’ll need to acquire evidence that this is a real problem and not just a perception, although it can be argued that poor perceptions are often problems. So ask…
- What proof do you have that the problem actually exists?
- How long has the problem existed?
- What is the impact of the problem? (Number of people affected, loss of revenue, effect on brand etc.)
- What’s the urgency to resolve the problem? (Is it getting worse?)
You will need to analyse the situation fully before you can move on to look at the factors that contributed to the problem. To maximise the effectiveness of your RCA, get ‘everyone’ who understands the situation together from your various stakeholders: your experts and front line staff. The people who are most familiar with the problem can help lead you to a better understanding of the issues.
A helpful method/tool at this stage is CATWOE. (I must admit I ‘borrowed’ this acronym and concept from someone… but can’t remember where it originated). It’s a bit contrived but it works for me as a high level checklist… With this method, you look at the same situation from a number of different perspectives: the Customers, the people (Actors) who implement the solutions, the Transformation process that's affected, the World view, the process Owner, and Environmental constraints.
And remember to document your findings…
Step Three: Now you will need to identify the possible causal factors
- What sequence of events led to the problem?
- What specific conditions allowed the problem to occur?
- What other problems surround the occurrence of the underlying problem?
During this stage, identify as many causal factors as possible. Too often, people identify one or two factors and then stop, but that's not sufficient. With RCA, you don't want to simply treat the most obvious causes – you want to dig deeper, and deeper and deeper, although if you dig ‘too’ deep you will discover that everything boils down to either ‘human error’ or an ‘act of god’… So you want to dig down to the level that you can resolve.
Use these tools to help identify causal factors:
- Appreciation – Use the facts and ask "So what?" to determine all the possible consequences of a fact. (Anyone with teenagers will recognise this technique)
- 5 Whys – Ask "Why?" until you have dug down as far as you can or dare. (Anyone with small children will recognise this technique)
- Drill Down – Break down a problem into small, detailed parts to better understand the big picture. Try the Fault Tree Analysis technique.
- Ishikawa or Fishbone (Cause and Effect Diagrams) – Create a chart of all of the possible causal factors, to see where the trouble may have begun.
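To make the 5 Whys idea a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a prescribed method: the `Cause` structure, the `resolvable` flag and the sample statements are all hypothetical. It follows the chain of answers to "Why?" and stops either after a set number of questions or when the next cause is one you could not realistically resolve.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cause:
    """One link in a 5 Whys chain (hypothetical structure for illustration)."""
    statement: str
    resolvable: bool = True            # can we realistically fix this level?
    because: Optional["Cause"] = None  # the answer to the next "Why?"


def five_whys(symptom: Cause, max_depth: int = 5) -> List[str]:
    """Follow the 'because' links from the symptom downwards.

    Stops after max_depth questions, when no further answer has been
    recorded, or when the next cause is beyond what we can resolve.
    """
    chain = [symptom.statement]
    current = symptom
    for _ in range(max_depth):
        nxt = current.because
        if nxt is None or not nxt.resolvable:
            break
        chain.append(nxt.statement)
        current = nxt
    return chain


# Illustrative data only.
root = Cause("Users were never trained on the new release")
misconfig = Cause("The application was configured incorrectly", because=root)
symptom = Cause("Month-end reports fail to run", because=misconfig)

chain = five_whys(symptom)
for depth, statement in enumerate(chain):
    print("  " * depth + ("Why? " if depth else "") + statement)
```

The last entry in the chain is the deepest cause you have recorded that is still worth acting on, which is usually a sensible candidate to carry into the next step.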
Step Four: You should now be in a position to better identify the Root Cause(s)
- Why does the causal factor exist?
- What is the real reason the problem occurred? (For example we could have an application configuration problem which has occurred due to lack of user training)
You can use the same tools you used to identify the causal factors (in Step Three) to look at the root cause(s) of each factor. These tools are designed to encourage you to dig deeper at each level of cause and effect.
Step Five: Recommend and implement solutions to remove the root causes. Where they can’t be removed, ensure that you have effective, acceptable, documented workarounds in place. In those rare cases where there is no workaround and resolving the underlying root cause cannot be justified, make people aware that they will have to ‘live with the problem’.
Assuming that we have a resolution to the problem, now identify:
- What can you do to prevent the problem from happening again?
- How will the solution be implemented?
- Who will be responsible for it?
- What are the risks of implementing the solution? (This could include considerable cost)
- Are the workarounds a feasible approach in the short/medium term?
Analyse your cause-and-effect process, and identify the changes needed for various systems. (For example the removal of a single point of failure). It's also important that you plan ahead to predict the positive, and potential negative, effects of your solution. This way, you can spot potential failures and knock-on effects before they happen.
One way of doing this is to use Failure Mode and Effects Analysis (FMEA). This tool builds on the idea of risk analysis to identify points where a solution could fail. FMEA is also a great system to implement across your organisation; the more systems and processes that use FMEA at the start, the less likely you are to have problems that need RCA in the future.
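FMEA is commonly scored using a Risk Priority Number (RPN), which multiplies ratings for severity, likelihood of occurrence and difficulty of detection. The sketch below shows the arithmetic in Python. Treat it as an assumption-laden illustration: the 1 to 10 scales are a common convention rather than anything mandated here, and the failure modes and scores are made up.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """A way in which a proposed solution could fail (illustrative only)."""
    description: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (certain to be spotted) to 10 (undetectable)

    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("scores must be between 1 and 10")
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes):
    """Return the failure modes with the highest RPN first."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical failure modes for a change that removes a single point of failure.
modes = [
    FailureMode("Config change missed on one node", 5, 4, 3),
    FailureMode("Failover to standby server does not trigger", 9, 2, 6),
    FailureMode("Users not told about new login step", 3, 6, 2),
]

ranked = rank_by_risk(modes)
for m in ranked:
    print(f"{m.rpn():>4}  {m.description}")
```

The failure modes at the top of the list are the ones to mitigate, or at least discuss, before the solution goes ahead.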
Business Impact Analysis (BIA) is another useful tool here. This helps you explore possible positive and negative consequences of a change on different parts of a system or organisation.
Another great strategy to adopt is Kaizen, or Continual Service Improvement (CSI). This is the idea that continual small changes create better systems overall. Kaizen also emphasises that the people closest to a process should identify places for improvement. Again, with Kaizen or a CSI programme alive and well in your organisation, the root causes of problems can be identified and resolved quickly and effectively.
Oh, and don’t forget that you should also remember to engage with the Change Management process to get the root cause resolved… Otherwise you may end up in a constant loop of problem, unauthorised change, further problem etc.
Key Points – In a nutshell
Root Cause Analysis is a useful activity for understanding and resolving a problem, and potentially preventing one. It can be used reactively and proactively, and it isn’t just an IT Service Management technique: it can equally be used in Project Management or within business units to identify the root cause of problems.
Figure out what negative events are occurring. Then, look at the complex systems around those problems, and identify key points of failure. Finally, determine solutions to address those key points, or root causes.
As an analytical tool, Root Cause Analysis is an essential way to perform a comprehensive, system-wide review of significant problems as well as the events and factors leading to them.
Learning More - Our Offer To You.
Purple Griffon offer a number of Problem Management and Root Cause Analysis courses and workshops to help you improve your skills and knowledge.
If you would like to learn more, why not give us a call on 01539 736828 or email firstname.lastname@example.org?
Note: The information included in this document has been aggregated from a number of sources. We believe the description of each method describes the most accurate information available at the time of printing. Any information to the contrary or constructive feedback would be most welcome.