Getting Started With Problem Management

Most organisations have established a Service Desk in some form, and most have even documented their Incident Management process. However, many have stalled in their continued implementation of Service Management and are failing to achieve the benefits that they deserve. This is probably due to a number of reasons: lack of resources, lack of management commitment to invest further in IT Service Management, no strategy, no expertise in Service Management, etc.

Over the years I have spoken to a number of these organisations and helped many of them to progress further. The next step after Incident Management is often to establish Problem Management.

Okay, so what’s the business case for Problem Management? – The justification is that the impact of incidents on the business can be reduced, and that some incidents can be prevented from recurring or even from happening in the first place, thus improving service quality. There will be fewer incidents and improved organisational learning. You will also be able to improve the effectiveness of the Service Desk with better first-time fix rates. The reverse is that if you don’t implement Problem Management you will end up with a purely reactive support organisation, facing up to Problems only when the service to customers has already been disrupted. Your customers will be confronted with recurring incidents and will lose faith in the quality of the IT support organisation. Ultimately, you’ll end up with an ineffective support organisation with high costs and low employee motivation, since similar incidents have to be resolved repeatedly because structural solutions are not provided.

Who’s involved? – Problem Management in most organisations is a team game. The Problem Manager oversees and coordinates the effort, and Problem Resolvers in the Technical, Operational and Application teams identify the workarounds, quick fixes and circumventions. Problems may also be caused by third-party hardware and software, so you will also be working with those suppliers to resolve these problems.

How do you gain management commitment and support? – You will need management commitment, which means that you have to sell the concept to them. In my experience the ‘What’s in it for me’ approach generally works best. By your efforts, IT management will be able to demonstrate improved performance, reliability, maintainability, consistency of service, cost reduction and general improvements in effectiveness and efficiency.

When should you raise a Problem record? – You can literally raise a problem record at any time, but it makes sense to define who can raise a problem record, under what circumstances and how; otherwise the Problem Manager will be inundated by problems. You will also require an ITSM software tool that can differentiate between Incidents, Problems and Changes, but can also link them together when required. The ability to categorise, prioritise and track problems will also be a requirement.
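To make the tooling requirement a little more concrete, here is a minimal sketch of how Incidents, Problems and Changes might be kept as separate record types that can still be linked. This is not how any particular ITSM product works; the class names, fields, status values and ID formats are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Priority(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Incident:
    incident_id: str
    summary: str
    category: str


@dataclass
class Change:
    change_id: str
    description: str


@dataclass
class Problem:
    # Hypothetical record layout: a Problem is categorised and prioritised,
    # and holds links to the Incidents and Changes related to it.
    problem_id: str
    summary: str
    category: str
    priority: Priority
    status: str = "open"  # illustrative values: open, known_error, resolved
    incident_ids: List[str] = field(default_factory=list)
    change_ids: List[str] = field(default_factory=list)

    def link_incident(self, incident: Incident) -> None:
        # Link each incident at most once.
        if incident.incident_id not in self.incident_ids:
            self.incident_ids.append(incident.incident_id)

    def link_change(self, change: Change) -> None:
        # Link each change at most once.
        if change.change_id not in self.change_ids:
            self.change_ids.append(change.change_id)


problem = Problem("PRB-1", "Server keeps crashing", "hardware", Priority.HIGH)
problem.link_incident(Incident("INC-1", "Server down", "hardware"))
problem.link_change(Change("CHG-1", "Replace power supply"))
```

Keeping the three record types distinct, with links rather than copies, is what lets you track a Problem through to its fix without losing sight of the Incidents that triggered it.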

Problem Resolvers – One of the roles of the Problem Manager is to coordinate problem activities. This will require identifying areas of expertise and responsibility and establishing a good working relationship, as you will have to influence these people and their management. This can often be a challenge, as they may not see the value of a permanent fix if it does not benefit them personally, e.g. fixing an application error may result in fewer call-out fees.

Separating Problems from Incidents – Although I have been an Incident & Problem Manager, I am acutely aware that there is a conflict of interest in managing both the Incident and Problem Management processes. On the one hand you want to reboot the server ASAP, and on the other hand you want to find out why it keeps crashing. The key is to keep the Problem Management process separate from the Incident Management process and, if possible, to separate the ownership of the two processes as well.

Where do I start? – Well, apart from establishing the Problem Management process, defining roles and responsibilities and gaining management commitment, good Problem Management relies on good Incident Management and access to configuration data in the form of a CMDB. So, working with the Service Desk and Incident Management is a good start. At some stage you will be reliant on the data they collect, so now is a good time to make sure that they are collecting the right level of data for you. As the saying goes, ‘garbage in…garbage out’. Some smaller organisations introduce reactive Problem Management by focusing daily or weekly on the ‘top ten’ Incidents of the previous day or week. This can prove to be an effective approach, since experience shows us that roughly 20% of Problems cause 80% of service degradation.
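The ‘top ten’ review can be sketched in a few lines of code. The example below assumes you can export the previous period’s incidents as a simple list of category names; the function name and the sample categories are made up for illustration, and a real export from your ITSM tool will look different.

```python
from collections import Counter


def top_incident_categories(categories, top_n=10):
    """Return (category, count, cumulative_share) for the most frequent categories."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common(top_n):
        running += count
        # Cumulative share shows how much of the total volume
        # the categories listed so far account for.
        rows.append((category, count, running / total))
    return rows


# Hypothetical sample: ten incidents logged in the previous week.
last_week = ["email"] * 5 + ["vpn"] * 3 + ["printer"] * 2
for category, count, share in top_incident_categories(last_week, top_n=2):
    print(f"{category}: {count} incidents, {share:.0%} cumulative")
```

The cumulative share column is the useful part: it tells you how few categories you need to tackle to cover most of the incident volume, which is the Pareto idea in practice.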

What techniques do I use? – Root Cause Analysis (RCA) techniques such as Kepner-Tregoe, Pareto Analysis and brainstorming will certainly help you. You will also need to develop your Cost Benefit Analysis (CBA) skills to justify remedial and preventative work.

How do I establish a Known Error database? – Most ITSM tools nowadays have this functionality, but you will have to populate it. One source will be the development and testing area: as services go live, so should the records of their known errors. Another source will be the Service Desk and second line support teams, who will often have a list of Known Errors which they can’t get fixed for one reason or another. Working with them to identify, document and hopefully permanently fix these errors will help to cement good working relationships.

How do we get permanent fixes to resolve and prevent Problems? – Quite simply, you raise a Request for Change (an RFC) with Change Management. The only problem is that you may have to invest some resources in identifying permanent fixes, so at some stage you will have to build a business case (so start using your CBA skills!). Unfortunately, you will find problems for which you can’t justify a permanent fix. These you are going to have to live with as known errors, until a case can be made for their permanent resolution.
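As a rough illustration of the kind of sum that sits behind such a business case, the sketch below compares the cost of a permanent fix with the cost of the incidents it would prevent. It is deliberately simplistic: it assumes the fix removes all of the related incidents, that the cost per incident is constant, and that a 12-month horizon is appropriate. The function name and all the figures are hypothetical.

```python
def fix_payback(incidents_per_month, cost_per_incident, fix_cost, horizon_months=12):
    """Return (net_benefit, payback_months) for a proposed permanent fix.

    Assumes the fix eliminates every related incident, which is optimistic.
    """
    monthly_saving = incidents_per_month * cost_per_incident
    net_benefit = monthly_saving * horizon_months - fix_cost
    # Payback is undefined if the fix saves nothing.
    payback_months = fix_cost / monthly_saving if monthly_saving > 0 else None
    return net_benefit, payback_months


# Hypothetical figures: 4 incidents a month at 250 each, fix costs 6000.
net_benefit, payback_months = fix_payback(4, 250.0, 6000.0)
print(net_benefit, payback_months)
```

If the net benefit over your chosen horizon is negative, or the payback period is longer than the expected life of the service, that is usually a sign the problem belongs in the known error list for now.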

How do I go beyond these first steps? – Initially you will find that you are working 99% of the time reactively, and this is the way Problem Management starts in most organisations. After a period of time you will also be seeking to prevent problems from occurring in the first place, and then you will be starting to work proactively. You will begin to hone different skills, such as trend analysis and risk management. Unfortunately, you will never find yourself working 100% proactively. Why? Because, try as we might, things still break and people still do silly things. The best you will probably achieve is an 80% proactive and 20% reactive workload.

How do I know if I am doing a good job? – Firstly, you should have baselined the situation before Problem Management was formally established, to document a starting position. This may have been done through a formal baseline assessment and gap analysis by an internal or external consultant. You should also have established some Critical Success Factors (CSFs) and some Key Performance Indicators (KPIs) for Problem Management, focusing not just on the process but on the whole Problem Management functional capability. By monitoring and reporting against these indicators you will be able to justify on-going expenditure on Problem Management.

I hope these pointers have been of use. If you need any further help with Problem Management or any other aspects of Service Management, call out for assistance. We may charge, but you’ll be surprised by how quickly we can set you on the right path.

Also, if you are looking for training to hone your own skills as a Problem Management practitioner, there is a 3-day BCS Problem Management Specialist course which will give you these skills.

Regards, Steve Lawless