Are your SLA and other performance reports really reflecting the correct picture?
There was an IT service contract manager whose job it was to review the performance of the “managed outsourced service provider” and certify that the SLA was met as claimed. For a few years, month on month, the vendor had been claiming 100% compliance with SLA targets and taking home a bonus of 1.5 million rupees. The management of the enterprise often questioned our friend the service contract manager about why bonuses were being paid when the service provided was nowhere near the quality that was expected.
In yet another engagement, I came across a complex service level agreement with three levels of targets: 80% of the tickets to be resolved within four hours, 90% within eight hours and 100% within 16 hours. This was for an A-grade service location, basically the head office and other key sites. The B-grade and C-grade locations had their own SLAs. The service provider in this case was, as usual, claiming 100% compliance with the SLA across the board and on all targets.
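A three-level target like this can be checked mechanically once you have the resolution time of each ticket. The following is only a minimal sketch: the `Tier` class, the function name and the sample resolution times are all made up for illustration, and it assumes that "within four hours" includes a ticket resolved in exactly four hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_hours: float     # resolution-time limit for this tier
    required_pct: float  # share of tickets that must finish within the limit


# The A-grade targets quoted in the text: 80% in 4h, 90% in 8h, 100% in 16h.
A_GRADE_TIERS = [Tier(4, 80.0), Tier(8, 90.0), Tier(16, 100.0)]


def tier_compliance(resolution_hours, tiers=A_GRADE_TIERS):
    """Return a list of (tier, achieved_pct, met) tuples, one per tier."""
    total = len(resolution_hours)
    if total == 0:
        raise ValueError("no tickets to evaluate")
    results = []
    for tier in tiers:
        within = sum(1 for h in resolution_hours if h <= tier.max_hours)
        achieved = 100.0 * within / total
        results.append((tier, achieved, achieved >= tier.required_pct))
    return results


# Hypothetical sample: ten tickets with invented resolution times in hours.
sample = [1, 2, 3, 3.5, 5, 6, 7, 9, 12, 20]
results = tier_compliance(sample)
for tier, achieved, met in results:
    status = "met" if met else "missed"
    print(f"<= {tier.max_hours}h: {achieved:.0f}% "
          f"(target {tier.required_pct:.0f}%) {status}")
```

Each tier is evaluated against the same full set of tickets, so a single ticket that takes longer than 16 hours is enough to miss the 100% target.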
This kind of problem seems to be quite common, whether the services are outsourced or insourced. A lot of effort (meeting after meeting) is spent in fine-tuning these agreements and in reviewing them. Much can be said about the focus on SLAs versus the focus on improvement-based objectives, but that is grist for another mill.
The fact remains that in a lot of the cases I have seen, the SLA reporting is far from reflecting reality, and various stains on the window hide the truth. If you really want to see why your end users are still groaning and moaning despite 100% SLA compliance, you need to clean the window and let in the light.
Everyone knows that SLA compliance is a ratio made up of two numbers: the number of tickets that have met the SLA target and the total number of tickets. If you want to stop deluding yourself, you had better start cleaning the glass and see what exactly you count as the total number of tickets and which tickets you count as having met the target. Self-delusion is rampant in both areas.
I would like to document three of the more popular forms of self-delusion:
1. Not differentiating between apples, oranges, peaches, grapes, etc.
This delusion is likely to exist where management reports club all services into one bucket.
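To see how one bucket can hide a poorly performing service, here is a small sketch that computes compliance overall and then per service. The service names and ticket counts are invented purely for illustration; they are not taken from any of the engagements described here.

```python
from collections import defaultdict

# Hypothetical tickets as (service, met_sla) pairs. A high-volume, easy
# service dominates the bucket and masks a struggling one.
tickets = (
    [("password reset", True)] * 90
    + [("email", True)] * 6
    + [("email", False)] * 4
)


def compliance_by_service(tickets):
    """Return {service: compliance_pct} computed separately per service."""
    counts = defaultdict(lambda: [0, 0])  # service -> [met, total]
    for service, met in tickets:
        counts[service][1] += 1
        if met:
            counts[service][0] += 1
    return {s: 100.0 * met / total for s, (met, total) in counts.items()}


# Everything in one bucket versus one figure per service.
overall = 100.0 * sum(1 for _, met in tickets if met) / len(tickets)
per_service = compliance_by_service(tickets)

print(f"overall: {overall:.0f}%")
for service, pct in per_service.items():
    print(f"{service}: {pct:.0f}%")
```

With these made-up numbers the single bucket reports 96%, while the email service on its own is at 60%.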
2. No segregation between dropped calls, no-action-taken calls, end-user-unavailable calls and actual calls where incidents were resolved and services restored.
This is a subtle delusion.
What it means in practice is that the service desk and support groups want to record and account for all calls that are received. Some calls are dropped because of phone problems, some calls result in no action being taken because there is actually no problem, some calls fix themselves magically, and so on, and these tickets are also counted as resolved within the SLA targets.
In the above real-life analysis we see that ‘no fault’ is the third-highest cause of incidents that are resolved in this particular service. This instance illustrates how a significant number of tickets where no action is taken can skew the SLA compliance figures.
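The skew is easy to reproduce. The sketch below computes compliance twice over the same tickets, once counting everything and once leaving out tickets where no work was actually done. The closure codes and counts are hypothetical; a real ITSM tool will have its own names for them.

```python
# Hypothetical closure codes for tickets that needed no real resolution work.
NO_WORK_CODES = {"dropped call", "no fault", "user unavailable", "self-resolved"}


def compliance(tickets, exclude_no_work=False):
    """tickets: iterable of (closure_code, met_sla) pairs. Returns a percentage."""
    tickets = list(tickets)
    if exclude_no_work:
        tickets = [t for t in tickets if t[0] not in NO_WORK_CODES]
    if not tickets:
        raise ValueError("no tickets left to evaluate")
    met = sum(1 for _, ok in tickets if ok)
    return 100.0 * met / len(tickets)


# Invented sample: 40 no-work tickets that trivially "meet" the SLA,
# plus 60 real incidents of which 18 missed the target.
sample = (
    [("no fault", True)] * 30
    + [("dropped call", True)] * 10
    + [("resolved", True)] * 42
    + [("resolved", False)] * 18
)

all_calls = compliance(sample)
real_incidents = compliance(sample, exclude_no_work=True)
print(f"all calls: {all_calls:.0f}%, real incidents only: {real_incidents:.0f}%")
```

In this invented sample the no-work tickets lift the reported figure from 70% to 82%, because they inflate both the numerator and the denominator.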
3. Allowing the service desk and support groups to pause the SLA clock on a ticket for whatever reason.
There are many compelling reasons for allowing tickets to be placed on hold (stopping the SLA clock), primary amongst them being ‘user not available’ or ‘ticket transferred to a vendor’. I have even seen ‘break fix’ given as a reason.

Fig. 1: SLA compliance, Sep '08
Fig. 2: SLA compliance, Apr '09
The impact of this is best explained by a case study. Figure 1 illustrates the situation when the SLA was first calculated from the ITSM tool: the pale green is the required target and the others indicate actual achievement against the three targets. It is quite evident that the service provider was unable to meet the targets as of September '08.
As can be imagined, there was much consternation. The service provider went into shock and vowed to improve the statistics and to take corrective action to improve service level agreement compliance. Figure 2 illustrates the improved results after six months of improvement effort. There was much thumping of backs, and there were awards for this demonstrated improvement.
Smelling a rat, I decided to take a deeper look. Fortunately, the service desk management tool used by the customer gave me the capability to clean the window and let the light in.

I computed the trend above, et voilà: here is the secret behind the vendor's success. While resolution time in minutes showed a declining trend, the hold time showed an increasing trend. In fact, the two curves move in opposite directions: as one goes down, the other goes up. I invite you to come to your own conclusions.
In conclusion, if you really want to use your performance data as a measure of how well you are doing, you had better look at the way you are collecting the data and the way you compute compliance.
More often than not, if you come out smelling of roses all the time, there is a good chance that your windows are dirty.
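The effect of a pausable clock can be sketched in a few lines. The `Ticket` class, its field names and the sample values below are assumptions made for illustration; they do not come from the tool used in this case study. The idea is simply to compute compliance twice, once on the reported time (elapsed minus hold) and once on the wall-clock time the user actually waited.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    elapsed_min: float  # wall-clock minutes from open to resolve
    hold_min: float     # minutes during which the SLA clock was paused

    @property
    def reported_min(self):
        """Resolution time as reported when hold time is deducted."""
        return self.elapsed_min - self.hold_min


def pct_within(tickets, limit_min, deduct_hold):
    """Percentage of tickets resolved within limit_min minutes."""
    times = [t.reported_min if deduct_hold else t.elapsed_min for t in tickets]
    return 100.0 * sum(1 for m in times if m <= limit_min) / len(times)


# Invented sample: every ticket reports 200 minutes, but hold time grows.
sample = [Ticket(200, 0), Ticket(300, 100), Ticket(500, 300), Ticket(900, 700)]
limit = 240  # a four-hour target expressed in minutes

reported = pct_within(sample, limit, deduct_hold=True)
experienced = pct_within(sample, limit, deduct_hold=False)
print(f"reported: {reported:.0f}%, experienced by users: {experienced:.0f}%")
```

With these made-up tickets the report shows 100% while only 25% were actually resolved within four hours of being opened, which is the same pattern as falling resolution time alongside rising hold time.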
Have fun window cleaning!
Great article. I guess we should not allow hold times to be deducted from resolution times.
Tickets are generally put on hold for the reasons mentioned; a few common ones are:
Third Party Dependence
User Not Available/Busy
Awaiting Approval etc
Instead of stopping the clock, let the clock tick and let's deduct the average past hold time.
Tickets with very high hold times can be treated as outliers, and we will have to do an RCA. A Problem Record MUST be created for such incidents to find ways to reduce hold times.
After all, we should focus on improving business outcomes and not try to see if we are meeting SLAs by hook or by crook.
Thanks for sharing such a wonderful article.
Regards,
Chandresh(at)QLogy.com
The number of methods of (self-)delusion in SLA is large - possibly even limitless. As I usually start out saying in my SLA classes: If you want to reduce the number of support calls, you have two ways to achieve that goal:
- Educate your users, and provide easy to use "self help"
- Give lousy support, so they stop calling
You would be surprised how often it seems that organisations have chosen the second approach. Perhaps because it is cheaper?
- Rolf Frydenberg, rolff@joymount.no
Nice column with some very good points.
I might, however, add a point. I see a great many SLAs that focus on what happens when things go wrong, when in fact the customer is much more interested in things not going wrong.
In my experience very few customers go to the trouble of defining their Service Level Requirements, or the consequences when they aren't met. Instead they answer a set of questions provided by the service provider and oriented around what the provider wants or is capable of delivering.
SLAs should document the common understanding of the value of the service to the customer and the capabilities of the service provider. Implicit in this should be the pricing implications of meeting the requirements (higher levels of performance and availability of course cost more, but when matched against the costs of failure, they may be easily justified).
Lies, damned lies, and SLA reports, to paraphrase Disraeli. I think the moral of this tale is to understand what is being measured and how the data could be skewed by behaviour. Voice of the customer should be used to validate what you think you are seeing in the reports, and if it doesn't, you need to get the window cleaners in to let you see more clearly.
Great article Mr. Sukumar.
In many IT organizations, help desk managers and staff have become more efficient at the things they’ve been traditionally measured on, such as closing out more calls, faster. But those metrics aren’t the important ones for executives to focus on.
We may be closing out more tickets, but we should know how many of those incidents could have been prevented in the first place, and how many were caught and resolved by IT before the business was affected.
When an event is determined to be worthy of immediate attention based on business impact, an intelligent ticket should be opened on the service desk which includes not only the priority of the ticket but also the technical and business context.
We would like to hear more from you on the same topic...
Fairooz