I tweeted about this and it seems to have surprised some people, so I decided to explain. Let us first have a look at what the books say, as nobody actually reads the books. According to the latest ITIL definition, an Incident is: An unplanned interruption to an IT Service or reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet affected Service is also an Incident. For example Failure of one disk from a mirror set.
Let’s have a look at what would happen if incident management were a Service Desk process.
When James sees that the mirror disk of a critical system at Brandon & Lane Overseas Bank (BLOB) has failed, he rushes to replace it. Then he stops, as he remembers the strict order from the BLOB IT Director, Mr. Mann, that all personnel must follow the ITIL standard procedures. Therefore he logs an incident and sits down to wait for further orders. Meanwhile at the Service Desk, Chris and Rhett are quite busy arguing about the correct pronunciation of the word apophenia, and they do not notice James’s incident. Then the other half of the mirrored disk fails too, and suddenly Chris and Rhett are overwhelmed with incoming calls. James tries to call them to report an urgent incident but fails to reach them...
Something like this happened to Tieto in Sweden recently. They had a really major outage lasting for a week, which was caused by both sides of a mirrored disk system failing. They said that both sides failed at the same time, but my suspicion is that the first side failed earlier and they were still processing it through the event-incident-problem-change-release chain of processes.
Let’s see what happens at BLOB when Mr. Mann figures out what happened.
Mr. Mann issues a new order to Operations: from now on, Ops must open and handle their own incidents. At first poor James is overwhelmed with work as he keeps logging and closing incidents, but then he solves the problem by writing a script that automatically opens and closes incidents based on the events coming from the monitoring systems, and James can relax and go back to reading his Poodle magazines. Chris and Rhett are also very happy: their bonus goals were based on the number of incidents closed within SLA targets, and thanks to James’s clever script they get their maximum bonuses.
As we can see, there is no real connection between hardware and system failure warnings and the Service Desk. It is a bit as if an airplane captain called the stewardess and said “Dear Sharon, it looks like number three engine is overheating so is it ok if I turn it off for a while?” and she answered “Wait a sec Howard, I’ll just go round the 1st class with Cognac and get back to you”. It should be clear that Operations handle their systems and hardware without involving the Service Desk.
The Service Desk is then left with the Request Fulfillment process. According to ITIL a request is: A request from a User for information, or advice, or for a Standard Change or for Access to an IT Service. For example to reset a password, or to provide standard IT Services for a new User.
For example, a user who complains that “this f**king system does not work” is only requesting advice on how to start the system. If 15,000 users call the Service Desk after the release of a new version of the ERP system and complain that they cannot use the system because it is so difficult or the instructions are unclear, these are not 15,000 incidents but 15,000 requests for information or advice.
This is very simple and clear good practice according to ITIL.
Now I would call it malpractice but who cares.
Liked the post....reminded me of a rant of mine some time ago; while I don't think we can really do without Incident Management I do think that Event Management should play a much more prominent role, particularly in today's virtualized service infrastructure environments...
anyway, here's the rant....
Let's Kill Incident Management...
OK, it's a Friday and I'm tired. We've had two bomb scares, the Dow dropped like a stone and I'm just friggin' cranky. So I've decided to kill the Incident Management process.
It's one process I've always hated anyway…. users are pissed, zombie-like IT staff take endless calls --- many of them for the same Incident --- and we never seem to really know what's going on anyway. Log it, assign it, and re-assign it sometime later… let's just put this bastard out of its misery.
We're goin' to a virtual cloud environment anyway, and since we'll be provisioning new services faster than sh*t outta' a loose goose we'd better get collaboratin' in real time. Generating tickets after users call is a sure road to hell in the new world that's already upon us...
There are still people out there who insist that unless we can 'create Incidents' then "you will not be a (XYC) company standard". Why? I like traceability as much as the next guy, but at what cost? Your arm's bleeding like crazy! Don't you want a tourniquet?
Get me some DevOps dudes and get them quick!
We'll provide them with service monitoring intelligence that will enable them to establish truly collaborative management. We won't open tickets, we'll just assign events to the right person; and in many cases before the service is impacted (so screw the damned Incident!).
We'll report on real time threats, service impacts, capacity trends and bring processes like Capacity, Availability and Event Management front and center (where they belong in the new world order).
I don't need a room full of people creating, reporting and shuffling tickets (Incident tickets anyway). I need people who can understand utilization trends, early warnings and take immediate action.
If users want to open tickets, we'll open a laundry.
Let's kill Incident Management.
Yes.
This area needs to be clarified. Customers' problems ("user incidents") and system failures ("real incidents") should be handled with different processes.
I agree. The Incident Process could be initiated by any IT personnel, not only by Service Desk agents.
I think James should have logged an incident and assigned it to himself to fix the problem.
Cheers,
Carlos
When did any process belong to a particular function? Many incidents may get logged with the Service Desk as the first point of call by end users but I do not recall anyone saying it was a Service Desk process. That is like saying that the Change Management process belongs to a particular function such as Application Management.
In the suggested scenario James would have raised an Incident and instigated an emergency change - most likely identified as a standard change and therefore pre-approved - and just got on with the task in hand. The Service Desk could continue to discuss and hopefully find the meaning of apophenia.
I have disagreements over this all the time. We encourage support teams and our suppliers to log their own Problems and work to find a workaround, or to fix immediately if they are high priority. They create a Problem record and advise the Service Desk of the open record. If an associated Incident is reported, the Service Desk staff respond by linking it to the open Problem record and advising on the workaround or progress to Fix (through Change, of course). As we measure performance on Incident Resolution, the support teams and, more importantly, our suppliers are encouraged to use proactive Problem Management, as they get a head start to find the workaround or Fix before an Incident is spotted and reported. Hence maintaining uptime and reducing Resolution time.
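The linking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the record fields and the idea of matching an Incident to an open Problem by its affected Configuration Item are all hypothetical, not something taken from ITIL or from any particular service desk tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProblemRecord:
    """A Problem opened proactively by a support team or supplier (hypothetical shape)."""
    problem_id: str
    affected_ci: str                      # Configuration Item the Problem concerns
    summary: str
    workaround: Optional[str] = None      # filled in once the team has found one
    linked_incidents: List[str] = field(default_factory=list)


@dataclass
class Incident:
    """An Incident reported to the Service Desk (hypothetical shape)."""
    incident_id: str
    affected_ci: str
    description: str
    linked_problem: Optional[str] = None


class ServiceDesk:
    """Keeps track of the open Problems it has been advised of."""

    def __init__(self) -> None:
        # Assumption: at most one open Problem per Configuration Item.
        self.open_problems: Dict[str, ProblemRecord] = {}

    def advise_of_problem(self, problem: ProblemRecord) -> None:
        """Called by the support team after creating the Problem record."""
        self.open_problems[problem.affected_ci] = problem

    def report_incident(self, incident: Incident) -> str:
        """Link the Incident to an open Problem if one exists and return the advice."""
        problem = self.open_problems.get(incident.affected_ci)
        if problem is None:
            return "No known Problem; handle as a new Incident."
        incident.linked_problem = problem.problem_id
        problem.linked_incidents.append(incident.incident_id)
        if problem.workaround:
            return f"Linked to {problem.problem_id}. Workaround: {problem.workaround}"
        return f"Linked to {problem.problem_id}. Fix in progress."


desk = ServiceDesk()
disk_problem = ProblemRecord(
    "PRB-1", "db-server-01", "One disk of the mirror set failed",
    workaround="Use the standby node",
)
desk.advise_of_problem(disk_problem)
advice = desk.report_incident(Incident("INC-1", "db-server-01", "Database is slow"))
print(advice)
```

The point of the sketch is the head start: the Problem record and its workaround exist before the first Incident arrives, so the Service Desk only has to link and advise.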