
Saturday, September 2, 2017

Implementing ITIL v3 Framework in a Small Network Operations Center

Adopting a best-practice framework is crucial to improving the quality of any IT service delivered to a customer. Several well-known frameworks are designed to meet business requirements.

You can choose the one that suits your organization and adjust or customize it as needed.

The framework we chose to implement is ITIL (Information Technology Infrastructure Library), which was developed in the United Kingdom in the late 1980s. It is currently at version 3 and is used by many companies around the world. I have customized this version of ITIL to fit the NOC where I work, and we are currently adapting to the new framework. If you also work in a small NOC, you should be able to implement ITIL in your workplace after reading this.

The NOC where I work is a small technical team of about 15 engineers, including the Team Leader. We have the Help Desk function and the L1 / L2 support functions. We provide onsite support as a third-party contractor to the national airline at an international airport. Most of our work falls within two of the five ITIL stages: Service Transition and Service Operation. Each stage groups the processes we should follow.

The five stages of ITIL are as follows:

(1) Service Strategy
(2) Service Design
(3) Service Transition
(4) Service Operation
(5) Continual Service Improvement

Service Transition is the implementation stage, while Service Operation is the monitoring and support stage. First, let's look at the original framework.
The processes of the Service Transition stage are shown below.

[Figure: Processes of the ITIL Service Transition stage]

The objective of ITIL Service Transition is to build and deploy IT services. The Service Transition lifecycle stage also makes sure that changes to services and service management processes are carried out in a coordinated way.

The processes of the Service Operation stage are shown below.

[Figure: Processes of the ITIL Service Operation stage]

The objective of ITIL Service Operation is to make sure that IT services are delivered effectively and efficiently. The Service Operation lifecycle stage includes fulfilling user requests, resolving service failures, fixing problems, and carrying out routine operational tasks.

Because Request Fulfillment is handled by the customer's Service Desk in our environment, we can leave it out. The most important aspect of ITIL, however, is the assignment of responsibilities to individuals. Every process needs a process owner to ensure that its activities are carried out smoothly. Many people can be made responsible for a process, but only one person should be accountable for it.

Steps for Implementing ITIL

(1) Study the work currently being done by the employees and identify the current procedures.
(2) Study the ITIL framework.
(3) Decide which ITIL stages the organization / team operates in.
(4) Define the processes, with any necessary adjustments.
(5) Assign the manager roles to selected employees.

Because we are a small team, I merged some roles in the Service Transition stage with some roles in the Service Operation stage, so that each person can cover more work while operating in both stages.

So I created six designations (manager roles) that are accountable for carrying out the above processes.

IT Operations Manager (Team Lead)
Operational Stage: Service Transition, Service Operation
Associated Functions: IT Operations Control, Technical Management
Accountable Processes: Change Evaluation, Release & Deployment Management, Knowledge Management
Databases Maintained: n/a
Responsibilities: This guy is accountable for daily work, new implementations, project coordination, knowledge sharing, and supervising the other manager roles.

 - Represent the NOC and lead the team to meet business requirements
 - Evaluate major changes raised by the Change Manager
 - Lead the technical team in implementations
 - Share technical knowledge with other team members
 - Plan & implement best practices
 - Give technical solutions for daily technical matters
 - Verify all documents coming through all other processes
 - Execute root cause analysis for deployment issues
 - Follow up with L2 support on deployment issues
 - Follow up with L3 support on deployment issues
 - Track the life cycle of deployment issues
 - Create & maintain the RACI Matrix
 - Design the SKMS
 - Create the roster

Incident Manager
Operational Stage: Service Operation
Accountable Processes: Incident Management, Problem Management
Databases Maintained: IRDB, KEDB
Responsibilities: This person is accountable for handling incidents & problems.

 - Log issues in the IRDB
 - Liaise with the Service Desk
 - Handle the customer's employees
 - Carry out normal changes to the network
 - Escalate issues to the relevant parties
 - Create & maintain the Escalation Matrix
 - Analyze and diagnose problems (recurring issues)
 - Execute root cause analysis for problems
 - Update the KEDB with workarounds for problems
 - Follow up with L2 support on problems
 - Follow up with L3 support on problems
 - Track the life cycle of problems
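One way to spot the recurring issues mentioned above is to count how often the same symptom shows up on the same device in the IRDB. The Python sketch below assumes a hypothetical, minimal IRDB record (`IncidentRecord`) and illustrative values for the look-back window and threshold. It is not part of ITIL itself, just a starting point for deciding which incidents deserve a problem record.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    """Hypothetical, minimal IRDB entry; a real record would hold more fields."""
    incident_id: str
    device: str    # CI name as recorded in the CMDB (assumed)
    symptom: str
    opened: datetime


def recurring_issues(records, now, window=timedelta(days=30), threshold=3):
    """Return (device, symptom) pairs logged at least `threshold` times within `window`.

    Each pair is a candidate problem for the Incident Manager to analyze.
    The default window and threshold are illustrative, not ITIL rules.
    """
    recent = [r for r in records if timedelta(0) <= now - r.opened <= window]
    counts = Counter((r.device, r.symptom) for r in recent)
    return sorted(key for key, n in counts.items() if n >= threshold)


# Example with made-up incidents
now = datetime(2017, 9, 2)
records = [
    IncidentRecord("INC-1", "core-sw-01", "port flapping", now - timedelta(days=2)),
    IncidentRecord("INC-2", "core-sw-01", "port flapping", now - timedelta(days=9)),
    IncidentRecord("INC-3", "core-sw-01", "port flapping", now - timedelta(days=20)),
    IncidentRecord("INC-4", "wlc-02", "AP down", now - timedelta(days=1)),
    IncidentRecord("INC-5", "core-sw-01", "port flapping", now - timedelta(days=45)),
]
print(recurring_issues(records, now))  # [('core-sw-01', 'port flapping')]
```

INC-5 falls outside the 30-day window, so the switch still crosses the threshold only because of the three recent occurrences.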

Event Manager
Operational Stage: Service Operation
Accountable Processes: Event Management
Databases Maintained: AEDB
Responsibilities: This guy is accountable for the proactive monitoring of the network.

 - Maintain monitoring tools
 - Log alerts & events in the AEDB
 - Inform the NOC about issues that need attention
 - Execute root cause analysis for alerts / events
 - Follow up with L2 support on alerts / events
 - Follow up with L3 support on alerts / events
 - Track the life cycle of alerts / events
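ITIL groups events into three types: informational, warning and exception. The sketch below shows how the Event Manager's "log every event, tell the NOC what needs attention" routine could be automated. It assumes the monitoring tools report RFC 5424 syslog severities (0 = Emergency to 7 = Debug); the cut-off points between the types and the action strings are assumptions, and the AEDB is stood in for by a plain Python list.

```python
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify(syslog_severity: int) -> EventType:
    """Map an RFC 5424 syslog severity to an ITIL event type (cut-offs are assumed)."""
    if not 0 <= syslog_severity <= 7:
        raise ValueError(f"invalid syslog severity: {syslog_severity}")
    if syslog_severity <= 3:        # Emergency, Alert, Critical, Error
        return EventType.EXCEPTION
    if syslog_severity == 4:        # Warning
        return EventType.WARNING
    return EventType.INFORMATIONAL  # Notice, Informational, Debug


def handle_event(aedb: list, device: str, message: str, syslog_severity: int) -> str:
    """Log every event in the AEDB (here just a list) and return the follow-up action."""
    event_type = classify(syslog_severity)
    aedb.append({"device": device, "message": message, "type": event_type.value})
    if event_type is EventType.EXCEPTION:
        return "inform NOC: raise an incident"
    if event_type is EventType.WARNING:
        return "inform NOC: watch and investigate"
    return "log only"
```

Every event is recorded regardless of type, which keeps the AEDB complete for later root cause analysis; only the follow-up action depends on the classification.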

Change Manager
Operational Stage: Service Transition
Accountable Processes: Change Management, Transition Planning & Support
Databases Maintained: n/a
Responsibilities: This guy is accountable for the changes made to the network.

 - Create RFCs/CRs
 - Plan maintenance windows
 - Create change schedules
 - Define & communicate with the CAB (Change Advisory Board)
 - Categorize changes (Standard, Normal, Emergency)
 - Create emergency change plans
 - Execute root cause analysis for change issues
 - Follow up with L2 support on change issues
 - Follow up with L3 support on change issues
 - Track the life cycle of change issues
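ITIL distinguishes three change types: standard changes are pre-authorized, low risk and follow a documented procedure; emergency changes must be implemented as soon as possible, for example to resolve a major incident; everything else is a normal change that is assessed and authorized through the CAB. A minimal sketch of that decision, assuming a hypothetical `ChangeRequest` with two yes/no fields, could look like this:

```python
from dataclasses import dataclass
from enum import Enum


class ChangeCategory(Enum):
    STANDARD = "Standard"    # pre-authorized, low risk, documented procedure
    NORMAL = "Normal"        # assessed and authorized through the CAB
    EMERGENCY = "Emergency"  # must be implemented as soon as possible


@dataclass
class ChangeRequest:
    """Hypothetical, simplified RFC; the field names are assumptions."""
    summary: str
    restores_failed_service: bool  # urgently needed to fix an outage / major incident
    pre_approved_template: bool    # matches a procedure already pre-authorized


def categorize(rfc: ChangeRequest) -> ChangeCategory:
    # Urgency wins: an outage fix is an emergency change even if a template exists.
    if rfc.restores_failed_service:
        return ChangeCategory.EMERGENCY
    if rfc.pre_approved_template:
        return ChangeCategory.STANDARD
    return ChangeCategory.NORMAL


def needs_cab_review(category: ChangeCategory) -> bool:
    # Standard changes skip the CAB; emergency changes are usually reviewed
    # by an Emergency CAB (ECAB), normal changes by the regular CAB.
    return category is not ChangeCategory.STANDARD
```

For example, rebooting an access point under an approved procedure would come out as Standard, while replacing a failed core switch during an outage would come out as Emergency.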

Test Manager
Operational Stage: Service Transition
Accountable Processes: Service Validation & Testing
Databases Maintained: n/a
Responsibilities: This guy is accountable for the resilience of the network.

 - Prepare test cases
 - Perform tests
 - Produce test reports
 - Carry out failover tests
 - Prepare user acceptance testing
 - Execute root cause analysis for test failure issues
 - Follow up with L2 support on test failure issues
 - Follow up with L3 support on test failure issues
 - Track the life cycle of test failure issues

Configuration Manager
Operational Stage: Service Transition
Accountable Processes: Service Asset & Configuration Management
Databases Maintained: CMDB
Responsibilities: This guy is accountable for everything related to network devices.

 - Keep the inventory (CMDB) up to date
 - Deal with RMAs
 - Carry out audits
 - Manage software & hardware licenses
 - Back up configurations
 - Maintain network diagrams
 - Execute root cause analysis for RMA / VAPT / ISO issues
 - Follow up with L2 support on RMA / VAPT / ISO issues
 - Follow up with L3 support on RMA / VAPT / ISO issues
 - Track the life cycle of RMA / VAPT / ISO issues

These are all the IT service management roles we have.
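A quick way to check the "only one accountable person per process" rule against the role list is to invert it into a process-to-owner lookup. The Python sketch below uses the accountable processes exactly as listed for the six roles; the helper name `process_owners` is hypothetical, and it simply raises an error if a process is claimed by two roles.

```python
# Accountable processes per role, as listed in the role descriptions.
ROLE_PROCESSES = {
    "IT Operations Manager": [
        "Change Evaluation",
        "Release & Deployment Management",
        "Knowledge Management",
    ],
    "Incident Manager": ["Incident Management", "Problem Management"],
    "Event Manager": ["Event Management"],
    "Change Manager": ["Change Management", "Transition Planning & Support"],
    "Test Manager": ["Service Validation & Testing"],
    "Configuration Manager": ["Service Asset & Configuration Management"],
}


def process_owners(role_processes: dict) -> dict:
    """Invert role -> processes into process -> accountable role.

    Raises ValueError if a process would have more than one accountable role.
    """
    owners = {}
    for role, processes in role_processes.items():
        for process in processes:
            if process in owners:
                raise ValueError(
                    f"{process!r} is accountable to both {owners[process]!r} and {role!r}"
                )
            owners[process] = role
    return owners


OWNERS = process_owners(ROLE_PROCESSES)
print(OWNERS["Problem Management"])  # Incident Manager
```

The resulting lookup also answers the everyday question of who to address when an issue comes up in a given process.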
The database components of the SKMS (Service Knowledge Management System) are as follows:


AEDB - Alerts & Events Database

IRDB - Incident Records Database

KEDB - Known Errors Database

CMDB - Configuration Management Database
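The post does not say how these four databases are stored. As one possibility, here is a sketch that keeps them as tables in a single SQLite file using Python's built-in `sqlite3` module, with events and incidents pointing at configuration items in the CMDB and incidents optionally linked to an alert and to a known error. The table names, column names and sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cmdb (            -- Configuration Management Database
    ci_name   TEXT PRIMARY KEY,
    ci_type   TEXT NOT NULL,
    location  TEXT
);
CREATE TABLE aedb (            -- Alerts & Events Database
    event_id  INTEGER PRIMARY KEY,
    ci_name   TEXT NOT NULL REFERENCES cmdb(ci_name),
    message   TEXT NOT NULL,
    raised_at TEXT NOT NULL
);
CREATE TABLE kedb (            -- Known Errors Database
    known_error_id INTEGER PRIMARY KEY,
    description    TEXT NOT NULL,
    workaround     TEXT
);
CREATE TABLE irdb (            -- Incident Records Database
    incident_id    INTEGER PRIMARY KEY,
    ci_name        TEXT NOT NULL REFERENCES cmdb(ci_name),
    event_id       INTEGER REFERENCES aedb(event_id),
    known_error_id INTEGER REFERENCES kedb(known_error_id),
    summary        TEXT NOT NULL
);
"""


def open_skms(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


# Example with made-up records: look up the workaround for an incident
conn = open_skms()
conn.execute("INSERT INTO cmdb VALUES ('core-sw-01', 'switch', 'Terminal 1')")
conn.execute("INSERT INTO aedb VALUES (1, 'core-sw-01', 'Gi1/0/1 down', '2017-09-02T08:00')")
conn.execute("INSERT INTO kedb VALUES (1, 'SFP overheats', 'Reseat the SFP')")
conn.execute("INSERT INTO irdb VALUES (1, 'core-sw-01', 1, 1, 'Uplink down')")
row = conn.execute(
    "SELECT k.workaround FROM irdb i JOIN kedb k USING (known_error_id) "
    "WHERE i.incident_id = 1"
).fetchone()
print(row[0])  # Reseat the SFP
```

With foreign keys switched on, an incident or alert cannot be logged against a device that is missing from the CMDB, which nudges the team to keep the inventory up to date.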


Communication Protocol within the Team

All written communication is carried out via email. For every issue raised through their processes, managers send mail addressed directly to the IT Operations Manager and to all the managers who are directly accountable for that issue. In addition, all team members (not only managers) should be copied. Because we have only about 15 members in total, it is fine to include everyone so that everyone has an idea of the issue.
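This addressing rule can be expressed as a small helper. It is a sketch only: it assumes each person is identified by an email address (the addresses in the example are made up), puts the IT Operations Manager and the accountable managers in To, and copies everyone else on the team. It uses the standard library's `email.message.EmailMessage` to build the message but does not send it.

```python
from email.message import EmailMessage


def mail_recipients(ops_manager, accountable, team):
    """Split addresses into (to, cc) following the team's email convention."""
    # To: IT Operations Manager + accountable managers, duplicates removed, order kept
    to = list(dict.fromkeys([ops_manager, *accountable]))
    # Cc: every other team member, managers or not
    cc = [member for member in team if member not in to]
    return to, cc


def build_issue_mail(sender, subject, body, to, cc):
    """Build (but do not send) the notification email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Example with made-up addresses
team = [
    "lead@noc.example.com",
    "incident.mgr@noc.example.com",
    "event.mgr@noc.example.com",
    "engineer1@noc.example.com",
    "engineer2@noc.example.com",
]
to, cc = mail_recipients("lead@noc.example.com", ["incident.mgr@noc.example.com"], team)
msg = build_issue_mail("incident.mgr@noc.example.com", "Core switch outage", "Details ...", to, cc)
```

Excluding the To recipients from Cc just avoids sending the same person two copies; everyone on the team still receives the mail.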

RACI Matrix

This document is created by the IT Operations Manager and defines which groups and roles are responsible, accountable, consulted, or informed for each defined activity.

The matrix uses the following format:

R for Responsible: These are the people who execute the work.
A for Accountable: This is the person who is ultimately in charge of the results / outcome, usually an executive.
C for Consulted: These are people in related fields with whom we keep two-way communication, consulting them for problem solving and improvement.
I for Informed: These are the people who should receive one-way communication (e.g., a report).
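A RACI matrix can be kept as a table of activities against roles, which also makes it easy to check the rule stated earlier in this post: every activity has exactly one Accountable person and at least one Responsible one. In the Python sketch below, the activities and the generic "NOC Engineer" role are hypothetical examples, not our actual matrix, and each cell holds a single letter.

```python
# Activities (rows) x roles (columns); letters follow the R / A / C / I legend.
# These rows are hypothetical examples only.
RACI = {
    "Log incident in IRDB": {
        "Incident Manager": "A",
        "NOC Engineer": "R",
        "IT Operations Manager": "I",
    },
    "Approve major change": {
        "IT Operations Manager": "A",
        "Change Manager": "R",
        "Test Manager": "C",
        "NOC Engineer": "I",
    },
    "Update CMDB after RMA": {
        "Configuration Manager": "A",
        "NOC Engineer": "R",
        "Incident Manager": "I",
    },
}


def validate_raci(matrix):
    """Return rule violations: each activity needs exactly one 'A' and at least one 'R'."""
    problems = []
    for activity, assignments in matrix.items():
        letters = list(assignments.values())
        accountable = letters.count("A")
        if accountable != 1:
            problems.append(f"{activity}: expected exactly one 'A', found {accountable}")
        if "R" not in letters:
            problems.append(f"{activity}: no 'R' assigned")
    return problems


print(validate_raci(RACI))  # []
```

Running the check whenever the matrix is edited catches rows that were left without an owner or accidentally given two.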



Escalation Matrix

This document is created by the Incident Manager to define when and how issues are escalated beyond the team's operational scope. The escalation procedure is carried out by the Incident Manager and followed up by the IT Operations Manager.
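Since the Escalation Matrix is essentially a table, its "when to escalate" part can be sketched as data plus a lookup. The priorities, time limits and escalation levels below are purely illustrative assumptions; the real values come from the matrix the Incident Manager maintains and from the support agreements with the L2 / L3 parties.

```python
from datetime import timedelta

# Hypothetical escalation matrix. For each priority, the levels are listed in
# order with the time (counted from when the issue was logged) after which an
# unresolved issue must be escalated to that level.
ESCALATION_MATRIX = {
    "P1": [(timedelta(minutes=15), "L2 support"), (timedelta(hours=1), "L3 support (vendor)")],
    "P2": [(timedelta(hours=1), "L2 support"), (timedelta(hours=4), "L3 support (vendor)")],
    "P3": [(timedelta(hours=4), "L2 support"), (timedelta(days=1), "L3 support (vendor)")],
}


def escalation_target(priority, unresolved_for):
    """Return the highest level the issue should have reached, or None if it stays with the NOC."""
    target = None
    for threshold, level in ESCALATION_MATRIX[priority]:
        if unresolved_for >= threshold:
            target = level
    return target


print(escalation_target("P1", timedelta(minutes=30)))  # L2 support
```

Keeping the thresholds as data rather than code means the Incident Manager can update the matrix without touching the lookup logic.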

Framework Customization Summary:

(01) The Change Evaluation, Release & Deployment Management, and Knowledge Management processes are assigned to the IT Operations Manager role, which is held by the Team Lead Engineer. It corresponds to the senior IT Manager role in ITIL, which supervises and represents the entire team and all other manager roles. The IT Operations Control/Management and Technical Management functions are associated with this role.
(02) An 'Event Manager' role is introduced for the Event Management process, which is dedicated to proactive monitoring. In the original framework, this process is handled by the IT Operations Manager, but because a NOC has a lot of dedicated work related to this process, a separate role was created.
(03) The Incident Management & Problem Management processes are assigned to the Incident Manager.
(04) The Change Management and Transition Planning & Support processes are assigned to the Change Manager.
(05) The tasks of maintaining backups and creating backup plans are moved from the IT Operations Manager to the Configuration Manager.
