ITIL OM (Service operation inputs and output Examples of interfaces to…

- - - - It is difficult to obtain funding during the
        operational stage to fix design flaws or address
        unforeseen requirements, because this was not
        part of the original value proposition.
      - It is difficult to obtain additional funding for
        tools or actions (including training) aimed at
        improving the efficiency of service operation.
      - Some services are taken for granted, and any action
        to optimize them is perceived as ‘fixing services
        that are not broken’.
  - - - IT operations control
        This is generally staffed
        by shifts of operators who ensure that
        routine operational tasks are carried out. IT
        operations control will also provide centralized
        monitoring and control activities, usually using
        an operations bridge or network operations
        centre.
      - Facilities management
        This refers to the
        management of the physical IT environment,
        usually data centres or computer rooms. In
        many organizations technical and application
        management are co-located with IT operations
        in large data centres.
- - - - ■ Active monitoring tools that poll key CIs to
        determine their status and availability. Any
        exceptions will generate an alert that needs to
        be communicated to the appropriate tool or
        team for action.
        ■ Passive monitoring tools that detect and
        correlate operational alerts or communications
        generated by CIs.
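
The difference between the two kinds of monitoring tool can be sketched in a few lines. This is a minimal illustration, not a real monitoring product: the `Alert` type, the probe callables and the `min_count` repeat threshold are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Alert:
    ci: str        # identifier of the configuration item (hypothetical)
    message: str   # human-readable description of the condition


def active_poll(probes: Dict[str, Callable[[], bool]]) -> List[Alert]:
    """Active monitoring: ask each CI for its status and alert on exceptions."""
    alerts = []
    for ci, probe in probes.items():
        try:
            available = probe()
        except Exception as exc:
            # A probe that cannot run is treated as an exception condition too.
            alerts.append(Alert(ci, f"probe failed: {exc}"))
            continue
        if not available:
            alerts.append(Alert(ci, "CI unavailable"))
    return alerts


def passive_correlate(messages: List[Alert], min_count: int = 2) -> List[Alert]:
    """Passive monitoring: correlate alerts that the CIs generated themselves.

    Here "correlation" is simply counting identical alerts and keeping those
    that repeat at least min_count times (an assumed, simplistic rule).
    """
    counts: Dict[Tuple[str, str], int] = {}
    for alert in messages:
        key = (alert.ci, alert.message)
        counts[key] = counts.get(key, 0) + 1
    return [
        Alert(ci, f"{message} (x{n})")
        for (ci, message), n in counts.items()
        if n >= min_count
    ]
```

In the active case the tool initiates the check; in the passive case it only listens and decides which of the received alerts deserve attention.
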
      - Scope
        
        Configuration items (CIs)
        
        Environmental conditions
        
        Software licence monitoring
        
        Security
        
        Normal activity (e.g. tracking the use of an
        application or the performance of a server).
      - Policies, principles and basic concepts
        
        Event notifications should go only to those
        responsible for handling the resulting actions
        or decisions.
        
        Event management and support should be
        centralized as much as reasonably possible.
        
        All application events should utilize a common
        set of messaging and logging standards and
        protocols wherever possible
        
        Event handling actions should be automated
        wherever possible.
        
        A standard classification scheme should be in
        place that references common handling and
        escalation processes.
        
        All recognized events should be captured
        and logged. This will provide a means for
        examining incidents, problems and trends after
        events have occurred.
    - - Informational events
        ■ A scheduled workload has completed
        ■ A user has logged in to use an application
        ■ An email has reached its intended recipient.
      - Warning events
        ■ A server’s memory utilization reaches within 5%
        of its highest acceptable performance level
        ■ The completion time of a transaction is 10%
        longer than normal.
      - Exception events
        ■ A user attempts to log on to an application
        with the incorrect password
        ■ An unusual situation has occurred in a business
        process that may indicate an exception
        requiring further business investigation (e.g.
        a web page alert indicates that a payment
        authorization site is unavailable – impacting
        financial approval of business transactions)
        ■ A device’s CPU is above the acceptable
        utilization rate
        ■ A PC scan reveals the installation of
        unauthorized software.
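
The utilization examples above suggest a simple threshold rule. The sketch below is one possible reading, not a prescribed method: it assumes "within 5%" means five percentage points below the highest acceptable level, and the names `Significance`, `classify_utilization` and `warning_margin` are hypothetical.

```python
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify_utilization(
    value: float,
    max_acceptable: float,
    warning_margin: float = 5.0,  # assumed: percentage points below the limit
) -> Significance:
    """Classify a utilization reading (in percent) against its acceptable limit."""
    if value > max_acceptable:
        # Above the acceptable rate: the CI is operating abnormally.
        return Significance.EXCEPTION
    if value >= max_acceptable - warning_margin:
        # Close to the limit: worth checking before it becomes an exception.
        return Significance.WARNING
    return Significance.INFORMATIONAL
```

For a server whose highest acceptable memory utilization is 75%, a reading of 60% is informational, 71% is a warning, and 80% is an exception.
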
    - - Integration
        Integrate event management
        into all service management processes where
        feasible. This will ensure that only the events
        significant to these processes are reported.
      - Design
        Design new services with event
        management in mind.
      - Trial and error
        No matter how thoroughly
        event management is prepared, there will be
        classes of events that are not properly filtered.
        Event management must therefore include a
        formal process to evaluate the effectiveness of
        filtering.
      - Planning
        Proper planning is needed for
        the deployment of event management
        software across the entire IT infrastructure.
    - - Instrumentation
        Instrumentation is about defining and designing
        exactly how to monitor and control the IT
        infrastructure and IT services.
        ■ How will events be generated?
        ■ How will events be classified?
        ■ How will events be communicated and
        escalated?
        ■ Does the CI already have event generation
        mechanisms as a standard feature and, if so,
        which of these will be used? Are they sufficient
        or does the CI need to be customized to include
        additional mechanisms or information?
        ■ What data will be used to populate the event
        record?
        ■ Are events generated automatically or does the
        CI have to be polled?
        ■ Where will events be logged and stored?
        ■ How will supplementary data be gathered?
      - Event detection and alert mechanisms
        Router(CI) ---> |Rule set| ---> Service "n" ---> |Rule set| ---> Process sales order (Business Process)
        Thorough design of the event detection and alert
        mechanisms requires the following:
        
        Detailed knowledge of the service level
        requirements of the service being supported by
        each CI
        
        Knowledge of who is going to be supporting
        the CI
        
        Knowledge of the significance of multiple
        similar events (on the same CI or various similar
        CIs)
        
        Familiarity with incident prioritization and
        categorization codes so that if it is necessary to
        create an incident record, these codes can be
        provided
        
        Knowledge of other CIs that may be dependent
        on the affected CI, or those CIs on which
        it depends
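
Knowledge of CI dependencies, as in the router-to-business-process chain shown above, can be held as a simple mapping and walked when an event arrives. This is a minimal sketch under stated assumptions: the `SUPPORTS` map, its identifiers and the `impacted_by` helper are hypothetical and stand in for whatever a configuration management system would actually provide.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency map: each key supports the items in its list.
SUPPORTS: Dict[str, List[str]] = {
    "router-1": ["service-n"],
    "service-n": ["process-sales-order"],
}


def impacted_by(ci: str, supports: Dict[str, List[str]]) -> List[str]:
    """Return everything that depends, directly or indirectly, on the given CI."""
    impacted: List[str] = []
    queue = deque(supports.get(ci, []))
    while queue:
        item = queue.popleft()
        if item in impacted:
            continue  # already visited; also guards against cycles
        impacted.append(item)
        queue.extend(supports.get(item, []))
    return impacted
```

With this map, an event on `router-1` is known to affect `service-n` and, through it, the sales order process, which helps decide who needs to be notified.
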
    - - Event notification
        A general principle of event notification is that
        the more meaningful the data it contains and
        the more targeted the audience, the easier it is
        to make decisions about the event.
      - Event occurs
      - Event detection
        There should be a record of the event and any
        subsequent actions. The event can be logged as
        an event record in the event management tool or
        it can simply be left as an entry in the system log
        of the device or application that generated the
        event
      - First-level event correlation and filtering (CI level)
        The purpose of first-level event correlation and
        filtering is to decide whether to communicate
        the event to a management tool or to ignore it.
        During the filtering step, the first level of
        correlation is performed, i.e. the determination
        of whether the event is informational, a warning,
        or an exception (see next step)
        Significance of events
        Every organization will have its own categorization
        of the significance of an event, but it is suggested
        that at least these three broad categories be
        represented.
        
        Informational
        This refers to an event that does not require
        any action and does not represent an exception.
        Such events are typically stored in the system or service
        log files and kept for a predetermined period.
        ■ A user logs on to an application
        ■ A job in the batch queue completes successfully
        ■ A device has come online
        ■ A transaction is completed successfully.
        
        Warning
        A warning is an event that is generated when
        a service or device has reached a threshold
        that indicates a situation must be checked and
        appropriate actions taken to prevent an exception.
        ■ Memory utilization on a server is currently at
        65% and increasing. If it reaches 75%, response
        times will be unacceptably long and the OLA
        for that department will be breached.
        ■ The collision rate on a network has increased by
        15% over the past hour.
        
        Exception
        An exception means that a service or device is
        currently operating abnormally (however that has
        been defined). Typically, this means that an OLA
        and SLA have been breached and the business
        is being impacted.
        ■ A server is down
        ■ Response time of a standard transaction across
        the network has slowed to more than 15 seconds
        ■ More than 150 users have logged on to the
        general ledger application concurrently
        ■ A segment of the network is not responding to
        routine requests
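
The filtering decision described in this step can be sketched as a routing function. It is only an illustration: the dictionary-based event shape, the `significance` key and the rule that informational events stay in the local log while warnings and exceptions are forwarded are assumptions made for the example, and each organization would define its own rules.

```python
from typing import Dict, List, Tuple

# Assumed rule: only these categories are communicated to the management tool.
FORWARDED = {"warning", "exception"}

Event = Dict[str, str]


def first_level_filter(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Split events into those to forward and those left in the local log."""
    forward: List[Event] = []
    log_only: List[Event] = []
    for event in events:
        if event["significance"] in FORWARDED:
            forward.append(event)
        else:
            log_only.append(event)
    return forward, log_only
```

Every event is still kept somewhere, which matches the principle that all recognized events should be captured and logged; filtering only decides what gets communicated onward.
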
      - Second-level event correlation
        If an event is a warning, a decision has to be made
        about exactly what the significance is and what
        actions need to be taken to deal with it. It is here
        that the meaning of the event is determined.
      - Further action required?
        If the second-level correlation activity recognizes
        an event, a response will be required.
    - - Triggers
        Event management can be initiated by any type of
        change in state. The key is to define which of these
        state changes need to be acted upon.
        ■ An exception within a business process that is
        being monitored by event management
        ■ The completion of an automated task or job
        ■ A status change in a server or database CI
        ■ Access of an application or database by a user
        or automated procedure or job
        ■ Exceptions to any level of CI performance
        defined in the design specifications, OLAs or
        SOPs
        ■ Exceptions to an automated procedure
        or process, e.g. a routine change that has
        been assigned to a build team has not been
        completed in time
      - Inputs
        
        Alarms, alerts and thresholds
        
        Operational and service level requirements
        associated with events and their actions
        
        Event correlation tables, rules, event codes and
        automated response solutions that will support
        event management activities
        
        Roles and responsibilities for recognizing events
        and communicating them to those that need to
        handle them
        
        Operational procedures for recognizing,
        logging, escalating and communicating events
      - Outputs
        
        Events that have been communicated and
        escalated to those responsible for further action
        
        Event logs describing what events took place
        and any escalation and communication activities
        taken to support forensic, diagnosis or further
        CSI activities
        
        Events that indicate an incident has occurred
        
        Events that indicate the potential breach of an
        SLA or OLA objective
        
        Events and alerts that indicate completion
        status of deployment
        
        Populated SKMS with event information and
        history.
    - - CSF Detecting all changes of state that have
        significance for the management of CIs and IT
        services
        
        KPI Number and ratio of events compared
        with the number of incidents
        
        KPI Number and percentage of each type
        of event per platform or application versus
        total number of platforms and applications
        underpinning live IT services
      - CSF Ensuring all events are communicated
        to the appropriate functions that need to be
        informed or take further control actions
        
        KPI Number and percentage of events that
        required human intervention and whether
        this was performed
        
        KPI Number of incidents that occurred and
        percentage of these that were triggered
        without a corresponding event
      - CSF Providing the trigger, or entry point,
        for the execution of many service operation
        processes and operations management activities
        
        KPI Number and percentage of events that
        required human intervention and whether
        this was performed
      - CSF Providing the means to compare actual
        operating performance and behaviour against
        design standards and SLAs
        
        KPI Number and percentage of incidents
        that were resolved without impact to the
        business
        
        KPI Number and percentage of events that
        resulted in incidents or changes
        
        KPI Number and percentage of events
        caused by existing problems or known errors
        
        KPI Number and percentage of events
        indicating performance issues (for
        example, growth in the number of times
        an application exceeded its transaction
        thresholds over the past six months)
        
        KPI Number and percentage of events
        indicating potential availability issues
      - CSF Providing a basis for service assurance,
        reporting and service improvement
        
        KPI Number and percentage of repeated
        or duplicated events
        
        KPI Number of events/alerts generated
        without actual degradation of service/
        functionality (false positives – indication
        of the accuracy of the instrumentation
        parameters, important for CSI)
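
Several of the KPIs above are simple ratios over event counts. The sketch below shows how they might be computed from totals for a reporting period; the `EventStats` fields and the `kpi_report` function are hypothetical names, and the figures in the usage note are made up purely to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EventStats:
    total_events: int     # all events logged in the period
    incidents: int        # incidents recorded in the period
    needed_human: int     # events that required human intervention
    false_positives: int  # events/alerts without actual service degradation
    duplicates: int       # repeated or duplicated events


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal; 0.0 if whole is 0."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)


def kpi_report(stats: EventStats) -> Dict[str, Optional[float]]:
    """Compute a few event management KPIs from period totals."""
    events_per_incident = (
        None if stats.incidents == 0
        else round(stats.total_events / stats.incidents, 1)
    )
    return {
        "events_per_incident": events_per_incident,
        "human_intervention_pct": pct(stats.needed_human, stats.total_events),
        "false_positive_pct": pct(stats.false_positives, stats.total_events),
        "duplicate_pct": pct(stats.duplicates, stats.total_events),
    }
```

For illustrative totals of 1,000 events, 40 incidents, 120 events needing human intervention, 50 false positives and 200 duplicates, this gives 25 events per incident, 12% human intervention, 5% false positives and 20% duplicates.
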