Reliability and Fault Tolerance (N-Version Programming vs Recovery Blocks,…

- - - - Reliable
      - Safe
      - Confidential
      - Integral
      - Maintainable
      - Available
    - - Fault Prevention
      - Fault Tolerance
      - Fault Removal
      - Fault Forecasting
    - - Faults
      - Errors
      - Failures
- - - - A fault that starts at a particular time, remains in the system for some period, and then disappears (a transient fault)
      - E.g. many faults in communications systems
    - - Faults that remain in the system until they are repaired (permanent faults)
      - e.g. broken wire or a software design error
    - - Transient faults that recur from time to time (intermittent faults)
      - e.g. a heat-sensitive hardware component that works for a time, stops working, cools down, and then starts to work again
- - - - Fail silent
      - Fail stop
      - Fail controlled
- - - - Attempts to limit the introduction of faults during system construction
    - - Procedures for finding and removing the causes of errors
      - E.g.
        
        design reviews
        
        program verification
        
        code inspections
        
        system testing
  - - - Graceful Degradation (fail soft)
        
        The system continues to operate in the presence of errors, accepting a partial degradation of functionality or performance during recovery or repair
      - Fail Safe
        
        The system maintains its integrity while accepting a temporary halt in its operation
      - Full Fault Tolerance
        
        System continues to operate in the presence of faults, albeit for a limited period, with no significant loss of functionality or performance
    - - Aims
        
        minimise redundancy while maximising reliability, subject to the cost and size constraints of the system
      - Advisable to separate out the fault-tolerant components from the rest of the system
    - - Static (masking) redundancy
        
        redundant components are used inside a system to hide the effects of faults
        
        e.g.
        
        Triple Modular Redundancy (TMR)
        
        3 identical subcomponents and majority voting circuits;
        the outputs are compared and if one differs from the other two, that output is masked out
        
        N-Modular Redundancy (NMR)
        
        To mask faults from more than one component
      - Dynamic redundancy
        
        Redundancy supplied inside a component which indicates that the output is in error
        Provides an error detection facility
        recovery must be provided by another component
        
        e.g.
        
        Communications checksums
        
        memory parity bits
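
The contrast between masking and detection above can be sketched in a few lines of Python. This is a minimal illustration, not a hardware design: `majority_vote`, `run_redundant`, `even_parity_bit` and `parity_ok` are hypothetical names chosen for this example, and the voter assumes replica outputs are hashable values that can be compared for exact equality.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Static (masking) redundancy: return the value a strict majority agrees on."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority, so the fault cannot be masked.
        raise RuntimeError("no majority among replica outputs")
    return value


def run_redundant(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and vote (TMR when there are 3 replicas)."""
    return majority_vote([replica(x) for replica in replicas])


def even_parity_bit(byte: int) -> int:
    """Dynamic redundancy: extra bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, parity_bit: int) -> bool:
    """Detect (but not correct) a single-bit error; recovery is left to the caller."""
    return even_parity_bit(byte) == parity_bit
```

With three replicas, one faulty output is outvoted and never reaches the caller. The parity check only reports that something is wrong, which is why dynamic redundancy needs a separate recovery component.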
    - - Used for detecting design errors
      - Static
        
        N-Version programming
        
        depends on
        
        initial specification
        
        independence of effort
        
        Adequate budget
      - Dynamic Redundancy
        
        error detection
        
        no fault tolerance scheme can be utilised until the associated error is detected
        
        types of error detection
        
        Environmental detection
        
        Application detection
        
        damage confinement and assessment
        
        to what extent has the system been corrupted?
        
        error recovery
        
        techniques should aim to transform the corrupted system into a state from which it can continue its normal operation (perhaps with degraded functionality)
        
        2 approaches
        
        forward error recovery
        
        continues from an erroneous state by making selective corrections to the system state
        
        backward error recovery
        
        BER relies on restoring the system to a previous safe state and executing an alternative section of the program
        
        fault treatment and continued service
        
        an error is a symptom of a fault;
        although the damage is repaired, the fault may still exist
        
        Fault Treatment
        
        2 stages
        
        fault location
        
        system repair
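
The backward error recovery idea above (restore a previous safe state, then execute an alternative section of the program) can be sketched as a recovery-block style loop. This is an illustrative sketch under stated assumptions: the state is a plain dictionary that `copy.deepcopy` can snapshot, and `run_with_backward_recovery`, `RecoveryFailed`, `alternatives` and `acceptance_test` are hypothetical names, not part of any library.

```python
import copy
from typing import Callable, Sequence


class RecoveryFailed(Exception):
    """Raised when every alternative fails the acceptance test."""


def run_with_backward_recovery(
    state: dict,
    alternatives: Sequence[Callable[[dict], None]],
    acceptance_test: Callable[[dict], bool],
) -> dict:
    # Establish a recovery point before any alternative touches the state.
    checkpoint = copy.deepcopy(state)
    for alternative in alternatives:
        try:
            alternative(state)          # may modify the state in place
            if acceptance_test(state):  # application-level error detection
                return state
        except Exception:
            pass  # an exception is treated as a detected error
        # Backward error recovery: discard the damage, restore the checkpoint.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryFailed("all alternatives failed the acceptance test")
```

A caller would pass a primary routine first and simpler fallbacks after it. If all of them fail, the state is left at the checkpoint, and fault location and system repair still have to be handled separately.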