Please enable JavaScript.

Coggle requires JavaScript to display documents.

An Evaluation of Buffer Management Strategies for Relational Database…

- - - - Description
        
        Buffers inside each domain are managed by the LRU discipline
        
        One domain is assigned to each nonleaf level of the B-tree structure, and one to the leaf level together with the data.
        
        When a page of certain type is needed, a buffer is allocated from the corresponding domain. If no buffer available, one is borrowed from another domain.
        
        Pages are classified into types, each of which is separately managed in its associated domain of buffers.
      - Limitations
        
        Main limitation is that the algorithm fails to reflect the dynamics of page references as the importance of a page my vary in differenet queries.
        
        The DS algorithm does not differentiate the relative importance between different types of pages. A page type overwrites the same page type, even though another one might be less important.
    - - Description
        
        The buffer tool is subdivided and allocated on a per-relation basis. Each active relation is assigned a resident set which is initially empty.
        
        Resident sets are linked in a priority list with a global free list on the top.
        
        When a page fault occurs, a search is initiated from the top of the priority list until a suitable buffer is found. The faulting page is then brought into the buffer and added to the resident set of the relation.
        
        Presents a new approach to buffer management, which tracks the locality of a query through relations.
      - Limitations
        
        The rules suggested by Kaplan for arranging the order of relations on the priority list are based solely on intuition.
        
        Under high memory contention, searching through a priority list for a free buffer can be expensive.
        
        Hard to extend to a multiuser environment, as not clear how to prioritize concurrent queries.
    - - Description
        
        The algorithm integrates advanced knowledge on reference patterns. A set of pages over which there is a looping behavior is called a hot set.
        
        If a query is given a buffer large enough to hold the hot sets, its processing will be efficient. On the other hand, a large number of page faults may result if the memory allocated to a query is insufficient to hold a hot set.
        
        Applying the predictability of reference patterns in queries, the hot set model provides a more accurate reference model for relational database systems than a stochastic model.
        
        In the hot set (HOT) algorithm, each query is provided with a separate list of buffers managed by an LRU discipline
      - Limitations
        
        The algorithm tends to overallocate memory, which implies that memory may be underutilized.
        
        The use of LRU in the hot set model lacks a logical
        justification. There exist cases where LRU is the worst possible discipline under tight memory constraint.
- - - - Description: Based on the observation that relational database systems support a limited set of operations, exhibiting regular and predictable patterns.
      - Common Page Reference Patterns
        
        Sequential References
        
        Straight Sequential
        
        Pages are referenced and processed one after the other. Often a sequential scan is done only once without repetition. A single page frame provides all the required buffer space.
        
        Clustered Sequential
        
        Once in a while a scan my back up a short distance and then start forward again. This can happen in a merge join, where records with the same key value in the inner relation are repeatedly scanned and matched with those in the outer relation.
        
        Looping Sequential
        
        A sequential reference to a file is repeated several times. This can happen in a nested loop join. The entire file being scanned should be kept in memory if possible. MRU algorithm should be used if the file is too large to fit.
        
        Random References
        
        Independant Random
        
        Series of independent accesses. As an example, during an index scan through a nonclustered index, the data pages are accessed in a random manner.
        
        Clustered Random
        
        A locality of reference exists in a series of "random" acesses. The reference behavior is similar to that of a CS scan. If possible, each page containing a record in a cluster should be kept in memory.
        
        Hierarchical References
        
        Description: A sequence of page accesses that form a traversal path from the root down to the leaves of an index.
        
        Straight Hierarchical
        
        When an index is traversed only once and one page frame is enough for buffering all the index pages.
        
        Hierarchical with Straigh Sequence (H/SS)
        
        When a tree traversal is followed by a straight sequential scan
        
        Hierarchical with Clustered Sequential (H/CS)
        
        When a tree traversal is NOT followed by a straight sequential scan
        
        Looping Hierarchical
        
        Repeated accesses to the index structure during the evaluation of a join, where the inner relation is indexed on the join field. Pages closer to the root are more likely to be accesses than those closer to the leaves.
    - - Description
        
        Buffers are managed on a per file instance basis.
        
        The set of buffered pages associated with a file instance is referred to as its locality set. A page can belong to at most one locality set.
        
        If a buffer contains a page that does not
        belong to any locality set, the buffer is placed on a global free list.
        
        A file instance is considered the owner of all the pages
        in its locality set
        
        To allow for data sharing among concurrent queries, all the
        buffers in memory are also accessible through a global buffer table
      - On Start Up
        
        The global table is initialized and all buffers are linked to the blogal free list
        
        On opening a file, its locality set size and replacement policy are established
        
        Variables r (# allocated buffers) and l (max # buffers allocatable) are set to 0
      - Upon Query Page Reuqests
        
        ... a search is made to the global table, followed by an adjustment to the associated locality set.
        
        Possible Outcomes
        
        Page found both in global table and locality set
        
        Only usage statistics might need to be updated
        
        Page found in memory but not in locality set
        
        Page has owner?
        
        Yes
        
        Page is given to the requesting query
        
        No
        
        Add page to locality set and increment r
        
        if r>l a page is chosen and released back to the global free list and r set to l
        
        Usage statistics updated (required by local replacement policy)
        
        Page is not in memory
        
        A disk read is scheduled to bring the page from disk into a buffer from the global free list.
- - - - Abstract
        
        Another component is a simulator for database systems, managing CPU, I/O and memory
        
        Fault rate NOT chosen as a measure of performance, since it does not guarantee optimal system behavior in a multiuser environment
        
        Chosen performance metric: Average number of queries completed per second
        
        A hybrid simulation model is used, combining features of both trace-driven (based on real system recirds) and distribution-driven (random events) models.
      - Workload Synthesis
        
        Recorded Events
        
        Several types of events were recorded during the execution of each query, including page accesses, disk I/O and file operations.
        
        More specifically...
        
        page write
        
        disk read
        
        disk write
        
        file open
        
        file close
        
        page read
        
        Data Sharing
        
        Three levels of data sharing were defined according to the average number of concurrent queries accessing a copy of the database
        
        full sharing, all queries access the same database
        
        half sharing, every two queries share a copy of the database
        
        no sharing, every query has its own copy
        
        Factors Affecting Throughput in a Multiuser Environment
        
        Number of concurrent queries
        
        Degree of data sharing
        
        Query mix
      - Configuration Model
        
        Three hardware components are simulated in the model
        
        CPU
        
        Allocating CPU cycles to competing queries via Round Robin
        
        disk
        
        pool of buffers
        
        Under the control of the buffer manager using one of the buffer management algorithms.
    - - Six buffer management algorithms, divied into two groups, were included in the eperiments
        
        First Group - typical, easy to implement.
        
        RAND
        
        FIFO
        
        CLOCK
        
        All of the below algorithms are global algorithms in the sens that the replacement discipline is applied globally to all the buffers in the system.
        
        Second group - more sophisticated algorithms
        
        DBMIN
        
        :pencil2:
        
        WS
        
        :pencil2:
        
        HOT
        
        :pencil2:
        
        All of the below algorithms have local policies, in which replacement decisions are made locally
    - - The simulations test the performance of the algorithms under a workload consisting of a mixture of query types. The following three mixes are used:
        
        All six query types are equally likely to be requested
        
        One of the two simple queries is chosen half the time
        
        The two simple queries have a combined probability of 75%
        
        First set of tests...
        
        No data sharing between concurrently executing queries
        
        Worst performance observed by RAND and FIFO, with CLOCK performing better.
        
        WS does not perform well due to failing to capture lops of joins.
        
        HOT has similar performance to that of DBMIN. By slight load DBMIN only marginally faster than the rest.
        
        With the introduction of more concurrent queries DBMIN provides 7-13% more throughput than HOT and 25-45% more than WS.
        
        Second and third set of tests...
        
        Goal is to study the effects of data sharing on performance. Each set utilizes a different degree of data sharing.
        
        Observations
        
        Troughput increases as data sharing increases, reinforcing the view that data sharing among concurrent queries is important in a multiprogrammed DB.
        
        Half sharing and no data sharing yield similar performance. The same is not true for full data sharing.
        
        Algorithms that make an effort to identify the localities perform better
        
        Effect of Load Control
        
        Experiments show that no load control leads to thrashing under high workloads. Here we test with load controller based on the "50% rule" (paging device is kept busy about half the time).
        
        50% Rule Load Controller Components
        
        Optimizer - analyzes the measurements and decides what load adjustment is appropriate
        
        Control switch - activates or deactivates proceses according to the decisions made by the optimizer
        
        Estimator- measures the utilization of the device
        
        Observations
        
        With load control every simple algorithm outperformed WS
        
        CLOCK's performance with load control enabled very close to that of HOT
        
        Experiments showed that throughput was maximized with a disk utilization of 87%.
        
        Problems
        
        Runtime overhead expensive when sampling too frequent. If not frequent enough optimizer may not adjust load effectively.
        
        Feedback load controllers only respond after detecting an undesirable condition, which may result in unnecessary process activations/deactivations.
        
        Feedback load controllers don't work well in an environment with large number of small transactions., which enter/leave the system before their effects can be assessed.