Please enable JavaScript.
Coggle requires JavaScript to display documents.
An Evaluation of Buffer Management Strategies for Relational Database…
An Evaluation of Buffer Management Strategies for Relational Database Systems
Abstract
The paper presents an algorithm called
DBMIN
. The algorithm aimst at managing the buffer pool of a relational database management system.
DBMIN
is based on a new model of relational query behavior -
q
uery
l
ocality
s
et
m
odel. Advantage of
QLSM
is its ability to predict future reference behavior.
Furthermore the paper goes through a methodology for evaluating buffer management algorithms in a multiuser environment.
Buffer Management for Database Systems
The following algorithms aim at exploiting the predictability of database reference behavior.
Domain Separation Algorithms
Description
Buffers inside each domain are managed by the
LRU
discipline
One domain is assigned to each nonleaf level of the B-tree structure, and one to the leaf level together with the data.
When a page of certain type is needed, a buffer is allocated from the corresponding domain. If no buffer available, one is borrowed from another domain.
Pages are classified into types, each of which is separately managed in its associated
domain
of buffers.
Limitations
Main limitation is that the algorithm fails to reflect the dynamics of page references as the importance of a page my vary in differenet queries.
The DS algorithm does not differentiate the relative importance between different types of pages. A page type overwrites the same page type, even though another one might be less important.
"New" Algorithm (Working Set)
Description
The buffer tool is subdivided and allocated on a per-relation basis. Each active relation is assigned a resident set which is initially empty.
Resident sets are linked in a priority list with a global free list on the top.
When a page fault occurs, a search is initiated from the top of the priority list until a suitable buffer is found. The faulting page is then brought into the buffer and added to the resident set of the relation.
Presents a new approach to buffer management, which tracks the locality of a query through relations.
Limitations
The rules suggested by Kaplan for arranging the order of relations on the priority list are based solely on intuition.
Under high memory contention, searching through a priority list for a free buffer can be expensive.
Hard to extend to a multiuser environment, as not clear how to prioritize concurrent queries.
Hot Set Algorithm
Description
The algorithm integrates advanced knowledge on reference patterns. A set of pages over which there is a looping behavior is called a
hot set
.
If a query is given a buffer large enough to hold the hot sets, its processing will be efficient. On the other hand, a large number of page faults may result if the memory allocated to a query is insufficient to hold a hot set.
Applying the predictability of reference patterns in queries, the hot set model provides a more accurate reference model for relational database systems than a stochastic model.
In the hot set (HOT) algorithm, each query is provided with a separate list of buffers managed by an LRU discipline
Limitations
The algorithm tends to overallocate memory, which implies that memory may be underutilized.
The use of LRU in the hot set model lacks a logical
justification. There exist cases where LRU is the worst possible discipline under tight memory constraint.
The DBMIN Buffer Management Algorithm
This section introduces the
q
uery
l
ocality
s
et
m
odel for database systems.
It will be shown how the reference behavior of common database operations can be described as a composition of a set of simple and regular reference patterns.
Like the
hot set model
,
QLSM
is better than stochastic models, however it is also able to avoid potential problems of the
hot set model
.
The Query Locality Set Model
Description:
Based on the observation that relational database systems support a limited set of operations, exhibiting regular and predictable patterns.
Common Page Reference Patterns
Sequential References
Straight Sequential
Pages are referenced and processed one after the other. Often a sequential scan is done only once without repetition. A single page frame provides all the required buffer space.
Clustered Sequential
Once in a while a scan my back up a short distance and then start forward again. This can happen in a merge join, where records with the same key value in the inner relation are repeatedly scanned and matched with those in the outer relation.
Looping Sequential
A sequential reference to a file is repeated several times. This can happen in a nested loop join. The entire file being scanned should be kept in memory if possible. MRU algorithm should be used if the file is too large to fit.
Random References
Independant Random
Series of independent accesses. As an example, during an index scan through a nonclustered index, the data pages are accessed in a random manner.
Clustered Random
A locality of reference exists in a series of "random" acesses. The reference behavior is similar to that of a CS scan. If possible, each page containing a record in a cluster should be kept in memory.
Hierarchical References
Description:
A sequence of page accesses that form a traversal path from the root down to the leaves of an index.
Straight Hierarchical
When an index is traversed only once and one page frame is enough for buffering all the index pages.
Hierarchical with Straigh Sequence (H/SS)
When a tree traversal is followed by a straight sequential scan
Hierarchical with Clustered Sequential (H/CS)
When a tree traversal is NOT followed by a straight sequential scan
Looping Hierarchical
Repeated accesses to the index structure during the evaluation of a join, where the inner relation is indexed on the join field. Pages closer to the root are more likely to be accesses than those closer to the leaves.
DBMIN - A Buffer Management Algorithm Based on the QLSM
Description
Buffers are managed on a per file instance basis.
The set of buffered pages associated with a file instance is referred to as its
locality set
. A page can belong to at most one locality set.
If a buffer contains a page that does not
belong to any locality set, the buffer is placed on a global free list.
A file instance is considered the owner of all the pages
in its locality set
To allow for data sharing among concurrent queries, all the
buffers in memory are also accessible through a global buffer table
On Start Up
The global table is initialized and all buffers are linked to the blogal free list
On opening a file, its locality set size and replacement policy are established
Variables
r
(# allocated buffers) and
l
(max # buffers allocatable) are set to 0
Upon Query Page Reuqests
... a search is made to the global table, followed by an adjustment to the associated locality set.
Possible Outcomes
Page found both in global table and locality set
Only usage statistics might need to be updated
Page found in memory but not in locality set
Page has owner?
Yes
Page is given to the requesting query
No
Add page to locality set and increment
r
if
r>l
a page is chosen and released back to the global free list and
r
set to
l
Usage statistics updated (required by local replacement policy)
Page is not in memory
A disk read is scheduled to bring the page from disk into a buffer from the global free list.
Evaluation of Buffer Management Algorithms
A comparison will be made on the performane of the
DBMIN algorithm
with the
hot set algorithm
and four other buffer management strategies in a multiuser environment.
Performance Evaluation Methodology
Abstract
Another component is a simulator for database systems, managing CPU, I/O and memory
Fault rate NOT chosen as a measure of performance, since it does not guarantee optimal system behavior in a multiuser environment
Chosen performance metric:
Average number of queries completed per second
A hybrid simulation model is used, combining features of both
trace-driven
(based on real system recirds)
and
distribution-driven
(random events)
models.
Workload Synthesis
Recorded Events
Several types of events were recorded during the execution of each query, including page accesses, disk I/O and file operations.
More specifically...
page write
disk read
disk write
file open
file close
page read
Data Sharing
Three levels of data sharing were defined according to the average number of concurrent queries accessing a copy of the database
full sharing, all queries access the same database
half sharing, every two queries share a copy of the database
no sharing, every query has its own copy
Factors Affecting Throughput in a Multiuser Environment
Number of concurrent queries
Degree of data sharing
Query mix
Configuration Model
Three hardware components are simulated in the model
CPU
Allocating CPU cycles to competing queries via Round Robin
disk
pool of buffers
Under the control of the buffer manager using one of the buffer management algorithms.
Buffer Management Algorithms
Six buffer management algorithms, divied into two groups, were included in the eperiments
First Group - typical, easy to implement.
RAND
FIFO
CLOCK
All of the below algorithms are global algorithms in the sens that the replacement discipline is applied globally to all the buffers in the system.
Second group - more sophisticated algorithms
DBMIN
:pencil2:
WS
:pencil2:
HOT
:pencil2:
All of the below algorithms have local policies, in which replacement decisions are made locally
Simulation Results
The simulations test the performance of the algorithms under a workload consisting of a mixture of query types. The following three mixes are used:
All six query types are equally likely to be requested
One of the two simple queries is chosen half the time
The two simple queries have a combined probability of 75%
First set of tests...
No data sharing between concurrently executing queries
Worst performance observed by RAND and FIFO, with CLOCK performing better.
WS does not perform well due to failing to capture lops of joins.
HOT has similar performance to that of DBMIN. By slight load DBMIN only marginally faster than the rest.
With the introduction of more concurrent queries DBMIN provides 7-13% more throughput than HOT and 25-45% more than WS.
Second and third set of tests...
Goal is to study the effects of data sharing on performance. Each set utilizes a different degree of data sharing.
Observations
Troughput increases as data sharing increases, reinforcing the view that data sharing among concurrent queries is important in a multiprogrammed DB.
Half sharing and no data sharing yield similar performance. The same is not true for full data sharing.
Algorithms that make an effort to identify the localities perform better
Effect of Load Control
Experiments show that no
load control
leads to thrashing under high workloads. Here we test with load controller based on the "50% rule"
(paging device is kept busy about half the time)
.
50% Rule Load Controller Components
Optimizer
- analyzes the measurements and decides what load adjustment is appropriate
Control switch
- activates or deactivates proceses according to the decisions made by the optimizer
Estimator
- measures the utilization of the device
Observations
With
load control
every simple algorithm outperformed WS
CLOCK's performance with
load control
enabled very close to that of HOT
Experiments showed that throughput was maximized with a disk utilization of 87%.
Problems
Runtime overhead expensive when sampling too frequent. If not frequent enough optimizer may not adjust load effectively.
Feedback load controllers only respond after detecting an undesirable condition, which may result in unnecessary process activations/deactivations.
Feedback load controllers don't work well in an environment with large number of small transactions., which enter/leave the system before their effects can be assessed.