Please enable JavaScript.
Coggle requires JavaScript to display documents.
M6 (SETS AND DICTIONARIES, space and time trade-offs) - Coggle Diagram
M6
SETS AND DICTIONARIES
a
set
is an unordered collection of dinstict items
S = {2, 3, 5, 7}
S = {n | n is a prime number smaller than 10}
set operations:
Checking membership of a given number
Union between two sets
Intersection between two sets
implementations
S is a subset of U and is represented by a
bit vector
U = {1, 2, 3, 4, 5, 6, 7, 8, 9} and S = {2, 3, 5, 7}
S can also be represented as 011010100
very fast but at the expanse of space
S is represented using a list structure
lists are ordered, sets are unorderd
sets don't contain identical elements, lists do
dictionary
: data structure that implements searching, adding and deleting a given item from a set
space and time trade-offs
input enhancement
pocess the problem's input and store the additional information to accelerate solving the problem later on
prestructuring
extra space to facilitate faster or more flexible access to the data
hashing
HASHING
implementarion of
dictionaries
records with several fields
key
is the field that identifiies different elements of the record
hashing
: distribution of keys among a one dimensional array H[0...m-1] called
hash table
hash function
: function
h
responsible for the distribution of the
keys
in the
hash table
hash address
: integer returned by the
hash function
to a
key
requirements
the
hash table
shouldn't be too larger than the number of
keys
, but not too small to jeopardize the implementation's time efficiency
a
hash function
has to
evenly
distribute the
keys
the
hash function
has to be easy to compute
collisions
: two or more
keys
being
hashed
to the same
hash adress
Closed hashing
keys
are stored in the
hash table
without the use of linked lists
diferrent strategies for
colllision
ex:
linear probing
, if the cell is occupied it checks the immediate successor and so on until find an empty cell
serach
: given a key
K
, compute
h(K)
if the cell computed is not equal to
K
, it will use the algorithm for collision to look for the key until hit an empty cell or find the key
insertion
: compute
h(K)
and if the cell is occupied, uses the collision algorithm to find another one
deletion
: it's treaky because of the
search
algorithm, if a key which occupies the actual
hash address
computed for it is deleted, all the keys that collided with it will be "lost"
A "lazy" way to deal with it is to use a flag that tells if a cell has never been occupied or not
cluster
: sequence of occupied cells caused by linear probing
different collision resolutions can solve the
cluster
problem
double hashing
: if a collision occur, use a new
hash function
to find a different
hash address
rehashing
: when a table is close to being full it is scanned and all its keys are realocated into a new larger table
probe functions
linear probing
pseudo-random probing
quadratic probing
double hashing
Open hashing
keys
are stored into linked lists attached to the hash addresses
search
: apply
h(K)
to find the
hash address
it should be stored and, if the linked list attached is not empty, search in the linked list for the search key
search efficiency
: depends on the length of the linked list, which depend on the table size and the quality of the
hash function
load factor
: n/m
if
n
keys are evenly distributed among
m
cells
insert
: compute the
hash address
and append the key to the list
delete
: search for the key to be deleted and remove it from the list
the efficiency of insertion and deletion are identical to searching in the average case