M7 (Hashing)

Hashing is based on the idea of distributing keys among a one-dimensional array H[0..m − 1] called a hash table. The distribution is done by computing, for each of the keys, the value of some predefined function h called the hash function. This function assigns an integer between 0 and m − 1, called the hash address, to a key.
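
As a minimal sketch of what a hash function can look like, the Python snippet below adds up the alphabet positions of a key's letters and reduces the sum modulo m. Both this particular function and the table size m = 13 are assumptions chosen for illustration; they happen to give h(KID) = 11, the value used later in these notes.

```python
def h(key: str, m: int = 13) -> int:
    """Hash address of an alphabetic key.

    Assumed scheme: sum the alphabet positions of the letters
    (A=1, ..., Z=26) and take the remainder modulo the table size m.
    """
    return sum(ord(c) - ord("A") + 1 for c in key.upper()) % m


# Every address falls in the range 0..m-1.
for word in ["KID", "ARE", "SOON"]:
    print(word, h(word))
```

All three words land on address 11 under these assumptions, which is exactly the kind of collision a hashing scheme has to resolve.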
        
        Obviously, if we choose a hash table’s size m to be smaller than the number of keys n, we will get collisions, a phenomenon of two (or more) keys being hashed into the same cell of the hash table (Figure 7.4). But collisions should be expected even if m is considerably larger than n (see Problem 5 in this section’s exercises). In fact, in the worst case, all the keys could be hashed to the same cell of the hash table. Fortunately, with an appropriately chosen hash table size and a good hash function, this situation happens very rarely. Still, every hashing scheme must have a collision resolution mechanism. This mechanism is different in the two principal versions of hashing: open hashing (also called separate chaining) and closed hashing (also called open addressing).
        
        Open Hashing (Separate Chaining)
        
        In open hashing, keys are stored in linked lists attached to cells of a hash table.
        
        How do we search in a dictionary implemented as such a table of linked lists? We do this by simply applying to a search key the same procedure that was used for creating the table. To illustrate, if we want to search for the key KID in the hash table of Figure 7.5, we first compute the value of the same hash function for the key: h(KID) = 11. Since the list attached to cell 11 is not empty, its linked list may contain the search key. But because of possible collisions, we cannot tell whether this is the case until we traverse this linked list. After comparing the string KID first with the string ARE and then with the string SOON, we end up with an unsuccessful search.
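
The search just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the table of Figure 7.5 itself: the hash function (sum of letter positions mod 13), the word list, and the use of Python lists in place of linked lists are all hypothetical choices, picked so that ARE and SOON share cell 11 with the search key KID.

```python
M = 13  # assumed table size


def h(key: str, m: int = M) -> int:
    # Assumed hash function: sum of alphabet positions (A=1..Z=26) mod m.
    return sum(ord(c) - ord("A") + 1 for c in key.upper()) % m


def build_table(keys, m=M):
    # One chain per cell; a Python list stands in for a linked list.
    table = [[] for _ in range(m)]
    for key in keys:
        table[h(key, m)].append(key)
    return table


def search(table, key):
    """Return (found, number_of_key_comparisons)."""
    comparisons = 0
    for stored in table[h(key, len(table))]:  # traverse only one chain
        comparisons += 1
        if stored == key:
            return True, comparisons
    return False, comparisons


# Hypothetical word list for the demonstration.
table = build_table(["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"])
print(table[11])             # ['ARE', 'SOON']
print(search(table, "KID"))  # (False, 2)
```

The unsuccessful search for KID costs two comparisons, one for each key in the chain at cell 11.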
        
        Closed Hashing (Open Addressing)
        
        In closed hashing, all keys are stored in the hash table itself without the use of linked lists. (Of course, this implies that the table size m must be at least as large as the number of keys n.) Different strategies can be employed for collision resolution. The simplest one, called linear probing, checks the cell following the one where the collision occurs. If that cell is empty, the new key is installed there; if the next cell is already occupied, the availability of that cell’s immediate successor is checked, and so on. Note that if the end of the hash table is reached, the search is wrapped to the beginning of the table; i.e., it is treated as a circular array.
        
        To search for a given key K, we start by computing h(K) where h is the hash function used in the table construction. If the cell h(K) is empty, the search is unsuccessful. If the cell is not empty, we must compare K with the cell’s occupant: if they are equal, we have found a matching key; if they are not, we compare K with a key in the next cell and continue in this manner until we encounter either a matching key (a successful search) or an empty cell (unsuccessful search).
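
Insertion and search with linear probing might look like the Python sketch below. The hash function, the table size of 13, and the word list are assumptions made for the example; deletion is deliberately left out.

```python
EMPTY = None


def h(key: str, m: int) -> int:
    # Assumed hash function: sum of alphabet positions (A=1..Z=26) mod m.
    return sum(ord(c) - ord("A") + 1 for c in key.upper()) % m


def insert(table, key):
    """Place key in the first free cell at or after h(key); return its index."""
    m = len(table)
    i = h(key, m)
    for _ in range(m):           # visit each cell at most once
        if table[i] is EMPTY:
            table[i] = key
            return i
        if table[i] == key:      # key is already stored
            return i
        i = (i + 1) % m          # wrap around: the table is a circular array
    raise RuntimeError("hash table is full")


def search(table, key):
    """Return the index of key, or -1 if the search is unsuccessful."""
    m = len(table)
    i = h(key, m)
    for _ in range(m):
        if table[i] is EMPTY:    # an empty cell ends the search
            return -1
        if table[i] == key:
            return i
        i = (i + 1) % m
    return -1                    # table is full and key is absent


# Hypothetical word list for the demonstration.
table = [EMPTY] * 13
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    insert(table, word)

print(table.index("SOON"))    # 12: cell 11 was taken by ARE
print(table.index("PARTED"))  # 0: cell 12 was taken, so the probe wrapped
print(search(table, "KID"))   # -1
```

Under these assumptions the search for KID starts at cell 11 and probes cells 12, 0, and 1 before the empty cell 2 ends it unsuccessfully.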
        
        If the hash function distributes n keys among m cells of the hash table about evenly, each list will be about n/m keys long. The ratio α = n/m, called the load factor of the hash table, plays a crucial role in the efficiency of hashing.
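
To make the load factor concrete, the short Python sketch below computes α for a small separate-chaining table and compares it with the actual chain lengths. The hash function, the table size m = 13, and the eight-word key set are assumptions for illustration only.

```python
def h(key: str, m: int) -> int:
    # Assumed hash function: sum of alphabet positions (A=1..Z=26) mod m.
    return sum(ord(c) - ord("A") + 1 for c in key.upper()) % m


keys = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]  # hypothetical
m = 13
n = len(keys)

# Count how many keys hash to each cell (the length of each chain).
chain_lengths = [0] * m
for key in keys:
    chain_lengths[h(key, m)] += 1

alpha = n / m  # load factor
print(f"alpha = {alpha:.3f}")
print("chain lengths:", chain_lengths)
print("average chain length:", sum(chain_lengths) / m)
```

The average chain length over all m cells always equals α; how close the individual chains stay to that average depends on how evenly the hash function spreads the keys.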