Typically, records comprise several fields, each responsible for keeping a particular type of information about an entity the record represents. For example,a student record may contain fields for the student’s ID, name, date of birth, sex,home address, major, and so on. Among record fields there is usually at least one called a key that is used for identifying entities represented by the records (e.g.,the student’s ID). In the discussion below, we assume that we have to implement a dictionary of n records with keys K1, K2, . . . , Kn.
Hashing is based on the idea of distributing keys among a one-dimensional array H[0..m − 1] called a hash table. The distribution is done by computing, for each of the keys, the value of some predefined function h called the hash function. This function assigns an integer between 0 and m − 1, called the hash address, to a key.

Obviously, if we choose a hash table’s size m to be smaller than the numberof keys n, we will get collisions—a phenomenon of two (or more) keys beinghashed into the same cell of the hash table (Figure 7.4). But collisions should beexpected even if m is considerably larger than n (see Problem 5 in this section’sexercises). In fact, in the worst case, all the keys could be hashed to the same cellof the hash table. Fortunately, with an appropriately chosen hash table size and agood hash function, this situation happens very rarely. Still, every hashing schememust have a collision resolution mechanism. This mechanism is different in thetwo principal versions of hashing: open hashing (also called separate chaining)and closed hashing (also called open addressing).
-
-
If the hash function distributes n keys among m cells of the hash table about evenly, each list will be about n/m keys long. The ratio α = n/m, called the load factor of the hash table, plays a crucial role in the efficiency of hashing.