Open Addressing Collision Handling technique in Hashing

Last Updated : 12 Jun, 2024

Open Addressing is a method for handling collisions. In open addressing, all elements are stored in the hash table itself, so at any point the size of the table must be greater than or equal to the total number of keys (note that we can increase the table size by copying the old data into a larger table if needed). This approach is also known as closed hashing. The entire procedure is based on probing, and the types of probing are covered further down. The basic operations work as follows:

  • Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k. 
  • Search(k): Keep probing until a slot whose key equals k is found or an empty slot is reached. 
  • Delete(k): The delete operation needs care. If we simply empty the slot of a deleted key, a later search for another key may stop early at that slot and fail. So slots of deleted keys are marked specially as “deleted”. 
    An insert can place an item in a deleted slot, but a search doesn’t stop at a deleted slot. 
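
The three operations above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name `LinearProbingSet`, the sentinel objects `EMPTY` and `DELETED`, and the choice of a step of 1 between probes are all assumptions made for the example.

```python
EMPTY = object()    # slot has never held a key
DELETED = object()  # marker for a slot whose key was removed

class LinearProbingSet:
    """Hypothetical fixed-size open-addressing set (probe step of 1)."""

    def __init__(self, size):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Yield every slot index once, starting at the key's home slot.
        start = hash(key) % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None          # a never-used slot ends the search
            if slot is not DELETED and slot == key:
                return idx           # found the key
            # occupied by another key or marked deleted: keep probing
        return None

    def insert(self, key):
        if self.search(key) is not None:
            return                   # key already stored
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key  # deleted slots may be reused
                return
        raise RuntimeError("hash table is full")

    def delete(self, key):
        idx = self.search(key)
        if idx is not None:
            self.slots[idx] = DELETED  # mark, do not empty, the slot

table = LinearProbingSet(5)
for k in (50, 70, 76):
    table.insert(k)
table.delete(70)
print(table.search(76) is not None)  # True: the search passes the deleted slot
```

If `delete` reset the slot to `EMPTY` instead, the final search for 76 would stop at the slot that used to hold 70 and wrongly report that 76 is missing.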

Different ways of Open Addressing:

1. Linear Probing: 

In linear probing, the hash table is searched sequentially, starting from the original hash location. If that location is already occupied, we check the next location, wrapping around to the start of the table when the end is reached. 

The function used for rehashing is: rehash(n) = (n + 1) % table_size, where n is the index of the slot that was just probed. 

The gap between two consecutive probes is always 1, as seen in the example below:

Let hash(x) be the slot index computed using a hash function and S be the table size 

If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S 

Example: Let us consider the simple hash function “key mod 5” and the sequence of keys to be inserted: 50, 70, 76, 85, 93. 
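
The example can be traced with a short Python sketch. It assumes a table of exactly 5 slots (the text gives only the hash function “key mod 5”), and the function name `linear_probe_insert` is made up for this illustration.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing and return the slot index used."""
    size = len(table)
    home = key % size                 # hash function from the example
    for i in range(size):
        idx = (home + i) % size       # try home, home + 1, home + 2, ...
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")

table = [None] * 5                    # assumed table size S = 5
placements = {key: linear_probe_insert(table, key)
              for key in (50, 70, 76, 85, 93)}
print(placements)  # {50: 0, 70: 1, 76: 2, 85: 3, 93: 4}
print(table)       # [50, 70, 76, 85, 93]
```

Only 50 lands in its home slot. 70 and 85 both hash to 0 and 76 hashes to 1, so each is pushed along the run of occupied slots, and 93 (home slot 3) is displaced to slot 4 by 85.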

2. Quadratic Probing 

Linear probing tends to build long runs of occupied slots, a problem known as clustering. Quadratic probing reduces this problem by letting the interval between probes grow with the probe number instead of staying fixed at 1. In this method, we look for the i²-th slot after the original hash location in the i-th iteration. We always start from the original hash location, and only if that location is occupied do we check the other slots.

Let hash(x) be the slot index computed using the hash function and S be the table size.  

If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S

Example: Let us consider table size = 7, hash function Hash(x) = x % 7, and collision resolution strategy f(i) = i². Insert 22, 30, and 50.
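
A small Python sketch of this example follows. The function name `quadratic_probe_insert` is an assumption for illustration, and the loop simply gives up after S probes.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing and return the slot index used."""
    size = len(table)
    home = key % size                     # Hash(x) = x % 7 in the example
    for i in range(size):
        idx = (home + i * i) % size       # f(i) = i^2: offsets 0, 1, 4, 9, ...
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found on the probe sequence")

table = [None] * 7
for key in (22, 30, 50):
    quadratic_probe_insert(table, key)
print(table)  # [None, 22, 30, None, None, 50, None]
```

22 and 30 go straight to slots 1 and 2. 50 also hashes to slot 1, finds it full, then finds (1 + 1) % 7 = 2 full as well, and is finally stored at (1 + 4) % 7 = 5.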

3. Double Hashing 

In double hashing, the interval between probes is computed by a second hash function, so keys that collide at the same slot usually follow different probe sequences, which reduces clustering. We use another hash function hash2(x) and look for the slot at offset i*hash2(x) in the i-th iteration. Note that hash2(x) must never evaluate to 0, otherwise the probe would never leave the original slot. 

Let hash(x) be the slot index computed using the hash function and S be the table size.  

If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S

Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash function is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5).
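
The example can be worked through with the following Python sketch. The function names `h1`, `h2`, and `double_hash_insert` are chosen for this illustration, and the loop stops after S probes.

```python
def h1(k, size):
    return k % size          # first hash function: k mod 7

def h2(k):
    return 1 + (k % 5)       # second hash function: always between 1 and 5

def double_hash_insert(table, key):
    """Insert key using double hashing and return the slot index used."""
    size = len(table)
    home, step = h1(key, size), h2(key)
    for i in range(size):
        idx = (home + i * step) % size   # home, home + step, home + 2*step, ...
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found on the probe sequence")

table = [None] * 7
for key in (27, 43, 692, 72):
    double_hash_insert(table, key)
print(table)  # [None, 43, 692, None, None, 72, 27]
```

27 and 43 go to their home slots 6 and 1. 692 also hashes to 6, so it steps by h2(692) = 3 to slot (6 + 3) % 7 = 2. 72 hashes to 2, which is now taken, and steps by h2(72) = 3 to slot 5.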

Comparison of the above three: 

Open addressing is a collision handling technique used in hashing where, when a collision occurs (i.e., when two or more keys map to the same slot), the algorithm looks for another empty slot in the hash table to store the collided key.

  • In linear probing, the algorithm simply looks for the next available slot in the hash table and places the collided key there. If that slot is also occupied, the algorithm keeps moving to the next slot until an empty one is found. Linear probing has the best cache performance but suffers from clustering. Another advantage of linear probing is that it is easy to compute. 
  • In quadratic probing, the algorithm searches for slots in a more spaced-out manner. When a collision occurs, the algorithm computes the next slot from the original hash value plus a quadratic function of the probe number. If that slot is also occupied, the algorithm increments the probe number and tries again. This process is repeated until an empty slot is found. Quadratic probing lies between the other two in terms of cache performance and clustering. 
  • In double hashing, the algorithm uses a second hash function to determine the next slot to check when a collision occurs. The algorithm calculates a hash value using the original hash function, then uses the second hash function to calculate an offset. The algorithm then checks the slot that is the sum of the original hash value and the offset. If that slot is occupied, the algorithm adds the offset again and retries. This process is repeated until an empty slot is found. Double hashing has poor cache performance but no clustering. Double hashing requires more computation time as two hash functions need to be computed. 

The choice of collision handling technique can have a significant impact on the performance of a hash table. Linear probing is simple and fast, but it can lead to clustering (i.e., a situation where keys are stored in long contiguous runs), which degrades performance. Quadratic probing spreads probes out more, but keys with the same initial slot still follow the same probe sequence, so it also clusters to a lesser degree, and for some table sizes some slots are never checked. Double hashing is more complex, but it can lead to a more even distribution of keys and can provide better performance in some cases.
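
The remark that quadratic probing may never check some slots can be verified directly. The sketch below uses hypothetical helper names and lists which offsets from the home slot each scheme can reach in a table of size 7. Probing beyond S steps adds nothing new, because (i + S)² and i² leave the same remainder modulo S.

```python
def quadratic_offsets(size):
    """Distinct offsets i*i % size reachable by quadratic probing."""
    return sorted({(i * i) % size for i in range(size)})

def linear_offsets(size):
    """Distinct offsets i % size reachable by linear probing."""
    return sorted({i % size for i in range(size)})

print(linear_offsets(7))     # [0, 1, 2, 3, 4, 5, 6]
print(quadratic_offsets(7))  # [0, 1, 2, 4]
```

With S = 7 and f(i) = i², a key can only ever land at offsets 0, 1, 2, or 4 from its home slot, so an insert can fail even though the table still has free slots.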

| S.No. | Separate Chaining | Open Addressing |
|---|---|---|
| 1. | Chaining is simpler to implement. | Open addressing requires more computation. |
| 2. | In chaining, the hash table never fills up; we can always add more elements to a chain. | In open addressing, the table may become full. |
| 3. | Chaining is less sensitive to the hash function and the load factor. | Open addressing requires extra care to avoid clustering and a high load factor. |
| 4. | Chaining is mostly used when it is unknown how many keys may be inserted or deleted, and how frequently. | Open addressing is used when the frequency and number of keys are known. |
| 5. | Cache performance of chaining is not good, as keys are stored in linked lists. | Open addressing provides better cache performance, as everything is stored in the same table. |
| 6. | Wastage of space (some parts of the hash table in chaining are never used). | In open addressing, a slot can be used even if no input maps to it. |
| 7. | Chaining uses extra space for links. | No links in open addressing. |

Note: Cache performance of chaining is not good because when we traverse a linked list, we jump from one node to another, and the nodes may be scattered all across the computer’s memory. The CPU cache therefore rarely holds the next node before it is visited, so caching gives little benefit. With open addressing, the data sits in one contiguous table, so the slots near a probed slot are likely to be in the cache already, which makes the following probes fast.

Performance of Open Addressing: 

Like Chaining, the performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing) 

m = Number of slots in the hash table

n = Number of keys to be inserted in the hash table

 Load factor α = n/m  ( < 1 )

Expected number of probes to search/insert/delete is at most 1/(1 – α) 

So search, insert and delete take O(1/(1 – α)) expected time
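
To get a feel for how quickly the 1/(1 – α) bound grows as the table fills up, it can be evaluated for a few load factors. This is only a sketch: the function name `expected_probe_bound` is made up here, and the result holds under the uniform hashing assumption stated above, not as a measurement of a real table.

```python
def expected_probe_bound(n, m):
    """Upper bound 1 / (1 - alpha) on expected probes, with alpha = n / m."""
    if not 0 <= n < m:
        raise ValueError("need 0 <= n < m so that the load factor is below 1")
    alpha = n / m
    return 1 / (1 - alpha)

m = 100                      # number of slots
for n in (50, 75, 90, 99):   # number of keys stored
    print(f"alpha = {n / m:.2f}  bound = {expected_probe_bound(n, m):.1f}")
```

The bound is about 2 probes at α = 0.5, 4 at α = 0.75, 10 at α = 0.9, and 100 at α = 0.99, which is why open-addressing tables are usually resized well before they become full.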

Related Articles: Hashing | Set 1 (Introduction), Hashing | Set 2 (Separate Chaining) 


