Hash Table Data Structure
Last Updated: 08 May, 2024
What is a Hash Table?
A hash table is a data structure used to insert, look up, and remove key-value pairs quickly. It is built on hashing: a hash function translates each key into an index in an array, and that index is the storage location for the matching value. Different keys can occasionally map to the same index, which is called a collision. In simple words, a hash table maps keys to values.
What is Load factor?
A hash table’s load factor is the number of stored elements divided by the number of slots in the table. When the load factor is high, the table becomes crowded, collisions become more frequent, and searches take longer. A good hash function combined with timely resizing keeps the load factor in a suitable range.
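As a minimal sketch of that ratio, the helper below computes a load factor from two counts. The function name `load_factor` and the sample numbers are chosen here for illustration only.

```python
def load_factor(num_elements: int, num_slots: int) -> float:
    """Return the load factor: stored elements divided by table slots."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return num_elements / num_slots


# 12 entries stored in a table with 16 slots
print(load_factor(12, 16))  # 0.75
```

Many implementations grow the table once this value passes a chosen threshold. The exact threshold is a design choice that varies between libraries.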
What is a Hash function?
A function that translates keys to array indices is known as a hash function. A good hash function distributes keys evenly across the array to reduce collisions and keep lookups fast.
- Integer universe assumption: The keys are assumed to be integers within a known range. This enables the use of basic hashing operations like division or multiplication hashing.
- Hashing by division: This straightforward technique uses the remainder of the key divided by the array’s size as the index. It tends to perform well when the array size is a prime number and the keys are spread fairly evenly.
- Hashing by multiplication: This technique multiplies the key by a constant between 0 and 1 and keeps the fractional part of the result. The index is then obtained by multiplying that fractional part by the array’s size and rounding down. It also works well when the keys are spread evenly.
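The two techniques above can be sketched as follows. The function names are hypothetical. The constant `A` (the fractional part of the golden ratio) is an assumption: it is a commonly suggested value, and any constant strictly between 0 and 1 fits the description.

```python
import math


def hash_division(key: int, table_size: int) -> int:
    """Division method: the index is the remainder of key / table_size."""
    return key % table_size


# Assumed constant: fractional part of the golden ratio, roughly 0.618
A = (math.sqrt(5) - 1) / 2


def hash_multiplication(key: int, table_size: int, a: float = A) -> int:
    """Multiplication method: scale the fractional part of key * a by the table size."""
    fractional = (key * a) % 1.0         # keep only the fractional part
    return int(table_size * fractional)  # round down to a valid index


print(hash_division(1234, 13))          # 12, because 1234 = 13 * 94 + 12
print(hash_multiplication(123456, 16384))
```

Both functions assume non-negative integer keys, in line with the integer universe assumption.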
The choice of hash function depends on the properties of the keys and the intended use of the hash table. It is crucial to use a function that distributes the keys evenly and keeps collisions low.
Criteria based on which a hash function is chosen:
- A good hash function should distribute the keys uniformly throughout the hash table to keep the number of collisions to a minimum. Ideally, any two distinct keys are equally unlikely to hash to the same position in the table.
- The hash function should be computationally efficient so that hashing and key retrieval are fast.
- It should be difficult to deduce the key from its hash value, so that attempts to guess the key from the hash are unlikely to succeed. This matters mainly when hash values are exposed to untrusted parties, as in cryptographic hashing.
- A hash function should keep performing well as the data changes, for instance when the keys being hashed change in size or format.
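Collisions happen when two or more keys map to the same array index. Separate chaining and open addressing (which includes linear probing, quadratic probing, and double hashing) are common techniques for resolving collisions, and Robin Hood hashing is a refinement of open addressing.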
- Open addressing: Collisions are handled by looking for another empty slot in the table itself. If the slot given by the hash function is already taken, a probe sequence examines further slots until an empty one is found. Variants of this approach include linear probing, quadratic probing, and double hashing.
- Separate chaining: Each slot in the hash table holds a linked list of the entries that hash to it. If two keys hash to the same slot, both are stored in that slot’s list. This method is simple to implement and handles any number of collisions, although long lists slow down lookups.
- Robin Hood hashing: This is a variant of open addressing that evens out probe lengths by swapping keys during insertion. When a new key reaches an occupied slot, the algorithm compares how far each of the two keys is from its ideal slot. If the existing key is closer to its ideal slot than the new key, the new key takes the slot and insertion continues with the displaced key. This tends to shorten the longest probe sequences and reduce the variation in lookup time.
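The three strategies in the list above can be sketched in a few lines each. This is an illustrative sketch, not a production implementation: the names `ChainedHashTable`, `linear_probe_insert`, and `robin_hood_insert` are hypothetical, Python lists stand in for linked lists, the open addressing functions store bare non-negative integer keys with `key % size` as the hash, and `robin_hood_insert` assumes the key is not already present.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, num_slots=8):
        self.slots = [[] for _ in range(num_slots)]

    def put(self, key, value):
        bucket = self.slots[hash(key) % len(self.slots)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.slots[hash(key) % len(self.slots)]:
            if existing_key == key:
                return value
        return default


def linear_probe_insert(table, key):
    """Open addressing with linear probing: return the slot the key lands in."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
    raise RuntimeError("table is full")


def robin_hood_insert(table, key):
    """Robin Hood insertion: a key far from home displaces one that is closer."""
    size = len(table)
    index = key % size
    distance = 0  # how far the key in hand is from its ideal slot
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return
        resident = table[index]
        resident_distance = (index - resident % size) % size
        if resident_distance < distance:
            # The resident is closer to home: swap and carry the resident onward
            table[index], key = key, resident
            distance = resident_distance
        index = (index + 1) % size
        distance += 1
    raise RuntimeError("table is full")


chained = ChainedHashTable(num_slots=4)
chained.put(1, "one")
chained.put(5, "five")  # 1 and 5 both land in slot 1 of a 4-slot table
print(chained.get(5))   # five

probed = [None] * 7
print(linear_probe_insert(probed, 10))  # 3
print(linear_probe_insert(probed, 17))  # 4, since slot 3 is already taken

rh = [None] * 7
for k in (3, 4, 10):
    robin_hood_insert(rh, k)
print(rh)  # [None, None, None, 3, 10, 4, None]
```

In the last example, plain linear probing would place 10 two slots away from its ideal slot. Robin Hood insertion instead moves 4 one slot along, so both 10 and 4 end up one slot from home.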
Dynamic resizing:
Dynamic resizing lets the hash table expand or contract as the number of stored elements changes. When the table is resized, the existing entries are rehashed into the new array. This keeps the load factor in a suitable range and lookups fast.
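A minimal sketch of growth-only resizing, assuming a chained table that doubles its slot count once the load factor exceeds 0.75. The class name `ResizingHashTable`, the doubling policy, and the threshold are illustrative assumptions. Shrinking is omitted for brevity.

```python
class ResizingHashTable:
    """Chained hash table that doubles its slots when the load factor gets too high."""

    def __init__(self, num_slots=4, max_load=0.75):
        self.slots = [[] for _ in range(num_slots)]
        self.count = 0            # number of stored key-value pairs
        self.max_load = max_load  # assumed threshold that triggers growth

    def _bucket(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.slots) > self.max_load:
            self._resize(2 * len(self.slots))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def _resize(self, new_num_slots):
        # Collect every entry, then rehash each one into the larger array
        old_entries = [entry for bucket in self.slots for entry in bucket]
        self.slots = [[] for _ in range(new_num_slots)]
        for key, value in old_entries:
            self._bucket(key).append((key, value))


table = ResizingHashTable()
for n in range(10):
    table.put(n, n * n)
print(len(table.slots))  # 16, after growing from 4 to 8 to 16 slots
print(table.get(7))      # 49
```

Each resize touches every stored entry, so it costs time proportional to the table size. Doubling keeps such resizes infrequent.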
Implementations of Hash Table
Many programming languages, including Python, Java, C++, and Ruby, provide hash tables, usually as part of the standard library. A hash table can also be implemented as a custom data structure.
Example – Count characters in the String “geeksforgeeks”.
In this example, we use a simple hashing technique to store the count of each character in the string: every lowercase letter is mapped to an array index.
C++
#include <iostream>
#include <string>
using namespace std;

int main() {
    // Initialize a string
    string s = "geeksforgeeks";
    // Using an array to store the count of each alphabet
    // by mapping the character to an index value
    int arr[26] = {0};
    // Storing the count
    for (char c : s) {
        arr[c - 'a']++;
    }
    // Search the count of the character
    char ch = 'e';
    // Get count
    cout << "The count of " << ch << " is " << arr[ch - 'a'] << endl;
    return 0;
}
Java
public class CharacterCount {
    public static void main(String[] args) {
        // Initialize a string
        String s = "geeksforgeeks";
        // Using an array to store the count of each alphabet
        // by mapping the character to an index value
        int[] arr = new int[26];
        // Storing the count
        for (int i = 0; i < s.length(); i++) {
            arr[s.charAt(i) - 'a']++;
        }
        // Search the count of the character
        char ch = 'e';
        // Get count
        System.out.println("The count of " + ch + " is " + arr[ch - 'a']);
    }
}
Python
# Initialize a string
s = "geeksforgeeks"
# Using a list to store the count of each alphabet
# by mapping the character to an index value
arr = [0] * 26
# Storing the count
for c in s:
    arr[ord(c) - ord('a')] += 1
# Search the count of the character
ch = 'e'
# Get count
print(f"The count of {ch} is {arr[ord(ch) - ord('a')]}")
C#
using System;

class Program {
    static void Main(string[] args) {
        // Initialize a string
        string s = "geeksforgeeks";
        // Using an array to store the count of each alphabet
        // by mapping the character to an index value
        int[] arr = new int[26];
        // Storing the count
        for (int i = 0; i < s.Length; i++) {
            arr[s[i] - 'a']++;
        }
        // Search the count of the character
        char ch = 'e';
        // Get count
        Console.WriteLine("The count of " + ch + " is " + arr[ch - 'a']);
    }
}
JavaScript
// Initialize a string
const s = "geeksforgeeks";
// Using an array to store the count of each alphabet
// by mapping the character to an index value
const arr = Array(26).fill(0);
// Storing the count
for (let i = 0; i < s.length; i++) {
    arr[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
}
// Search the count of the character
const ch = 'e';
// Get count
console.log(`The count of ${ch} is ${arr[ch.charCodeAt(0) - 'a'.charCodeAt(0)]}`);
Output:
The count of e is 4
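The array-based versions above rely on the keys being lowercase letters, so each character maps directly to one of 26 slots. As a hedged alternative sketch, the same count can be kept in Python's built-in `dict`, which is a general-purpose hash table and places no such restriction on the characters. The variable name `counts` is chosen here for illustration.

```python
s = "geeksforgeeks"

# dict is Python's built-in hash table: keys are characters, values are counts
counts = {}
for c in s:
    counts[c] = counts.get(c, 0) + 1

ch = 'e'
print(f"The count of {ch} is {counts[ch]}")  # The count of e is 4
```

The standard library's `collections.Counter` wraps this same pattern.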
Complexity Analysis of a Hash Table:
For lookup, insertion, and deletion operations, hash tables have an average-case time complexity of O(1). However, in the worst case, for example when many keys collide in the same slot, these operations may require O(n) time, where n is the number of elements in the table.
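The gap between the average and worst case can be illustrated by counting key comparisons during a lookup in a chained table. This is a sketch under stated assumptions: the helper `count_comparisons` is hypothetical, and the two hash functions are deliberately extreme, one spreading the keys perfectly and one sending every key to slot 0.

```python
def count_comparisons(keys, target, hash_fn, num_slots=2048):
    """Build a chained table and count key comparisons needed to find target."""
    slots = [[] for _ in range(num_slots)]
    for key in keys:
        slots[hash_fn(key) % num_slots].append(key)
    comparisons = 0
    for key in slots[hash_fn(target) % num_slots]:
        comparisons += 1
        if key == target:
            break
    return comparisons


keys = list(range(1000))
# Well-spread hash: every key gets its own slot, so one comparison is enough
print(count_comparisons(keys, 999, hash_fn=lambda k: k))  # 1
# Degenerate hash: all keys share slot 0, so the lookup scans the whole chain
print(count_comparisons(keys, 999, hash_fn=lambda k: 0))  # 1000
```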
Applications of Hash Table:
- Hash tables are frequently used for indexing and searching massive volumes of data. A search engine might use a hash table to store the web pages that it has indexed.
- Data is usually cached in memory via hash tables, enabling rapid access to frequently used information.
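- Hash functions, the building block of hash tables, are also frequently used in cryptography to create digital signatures, validate data, and guarantee data integrity. Cryptographic hash functions must meet stronger requirements than the ones typically used for table lookups.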
- Hash tables can be used for implementing database indexes, enabling fast access to data based on key values.
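As one small illustration of the caching use case, a `dict` can remember results of an expensive computation so that repeated requests are served from memory. The function names and the `stats` counter below are hypothetical and exist only to show how often the underlying function actually runs.

```python
cache = {}            # hash table mapping an argument to its computed result
stats = {"calls": 0}  # counts how often the expensive function really runs


def expensive_square(n):
    """Stand-in for a slow computation."""
    stats["calls"] += 1
    return n * n


def cached_square(n):
    """Return the cached result when available, computing it only once."""
    if n not in cache:
        cache[n] = expensive_square(n)
    return cache[n]


print(cached_square(12))  # 144
print(cached_square(12))  # 144, served from the cache
print(stats["calls"])     # 1
```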