
Hash Functions and Types of Hash functions

Last Updated : 20 May, 2024

Hash functions are a fundamental concept in computer science, with applications in data storage, retrieval, and cryptography. In data structures and algorithms (DSA), they are used primarily in hash tables, which support efficient insertion and lookup. This article covers the properties of hash functions and the main types of hash functions used in DSA.

What is a Hash Function?

A hash function takes an input (a key or ‘message’) of arbitrary size and returns a fixed-size value. The output, typically an integer, is called the hash code or hash value. The main purpose of a hash function is to map data of arbitrary size to fixed-size values efficiently; these values are often used as indexes in hash tables.

Key Properties of Hash Functions

  • Deterministic: A hash function must consistently produce the same output for the same input.
  • Fixed Output Size: The output of a hash function should have a fixed size, regardless of the size of the input.
  • Efficiency: The hash function should be able to process input quickly.
  • Uniformity: The hash function should distribute the hash values uniformly across the output space to avoid clustering.
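  • Pre-image Resistance (cryptographic hash functions): It should be computationally infeasible to reverse the hash function, i.e., to find an input that produces a given hash value.
  • Collision Resistance (cryptographic hash functions): It should be computationally infeasible to find two different inputs that produce the same hash value. Collisions always exist, because the input space is larger than the output space.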
  • Avalanche Effect: A small change in the input should produce a significantly different hash value.
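
Several of these properties can be observed directly. The sketch below uses SHA-256 from Python's standard `hashlib` module as the example hash function; the helper names `sha256_hex` and `differing_bits` are illustrative and not part of any library.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha256_hex("hash function")
second = sha256_hex("hash functioN")  # only the last character differs

# Deterministic: hashing the same input again gives the same value.
print(first == sha256_hex("hash function"))  # True

# Fixed output size: 64 hex characters (256 bits) for any input length.
print(len(first), len(sha256_hex("x" * 10_000)))  # 64 64

# Avalanche effect: how many of the 256 output bits changed.
print(differing_bits(first, second))
```

For a hash function with a strong avalanche effect, the last number is expected to be somewhere near half of the output bits.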

Applications of Hash Functions

  • Hash Tables: The most common use of hash functions in DSA is in hash tables, which provide an efficient way to store and retrieve data.
  • Data Integrity: Hash functions are used to verify the integrity of data by generating checksums that can be recomputed and compared later.
  • Cryptography: Cryptographic hash functions such as SHA-256 are used in digital signatures, message authentication, and password storage.
  • Data Structures: Hash functions are utilized in various data structures such as Bloom filters and hash sets.

Types of Hash Functions

Many hash functions exist for numeric and alphanumeric keys. This article discusses the following:

  1. Division Method
  2. Multiplication Method
  3. Mid-Square Method
  4. Folding Method
  5. Cryptographic Hash Functions
  6. Universal Hashing
  7. Perfect Hashing

Let’s begin discussing these methods in detail.

1. Division Method

The division method divides the key by the table size m and uses the remainder as the hash value.

h(k) = k mod m

Where k is the key and m is the number of slots in the table, usually chosen to be a prime number.
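
A minimal sketch of the division method, assuming integer keys. The function name `division_hash` and the sample keys are illustrative only.

```python
def division_hash(key: int, m: int) -> int:
    """Map an integer key to a slot in [0, m) using the remainder."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    return key % m


keys = [10, 20, 30, 40, 50]

# A prime table size spreads these keys over different slots.
print([division_hash(k, 7) for k in keys])   # [3, 6, 2, 5, 1]

# A table size that shares a factor with every key sends them all to slot 0.
print([division_hash(k, 10) for k in keys])  # [0, 0, 0, 0, 0]
```

The second call shows why the choice of m matters: with m = 10, only the last decimal digit of the key influences the result.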

Advantages:

  • Simple to implement.
  • Works well when m is a prime number that is not close to a power of 2.

Disadvantages:

  • Poor distribution if m is not chosen wisely. For example, if m is a power of 2, only the low-order bits of the key affect the hash value.

2. Multiplication Method

In the multiplication method, the key is multiplied by a constant A (0 < A < 1). The fractional part of the product is then multiplied by m, and the result is rounded down to get the hash value.

h(k) = ⌊m (kA mod 1)⌋

Where kA mod 1 is the fractional part of kA and ⌊ ⌋ denotes the floor function.
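
The formula can be sketched as follows. The article does not fix a value for A; the value (√5 − 1)/2 ≈ 0.618 used here is a common textbook suggestion attributed to Knuth and is an assumption of this example, as is the function name `multiplication_hash`.

```python
import math

# Assumed constant: (sqrt(5) - 1) / 2, roughly 0.6180339887.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(key: int, m: int, a: float = A) -> int:
    """Map a non-negative integer key to a slot in [0, m)."""
    fractional = (key * a) % 1.0       # kA mod 1: fractional part of kA
    return math.floor(m * fractional)  # scale to the table size, round down


# m does not need to be prime here; a power of 2 is used for illustration.
print(multiplication_hash(123456, 16384))  # 67
```

Because this sketch relies on floating-point arithmetic, very large keys lose precision in the fractional part; implementations meant for production commonly use fixed-width integer arithmetic instead.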

Advantages:

  • Less sensitive to the choice of m.

Disadvantages:

  • More complex than the division method.

3. Mid-Square Method

In the mid-square method, the key is squared, and the middle digits of the result are taken as the hash value.

Steps:

  1. Square the key.
  2. Extract the middle digits of the squared value.
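
The two steps above can be sketched as below, assuming non-negative integer keys and working on decimal digits. The function name `mid_square_hash` and the rule for picking the "middle" (centered, rounding toward the left) are choices made for this example.

```python
def mid_square_hash(key: int, digits: int) -> int:
    """Square the key and return the middle `digits` decimal digits."""
    if key < 0 or digits <= 0:
        raise ValueError("key must be non-negative and digits positive")
    squared = str(key * key)
    # Pad with leading zeros if the square has fewer digits than requested.
    squared = squared.zfill(digits)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


# 3205 * 3205 = 10272025; the middle two digits are "72".
print(mid_square_hash(3205, 2))  # 72
```

With `digits` decimal digits the hash values fall in the range 0 to 10^digits − 1, so the table size has to be chosen to match.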

Advantages:

  • Typically produces a good distribution of hash values, because the middle digits of the square depend on all digits of the key.

Disadvantages:

  • May require more computational effort, since squaring large keys is costly and can overflow fixed-width integers.

4. Folding Method

The folding method splits the key into parts of equal size (the last part may be shorter), sums the parts, and takes the sum modulo m.

Steps:

  1. Divide the key into parts.
  2. Sum the parts.
  3. Take the sum modulo m.
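
A minimal sketch of these steps, assuming non-negative integer keys that are split into groups of decimal digits from left to right. The name `folding_hash` and the parameter `part_size` are illustrative.

```python
def folding_hash(key: int, part_size: int, m: int) -> int:
    """Split the key's decimal digits into parts, sum them, reduce mod m."""
    if key < 0 or part_size <= 0 or m <= 0:
        raise ValueError("key must be non-negative; part_size and m positive")
    digits = str(key)
    parts = [
        int(digits[i:i + part_size])
        for i in range(0, len(digits), part_size)
    ]
    return sum(parts) % m


# 123456789 -> 123 + 456 + 789 = 1368, and 1368 mod 101 = 55.
print(folding_hash(123456789, 3, 101))  # 55

# 12345 -> 12 + 34 + 5 = 51 (the last part is shorter).
print(folding_hash(12345, 2, 100))      # 51
```

Other partitioning schemes, such as reversing alternate parts before summing, give different hash values for the same key.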

Advantages:

  • Simple and easy to implement.

Disadvantages:

  • Depends on the choice of partitioning scheme.

5. Cryptographic Hash Functions

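Cryptographic hash functions are designed to withstand deliberate attacks and are used in cryptography. Examples include MD5, SHA-1, and SHA-256. Practical collision attacks are known against MD5 and SHA-1, so they should not be used where collision resistance matters.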
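
As an illustration, the sketch below uses SHA-256 from Python's standard `hashlib` module to check whether data has changed since its digest was recorded. The helper names `sha256_digest` and `verify` and the sample payload are assumptions made for this example.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Check data against a previously recorded digest."""
    # compare_digest compares in a way that resists timing attacks.
    return hmac.compare_digest(sha256_digest(data), expected_hex)


original = b"important payload"
checksum = sha256_digest(original)

print(verify(original, checksum))              # True
print(verify(b"important payl0ad", checksum))  # False
```

A plain digest like this detects accidental corruption; protecting against deliberate tampering also requires that the recorded digest itself cannot be replaced by an attacker.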

Characteristics:

  • Pre-image resistance.
  • Second pre-image resistance.
  • Collision resistance.

Advantages:

  • High security.

Disadvantages:

  • Computationally intensive.

6. Universal Hashing

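Universal hashing picks a hash function at random from a family of functions, so that for any two distinct keys the probability of a collision is at most 1/m, whatever the set of inputs.

h(k) = ((a·k + b) mod p) mod m

Where k is the key, m is the number of slots, p is a prime number larger than every possible key, and a and b are constants chosen at random with 1 ≤ a ≤ p − 1 and 0 ≤ b ≤ p − 1.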
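
A minimal sketch of drawing one function from this family, assuming keys are non-negative integers smaller than the prime. The prime 2^31 − 1 and the names `P` and `make_universal_hash` are choices made for this example.

```python
import random

# Assumed prime: 2**31 - 1. Keys are assumed to lie in the range [0, P).
P = 2_147_483_647


def make_universal_hash(m: int, p: int = P, rng: random.Random = None):
    """Return a randomly chosen h(k) = ((a*k + b) mod p) mod m."""
    rng = rng or random.Random()
    a = rng.randrange(1, p)  # a is drawn from 1 .. p-1
    b = rng.randrange(0, p)  # b is drawn from 0 .. p-1

    def h(key: int) -> int:
        return ((a * key + b) % p) % m

    return h


h = make_universal_hash(10)
print([h(k) for k in (15, 25, 35, 45)])  # slots in 0..9; varies per run
```

The values of a and b are fixed once when the table is created and reused for every key; a new pair is drawn only when the table is rebuilt.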

Advantages:

  • Keeps the expected number of collisions low for every input set, because no fixed set of keys is bad for all functions in the family.

Disadvantages:

  • Requires more computation and storage.

7. Perfect Hashing

Perfect hashing aims to create a collision-free hash function for a static set of keys. It guarantees that no two keys will hash to the same value.

Types:

  • Minimal Perfect Hashing: Ensures that the range of the hash function is equal to the number of keys.
  • Non-minimal Perfect Hashing: The range may be larger than the number of keys.
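
One simple way to obtain a non-minimal perfect hash function for a small static key set is to keep drawing random parameters until no two keys collide. The sketch below assumes non-negative integer keys smaller than the prime 2^31 − 1 and uses a table with n² slots for n keys, so that a random draw is collision-free with good probability. It is an illustration under those assumptions, not the only construction; the names `build_perfect_hash` and `perfect_hash` are hypothetical.

```python
import random

P = 2_147_483_647  # assumed prime (2**31 - 1); keys must lie in [0, P)


def build_perfect_hash(keys, seed=None, max_tries=1000):
    """Search for (a, b, m) such that no two keys share a slot."""
    distinct = sorted(set(keys))
    if any(k < 0 or k >= P for k in distinct):
        raise ValueError("keys must lie in the range [0, P)")
    m = max(1, len(distinct) ** 2)  # quadratic table size (non-minimal)
    rng = random.Random(seed)
    for _ in range(max_tries):
        a = rng.randrange(1, P)
        b = rng.randrange(0, P)
        slots = {((a * k + b) % P) % m for k in distinct}
        if len(slots) == len(distinct):  # every key got its own slot
            return a, b, m
    raise RuntimeError("no collision-free parameters found")


def perfect_hash(key: int, params) -> int:
    """Evaluate the hash function described by params = (a, b, m)."""
    a, b, m = params
    return ((a * key + b) % P) % m


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
params = build_perfect_hash(keys)
slots = [perfect_hash(k, params) for k in keys]
print(len(set(slots)) == len(keys))  # True: all slots are distinct
```

The guarantee holds only for the keys supplied at construction time; a key outside that set may land in an occupied slot.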

Advantages:

  • No collisions.

Disadvantages:

  • Complex to construct.

Conclusion

In conclusion, hash functions are very important tools that help store and find data quickly. Knowing the different types of hash functions and how to use them correctly is key to making software work better and more securely. By choosing the right hash function for the job, developers can greatly improve the efficiency and reliability of their systems.


