
Count distinct substrings of a string using Rabin Karp algorithm

Last Updated : 04 Oct, 2023

Given a string, return the number of distinct substrings using Rabin Karp Algorithm.

Examples

Input  : str = “aba”
Output : 5
Explanation :
The 5 distinct substrings are "a", "ab", "aba", "b", "ba".
Input : str = “abcd”
Output : 10
Explanation :
The 10 distinct substrings are "a", "ab", "abc", "abcd", "b", "bc", "bcd", "c", "cd", "d".

Approach:

Prerequisite: Rabin-Karp Algorithm for Pattern Searching

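For every starting index, extend the substring one character at a time and update its hash value.
Store each hash in a dictionary/map, and count a substring only when its hash has not been seen before.
Since only hashes are compared, two different substrings that happen to share a hash would be counted once; a large prime modulus makes this unlikely.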

To compute the hash (rolling hash) as done in Rabin-Karp algorithm follow:

The hash function suggested by Rabin and Karp maps a string to an integer: the string is read as a number written in base d, where d is the size of the alphabet. For example, if the alphabet consists of the digits 0 to 9, the numeric value of “122” is 122. In practice the alphabet is larger than 10 (typically 256) and the strings can be long, so these numeric values do not fit in an integer variable. The value is therefore computed using modular arithmetic, which keeps every hash small enough to fit in a machine word. To rehash when the window slides by one position, we remove the most significant digit and append the new least significant digit. Rehashing is done using the following formula.

hash( txt[s+1 .. s+m] ) = ( d ( hash( txt[s .. s+m-1]) – txt[s]*h ) + txt[s + m] ) mod q

hash( txt[s .. s+m-1] ) : Hash value at shift s.
hash( txt[s+1 .. s+m] ): Hash value at next shift (or shift s+1)
d: Number of characters in the alphabet
q: A prime number
h: d^(m-1)
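
As a quick check of the rehashing formula, the sketch below computes the hash of every window of length m twice, once from scratch and once by rolling, and compares the results. The helper names `direct_hash` and `rolling_hashes` are hypothetical, and the values d = 256 and q = 9007199254740881 are assumptions taken from the code later in this article.

```python
# Sketch: verify the rehashing formula against a direct computation.
# direct_hash and rolling_hashes are hypothetical helper names.

D = 256                      # alphabet size d (assumed: extended ASCII)
Q = 9007199254740881         # modulus q, same value as used in this article


def direct_hash(window):
    """Hash a string from scratch, most significant character first."""
    value = 0
    for ch in window:
        value = (value * D + ord(ch)) % Q
    return value


def rolling_hashes(txt, m):
    """Return the hash of every length-m window of txt using rehashing."""
    if m <= 0 or m > len(txt):
        return []
    h = pow(D, m - 1, Q)                 # h = d^(m-1) mod q
    current = direct_hash(txt[:m])       # hash at shift 0
    hashes = [current]
    for s in range(len(txt) - m):
        # drop txt[s], shift by one digit, append txt[s + m]
        current = (D * (current - ord(txt[s]) * h) + ord(txt[s + m])) % Q
        hashes.append(current)
    return hashes


if __name__ == "__main__":
    text = "abracadabra"
    m = 3
    rolled = rolling_hashes(text, m)
    direct = [direct_hash(text[s:s + m]) for s in range(len(text) - m + 1)]
    print(rolled == direct)   # True
```

In Python the `%` operator returns a non-negative result for a positive modulus, so the subtraction needs no correction. In C++ or Java the intermediate value can be negative, and q would have to be added before taking the remainder.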

The idea is the same as evaluating a number digit by digit. For example, in the string “1234”, once the value of the substring “12” has been computed as 12, the value of the substring “123” follows as 12*10 + 3 = 123. The implementations below apply the same step, pre = (pre*D + s[j]) mod q, each time a substring is extended by one character.
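
The append step can be sketched in a few lines. This is a minimal illustration rather than the article's implementation: `extend_hash` and `hashes_from` are hypothetical names, and the default base and modulus are assumed to match the values used in the code in this article.

```python
# Sketch: extend a hash by one symbol, as in 12 -> 12*10 + 3 = 123.
# extend_hash and hashes_from are hypothetical helper names.

MOD = 9007199254740881   # modulus (assumed, same value as in this article)


def extend_hash(prev, symbol, base, mod=MOD):
    """Return the hash of the string obtained by appending one symbol."""
    return (prev * base + symbol) % mod


def hashes_from(s, start, base=256, mod=MOD):
    """Return the hashes of s[start..j] for every j >= start."""
    result = []
    pre = 0
    for j in range(start, len(s)):
        pre = extend_hash(pre, ord(s[j]), base, mod)
        result.append(pre)
    return result


if __name__ == "__main__":
    # Decimal example from the text: digits of "1234" in base 10.
    pre = 0
    values = []
    for ch in "1234":
        pre = extend_hash(pre, int(ch), 10)
        values.append(pre)
    print(values)                    # [1, 12, 123, 1234]

    # Character example in base 256: hashes of "a" and "ab".
    print(hashes_from("ab", 0))      # [97, 24930]
```

Each call to `extend_hash` costs O(1), which is why all substrings starting at a given index can be hashed in O(N) total.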
 

C++




#include <bits/stdc++.h>
using namespace std;
 
// Driver code
int main()
{
  int t = 1;
 
  // prime modulus (2^53 - 111) that keeps hash values bounded
  long long mod = 9007199254740881;
 
  for(int i = 0; i < t; i++)
  {
 
    // string to check number of distinct substring
    string s = "abcd";
 
    // to store substrings
    vector<vector<long long>>l;
 
    // to store hash values by Rabin Karp algorithm
    unordered_map<long long,int>d;
 
    for(int i=0;i<s.length();i++){
      long long pre = 0;
 
      // Number of input alphabets
      long long D = 256;
 
      for(int j=i;j<s.length();j++){
 
        // extend the hash of s[i..j-1] by s[j] to get the hash of s[i..j]
        pre = (pre*D+s[j]) % mod;
 
        // record the substring's indices if its hash has not been seen
        if(d.find(pre) == d.end())
          l.push_back({i, j});
        d[pre] = 1;
      }
    }
 
    // resulting length
    cout<<l.size()<<endl;
 
    // resulting distinct substrings
    for(int i = 0; i < l.size(); i++)
      cout << s.substr(l[i][0],l[i][1]+1-l[i][0]) << " ";
  }
}
 
// This code is contributed by shinjanpatra


Java




import java.util.*;
 
public class Main {
    public static void main(String[] args) {
        int t = 1;
        // prime modulus (2^53 - 111) that keeps hash values bounded
        long mod = 9007199254740881L;
 
        for (int i = 0; i < t; i++) {
 
            // string to check number of distinct substring
            String s = "abcd";
 
            // to store substrings
            List<List<Integer>> l = new ArrayList<>();
 
            // to store hash values by Rabin Karp algorithm
            Map<Long, Integer> d = new HashMap<>();
 
            for (int j = 0; j < s.length(); j++) {
                long pre = 0;
 
                // Number of input alphabets
                int D = 256;
 
                for (int k = j; k < s.length(); k++) {
 
                    // calculate new hash value by adding next element
                    pre = (pre*D + (long)s.charAt(k)) % mod;
 
                    // record the substring's indices if its hash has not been seen
                    if (!d.containsKey(pre)) {
                        List<Integer> sublist = new ArrayList<>();
                        sublist.add(j);
                        sublist.add(k);
                        l.add(sublist);
                    }
                    d.put(pre, 1);
                }
            }
 
            // resulting length
            System.out.println(l.size());
 
            // resulting distinct substrings
            for (int j = 0; j < l.size(); j++) {
                int start = l.get(j).get(0);
                int end = l.get(j).get(1);
                System.out.print(s.substring(start, end+1) + " ");
            }
        }
    }
}


Python3




t = 1
# prime modulus (2^53 - 111) that keeps hash values bounded
mod = 9007199254740881
 
for ___ in range(t):
 
    # string whose distinct substrings are counted
    s = 'abcd'
 
    # [start, end] index pairs of the distinct substrings found
    l = []
 
    # hash values seen so far
    d = {}
 
    for i in range(len(s)):
        pre = 0
 
        # Number of input alphabets
        D = 256
 
        for j in range(i, len(s)):
 
            # extend the hash of s[i..j-1] by s[j]
            pre = (pre*D+ord(s[j])) % mod
 
            # record the substring if its hash has not been seen
            if pre not in d:
                l.append([i, j])
            d[pre] = 1
 
    # number of distinct substrings
    print(len(l))
 
    # the distinct substrings themselves
    for i in range(len(l)):
        print(s[l[i][0]:l[i][1]+1], end=" ")


C#




using System;
using System.Collections.Generic;
 
class GFG {
static void Main()
{
int t = 1;
 
 
    // prime modulus (2^53 - 111) that keeps hash values bounded
    long mod = 9007199254740881;
 
    for (int i = 0; i < t; i++)
    {
        // string to check number of distinct substring
        string s = "abcd";
 
        // to store substrings
        List<List<long>> l = new List<List<long>>();
 
        // to store hash values by Rabin Karp algorithm
        Dictionary<long, int> d = new Dictionary<long, int>();
 
        for (int j = 0; j < s.Length; j++)
        {
            long pre = 0;
 
            // Number of input alphabets
            long D = 256;
 
            for (int k = j; k < s.Length; k++)
            {
                // calculate new hash value by adding next element
                pre = (pre * D + s[k]) % mod;
 
                // record the substring's indices if its hash has not been seen
                if (!d.ContainsKey(pre))
                {
                    List<long> sub = new List<long>();
                    sub.Add(j);
                    sub.Add(k);
                    l.Add(sub);
                }
                d[pre] = 1;
            }
        }
 
        // resulting length
         
        Console.WriteLine(l.Count);
        
 
        // resulting distinct substrings
        for (int j = 0; j < l.Count; j++)
        {
            Console.Write(s.Substring((int)l[j][0], (int)l[j][1] + 1 - (int)l[j][0]) + " ");
        }
    }
}
}
//This code is contributed by rudra1807raj


Javascript




<script>
 
let t = 1
 
// prime modulus (2^53 - 111); BigInt keeps pre*D exact,
// since the product can exceed Number.MAX_SAFE_INTEGER
let mod = 9007199254740881n
 
for(let i = 0; i < t; i++){
    // string to check number of distinct substring
    let s = 'abcd'
 
    // to store substrings
    let l = []
 
    // to store hash values by Rabin Karp algorithm
    let d = new Map()
 
    for(let i=0;i<s.length;i++){
        let pre = 0n
 
        // Number of input alphabets
        let D = 256n
 
        for(let j=i;j<s.length;j++){
 
            // calculate new hash value by adding next element
            pre = (pre*D+BigInt(s.charCodeAt(j))) % mod
 
            // record the substring if its hash has not been seen
            if(d.has(pre) == false)
                l.push([i, j])
            d.set(pre , 1)
        }
    }
 
    // resulting length
    document.write(l.length,"<br>")
 
    // resulting distinct substrings
    for(let i = 0; i < l.length; i++)
        document.write(s.substring(l[i][0],l[i][1]+1)," ")
}
 
// This code is contributed by shinjanpatra
 
</script>


Output

10
a ab abc abcd b bc bcd c cd d 





Time Complexity: O(N^2), where N is the length of the string
Auxiliary Space: O(N^2), since up to N*(N+1)/2 hash values and index pairs are stored
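
The code above keeps a single map for substrings of every length. One possible variation, sketched here under assumptions and not taken from the original implementation, applies the rehashing formula one length at a time, so that hashes are compared only among substrings of equal length and only O(N) hashes are held at once. The name `count_distinct_by_length` is hypothetical, and characters are assumed to have codes below 256. Like any hash-only method, it can undercount if two different substrings collide.

```python
# Sketch: count distinct substrings with a sliding-window (rolling) hash,
# processing one substring length at a time.
# count_distinct_by_length is a hypothetical name.

D = 256                      # alphabet size (assumed)
Q = 9007199254740881         # modulus, same value as used in this article


def count_distinct_by_length(s):
    n = len(s)
    total = 0
    for m in range(1, n + 1):
        h = pow(D, m - 1, Q)             # weight of the leading character

        # hash of the first window s[0..m-1]
        current = 0
        for ch in s[:m]:
            current = (current * D + ord(ch)) % Q

        seen = {current}
        for start in range(n - m):
            # slide the window: drop s[start], append s[start + m]
            current = (D * (current - ord(s[start]) * h)
                       + ord(s[start + m])) % Q
            seen.add(current)

        total += len(seen)               # distinct hashes of length m
    return total


if __name__ == "__main__":
    print(count_distinct_by_length("aba"))    # 5
    print(count_distinct_by_length("abcd"))   # 10
```

The running time stays O(N^2), but the auxiliary space drops to O(N) because the set is rebuilt for each length.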

Another Approach:

  1. Define an input string “s”.
  2. Create an empty unordered set “substrings” to store the distinct substrings.
  3. Use two nested loops to iterate over all possible substrings of “s”. The outer loop iterates over the starting index of each substring.
    The inner loop iterates over the ending index of each substring, starting from the starting index.
  4. Use the “substr” method of the input string “s” to extract each substring and insert it into the “substrings” set.
    The “substr” method takes two arguments: the starting index of the substring and the length of the substring.
    The length of the substring is computed as “j – i + 1”.
  5. After all substrings have been added to the set, output the size of the set to get the number of distinct substrings.

Below is the implementation of the above approach:

C++




#include <iostream>
#include <unordered_set>
#include <string>
 
using namespace std;
 
int main() {
    // Input string
    string s = "abcd";
 
    // Set to store distinct substrings
    unordered_set<string> substrings;
 
    // Iterate over all possible substrings and add them to the set
    for (int i = 0; i < s.size(); i++) {
        for (int j = i; j < s.size(); j++) {
            substrings.insert(s.substr(i, j - i + 1));
        }
    }
//This code is contributed rudra1807raj
    // Output the number of distinct substrings
    cout << substrings.size() << endl;
}


Java




import java.util.*;
 
public class GFG {
    public static void main(String[] args) {
        // Input string
        String s = "abcd";
 
        // Set to store distinct substrings
        Set<String> substrings = new HashSet<>();
 
        // Iterate over all possible substrings and add them to the set
        for (int i = 0; i < s.length(); i++) {
            for (int j = i; j < s.length(); j++) {
                substrings.add(s.substring(i, j + 1));
            }
        }
 
        // Output the number of distinct substrings
        System.out.println(substrings.size());
    }
}
// This code is contributed by rudra1807raj


Python




def main():
    # Input string
    s = "abcd"
 
    # Set to store distinct substrings
    substrings = set()
 
    # Iterate over all possible substrings and add them to the set
    for i in range(len(s)):
        for j in range(i, len(s)):
            substrings.add(s[i:j + 1])
 
    # Output the number of distinct substrings
    print(len(substrings))
 
if __name__ == "__main__":
    main()


C#




using System;
using System.Collections.Generic;
 
class GFG
{
    static void Main(string[] args)
    {  
        // Input string
        string s = "abcd";
         
        // Set to store distinct substrings
        HashSet<string> substrings = new HashSet<string>();
         
        // Iterate over all possible substrings and add them to the set
        for (int i = 0; i < s.Length; i++){
           for (int j = i; j < s.Length; j++){
              substrings.Add(s.Substring(i, j - i + 1));
            }
          }
        // Output the number of distinct substrings
        Console.WriteLine(substrings.Count);
    }
}


Javascript




<script>
function countDistinctSubstrings(s) {
    const substrings = new Set();
 
    // Iterate over all possible substrings and add them to the set
    for (let i = 0; i < s.length; i++) {
        for (let j = i; j < s.length; j++) {
            substrings.add(s.substring(i, j + 1));
        }
    }
 
    // Output the number of distinct substrings
    return substrings.size;
}
 
// Input string
const s = "abcd";
 
// Get the number of distinct substrings
const distinctSubstringsCount = countDistinctSubstrings(s);
document.write(distinctSubstringsCount);
</script>


Output

10





Time complexity: O(n^3), where n is the length of the input string “s”. There are O(n^2) substrings, and extracting and hashing each one takes O(n) time.
Auxiliary Space: O(n^3) in the worst case, where n is the length of the input string “s”. The set can hold O(n^2) distinct substrings, each of length up to n.
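
Because the set-based count is exact, it can serve as a reference when testing the hash-based count, which is probabilistic. The sketch below compares the two on short random strings. The function names are hypothetical, and the base and modulus are assumed to match the first approach. The strings are at most 6 characters long, so every hash stays below the modulus and the two counts are guaranteed to agree; on longer inputs a collision is possible in principle.

```python
# Sketch: cross-check the hash-based count against the exact set-based count.
# count_with_hashes and count_with_set are hypothetical names.
import random

BASE = 256
MOD = 9007199254740881


def count_with_hashes(s):
    """Count distinct substrings by their polynomial hashes."""
    seen = set()
    for i in range(len(s)):
        pre = 0
        for j in range(i, len(s)):
            pre = (pre * BASE + ord(s[j])) % MOD
            seen.add(pre)
    return len(seen)


def count_with_set(s):
    """Count distinct substrings exactly by storing each one."""
    return len({s[i:j + 1] for i in range(len(s)) for j in range(i, len(s))})


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed for repeatable runs
    for _ in range(100):
        length = rng.randint(0, 6)
        s = "".join(rng.choice("ab") for _ in range(length))
        assert count_with_hashes(s) == count_with_set(s), s
    print("counts agree on all sampled strings")
```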


