Python – All substrings Frequency in String
Last Updated : 16 May, 2023
Given a string, extract every unique substring together with the number of times it occurs, counting overlapping occurrences.
Input : test_str = “ababa”
Output : {‘a’: 3, ‘ab’: 2, ‘aba’: 2, ‘abab’: 1, ‘ababa’: 1, ‘b’: 2, ‘ba’: 2, ‘bab’: 1, ‘baba’: 1}
Explanation : Every distinct substring is listed with its frequency. Overlapping occurrences are counted, so 'aba' (at indices 0 and 2) has frequency 2.
Input : test_str = “GFGF”
Output : {‘G’: 2, ‘GF’: 2, ‘GFG’: 1, ‘GFGF’: 1, ‘F’: 2, ‘FG’: 1, ‘FGF’: 1}
Explanation : Every distinct substring is listed with its frequency; for example, 'GF' occurs at indices 0 and 2.
Method #1: Using count() method
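First, generate all the substrings, then use the count() method to find the frequency of each one and store it in a dictionary. Finally, print the dictionary. Note that str.count() counts non-overlapping occurrences only, so for a substring that can overlap with itself (such as 'aba' in "abababa") this method reports a lower frequency than the position-based methods that follow.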
Python3
# Initialize the string
test_str = "abababa"
print("The original string is : " + str(test_str))
# Generate every substring test_str[idx:j]
temp = [test_str[idx:j] for idx in range(len(test_str))
        for j in range(idx + 1, len(test_str) + 1)]
# str.count() counts non-overlapping occurrences only
d = dict()
for i in temp:
    d[i] = test_str.count(i)
print("Extracted frequency dictionary : " + str(d))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 2, 'abab': 1, 'ababa': 1, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 1, 'baba': 1, 'babab': 1, 'bababa': 1}
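Note that 'aba' appears with frequency 2 in the output above even though it starts at indices 0, 2 and 4, because str.count() skips past each match before searching again. As a minimal sketch (the helper name count_overlapping is hypothetical and not part of the original method), overlapping occurrences can be counted by restarting the search one position after each match:

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Restart one character after the start of the previous match
        start = text.find(sub, start + 1)
    return count


test_str = "abababa"
print(test_str.count("aba"))               # non-overlapping count
print(count_overlapping(test_str, "aba"))  # overlapping count
```

Swapping test_str.count(i) for a helper like this would make Method #1 agree with the position-based methods, at the cost of an extra scan per substring.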
Method #2: Using loop + list comprehension
This method combines a list comprehension with a loop: the list comprehension extracts all the substrings, and the loop then increments the count of each one in a dictionary. Because every substring is visited once per starting position, overlapping occurrences are counted.
Python3
# Initialize the string
test_str = "abababa"
print("The original string is : " + str(test_str))
# List comprehension to extract all substrings
temp = [test_str[idx:j] for idx in range(len(test_str))
        for j in range(idx + 1, len(test_str) + 1)]
# Loop to count each occurrence
res = {}
for sub in temp:
    if sub not in res:
        res[sub] = 1
    else:
        res[sub] += 1
print("Extracted frequency dictionary : " + str(res))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 3, 'abab': 2, 'ababa': 2, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 2, 'baba': 2, 'babab': 1, 'bababa': 1}
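The same counting loop can also be expressed with collections.Counter from the standard library. The following is a sketch of that alternative, not the method shown above; the function name substring_counter is an assumption made for illustration:

```python
from collections import Counter


def substring_counter(text):
    """Count every substring of text, one count per starting position."""
    n = len(text)
    # Counter consumes the generator and tallies each substring it yields
    return Counter(text[i:j] for i in range(n) for j in range(i + 1, n + 1))


res = substring_counter("ababa")
print("Extracted frequency dictionary : " + str(dict(res)))
```

Counter is a dict subclass, so the result can be used wherever the plain dictionary was used, and it adds helpers such as most_common().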
Method #3: Using list comprehension
This is another way to perform the task. A single loop iterates directly over the list comprehension that extracts the substrings, and a conditional expression updates each count in one line.
Python3
# Initialize the string
test_str = "abababa"
print("The original string is : " + str(test_str))
res = dict()
# Iterate over all substrings and update each count with a conditional expression
for ele in [test_str[idx:j] for idx in range(len(test_str))
            for j in range(idx + 1, len(test_str) + 1)]:
    res[ele] = 1 if ele not in res else res[ele] + 1
print("Extracted frequency dictionary : " + str(res))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 3, 'abab': 2, 'ababa': 2, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 2, 'baba': 2, 'babab': 1, 'bababa': 1}
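Time Complexity: O(n^3) in the worst case. There are O(n^2) substrings, and slicing and hashing each one costs up to O(n).
Auxiliary Space: O(n^3) in the worst case, since up to O(n^2) distinct substrings of length up to n are stored.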
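As a rough check on how much work the substring extraction does, the sketch below counts the slices produced by the list comprehension and the characters they contain. For a string of length n there are n(n+1)/2 slices holding n(n+1)(n+2)/6 characters in total. The helper name substring_stats is hypothetical and used only for this illustration:

```python
def substring_stats(n):
    """Return (number of slices, total characters) for a string of length n."""
    text = "x" * n
    slices = [text[i:j] for i in range(n) for j in range(i + 1, n + 1)]
    return len(slices), sum(len(s) for s in slices)


n = 7  # length of "abababa"
count, chars = substring_stats(n)
print(count, n * (n + 1) // 2)            # measured vs. closed form
print(chars, n * (n + 1) * (n + 2) // 6)  # measured vs. closed form
```

The character total grows cubically with n, which is what dominates both the slicing time and, when most substrings are distinct, the memory held by the dictionary.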
Method #4: Using regex + findall() method
Step-by-step algorithm:
- Initialize a dictionary 'd' to store substring frequencies.
- Loop over i in range(1, len(test_str) + 1).
- For each i, use re.findall() with a lookahead pattern to find all substrings of length i, including overlapping ones.
- For each substring 'sub', update its frequency in the dictionary 'd'.
- Print the dictionary 'd' with the substring frequencies.
Python3
import re
# Initialize the string
test_str = "abababa"
print("The original string is : " + str(test_str))
d = {}
for i in range(1, len(test_str) + 1):
    # The lookahead (?=...) consumes no characters, so matches may overlap
    for sub in re.findall('(?=(.{' + str(i) + '}))', test_str):
        d[sub] = d.get(sub, 0) + 1
print("Extracted frequency dictionary : " + str(d))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'b': 3, 'ab': 3, 'ba': 3, 'aba': 3, 'bab': 2, 'abab': 2, 'baba': 2, 'ababa': 2, 'babab': 1, 'ababab': 1, 'bababa': 1, 'abababa': 1}
Time complexity: O(n^3) in the worst case, where n is the length of the input string. The outer loop runs n times, each findall() call tries O(n) starting positions, and capturing a substring of length i at each position costs O(i).
Auxiliary Space: O(n^3) in the worst case, since up to O(n^2) distinct substrings of length up to n are stored in the dictionary.
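To see why the lookahead matters, the sketch below compares a plain pattern with the lookahead form on the same string. It also wraps the method in a function that passes re.DOTALL, since "." does not match a newline by default and substrings containing "\n" would otherwise be missed. The function name regex_substring_freq and the use of re.DOTALL are assumptions added for illustration, not part of the original code:

```python
import re


def regex_substring_freq(text):
    """Count all substrings of text using a lookahead pattern per length."""
    freq = {}
    for size in range(1, len(text) + 1):
        # The lookahead consumes nothing, so a match can start at every index.
        # re.DOTALL lets "." match newline characters as well.
        pattern = re.compile("(?=(.{%d}))" % size, re.DOTALL)
        for sub in pattern.findall(text):
            freq[sub] = freq.get(sub, 0) + 1
    return freq


print(re.findall(".{2}", "abababa"))        # plain pattern: non-overlapping matches
print(re.findall("(?=(.{2}))", "abababa"))  # lookahead: one match per start index
print(regex_substring_freq("GFGF"))
```

The plain pattern consumes two characters per match and so skips the 'ba' pairs entirely, while the lookahead version reports a substring for every valid starting index.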
Method #5: Using a sliding window with a dictionary
Step-by-step approach:
- Initialize an empty dictionary freq_dict to keep track of the substring frequencies.
- Initialize a variable n to the length of the given string test_str.
- Loop over i in range(n):
  - Set window_size to i + 1.
  - Loop over j in range(n - window_size + 1):
    - Set substring to the slice of test_str that starts at index j and has length window_size.
    - If substring is already in freq_dict, increment its value by 1. Otherwise, add it to freq_dict with a value of 1.
- Print freq_dict.
Python3
# Initialize the string
test_str = "abababa"
print("The original string is : " + str(test_str))
freq_dict = {}
n = len(test_str)
for i in range(n):
    window_size = i + 1
    # Slide a window of the current size across the string
    for j in range(n - window_size + 1):
        substring = test_str[j:j + window_size]
        freq_dict[substring] = freq_dict.get(substring, 0) + 1
print("Extracted frequency dictionary : " + str(freq_dict))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'b': 3, 'ab': 3, 'ba': 3, 'aba': 3, 'bab': 2, 'abab': 2, 'baba': 2, 'ababa': 2, 'babab': 1, 'ababab': 1, 'bababa': 1, 'abababa': 1}
Time complexity: O(n^3) in the worst case. The two nested loops visit O(n^2) windows, and slicing each window costs up to O(n).
Auxiliary space: O(n^3) in the worst case, since up to O(n^2) distinct substrings of length up to n are stored in the dictionary.
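For reuse, the sliding-window approach can be wrapped in a function and checked against the two examples given at the top of this article. This is a minimal sketch; the function name substring_frequencies is hypothetical:

```python
def substring_frequencies(text):
    """Return a dict mapping each substring of text to its frequency."""
    freq = {}
    n = len(text)
    for window_size in range(1, n + 1):
        # Slide a window of the current size across the string
        for start in range(n - window_size + 1):
            sub = text[start:start + window_size]
            freq[sub] = freq.get(sub, 0) + 1
    return freq


expected_ababa = {'a': 3, 'ab': 2, 'aba': 2, 'abab': 1, 'ababa': 1,
                  'b': 2, 'ba': 2, 'bab': 1, 'baba': 1}
expected_gfgf = {'G': 2, 'GF': 2, 'GFG': 1, 'GFGF': 1,
                 'F': 2, 'FG': 1, 'FGF': 1}

# Dictionary comparison ignores key order, so only the counts must agree
print(substring_frequencies("ababa") == expected_ababa)
print(substring_frequencies("GFGF") == expected_gfgf)
```

Both comparisons should print True, since the window visits every starting index and therefore counts overlapping occurrences.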