
Python – All substrings Frequency in String

Last Updated : 16 May, 2023

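Given a string, extract all of its unique substrings along with the number of times each one occurs. Overlapping occurrences count separately: in "ababa", the substring "aba" occurs twice, starting at indices 0 and 2.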

Input : test_str = "ababa"

Output : {'a': 3, 'ab': 2, 'aba': 2, 'abab': 1, 'ababa': 1, 'b': 2, 'ba': 2, 'bab': 1, 'baba': 1}

Explanation : All substrings are extracted with their frequencies.

Input : test_str = "GFGF"

Output : {'G': 2, 'GF': 2, 'GFG': 1, 'GFGF': 1, 'F': 2, 'FG': 1, 'FGF': 1}

Explanation : All substrings are extracted with their frequencies.

Method #1: Using count() method

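First, generate all the substrings with slicing, then use the str.count() method to find how often each one occurs and store the result in a dictionary. Finally, print the dictionary. Note that str.count() counts only non-overlapping occurrences, so for substrings that can overlap themselves (such as "aba" in "abababa") this method reports lower counts than the other methods; compare its output with that of Method #2.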

Python3




# Python3 code to demonstrate working of
# All substrings Frequency in String
# Using count()

# initializing string
test_str = "abababa"

# printing original string
print("The original string is : " + str(test_str))

# list comprehension to extract all substrings
temp = [test_str[idx: j] for idx in range(len(test_str)) for j in range(idx + 1, len(test_str) + 1)]

# str.count() returns the number of NON-overlapping occurrences
d = dict()
for sub in temp:
    d[sub] = test_str.count(sub)

# printing result
print("Extracted frequency dictionary : " + str(d))


Output

The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 2, 'abab': 1, 'ababa': 1, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 1, 'baba': 1, 'babab': 1, 'bababa': 1}
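Because str.count() skips overlapping matches, Method #1 reports 'aba': 2 for "abababa" even though "aba" starts at indices 0, 2 and 4. A minimal sketch of an overlapping counter that only needs str.find() is shown below; count_overlapping is a hypothetical helper introduced here, not a built-in. It restarts each search one character after the previous match began, and plugging it into the Method #1 loop gives the same counts as the methods that follow.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing them to overlap."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # restart the search one character after the last match began
        start = text.find(sub, start + 1)
    return count


test_str = "abababa"
print(test_str.count("aba"))               # 2 (non-overlapping)
print(count_overlapping(test_str, "aba"))  # 3 (overlapping)

# Method #1 with the overlapping counter swapped in for str.count()
d = {}
n = len(test_str)
for i in range(n):
    for j in range(i + 1, n + 1):
        sub = test_str[i:j]
        if sub not in d:  # count each distinct substring only once
            d[sub] = count_overlapping(test_str, sub)
print(d["aba"], d["abab"])  # 3 2
```

Each call to count_overlapping rescans the whole string, so this variant is slower than simply counting every generated slice, as Methods #2 to #5 do.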

Method #2: Using loop + list comprehension

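The combination of a list comprehension and a loop can be used to solve this problem. First, extract all the substrings using a list comprehension; then loop over them and increment each substring's count in a dictionary. Because every (start, end) slice is generated exactly once, overlapping occurrences are counted correctly.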

Python3




# Python3 code to demonstrate working of
# All substrings Frequency in String
# Using loop + list comprehension

# initializing string
test_str = "abababa"

# printing original string
print("The original string is : " + str(test_str))

# list comprehension to extract all substrings (one per start/end pair)
temp = [test_str[idx: j] for idx in range(len(test_str))
        for j in range(idx + 1, len(test_str) + 1)]

# loop to count how many times each substring was generated
res = {}
for sub in temp:
    if sub not in res:
        res[sub] = 1
    else:
        res[sub] += 1

# printing result
print("Extracted frequency dictionary : " + str(res))


Output

The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 3, 'abab': 2, 'ababa': 2, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 2, 'baba': 2, 'babab': 1, 'bababa': 1}
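The counting loop in Method #2 is what collections.Counter does for any iterable: it adds one to the count of each element it sees. A minimal sketch of the same method using Counter is below. It feeds Counter a generator expression instead of a prebuilt list, so the substrings are not all held in a temporary list at once (the dictionary itself still stores every distinct substring).

```python
from collections import Counter

test_str = "abababa"
n = len(test_str)

# Every (start, end) slice is produced once, so overlapping
# occurrences are counted, just as in Method #2.
res = Counter(test_str[i:j] for i in range(n) for j in range(i + 1, n + 1))

print("Extracted frequency dictionary : " + str(dict(res)))
```

Counter is a dict subclass, so the result can be used wherever a plain dictionary is expected.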

Method #3: Using list comprehension

This is yet another way to perform the task. Here the substrings are generated by a nested list comprehension directly inside the for statement, and each frequency is updated in the loop body with a single conditional expression.

Python3




# Python3 code to demonstrate working of
# All substrings Frequency in String
# Using list comprehension

# initializing string
test_str = "abababa"

# printing original string
print("The original string is : " + str(test_str))

# list comprehension extracts the substrings; the loop body updates frequencies
res = dict()
for ele in [test_str[idx: j] for idx in range(len(test_str)) for j in range(idx + 1, len(test_str) + 1)]:
    res[ele] = 1 if ele not in res else res[ele] + 1

# printing result
print("Extracted frequency dictionary : " + str(res))


Output

The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 3, 'abab': 2, 'ababa': 2, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 2, 'baba': 2, 'babab': 1, 'bababa': 1}

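Time Complexity: O(n^3), where n is the length of the string: there are n(n+1)/2 substrings, and slicing each one copies up to n characters.
Auxiliary Space: O(n^3) in the worst case, since the temporary list holds all n(n+1)/2 substrings at once.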
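To see where the cost of the slicing approach comes from, the helper below counts how many substrings the comprehension creates and how many characters it copies in total. The name substring_stats is introduced here for illustration only. Doubling n roughly quadruples the number of substrings, while the number of copied characters grows by a factor that approaches eight, consistent with O(n^2) substrings and O(n^3) total work.

```python
def substring_stats(s):
    """Return (number of substrings generated, total characters copied)."""
    n = len(s)
    subs = [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]
    return len(subs), sum(len(x) for x in subs)


for n in (10, 20, 40):
    count, chars = substring_stats("a" * n)
    print(n, count, chars)
# 10 55 220
# 20 210 1540
# 40 820 11480
```

The counts follow the closed forms n(n+1)/2 for the number of substrings and n(n+1)(n+2)/6 for the total number of characters.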

Method #4: Using regex + findall() method

Step-by-step algorithm:

  1. Initialize a dictionary 'd' to store substring frequencies.
  2. Loop i through range(1, len(test_str) + 1).
  3. For each i, find all substrings of length i with re.findall(), using a zero-width lookahead (?=(...)) so that overlapping substrings are also matched.
  4. For each substring 'sub', update its frequency in the dictionary 'd'.
  5. Return the dictionary 'd' with the substring frequencies.
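The lookahead trick used by this method is easiest to see in isolation. The snippet below is an illustration for a single fixed length of 3, not part of the original program. It compares a plain pattern, whose matches consume characters and therefore cannot overlap, with a lookahead pattern that matches at every position and lets findall() return the captured group.

```python
import re

test_str = "abababa"

# Plain pattern: each match consumes its characters, so matches cannot overlap
plain = re.findall(r'.{3}', test_str)

# Zero-width lookahead with a capture group: the engine moves one position
# at a time and findall() returns the captured text, so matches overlap
overlapping = re.findall(r'(?=(.{3}))', test_str)

print(plain)        # ['aba', 'bab']
print(overlapping)  # ['aba', 'bab', 'aba', 'bab', 'aba']
```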

Python3




import re

# initializing string
test_str = "abababa"

# printing original string
print("The original string is : " + str(test_str))

# using regex to count substring frequencies
d = {}
for i in range(1, len(test_str) + 1):
    # the zero-width lookahead (?=...) lets matches overlap, and findall()
    # returns the captured group; re.DOTALL lets '.' also match newlines
    for sub in re.findall('(?=(.{' + str(i) + '}))', test_str, flags=re.DOTALL):
        d[sub] = d.get(sub, 0) + 1

# printing result
print("Extracted frequency dictionary : " + str(d))


Output

The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'b': 3, 'ab': 3, 'ba': 3, 'aba': 3, 'bab': 2, 'abab': 2, 'baba': 2, 'ababa': 2, 'babab': 1, 'ababab': 1, 'bababa': 1, 'abababa': 1}

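Time complexity: O(n^3), where n is the length of the input string. For each length i, re.findall() attempts a match at each of the n positions and each successful match copies i characters, so the total work is O(n^2) matches of up to n characters each.

Auxiliary Space: O(n^3) in the worst case, since the dictionary may hold up to n(n+1)/2 distinct substrings whose average length is O(n).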

Method #5: Using a sliding window technique with a dictionary to keep track of the counts

Step-by-step approach:

  • Initialize an empty dictionary freq_dict to keep track of the substring frequencies.
  • Initialize a variable n to the length of the given string test_str.
  • Loop i through range(n):
      • Set window_size to i + 1.
      • Loop j through range(n - window_size + 1):
          • Set substring to the slice of test_str that starts at index j and has length window_size.
          • If substring is already in freq_dict, increment its value by 1. Otherwise, add it to freq_dict with a value of 1.
  • Return freq_dict.

Python3




test_str = "abababa"
print("The original string is : " + str(test_str))
 
# using sliding window with a dictionary to count substring frequencies
freq_dict = {}
n = len(test_str)
for i in range(n):
    window_size = i + 1
    for j in range(n - window_size + 1):
        substring = test_str[j:j+window_size]
        freq_dict[substring] = freq_dict.get(substring, 0) + 1
 
# printing result
print("Extracted frequency dictionary : " + str(freq_dict))


Output

The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'b': 3, 'ab': 3, 'ba': 3, 'aba': 3, 'bab': 2, 'abab': 2, 'baba': 2, 'ababa': 2, 'babab': 1, 'ababab': 1, 'bababa': 1, 'abababa': 1}
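Wrapping the sliding-window loop in a function makes it reusable and easy to check against the two examples at the top of the article. The name substring_frequencies is chosen here for illustration. The function iterates over window_size directly instead of deriving it from i; otherwise it is the same loop as above.

```python
def substring_frequencies(s):
    """Count every substring of s, including overlapping occurrences."""
    freq = {}
    n = len(s)
    for window_size in range(1, n + 1):
        # slide a window of this size across the string
        for start in range(n - window_size + 1):
            sub = s[start:start + window_size]
            freq[sub] = freq.get(sub, 0) + 1
    return freq


print(substring_frequencies("ababa"))
print(substring_frequencies("GFGF"))
```

Both results contain the same keys and counts as the expected outputs in the problem statement; only the key order differs, because substrings are grouped by length here.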

Time complexity: O(n^3), since the two nested loops run n(n+1)/2 times in total and each slice test_str[j:j+window_size] copies up to n characters.
Auxiliary space: O(n^3) in the worst case, when all substrings are distinct and each one is stored as a key in freq_dict.


