Python – Maximum occurring Substring from list
Last Updated :
08 May, 2023
Sometimes, while working with Python strings, we can have a problem in which we need to check for maximum occurring substring from strings list. This can have application in DNA sequencing in Biology and other application. Let’s discuss certain way in which this task can be performed.
Method 1 : Using regex() + groupby() + max() + lambda
The combination of above functionalities can be used to solve this particular problem. In this, we first extract the sequences using regex function. Then the counter grouping is performed using groupby(). The last step is extracting maximum which is done using max() along with lambda function.
Python3
import re
import itertools
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = [ 'gfg' , 'is' , 'best' ]
print ( "The original string is : " + test_str)
print ( "The original list is : " + str (test_list))
seqs = re.findall( str .join( '|' , test_list), test_str)
grps = [(key, len ( list (j))) for key, j in itertools.groupby(seqs)]
res = max (grps, key = lambda ele: ele[ 1 ])
print ( "Maximum frequency substring : " + str (res[ 0 ]))
|
Output :
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time complexity: O(n), where n is the length of the input string. The time complexity of regex(), groupby(), and max() is O(n).
Auxiliary space: O(k), where k is the length of the input list. This is the space needed to store the list of substrings. The space complexity of regex(), groupby(), and max() is O(1).
Method 2: Using count() and max() methods
count() returns the occurrence of a particular element in a sequence and the max() method returns the maximum of that.
Python3
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = [ 'gfg' , 'is' , 'best' ]
print ( "The original string is : " + test_str)
print ( "The original list is : " + str (test_list))
res = []
for i in test_list:
res.append(test_str.count(i))
x = max (res)
result = test_list[res.index(x)]
print ( "Maximum frequency substring : " + str (result))
|
Output
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time Complexity: O(n)
Auxiliary Space: O(n)
Method 3: Using re.findall() + Counter
This is an alternate approach that uses re.findall() and Counter module. In this, we extract the sequence using re.findall() and count the occurrence of each element using Counter() from collections module.
Python3
import collections
import re
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = [ 'gfg' , 'is' , 'best' ]
print ( "The original string is : " + test_str)
print ( "The original list is : " + str (test_list))
seqs = re.findall( str .join( '|' , test_list), test_str)
res = collections.Counter(seqs).most_common( 1 )[ 0 ][ 0 ]
print ( "Maximum frequency substring : " + str (res))
|
Output
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time Complexity: O(n)
Auxiliary Space: O(n)
Method 4 : Using operator.countOf() and max() methods
Python3
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = [ 'gfg' , 'is' , 'best' ]
print ( "The original string is : " + test_str)
print ( "The original list is : " + str (test_list))
res = []
for i in test_list:
import operator
res.append(operator.countOf(test_str, i))
x = max (res)
result = test_list[res.index(x)]
print ( "Maximum frequency substring : " + str (result))
|
Output
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time Complexity : O(n)
Auxiliary Space : O(n)
Method 5: Using a dictionary to count occurrences
In this approach, we can use a dictionary to count the occurrences of each substring in the list. We can iterate over the string and for each substring in the list, we can count the number of occurrences of that substring in the string and update the count in the dictionary. Finally, we can find the substring with the maximum count in the dictionary.
Approach:
- Initialize an empty dictionary to count the occurrences of substrings.
- Iterate over the string using a for loop.
- For each substring in the list, find the number of occurrences of that substring in the string using the count() method and update the count in the dictionary.
- Find the substring with the maximum count in the dictionary.
- Return the maximum frequency substring.
Example:
Python3
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = [ 'gfg' , 'is' , 'best' ]
print ( "The original string is : " + test_str)
print ( "The original list is : " + str (test_list))
count_dict = {}
for sub in test_list:
count_dict[sub] = test_str.count(sub)
res = max (count_dict, key = count_dict.get)
print ( "Maximum frequency substring : " + str (res))
|
Output
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time complexity: O(n*m), where n is the length of the string and m is the total number of substrings in the list.
Auxiliary space: O(m), where m is the total number of substrings in the list.
Method 6: Using itertools.product() and count()
- Import the product function from the itertools module.
- Use the product() function to generate all possible substrings of length len(sub) for each substring sub in test_list.
- Count the number of occurrences of each substring using the count() method.
- Initialize a variable max_count to 0 and a variable max_substring to an empty string.
- Loop through the substrings and their counts.
- If the current count is greater than max_count, update max_count and max_substring to the corresponding substring.
- Print the maximum occurring substring.
Example:
Python3
import itertools
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = [ 'gfg' , 'is' , 'best' ]
print ( "The original string is : " + test_str)
print ( "The original list is : " + str (test_list))
max_count = 0
max_substring = ""
for sub in test_list:
for substring in itertools.product( * [sub] * len (sub)):
count = test_str.count(''.join(substring))
if count > max_count:
max_count = count
max_substring = ''.join(substring)
print ( "Maximum frequency substring : " + str (max_substring))
|
Output
The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg
Time complexity: O(n*m^2), where n is the length of test_list and m is the maximum length of a substring in test_list.
Auxiliary space: O(m)
Please Login to comment...