
Python – Maximum occurring Substring from list

Last Updated : 08 May, 2023

Sometimes, while working with Python strings, we need to find which substring from a given list occurs most often in a string. This has applications in DNA sequencing in biology, among other areas. Let’s discuss several ways in which this task can be performed.

Method 1 : Using re.findall() + groupby() + max() + lambda 

The combination of the above functions can be used to solve this problem. We first extract every occurrence of the listed substrings using re.findall(). The matches are then grouped and counted using groupby(), which expects equal items to be adjacent. The last step is extracting the group with the highest count, which is done using max() along with a lambda function.

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using re.findall() + groupby() + max() + lambda
 
import re
import itertools
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
 
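# Maximum occurring Substring from list
# Using re.findall() + groupby() + max() + lambda
# re.escape() makes any regex metacharacters in the substrings match literally
seqs = re.findall('|'.join(map(re.escape, test_list)), test_str)
 
# groupby() only merges adjacent equal items, so sort the matches first
grps = [(key, len(list(j))) for key, j in itertools.groupby(sorted(seqs))]
 
# pick the (substring, count) pair with the highest count
res = max(grps, key=lambda ele: ele[1])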
 
# printing result
print("Maximum frequency substring : " + str(res[0]))


Output : 

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

 

Time complexity: O(n) for the regex scan, grouping and max(), where n is the length of the input string. Sorting the matches before grouping, which groupby() needs in order to produce total counts, adds O(p log p) for p matches.
Auxiliary space: O(p) for the list of matches, plus O(k) for the pattern built from the k substrings in the input list.
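
The effect of ordering on groupby() is easy to miss, so here is a minimal sketch on a made-up list of matches (the list itself is an assumption for illustration, not output from the program above). It shows that unsorted input produces one group per run of adjacent equal items, while sorted input produces one group per distinct item.

```python
import itertools

# Hypothetical list of matches, in the order a scan might find them
matches = ["gfg", "is", "gfg", "gfg", "is", "gfg"]

# Without sorting, every run of adjacent equal items becomes its own group
unsorted_groups = [(key, len(list(grp))) for key, grp in itertools.groupby(matches)]

# Sorting first brings equal items together, so each key appears once
# together with its total count
sorted_groups = [(key, len(list(grp))) for key, grp in itertools.groupby(sorted(matches))]

print(unsorted_groups)  # [('gfg', 1), ('is', 1), ('gfg', 2), ('is', 1), ('gfg', 1)]
print(sorted_groups)    # [('gfg', 4), ('is', 2)]
```

Taking max() over the unsorted groups would report the longest consecutive run rather than the most frequent substring.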

Method 2: Using count() and max() methods

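The string method count() returns the number of non-overlapping occurrences of a substring in a string, and max() returns the largest of those counts. The index() method then maps that count back to the substring it belongs to.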

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
res = []
for i in test_list:
    res.append(test_str.count(i))
x = max(res)
result = test_list[res.index(x)]
# printing result
print("Maximum frequency substring : " + str(result))


Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

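Time Complexity: O(n*k), where n is the length of the string and k is the number of substrings in the list, since count() scans the string once per substring.
Auxiliary Space: O(k) for the list of counts.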
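
The same idea can be written more compactly by passing the string's count method as the key to max(). The sketch below is one possible formulation; the function name most_frequent and the sample inputs are assumptions made for illustration.

```python
def most_frequent(text, candidates):
    """Return the candidate with the highest non-overlapping count in text."""
    # max() returns the first candidate that reaches the highest count,
    # so ties are resolved by the order of the list
    return max(candidates, key=text.count)


print(most_frequent("abcabcxyz", ["abc", "xyz"]))  # abc

# str.count() does not count overlapping occurrences
print("aaaa".count("aa"))  # 2
```

Note that max() raises ValueError when the candidate list is empty, so callers that may pass an empty list need to handle that case.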

Method 3: Using re.findall() + Counter

This is an alternative approach that uses re.findall() and the Counter class. We extract every occurrence of the listed substrings using re.findall() and count how often each one appears using Counter from the collections module. The most_common(1) method then returns the substring with the highest count.

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using re.findall() + Counter
 
# importing modules
import collections
import re
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
 
# Maximum occurring Substring from list
# Using re.findall() + Counter
seqs = re.findall(str.join('|', test_list), test_str)
res = collections.Counter(seqs).most_common(1)[0][0]
 
# printing result
print("Maximum frequency substring : " + str(res))
# This code is contributed by Edula Vinay Kumar Reddy


Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

Time Complexity: O(n)
Auxiliary Space: O(n)
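
When the substrings may contain regex metacharacters, or when one substring is a prefix of another, the pattern needs a little more care. The following is a minimal sketch under those assumptions; the helper name count_matches and the sample text are hypothetical.

```python
import re
from collections import Counter


def count_matches(text, candidates):
    # Try longer candidates first so a shorter one cannot shadow them,
    # and escape each candidate so characters such as '.' or '+' match literally
    ordered = sorted(candidates, key=len, reverse=True)
    pattern = "|".join(re.escape(c) for c in ordered)
    return Counter(re.findall(pattern, text))


counts = count_matches("a.b a.b axb c+ c+ c+", ["a.b", "c+"])
print(counts.most_common(1))  # [('c+', 3)]
```

Without re.escape(), the pattern a.b would also match "axb", and c+ would match a bare "c", which would distort the counts.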

Method 4 : Using operator.countOf() and max() methods

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
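import operator

res = []
for i in test_list:
    # operator.countOf(a, b) counts the elements of a that are equal to b.
    # Iterating a string yields single characters, so compare i against
    # every window of len(i) characters instead of against test_str itself
    # (this counts overlapping occurrences as well).
    windows = [test_str[k:k + len(i)] for k in range(len(test_str) - len(i) + 1)]
    res.append(operator.countOf(windows, i))
x = max(res)
result = test_list[res.index(x)]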
# printing result
print("Maximum frequency substring : " + str(result))


Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

Time Complexity : O(n)
Auxiliary Space : O(n)
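
Because operator.countOf(a, b) counts the elements of a that are equal to b, its behaviour on strings deserves a quick check. The sketch below uses a short made-up string to show what happens when the needle is longer than one character, and one way to count it with fixed-width windows.

```python
import operator

text = "gfgisgfg"

# Iterating a string yields single characters, so a multi-character
# needle never equals any element
direct_count = operator.countOf(text, "gfg")
char_count = operator.countOf(text, "g")

# Comparing against fixed-width windows counts every occurrence,
# including overlapping ones
width = len("gfg")
windows = [text[i:i + width] for i in range(len(text) - width + 1)]
window_count = operator.countOf(windows, "gfg")

print(direct_count)  # 0
print(char_count)    # 4
print(window_count)  # 2
```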

Method 5: Using a dictionary to count occurrences

In this approach, we use a dictionary to record how often each substring in the list occurs. We iterate over the list and, for each substring, count its occurrences in the string and store that count in the dictionary under the substring as key. Finally, we find the key with the maximum count.

Approach:

  1. Initialize an empty dictionary to count the occurrences of substrings.
  2. Iterate over the substrings in the list using a for loop.
  3. For each substring in the list, find the number of occurrences of that substring in the string using the count() method and update the count in the dictionary.
  4. Find the substring with the maximum count in the dictionary.
  5. Return the maximum frequency substring.

Example:

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using dictionary
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
 
# Maximum occurring Substring from list
# Using dictionary
count_dict = {}
for sub in test_list:
    count_dict[sub] = test_str.count(sub)
res = max(count_dict, key=count_dict.get)
 
# printing result
print("Maximum frequency substring : " + str(res))


Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

Time complexity: O(n*m), where n is the length of the string and m is the total number of substrings in the list.
Auxiliary space: O(m), where m is the total number of substrings in the list.
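
Keeping the counts in a dictionary also makes it straightforward to report ties, which max() alone hides. Here is a small sketch of that variation; the function name max_occurring and the sample inputs are assumptions made for illustration.

```python
def max_occurring(text, candidates):
    """Return (substrings sharing the highest count, that count)."""
    counts = {sub: text.count(sub) for sub in candidates}
    if not counts:
        # nothing to compare when the candidate list is empty
        return [], 0
    best = max(counts.values())
    winners = [sub for sub, n in counts.items() if n == best]
    return winners, best


print(max_occurring("abab cdcd", ["ab", "cd", "zz"]))  # (['ab', 'cd'], 2)
```

Dictionaries preserve insertion order, so the winners come back in the same order as they appear in the input list.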

Method 6: Using itertools.product() and count()

  1. Import the itertools module.
  2. Initialize a variable max_count to 0 and a variable max_substring to an empty string.
  3. For each substring sub in test_list, use product() to generate every string of length len(sub) that can be built from the characters of sub. These candidates always include sub itself, but also many strings that are not in test_list.
  4. Skip any candidate that is not in test_list, so that the result is always one of the given substrings.
  5. Count the number of occurrences of each remaining candidate in the string using the count() method.
  6. If the current count is greater than max_count, update max_count and max_substring.
  7. Print the maximum occurring substring.

Example:

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using itertools.product() and count()
 
import itertools
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
 
# Maximum occurring Substring from list
# Using itertools.product() and count()
max_count = 0
max_substring = ""
for sub in test_list:
    # product() yields every len(sub)-character arrangement of sub's characters
    for chars in itertools.product(sub, repeat=len(sub)):
        candidate = ''.join(chars)
        # ignore arrangements that are not among the given substrings
        if candidate not in test_list:
            continue
        count = test_str.count(candidate)
        if count > max_count:
            max_count = count
            max_substring = candidate
 
# printing result
print("Maximum frequency substring : " + str(max_substring))


Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

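Time complexity: exponential in m, where m is the maximum length of a substring in test_list, because product() generates m^m candidate strings for a substring of length m. The approach is therefore practical only for short substrings.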
Auxiliary space: O(m)
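
To get a feel for how quickly the candidate set grows, the sketch below enumerates what product() builds for each substring in the example list. The helper name candidate_strings is hypothetical and exists only for this illustration.

```python
import itertools


def candidate_strings(sub):
    """Distinct strings of length len(sub) built from the characters of sub."""
    return {"".join(chars) for chars in itertools.product(sub, repeat=len(sub))}


for sub in ["is", "gfg", "best"]:
    generated = len(sub) ** len(sub)         # tuples produced by product()
    distinct = len(candidate_strings(sub))   # after removing duplicates
    print(sub, generated, distinct)
# is 4 4
# gfg 27 8
# best 256 256
```

Only one of those candidates is the original substring, so nearly all of the generated strings are arrangements that do not appear in test_list.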


