Python program to find Indices of Overlapping Substrings
Last Updated :
31 Jul, 2023
To count the number of overlapping sub-strings in Python we can use the Re module. To get the indices we will use the re.finditer() method. But it returns the count of non-overlapping indices only.
Examples:
Input: String: “geeksforgeeksforgeeks” ; Pattern: “geeksforgeeks”
Output: [0, 8]
Explanation: The pattern is overlapping the string from 0th index to 12th index and again overlapping it from 8th index to 20th index. Hence, the output is the starting positions of overlapping i.e index 0 and index 8.
Input: String: “barfoobarfoobarfoobarfoobarfoo” ; Pattern: “foobarfoo”
Output: [3, 9,15, 21]
Explanation: The pattern is overlapping the string from index 3, 9 , 15 and 21.
Python program to find Indices of Overlapping Substrings
This method returns the count of non-overlapping indices only from a string having multiple occurrences overlapping pattern. Below is a program depicting the use of finditer() method.
Python3
import re
def CntSubstr(pattern, string):
a = [m.start() for m in re.finditer(pattern, string)]
return a
string = 'geeksforgeeksforgeeks'
pattern = 'geeksforgeeks'
print (CntSubstr(pattern, string))
|
Output:
[0]
Therefore, to get the overlapping indices as well we need to do is escape out of the regular expressions in the pattern. The definition in the explicit function helps to select the characters in a partial way.
Approach:
- re.finditer() helps in finding the indices where the match object occurs. As it returns an iterable object, the start() method helps in return the indices or else it would show that a match object has been found at some location.
- The standard method in matching using re module is greedy which means the maximum number of characters are matched. Therefore, the ?={0} helps in minimum number of matches.
- To match it so that partial characters are matched, the re.escape() helps in escaping out the special characters which have been added before such as the ?={0}.
- The result is that by adding some modifications, the finditer() method returns a list of overlapping indices.
Below is the implementation of the above approach:
Python3
import re
def CntSubstr(pattern, string):
a = [m.start() for m in re.finditer(
'(?={0})' . format (re.escape(pattern)), string)]
return a
string1 = 'geeksforgeeksforgeeks'
pattern1 = 'geeksforgeeks'
string2 = 'barfoobarfoobarfoobarfoobarfoo'
pattern2 = 'foobarfoo'
print (CntSubstr(pattern1, string1))
print (CntSubstr(pattern2, string2))
|
Output:
[0, 8]
[3, 9, 15, 21]
The Time and Space Complexity for all the methods are the same:
Time Complexity: O(n)
Space Complexity: O(n)
Another approach is to use the sliding window method
Here’s a step-by-step algorithm for implementing the Python program to find indices of overlapping substrings using the sliding window method:
Define a function named overlapping_substring that takes in two parameters – string and pattern.
Initialize an empty list named result to store the indices of overlapping substrings.
Loop through the string using the range function with the length of string minus the length of pattern plus 1 as the upper limit. This is because the sliding window will only be valid until the last substring of length pattern.
Check if the current substring of string of length pattern starting at index i is equal to the pattern. If it is, append the index i to the result list.
Return the result list.
Define two example strings string1 and string2 along with their respective pattern substrings.
Call the overlapping_substring function with the string1 and pattern1 arguments and print the resulting list of indices.
Call the overlapping_substring function with the string2 and pattern2 arguments and print the resulting list of indices.
Python3
def overlapping_substring(string, pattern):
result = []
for i in range ( len (string) - len (pattern) + 1 ):
if string[i: i + len (pattern)] = = pattern:
result.append(i)
return result
string1 = 'geeksforgeeksforgeeks'
pattern1 = 'geeksforgeeks'
string2 = 'barfoobarfoobarfoobarfoobarfoo'
pattern2 = 'foobarfoo'
print (overlapping_substring(string1, pattern1))
print (overlapping_substring(string2, pattern2))
|
Output
[0, 8]
[3, 9, 15, 21]
The time complexity of this algorithm is O(n * m), where n is the length of string and m is the length of pattern. This is because we are looping through the string and comparing each substring of length pattern with the pattern. The auxiliary space complexity is also O(n), where n is the length of the string, because we are creating a new list to store the indices of overlapping substrings.
Overall, the sliding window method is an efficient way to find overlapping substrings in a string, especially when the length of the pattern is small compared to the length of the string.
Please Login to comment...