
Python program to find Indices of Overlapping Substrings

Last Updated : 31 Jul, 2023

To find overlapping substrings in Python we can use the re module. The re.finditer() method yields a match object for every occurrence of a pattern, and the starting index of each occurrence can be read from it with start(). Used directly, however, it reports only non-overlapping matches.

Examples:

Input: String: “geeksforgeeksforgeeks” ; Pattern: “geeksforgeeks” 

Output: [0, 8] 

Explanation: The pattern matches the string from index 0 to index 12 and again from index 8 to index 20, so the two occurrences overlap. The output lists the starting position of each occurrence, i.e. index 0 and index 8.

Input: String: “barfoobarfoobarfoobarfoobarfoo” ;  Pattern: “foobarfoo” 

Output: [3, 9, 15, 21] 

Explanation: The pattern occurs, with overlaps, starting at indices 3, 9, 15 and 21.

Python program to find Indices of Overlapping Substrings

Used on its own, re.finditer() returns only the non-overlapping matches, even when the string contains several overlapping occurrences of the pattern. The program below demonstrates this behaviour.

Python3




# Import required module
import re
 
 
# Function to depict use of finditer() method
def CntSubstr(pattern, string):
 
    # Array storing the indices
    a = [m.start() for m in re.finditer(pattern, string)]
    return a
 
 
# Driver Code
string = 'geeksforgeeksforgeeks'
pattern = 'geeksforgeeks'
 
# Printing index values of non-overlapping pattern
print(CntSubstr(pattern, string))


Output:

[0]
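
To see why the occurrence at index 8 is missed, it can help to print the span of each match rather than only its start. The snippet below is a small illustrative sketch that reuses the same string and pattern; the variable name spans is introduced here and is not part of the original program.

```python
import re

string = 'geeksforgeeksforgeeks'
pattern = 'geeksforgeeks'

# Each match consumes the characters it covers, so the scan for the
# next match resumes at the end of the previous one.
spans = [m.span() for m in re.finditer(pattern, string)]
print(spans)  # [(0, 13)]
```

The single match covers indices 0 to 12, and the search resumes at index 13. The occurrence starting at index 8 lies inside the text that has already been consumed, so it is never reported.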

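Therefore, to get the overlapping indices as well, we wrap the pattern in a lookahead assertion, (?=...). A lookahead checks that the pattern matches at the current position without consuming any characters, so the search can resume at the very next position and find occurrences that overlap earlier ones. The pattern is also passed through re.escape() so that any special characters in it are matched literally.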

Approach:

  1. re.finditer() returns an iterator of match objects, one for each place where the pattern matches. Calling start() on a match object gives the index at which that match begins.
  2. A normal match consumes the characters it covers, so the next search begins after the end of the previous match and overlapping occurrences are skipped. Wrapping the pattern as (?={0}) turns it into a lookahead assertion, which matches an empty string at every position where the pattern would match.
  3. re.escape() escapes any regular expression special characters in the pattern (such as . or *), so that the pattern is treated as literal text when it is substituted into (?={0}).
  4. With these two modifications, the list built from finditer() contains the starting index of every occurrence, including the overlapping ones.

Below is the implementation of the above approach:

Python3




# Import required module
import re
 
 
# Explicit function to Count
# Indices of Overlapping Substrings
def CntSubstr(pattern, string):
    a = [m.start() for m in re.finditer(
        '(?={0})'.format(re.escape(pattern)), string)]
    return a
 
 
# Driver Code
string1 = 'geeksforgeeksforgeeks'
pattern1 = 'geeksforgeeks'
 
string2 = 'barfoobarfoobarfoobarfoobarfoo'
pattern2 = 'foobarfoo'
 
 
# Calling the function
print(CntSubstr(pattern1, string1))
print(CntSubstr(pattern2, string2))


Output:

[0, 8]
[3, 9, 15, 21]
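
The role of re.escape() only becomes visible when the pattern contains regular expression special characters. The following sketch is an illustration under assumptions: the helper overlapping_indices and its escape flag are hypothetical names introduced here, and the test string is made up for the example.

```python
import re


def overlapping_indices(pattern, string, escape=True):
    # Hypothetical helper: same lookahead idea as above, with
    # escaping made optional so the two behaviours can be compared.
    body = re.escape(pattern) if escape else pattern
    return [m.start() for m in re.finditer('(?={0})'.format(body), string)]


text = 'a.a.aXaYa'

# With escaping, '.' is matched as a literal dot.
literal = overlapping_indices('a.a', text)
print(literal)   # [0, 2]

# Without escaping, '.' matches any character, so 'aXa' and 'aYa' also count.
as_regex = overlapping_indices('a.a', text, escape=False)
print(as_regex)  # [0, 2, 4, 6]

# Lookahead matches are zero-width: each one starts and ends at the same index.
widths = [m.end() - m.start() for m in re.finditer(r'(?=a\.a)', text)]
print(widths)    # [0, 0]
```

For plain alphanumeric patterns such as the ones used in this article, escaping changes nothing, but it keeps the function correct for arbitrary input.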

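Both regex programs scan the string once from left to right, where n is the length of the string and m is the length of the pattern:

Time Complexity: O(n) when mismatches are detected after a few characters; in the worst case the lookahead may compare up to m characters at each of the n positions, giving O(n * m).

Space Complexity: O(n) in the worst case, for the list of indices.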

Another approach is to use the sliding window method.
Here is a step-by-step algorithm for finding the indices of overlapping substrings using the sliding window method:

  1. Define a function named overlapping_substring that takes two parameters, string and pattern.
  2. Initialize an empty list named result to store the indices of overlapping substrings.
  3. Loop through the string using the range function, with the length of string minus the length of pattern plus 1 as the upper limit. Beyond that point, fewer characters remain than the pattern needs.
  4. Check whether the substring of string that starts at index i and has the same length as pattern is equal to the pattern. If it is, append the index i to the result list.
  5. Return the result list.
  6. Define two example strings, string1 and string2, along with their respective patterns, pattern1 and pattern2.
  7. Call the overlapping_substring function with string1 and pattern1 and print the resulting list of indices.
  8. Call the overlapping_substring function with string2 and pattern2 and print the resulting list of indices.

Python3




#Sliding window method
def overlapping_substring(string, pattern):
  result = []
  for i in range(len(string) - len(pattern) + 1):
    if string[i: i + len(pattern)] == pattern:
      result.append(i)
  return result
 
string1 = 'geeksforgeeksforgeeks'
pattern1 = 'geeksforgeeks'
 
string2 = 'barfoobarfoobarfoobarfoobarfoo'
pattern2 = 'foobarfoo'
 
print(overlapping_substring(string1, pattern1))
print(overlapping_substring(string2, pattern2))


Output

[0, 8]
[3, 9, 15, 21]

The time complexity of this algorithm is O(n * m), where n is the length of string and m is the length of pattern. This is because we loop through the string and, at each index, compare a substring of length m with the pattern. The auxiliary space complexity is O(n) in the worst case, because the result list can hold up to one index per position in the string; each slice also creates a temporary string of length m.

Overall, the sliding window method is a simple way to find overlapping substrings without importing any module, and it works well when the length of the pattern is small compared to the length of the string.
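
As a variation on the same idea, the built-in str.find() method can do the comparison at each window position. This is a hedged sketch rather than part of the original approach: the function name find_overlapping is hypothetical, and the key assumption is that restarting the search one character after each hit, instead of after the end of the match, is what allows overlapping occurrences to be found.

```python
def find_overlapping(string, pattern):
    # Hypothetical helper built on str.find().
    indices = []
    if not pattern:
        # Design choice for this sketch: an empty pattern reports no matches.
        return indices
    start = string.find(pattern)
    while start != -1:
        indices.append(start)
        # Resume one character later, not after the whole match,
        # so that overlapping occurrences are still seen.
        start = string.find(pattern, start + 1)
    return indices


result1 = find_overlapping('geeksforgeeksforgeeks', 'geeksforgeeks')
result2 = find_overlapping('barfoobarfoobarfoobarfoobarfoo', 'foobarfoo')
print(result1)  # [0, 8]
print(result2)  # [3, 9, 15, 21]
```

This version skips directly from one occurrence to the next instead of slicing at every index, which may be faster in practice when matches are sparse.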


