Python | Split by repeating substring
Last Updated: 23 Apr, 2023
Sometimes, while working with Python strings, we need to split a string that is built from repetitions of one substring into a list of those repetitions, for example turning "gfggfggfg" into ['gfg', 'gfg', 'gfg']. This kind of custom split has applications in many domains. Let us discuss certain ways in which this task can be performed.
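Method #1: Using * operator + len()
This is one of the ways in which we can perform this task. We divide the length of the string by the length of the repeated substring to find how many times it repeats, then construct the new list using the * operator. Because this looks only at lengths, it assumes the string consists entirely of repetitions of the substring.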
Python3
test_str = "gfggfggfggfggfggfggfggfg"
print("The original string is : " + test_str)
K = 'gfg'
# Number of repetitions = total length // length of one repetition
temp = len(test_str) // len(K)
res = [K] * temp
print("The split string is : " + str(res))
Output :
The original string is : gfggfggfggfggfggfggfggfg
The split string is : ['gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg']
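Because Method #1 only compares lengths, it returns a list of the "right" size even when the string contains other characters. Below is a minimal sketch of a guarded version, assuming we would rather get an error than a silently wrong result. `split_repeating` is a hypothetical helper name, not a standard function.

```python
from typing import List


def split_repeating(s: str, k: str) -> List[str]:
    """Split s into copies of k, assuming s is exactly k repeated.

    Raises ValueError if k is empty or s is not a whole number of copies of k.
    """
    if not k:
        raise ValueError("substring must be non-empty")
    times, remainder = divmod(len(s), len(k))
    # Method #1 only looks at lengths, so also verify the content matches.
    if remainder or k * times != s:
        raise ValueError(f"{s!r} is not a repetition of {k!r}")
    return [k] * times


print(split_repeating("gfggfggfggfggfggfggfggfg", "gfg"))
```

The `divmod` remainder rejects lengths that are not a multiple of `len(k)`, and the `k * times != s` comparison catches strings of a suitable length that contain other characters.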
Method #2: Using re.findall()
This is yet another way in which this problem can be solved. Here, re.findall() returns a list of all non-overlapping matches of K in the string, so the splitting is performed by the regular expression engine itself.
Python3
import re
test_str = "gfggfggfggfggfggfggfggfg"
print("The original string is : " + test_str)
K = 'gfg'
# re.escape() makes K match literally even if it contains regex metacharacters
res = re.findall(re.escape(K), test_str)
print("The split string is : " + str(res))
Output :
The original string is : gfggfggfggfggfggfggfggfg
The split string is : ['gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg']
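re.findall() treats K as a regular expression pattern, not as a literal string. With K = 'gfg' this makes no difference, but a substring containing a metacharacter such as '.' can match more than intended. The input below is a made-up example chosen to show the difference:

```python
import re

test_str = "a.ba.baxb"
K = "a.b"  # '.' is a regex metacharacter that matches any character

# Unescaped: 'a.b' is used as a pattern, so 'axb' also matches.
print(re.findall(K, test_str))             # ['a.b', 'a.b', 'axb']

# Escaped: re.escape() makes every character of K match literally.
print(re.findall(re.escape(K), test_str))  # ['a.b', 'a.b']
```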
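Method #3: Using count() method and * operator
Here, str.count() returns the number of non-overlapping occurrences of K in the string, and the * operator repeats K that many times to build the list.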
Python3
test_str = "gfggfggfggfggfggfggfggfg"
print("The original string is : " + test_str)
K = 'gfg'
# Count non-overlapping occurrences (named cnt to avoid shadowing the re module)
cnt = test_str.count(K)
res = [K] * cnt
print("The split string is : " + str(res))
Output
The original string is : gfggfggfggfggfggfggfggfg
The split string is : ['gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg']
The time and space complexity are the same for all three methods above, where n is the length of the input string:
Time Complexity: O(n)
Auxiliary Space: O(n)
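The three methods above agree when the input is exactly K repeated, but they rely on different assumptions. A small comparison on a hypothetical input that contains extra characters ("XXX") shows where they diverge:

```python
import re

# Hypothetical input: not a pure repetition of K.
test_str = "gfgXXXgfg"
K = "gfg"

# Method #1: uses only lengths, so it assumes test_str is K repeated.
by_length = [K] * (len(test_str) // len(K))

# Method #2: keeps only the actual non-overlapping matches.
by_findall = re.findall(K, test_str)

# Method #3: counts non-overlapping occurrences.
by_count = [K] * test_str.count(K)

print(len(by_length), len(by_findall), len(by_count))  # 3 2 2
```

Method #1 reports three pieces because 9 // 3 == 3, while the other two report only the two real occurrences.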
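Method #4: Using loop and slicing
Here, we scan the string from left to right. Whenever the slice starting at the current position equals K, we record a match and jump past it; otherwise we advance by one character.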
Python3
test_str = "gfggfggfggfggfggfggfggfg"
print("The original string is : " + test_str)
K = 'gfg'
res = []
start = 0
while start < len(test_str):
    end = start + len(K)
    # Check whether K occurs at the current position
    if test_str[start:end] == K:
        res.append(K)
        start = end  # skip past the match (non-overlapping)
    else:
        start += 1
print("The split string is : " + str(res))
Output
The original string is : gfggfggfggfggfggfggfggfg
The split string is : ['gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg']
Time complexity: O(n·m), where n is the length of the input string and m is the length of K. The loop visits each position at most once, but each slice and comparison costs O(m); for a short, fixed K this is effectively linear.
Auxiliary Space: O(n), since the result list holds at most n/m entries, and each step also creates a temporary slice of length m.
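A variant of Method #4 with the same greedy, left-to-right, non-overlapping behaviour: instead of slicing at every position, str.find(sub, start) jumps directly to the next occurrence. This is a sketch, not a drop-in replacement; `split_with_find` is a hypothetical helper name. The empty-substring guard matters because `find("")` always succeeds, which would make the loop run forever (the slicing loop above has the same issue with an empty K).

```python
def split_with_find(s, k):
    """Collect non-overlapping occurrences of k in s, scanning left to right."""
    if not k:
        raise ValueError("substring must be non-empty")
    res = []
    idx = s.find(k)
    while idx != -1:
        res.append(k)
        # Resume the search right after this match (non-overlapping).
        idx = s.find(k, idx + len(k))
    return res


print(split_with_find("gfggfggfggfggfggfggfggfg", "gfg"))
```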
Method #5: Using the regular expression module re
- Import the ‘re’ module, which provides support for regular expressions in Python.
- Initialize a string ‘test_str’ made up of repeated substrings.
- Initialize a target string ‘K’ with the substring we want to split by.
- Call ‘re.findall(K, test_str)’, which returns a list of all non-overlapping matches of the pattern in the string, and store the result in ‘res’.
- Print the original string by concatenating “The original string is : ” with ‘test_str’ using the ‘+’ operator.
- Convert ‘res’ to a string with ‘str()’ so it can be concatenated with “The split string is : ”, and print the result.
Python3
import re
test_str = "gfggfggfggfggfggfggfggfg"
K = 'gfg'
res = re.findall(K, test_str)
print("The original string is : " + test_str)
print("The split string is : " + str(res))
Output
The original string is : gfggfggfggfggfggfggfggfg
The split string is : ['gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg', 'gfg']
The time complexity of this approach is O(n), where n is the length of the input string.
The auxiliary space required is O(k), where k is the number of occurrences of the target substring in the input string.
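If only the number of pieces or their positions is needed, re.finditer() yields match objects one at a time instead of building the full list. The sketch below is an optional extension, assuming a literal K (hence re.escape()):

```python
import re

test_str = "gfggfggfggfggfggfggfggfg"
K = "gfg"

# re.finditer yields match objects lazily, so counting this way
# does not keep all matches in memory at once.
count = sum(1 for _ in re.finditer(re.escape(K), test_str))

# Match objects also carry positions, which findall() does not return.
starts = [m.start() for m in re.finditer(re.escape(K), test_str)]

print(count)   # 8
print(starts)  # [0, 3, 6, 9, 12, 15, 18, 21]
```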