Python – Successive Characters Frequency
Last Updated : 10 May, 2023
Sometimes, while working with Python strings, we need to find the frequency of the character that immediately follows a particular word in a string. This is a fairly unusual problem, with potential applications in day-to-day programming and web development. Let’s discuss several ways in which this task can be performed.
Input : test_str = 'geeks are for geeksforgeeks', que_word = "geek"
Output : {'s': 3}
Input : test_str = 'geek', que_word = "geek"
Output : {}
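The two input/output pairs above can be checked with a small helper. This is only a minimal sketch: the name successor_freq is hypothetical (it does not appear in the methods below), and it assumes the query word should be matched as literal text.

```python
import re
from collections import Counter


def successor_freq(text, word):
    """Count the characters that immediately follow `word` in `text`.

    Hypothetical helper for illustration; `word` is matched literally.
    """
    # re.escape keeps regex metacharacters in `word` from being interpreted
    pattern = re.escape(word) + "(.)"
    return dict(Counter(re.findall(pattern, text)))


print(successor_freq('geeks are for geeksforgeeks', 'geek'))  # {'s': 3}
print(successor_freq('geek', 'geek'))                         # {}
```

The second call returns an empty dictionary because no character follows the only occurrence of the word.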
Method #1 : Using loop + count() + re.findall()
The combination of the above methods constitutes the brute force way to perform this task. Each occurrence of the word plus the character after it is found using re.findall(), and the collected characters are then counted using count().
Python3
import re

test_str = 'geeksforgeeks is best for geeks. A geek should take interest.'
print("The original string is : " + str(test_str))
que_word = "geek"

# collect the character that follows each occurrence of que_word
temp = []
for sub in re.findall(que_word + '.', test_str):
    temp.append(sub[-1])

# count how often each collected character occurs
res = {ch: temp.count(ch) for ch in temp}
print("The Characters Frequency is : " + str(res))
Output :
The original string is : geeksforgeeks is best for geeks. A geek should take interest.
The Characters Frequency is : {'s': 3, ' ': 1}
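Because the query word is concatenated directly into the pattern, any regex metacharacter it contains (such as a dot) is interpreted as pattern syntax. The following is a minimal sketch, using a made-up input string, of how re.escape() avoids that; it is an optional hardening step, not part of the method above.

```python
import re

text = 'a.b1 axb2 a.b1'   # made-up sample input
que_word = 'a.b'

# Raw concatenation: the '.' in que_word matches any character, so 'axb' matches too
raw = re.findall(que_word + '(.)', text)

# Escaped: the '.' in que_word is matched literally
escaped = re.findall(re.escape(que_word) + '(.)', text)

print(raw)      # ['1', '2', '1']
print(escaped)  # ['1', '1']
```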
Method #2 : Using Counter() + list comprehension + re.findall()
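The combination of the above functions is used to perform this task. Here, Counter() replaces count(), and a capture group in the pattern returns only the character that follows the word. The code uses an f-string, so it requires Python 3.6 or later. Note that it also passes re.IGNORECASE, so the word is matched case-insensitively.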
Python3
from collections import Counter
import re

test_str = 'geeksforgeeks is best for geeks. A geek should take interest.'
print("The original string is : " + str(test_str))
que_word = "geek"

# the capture group (.) returns only the character after que_word
res = dict(Counter(re.findall(f'{que_word}(.)', test_str,
                              flags=re.IGNORECASE)))
print("The Characters Frequency is : " + str(res))
Output :
The original string is : geeksforgeeks is best for geeks. A geek should take interest.
The Characters Frequency is : {'s': 3, ' ': 1}
Time Complexity: O(n)
Auxiliary Space: O(n)
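Method #2 passes flags = re.IGNORECASE, which the other methods do not. The sketch below, on a made-up input string, shows how that flag can change the result when the word appears with different capitalization.

```python
import re
from collections import Counter

text = 'Geeks and geeks'   # made-up sample input
que_word = 'geek'

# default matching is case-sensitive, so 'Geeks' is skipped
case_sensitive = dict(Counter(re.findall(f'{que_word}(.)', text)))

# with re.IGNORECASE both 'Geeks' and 'geeks' are matched
case_insensitive = dict(
    Counter(re.findall(f'{que_word}(.)', text, flags=re.IGNORECASE))
)

print(case_sensitive)    # {'s': 1}
print(case_insensitive)  # {'s': 2}
```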
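Method #3 : Using operator.countOf()
This approach follows Method #1, but counts each collected character with operator.countOf() instead of the list's count() method.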
Python3
import re
import operator as op

test_str = 'geeksforgeeks is best for geeks. A geek should take interest.'
print("The original string is : " + str(test_str))
que_word = "geek"

# collect the character that follows each occurrence of que_word
temp = []
for sub in re.findall(que_word + '.', test_str):
    temp.append(sub[-1])

# op.countOf(temp, ch) returns the number of occurrences of ch in temp
res = {ch: op.countOf(temp, ch) for ch in temp}
print("The Characters Frequency is : " + str(res))
Output
The original string is : geeksforgeeks is best for geeks. A geek should take interest.
The Characters Frequency is : {'s': 3, ' ': 1}
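Time Complexity: O(n + m²), where n is the length of the input string and m is the number of matches, because countOf() rescans the list of collected characters once per match.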
Auxiliary Space: O(n)
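As a quick illustration of the counting step on its own, operator.countOf(a, b) returns the number of occurrences of b in a. The sketch below starts from an already collected list (an assumption made to keep the example short) and iterates over the distinct characters, so that each one is counted only once.

```python
import operator as op

temp = ['s', 's', 's', ' ']   # characters collected after each match

print(op.countOf(temp, 's'))  # 3

# iterate over the distinct characters so each one is counted only once
res = {ch: op.countOf(temp, ch) for ch in set(temp)}
print(res == {'s': 3, ' ': 1})  # True
```

Iterating over set(temp) does not preserve the order of first appearance, so the key order of the resulting dictionary may differ from the output shown above.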
Method #4: Using a loop and dictionary
- Initialize the input string and the queried word.
- Initialize an empty dictionary to store the frequency of successive characters.
- Loop through the input string, checking if each substring of length len(que_word) starting at each index of the string is equal to the queried word.
- If a substring is equal to the queried word, extract the character immediately following the substring.
- If the character is already a key in the dictionary, increment its value by 1. Otherwise, add the character as a key with a value of 1.
- Once the loop completes, print the dictionary with the character frequencies.
Example:
Python3
test_str = 'geeksforgeeks is best for geeks. A geek should take interest.'
que_word = 'geek'
freq_dict = {}
word_len = len(que_word)

# stop early enough that a character always follows the compared substring
for i in range(len(test_str) - word_len):
    if test_str[i:i + word_len] == que_word:
        char = test_str[i + word_len]
        if char in freq_dict:
            freq_dict[char] += 1
        else:
            freq_dict[char] = 1
print('The Characters Frequency is:', freq_dict)
Output
The Characters Frequency is: {'s': 3, ' ': 1}
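Time complexity: O(n * m), where n is the length of the input string and m is the length of the queried word, since each position compares a substring of length m. For a short, fixed word this is effectively O(n).
Auxiliary space: O(k), where k is the number of distinct characters following the queried word, since the dictionary holds one key per distinct character.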
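Two edge cases are worth checking with a sliding-window loop: a string that ends with the queried word (nothing follows it), and occurrences that overlap. The sketch below wraps the loop in a hypothetical helper named loop_freq and uses made-up inputs; it also shows that re.findall() skips overlapping matches unless a lookahead is used.

```python
import re


def loop_freq(text, word):
    """Sliding-window count (hypothetical helper); includes overlapping matches."""
    freq = {}
    n = len(word)
    # stop n characters before the end so text[i + n] always exists
    for i in range(len(text) - n):
        if text[i:i + n] == word:
            ch = text[i + n]
            freq[ch] = freq.get(ch, 0) + 1
    return freq


print(loop_freq('geek', 'geek'))  # {}  (nothing follows the word)
print(loop_freq('aaaa', 'aa'))    # {'a': 2}  (matches at index 0 and 1 overlap)

# re.findall() consumes each match, so it reports only one occurrence here
print(re.findall('aa(.)', 'aaaa'))      # ['a']
# a lookahead consumes nothing, so overlapping occurrences are found as well
print(re.findall('(?=aa(.))', 'aaaa'))  # ['a', 'a']
```

Whether overlapping occurrences should count depends on the application, so the loop and the regex-based methods can legitimately disagree on such inputs.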
Method #5: Using re.finditer() and defaultdict()
Step-by-step approach:
- Initialize the input string test_str to the value ‘geeksforgeeks is best for geeks. A geek should take interest.’.
- Initialize the query word que_word to the value ‘geek’.
- Initialize an empty dictionary freq_dict using defaultdict(int), which allows us to set the initial value of each key to 0.
- Loop through all the matches of the regular expression pattern que_word + ‘(.)’ in the input string test_str.
- For each match, retrieve the character following the query word, which is captured in the first group of the regular expression pattern. Increment the count of that character in the freq_dict dictionary.
- After processing all the matches, print the frequency dictionary as a regular dictionary using the dict() constructor.
- The output of the program is the characters frequency dictionary, where the keys are the characters following the query word in the input string and the values are their respective frequencies.
Example:
Python3
import re
from collections import defaultdict

test_str = 'geeksforgeeks is best for geeks. A geek should take interest.'
que_word = 'geek'

# missing keys start at 0, so no membership check is needed
freq_dict = defaultdict(int)
for match in re.finditer(que_word + '(.)', test_str):
    # group(1) is the character captured right after que_word
    freq_dict[match.group(1)] += 1
print('The Characters Frequency is:', dict(freq_dict))
Output
The Characters Frequency is: {'s': 3, ' ': 1}
Time Complexity: O(n), where n is the length of the input string.
Auxiliary Space: O(k), where k is the number of distinct characters following the query word.
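One detail of the pattern is that the dot does not match a newline by default, so an occurrence of the word at the end of a line is not counted unless re.DOTALL is passed. The sketch below uses a made-up two-line input and a hypothetical helper, count_following, to show the difference.

```python
import re
from collections import defaultdict

text = 'geek\ngeeks'   # made-up input: the first 'geek' is followed by a newline
que_word = 'geek'


def count_following(flags):
    """Hypothetical helper: count characters after que_word using the given flags."""
    freq = defaultdict(int)
    for match in re.finditer(que_word + '(.)', text, flags=flags):
        freq[match.group(1)] += 1
    return dict(freq)


print(count_following(0))          # {'s': 1}
print(count_following(re.DOTALL))  # {'\n': 1, 's': 1}
```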
Method #6: Using itertools.groupby() and Counter()
Step-by-step approach:
- Import the re and itertools modules, and the Counter class from collections.
- Use the re.findall() function to find every character that follows an occurrence of the query word in the input string test_str.
- Use the itertools.groupby() function to group consecutive identical characters in that list.
- Flatten the groups and use Counter() to count the frequency of each character. Since the groups are flattened again before counting, the groupby() step does not change the result.
- Print the result.
Python3
import re
import itertools
from collections import Counter

test_str = 'geeksforgeeks is best for geeks. A geek should take interest.'
que_word = 'geek'

# characters that follow each occurrence of que_word
matches = re.findall(que_word + '(.)', test_str)
# groupby() groups runs of consecutive identical characters
groups = itertools.groupby(matches)
# flatten the groups and count every character
freq_dict = Counter([char for _, char_group in groups for char in char_group])
print('The Characters Frequency is:', freq_dict)
Output
The Characters Frequency is: Counter({'s': 3, ' ': 1})
Time complexity: O(n), where n is the length of the input string test_str.
Auxiliary space: O(k), where k is the number of unique characters following the query word.
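To see what the grouping step contributes, the sketch below prints the runs that itertools.groupby() produces and compares the final counts with calling Counter() on the matches directly. It reuses the article's input string; the variable names runs, direct, and via_groupby are introduced here for illustration only.

```python
import itertools
import re
from collections import Counter

test_str = 'geeksforgeeks is best for geeks. A geek should take interest.'
matches = re.findall('geek(.)', test_str)

# groupby() only merges *consecutive* equal items into runs
runs = [(ch, len(list(group))) for ch, group in itertools.groupby(matches)]
print(runs)  # [('s', 3), (' ', 1)]

# Counter counts every item regardless of its position
direct = Counter(matches)
via_groupby = Counter(ch for _, group in itertools.groupby(matches) for ch in group)
print(direct == via_groupby)   # True
print(direct.most_common(1))   # [('s', 3)]
```

Since both counters are equal, Counter(matches) alone appears sufficient for this task; groupby() would matter only if the run lengths themselves were of interest.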