
Python | Remove all duplicate words from a given sentence

Last Updated : 18 Mar, 2023

Given a sentence containing n words, remove all duplicate words so that each distinct word appears only once.

Examples:  

Input : Geeks for Geeks
Output : Geeks for

Input : Python is great and Java is also great
Output : Python is great and Java also
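All the methods below compare words exactly, so "Geeks" and "geeks" would both survive. If you also want to ignore letter case, one possible extension (an assumption, not part of the original problem) is to compare case-folded keys. The function name remove_duplicates_ignore_case is hypothetical:

```python
def remove_duplicates_ignore_case(sentence):
    # Hypothetical variant: words that differ only in letter case count as
    # duplicates; the spelling seen first is the one that is kept.
    first_spelling = {}
    for word in sentence.split():
        # casefold() gives a case-insensitive key; setdefault() only stores
        # the word the first time that key appears.
        first_spelling.setdefault(word.casefold(), word)
    return " ".join(first_spelling.values())

print(remove_duplicates_ignore_case("Geeks for geeks"))  # Geeks for
```

Because dictionaries keep insertion order (Python 3.7+), the words come out in the order of their first appearance.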

We can solve this problem quickly using the Counter class from Python's collections module. The approach is simple:

1) Split the input sentence on spaces into a list of words. 
2) Build a Counter from that list. A Counter is a dictionary with the words as keys and their frequencies as values, so each distinct word appears exactly once as a key. 
3) Join the keys with a space to form a single string. Since Python 3.7, dictionary keys (including a Counter's) keep insertion order, so the words appear in the order of their first occurrence. 

Python




from collections import Counter
 
def remove_duplicates(sentence):
 
    # split the sentence on spaces into a list of words
    words = sentence.split(" ")
 
    # Counter maps each word to its frequency; its keys are
    # unique and, since Python 3.7, keep the order in which
    # each word first appeared
    uniq_words = Counter(words)
 
    # join the unique words back into a single string
    return " ".join(uniq_words.keys())
 
# Driver program
if __name__ == "__main__":
    sentence = 'Python is great and Java is also great'
    print(remove_duplicates(sentence))


Output

Python is great and Java also

Time Complexity: O(N)
Auxiliary Space: O(N)
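To see what the Counter in the approach above actually holds, here is a small sketch on the same sentence (the names words and counts are illustrative):

```python
from collections import Counter

words = "Python is great and Java is also great".split(" ")
counts = Counter(words)

# Each distinct word appears once as a key, mapped to how often it occurred.
print(counts["is"], counts["great"], counts["Java"])  # 2 2 1

# Since Python 3.7 the keys keep first-occurrence order, which is why
# joining them reproduces the sentence without the repeats.
print(" ".join(counts))  # Python is great and Java also
```

Only the keys are needed for this problem; the frequencies are a by-product of using Counter.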

Method 2:  

Python




# Program without using any external library
s = "Python is great and Java is also great"
l = s.split()
k = []
for i in l:
 
    # Store each word in list 'k' only the first time
    # it is seen, so 'k' ends up holding unique words
    if i not in k:
        k.append(i)
print(' '.join(k))


Output

Python is great and Java also

Time Complexity: O(N*N)
Auxiliary Space: O(N)
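The O(N*N) cost comes from scanning the list for every word. A common alternative, sketched here as a swapped-in technique rather than part of the original method, is to keep a set of words already seen, which brings the average time down to O(N) while the list still preserves the original order. The name remove_duplicates_fast is hypothetical:

```python
def remove_duplicates_fast(sentence):
    # Membership is checked against a set, which takes O(1) on average,
    # instead of scanning the result list each time.
    seen = set()
    result = []
    for word in sentence.split():
        if word not in seen:
            seen.add(word)
            result.append(word)  # the list keeps first-occurrence order
    return " ".join(result)

print(remove_duplicates_fast("Python is great and Java is also great"))
# Python is great and Java also
```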

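Method 3: Another shorter implementation using dict.fromkeys(), which keeps one key per distinct word in the order the words first appear (Python 3.7+):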

Python3




# Python3 program
 
string = 'Python is great and Java is also great'
 
print(' '.join(dict.fromkeys(string.split())))


Output

Python is great and Java also

Time Complexity: O(N)
Auxiliary Space: O(N)
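A quick sketch of what dict.fromkeys() builds, using the "Geeks for Geeks" example from the top of the article:

```python
words = "Geeks for Geeks".split()

# dict.fromkeys() builds a dict whose keys are the words and whose values
# all default to None; a repeated key is simply stored once.
d = dict.fromkeys(words)
print(d)  # {'Geeks': None, 'for': None}

# Joining a dict iterates over its keys, in first-occurrence order.
print(" ".join(d))  # Geeks for
```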

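Method 4: Using set() 

A set keeps only one copy of each word, but it does not preserve the original word order, so the words in the output can appear in any order and may differ from run to run.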

Python3




string = 'Python is great and Java is also great'
print(' '.join(set(string.split())))


Output

Java also great and Python is

Time Complexity: O(N)
Auxiliary Space: O(N)
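If you like the brevity of set() but need the original order back, one option (a sketch, not part of the method above) is to sort the unique words by the position where each first appears:

```python
string = 'Python is great and Java is also great'
words = string.split()

# set() removes duplicates; sorting by each word's first index in the
# original list restores the order in which the words first appeared.
# words.index() is a linear scan, so this is O(N^2) in the worst case.
ordered = sorted(set(words), key=words.index)
print(' '.join(ordered))  # Python is great and Java also
```

For long sentences, dict.fromkeys() gives the same result in O(N).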

Method 5: Using operator.countOf()

Python3




# Program using operator.countOf()
import operator as op
s = "Python is great and Java is also great"
l = s.split()
k = []
for i in l:
    # countOf(k, i) is 0 when word i has not been stored
    # in list 'k' yet, so only unique words are added
    if op.countOf(k, i) == 0:
        k.append(i)
print(' '.join(k))


Output

Python is great and Java also

Time Complexity: O(N*N), because operator.countOf() scans a whole list for every word
Auxiliary Space: O(N)
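A small sketch of how operator.countOf() behaves, and why counting on a list of words differs from calling str.count() on the raw sentence:

```python
import operator as op

words = "Python is great and Java is also great".split()

# countOf(a, b) returns how many items of a are equal to b.
# It scans the whole sequence, so each call is O(N).
print(op.countOf(words, "is"))      # 2
print(op.countOf(words, "Python"))  # 1
print(op.countOf(words, "C++"))     # 0

# str.count(), by contrast, counts substrings of a string, so "is" is also
# found inside "This" when counting on the raw text instead of a word list.
print("This is".count("is"))                # 2
print(op.countOf("This is".split(), "is"))  # 1
```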

Method 6:  

This method uses a loop to traverse each word of the sentence and stores the unique words in a separate list, using an if condition to check whether the word is already present in that list.

Follow the steps below to implement the above idea:

  • Split the given sentence into words and store them in a list.
  • Create an empty list to store the distinct words.
  • Iterate over the list of words, and for each word, check whether it is already in the result list.
  • If the word is not in the result list, append it.
  • If the word is already in the result list, skip it.
  • Finally, join the words in the result list with a space and return the result.

Below is the implementation of the above approach:

Python3




def remove_duplicates(sentence):
    words = sentence.split(" ")
    result = []
    for word in words:
        if word not in result:
            result.append(word)
    return " ".join(result)
 
sentence = "Python is great and Java is also great"
print(remove_duplicates(sentence))


Output

Python is great and Java also

Time complexity: O(n^2), because the result list of unique words is searched linearly for every word in the input sentence. 
Auxiliary space: O(n), because the unique words are stored in the result list.
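The implementation above splits with split(" "). A short sketch of why split() with no argument is often the safer choice when the input may contain repeated spaces:

```python
sentence = "Python  is great"  # note the two spaces after "Python"

# split(" ") splits on every single space, so consecutive spaces
# produce an empty string that would be kept as a "word".
print(sentence.split(" "))  # ['Python', '', 'is', 'great']

# split() with no argument splits on runs of whitespace and drops
# empty strings.
print(sentence.split())     # ['Python', 'is', 'great']
```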

Method 7: Using recursion

Algorithm:

  1. Split the input sentence into words.
  2. If there is only one word, return it.
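  3. If the first word also appears in the rest of the words, drop it and call the function recursively with the rest of the words. Because of this, the method keeps the last occurrence of each repeated word, so its output order differs from the other methods.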
  4. If the first word is not present in the rest of the words, concatenate it with the result of calling the function recursively with the rest of the words.
  5. Return the final result as a string.

Python3




def remove_duplicates(sentence):
    words = sentence.split(" ")
    if len(words) == 1:
        return words[0]
    if words[0] in words[1:]:
        return remove_duplicates(" ".join(words[1:]))
    else:
        return words[0] + " " + remove_duplicates(" ".join(words[1:]))
 
sentence = "Python is great and Java is also great"
print(remove_duplicates(sentence))


Output

Python and Java is also great

Time complexity:
The time complexity of this algorithm is O(n^2), where n is the number of words in the input sentence. For each word, the in operator checks whether it occurs in the rest of the words, which takes O(n) in the worst case, and each call also re-splits and re-joins the remaining words, which is another O(n) per call.

Space complexity:
The recursion depth equals the number of words, n, and every active call keeps its own list of the remaining words (n, n-1, ..., 1 items) plus the remaining sentence string, so the auxiliary space is O(n^2) in the worst case. Python does not eliminate tail calls, so sentences whose word count approaches CPython's default recursion limit of 1000 raise a RecursionError.

Method 8: Using reduce()

  1. The remove_duplicates function splits the input string into a list of words using the split() method. This takes O(n) time, where n is the length of the input string.
  2. The reduce() function from the functools module folds the list of words into a single list of unique words, starting from an empty list as the initial accumulator.
  3. The lambda inside reduce() checks whether the current word y is already in the accumulator list x. If it is, x is returned unchanged; otherwise a new list x + [y] is returned. Both the membership test and the list concatenation take O(n) time per word.
  4. Finally, the function joins the list of unique words into a single string with the join() method. This takes O(n) time, where n is the length of the output string.

Python3




from functools import reduce
 
def remove_duplicates(input_str):
    words = input_str.split()
    # start from an empty list (the initializer) and append
    # each word only if it is not already in the accumulator
    unique_words = reduce(lambda x, y: x if y in x else x + [y], words, [])
    return ' '.join(unique_words)
 
input_str = 'Python is great and Java is also great'
print(remove_duplicates(input_str))
# This code is contributed by Vinay Pinjala.


Output

Python is great and Java also

The time complexity of the remove_duplicates() function is O(n^2) where n is the number of words in the input string.

This is because reduce() visits each word once, and for each word the lambda checks whether it is already in the list of unique words and may build a new list with x + [y]; each of these steps takes O(n) time in the worst case.

Therefore, the time complexity of the function is O(n^2) because it has to perform this check for each word in the input string.

The auxiliary space of the remove_duplicates() function is O(n) because it needs to store all the unique words in the output list.

In the worst case, when there are no duplicates in the input string, the size of the output list is equal to the size of the input list, so the space complexity is O(n).
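To watch the fold that reduce() performs, itertools.accumulate() (with its initial parameter, Python 3.8+) can yield every intermediate accumulator. This is a sketch for illustration; the helper name step is hypothetical and mirrors the lambda above:

```python
from itertools import accumulate

words = "Python is great and Java is also great".split()

def step(acc, word):
    # Same rule as the lambda: keep acc as-is if the word was seen,
    # otherwise return a new list with the word appended.
    return acc if word in acc else acc + [word]

# accumulate() yields each intermediate accumulator that reduce() would
# compute, starting from the initial empty list.
for state in accumulate(words, step, initial=[]):
    print(state)
# the last line printed is ['Python', 'is', 'great', 'and', 'Java', 'also']
```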


