Open In App

Find the k most frequent words from data set in Python

Last Updated : 10 Dec, 2017
Improve
Improve
Like Article
Like
Save
Share
Report

Given the data set, we can find k number of most frequent words.

The solution of this problem already present as Find the k most frequent words from a file. But we can solve this problem very efficiently in Python with the help of some high performance modules.

In order to do this, we’ll use a high performance data type module, which is collections. This module got some specialized container datatypes and we will use counter class from this module.


Examples :

Input : "John is the son of John second. 
         Second son of John second is William second."
Output : [('second', 4), ('John', 3), ('son', 2), ('is', 2)]

Explanation :
1. The string will converted into list like this :
    ['John', 'is', 'the', 'son', 'of', 'John', 
     'second', 'Second', 'son', 'of', 'John', 
     'second', 'is', 'William', 'second']
2. Now 'most_common(4)' will return four most 
   frequent words and its count in tuple. 


Input : "geeks for geeks is for geeks. By geeks
         and for the geeks."
Output : [('geeks', 5), ('for', 3)]

Explanation :
most_common(2) will return two most frequent words and their count.

Approach :

  1. Import Counter class from collections module.
  2. Split the string into list using split(), it will return the lists of words.
  3. Now pass the list to the instance of Counter class
  4. The function 'most-common()' inside Counter will return the list of most frequent words from list and its count.

Below is Python implementation of above approach :




# Python program to find the k most frequent words
# from data set
from collections import Counter
  
data_set = "Welcome to the world of Geeks " \
"This portal has been created to provide well written well" \
"thought and well explained solutions for selected questions " \
"If you like Geeks for Geeks and would like to contribute " \
"here is your chance You can write article and mail your article " \
" to contribute at geeksforgeeks org See your article appearing on " \
"the Geeks for Geeks main page and help thousands of other Geeks. " \
  
# split() returns list of all the words in the string
split_it = data_set.split()
  
# Pass the split_it list to instance of Counter class.
Counter = Counter(split_it)
  
# most_common() produces k frequently encountered
# input values and their respective counts.
most_occur = Counter.most_common(4)
  
print(most_occur)


Output :

[('Geeks', 5), ('to', 4), ('and', 4), ('article', 3)]

Similar Reads

Python | Find most frequent element in a list
Given a list, find the most frequent element in it. If there are multiple elements that appear maximum number of times, print any one of them. Examples: Input : [2, 1, 2, 2, 1, 3] Output : 2 Input : ['Dog', 'Cat', 'Dog'] Output : Dog Approach #1 : Naive ApproachThis is a brute force approach in which we make use of for loop to count the frequency o
4 min read
Find the most frequent value in a NumPy array
In this article, let's discuss how to find the most frequent value in the NumPy array. Steps to find the most frequency value in a NumPy array: Create a NumPy array.Apply bincount() method of NumPy to get the count of occurrences of each element in the array.The n, apply argmax() method to get the value having a maximum number of occurrences(freque
1 min read
Find top k (or most frequent) numbers in a stream
Given an array of n numbers. Your task is to read numbers from the array and keep at-most K numbers at the top (According to their decreasing frequency) every time a new number is read. We basically need to print top k numbers sorted by frequency when input stream has included k distinct elements, else need to print all distinct elements sorted by
11 min read
Python program for most frequent word in Strings List
Given Strings List, write a Python program to get word with most number of occurrences. Example: Input : test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"] Output : gfg Explanation : gfg occurs 3 times, most in strings in total. Input : test_list = ["geeks love gfg", "geeks are best"] Output : geeks Explanation : geeks occurs 2
6 min read
Kth most frequent Character in a given String
Given a string str and an integer K, the task is to find the K-th most frequent character in the string. If there are multiple characters that can account as K-th most frequent character then, print any one of them.Examples: Input: str = "GeeksforGeeks", K = 3 Output: f Explanation: K = 3, here 'e' appears 4 times & 'g', 'k', 's' appears 2 time
6 min read
How to display most frequent value in a Pandas series?
In this article, our basic task is to print the most frequent value in a series. We can find the number of occurrences of elements using the value_counts() method. From that the most frequent element can be accessed by using the mode() method. Example 1 : # importing the module import pandas as pd # creating the series series = pd.Series(['g', 'e',
1 min read
Python - Compute the frequency of words after removing stop words and stemming
In this article we are going to tokenize sentence, paragraph, and webpage contents using the NLTK toolkit in the python environment then we will remove stop words and apply stemming on the contents of sentences, paragraphs, and webpage. Finally, we will Compute the frequency of words after removing stop words and stemming. Modules Needed bs4: Beaut
8 min read
Python | Find top K frequent elements from a list of tuples
Given a list of tuples with word as first element and its frequency as second element, the task is to find top k frequent element. Below are some ways to above achieve the above task. Method #1: Using defaultdict C/C++ Code # Python code to find top 'k' frequent element # Importing import collections from operator import itemgetter from itertools i
5 min read
Python - Least Frequent Character in String
This article gives us the methods to find the frequency of minimum occurring character in a python string. This is quite important utility nowadays and knowledge of it is always useful. Let’s discuss certain ways in which this task can be performed. Method 1 : Naive method + min() In this method, we simply iterate through the string and form a key
5 min read
Python Program for Least frequent element in an array
Given an array, find the least frequent element in it. If there are multiple elements that appear least number of times, print any one of them.Examples : Input : arr[] = {1, 3, 2, 1, 2, 2, 3, 1} Output : 3 3 appears minimum number of times in given array. Input : arr[] = {10, 20, 30} Output : 10 or 20 or 30 Method 1:A simple solution is to run two
3 min read
Practice Tags :