
Scraping And Finding Ordered Words In A Dictionary using Python

Last Updated : 26 Nov, 2018

What are ordered words?

An ordered word is a word whose letters appear in alphabetical order (repeated letters are allowed), for example "abbey" and "dirt". All other words are unordered, for example "geeks".
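
As a quick illustration of the definition, a word is ordered exactly when sorting its letters leaves it unchanged. This is a minimal sketch that assumes lowercase input; the helper name `is_ordered` is hypothetical and is not part of the program shown later.

```python
def is_ordered(word):
    # A word is ordered when its letters are already in sorted order.
    return list(word) == sorted(word)


for w in ("abbey", "dirt", "geeks"):
    print(w, is_ordered(w))
```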

The task at hand

This task is taken from Rosetta Code, and it is less mundane than the description above suggests. To get a large number of words, we will use the online dictionary available at http://www.puzzlers.org/pub/wordlists/unixdict.txt, which contains about 25,000 words. Since we are using Python, we can scrape the dictionary directly instead of downloading it as a text file and then performing file-handling operations on it.

Requirements:

pip install requests

Code

The approach is to traverse each word and compare the ASCII values of adjacent characters in pairs. If any pair is out of order, the word is unordered; otherwise the word is ordered.
The task is divided into two parts:
Scraping

  1. Fetching the data from the given URL using the Python library requests
  2. Storing the raw content fetched from the URL (a bytes object)
  3. Decoding the content, which is typically encoded as UTF-8 on the web, into a string
  4. Converting the long string of content into a list of words
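
The four scraping steps above can be sketched as follows. The function names `parse_words` and `fetch_words`, the `timeout` value, and the lazy import are assumptions made for illustration; the URL is the one quoted earlier and may not be reachable from every network.

```python
# URL quoted in the article; availability is not guaranteed.
URL = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"


def parse_words(raw_bytes):
    # Steps 3 and 4: decode the UTF-8 bytes, then split on whitespace.
    return raw_bytes.decode("utf-8").split()


def fetch_words(url=URL, timeout=10):
    # Imported here so parse_words can be tried without requests installed.
    import requests

    # Steps 1 and 2: fetch the page and keep its raw content (bytes).
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_words(response.content)


# Offline check of the decode-and-split step using sample bytes.
print(parse_words(b"abbey\ndirt\ngeeks\n"))
```

Keeping the parsing separate from the network call makes the decode-and-split step easy to try without an internet connection.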

Finding the ordered words

  1. Traversing the list of words
  2. Pairwise comparison of the ASCII value of every adjacent character in each word
  3. Storing a false result if a pair is unordered
  4. Otherwise printing the ordered word
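
The steps above can be sketched on a small in-memory list before applying them to the full dictionary. This is only an illustration: the function name `ordered_words` and the `min_length` parameter are hypothetical, and the default of 3 mirrors the choice in the program below to skip one- and two-letter strings.

```python
def ordered_words(words, min_length=3):
    found = []
    for word in words:
        # Skip strings shorter than the minimum length.
        if len(word) < min_length:
            continue
        ordered = True
        # Compare every adjacent pair of characters by code point.
        for first, second in zip(word, word[1:]):
            if ord(first) > ord(second):
                ordered = False  # an out-of-order pair was found
                break
        if ordered:
            found.append(word)
    return found


print(ordered_words(["abbey", "dirt", "geeks", "ab", "accent"]))
```

Returning a list instead of printing inside the loop leaves the caller free to count, sort, or filter the results.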




# Python program to find ordered words
import requests
  
# Scrapes the words from the URL below and stores 
# them in a list
def getWords():
  
    # URL of the word list, which contains about 25,000 words
    url = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"
    fetchData = requests.get(url)
  
    # extracts the content of the webpage
    wordList = fetchData.content
  
    # decodes the UTF-8 encoded text and splits the 
    # string to turn it into a list of words
    wordList = wordList.decode("utf-8").split()
  
    return wordList
  
  
# function that fetches the word list and prints every ordered word in it
def isOrdered():
  
    # fetching the wordList
    collection = getWords()
  
    # since the first few elements of the 
    # dictionary are numbers, getting rid of those
    # by slicing off the first 16 elements
    collection = collection[16:]
  
    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1
  
        if (len(word) < 3): # skips the 1 and 2 lettered strings
            continue
  
        # traverses through all characters of the word in pairs
        while i < l:         
            if (ord(word[i]) > ord(word[i+1])):
                result = 'Word is not ordered'
                break
            else:
                i += 1
  
        # only printing the ordered words
        if (result == 'Word is ordered'):
            print(word + ': ' + result)
  
  
# execute isOrdered() function
if __name__ == '__main__':
    isOrdered()


Output:
aau: Word is ordered
abbe: Word is ordered
abbey: Word is ordered
abbot: Word is ordered
abbott: Word is ordered
abc: Word is ordered
abe: Word is ordered
abel: Word is ordered
abet: Word is ordered
abo: Word is ordered
abort: Word is ordered
accent: Word is ordered
accept: Word is ordered
...........................
...........................
...........................

References: Rosetta Code


