
Web Scraping Tables with Selenium and Python

Last Updated : 11 Dec, 2023

Selenium is an automation testing tool that can open a website, perform various actions on it, and extract data from it. It was chiefly developed to ease testing work by automating web applications, but nowadays it is also used to take the tedium out of repetitive tasks. Did you know that with the help of Selenium you can also extract data from a table on a website? Yes, we can easily scrape table data from a website, and this article explains how.

Approach to be followed: 

Let us consider the simple HTML program containing tables only to understand the approach of scraping the table from the website.

HTML




<!DOCTYPE html>
<html>
   <head>
      <title>Selenium Table</title>
   </head>
   <body>
      <table border="1">
        <thead>
         <tr>
            <th>Name</th>
            <th>Class</th>
         </tr>
        </thead>
        <tbody>
         <tr>
            <td>Vinayak</td>
            <td>12</td>
         </tr>
         <tr>
            <td>Ishita</td>
            <td>10</td>
         </tr>
        </tbody>
      </table>
   </body>
</html>


Browser Output:

Follow the steps given below:

Once you have created the HTML file, you can follow the steps below and extract data from the table on your own.

  • First, declare the web driver

driver = webdriver.Chrome(service=Service("path where the web driver is installed"))

  • Now, open the website from which you want to obtain table data
driver.get("URL of the website")
  • Next, you need to find rows in the table
rows = 1 + len(driver.find_elements(By.XPATH, "Specify the altered path"))

Here, the altered XPath means that if the XPath of row 1 is /html/body/table/tbody/tr[1], then the altered XPath will be /html/body/table/tbody/tr. What needs to be done here is to remove the index value of the table row, so that find_elements matches every row in the table body.

NOTE: Remember to add 1 to the row count for the table header, as it is not included while counting the table body rows.

  • Further, find columns in the table
cols = len(driver.find_elements(By.XPATH, "Specify the altered path"))

Here, the altered XPath means that if the XPath of the cell containing Vinayak is /html/body/table/tbody/tr[1]/td[1], then the altered XPath will be /html/body/table/tbody/tr[1]/td. What needs to be done here is to keep the row index but remove the index value of the table data cell, so that every cell in a single row is matched.

  • Moreover, obtain data from each column of the table body
for r in range(2, rows+1):
    for p in range(1, cols+1):
        value = driver.find_element(By.XPATH, "Specify the altered path").text

Here, the altered XPath means that if the XPath of the cell containing Vinayak is /html/body/table/tbody/tr[1]/td[1], then the altered XPath will be "/html/body/table/tbody/tr["+str(r)+"]/td["+str(p)+"]". What needs to be done here is to substitute str(r) and str(p) for the index values of the table row and table data respectively.

  • Finally, print data of the table
        print(value, end='       ')
    print()
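The loops above build one XPath string per table cell. As a concrete illustration, here is a small helper (the function name and the row/column counts are my own, based on the sample table above, not from the article) that generates the per-cell XPath strings the loop would query:

```python
# Build the XPath for one cell of the table body, following the
# altered-XPath pattern described above.
def cell_xpath(row: int, col: int) -> str:
    return f"/html/body/table/tbody/tr[{row}]/td[{col}]"

# For the sample table (2 data rows + 1 header row -> rows = 3, cols = 2),
# the loops visit these cells:
rows, cols = 3, 2
paths = [cell_xpath(r, p) for r in range(2, rows + 1)
         for p in range(1, cols + 1)]
for path in paths:
    print(path)
```

Printing the generated paths makes it easy to verify the substitution of str(r) and str(p) before pointing the scraper at a live page.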

How to scrape table data from the website in Selenium?

Now that we have seen the approach to extract table data using the automation tool Selenium, let's look at a complete example of scraping table data from a website. The program given below scrapes a table from a web page (the target URL from the original article is not preserved here).

Python




# Python program to scrape table from website
  
# import libraries selenium and time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep

# Create webdriver object (Selenium 4 style; note the raw string
# so the backslashes in the Windows path are not treated as escapes)
driver = webdriver.Chrome(
    service=Service(r"C:\selenium\chromedriver_win32\chromedriver.exe"))
  
# Get the website (the URL from the original article is not preserved
# here; substitute the page containing the table you want to scrape)
driver.get("URL of the page containing the table")
# Make Python sleep for some time
sleep(2)
  
# Obtain the number of rows in body
rows = 1+len(driver.find_elements(By.XPATH,
    "/html/body/div[3]/div[2]/div/div[1]/div/div/div/article/div[3]/div/table/tbody/tr"))
  
# Obtain the number of columns in table
cols = len(driver.find_elements(By.XPATH,
    "/html/body/div[3]/div[2]/div/div[1]/div/div/div/article/div[3]/div/table/tbody/tr[1]/td"))
  
# Print rows and columns
print(rows)
print(cols)
  
# Printing the table headers
print("Locators           "+"             Description")
  
# Printing the data of the table
for r in range(2, rows+1):
    for p in range(1, cols+1):
        
        # obtaining the text from each column of the table
        value = driver.find_element(By.XPATH,
            "/html/body/div[3]/div[2]/div/div[1]/div/div/div/article/div[3]/div/table/tbody/tr["+str(r)+"]/td["+str(p)+"]").text
        print(value, end='       ')
    print()


Finally, run the Python code using:

python run.py
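Once the cell text has been collected, it is often convenient to group it back into records that pair each value with its column header. A minimal sketch (using the values from the sample HTML table above in place of live scraped text, since the page itself is not available here):

```python
# Headers from the <thead>, and cell values in the order the
# scraping loop would yield them, row by row.
headers = ["Name", "Class"]
cells = ["Vinayak", "12", "Ishita", "10"]
cols = len(headers)

# Pair every consecutive group of `cols` values with the headers.
records = [dict(zip(headers, cells[i:i + cols]))
           for i in range(0, len(cells), cols)]
print(records)
# [{'Name': 'Vinayak', 'Class': '12'}, {'Name': 'Ishita', 'Class': '10'}]
```

Collecting records this way also makes it straightforward to dump the scraped table to CSV or JSON afterwards.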

Output:

Browser Output:


