
Scrape LinkedIn Using Selenium And Beautiful Soup in Python

Last Updated : 01 Aug, 2023

In this article, we are going to scrape LinkedIn using the Selenium and Beautiful Soup libraries in Python.

First of all, we need to install some libraries. Execute the following commands in the terminal.

pip install selenium 
pip install beautifulsoup4

In order to use Selenium, we also need a web driver. You can download the web driver for Internet Explorer, Firefox, or Chrome; in this article, we will be using the Chrome web driver. (Since Selenium 4.6, Selenium Manager can download a matching driver automatically, so passing an explicit driver path is often unnecessary.)

Note: While following along with this article, if you get an error, there are two likely reasons for it.

  1. The webpage took too long to load (probably because of a slow internet connection). In this case, use the time.sleep() function to give the webpage extra time to load, specifying the number of seconds to sleep as per your need. A more robust alternative using explicit waits is sketched after this list.
  2. The HTML of the webpage has changed since this article was written. If so, you will have to inspect and select the required webpage elements manually instead of copying the element names written below; how to find the element names is explained below. Also, do not shrink the browser window from its default height and width, as that changes the HTML of the webpage too.
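
For the first point, instead of hard-coded sleeps you can let Selenium wait until an element actually appears. Below is a minimal sketch using an explicit wait; it uses the "username" field from the login page that the rest of this article works with:

Python3

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # Selenium 4.6+ downloads a matching driver automatically
driver.get("https://www.linkedin.com/login")

# Wait up to 15 seconds for the username field to render,
# instead of sleeping for a fixed number of seconds
username = WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.ID, "username")))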

Logging in to LinkedIn

Here we will write the code to log in to LinkedIn. First, we initiate the web driver using Selenium and send a GET request to the login page's URL. Then, in the HTML document, we identify the input tags that accept the username/email and the password, and the sign-in button.

LinkedIn Login Page

Code:

Python3




from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
 
# Creating a webdriver instance
driver = webdriver.Chrome(service=Service("Enter-Location-Of-Your-Web-Driver"))
# On Selenium 4.6+, webdriver.Chrome() with no arguments also
# works: Selenium Manager fetches a matching driver automatically.
# This instance will be used to log into LinkedIn
 
# Opening linkedIn's login page
driver.get("https://www.linkedin.com/login")
 
# waiting for the page to load
time.sleep(5)
 
# entering username
username = driver.find_element(By.ID, "username")
# In case of an error, try changing the element
# tag used here.
 
# Enter Your Email Address
username.send_keys("User_email")
 
# entering password
pword = driver.find_element(By.ID, "password")
# In case of an error, try changing the element
# tag used here.
 
# Enter Your Password
pword.send_keys("User_pass")
 
# Clicking on the log in button
# Format (syntax) of writing XPath -->
# //tagname[@attribute='value']
driver.find_element(By.XPATH, "//button[@type='submit']").click()
# In case of an error, try changing the
# XPath used here.


After executing the above code, you will be logged into your LinkedIn profile. Here is what it would look like.

Part 1 Code Execution

Extracting Data From a LinkedIn Profile

Here is the video of the execution of the complete code.

Part 2 Code Execution

2.A) Opening a Profile and Scrolling to the Bottom

Let us say that you want to extract data from Kunal Shah's LinkedIn profile. First, we need to open his profile using its URL. Then we have to scroll to the bottom of the web page so that the complete data gets loaded.

Python3




from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
 
# Creating an instance
driver = webdriver.Chrome(service=Service("Enter-Location-Of-Your-Web-Driver"))
 
# Logging into LinkedIn
driver.get("https://www.linkedin.com/login")
time.sleep(5)
 
username = driver.find_element(By.ID, "username")
username.send_keys("")  # Enter Your Email Address
 
pword = driver.find_element(By.ID, "password")
pword.send_keys("")        # Enter Your Password
 
driver.find_element(By.XPATH, "//button[@type='submit']").click()
 
# Opening Kunal's Profile
profile_url = ""  # paste the URL of Kunal's profile here
 
driver.get(profile_url)        # this will open the link


Output:

Kunal Shah – LinkedIn Profile

Now, we need to scroll to the bottom. Here is the code to do that:

Python3




start = time.time()
 
# will be used in the while loop
initialScroll = 0
finalScroll = 1000
 
while True:
    # this command scrolls the window from the pixel
    # value stored in the initialScroll variable to the
    # pixel value stored in the finalScroll variable
    driver.execute_script(
        f"window.scrollTo({initialScroll}, {finalScroll})")
    initialScroll = finalScroll
    finalScroll += 1000
 
    # we will stop the script for 3 seconds so that
    # the data can load
    time.sleep(3)
    # You can change it as per your needs and internet speed
 
    end = time.time()
 
    # We will scroll for 20 seconds.
    # You can change it as per your needs and internet speed
    if round(end - start) > 20:
        break


The page has now been scrolled to the bottom and is completely loaded, so we can scrape the data we want.
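
The loop above scrolls for a fixed 20 seconds regardless of how long the page actually is. As an alternative, the sketch below keeps scrolling by the full page height and stops once the height no longer grows (it assumes the same driver instance as above):

Python3

# Scroll until the page stops growing instead of for a fixed time
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(3)  # give lazily loaded content time to arrive
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new loaded, so we are at the bottom
    last_height = new_height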

Extracting Data from the Profile

To extract data, first store the source code of the web page in a variable, then use this source code to create a Beautiful Soup object. (The 'lxml' parser used below requires the lxml package: pip install lxml.)

Python3




src = driver.page_source
 
# Now using beautiful soup
soup = BeautifulSoup(src, 'lxml')


Extracting Profile Introduction:

To extract the profile introduction, i.e., the name, the company name, and the location, we need to find the source code of each element. First, we will find the source code of the div tag that contains the profile introduction.

Chrome – Inspect Elements

Now, we will use Beautiful Soup to parse this div tag in Python.

Python3




# Extracting the HTML of the complete introduction box
# that contains the name, company name, and the location
intro = soup.find('div', {'class': 'pv-text-details__left-panel'})
 
print(intro)


Output:

(Scribbled) Introduction HTML 

We now have the required HTML to extract the name, company name, and location. Let’s extract the information now:

Python3




# In case of an error, try changing the tags used here.
 
name_loc = intro.find("h1")
 
# Extracting the Name
name = name_loc.get_text().strip()
# strip() is used to remove any extra blank spaces
 
works_at_loc = intro.find("div", {'class': 'text-body-medium'})
 
# this gives us the HTML of the tag in which the Company Name is present
# Extracting the Company Name
works_at = works_at_loc.get_text().strip()
 
 
location_loc = intro.find_all("span", {'class': 'text-body-small'})
 
# Extracting the Location
# The first span with this class holds the location
location = location_loc[0].get_text().strip()
 
print("Name -->", name,
      "\nWorks At -->", works_at,
      "\nLocation -->", location)


Output:

Name --> Kunal Shah 
Works At --> Founder : CRED 
Location --> Bengaluru, Karnataka, India
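
LinkedIn changes its class names frequently, so any of the find() calls above can return None, and calling get_text() on None raises an AttributeError. If you want the script to degrade gracefully, a small defensive helper like the sketch below can wrap each lookup (it reuses the intro object and class names from above):

Python3

def safe_text(tag):
    # Return the stripped text if the tag was found, else a placeholder
    return tag.get_text().strip() if tag is not None else "Not found"

name = safe_text(intro.find("h1"))
works_at = safe_text(intro.find("div", {'class': 'text-body-medium'}))
print("Name -->", name, "\nWorks At -->", works_at)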

Extracting Data from the Experience Section

Next, we will extract the Experience from the profile.

HTML of Experience Section

Python3




# Getting the HTML of the Experience section in the profile
experience = soup.find("section", {"id": "experience-section"}).find('ul')
 
print(experience)


Output:

Experience HTML Output

We have to go deeper into the HTML tags until we find the desired information. In the above image, we can see the HTML from which to extract the current job title and the name of the company. We now need to go inside each tag to extract the data.

Scrape the job title, company name, and employment duration:

Python3




# In case of an error, try changing the tags used here.
 
li_tags = experience.find('div')
a_tags = li_tags.find("a")
job_title = a_tags.find("h3").get_text().strip()
 
print(job_title)
 
company_name = a_tags.find_all("p")[1].get_text().strip()
print(company_name)
 
joining_date = a_tags.find_all("h4")[0].find_all("span")[1].get_text().strip()
 
employment_duration = a_tags.find_all("h4")[1].find_all(
    "span")[1].get_text().strip()
 
print(joining_date + ", " + employment_duration)


Output:

Founder
CRED
Apr 2018 – Present, 3 yrs 6 mos
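
The code above reads only the first position in the Experience section. If you want every position listed, here is a sketch that iterates over all the list items instead (it assumes the same tag structure as above):

Python3

# Iterate over every entry in the experience list,
# not just the first one
for li in experience.find_all("li"):
    a_tag = li.find("a")
    if a_tag is None:
        continue  # skip entries that do not match the expected layout
    title_tag = a_tag.find("h3")
    if title_tag is not None:
        print(title_tag.get_text().strip())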

Extracting Job Search Data

We will use Selenium to open the jobs page.

Python3




jobs = driver.find_element(By.XPATH, "//a[@data-link-to='jobs']/span")
# In case of an error, try changing the XPath.
 
jobs.click()


Now that the jobs page is open, we will create a BeautifulSoup object to scrape the data.

Python3




job_src = driver.page_source
 
soup = BeautifulSoup(job_src, 'lxml')  


Scrape Job Title:

First of all, we will scrape the Job Titles.

HTML of Job Title

On skimming through the HTML of this page, we find that each job title has the class name "job-card-list__title". We will use this class name to extract the job titles.

Python3




jobs_html = soup.find_all('a', {'class': 'job-card-list__title'})
# In case of an error, try changing the class name used here.
 
job_titles = []
 
for title in jobs_html:
    job_titles.append(title.text.strip())
 
print(job_titles)


Output:

Job Titles List

Scrape Company Name:

Next, we will extract the Company Name.

HTML of Company Name

We will use the class name to extract the names of the companies:

Python3




company_name_html = soup.find_all(
  'div', {'class': 'job-card-container__company-name'})
company_names = []
 
for name in company_name_html:
    company_names.append(name.text.strip())
 
print(company_names)


Output:

Company Names List

Scrape Job Location:

Finally, we will extract the Job Location.

HTML of Job Location

Once again, we will use the class name to extract the location.

Python3




import re   # for removing the extra blank spaces
 
location_html = soup.find_all(
    'ul', {'class': 'job-card-container__metadata-wrapper'})
 
location_list = []
 
for loc in location_html:
    res = re.sub('\n\n +', ' ', loc.text.strip())
 
    location_list.append(res)
 
print(location_list)


Output:

Job Locations List
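
Since the three lists line up one entry per job card, you can zip them into rows and save the results. Below is a minimal sketch using Python's standard csv module (the output filename is arbitrary):

Python3

import csv

# Combine the scraped lists into rows and write them to a CSV file
with open("linkedin_jobs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Job Title", "Company", "Location"])
    writer.writerows(zip(job_titles, company_names, location_list))

driver.quit()  # close the browser once scraping is done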


