
How to extract date from Excel file using Pandas?

Last Updated : 02 Jan, 2023

Prerequisite: Regular Expressions in Python

In this article, let's see how to extract dates from an Excel file. Suppose our Excel file looks like the image below; we need to extract the date from each string and store it in a new DataFrame column.

date_sample_data.xlsx

To view the Excel file, click here.

Approach:

  • Import the required modules.
  • Read the data from the Excel file.
  • Create an extra column for the extracted date.
  • Get the positions of the columns to read from and write to.
  • Define the pattern of the date format.
  • Search for the date and assign it to the corresponding row of the new column.

Let's go through the step-by-step implementation:

Step 1: Import the required modules and read the data from the Excel file.

Python3




# import required modules
import pandas as pd
import re

# Read the Excel file into a DataFrame
data = pd.read_excel("date_sample_data.xlsx")

print("Original DataFrame")
print(data)


Output:

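Step 1 assumes date_sample_data.xlsx is available. If it isn't, the minimal sketch below builds an in-memory DataFrame that can stand in for it in the following steps. The ID column and all row values are hypothetical; only the Description column name comes from this article, and it sits at position 1 to match the output shown in Step 3.

```python
import pandas as pd

# Hypothetical stand-in for date_sample_data.xlsx: the 'ID' column and the
# row values are invented for illustration; only 'Description' is from the article.
data = pd.DataFrame({
    "ID": [1, 2, 3],
    "Description": [
        "Invoice raised on 02/04/2020 for order 17",
        "Payment received 15/06/2021",
        "Shipped on 30/11/2019, delivered later",
    ],
})

print("Original DataFrame")
print(data)
```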
Step 2: Create an extra column for the extracted date.

Python3




# Create an empty column to hold the extracted date
data['new_Date'] = None
print(data)


Output:

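Assigning a single scalar such as None to a new column broadcasts it to every row. A minimal sketch on a hypothetical two-row frame shows that the resulting column has the object dtype, which is what allows it to hold the extracted date strings later.

```python
import pandas as pd

# Hypothetical two-row frame; only the 'Description' column name is from the article.
data = pd.DataFrame({"Description": ["Due 02/04/2020", "No date here"]})

# Assigning a scalar broadcasts it to every row of the new column.
data["new_Date"] = None

print(data["new_Date"].tolist())  # [None, None]
print(data["new_Date"].dtype)     # object
```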
Step 3: Get the positions of the columns used for searching and storing.

Python3




# get the integer positions of the source and target columns
index_set = data.columns.get_loc('Description')
index_date = data.columns.get_loc('new_Date')

print(index_set, index_date)


Output:

1 2
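DataFrame.columns.get_loc() returns a column's integer position, which is what DataFrame.iat expects; DataFrame.at does the same job with labels instead. The sketch below, on a hypothetical frame laid out like the article's sheet, shows that both address the same cell when the DataFrame keeps its default integer index.

```python
import pandas as pd

# Hypothetical frame laid out like the article's sheet: 'Description' at position 1.
data = pd.DataFrame({
    "ID": [1, 2],
    "Description": ["Due 02/04/2020", "Paid 15/06/2021"],
})
data["new_Date"] = None

index_set = data.columns.get_loc("Description")  # integer position of the column
index_date = data.columns.get_loc("new_Date")
print(index_set, index_date)  # 1 2

# .iat takes (row position, column position); .at takes (row label, column label).
# With the default RangeIndex both address the same cell.
assert data.iat[1, index_set] == data.at[1, "Description"]

# Writing through .iat updates the new column in place.
data.iat[0, index_date] = "02/04/2020"
print(data)
```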

Step 4: Define the pattern of the date format.

We need a regular expression for dates in DD/MM/YYYY format. The character class [0-9] matches any single digit, and the quantifiers {2} and {4} specify how many times the preceding token must repeat. The “/” separators are written as “\/”; in Python regular expressions “/” is not actually a special character, so the backslash is optional but harmless. Putting this together, the expression becomes ‘[0-9]{2}\/[0-9]{2}\/[0-9]{4}’.

Example:

02/04/2020
  • 02 --> digits 0 to 9 --> [0-9], repeated {2} times (DD)
  • 04 --> digits 0 to 9 --> [0-9], repeated {2} times (MM)
  • 2020 --> digits 0 to 9 --> [0-9], repeated {4} times (YYYY)

Python3




# Matches dates in DD/MM/YYYY format
date_pattern = r'([0-9]{2}\/[0-9]{2}\/[0-9]{4})'
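The pattern only checks the shape of the text, not whether it is a real calendar date. The sketch below (the sample strings are hypothetical, and first_date is a helper introduced here for illustration) shows a match, a miss on a single-digit day and month, an impossible date that still matches, and that the unescaped form of the pattern behaves identically.

```python
import re

# Pattern from Step 4 (DD/MM/YYYY).
date_pattern = r'([0-9]{2}\/[0-9]{2}\/[0-9]{4})'
# '/' is not a regex metacharacter, so this unescaped form is equivalent.
unescaped_pattern = r'([0-9]{2}/[0-9]{2}/[0-9]{4})'

def first_date(pattern, text):
    """Return the first substring matching pattern, or None (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group() if match else None

samples = [
    "Order placed on 02/04/2020",  # two-digit day and month
    "Order placed on 2/4/2020",    # single-digit day and month
    "Bogus date 31/02/2020",       # digits fit, but no such calendar day
]

for text in samples:
    print(repr(text), "->", first_date(date_pattern, text))
    # Both spellings of the pattern give the same result.
    assert first_date(date_pattern, text) == first_date(unescaped_pattern, text)
```

If only valid dates should be kept, a separate validation step is needed after matching.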


Step 5: Search for the date and assign it to the corresponding column in the DataFrame.

To search each string for a date, we use the re.search() function from the re module.

Python3




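for row in range(len(data)):
    # re.search returns None when a cell has no date, so check before .group()
    match = re.search(date_pattern, str(data.iat[row, index_set]))
    data.iat[row, index_date] = match.group() if match else None

# show the DataFrame
print(data)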


Output:

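Two properties of re.search() matter for the loop in Step 5: it returns only the first match in a string, and it returns None when there is no match, so calling .group() on an unchecked result raises an AttributeError. The sketch below, using hypothetical strings, shows both and contrasts re.findall() for cells that might contain more than one date.

```python
import re

date_pattern = r'([0-9]{2}\/[0-9]{2}\/[0-9]{4})'

text = "Booked 02/04/2020, rescheduled to 09/04/2020"

# re.search scans the whole string and returns a Match for the FIRST hit only.
match = re.search(date_pattern, text)
print(match.group())  # 02/04/2020

# When nothing matches it returns None, so check before calling .group().
missing = re.search(date_pattern, "No date in this cell")
print(missing)  # None

# re.findall returns every non-overlapping match (the group's text, since the
# pattern has exactly one capturing group).
print(re.findall(date_pattern, text))  # ['02/04/2020', '09/04/2020']
```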
Complete Code:

Python3




# importing required modules
import pandas as pd
import re

data = pd.read_excel("date_sample_data.xlsx")

print("Original data:\n", data)

# Create column for the extracted date
data['new_Date'] = None

# get the integer positions of the source and target columns
index_set = data.columns.get_loc('Description')
index_date = data.columns.get_loc('new_Date')
print(index_set, index_date)

# define pattern for date
# in DD/MM/YYYY
date_pattern = r'([0-9]{2}\/[0-9]{2}\/[0-9]{4})'

# search for the pattern in each row
# and store the match in the DataFrame (None if no date is found)
for row in range(len(data)):
    match = re.search(date_pattern,
                      str(data.iat[row, index_set]))
    data.iat[row, index_date] = match.group() if match else None

# show the DataFrame
print(data)


Output:

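Note: Before running this program, make sure the openpyxl library is installed in your Python environment (pip install openpyxl). Recent pandas versions use it to read .xlsx files; xlrd 2.0 and later only reads legacy .xls files.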
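As an alternative to the row-by-row loop, pandas can apply the same pattern to a whole column with Series.str.extract(). This is a swapped-in vectorized technique, not the method used above. The sketch runs on a hypothetical in-memory DataFrame instead of the Excel file and then, optionally, converts the extracted strings to datetime values with pd.to_datetime().

```python
import pandas as pd

# Hypothetical stand-in for the Excel data (only 'Description' is from the article).
data = pd.DataFrame({
    "Description": [
        "Invoice raised on 02/04/2020",
        "Payment received 15/06/2021",
        "No date recorded",
    ],
})

date_pattern = r'([0-9]{2}\/[0-9]{2}\/[0-9]{4})'

# str.extract applies the pattern to every row at once; with expand=False and
# a single capture group it returns a Series, with NaN where nothing matched.
data["new_Date"] = data["Description"].str.extract(date_pattern, expand=False)

# Optionally parse the strings as real dates (day first). errors="coerce" turns
# missing or impossible values, such as 31/02/2020, into NaT instead of raising.
data["parsed_Date"] = pd.to_datetime(data["new_Date"], format="%d/%m/%Y", errors="coerce")

print(data)
```

The vectorized form avoids manual position lookups with get_loc and iat, and rows without a date simply end up as NaN.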



