Extract punctuation from the specified column of Dataframe using Regex

Last Updated : 29 Dec, 2020

Prerequisite: Regular Expression in Python

In this article, we will see how to extract the punctuation used in a specified column of a DataFrame using regex.

First, we define a regular expression whose character class lists the punctuation characters to look for: [!"\$%&\'()*+,\-.\/:;=#@?\[\\\]^_`{|}~]* Note that this class covers most ASCII punctuation but omits < and >. We then pass each row of the chosen column to re.findall() to extract the punctuation, and assign the extracted punctuation to a new column in the DataFrame.
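If every ASCII punctuation character should be covered, including < and >, one alternative is to build the character class from the standard library instead of typing it by hand. The following is only a sketch of that idea, not the approach used in the rest of this article; the names punct_re and sample are made up for illustration.

```python
import re
import string

# string.punctuation holds all 32 ASCII punctuation characters;
# re.escape() backslash-escapes the ones that are special in a regex,
# so the result is safe to place inside a character class.
punct_class = "[" + re.escape(string.punctuation) + "]"
punct_re = re.compile(punct_class)

sample = "Is 3 < 4? Yes: (3 < 4) -> true!"

# No quantifier on the class, so each match is a single character
found = punct_re.findall(sample)
print(found)
```

Because the class has no quantifier, findall() returns one list item per punctuation character, so no joining or splitting is needed afterwards.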

The re.findall() function returns all non-overlapping matches of a pattern in a string, as a list of strings. The string is scanned left to right, and matches are returned in the order found.

Syntax: re.findall(pattern, string, flags=0)

Return: All non-overlapping matches of pattern in string, as a list of strings. If the pattern can match the empty string, empty matches are included in the list.
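A short sketch of how the quantifier changes what re.findall() returns may help here. The sample text and the reduced character class [,.!?] are chosen only for illustration.

```python
import re

text = "Wait, what?! No way."

# No quantifier: one list item per punctuation character
single = re.findall(r"[,.!?]", text)
print(single)   # [',', '?', '!', '.']

# "+" quantifier: adjacent punctuation comes back as one match
runs = re.findall(r"[,.!?]+", text)
print(runs)     # [',', '?!', '.']

# "*" quantifier: the pattern can also match the empty string, so the
# list contains empty strings too; joining the matches discards them
with_empties = re.findall(r"[,.!?]*", text)
joined = "".join(with_empties)
print(joined)   # ,?!.
```

This is why a function that uses the * form can join the matches into one string and then split that string into characters: the empty matches contribute nothing to the joined result.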

Now, let's create a DataFrame:

Python3




# import required libraries
import pandas as pd
import re

# create a DataFrame with names
# and their comments
df = pd.DataFrame({
    'Name': ['Akash', 'Ashish', 'Ayush',
             'Diksha', 'Radhika'],

    # one comment per name; every item needs a trailing comma,
    # otherwise adjacent string literals are silently concatenated
    'Comments': ['Hey! Akash how r u',
                 'Why are you asking this to me?',
                 'Today, what we are going to do.',
                 'No plans for today why?',
                 'Wedding plans, what are you saying?']},

    columns=['Name', 'Comments']
)

# show the DataFrame
df


Output:

Now, extract the punctuation from the Comments column:

Python3




# define a function for extracting
# the punctuation
def check_find_punctuations(text):

    # regular expression listing the
    # punctuation characters to match
    result = re.findall(r'[!"\$%&\'()*+,\-.\/:;=#@?\[\\\]^_`{|}~]*',
                        text)

    # join the matches into one string; the empty
    # matches produced by "*" disappear here
    string = "".join(result)

    # return a list of single characters
    return list(string)

# create a new column named
# punctuation_used by applying
# the user-defined function to
# each row of the Comments column
df['punctuation_used'] = df['Comments'].apply(
                         check_find_punctuations
                         )

# show the DataFrame
df


Output:
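For the sample data, the punctuation_used column should hold one list of punctuation characters per row. The sketch below checks this with plain re and no DataFrame, assuming the five comments are separate strings, one per name. The names PATTERN, comments and results are introduced here for illustration only.

```python
import re

# Same character class as in check_find_punctuations() above
PATTERN = r'[!"\$%&\'()*+,\-.\/:;=#@?\[\\\]^_`{|}~]*'

def check_find_punctuations(text):
    # findall() with "*" also yields empty strings; join() drops them
    return list("".join(re.findall(PATTERN, text)))

# Assumed sample data: one comment per row
comments = [
    'Hey! Akash how r u',
    'Why are you asking this to me?',
    'Today, what we are going to do.',
    'No plans for today why?',
    'Wedding plans, what are you saying?',
]

results = [check_find_punctuations(c) for c in comments]
for comment, found in zip(comments, results):
    print(f"{comment!r} -> {found}")
```

Under that assumption, the five rows come out as ['!'], ['?'], [',', '.'], ['?'] and [',', '?'].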


