
Replacing strings with numbers in Python for Data Analysis

Last Updated : 05 Feb, 2018

Sometimes we need to convert string values in a pandas DataFrame to unique integers, because many algorithms only accept numeric input. To do this, we assign a unique numeric value to each distinct string value in the DataFrame.

Note: Before executing the code, create an example.csv file containing some names and genders.
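One way to create the file is sketched below. The rows match the example table used in this article, and the column names Name and Gender are assumed from the table header.

```python
import pandas as pd

# Sample rows copied from the example table in this article
sample = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# index=False keeps the row numbers out of the file
sample.to_csv("example.csv", index=False)

# Read the file back to confirm its contents
check = pd.read_csv("example.csv")
print(check)
```

The file is written to the current working directory, which is where the later examples look for it.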

Say we have a table containing a Name column and a Gender column. The Gender column has two categories, Male and Female, and we want to assign 1 to Male and 2 to Female.


Input : 
    |  Name  |  Gender
 0    Ram        Male
 1    Seeta      Female
 2    Kartik     Male
 3    Niti       Female
 4    Naitik     Male 

Output :
    |  Name  |  Gender
 0    Ram        1
 1    Seeta      2
 2    Kartik     1
 3    Niti       2
 4    Naitik     1 

Method 1:

Create a dictionary containing two elements with the following key-value pairs:
Key       Value
Male      1
Female    2

Then iterate through the Gender column of the DataFrame and replace each value with the number stored under its key. The keys are case-sensitive, so they must match the strings in the file exactly.

# import pandas library
import pandas as pd
# creating file handler for 
# our example.csv file in
# read mode
file_handler = open("example.csv", "r")
# creating a Pandas DataFrame
# using read_csv function 
# that reads from a csv file.
data = pd.read_csv(file_handler, sep = ",")
# closing the file handler
file_handler.close()
# creating a dictionary whose keys
# match the strings in the file
gender = {'Male': 1, 'Female': 2}
# traversing through dataframe
# Gender column and writing
# values where key matches
data['Gender'] = [gender[item] for item in data['Gender']]

Output :

    |  Name  |  Gender
 0    Ram        1
 1    Seeta      2
 2    Kartik     1
 3    Niti       2
 4    Naitik     1 
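The list comprehension in Method 1 can also be written with pandas' `Series.map`, which looks up each value in the dictionary. This is a minimal sketch rather than the article's original method. It builds the DataFrame inline instead of reading example.csv so that it runs on its own.

```python
import pandas as pd

# Inline copy of the example table (assumed to mirror example.csv)
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Same dictionary as in Method 1
gender = {"Male": 1, "Female": 2}

# map() replaces each string with the value stored under that key
data["Gender"] = data["Gender"].map(gender)
print(data)
```

One difference in behaviour: a string that is missing from the dictionary raises a KeyError in the list comprehension, whereas `map` leaves NaN in that row.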

Method 2:
Method 2 is similar but requires no dictionary and takes fewer lines of code. Here we select the rows of the Gender column that match a condition and assign the new value to them; pandas performs the iteration internally.

# import pandas library
import pandas as pd
# creating file handler for
# our example.csv file in
# read mode
file_handler = open("example.csv", "r")
# creating a Pandas DataFrame
# using read_csv function that
# reads from a csv file.
data = pd.read_csv(file_handler, sep = ",")
# closing the file handler
file_handler.close()
# storing the column as object
# so it can hold both strings
# and integers while we edit it
data['Gender'] = data['Gender'].astype(object)
# selecting the rows of the
# Gender column where the
# condition matches and
# writing the new value
data.loc[data['Gender'] == 'Male', 'Gender'] = 1
data.loc[data['Gender'] == 'Female', 'Gender'] = 2

Output :

    |  Name  |  Gender
 0    Ram        1
 1    Seeta      2
 2    Kartik     1
 3    Niti       2
 4    Naitik     1 
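For a column with exactly two categories, the condition from Method 2 can also drive `numpy.where`, which builds the whole numeric column in one step. This is a sketch under that two-category assumption, not part of the original method, and it uses an inline DataFrame instead of example.csv.

```python
import numpy as np
import pandas as pd

# Inline copy of the example table (assumed to mirror example.csv)
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# 1 where the condition holds, 2 everywhere else
data["Gender"] = np.where(data["Gender"] == "Male", 1, 2)
print(data)
```

Note that every value other than 'Male' becomes 2 here, including misspellings and missing entries, so this form only suits columns known to be clean.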


Applications:

  1. This technique can be applied in data science. If a dataset records gender as 'Male' and 'Female', we can assign numbers such as 0 and 1 respectively so that our algorithms can work on the data.
  2. This technique can also be used to replace particular values in a dataset with new values.
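When the exact numbers do not matter, pandas can choose them. The sketch below uses `pandas.factorize`, which numbers the distinct strings from 0 in order of first appearance. The inline DataFrame is an assumed copy of the example table.

```python
import pandas as pd

# Inline copy of the example table (assumed to mirror example.csv)
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# codes holds one integer per row; uniques holds the original strings
codes, uniques = pd.factorize(data["Gender"])

data["Gender"] = codes
print(data)
print(list(uniques))
```

Because 'Male' appears first in this table, it receives 0 and 'Female' receives 1. Keeping `uniques` makes it possible to translate the codes back to the original strings later.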

