Pandas dataframes are used to manipulate tabular data in Python. Sometimes, while manipulating the data, we need to replace certain values in the pandas dataframe. In this article, we will discuss different ways to replace a value in a pandas dataframe.
The replace() Method
To replace one or more values in a pandas dataframe, you can use the replace() method. It has the following syntax.
DataFrame.replace(to_replace=None, value=_NoDefault.no_default, *, inplace=False, limit=None, regex=False, method=_NoDefault.no_default)
Here,
- The to_replace parameter takes a string, regex, list, dictionary, series, integer, or floating point number as its input argument.
  - If the input given to the to_replace parameter is a string, integer, floating point number, or regex, the values matching the input are replaced by the input given to the value parameter.
  - If we pass a list of strings, numeric values, or regexes to the to_replace parameter, it works in two ways.
    - If the input given to the value parameter is a single value, all the elements of the list passed to the to_replace parameter are replaced by that same value.
    - If the input given to the value parameter is a list, the lists given to the to_replace and value parameters must have equal length. Each value in the to_replace list is replaced by the value at the corresponding position in the value list.
  - If the input given to the to_replace parameter is a python dictionary, it works in two ways.
    - If the value parameter is not specified, each key of the dictionary is replaced with its associated value.
    - If the value parameter is specified, the keys of the dictionary should be column names, and the associated values are the values in those columns that will be replaced with the input given to the value parameter.
- By default, the replace() method returns a new dataframe. If you want to modify the original dataframe, you can set the inplace parameter to True.
- When to_replace is a scalar or a list and the value parameter is not specified, the replace() method works like the pandas fillna() method. In this case, each occurrence of the values given to to_replace is filled using the method specified in the method parameter: 'pad' and 'ffill' copy the previous value forward, while 'bfill' copies the next value backward. The method parameter is deprecated since pandas 2.1.
- The limit parameter sets the maximum number of consecutive values to fill when the replace() method works like the fillna() method. Like method, it is deprecated since pandas 2.1.
- The regex parameter specifies whether to interpret to_replace and/or value as regular expressions. If regex is True, the strings given to to_replace are treated as regular expressions. Alternatively, regex itself can be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.
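The list, dictionary, and regex forms of to_replace described above can be tried out on a small series. The following is a minimal sketch assuming a recent pandas version (2.x); the grades series and its letter values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the input forms described above
grades = pd.Series(["A", "B", "C", "D", "F"])

# List to_replace with a single value: every listed item becomes "Fail"
single = grades.replace(["D", "F"], "Fail")

# Two lists of equal length: "A" -> 4 and "B" -> 3, matched by position
paired = grades.replace(["A", "B"], [4, 3])

# Dictionary without value: each key is replaced by its associated value
mapped = grades.replace({"A": "Excellent", "F": "Poor"})

# Regular expression: a cell consisting only of C, D, or F becomes "Low"
pattern = grades.replace(r"^[CDF]$", "Low", regex=True)

print(single.tolist())   # ['A', 'B', 'C', 'Fail', 'Fail']
print(paired.tolist())   # [4, 3, 'C', 'D', 'F']
print(mapped.tolist())   # ['Excellent', 'B', 'C', 'D', 'Poor']
print(pattern.tolist())  # ['A', 'B', 'Low', 'Low', 'Low']
```

Each call leaves grades itself unchanged, because inplace defaults to False.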
After execution, the replace() method returns a new dataframe if the inplace parameter is set to False. Otherwise, it returns None. If invoked on a pandas series, the replace() method returns a series instead.
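The difference between the two return behaviours can be checked directly. This is a small sketch assuming pandas 2.x; the series s is hypothetical sample data.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Default (inplace=False): a new series is returned and s is unchanged
result = s.replace(2, 20)
print(result.tolist())  # [1, 20, 3]
print(s.tolist())       # [1, 2, 3]

# inplace=True: s itself is modified and the call returns None
returned = s.replace(2, 20, inplace=True)
print(returned)         # None
print(s.tolist())       # [1, 20, 3]
```

Because the in-place call returns None, writing s = s.replace(2, 20, inplace=True) would leave s set to None.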
Replace Value in a Series in Python
To replace a value in a series, we will pass the value to be replaced and the new value to the replace() method as shown in the following example.
import pandas as pd
numbers=[3,23,100,14,16,100,45,65]
series=pd.Series(numbers)
print("The series is:")
print(series)
newSeries=series.replace(100,"Max")
print("The updated series is:")
print(newSeries)
Output:
The series is:
0 3
1 23
2 100
3 14
4 16
5 100
6 45
7 65
dtype: int64
The updated series is:
0 3
1 23
2 Max
3 14
4 16
5 Max
6 45
7 65
dtype: object
In this example, we first created a series from a python list. Then, we invoked the replace() method on the series with 100 as its first input argument and the string "Max" as the second input argument. After execution, the replace() method replaces each instance of 100 with "Max" and returns a new series. Because the new series mixes integers and strings, its dtype changes from int64 to object.
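To replace several values in a series with different new values in one call, a dictionary can be passed to replace(). The sketch below reuses the numbers list from the example above and assumes pandas 2.x; the "Min" label is only an illustration.

```python
import pandas as pd

numbers = [3, 23, 100, 14, 16, 100, 45, 65]
series = pd.Series(numbers)

# A dictionary maps each old value to its own replacement in one call
labelled = series.replace({100: "Max", 3: "Min"})

print(series.tolist())    # the original series is unchanged
print(labelled.dtype)     # object: integers and strings are now mixed
print(labelled.tolist())  # ['Min', 23, 'Max', 14, 16, 'Max', 45, 65]
```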
Pandas Replace Single Value in the Entire Dataframe
To replace a value in a pandas dataframe, we will invoke the replace() method on the dataframe. Here, we pass the value that needs to be replaced as the first input argument and the new value as the second input argument, as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
{"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
{"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
newDf=df.replace(100,"Max")
print("The updated dataframe is:")
print(newDf)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 87 82
1 2 75 100 90
2 3 87 84 76
3 4 100 100 90
4 5 90 87 84
5 6 79 75 72
The updated dataframe is:
Roll Maths Physics Chemistry
0 1 Max 87 82
1 2 75 Max 90
2 3 87 84 76
3 4 Max Max 90
4 5 90 87 84
5 6 79 75 72
In the above example, we first converted a list of dictionaries to a dataframe. Then, we invoked the replace() method on the dataframe with 100 as its first input argument and "Max" as the second input argument. After execution, the replace() method returns a new dataframe in which each instance of 100 is replaced with "Max". The original dataframe df is not modified.
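A list can also be passed as the first argument to replace several values across the whole dataframe with the same new value. This sketch assumes pandas 2.x and uses a smaller, made-up version of the marks table; the "High" label is hypothetical.

```python
import pandas as pd

# A smaller version of the marks table, for illustration only
df = pd.DataFrame({"Roll": [1, 2, 3],
                   "Maths": [100, 75, 90],
                   "Physics": [87, 100, 90]})

# A list passed as to_replace: both 100 and 90 become "High" in every column
newDf = df.replace([100, 90], "High")
print(newDf)

# The original dataframe still holds the numbers
print(df["Maths"].tolist())  # [100, 75, 90]
```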
Replace Value in a Single Column in a Dataframe
Instead of replacing a value in the entire dataframe, you can also replace a value in a single column of a pandas dataframe.
To replace a value in a specific column, we will invoke the replace() method on the column instead of the entire dataframe. You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
{"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
{"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Maths"]=df["Maths"].replace(100,"Max")
print("The updated dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 87 82
1 2 75 100 90
2 3 87 84 76
3 4 100 100 90
4 5 90 87 84
5 6 79 75 72
The updated dataframe is:
Roll Maths Physics Chemistry
0 1 Max 87 82
1 2 75 100 90
2 3 87 84 76
3 4 Max 100 90
4 5 90 87 84
5 6 79 75 72
In the above example, we have invoked the replace() method on the "Maths" column of the dataframe. After execution, the replace() method returns a new series object. We then assign this series back to the "Maths" column of the dataframe, so the 100s in the "Physics" column are left unchanged.
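The replace() method also accepts a nested dictionary of the form {column: {old value: new value}}, which restricts the replacement to the named column without selecting and reassigning it. A minimal sketch, assuming pandas 2.x and a shortened, made-up version of the marks table:

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3],
                   "Maths": [100, 75, 100],
                   "Physics": [87, 100, 100]})

# Nested dictionary: only the "Maths" column is searched for 100,
# so the 100s in "Physics" are kept. The value argument is left out.
newDf = df.replace({"Maths": {100: "Max"}})
print(newDf)
```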
Pandas Replace Different Values in Each Column
If you want to replace different values in different columns with a single final value, you can pass a dictionary to the replace() method as the first input argument.
Here, the dictionary should contain the column names as its keys and the value to be replaced in each column as the corresponding value. You can specify the replacement value as the second input argument to the replace() method. After execution, you will get the desired output as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
{"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
{"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
newDf=df.replace({"Maths":100,"Physics":100, "Chemistry":90},"Max")
print("The updated dataframe is:")
print(newDf)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 87 82
1 2 75 100 90
2 3 87 84 76
3 4 100 100 90
4 5 90 87 84
5 6 79 75 72
The updated dataframe is:
Roll Maths Physics Chemistry
0 1 Max 87 82
1 2 75 Max Max
2 3 87 84 76
3 4 Max Max Max
4 5 90 87 84
5 6 79 75 72
In the original dataframe, the column "Chemistry" has 90 as its highest value. So, if we only replaced 100 with "Max", we could not mark the rows that have the maximum marks in Chemistry.
To specify the value to replace in each column, we passed a python dictionary containing the column names as the keys and the maximum value in each column as the associated values to the replace() method as its first input argument, and the term "Max" as the second input argument. Hence, the replace() method replaces the value 100 in the columns "Maths" and "Physics". In the column "Chemistry", it replaces the value 90 with "Max", as specified in the dictionary.
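Instead of typing the maximum of each column by hand, the dictionary can be built from the data itself with max() and to_dict(). This is a sketch assuming pandas 2.x; the shortened marks table and the subjects list are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3],
                   "Maths": [100, 75, 90],
                   "Physics": [87, 100, 84],
                   "Chemistry": [82, 90, 76]})

subjects = ["Maths", "Physics", "Chemistry"]
# Map each subject column to its own maximum, e.g. Chemistry -> 90
maxima = df[subjects].max().to_dict()

# Same call as in the article, but with a computed dictionary
newDf = df.replace(maxima, "Max")
print(newDf)
```

Since "Roll" is not a key in the dictionary, that column is not searched at all.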
Replace Value Inplace in a Pandas Dataframe
In the above examples, the replace() method returns a new dataframe or series after execution. If you want to modify the existing series or dataframe instead, you can set the inplace parameter to True. The original series or dataframe is then modified directly. You can observe this in the following example.
import pandas as pd
numbers=[3,23,100,14,16,100,45,65]
series=pd.Series(numbers)
print("The series is:")
print(series)
series.replace(100,"Max",inplace=True)
print("The updated series is:")
print(series)
Output:
The series is:
0 3
1 23
2 100
3 14
4 16
5 100
6 45
7 65
dtype: int64
The updated series is:
0 3
1 23
2 Max
3 14
4 16
5 Max
6 45
7 65
dtype: object
In this example, we have set the inplace parameter to True in the replace() method. Hence, the replace() method modifies the original series instead of returning a new series.
In a similar manner, you can replace a value in a pandas dataframe inplace as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
{"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
{"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df.replace({"Maths":100,"Physics":100, "Chemistry":90},"Max",inplace=True)
print("The updated dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 87 82
1 2 75 100 90
2 3 87 84 76
3 4 100 100 90
4 5 90 87 84
5 6 79 75 72
The updated dataframe is:
Roll Maths Physics Chemistry
0 1 Max 87 82
1 2 75 Max Max
2 3 87 84 76
3 4 Max Max Max
4 5 90 87 84
5 6 79 75 72
Conclusion
In this article, we have discussed different ways to replace a value in a pandas dataframe and series. We also discussed how to replace different values in different columns by a single value.
To learn more about python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!