While working with pandas dataframe, we often filter data using different conditions. In this article, we will discuss how we can use the pandas where method to filter and replace data from a series or dataframe.
The Pandas where() Method
We use the pandas where()
method to replace a value based on a condition. The where()
method has the following syntax.
DataFrame.where(cond, other=_NoDefault.no_default, *, inplace=False, axis=None, level=None)
Here,
- The
cond
parameter takes a condition or multiple conditional statements as input arguments. The conditional statements must evaluate to a series of True and False values. If thecond
parameter is True for a row, the data is preserved in that row. All the values are set to None for the rows where thecond
parameter evaluates to False. - The
other
parameter takes a function, series, dataframe, or scaler value as its input argument. All the entries where thecond
parameter evaluates to False are replaced with the corresponding value from theother
parameter. If we pass a function to theother
parameter, it is computed on the DataFrame and should return scalar or Series/DataFrame. The function must not change the input DataFrame. If we don’t specify theother
parameter, all the values are set to None for the rows where thecond
parameter evaluates to False. - By default, the
where()
method returns a new dataframe after execution. If you want to modify the existing dataframe using thewhere()
method, you can set theinplace
parameter to True. After this, the original dataframe will be modified to store the output. - The
axis
parameter is used to set the alignment axis if needed. For Series, theaxis
parameter is unused. For dataframes, it has a default value of 0. - The
level
parameter is used to set the alignment level if required.
Now, let us discuss how we can use the where() method with a series or a dataframe.
Pandas Where() Method With Series in Python
When we invoke the where()
method on a pandas series, it takes a condition as its input argument. After execution, it returns a new series. In the output series, the values that fulfill the condition in the input argument and unchanged while the rest of the values are set to None. You can observe this in the following example.
import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50)
print("The output series is:")
print(output)
Output:
The input series is:
0 1
1 23
2 12
3 423
4 4
5 53
6 231
7 234
8 1
dtype: int64
The output series is:
0 NaN
1 NaN
2 NaN
3 423.0
4 NaN
5 53.0
6 231.0
7 234.0
8 NaN
dtype: float64
In the above example, we passed the condition series>50
to the where()
method. In the output series, you can observe that the where()
method preserves the numbers greater than 50. On the other hand, values less than 50 are set to None.
Replace a Value Based on a Condition Using The where() Method
Instead of None, we can also set a replacement value for the values in the series that don’t fulfill the condition given in the input to the where()
method. For this, we will pass the replacement value as the second input argument to the where()
method. After execution, it returns a series in which the values that fulfill the condition remain unchanged while the other values are replaced using the replacement value. You can observe this in the following example.
import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50,-1)
print("The output series is:")
print(output)
Output:
The input series is:
0 1
1 23
2 12
3 423
4 4
5 53
6 231
7 234
8 1
dtype: int64
The output series is:
0 -1
1 -1
2 -1
3 423
4 -1
5 53
6 231
7 234
8 -1
dtype: int64
In the above example, we have set the other parameter to -1. Hence, the numbers less than 50 are set to -1 in the output dataframe.
Replace a Value Using a Function Based on a Condition Using The where() Method
Instead of a value, we can also pass a function for replacing the values in the series using the where()
method. For instance, consider the following example.
def myFun(x):
return x**2
import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50,other=myFun)
print("The output series is:")
print(output)
Output:
The input series is:
0 1
1 23
2 12
3 423
4 4
5 53
6 231
7 234
8 1
dtype: int64
The output series is:
0 1
1 529
2 144
3 423
4 16
5 53
6 231
7 234
8 1
dtype: int64
In the above code, we have defined a function myFun()
that takes a number and returns its square. Then, we passed the function to the other parameter in the where()
method. After this, the values less than 50 are first passed to the function myFun()
. The where()
method then gives the output of myFun()
function in the output series in all the positions where the cond
parameter is False.
Pandas Where Method With DataFrame
Instead of a series, we can also use the where()
method on a dataframe. When we invoke the where()
method on a dataframe, it takes a condition as its input argument. After execution, it returns a dataframe created from the input dataframe.
Here, the rows that fulfill the condition given as input to the where()
method remain unchanged. All the other rows are filled with a None value. You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where(df["Maths"]>80)
print("The output dataframe is:")
print(df1)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The output dataframe is:
Roll Maths Physics Chemistry
0 1.0 100.0 80.0 90.0
1 NaN NaN NaN NaN
2 3.0 90.0 80.0 70.0
3 4.0 100.0 100.0 90.0
4 5.0 90.0 90.0 80.0
5 NaN NaN NaN NaN
Instead of the None value, we can also give a replacement value to the where()
method as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where(df["Maths"]>80,"LOW")
print("The output dataframe is:")
print(df1)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The output dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 LOW LOW LOW LOW
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 LOW LOW LOW LOW
In the above examples, you can observe that the where()
method works in a similar manner it works with a series. The only difference is that the results are applied to the entire row instead of a single value.
Pandas where() Method With Multiple Conditions
We can also use multiple conditions in a single where method. For this, we will operate all the conditions with AND/OR
logical operator. After the execution of each condition, the logical operations are performed and we get a mask containing True and False values. The mask is then used to create the output dataframe. You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where((df["Maths"]>80) & (df["Chemistry"]>80))
print("The output dataframe is:")
print(df1)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The output dataframe is:
Roll Maths Physics Chemistry
0 1.0 100.0 80.0 90.0
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 4.0 100.0 100.0 90.0
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN
Conclusion
In this article, we discussed different ways to use the pandas where method with a series or dataframe in Python. To learn more about Python programming, you can read this article on how to read excel into pandas dataframe. You might also like this article on how to map functions to a pandas series in Python.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!
Recommended Python Training
Course: Python 3 For Beginners
Over 15 hours of video content with guided instruction for beginners. Learn how to create real world applications and master the basics.