Handling NaN values is an important part of analyzing data. The pandas module in Python provides the fillna() method to fill NaN values. In this article, we will discuss how to use the pandas fillna method to fill NaN values in Python.
- The fillna() Method
- Use Pandas Fillna to Fill Nan Values in the Entire Dataframe
- Fill Different Values in Each Column in Pandas
- Only Fill the First N Null Values in Each Column
- Only Fill the First N Null Values in Each Row
- Pandas Fillna With the Last Valid Observation
- Pandas Fillna With the Next Valid Observation
- Pandas Fillna Inplace
- Conclusion
The fillna() Method
You can fill NaN values in a pandas dataframe using the fillna()
method. It has the following syntax.
DataFrame.fillna(value=None, *, method=None, axis=None, inplace=False, limit=None, downcast=None)
Here,
- The value parameter takes the value that replaces the NaN values. You can also pass a Python dictionary or a pandas series to the value parameter. The dictionary should contain the column names of the dataframe as its keys and the replacement value for each column as the associated value. Similarly, the series should contain the column names of the dataframe as its index and the replacement values as the associated values.
- The method parameter is used to fill NaN values when no input is given to the value parameter. The two parameters cannot be combined, so method must stay None whenever value is given. Otherwise, you can assign the literal "ffill" or "pad" to fill each NaN value with the last valid observation, or "bfill" or "backfill" to fill it with the next valid observation. Note that pandas 2.1 and later versions deprecate this parameter in favor of the ffill() and bfill() methods.
- The axis parameter specifies the axis along which to fill missing values. To fill along each row, set the axis parameter to 1 or "columns". To fill along each column, set it to 0 or "index".
- The inplace parameter decides whether the original dataframe is modified. By default, the pandas fillna method doesn't modify the original dataframe and returns a new dataframe after execution. To modify the dataframe on which the fillna() method is invoked, set the inplace parameter to True.
- The limit parameter restricts how many NaN values are filled. If the method parameter is specified, limit is the maximum number of consecutive NaN values to forward or backward fill. In other words, a gap with more than limit consecutive NaNs is only partially filled. If the method parameter is not specified, limit is the maximum number of entries along the entire axis where NaNs will be filled. It must be greater than 0 if not None.
- The downcast parameter takes a dictionary that maps items to the data type they should be downcast to, if there is a need to change the data types of the values.
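As a rough illustration of the value parameter accepting a series, the following sketch fills each column from a series indexed by column name. The small dataframe and the name replacements are made up for this example and are not part of the grade2.csv file used in the rest of the article.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column dataframe, built inline for illustration only.
df = pd.DataFrame({
    "Marks": [55.0, np.nan, 88.0],
    "Grade": ["C", np.nan, "A"],
})

# The index holds the column names; the values are the replacements.
replacements = pd.Series({"Marks": 0.0, "Grade": "F"})

# fillna() returns a new dataframe; df itself keeps its NaN values.
filled = df.fillna(value=replacements)
print(filled)
```

Passing a dictionary with the same keys and values should give the same result here.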
Use Pandas Fillna to Fill Nan Values in the Entire Dataframe
To fill NaN values in a pandas dataframe using the fillna method, you pass the replacement value of the NaN value to the fillna()
method as shown in the following example.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x=x.fillna(0)
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 0 0.0 0
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 0 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 0.0 0.0 0 0.0 0
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 0
9 0.0 0.0 0 0.0 0
10 3.0 15.0 Lokesh 88.0 A
In the above example, we have passed the value 0 to the fillna()
method. Hence, all the NaN values in the input data frame are replaced by 0.
This approach isn’t very practical as different columns have different data types. So, we can choose to fill different values in different columns to replace the null values.
Fill Different Values in Each Column in Pandas
Instead of filling all the NaN values with the same value, you can also replace the NaN value in each column with a specific value. For this, we need to pass a dictionary containing column names as its keys and the values to be filled in the columns as the associated values to the fillna()
method. You can observe this in the following example.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x=x.fillna({"Class":1,"Roll":100,"Name":"PFB","Marks":0,"Grade":"F"})
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 PFB 0.0 F
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 PFB 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 1.0 100.0 PFB 0.0 F
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 F
9 1.0 100.0 PFB 0.0 F
10 3.0 15.0 Lokesh 88.0 A
In the above example, we have passed the dictionary {"Class": 1, "Roll": 100, "Name": "PFB", "Marks": 0, "Grade": "F"} to the fillna() method as input. Due to this, the NaN values in the "Class" column are replaced by 1, the NaN values in the "Roll" column are replaced by 100, the NaN values in the "Name" column are replaced by "PFB", and so on. Thus, when we pass a dictionary with the column names of the dataframe as keys and Python literals as the associated values, the NaN values in each column are replaced according to the input dictionary.
Instead of giving all the column names as keys in the input dictionary, you can also choose to ignore some. In this case, the NaN values in the columns that are not present in the input dictionary are not considered for replacement. You can observe this in the following example.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x=x.fillna({"Class":1,"Roll":100,"Name":"PFB","Marks":0})
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 PFB 0.0 NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 PFB 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 1.0 100.0 PFB 0.0 NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 1.0 100.0 PFB 0.0 NaN
10 3.0 15.0 Lokesh 88.0 A
In this example, we haven’t passed the "Grade"
column in the input dictionary to the fillna()
method. Hence, the NaN values in the "Grade"
column are not replaced by any other value.
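If you don't have the grade2.csv file, the same behavior can be reproduced with a small dataframe built inline. The sketch below uses three rows modelled on the table above (the exact rows are chosen only for illustration) and leaves the "Grade" column out of the dictionary.

```python
import numpy as np
import pandas as pd

# Three illustrative rows; built inline so that no CSV file is needed.
df = pd.DataFrame({
    "Name": ["Harsh", np.nan, "Bobby"],
    "Marks": [55.0, np.nan, 50.0],
    "Grade": ["C", np.nan, np.nan],
})

# "Grade" is deliberately missing from the dictionary.
filled = df.fillna({"Name": "PFB", "Marks": 0})

print(filled)
# Count the NaN values that remain in each column.
print(filled.isna().sum())
```

Only the "Name" and "Marks" columns are filled, so the two NaN values in "Grade" remain.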
Only Fill the First N Null Values in Each Column
Instead of filling all NaN values in each column, you can also limit the number of NaN values to be filled in each column. For this, you can pass the maximum number of values to be filled as an input argument to the limit parameter in the fillna() method as shown below.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x=x.fillna(0, limit=3)
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 0 0.0 0
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 0 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 0.0 0.0 0 0.0 0
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 0
9 0.0 0.0 NaN 0.0 NaN
10 3.0 15.0 Lokesh 88.0 A
In the above example, we have set the limit
parameter to 3. Due to this, only the first three NaN values from each column are replaced by 0.
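The per-column counting can be seen more easily on a tiny dataframe. This is a minimal sketch with made-up numeric data, where one column has three NaN values and the other has two.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data: "Class" has three NaNs, "Marks" has two.
df = pd.DataFrame({
    "Class": [2.0, np.nan, np.nan, np.nan],
    "Marks": [np.nan, 78.0, np.nan, 88.0],
})

# At most two NaN values are filled in each column, counted from the top.
filled = df.fillna(0, limit=2)
print(filled)
```

Here the third NaN in "Class" is left as it is, while both NaN values in "Marks" are replaced because the limit is counted separately for each column.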
Only Fill the First N Null Values in Each Row
To fill only the first N null values in each row of the dataframe, you can pass the maximum number of values to be filled as an input argument to the limit parameter in the fillna() method. Additionally, you need to specify that you want to fill along the rows by setting the axis parameter to 1. You can observe this in the following example.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x=x.fillna(0, limit=2,axis=1)
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 0.0 0.0 NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 0 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 0.0 0.0 NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 0
9 0.0 0.0 NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
In the above example, we have set the limit parameter to 2 and the axis parameter to 1. Hence, at most two NaN values in each row, counted from the left, are replaced by 0 when the fillna() method is executed.
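The following sketch repeats the idea on a small inline dataframe with made-up rows. It assumes a reasonably recent pandas version; older releases may handle the combination of limit and axis=1 differently, so treat the result as indicative rather than guaranteed.

```python
import numpy as np
import pandas as pd

# Illustrative rows: row 1 is entirely NaN, row 2 is missing only "Grade".
df = pd.DataFrame({
    "Roll": [27.0, np.nan, 11.0],
    "Name": ["Harsh", np.nan, "Bobby"],
    "Marks": [55.0, np.nan, 50.0],
    "Grade": ["C", np.nan, np.nan],
})

# Fill at most two NaN values per row, counted from left to right.
filled = df.fillna(0, limit=2, axis=1)
print(filled)

# Number of NaN values left in each row.
print(filled.isna().sum(axis=1))
```

In the row that is entirely NaN, only the first two cells are filled and the last two stay NaN.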
Pandas Fillna With the Last Valid Observation
Instead of specifying a new value, you can also fill NaN values using the existing values. For instance, you can fill the Null values using the last valid observation by setting the method parameter to “ffill”
as shown below.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x=x.fillna(method="ffill")
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 Clara 78.0 B
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 Amy 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 3.0 27.0 Aditya 55.0 C
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 B
9 3.0 11.0 Bobby 50.0 B
10 3.0 15.0 Lokesh 88.0 A
In this example, we have set the method parameter to "ffill". Hence, whenever a NaN value is encountered, the fillna() method fills the cell with the last non-null value above it in the same column.
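Newer pandas versions (2.1 and later) deprecate the method parameter of fillna(), so the same forward fill is usually written with the ffill() method. The following is a minimal sketch on a small inline dataframe whose rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; "Grade" starts with a NaN that has no earlier value.
df = pd.DataFrame({
    "Name": ["Clara", np.nan, "Amy", np.nan],
    "Marks": [78.0, np.nan, 88.0, np.nan],
    "Grade": [np.nan, "B", np.nan, "A"],
})

# Forward fill: each NaN takes the last valid value above it.
filled = df.ffill()
print(filled)
```

The first cell of "Grade" stays NaN because there is no earlier observation to copy from.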
Pandas Fillna With the Next Valid Observation
You can fill the Null values using the next valid observation by setting the method
parameter to “bfill”
as shown below.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x=x.fillna(method="bfill")
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 Amy 88.0 A
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 Aditya 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 3.0 23.0 Radheshyam 78.0 B
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 A
9 3.0 15.0 Lokesh 88.0 A
10 3.0 15.0 Lokesh 88.0 A
In this example, we have set the method parameter to "bfill". Hence, whenever a NaN value is encountered, the fillna() method fills the cell with the next non-null value below it in the same column.
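A backward fill can also be combined with the limit parameter to cap how many consecutive NaN values are filled. The sketch below uses the bfill() method, which newer pandas versions recommend over method="bfill", on a single made-up column.

```python
import numpy as np
import pandas as pd

# Illustrative column: a gap of three NaNs, a valid value, then a trailing NaN.
df = pd.DataFrame({"Marks": [np.nan, np.nan, np.nan, 88.0, np.nan]})

# Backward fill, but fill at most two consecutive NaNs before each valid value.
filled = df.bfill(limit=2)
print(filled)
```

The gap of three NaN values is only partially filled, and the trailing NaN is left alone because no later observation exists.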
Pandas Fillna Inplace
By default, the fillna()
method returns a new dataframe after execution. To modify the existing dataframe instead of creating a new one, you can set the inplace
parameter to True in the fillna()
method as shown below.
import pandas as pd
import numpy as np
x=pd.read_csv("grade2.csv")
print("The original dataframe is:")
print(x)
x.fillna(method="bfill",inplace=True)
print("The modified dataframe is:")
print(x)
Output:
The original dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
The modified dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 Amy 88.0 A
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 Aditya 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 3.0 23.0 Radheshyam 78.0 B
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 A
9 3.0 15.0 Lokesh 88.0 A
10 3.0 15.0 Lokesh 88.0 A
In this example, we have set the inplace parameter to True in the fillna()
method. Hence, the input dataframe is modified.
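The difference between the two modes can be checked directly. This is a small sketch on a made-up single-column dataframe that first fills without inplace and then with it.

```python
import numpy as np
import pandas as pd

# Illustrative single-column dataframe.
df = pd.DataFrame({"Marks": [55.0, np.nan, 88.0]})

# Default behavior: a new dataframe is returned and df is left untouched.
new_df = df.fillna(0)
nan_count_before = int(df["Marks"].isna().sum())
print("NaN values left in df:", nan_count_before)

# With inplace=True, df itself is modified.
df.fillna(0, inplace=True)
nan_count_after = int(df["Marks"].isna().sum())
print("NaN values left in df:", nan_count_after)
```

When inplace is True, there is no need to assign the result back to a variable.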
Conclusion
In this article, we have discussed how to use the pandas fillna method to fill NaN values in Python.
To learn more about python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!