Handling NaN values is a tedious part of data cleaning. In this article, we will discuss different ways to drop rows with NaN values from a pandas dataframe using the dropna() method.
- The dropna() Method
- Drop Rows Having NaN Values in Any Column in a Dataframe
- Drop Rows Having NaN Values in All the Columns in a Dataframe
- Drop Rows Having Non-null Values in at Least N Columns
- Drop Rows Having at Least N Null Values in Pandas Dataframe
- Drop Rows Having NaN Values in Specific Columns in Pandas
- Drop Rows With NaN Values Inplace From a Pandas Dataframe
- Conclusion
The dropna() Method
The dropna() method can be used to drop rows having NaN values in a pandas dataframe. It has the following syntax.
DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False)
Here,
- The axis parameter decides whether rows or columns that have NaN values are dropped. By default, axis is set to 0, so rows with NaN values are dropped when the dropna() method is executed on the dataframe.
- The how parameter determines whether a row is dropped only when all of its values are NaN or as soon as it has at least one NaN value. By default, how behaves as "any", so a row is dropped from the dataframe even if it contains a single NaN value.
- The thresh parameter sets the minimum number of non-NaN values a row must have to be kept. For instance, if you want to drop a row when it has fewer than n non-null values, you can pass the number n to the thresh parameter. In the pandas versions that use this signature, how and thresh cannot be passed together.
- The subset parameter is used when we want to check for NaN values in only specific columns in each row. By default, subset is set to None, so the dropna() method searches for NaN values in all the columns. To search for NaN values in only one column, you can pass the column name to the subset parameter. To check two or more columns, you can pass a list of column names.
- The inplace parameter decides whether we get a new dataframe after the drop operation or modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the dropna() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True.
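As a minimal sketch of the axis parameter described above, the following snippet compares the default row-wise behavior with axis=1. The small dataframe here is hypothetical toy data built in memory, not the grade2.csv file used in the rest of this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the article's grade2.csv file.
df = pd.DataFrame({
    "Roll": [27, 23, 33],
    "Name": ["Harsh", "Clara", np.nan],
    "Marks": [55.0, 78.0, 88.0],
})

rows_kept = df.dropna()           # axis=0 (default): drop rows containing NaN
columns_kept = df.dropna(axis=1)  # axis=1: drop columns containing NaN

print(rows_kept)
print(columns_kept)
```

With axis=0, the third row is removed because its Name is missing. With axis=1, all three rows stay but the Name column is removed.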
Drop Rows Having NaN Values in Any Column in a Dataframe
To drop rows from a pandas dataframe that have NaN values in any of the columns, you can directly invoke the dropna() method on the input dataframe. After execution, it returns a new dataframe without the rows that contained NaN values. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
df=df.dropna()
print("After dropping NaN values:")
print(df)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
After dropping NaN values:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
3 3.0 34.0 Amy 88.0 A
5 3.0 27.0 Aditya 55.0 C
7 3.0 23.0 Radheshyam 78.0 B
10 3.0 15.0 Lokesh 88.0 A
In the above example, the input dataframe contains many rows with NaN values. Once we invoke the dropna() method on the input dataframe, it returns a dataframe that has no null values in it.
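To see what the default call does under the hood, it can help to build the same filter by hand with a boolean mask. The sketch below is only an illustration: the dataframe is an in-memory stand-in for grade2.csv, reconstructed from the printed output above, and the names has_nan and masked are chosen here for the example.

```python
import numpy as np
import pandas as pd

nan = np.nan
# In-memory stand-in for grade2.csv, reconstructed from the printed output.
df = pd.DataFrame({
    "Class": [2, 2, 3, 3, 3, 3, nan, 3, 3, nan, 3],
    "Roll": [27, 23, 33, 34, 15, 27, nan, 23, 11, nan, 15],
    "Name": ["Harsh", "Clara", nan, "Amy", nan, "Aditya", nan,
             "Radheshyam", "Bobby", nan, "Lokesh"],
    "Marks": [55, 78, nan, 88, 78, 55, nan, 78, 50, nan, 88],
    "Grade": ["C", "B", nan, "A", "B", "C", nan, "B", nan, nan, "A"],
})

has_nan = df.isna().any(axis=1)  # True for rows with at least one NaN
masked = df[~has_nan]            # keep only the complete rows

print(list(masked.index))
print(masked.equals(df.dropna()))
```

The mask keeps the rows at index 0, 1, 3, 5, 7, and 10, which are the same rows that dropna() returns in the example above.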
Drop Rows Having NaN Values in All the Columns in a Dataframe
By default, the dropna() method drops a row from a dataframe if it has a NaN value in at least one column. If you want to drop a row only if it has NaN values in all the columns, you can set the how parameter in the dropna() method to "all". After this, a row is dropped from the dataframe only when all of its columns contain NaN values.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
df=df.dropna(how="all")
print("After dropping NaN values:")
print(df)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
After dropping NaN values:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
10 3.0 15.0 Lokesh 88.0 A
In this example, we have set the how parameter to "all" in the dropna() method. Due to this, only those rows are deleted from the input dataframe where all the values are null. Thus, only the two rows having NaN values in all the columns are dropped from the input dataframe, instead of the five rows dropped in the previous example.
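The difference between the two settings can be checked by counting how many rows each one removes. This is a small sketch under the assumption that the in-memory dataframe below matches grade2.csv as printed above. The variable names are made up for the illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# In-memory stand-in for grade2.csv, reconstructed from the printed output.
df = pd.DataFrame({
    "Class": [2, 2, 3, 3, 3, 3, nan, 3, 3, nan, 3],
    "Roll": [27, 23, 33, 34, 15, 27, nan, 23, 11, nan, 15],
    "Name": ["Harsh", "Clara", nan, "Amy", nan, "Aditya", nan,
             "Radheshyam", "Bobby", nan, "Lokesh"],
    "Marks": [55, 78, nan, 88, 78, 55, nan, 78, 50, nan, 88],
    "Grade": ["C", "B", nan, "A", "B", "C", nan, "B", nan, nan, "A"],
})

# Number of rows removed under each setting of the how parameter.
dropped_any = len(df) - len(df.dropna(how="any"))
dropped_all = len(df) - len(df.dropna(how="all"))

# Rows in which every column is NaN; these are what how="all" removes.
all_nan_rows = df.isna().all(axis=1)

print(dropped_any, dropped_all)
print(list(df[all_nan_rows].index))
```

Here, how="any" removes five rows while how="all" removes only the two rows at index 6 and 9.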
Drop Rows Having Non-null Values in at Least N Columns
Instead of one or all, you might also want to have control over the number of NaN values allowed in each row. For this, you can specify the minimum number of non-null values that each row in the output dataframe must have using the thresh parameter in the dropna() method. After this, the output dataframe returned by the dropna() method will contain at least N non-null values in each row. Here, N is the number passed as an input argument to the thresh parameter. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
df=df.dropna(thresh=4)
print("After dropping NaN values:")
print(df)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
After dropping NaN values:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
10 3.0 15.0 Lokesh 88.0 A
In this example, we have passed thresh=4 to the dropna() method. Due to this, only those rows are dropped from the input dataframe that have fewer than 4 non-null values. A row that contains a null value but still has 4 or more non-null values isn’t dropped from the dataframe.
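To make the threshold concrete, you can count the non-null values in each row and compare the counts with thresh. The following is a rough sketch that assumes the in-memory dataframe below reproduces grade2.csv as printed above. The names non_null and kept are chosen only for this example.

```python
import numpy as np
import pandas as pd

nan = np.nan
# In-memory stand-in for grade2.csv, reconstructed from the printed output.
df = pd.DataFrame({
    "Class": [2, 2, 3, 3, 3, 3, nan, 3, 3, nan, 3],
    "Roll": [27, 23, 33, 34, 15, 27, nan, 23, 11, nan, 15],
    "Name": ["Harsh", "Clara", nan, "Amy", nan, "Aditya", nan,
             "Radheshyam", "Bobby", nan, "Lokesh"],
    "Marks": [55, 78, nan, 88, 78, 55, nan, 78, 50, nan, 88],
    "Grade": ["C", "B", nan, "A", "B", "C", nan, "B", nan, nan, "A"],
})

non_null = df.notna().sum(axis=1)  # non-null values per row
kept = df.dropna(thresh=4)         # keep rows with at least 4 non-null values

print(non_null.tolist())
print(list(kept.index))
```

The rows at index 4 and 8 each have exactly 4 non-null values, so they survive. The row at index 2 has only 2 and is dropped together with the all-NaN rows.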
Drop Rows Having at Least N Null Values in Pandas Dataframe
Instead of keeping rows with at least N non-null values, you might want to drop all the rows from the input dataframe that have at least N null values. For this, we will first find the number of columns in the input dataframe using the columns attribute and the len() function. A row with fewer than N null values has at least (number of columns - N + 1) non-null values, so this is the minimum number of non-null values that a row needs to stay in the output dataframe. Hence, we will pass this number to the thresh parameter in the dropna() method.
After execution of the dropna() method, we will get the output dataframe after dropping all the rows having at least N null values. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
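N=3
number_of_columns=len(df.columns)
# Keep rows with at least (number_of_columns-N+1) non-null values,
# which drops every row that has N or more null values.
df=df.dropna(thresh=number_of_columns-N+1)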
print("After dropping NaN values:")
print(df)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
After dropping NaN values:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
10 3.0 15.0 Lokesh 88.0 A
This example is just a variation of the previous example. To drop rows having at least N null values, you need to keep the rows having (number of columns - N + 1) or more non-null values. With five columns and N=3, thresh becomes 3, so the row at index 2, which has three null values, is dropped along with the two all-NaN rows.
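The arithmetic above can be wrapped in a small function and cross-checked against a direct count of null values. This is only a sketch: the helper drop_rows_with_at_least_n_nulls is a hypothetical name, not part of pandas, and the dataframe is toy data in which each row has one more null value than the row before it.

```python
import numpy as np
import pandas as pd

def drop_rows_with_at_least_n_nulls(frame, n):
    """Hypothetical helper: drop rows that hold n or more null values."""
    # A row with fewer than n nulls has at least (columns - n + 1) non-nulls.
    return frame.dropna(thresh=len(frame.columns) - n + 1)

nan = np.nan
# Toy data: rows 0, 1, 2, 3 contain 0, 1, 2, 3 null values respectively.
df = pd.DataFrame({
    "a": [1, nan, nan, nan],
    "b": [2, 2, nan, nan],
    "c": [3, 3, 3, nan],
    "d": [4, 4, 4, 4],
})

result = drop_rows_with_at_least_n_nulls(df, 2)

# Cross-check with an explicit null count per row.
expected = df[df.isna().sum(axis=1) < 2]

print(list(result.index))
print(result.equals(expected))
```

With n=2, the threshold is 4 - 2 + 1 = 3, so only the rows at index 0 and 1 remain.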
Drop Rows Having NaN Values in Specific Columns in Pandas
By default, the dropna() method searches for NaN values in all the columns in each row. If you want to drop rows from a dataframe only if they have null values in specific columns, you can use the subset parameter in the dropna() method.
The subset parameter in the dropna() method takes a column name or a list of column names as its input argument. After this, the dropna() method drops only the rows that have null values in the specified columns. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
df=df.dropna(subset=["Class","Roll","Marks"])
print("After dropping NaN values:")
print(df)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
After dropping NaN values:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
10 3.0 15.0 Lokesh 88.0 A
In this example, we have passed the list ["Class", "Roll", "Marks"] to the subset parameter in the dropna() method. Due to this, the dropna() method searches for NaN values in only these columns of the dataframe. Any row having a NaN value in any of these columns is dropped from the dataframe after execution of the dropna() method. If a row has non-null values in all of these columns, it won’t be dropped even if it has NaN values in other columns.
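The subset parameter can also be given a single column name, and it can be combined with the how parameter. The snippet below is a minimal sketch using hypothetical toy data rather than grade2.csv. Passing a bare string to subset is assumed to work as it does in recent pandas versions.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical toy data, not the article's grade2.csv file.
df = pd.DataFrame({
    "Roll": [27, 33, 15, nan],
    "Name": ["Harsh", nan, nan, nan],
    "Marks": [55, nan, 78, nan],
})

# Check a single column by passing its name.
by_name = df.dropna(subset="Name")

# Check two columns; a NaN in either one drops the row.
by_roll_marks = df.dropna(subset=["Roll", "Marks"])

# Drop a row only when both listed columns are NaN.
both_missing = df.dropna(subset=["Roll", "Marks"], how="all")

print(list(by_name.index))
print(list(by_roll_marks.index))
print(list(both_missing.index))
```

The row at index 1 has a Roll but no Marks, so it is dropped by the second call and kept by the third.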
Drop Rows With NaN Values Inplace From a Pandas Dataframe
In all the examples in the previous sections, the dropna() method doesn’t modify the input dataframe. Every time, it returns a new dataframe. To modify the input dataframe by dropping NaN values, you can use the inplace parameter in the dropna() method. When the inplace parameter is set to True, the dropna() method modifies the original dataframe instead of creating a new one. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
df.dropna(inplace=True)
print("After dropping NaN values:")
print(df)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
2 3.0 33.0 NaN NaN NaN
3 3.0 34.0 Amy 88.0 A
4 3.0 15.0 NaN 78.0 B
5 3.0 27.0 Aditya 55.0 C
6 NaN NaN NaN NaN NaN
7 3.0 23.0 Radheshyam 78.0 B
8 3.0 11.0 Bobby 50.0 NaN
9 NaN NaN NaN NaN NaN
10 3.0 15.0 Lokesh 88.0 A
After dropping NaN values:
Class Roll Name Marks Grade
0 2.0 27.0 Harsh 55.0 C
1 2.0 23.0 Clara 78.0 B
3 3.0 34.0 Amy 88.0 A
5 3.0 27.0 Aditya 55.0 C
7 3.0 23.0 Radheshyam 78.0 B
10 3.0 15.0 Lokesh 88.0 A
In this example, we have set the inplace parameter to True in the dropna() method. Hence, the dropna() method modifies the original dataframe instead of creating a new one.
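One thing to watch for with inplace=True is the return value. In the pandas versions this sketch assumes, dropna() returns None when it modifies the dataframe in place, so assigning its result back to a variable loses the data. The toy dataframe below is hypothetical and is used only to illustrate this.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the article's grade2.csv file.
df = pd.DataFrame({
    "Name": ["Harsh", np.nan, "Amy"],
    "Marks": [55.0, 78.0, np.nan],
})

# The dataframe is modified in place; the call itself gives back None.
result = df.dropna(inplace=True)

print(result)
print(len(df))
```

Because of this, write either df.dropna(inplace=True) or df=df.dropna(), but not df=df.dropna(inplace=True).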
Conclusion
In this article, we have discussed different ways to drop rows with NaN values from a pandas dataframe using the dropna() method.
To know more about the pandas module, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!