While working with dataframes in Python, we often need to delete one or more columns during data preprocessing. In this article, we will discuss different ways to drop columns from a pandas dataframe in Python.
- The drop() Method
- Pandas Drop Columns by Name in Python
- Drop Columns by Index From a Pandas Dataframe
- Pandas Drop Columns in Place in Python
- Pandas Drop Columns if They Exist in Python
- Drop Multiple Columns From a Pandas Dataframe
- The dropna() Method
- Drop Columns With NaN Values in a Pandas Dataframe
- Drop Columns With at Least N NaN Values in a Dataframe
- Drop Columns From a Dataframe Using the pop() Method
- Conclusion
The drop() Method
The drop() method can be used to drop columns or rows from a pandas dataframe. It has the following syntax.
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Here,
- The index parameter is used to drop rows from the dataframe. It takes a row label or a list of row labels that have to be deleted as its input argument.
- The columns parameter is used to drop columns from the dataframe. It takes a column name or a list of column names that need to be dropped as its input argument.
- The labels parameter takes the row or column labels to remove when we don’t use the index and columns parameters. It accepts a single label or a list of labels. In such cases, we use the axis parameter to decide if we want to drop rows or columns. To drop columns from the dataframe, we set the axis parameter to 1. To drop rows from the dataframe, we set the axis parameter to 0, which is its default value.
- The level parameter is used to drop rows or columns when the dataframe has multilevel indices. It takes the position or the name of the index level in which the given labels are looked up.
- The inplace parameter is used to decide if we get a new dataframe after the drop operation or if we want to modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the drop() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True.
- The errors parameter decides what happens when a label passed to the drop() method is not found. By default, the errors parameter is set to "raise". Due to this, the drop() method raises a KeyError exception if any of the given labels doesn’t exist. If you set the errors parameter to "ignore", the missing labels are skipped and only the existing labels are dropped.
After execution, the drop() method returns the modified dataframe if the inplace parameter is set to False. Otherwise, it returns None.
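The level parameter is the only one in the list above that the later examples do not use. The following is a minimal sketch of how it could work, assuming a small dataframe with two-level column labels; the level names "group" and "field" and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level column index made of (group, field) pairs.
columns = pd.MultiIndex.from_tuples(
    [("info", "Name"), ("exam", "Marks"), ("exam", "Grade")],
    names=["group", "field"],
)
df = pd.DataFrame(
    [["Aditya", 85.0, "A"], ["Sam", 75.0, "B"]],
    columns=columns,
)

# Look up the label "Marks" in the "field" level of the columns and drop it.
without_marks = df.drop(columns="Marks", level="field")

# Labels are matched within the chosen level, so "exam" in the "group"
# level removes both columns that belong to that group.
without_exam = df.drop(columns="exam", level="group")

print(list(without_marks.columns))
print(list(without_exam.columns))
```

The original dataframe keeps all three columns because inplace is left at its default value of False.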
Pandas Drop Columns by Name in Python
To drop a column from a pandas dataframe by name, you can pass the column name to the labels parameter and set the axis parameter to 1 in the drop() method.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels="Marks",axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, we first read the pandas dataframe from the CSV file using the read_csv() function. Then, we invoked the drop() method on the input dataframe with "Marks" as the input argument to the labels parameter and set axis=1. You can observe that the dataframe returned by the drop() method doesn’t have the "Marks" column.
Instead of using the labels and the axis parameters, you can pass the column name to the columns parameter in the drop() method as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns="Marks")
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, instead of using two parameters, i.e. labels and axis, we have used the columns parameter to drop the "Marks" column from the input dataframe.
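The examples in this article read grade.csv from disk. For readers who don't have that file, here is a minimal sketch that builds a small dataframe inline (the three rows are copied from the top of the output above) and checks that the two ways of dropping a column by name give the same result.

```python
import pandas as pd

# First three rows of the grades data shown above, built inline so the
# example does not depend on grade.csv.
grades = pd.DataFrame({
    "Class": [1, 1, 1],
    "Roll": [11, 12, 14],
    "Name": ["Aditya", "Chris", "Sam"],
    "Marks": [85.0, None, 75.0],
    "Grade": ["A", "A", "B"],
})

by_labels = grades.drop(labels="Marks", axis=1)
by_columns = grades.drop(columns="Marks")

# Both calls produce the same dataframe and leave the input untouched.
print(by_labels.equals(by_columns))
print(list(grades.columns))
```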
Drop Columns by Index From a Pandas Dataframe
Instead of using the column name, you can also drop columns by index from a dataframe.
To drop columns from a pandas dataframe by index, we first need to obtain the columns object using the columns attribute of the dataframe. Then, we can use the indexing operator with the index of the columns to drop the columns by index as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels=grades.columns[3],axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, we first obtained the column names of the grades dataframe using the grades.columns attribute. Then, we obtained the element at index 3 of grades.columns, which is the name of the fourth column. Finally, we passed this name to the labels parameter. You can observe in the output that the dataframe returned by the drop() method doesn’t have the 4th column of the input dataframe. Hence, we have successfully dropped a column of the dataframe using its index.
We can also use the columns parameter instead of the labels parameter as shown in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns=grades.columns[3])
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
Drop the First Column From a Dataframe in Python
To drop the first column of the dataframe, you pass the column name at index 0 to the labels parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels=grades.columns[0],axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Roll Name Marks Grade
0 11 Aditya 85.0 A
1 12 Chris NaN A
2 14 Sam 75.0 B
3 15 Harry NaN NaN
4 22 Tom 73.0 B
5 15 Golu 79.0 B
6 27 Harsh 55.0 C
7 23 Clara NaN B
8 34 Amy 88.0 A
9 15 Prashant NaN B
10 27 Aditya 55.0 C
11 23 Radheshyam NaN NaN
In this example, we have to drop the column at the first position. Hence, we have passed the element at index 0 of the grades.columns attribute of the input dataframe to the labels parameter. As we have to drop a column of the dataframe, we also need to set axis=1.
Instead of using the labels and the axis parameters, you can use the columns parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns=grades.columns[0])
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Roll Name Marks Grade
0 11 Aditya 85.0 A
1 12 Chris NaN A
2 14 Sam 75.0 B
3 15 Harry NaN NaN
4 22 Tom 73.0 B
5 15 Golu 79.0 B
6 27 Harsh 55.0 C
7 23 Clara NaN B
8 34 Amy 88.0 A
9 15 Prashant NaN B
10 27 Aditya 55.0 C
11 23 Radheshyam NaN NaN
In this example, we have passed the element at index 0 of the grades.columns attribute to the columns parameter in the drop() method. The grades.columns attribute is an Index object that holds the column names. Hence, the process of dropping a column using the index of the column is similar to that using the name of the column. It only adds an extra lookup step. Hence, it is usually simpler to drop columns by name directly if we know the names of the columns in the dataframe.
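Because the lookup by index still ends in a drop by name, it can remove more than one column when column names repeat. The following sketch illustrates this with a made-up dataframe that has two columns named "Score", and shows one possible positional alternative based on iloc.

```python
import pandas as pd

# Hypothetical dataframe in which two columns share the name "Score".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["Score", "Name", "Score"])

# Dropping by the name found at position 0 removes every "Score" column.
by_name = df.drop(columns=df.columns[0])

# Selecting the positions to keep with iloc removes only position 0.
position = 0
keep = [i for i in range(df.shape[1]) if i != position]
by_position = df.iloc[:, keep]

print(list(by_name.columns))
print(list(by_position.columns))
```

When all column names are unique, as in the grades data, both approaches give the same result.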
Drop the Last Column From a Pandas Dataframe
To drop the last column of the dataframe, we will first find the index of the last column. For this, we will find the length of the columns attribute of the dataframe. After this, we will find the index of the last column by subtracting 1 from the length. Then, we will get the name of the last column using the index of the last column and the columns attribute. Finally, we will pass the obtained column name to the labels parameter. After execution, the drop() method will drop the last column of the dataframe. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
output_df=grades.drop(labels=grades.columns[lenDf-1],axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Marks
0 1 11 Aditya 85.0
1 1 12 Chris NaN
2 1 14 Sam 75.0
3 1 15 Harry NaN
4 2 22 Tom 73.0
5 2 15 Golu 79.0
6 2 27 Harsh 55.0
7 2 23 Clara NaN
8 3 34 Amy 88.0
9 3 15 Prashant NaN
10 3 27 Aditya 55.0
11 3 23 Radheshyam NaN
Instead of using the labels parameter, you can use the columns parameter to drop the last column from a pandas dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
output_df=grades.drop(columns=grades.columns[lenDf-1])
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Marks
0 1 11 Aditya 85.0
1 1 12 Chris NaN
2 1 14 Sam 75.0
3 1 15 Harry NaN
4 2 22 Tom 73.0
5 2 15 Golu 79.0
6 2 27 Harsh 55.0
7 2 23 Clara NaN
8 3 34 Amy 88.0
9 3 15 Prashant NaN
10 3 27 Aditya 55.0
11 3 23 Radheshyam NaN
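The length arithmetic used above can usually be avoided, because the columns attribute supports negative indexing. Here is a minimal sketch with a small inline dataframe (the sample values are assumptions for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "Class": [1, 1],
    "Name": ["Aditya", "Chris"],
    "Grade": ["A", "A"],
})

# columns[-1] is the name of the last column, so no length arithmetic is needed.
without_last = df.drop(columns=df.columns[-1])

# Positional alternative: keep every column except the last one.
also_without_last = df.iloc[:, :-1]

print(list(without_last.columns))
```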
Pandas Drop Columns in Place in Python
The drop() method doesn’t modify the original dataframe by default. After dropping the specified columns, it returns the modified dataframe.
To drop columns from the original dataframe in place, we will set the inplace parameter to True. After this, the drop() method modifies the original dataframe instead of returning a new dataframe. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Marks",inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In this example, we have set the inplace parameter to True. As a result, the drop() method modified the original dataframe.
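A common mistake with inplace=True is to assign the return value back to the variable. The sketch below, using a small made-up dataframe, shows that the call returns None, and shows the reassignment pattern that works without inplace.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aditya", "Chris"], "Marks": [85.0, 90.0]})

# With inplace=True the dataframe itself is changed and the call returns None.
result = df.drop(columns="Marks", inplace=True)
print(result)            # None
print(list(df.columns))  # ['Name']

# Writing df = df.drop(..., inplace=True) would therefore replace the
# dataframe with None. Without inplace, reassign the returned dataframe instead.
df2 = pd.DataFrame({"Name": ["Aditya", "Chris"], "Marks": [85.0, 90.0]})
df2 = df2.drop(columns="Marks")
print(list(df2.columns))  # ['Name']
```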
Pandas Drop Columns if They Exist in Python
If a column name does not exist in the dataframe, the drop() method raises a KeyError exception.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Height",inplace=True)
print("The output dataframe is:")
print(grades)
Output:
KeyError: "['Height'] not found in axis"
In this example, the name "Height" that is given as an input argument to the drop() method is not present as a column in the dataframe. Due to this, the drop() method raises the KeyError exception.
To drop columns if they exist without running into exceptions, we will use the errors parameter in the drop() method. You can set the errors parameter to "ignore". After this, the drop() method skips any column name that is not present in the dataframe instead of raising a KeyError exception. It executes normally if the input dataframe contains the specified column.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Marks",inplace=True,errors="ignore")
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, the "Marks" column is present in the dataframe. Hence, it is dropped while executing the drop() method.
If the column specified in the columns parameter does not exist in the dataframe, the drop() method does nothing. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Height",inplace=True,errors="ignore")
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
In the above example, the name "Height" that is given as an input argument to the drop() method is not present as a column in the dataframe. So, the drop() method has no effect on the original dataframe.
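When a list of names is passed, errors="ignore" applies to each name separately. The following sketch uses a small inline dataframe and a hypothetical list called wanted; it also shows one possible alternative that filters the names with the intersection() method of the columns object before dropping.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aditya"], "Marks": [85.0], "Grade": ["A"]})
wanted = ["Marks", "Height"]  # "Height" is not a column of df

# errors="ignore" skips "Height" and still drops "Marks".
ignored = df.drop(columns=wanted, errors="ignore")

# Alternative: keep only the names that exist, then drop them normally.
existing = df.columns.intersection(wanted)
filtered = df.drop(columns=existing)

print(list(ignored.columns))
print(list(existing))
```

The second form keeps the default errors="raise" behaviour, so it still fails loudly if something else goes wrong with the labels.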
Drop Multiple Columns From a Pandas Dataframe
To drop multiple columns from a pandas dataframe, you can pass a list of column names to the labels parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(labels=["Marks", "Grade"],axis=1,inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In the above example, we have passed the list ["Marks", "Grade"] to the labels parameter. Hence, the drop() method drops both the Marks and the Grade columns from the input dataframe.
Instead of the labels parameter, you can use the columns parameter to drop multiple columns from a pandas dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns=["Marks", "Grade"],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
You can also use the index of the columns to drop multiple columns from a pandas dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns=grades.columns[3:5],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In this example, we have sliced the columns object to obtain the column names at index 3 and 4, i.e. the fourth and fifth columns of the dataframe. Using the index of the columns to drop them takes an extra lookup step compared to using the column names. However, it can be useful in cases when we want to drop the first n columns, the last n columns, or columns at specific indices when we don’t have the column names.
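A slice only covers a contiguous range of positions. As a small sketch, assuming a one-row dataframe with the same column names as the grades data, a list of positions can be used to pick columns that are not next to each other.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": [1],
    "Roll": [11],
    "Name": ["Aditya"],
    "Marks": [85.0],
    "Grade": ["A"],
})

# A slice covers a contiguous range: index 3 and 4 here.
sliced = df.drop(columns=df.columns[3:5])

# A list of positions can pick columns that are not adjacent.
picked = df.drop(columns=df.columns[[0, 3]])

print(list(sliced.columns))
print(list(picked.columns))
```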
Drop First N Columns From a Pandas Dataframe
To drop the first n columns of a dataframe, we will obtain the columns object using the columns attribute of the dataframe. Then, we can slice the columns object to get the names of the first n columns and pass them to the drop() method as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
n=2
grades.drop(columns=grades.columns[:n],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Name Marks Grade
0 Aditya 85.0 A
1 Chris NaN A
2 Sam 75.0 B
3 Harry NaN NaN
4 Tom 73.0 B
5 Golu 79.0 B
6 Harsh 55.0 C
7 Clara NaN B
8 Amy 88.0 A
9 Prashant NaN B
10 Aditya 55.0 C
11 Radheshyam NaN NaN
In the above example, we have taken a slice of the columns attribute to obtain the names of the columns at index 0 and 1. Then, we passed the slice to the columns parameter in the drop() method to delete the columns.
Drop the Last N Columns From a Pandas Dataframe
To drop the last n columns of a dataframe, we will obtain the columns object using the columns attribute of the dataframe. Then, we can slice the columns object to get the names of the last n columns and pass them to the drop() method as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
n=2
grades.drop(columns=grades.columns[lenDf-n:lenDf],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In the above example, we have calculated the total number of columns in the dataframe and stored it in the lenDf variable. Then, we sliced grades.columns to obtain the names of the last n columns. After obtaining the names of the last n columns, we passed them to the columns parameter to drop the last n columns from the pandas dataframe.
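A negative slice is a shorter way to select the last n column names, but it behaves differently when n is 0. The sketch below, which assumes a one-row dataframe with the same column names as the grades data, shows both the shortcut and the edge case.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": [1],
    "Roll": [11],
    "Name": ["Aditya"],
    "Marks": [85.0],
    "Grade": ["A"],
})
n = 2

# columns[-n:] holds the names of the last n columns when n is at least 1.
without_last_n = df.drop(columns=df.columns[-n:])

# Caveat: columns[-0:] is the same as columns[0:], i.e. every column,
# so the shortcut should not be used when n may be 0.
all_names = df.columns[-0:]

# The explicit length-based slice selects nothing when n is 0.
length = len(df.columns)
none_selected = df.columns[length - 0:length]

print(list(without_last_n.columns))
print(len(all_names), len(none_selected))
```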
The dropna() Method
The dropna() method can be used to drop rows or columns having NaN values. It has the following syntax.
DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False)
Here,
- The axis parameter is used to decide if we want to drop rows or columns that have NaN values. By default, axis is set to 0. Due to this, rows with NaN values are dropped when the dropna() method is executed on the dataframe. To drop columns having NaN values, you can set the axis parameter to 1.
- The how parameter is used to determine if a column is dropped only when all of its values are NaN or when at least one of its values is NaN. By default, the how parameter is set to "any". Due to this, even if a single NaN value is present, the column will be deleted from the dataframe. To drop only the columns in which every value is NaN, you can set how to "all".
- The thresh parameter sets the minimum number of non-NaN values that a column must have to be kept. For instance, if you want to delete a column when it has fewer than n non-NaN values, you can pass the number n to the thresh parameter. In recent pandas versions, the thresh parameter cannot be combined with the how parameter.
- The subset parameter is used when we want to check for NaN values in only specific rows while dropping columns. By default, the subset parameter is set to None. Hence, the dropna() method searches for NaN values in all the rows. If you want it to search for NaN values in only a specific row, you can pass the row label to the subset parameter. To check for NaN values in two or more rows, you can pass a list of row labels to the subset parameter.
- The inplace parameter is used to decide if we get a new dataframe after the drop operation or if we want to modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the dropna() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True.
After execution, the dropna() method returns the modified dataframe if inplace is set to False. Otherwise, it returns None.
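The how and subset parameters are not used in the examples that follow, so here is a minimal sketch of how they could behave when dropping columns. The dataframe and its "Remarks" column are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aditya", "Chris", "Sam"],
    "Marks": [85.0, None, 75.0],
    "Grade": ["A", "A", None],
    "Remarks": [None, None, None],
})

# how="all": drop only the columns in which every value is missing.
all_missing = df.dropna(axis=1, how="all")

# subset with axis=1 takes row labels: only row 1 is inspected here,
# so "Grade" survives even though it has a missing value in row 2.
row1_only = df.dropna(axis=1, subset=[1])

# Default behaviour (how="any" over all rows) drops every column
# that has at least one missing value.
any_missing = df.dropna(axis=1)

print(list(all_missing.columns))
print(list(row1_only.columns))
print(list(any_missing.columns))
```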
Drop Columns With NaN Values in a Pandas Dataframe
To drop a column with NaN values, you can invoke the dropna() method on the input dataframe. Additionally, you need to set the axis parameter to 1.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.dropna(axis=1,inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In the above example, the Marks and Grade columns have NaN values in the input dataframe. Hence, they have been dropped from the dataframe after execution of the dropna() method.
Drop Columns With at Least N NaN Values in a Dataframe
To drop a column only if it has at least n NaN values, you can use the thresh parameter along with the axis parameter in the dropna() method.
The thresh parameter takes the minimum number of non-NaN elements as its input argument. If any column in the dataframe has fewer non-NaN values than the number specified in the thresh parameter, the column is dropped from the dataframe after execution of the dropna() method.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.index)
n=5
count=lenDf-n+1
grades.dropna(axis=1,inplace=True,thresh=count)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In this example, we need to drop a column from the dataframe if it has at least n NaN values. In other words, a column is kept in the output dataframe only if it has at least (number of rows in the dataframe - n + 1) non-NaN values, so we pass this number to the thresh parameter. Here, the dataframe has 12 rows and n is 5, so thresh is 8. The Marks column has only 7 non-NaN values and is dropped, while the Grade column has 10 non-NaN values and is kept.
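The same rule can also be written directly in terms of NaN counts, which some readers may find easier to follow than the thresh arithmetic. The sketch below uses a small made-up dataframe and compares a boolean mask built from isna().sum() with the thresh form.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aditya", "Chris", "Sam", "Harry"],
    "Marks": [85.0, None, None, None],
    "Grade": ["A", None, "B", "B"],
})
n = 2  # drop columns that have at least 2 NaN values

# Count the NaN values in each column.
nan_counts = df.isna().sum()

# Keep the columns that have fewer than n NaN values.
by_mask = df.loc[:, nan_counts < n]

# The thresh form: keep columns with at least
# (number of rows - n + 1) non-NaN values.
by_thresh = df.dropna(axis=1, thresh=len(df.index) - n + 1)

print(list(by_mask.columns))
print(by_mask.equals(by_thresh))
```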
Drop Columns From a Dataframe Using the pop() Method
The pop() method can be used to drop a single column from a dataframe at a time. It has the following syntax.
DataFrame.pop(item)
The pop() method, when invoked on a dataframe, takes a column name as its input and drops the column from the original dataframe. It also returns the dropped column as output.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.pop("Marks")
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
If the column is not present in the dataframe, the pop() method raises a KeyError exception. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.pop("Height")
print("The output dataframe is:")
print(grades)
Output:
KeyError: 'Height'
Here, the dataframe has no column named "Height". Hence, the pop() method raises the KeyError exception.
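The examples above discard the value returned by the pop() method. As a minimal sketch with a small made-up dataframe, the returned column can be stored for later use, and a membership check on the columns is one possible way to avoid the KeyError.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aditya", "Chris"], "Marks": [85.0, 90.0]})

# pop() removes the column in place and returns it as a Series.
marks = df.pop("Marks")

# Check for the column first to avoid a KeyError when it may be missing.
height = df.pop("Height") if "Height" in df.columns else None

print(list(marks))
print(list(df.columns))
print(height)
```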
Conclusion
In this article, we have discussed different ways to drop columns from a pandas dataframe in Python. For this, we have used the drop() method, the dropna() method, and the pop() method.