While working with dataframes in Python, we often need to delete one or more columns from a dataframe during data preprocessing. In this article, we will discuss different ways to drop columns from a pandas dataframe in Python.
- The drop() Method
- Pandas Drop Columns by Name in Python
- Drop Columns by Index From a Pandas Dataframe
- Pandas Drop Columns in Place in Python
- Pandas Drop Columns if They Exist in Python
- Drop Multiple Columns From a Pandas Dataframe
- The dropna() Method
- Drop Columns With NaN Values in a Pandas Dataframe
- Drop Columns With at Least N NaN Values in a Dataframe
- Drop Columns From a Dataframe Using the pop() Method
- Conclusion
The drop() Method
The drop() method can be used to drop columns or rows from a pandas dataframe. It has the following syntax.
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Here,
- The index parameter is used when we have to drop rows from the dataframe. It takes an index label or a list of index labels that have to be deleted as its input argument.
- The columns parameter is used when we need to drop columns from the dataframe. It takes a column name or a list of column names that need to be dropped as its input argument.
- The labels parameter represents the index or column labels that we need to remove from the dataframe. It is the alternative to the index and columns parameters: when we don’t use them, we pass the column name or the row index label (or a list of them) to the labels parameter. In such cases, we use the axis parameter to decide if we want to drop a row or a column. To drop a column from the dataframe, we set the axis parameter to 1. To drop a row from the dataframe, we set the axis parameter to 0, which is its default value.
- The level parameter is used when the dataframe has a multilevel index. It takes the integer position or the name of the index level from which the labels will be removed.
- The inplace parameter is used to decide if we get a new dataframe after the drop operation or if we want to modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the drop() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True.
- The errors parameter is used to decide if the drop() method raises an exception when a label is not found. By default, the errors parameter is set to "raise". Due to this, the drop() method raises a KeyError exception if any of the given labels does not exist in the dataframe. If you set the errors parameter to "ignore", the error for missing labels is suppressed and only the labels that exist are dropped.
After execution, the drop() method returns the modified data frame if the inplace parameter is set to False. Otherwise, it returns None.
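The parameters described above can be tried out with a short sketch. It builds a small dataframe inline as a hypothetical stand-in for the grade.csv file used in the rest of this article (the values are made up for illustration) and compares the two ways of naming the columns to drop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for grade.csv; only a few rows are needed here.
grades = pd.DataFrame({
    "Class": [1, 1, 2],
    "Roll": [11, 12, 22],
    "Name": ["Aditya", "Chris", "Tom"],
    "Marks": [85.0, np.nan, 73.0],
    "Grade": ["A", "A", "B"],
})

# labels + axis=1 and columns= describe the same operation.
by_labels = grades.drop(labels="Marks", axis=1)
by_columns = grades.drop(columns="Marks")
print(by_labels.equals(by_columns))

# A list of names drops several columns in one call.
trimmed = grades.drop(columns=["Marks", "Grade"])
print(list(trimmed.columns))

# inplace defaults to False, so the original dataframe keeps all its columns.
print(list(grades.columns))
```

Using the columns parameter tends to read more clearly than labels with axis=1, because the call itself says which axis is affected.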
Pandas Drop Columns by Name in Python
To drop a column from a pandas dataframe by name, you can pass the column name to the labels parameter and set the axis parameter to 1 in the drop() method.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels="Marks",axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, we first read the dataframe from the CSV file using the read_csv() function. Then, we invoked the drop() method on the input dataframe with "Marks" as the input argument to the labels parameter and set axis=1. You can observe that the dataframe returned by the drop() method doesn’t have the "Marks" column.
Instead of using the labels and the axis parameter, you can pass the column name to the columns parameter in the drop() method as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns="Marks")
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, instead of using two parameters, i.e. labels and axis, we have used the columns parameter to drop the "Marks" column from the input dataframe.
Drop Columns by Index From a Pandas Dataframe
Instead of using the column name, you can also drop columns by index from a dataframe.
To drop columns from a pandas dataframe by index, we first need to obtain the columns object using the columns attribute of the dataframe. Then, we can use the indexing operator with the index of the columns to drop the columns by index as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels=grades.columns[3],axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, we first obtained the column names of the grades dataframe using the grades.columns attribute. Then, we obtained the element at index 3 of the grades.columns attribute. Finally, we passed this value to the labels parameter. You can observe in the output that the dataframe returned by the drop() method doesn’t have the 4th column of the input dataframe. Hence, we have successfully dropped a column of the dataframe using its index.
We can also use the columns parameter instead of the labels parameter as shown in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns=grades.columns[3])
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
Drop the First Column From a Dataframe in Python
To drop the first column of the dataframe, you pass the column name at index 0 to the labels parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels=grades.columns[0],axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Roll Name Marks Grade
0 11 Aditya 85.0 A
1 12 Chris NaN A
2 14 Sam 75.0 B
3 15 Harry NaN NaN
4 22 Tom 73.0 B
5 15 Golu 79.0 B
6 27 Harsh 55.0 C
7 23 Clara NaN B
8 34 Amy 88.0 A
9 15 Prashant NaN B
10 27 Aditya 55.0 C
11 23 Radheshyam NaN NaN
In this example, we have to drop the column at the first position. Hence, we have passed the element at index 0 of the grades.columns attribute of the input dataframe to the labels parameter. As we have to drop a column of the dataframe, we also need to set axis=1.
Instead of using the labels and the axis parameter, you can use the columns parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns=grades.columns[0])
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Roll Name Marks Grade
0 11 Aditya 85.0 A
1 12 Chris NaN A
2 14 Sam 75.0 B
3 15 Harry NaN NaN
4 22 Tom 73.0 B
5 15 Golu 79.0 B
6 27 Harsh 55.0 C
7 23 Clara NaN B
8 34 Amy 88.0 A
9 15 Prashant NaN B
10 27 Aditya 55.0 C
11 23 Radheshyam NaN NaN
In this example, we have passed the element at index 0 of the grades.columns attribute to the columns parameter in the drop() method. The grades.columns attribute is an Index object that holds the column names. Hence, dropping a column using its index works the same way as dropping it by name, with one extra lookup to translate the index into a name. So, it is better to drop columns by name directly if we know the names of the columns in the dataframe.
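One consequence of translating a position into a name first is worth checking: if two columns share the same name, drop() removes both of them. The sketch below uses hypothetical data with a duplicated "Score" column to show this. It also shows iloc as one possible purely positional alternative, which is an assumption on our part and is not used in the examples of this article.

```python
import pandas as pd

# Hypothetical data; "Score" deliberately appears twice.
df = pd.DataFrame([[1, 90, 85], [2, 70, 65]], columns=["Roll", "Score", "Score"])

# df.columns is a pandas Index; indexing it returns the label at that position.
label = df.columns[1]
print(type(df.columns).__name__, label)

# drop() removes by label, so both "Score" columns disappear.
by_label = df.drop(columns=label)
print(list(by_label.columns))

# A purely positional alternative: keep every position except 1 with iloc.
keep = [i for i in range(df.shape[1]) if i != 1]
by_position = df.iloc[:, keep]
print(list(by_position.columns))
```

When the column names are unique, as in the grades dataframe, both approaches give the same result.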
Drop the Last Column From a Pandas Dataframe
To drop the last column of the dataframe, we will first find the index of the last column. For this, we will find the length of the columns attribute of the dataframe. After this, we will find the index of the last column by subtracting 1 from the length. Then, we will get the name of the last column using the index of the last column and the columns attribute. Finally, we will pass the obtained column name to the labels parameter. After execution, the drop() method will drop the last column of the dataframe. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
output_df=grades.drop(labels=grades.columns[lenDf-1],axis=1)
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Marks
0 1 11 Aditya 85.0
1 1 12 Chris NaN
2 1 14 Sam 75.0
3 1 15 Harry NaN
4 2 22 Tom 73.0
5 2 15 Golu 79.0
6 2 27 Harsh 55.0
7 2 23 Clara NaN
8 3 34 Amy 88.0
9 3 15 Prashant NaN
10 3 27 Aditya 55.0
11 3 23 Radheshyam NaN
Instead of using the labels parameter, you can use the columns parameter to drop the last column from a pandas dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
output_df=grades.drop(columns=grades.columns[lenDf-1])
print("The output dataframe is:")
print(output_df)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Marks
0 1 11 Aditya 85.0
1 1 12 Chris NaN
2 1 14 Sam 75.0
3 1 15 Harry NaN
4 2 22 Tom 73.0
5 2 15 Golu 79.0
6 2 27 Harsh 55.0
7 2 23 Clara NaN
8 3 34 Amy 88.0
9 3 15 Prashant NaN
10 3 27 Aditya 55.0
11 3 23 Radheshyam NaN
Pandas Drop Columns in Place in Python
The drop() method doesn’t modify the original dataframe by default. After dropping the specified columns, it returns the modified dataframe.
To drop columns from the original dataframe in place, we will set the inplace parameter to True. After this, the drop() method modifies the original dataframe and returns None instead of returning a new dataframe. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Marks",inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In this example, we have set the inplace parameter to True. As a result, the drop() method modified the original dataframe.
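The difference between the two modes can be sketched side by side. The dataframe below is a hypothetical stand-in for grade.csv with made-up values. The last part shows a common pitfall: assigning the result of an in-place drop back to the variable leaves it holding None.

```python
import pandas as pd

# Hypothetical stand-in for grade.csv.
grades = pd.DataFrame({
    "Name": ["Aditya", "Sam"],
    "Marks": [85.0, 75.0],
    "Grade": ["A", "B"],
})

# Option 1: modify the dataframe in place; the return value is None.
in_place = grades.copy()
returned = in_place.drop(columns="Marks", inplace=True)
print(returned)
print(list(in_place.columns))

# Option 2: keep the default inplace=False and rebind the name yourself.
reassigned = grades.copy()
reassigned = reassigned.drop(columns="Marks")
print(list(reassigned.columns))

# Pitfall: combining both leaves the variable holding None.
mistake = grades.copy()
mistake = mistake.drop(columns="Marks", inplace=True)
print(mistake is None)
```

Both options end with the same columns, so the choice is mostly a matter of style. Reassignment has the advantage that it can be chained with other method calls.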
Pandas Drop Columns if They Exist in Python
If a column name does not exist in the dataframe, the drop() method raises a KeyError exception.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Height",inplace=True)
print("The output dataframe is:")
print(grades)
Output:
KeyError: "['Height'] not found in axis"
In this example, the "Height" label that is given as an input argument to the drop() method is not present as a column in the dataframe. Due to this, the drop() method raises the KeyError exception.
To drop columns if they exist without running into exceptions, we will use the errors parameter in the drop() method. You can set the errors parameter to "ignore". After this, the drop() method doesn’t raise a KeyError for column names that are missing from the dataframe. If the input dataframe contains the specified column, the method executes normally and drops it.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Marks",inplace=True,errors="ignore")
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In the above example, the "Marks" column is present in the dataframe. Hence, it is dropped while executing the drop() method.
If the column specified in the columns parameter does not exist in the dataframe, the drop() method does nothing. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Height",inplace=True,errors="ignore")
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
In the above example, the "Height" label that is given as an input argument to the drop() method is not present as a column in the dataframe. So, the drop() method has no effect on the original dataframe.
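The same behaviour applies when a list mixes existing and missing names. The following sketch uses a hypothetical stand-in for grade.csv and also shows one possible alternative, filtering the list against the existing columns before calling drop(). The variable names are ours and chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for grade.csv.
grades = pd.DataFrame({
    "Name": ["Aditya", "Sam"],
    "Marks": [85.0, 75.0],
    "Grade": ["A", "B"],
})

unwanted = ["Marks", "Height"]  # "Height" is not a column here

# errors="ignore" drops the names that exist and skips the missing ones.
ignored = grades.drop(columns=unwanted, errors="ignore")
print(list(ignored.columns))

# Alternative: keep only the names that are present, then drop them
# with the default errors="raise".
present = [name for name in unwanted if name in grades.columns]
filtered = grades.drop(columns=present)
print(present, list(filtered.columns))
```

The filtering approach is a little longer, but it leaves a list of the columns that were actually dropped, which can be handy for logging.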
Drop Multiple Columns From a Pandas Dataframe
To drop multiple columns from a pandas dataframe, you can pass a list of column names to the labels parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(labels=["Marks", "Grade"],axis=1,inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In the above example, we have passed the list ["Marks", "Grade"] to the labels parameter. Hence, the drop() method drops both the Marks and the Grade columns from the input dataframe.
Instead of the labels parameter, you can use the columns parameter to drop multiple columns from a pandas dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns=["Marks", "Grade"],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
You can also use the index of the columns to drop multiple columns from a pandas dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns=grades.columns[3:5],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In this example, we have sliced the columns attribute to obtain the column names at index 3 and 4, i.e. the fourth and fifth columns of the dataframe. Dropping columns by index takes an extra lookup compared to using the column names directly. However, it can be useful when we want to drop the first n columns, the last n columns, or columns at specific indices and we don’t have the column names.
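A slice only covers columns that sit next to each other. For columns at arbitrary positions, the columns attribute can also be indexed with a list of positions. The sketch below assumes a two-row stand-in for grade.csv with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for grade.csv.
grades = pd.DataFrame({
    "Class": [1, 2],
    "Roll": [11, 22],
    "Name": ["Aditya", "Tom"],
    "Marks": [85.0, 73.0],
    "Grade": ["A", "B"],
})

# A slice covers a contiguous run: positions 3 and 4 (the stop value 5 is excluded).
print(list(grades.columns[3:5]))

# A list of positions picks columns that are not next to each other.
positions = [0, 3]
names = grades.columns[positions]
output_df = grades.drop(columns=names)
print(list(names), list(output_df.columns))
```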
Drop First N Columns From a Pandas Dataframe
To drop the first n columns of a dataframe, we will obtain the columns object using the columns attribute of the dataframe. Then, we can use the indexing operator with the index of the columns to drop the first n columns from the dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
n=2
grades.drop(columns=grades.columns[:n],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Name Marks Grade
0 Aditya 85.0 A
1 Chris NaN A
2 Sam 75.0 B
3 Harry NaN NaN
4 Tom 73.0 B
5 Golu 79.0 B
6 Harsh 55.0 C
7 Clara NaN B
8 Amy 88.0 A
9 Prashant NaN B
10 Aditya 55.0 C
11 Radheshyam NaN NaN
In the above example, we have taken a slice of the columns attribute to obtain the names of the columns at index 0 and 1. Then, we passed the slice to the columns parameter in the drop() method to delete the columns.
Drop the Last N Columns From a Pandas Dataframe
To drop the last n columns of a dataframe, we will obtain the columns object using the columns attribute of the dataframe. Then, we can use the indexing operator with the index of the columns to drop the last n columns from the dataframe as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
n=2
grades.drop(columns=grades.columns[lenDf-n:lenDf],inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In the above example, we have calculated the total number of columns in the dataframe and stored it in the lenDf variable. Then, we sliced the grades.columns attribute to obtain the names of the last n columns. After obtaining the names of the last n columns, we passed them to the columns parameter to drop the last n columns from the pandas dataframe.
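The length calculation can be avoided with negative indexing, which counts positions from the end. The sketch below is a shorter variant of the last-column and last-n-columns examples, again using a hypothetical stand-in for grade.csv.

```python
import pandas as pd

# Hypothetical stand-in for grade.csv.
grades = pd.DataFrame({
    "Class": [1, 2],
    "Roll": [11, 22],
    "Name": ["Aditya", "Tom"],
    "Marks": [85.0, 73.0],
    "Grade": ["A", "B"],
})

# Last column: index -1 counts from the end.
no_last = grades.drop(columns=grades.columns[-1])
print(list(no_last.columns))

# Last n columns: a negative slice start.
n = 2
no_last_n = grades.drop(columns=grades.columns[-n:])
print(list(no_last_n.columns))

# Caveat: with n = 0 the slice [-0:] is the same as [0:], i.e. every column.
print(len(grades.columns[-0:]))
```

Because of the n = 0 caveat, it may be safer to check that n is positive before using the negative slice, or to keep the explicit lenDf-n form shown above.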
The dropna() Method
The dropna() method can be used to drop columns having NaN values. It has the following syntax.
DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False)
Here,
- The axis parameter is used to decide if we want to drop rows or columns that have NaN values. By default, axis is set to 0. Due to this, rows with NaN values are dropped when the dropna() method is executed on the dataframe. To drop columns having NaN values, you can set the axis parameter to 1.
- The how parameter is used to determine if a column is dropped only when all of its values are NaN ("all") or as soon as it has at least one NaN value ("any"). By default, the dropna() method behaves as if how were set to "any". Due to this, even if a single NaN is present, the column will be deleted from the dataframe.
- The thresh parameter is used when we want to keep only the columns that have at least a specific number of non-NaN values. For instance, if you want to delete a column if it has fewer than n non-NaN values, you can pass the number n to the thresh parameter.
- The subset parameter is used when we want to check for NaN values only at specific labels along the other axis. By default, the subset parameter is set to None. Hence, the dropna() method searches for NaN values in all the rows of each column. When dropping columns, if you want it to search for NaN values in only a specific row, you can pass the row index to the subset parameter. To check for NaN values in two or more rows, you can pass the list of row indices to the subset parameter.
- The inplace parameter is used to decide if we get a new dataframe after the drop operation or if we want to modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the dropna() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True.
After execution, the dropna() method returns the modified data frame if inplace is set to False. Otherwise, it returns None.
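The subset parameter is the least obvious of these when dropping columns, so here is a small sketch of it. The dataframe and its column names A, B, and C are hypothetical and chosen only so that the NaN values fall in different rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaN values in different rows.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 2.0, 3.0],
    "C": [1.0, 2.0, 3.0],
})

# Default: a NaN anywhere in a column drops it, so only C survives.
print(list(df.dropna(axis=1).columns))

# subset=[0]: only row 0 is inspected. B has a NaN there; A's NaN is in row 1.
checked_row0 = df.dropna(axis=1, subset=[0])
print(list(checked_row0.columns))
```

With axis=1, the values passed to subset are row index labels, not column names.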
Drop Columns With NaN Values in a Pandas Dataframe
To drop columns with NaN values, you can invoke the dropna() method on the input dataframe. Additionally, you need to set the axis parameter to 1.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.dropna(axis=1,inplace=True)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 14 Sam
3 1 15 Harry
4 2 22 Tom
5 2 15 Golu
6 2 27 Harsh
7 2 23 Clara
8 3 34 Amy
9 3 15 Prashant
10 3 27 Aditya
11 3 23 Radheshyam
In the above example, the Marks and Grade columns have NaN values in the input dataframe. Hence, they have been dropped from the dataframe after execution of the dropna() method.
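If only completely empty columns should go, the how parameter can be set to "all". The sketch below compares it with the default behaviour. The "Remarks" column is a hypothetical all-NaN column added for illustration and is not part of grade.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Marks is partly missing, Remarks is entirely missing.
df = pd.DataFrame({
    "Name": ["Aditya", "Chris", "Sam"],
    "Marks": [85.0, np.nan, 75.0],
    "Remarks": [np.nan, np.nan, np.nan],
})

# Default behaviour (same as how="any"): one NaN is enough to drop a column.
any_nan = df.dropna(axis=1)
print(list(any_nan.columns))

# how="all": a column is dropped only if every value in it is NaN.
all_nan = df.dropna(axis=1, how="all")
print(list(all_nan.columns))
```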
Drop Columns With at Least N NaN Values in a Dataframe
To drop a column only if it has at least n NaN values, you can use the thresh parameter along with the axis parameter in the dropna() method.
The thresh parameter takes the minimum number of non-NaN elements as its input argument. If any column in the dataframe has fewer non-NaN values than the number specified in the thresh parameter, the column is dropped from the dataframe after execution of the dropna() method.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.index)
n=5
count=lenDf-n+1
grades.dropna(axis=1,inplace=True,thresh=count)
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
In this example, we need to drop a column from a dataframe if it has at least n NaN values. So, a column must have at least (number of rows in the dataframe - n + 1) non-NaN values to be included in the output dataframe. Any column that has fewer than (number of rows in the dataframe - n + 1) non-NaN values is dropped from the dataframe. Here, the dataframe has 12 rows and n is 5, so thresh is 8. The Marks column has only 6 non-NaN values and is dropped, while the Grade column has 10 and is kept.
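The arithmetic can be cross-checked by counting the NaN values per column directly. The following sketch uses a four-row hypothetical stand-in for grade.csv and compares the result of dropna() with the columns that have fewer than n NaN values according to isna().sum().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for grade.csv.
grades = pd.DataFrame({
    "Name": ["Aditya", "Chris", "Sam", "Harry"],
    "Marks": [85.0, np.nan, 75.0, np.nan],   # 2 NaN values
    "Grade": ["A", "A", "B", np.nan],        # 1 NaN value
})

n = 2                       # drop columns with at least n NaN values
rows = len(grades.index)
thresh = rows - n + 1       # 4 - 2 + 1 = 3 non-NaN values needed to stay
output_df = grades.dropna(axis=1, thresh=thresh)

# Cross-check: columns with fewer than n NaN values should be the ones kept.
nan_counts = grades.isna().sum()
expected = list(nan_counts[nan_counts < n].index)
print(list(output_df.columns), expected)
```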
Drop Columns From a Dataframe Using the pop() Method
The pop() method can be used to drop one column at a time from a dataframe. It has the following syntax.
DataFrame.pop(item)
The pop() method, when invoked on a dataframe, takes a column name as its input and drops the column from the original dataframe. It also returns the dropped column as a Series.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.pop("Marks")
print("The output dataframe is:")
print(grades)
Output:
The input dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris NaN A
2 1 14 Sam 75.0 B
3 1 15 Harry NaN NaN
4 2 22 Tom 73.0 B
5 2 15 Golu 79.0 B
6 2 27 Harsh 55.0 C
7 2 23 Clara NaN B
8 3 34 Amy 88.0 A
9 3 15 Prashant NaN B
10 3 27 Aditya 55.0 C
11 3 23 Radheshyam NaN NaN
The output dataframe is:
Class Roll Name Grade
0 1 11 Aditya A
1 1 12 Chris A
2 1 14 Sam B
3 1 15 Harry NaN
4 2 22 Tom B
5 2 15 Golu B
6 2 27 Harsh C
7 2 23 Clara B
8 3 34 Amy A
9 3 15 Prashant B
10 3 27 Aditya C
11 3 23 Radheshyam NaN
If the column is not present in the dataframe, the pop() method raises a KeyError exception. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.pop("Height")
print("The output dataframe is:")
print(grades)
Output:
KeyError: 'Height'
Here, the "Height" column is not present in the dataframe. Hence, the pop() method raises the KeyError exception.
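Since pop() has no errors parameter, one way to avoid the exception is to check for the column first. The sketch below assumes a hypothetical stand-in for grade.csv. It also keeps the returned column, which the examples above discard.

```python
import pandas as pd

# Hypothetical stand-in for grade.csv.
grades = pd.DataFrame({
    "Name": ["Aditya", "Sam"],
    "Marks": [85.0, 75.0],
    "Grade": ["A", "B"],
})

# pop() removes the column in place and hands it back as a Series.
marks = grades.pop("Marks")
print(type(marks).__name__)
print(list(grades.columns))

# Guard against a missing column instead of catching KeyError.
height = grades.pop("Height") if "Height" in grades.columns else None
print(height)
```

Keeping the popped column is useful when it is needed separately, for example as the target variable while the remaining columns serve as features.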
Conclusion
In this article, we have discussed different ways to drop columns from a pandas dataframe in Python. For this, we have used the drop() method, the dropna() method, and the pop() method.

