Drop Columns From Pandas Dataframe - PythonForBeginners.com

While working with dataframes in Python, we often need to delete one or more columns from a dataframe during data preprocessing. In this article, we will discuss different ways to drop columns from a pandas dataframe in Python.

Table of Contents

The drop() Method
Pandas Drop Columns by Name in Python
Drop Columns by Index From a Pandas Dataframe
1. Drop the First Column From a Dataframe in Python
2. Drop the Last Column From a Pandas Dataframe
Pandas Drop Columns in Place in Python
Pandas Drop Columns if They Exist in Python
Drop Multiple Columns From a Pandas Dataframe
1. Drop First N Columns From a Pandas Dataframe
2. Drop the Last N Columns From a Pandas Dataframe
The dropna() Method
Drop Columns With NaN Values in a Pandas Dataframe
Drop Columns With at Least N NaN Values in a Dataframe
Drop Columns From a Dataframe Using the pop() Method
Conclusion

The drop() Method

The drop() method can be used to drop columns or rows from a pandas dataframe. It has the following syntax.

DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Here,

The index parameter is used when we have to drop a row from the dataframe. The index parameter takes an index or a list of indices that have to be deleted as its input argument.
The columns parameter is used when we need to drop a column from the dataframe. The columns parameter takes a column name or a list of column names that need to be dropped as its input argument.
The labels parameter takes the index label or column label that we need to remove from the dataframe, or a list of such labels to remove two or more rows or columns. Whether the labels refer to rows or columns is decided by the axis parameter. Passing labels with axis=0 is equivalent to using the index parameter, and passing labels with axis=1 is equivalent to using the columns parameter.
When we don’t use the index and columns parameter, we pass the column name or the index of the row that needs to be deleted to the labels parameter as its input argument. In such cases, we use the axis parameter to decide if we want to drop a row or a column. If we want to drop a column from the dataframe, we set the axis parameter to 1. When we want to drop a row from the dataframe, we set the axis parameter to 0 which is its default value.
The level parameter is used to drop rows or columns from a dataframe when we have multilevel indices. It takes the position or the name of the index level in which the given labels should be looked up and removed.
The inplace parameter is used to decide if we get a new dataframe after the drop operation or if we want to modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the drop() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True.
The errors parameter is used to decide what happens when a label passed to the drop() method is not found in the dataframe. By default, the errors parameter is set to "raise". Due to this, the drop() method raises a KeyError exception if any of the given labels doesn’t exist. If you don’t want the error to be raised, you can set the errors parameter to "ignore". After this, the drop() method drops only the labels that exist and silently skips the rest.

After execution, the drop() method returns the modified data frame if the inplace parameter is set to False. Otherwise, it returns None.
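
The level parameter is the only one in this list that the later examples do not exercise, so here is a minimal sketch of it. The two-level column header and the small inline dataframe are made up for illustration and are not part of grade.csv.

```python
import pandas as pd

# Hypothetical two-level column header: (group, field).
columns = pd.MultiIndex.from_tuples(
    [("Info", "Name"), ("Info", "Roll"), ("Result", "Marks"), ("Result", "Grade")]
)
df = pd.DataFrame(
    [["Aditya", 11, 85.0, "A"], ["Chris", 12, None, "A"]],
    columns=columns,
)

# Look up the label "Marks" in the inner level (level=1) and drop it.
without_marks = df.drop(columns="Marks", level=1)

# Look up the label "Info" in the outer level (level=0).
# Every column grouped under "Info" is dropped.
without_info = df.drop(columns="Info", level=0)

print(list(without_marks.columns))
print(list(without_info.columns))
```

Since inplace is left at its default here, df itself still has all four columns after both calls.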

Pandas Drop Columns by Name in Python

To drop a column from a pandas dataframe by name, you can pass the column name to the labels parameter and set the axis parameter to 1 in the drop() method.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels="Marks",axis=1)
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN

In the above example, we first read the pandas dataframe from the CSV file using the read_csv() function. Then, we invoked the drop() method on the input dataframe with "Marks" as an input argument to the labels parameter and set axis=1. You can observe that the dataframe returned by the drop() method doesn’t have the "Marks" column.

Instead of using the labels and the axis parameter, you can pass the column name to the columns parameter in the drop() method as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns="Marks")
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN

In the above example, instead of using two parameters i.e. labels and axis, we have used the columns parameter to drop the "Marks" column from the input dataframe.
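
To check that the two spellings really are interchangeable, the following sketch compares their results. It uses a small inline dataframe as a stand-in for grade.csv (assumed data, only three rows) so that it runs on its own.

```python
import pandas as pd

# Small stand-in for grade.csv (assumed data).
grades = pd.DataFrame({
    "Class": [1, 1, 2],
    "Roll": [11, 12, 22],
    "Name": ["Aditya", "Chris", "Tom"],
    "Marks": [85.0, None, 73.0],
    "Grade": ["A", "A", "B"],
})

by_labels = grades.drop(labels="Marks", axis=1)
by_axis_name = grades.drop(labels="Marks", axis="columns")  # axis also accepts a name
by_columns = grades.drop(columns="Marks")

# All three calls should produce the same dataframe.
print(by_labels.equals(by_columns) and by_axis_name.equals(by_columns))
print(list(by_columns.columns))
```

The columns form is usually easier to read because the intent is visible without checking the axis value.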

Drop Columns by Index From a Pandas Dataframe

Instead of using the column name, you can also drop columns by index from a dataframe.

To drop columns from a pandas dataframe by index, we first obtain the column labels using the columns attribute of the dataframe. Then, we use the indexing operator on it to get the name of the column at the required position and pass that name to the drop() method as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels=grades.columns[3],axis=1)
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN

In the above example, we first obtained the column names of the grades dataframe using the grades.columns attribute. Then, we obtained the element at index 3 of grades.columns, which is the name "Marks". Finally, we passed this value to the labels parameter. You can observe in the output that the dataframe returned by the drop() method doesn’t have the 4th column of the input dataframe. Hence, we have successfully dropped a column of the dataframe using its index.

We can also use the columns parameter instead of the labels parameter as shown in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns=grades.columns[3])
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN

Drop the First Column From a Dataframe in Python

To drop the first column of the dataframe, you pass the column name at index 0 to the labels parameter as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(labels=grades.columns[0],axis=1)
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Roll        Name  Marks Grade
0     11      Aditya   85.0     A
1     12       Chris    NaN     A
2     14         Sam   75.0     B
3     15       Harry    NaN   NaN
4     22         Tom   73.0     B
5     15        Golu   79.0     B
6     27       Harsh   55.0     C
7     23       Clara    NaN     B
8     34         Amy   88.0     A
9     15    Prashant    NaN     B
10    27      Aditya   55.0     C
11    23  Radheshyam    NaN   NaN

In this example, we have to drop the column at the first position. Hence, we have passed the element at index 0 of the grades.columns attribute of the input dataframe to the labels parameter. As we have to drop a column of the dataframe, we also need to set axis=1.

Instead of using the labels and the axis parameter, you can use the columns parameter as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
output_df=grades.drop(columns=grades.columns[0])
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Roll        Name  Marks Grade
0     11      Aditya   85.0     A
1     12       Chris    NaN     A
2     14         Sam   75.0     B
3     15       Harry    NaN   NaN
4     22         Tom   73.0     B
5     15        Golu   79.0     B
6     27       Harsh   55.0     C
7     23       Clara    NaN     B
8     34         Amy   88.0     A
9     15    Prashant    NaN     B
10    27      Aditya   55.0     C
11    23  Radheshyam    NaN   NaN

In this example, we have passed the element at index 0 of the grades.columns attribute to the columns parameter in the drop() method. The grades.columns attribute is a pandas Index object that holds the column names and can be indexed like a list. Hence, dropping a column by its index is the same as dropping it by name, with one extra lookup to find the name. If we already know the names of the columns, it is simpler to drop them by name directly.
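
Because the index is first turned into a name, this approach can remove more than one column when column names repeat. The sketch below uses a hypothetical dataframe with two columns named "Marks" to show the difference, and uses iloc as one possible way to remove a column strictly by position.

```python
import pandas as pd

# Hypothetical dataframe in which two columns share the name "Marks".
df = pd.DataFrame(
    [["Aditya", 85.0, 90.0], ["Chris", 70.0, 65.0]],
    columns=["Name", "Marks", "Marks"],
)

# columns[1] is just the label "Marks", so drop() removes every column with that label.
dropped_by_label = df.drop(columns=df.columns[1])

# Selecting the positions to keep with iloc removes only the column at position 1.
position = 1
keep = [i for i in range(df.shape[1]) if i != position]
dropped_by_position = df.iloc[:, keep]

print(list(dropped_by_label.columns))
print(list(dropped_by_position.columns))
```

With unique column names, as in grade.csv, both approaches give the same result.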

Drop the Last Column From a Pandas Dataframe

To drop the last column of the dataframe, we will first find the index of the last column. For this, we will find the length of the columns attribute of the dataframe. After this, we will find the index of the last column by subtracting 1 from the length. Then, we will get the name of the last column using the index of the last column and the columns attribute. Finally, we will pass the obtained column name to the labels parameter. After execution, the drop() method will drop the last column of the dataframe. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
output_df=grades.drop(labels=grades.columns[lenDf-1],axis=1)
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name  Marks
0       1    11      Aditya   85.0
1       1    12       Chris    NaN
2       1    14         Sam   75.0
3       1    15       Harry    NaN
4       2    22         Tom   73.0
5       2    15        Golu   79.0
6       2    27       Harsh   55.0
7       2    23       Clara    NaN
8       3    34         Amy   88.0
9       3    15    Prashant    NaN
10      3    27      Aditya   55.0
11      3    23  Radheshyam    NaN

Instead of using the labels parameter, you can use the columns parameter to drop columns from a pandas dataframe as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
output_df=grades.drop(columns=grades.columns[lenDf-1])
print("The output dataframe is:")
print(output_df)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name  Marks
0       1    11      Aditya   85.0
1       1    12       Chris    NaN
2       1    14         Sam   75.0
3       1    15       Harry    NaN
4       2    22         Tom   73.0
5       2    15        Golu   79.0
6       2    27       Harsh   55.0
7       2    23       Clara    NaN
8       3    34         Amy   88.0
9       3    15    Prashant    NaN
10      3    27      Aditya   55.0
11      3    23  Radheshyam    NaN
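
The length arithmetic is not strictly needed, because the columns attribute also supports negative indices in the same way as a Python list. A minimal sketch, using a small inline dataframe in place of grade.csv (assumed data):

```python
import pandas as pd

# Small stand-in for grade.csv (assumed data).
grades = pd.DataFrame({
    "Class": [1, 1, 2],
    "Roll": [11, 12, 22],
    "Name": ["Aditya", "Chris", "Tom"],
    "Marks": [85.0, None, 73.0],
    "Grade": ["A", "A", "B"],
})

# The approach used above: compute the index of the last column explicitly.
last_by_length = grades.drop(columns=grades.columns[len(grades.columns) - 1])

# Equivalent: index -1 refers to the last column name.
last_by_negative_index = grades.drop(columns=grades.columns[-1])

print(last_by_length.equals(last_by_negative_index))
print(list(last_by_negative_index.columns))
```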

Pandas Drop Columns in Place in Python

The drop() method doesn’t modify the original dataframe by default. After dropping the specified columns, it returns the modified dataframe.

To drop columns from the original dataframe inplace, we will set the inplace parameter to True. After this, the drop() method modifies the original dataframe instead of returning a new dataframe. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Marks",inplace=True)
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN

In this example, we have set the inplace parameter to True. As a result, the drop() method modified the original dataframe.
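
A common slip with inplace=True is to assign the result of drop() back to a variable. As described in the section on the drop() method, the call returns None in that case, so the assignment loses the dataframe. The sketch below illustrates this with a small inline dataframe standing in for grade.csv (assumed data).

```python
import pandas as pd

# Small stand-in for grade.csv (assumed data).
grades = pd.DataFrame({
    "Class": [1, 1, 2],
    "Name": ["Aditya", "Chris", "Tom"],
    "Marks": [85.0, None, 73.0],
})

# Without inplace, the original is untouched and a new dataframe is returned.
copy_without_marks = grades.drop(columns="Marks")
print(list(grades.columns))

# With inplace=True, the original is modified and the return value is None.
result = grades.drop(columns="Marks", inplace=True)
print(result)
print(list(grades.columns))
```

So use either `grades = grades.drop(...)` or `grades.drop(..., inplace=True)`, but not both together.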

Pandas Drop Columns if They Exist in Python

If a column name does not exist in the dataframe, the drop() method raises a KeyError exception.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Height",inplace=True)
print("The output dataframe is:")
print(grades)

Output:

KeyError: "['Height'] not found in axis"

In this example, the column name "Height" that is given as an input argument to the drop() method is not present in the dataframe. Due to this, the drop() method raises the KeyError exception.

To drop columns if they exist without running into exceptions, we will use the errors parameter in the drop() method. You can set the errors parameter to "ignore". After this, the drop() method no longer raises a KeyError for column names that are missing from the dataframe. It executes normally if the input dataframe contains the specified column.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Marks",inplace=True,errors="ignore")
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN

In the above example, the "Marks" column is present in the dataframe. Hence, it is dropped while executing the drop() method.

If the column specified in the columns parameter does not exist in the dataframe, the drop() method does nothing. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns="Height",inplace=True,errors="ignore")
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN

In the above example, the column name "Height" that is given as an input argument to the drop() method is not present in the dataframe. So, the drop() method has no effect on the original dataframe.
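
When a list mixes existing and missing column names, errors="ignore" drops the ones that exist and skips the rest. The sketch below shows this on a small inline dataframe standing in for grade.csv (assumed data), together with an explicit alternative that filters the names first using Index.intersection().

```python
import pandas as pd

# Small stand-in for grade.csv (assumed data).
grades = pd.DataFrame({
    "Class": [1, 1, 2],
    "Name": ["Aditya", "Chris", "Tom"],
    "Marks": [85.0, None, 73.0],
    "Grade": ["A", "A", "B"],
})

wanted = ["Marks", "Height"]  # "Height" is not a column of grades

# Default behaviour: a missing label raises KeyError and nothing is dropped.
try:
    grades.drop(columns=wanted)
    raised = False
except KeyError:
    raised = True

# errors="ignore": "Marks" is dropped, "Height" is silently skipped.
quiet = grades.drop(columns=wanted, errors="ignore")

# Explicit alternative: keep only the names that are actually present.
present = grades.columns.intersection(wanted)
explicit = grades.drop(columns=present)

print(raised)
print(list(quiet.columns))
print(quiet.equals(explicit))
```

The explicit form is slightly longer, but it makes it easy to log or inspect which names were actually found.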

Drop Multiple Columns From a Pandas Dataframe

To drop multiple columns from a pandas dataframe, you can pass a list of column names to the labels parameter as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(labels=["Marks", "Grade"],axis=1,inplace=True)
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name
0       1    11      Aditya
1       1    12       Chris
2       1    14         Sam
3       1    15       Harry
4       2    22         Tom
5       2    15        Golu
6       2    27       Harsh
7       2    23       Clara
8       3    34         Amy
9       3    15    Prashant
10      3    27      Aditya
11      3    23  Radheshyam

In the above example, we have passed the list ["Marks", "Grade"] to the labels parameter. Hence, the drop() method drops both the Marks and the Grade column from the input dataframe.

Instead of the labels parameter, you can use the columns parameter to drop multiple columns from a pandas dataframe as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns=["Marks", "Grade"],inplace=True)
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name
0       1    11      Aditya
1       1    12       Chris
2       1    14         Sam
3       1    15       Harry
4       2    22         Tom
5       2    15        Golu
6       2    27       Harsh
7       2    23       Clara
8       3    34         Amy
9       3    15    Prashant
10      3    27      Aditya
11      3    23  Radheshyam

You can also use the index of the columns to drop multiple columns from a pandas dataframe as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.drop(columns=grades.columns[3:5],inplace=True)
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name
0       1    11      Aditya
1       1    12       Chris
2       1    14         Sam
3       1    15       Harry
4       2    22         Tom
5       2    15        Golu
6       2    27       Harsh
7       2    23       Clara
8       3    34         Amy
9       3    15    Prashant
10      3    27      Aditya
11      3    23  Radheshyam

In this example, we have sliced grades.columns to obtain the names of the columns at index 3 and 4, i.e. the fourth and fifth columns of the dataframe. Dropping columns by index needs an extra lookup compared to using the column names directly. However, it is useful when we want to drop the first n columns, the last n columns, or columns at specific indices and we don’t have the column names.
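
To see what a slice of the columns attribute contains, and to drop columns at positions that are not next to each other, a sketch along the following lines may help. It uses a small inline dataframe in place of grade.csv (assumed data); passing a list of positions to the indexing operator is a feature of the pandas Index object.

```python
import pandas as pd

# Small stand-in for grade.csv (assumed data).
grades = pd.DataFrame({
    "Class": [1, 1, 2],
    "Roll": [11, 12, 22],
    "Name": ["Aditya", "Chris", "Tom"],
    "Marks": [85.0, None, 73.0],
    "Grade": ["A", "A", "B"],
})

# The slice 3:5 covers positions 3 and 4; the stop position 5 is excluded.
sliced_names = list(grades.columns[3:5])
print(sliced_names)

# A list of positions picks out non-adjacent columns (here the 1st and 4th).
scattered = grades.drop(columns=grades.columns[[0, 3]])
print(list(scattered.columns))
```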

Drop First N Columns From a Pandas Dataframe

To drop the first n columns of a dataframe, we will obtain the columns object using the columns attribute of the dataframe. Then, we can use the indexing operator with the index of the columns to drop the first n columns from the dataframe as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
n=2
grades.drop(columns=grades.columns[:n],inplace=True)
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
          Name  Marks Grade
0       Aditya   85.0     A
1        Chris    NaN     A
2          Sam   75.0     B
3        Harry    NaN   NaN
4          Tom   73.0     B
5         Golu   79.0     B
6        Harsh   55.0     C
7        Clara    NaN     B
8          Amy   88.0     A
9     Prashant    NaN     B
10      Aditya   55.0     C
11  Radheshyam    NaN   NaN

In the above example, we have taken a slice of the columns attribute to obtain the column names of the columns at index 0 and 1. Then, we passed the slice to the columns parameter in the drop() method to delete the columns.

Drop the Last N Columns From a Pandas Dataframe

To drop the last n columns of a dataframe, we will obtain the columns object using the columns attribute of the dataframe. Then, we can use the indexing operator with the index of the columns to drop the last n columns from the dataframe as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
lenDf=len(grades.columns)
n=2
grades.drop(columns=grades.columns[lenDf-n:lenDf],inplace=True)
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name
0       1    11      Aditya
1       1    12       Chris
2       1    14         Sam
3       1    15       Harry
4       2    22         Tom
5       2    15        Golu
6       2    27       Harsh
7       2    23       Clara
8       3    34         Amy
9       3    15    Prashant
10      3    27      Aditya
11      3    23  Radheshyam

In the above example, we have calculated the total number of columns in the dataframe and stored it in the lenDf variable. Then, we sliced grades.columns from index lenDf-n to lenDf to obtain the names of the last n columns. Finally, we passed these names to the columns parameter to drop the last n columns from the pandas dataframe.
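
The same slice can be written with a negative start index, but the case n=0 then needs care, because the slice [-0:] selects every column rather than none. The helper below, drop_last_n, is a hypothetical name introduced only for this sketch, and the inline dataframe is a stand-in for grade.csv (assumed data).

```python
import pandas as pd


def drop_last_n(df, n):
    """Return a copy of df without its last n columns (hypothetical helper)."""
    if n <= 0:
        # df.columns[-0:] would select every column, so handle n == 0 explicitly.
        return df.copy()
    return df.drop(columns=df.columns[-n:])


# Small stand-in for grade.csv (assumed data).
grades = pd.DataFrame({
    "Class": [1, 1, 2],
    "Roll": [11, 12, 22],
    "Name": ["Aditya", "Chris", "Tom"],
    "Marks": [85.0, None, 73.0],
    "Grade": ["A", "A", "B"],
})

print(list(drop_last_n(grades, 2).columns))
print(list(drop_last_n(grades, 0).columns))
print(len(grades.columns[-0:]))  # number of names the unguarded slice would select
```

The lenDf-n:lenDf form used in the article does not have this problem, since for n=0 it produces an empty slice.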


The dropna() Method

The dropna() method can be used to drop columns having NaN values. It has the following syntax.

DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False)

Here,

The axis parameter is used to decide if we want to drop rows or columns that have NaN values. By default, axis is set to 0. Due to this, rows with NaN values are dropped when the dropna() method is executed on the dataframe. To drop columns having NaN values, you can set the axis parameter to 1.
The how parameter is used to determine if a column should be dropped only when all of its values are NaN or as soon as it has at least one NaN value. By default, the how parameter behaves as "any". Due to this, even if a single NaN is present, the column will be deleted from the dataframe. To drop only the columns in which every value is NaN, you can set how to "all".
The thresh parameter is used when we want to keep only the columns that have at least a specific number of non-NaN values. For instance, if you want to delete a column if it has less than n non-NaN values, you can pass the number n to the thresh parameter. The how and thresh parameters cannot be used together in the same call.
The subset parameter is used when we want to check for NaN values only in specific rows while dropping columns. By default, the subset parameter is set to None. Hence, the dropna() method searches for NaN values in all the rows. If you want it to search for NaN values in only a specific row, you can pass the row index to the subset parameter. To check for NaN values in two or more rows, you can pass the list of indices to the subset parameter.
The inplace parameter is used to decide if we get a new dataframe after the drop operation or if we want to modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the dropna() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True.

After execution, the dropna() method returns the modified data frame if inplace is set to False. Otherwise, it returns None.
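
The parameters described above can be compared side by side on a small dataframe. The following is a minimal sketch that uses made-up data (the columns A, B, and C are illustrative and are not part of grade.csv).

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe (made-up data, not the article's grade.csv).
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0],
        "B": [np.nan, 5.0, 6.0],
        "C": [np.nan, np.nan, np.nan],
    }
)

# how="any" (the default): drop every column that contains at least one NaN.
any_dropped = df.dropna(axis=1, how="any")

# how="all": drop only the columns in which every value is NaN.
all_dropped = df.dropna(axis=1, how="all")

# thresh=2: keep only the columns that have at least 2 non-NaN values.
thresh_kept = df.dropna(axis=1, thresh=2)

# subset=[1, 2] with axis=1: look for NaN only in the rows labelled 1 and 2.
subset_checked = df.dropna(axis=1, subset=[1, 2])

print(list(any_dropped.columns))     # ['A']
print(list(all_dropped.columns))     # ['A', 'B']
print(list(thresh_kept.columns))     # ['A', 'B']
print(list(subset_checked.columns))  # ['A', 'B']
print(list(df.columns))              # ['A', 'B', 'C'], the original is unchanged
```

Because inplace is left at its default value of False, each call returns a new dataframe and df itself keeps all three columns.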

Drop Columns With NaN Values in a Pandas Dataframe

To drop a column with nan values, you can invoke the dropna() method on the input dataframe. Additionally, you need to set the axis parameter to 1.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.dropna(axis=1,inplace=True)
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name
0       1    11      Aditya
1       1    12       Chris
2       1    14         Sam
3       1    15       Harry
4       2    22         Tom
5       2    15        Golu
6       2    27       Harsh
7       2    23       Clara
8       3    34         Amy
9       3    15    Prashant
10      3    27      Aditya
11      3    23  Radheshyam

In the above example, the Marks and Grade columns have NaN values in the input dataframe. Hence, they have been dropped from the dataframe after execution of the dropna() method.
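
Before dropping anything, it can help to check which columns contain NaN values. The following is a minimal sketch that assumes a small dataframe built from the first four rows shown above instead of reading grade.csv. It also calls dropna() without inplace, so the original dataframe is left untouched.

```python
import numpy as np
import pandas as pd

# Illustrative data: the first four rows of the dataframe shown above.
grades = pd.DataFrame(
    {
        "Name": ["Aditya", "Chris", "Sam", "Harry"],
        "Marks": [85, np.nan, 75, np.nan],
        "Grade": ["A", "A", "B", np.nan],
    }
)

# Names of the columns that contain at least one NaN value.
nan_columns = grades.columns[grades.isna().any()].tolist()
print(nan_columns)  # ['Marks', 'Grade']

# Without inplace=True, dropna() returns a new dataframe.
cleaned = grades.dropna(axis=1)
print(list(cleaned.columns))  # ['Name']
print(list(grades.columns))   # ['Name', 'Marks', 'Grade']
```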

Drop Columns With at Least N NaN Values in a Dataframe

To drop a column only if it has at least n NaN values, you can use the thresh parameter along with the axis parameter in the dropna() method.

The thresh parameter takes the minimum number of non-NaN values as its input argument. If any column in the dataframe has fewer non-NaN values than the number specified in the thresh parameter, the column is dropped from the dataframe after execution of the dropna() method.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
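lenDf=len(grades.index)  # total number of rows in the dataframe
n=5  # drop a column if it has at least n NaN values
count=lenDf-n+1  # minimum number of non-NaN values needed to keep a column
grades.dropna(axis=1,inplace=True,thresh=count)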
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN

In this example, we need to drop a column if it has at least n NaN values. A column is therefore kept only if it has at most n-1 NaN values, that is, at least (number of rows in the dataframe - n + 1) non-NaN values. Here, the dataframe has 12 rows and n is 5, so thresh is 12 - 5 + 1 = 8. The Marks column has only 7 non-NaN values, so it is dropped. The Grade column has 10 non-NaN values, so it is kept.
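
The arithmetic can be checked by counting the NaN values in each column. The following is a minimal sketch that rebuilds the Roll, Marks, and Grade columns from the input dataframe printed above instead of reading grade.csv. The boolean mask in the second route is an alternative added here for comparison and is not the method used in the example above.

```python
import numpy as np
import pandas as pd

# Roll, Marks, and Grade values copied from the input dataframe printed above.
grades = pd.DataFrame(
    {
        "Roll": [11, 12, 14, 15, 22, 15, 27, 23, 34, 15, 27, 23],
        "Marks": [85, np.nan, 75, np.nan, 73, 79, 55, np.nan, 88, np.nan, 55, np.nan],
        "Grade": ["A", "A", "B", np.nan, "B", "B", "C", "B", "A", "B", "C", np.nan],
    }
)

n = 5
nan_counts = grades.isna().sum()    # number of NaN values in each column
thresh = len(grades.index) - n + 1  # 12 - 5 + 1 = 8 non-NaN values required

# Route 1: dropna() with the derived thresh value.
via_thresh = grades.dropna(axis=1, thresh=thresh)

# Route 2: keep only the columns whose NaN count is below n.
via_mask = grades.loc[:, nan_counts < n]

print(int(nan_counts["Marks"]))   # 5
print(int(nan_counts["Grade"]))   # 2
print(list(via_thresh.columns))   # ['Roll', 'Grade']
print(list(via_mask.columns))     # ['Roll', 'Grade']
```

Both routes keep the same columns, because having fewer than n NaN values is the same condition as having at least (number of rows - n + 1) non-NaN values.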

Drop Columns From a Dataframe Using the pop() Method

The pop() method can be used to drop one column at a time from a dataframe. It has the following syntax.

DataFrame.pop(item)

The pop() method, when invoked on a dataframe, takes a column name as its input and drops the column from the original dataframe. It also returns the dropped column as output.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.pop("Marks")
print("The output dataframe is:")
print(grades)

Output:

The input dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris    NaN     A
2       1    14         Sam   75.0     B
3       1    15       Harry    NaN   NaN
4       2    22         Tom   73.0     B
5       2    15        Golu   79.0     B
6       2    27       Harsh   55.0     C
7       2    23       Clara    NaN     B
8       3    34         Amy   88.0     A
9       3    15    Prashant    NaN     B
10      3    27      Aditya   55.0     C
11      3    23  Radheshyam    NaN   NaN
The output dataframe is:
    Class  Roll        Name Grade
0       1    11      Aditya     A
1       1    12       Chris     A
2       1    14         Sam     B
3       1    15       Harry   NaN
4       2    22         Tom     B
5       2    15        Golu     B
6       2    27       Harsh     C
7       2    23       Clara     B
8       3    34         Amy     A
9       3    15    Prashant     B
10      3    27      Aditya     C
11      3    23  Radheshyam   NaN
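
The example above discards the value returned by pop(). The following is a minimal sketch, using a small made-up dataframe instead of grade.csv, that stores the returned column so it can still be used after it has been removed.

```python
import pandas as pd

# Illustrative data (made up for this sketch).
grades = pd.DataFrame(
    {
        "Name": ["Aditya", "Chris", "Sam"],
        "Marks": [85.0, 90.0, 75.0],
        "Grade": ["A", "A", "B"],
    }
)

# pop() removes the column from grades and returns it as a Series.
marks = grades.pop("Marks")

print(type(marks).__name__)  # Series
print(marks.tolist())        # [85.0, 90.0, 75.0]
print(list(grades.columns))  # ['Name', 'Grade']
```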

If the column is not present in the dataframe, the pop() method raises a KeyError exception. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is:")
print(grades)
grades.pop("Height")
print("The output dataframe is:")
print(grades)

Output:

KeyError: 'Height'

Here, the ‘Height’ column is not present in the dataframe. Hence, the pop() method raises the KeyError exception.
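
If the column may be missing, the KeyError can be avoided or handled. The following is a minimal sketch of three options on a small made-up dataframe. The variable names height, popped, and result are illustrative.

```python
import pandas as pd

# Illustrative data (made up for this sketch); there is no "Height" column.
grades = pd.DataFrame({"Name": ["Aditya", "Chris"], "Marks": [85.0, 90.0]})

# Option 1: check that the column exists before calling pop().
if "Height" in grades.columns:
    height = grades.pop("Height")
else:
    height = None

# Option 2: call pop() and handle the KeyError.
try:
    grades.pop("Height")
    popped = True
except KeyError:
    popped = False

# Option 3: use drop() with errors="ignore", which skips missing columns
# and returns a new dataframe.
result = grades.drop(columns=["Height"], errors="ignore")

print(height)                # None
print(popped)                # False
print(list(result.columns))  # ['Name', 'Marks']
```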

Conclusion

In this article, we have discussed different ways to drop columns from a pandas dataframe in Python. For this, we have used the drop() method, the dropna() method, and the pop() method.

To learn more about python programming, you can read this article on dictionary comprehension in python. You might like this article on list comprehension in python too.
