When working with dataframes in Python, we often need to select only part of the data. To do so, we select one or more columns, which may or may not be contiguous. I have already discussed how to select multiple columns in a pandas dataframe. This article discusses different ways to select specific columns in a pandas dataframe.
Select Specific Columns in Pandas Dataframe Using Column Names
To select specific columns from the pandas dataframe using the column names, you can pass a list of column names to the indexing operator as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df[["Maths", "Physics"]]
print(columns)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The columns are:
Maths Physics
0 100 80
1 80 100
2 90 80
3 100 100
4 90 90
5 80 70
In this example, we first converted a list of dictionaries to a dataframe using the DataFrame() function. Then, we selected the "Maths" and "Physics" columns from the dataframe using the list ["Maths", "Physics"].
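One detail of the indexing operator is worth seeing in isolation. The following is a minimal sketch on a smaller, made-up dataframe (the variable names are illustrative and do not come from the example above). It shows that a list of names gives back a dataframe even when the list holds a single name, while a bare name gives back a series.

```python
import pandas as pd

# Small illustrative dataframe; the values are made up for this sketch.
df = pd.DataFrame({"Roll": [1, 2, 3],
                   "Maths": [100, 80, 90],
                   "Physics": [80, 100, 80]})

# A list of names yields a DataFrame, even with only one name in the list.
one_column_frame = df[["Maths"]]
# A bare name (no inner list) yields a Series.
one_column_series = df["Maths"]

print(type(one_column_frame).__name__)
print(type(one_column_series).__name__)

# The order of the names in the list sets the column order of the result.
reordered = df[["Physics", "Maths"]]
print(list(reordered.columns))
```

If a name in the list is not a column of the dataframe, the indexing operator raises a KeyError, so it can help to check the names against df.columns first.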
Select Specific Columns in Pandas Dataframe Using the Column Positions
If you don’t know the column names and only have the positions of the columns, you can use the columns attribute of the pandas dataframe to select specific columns. For this, we will use the following steps.
- First, we will get a list of column names from the dataframe using the columns attribute.
- Then, we will extract the name of specific columns that we want to select. For this, we will use the list containing column names and list comprehension.
- After obtaining the list of specific column names, we can use it to select specific columns in the dataframe using the indexing operator.
You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
column_names=df.columns
required_indices=[0,2,3]
required_columns=[column_names[index] for index in required_indices]
print("The column names are:")
print(required_columns)
print("The columns are:")
columns=df[required_columns]
print(columns)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The column names are:
['Roll', 'Physics', 'Chemistry']
The columns are:
Roll Physics Chemistry
0 1 80 90
1 2 100 90
2 3 80 70
3 4 100 90
4 5 90 80
5 6 70 70
In this example, we had to select the columns at positions 0, 2, and 3. For this, we created a variable required_indices with the list [0, 2, 3] as its value. Then, we used list comprehension and the Python indexing operator to get the column names at the specified indices from the list of column names. We stored these column names in the required_columns variable. Finally, we used the indexing operator to select the specific columns from the dataframe.
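The list comprehension can also be skipped, because the object returned by the columns attribute accepts a list of positions directly. The following is a short sketch of that variant, not the approach used above. The dataframe is a reduced, made-up version of the example data, and the names required_positions and required_names are chosen only for illustration.

```python
import pandas as pd

# Reduced, made-up version of the example data.
df = pd.DataFrame({"Roll": [1, 2],
                   "Maths": [100, 80],
                   "Physics": [80, 100],
                   "Chemistry": [90, 90]})

required_positions = [0, 2, 3]

# Indexing df.columns with a list of positions returns the matching names.
required_names = df.columns[required_positions]
print(list(required_names))

# The resulting names can be passed to the indexing operator as before.
selected = df[required_names]
print(selected)
```

Both variants end with the same lookup by name. The difference is only in how the names are collected from the positions.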
Select Specific Columns in a Dataframe Using the iloc Attribute
The iloc attribute in a pandas dataframe is used to select rows or columns at given positions. The iloc attribute of a dataframe returns an _iLocIndexer object. We can use this _iLocIndexer object to select columns from the dataframe. To select columns at specific positions using the iloc object, we will use the following syntax.
df.iloc[start_row:end_row, list_of_column_positions]
Here,
- df is the input dataframe.
- The start_row variable contains the position of the first row that we want to include in the output.
- The end_row variable contains the position at which the selection stops. The row at this position is not included in the output.
- The list_of_column_positions variable contains the positions of the specific columns that we want to select from the dataframe.
As we want to select all the rows and specified columns, we will keep start_row and end_row empty. We will just pass the list containing the positions of specific columns as list_of_column_positions for selecting the columns from the dataframe, as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
list_of_column_positions=[0,2,3]
columns=df.iloc[:,list_of_column_positions]
print(columns)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The columns are:
Roll Physics Chemistry
0 1 80 90
1 2 100 90
2 3 80 70
3 4 100 90
4 5 90 80
5 6 70 70
In this example, we used the iloc attribute to select columns at positions 0, 2, and 3 in the dataframe.
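When start_row and end_row are filled in, iloc restricts the rows as well. The sketch below uses a smaller, made-up dataframe to show that the row slice stops before end_row and that negative positions count from the end. The names subset and last_column are illustrative.

```python
import pandas as pd

# Smaller, made-up dataframe with the same column layout.
df = pd.DataFrame({"Roll": [1, 2, 3, 4],
                   "Maths": [100, 80, 90, 100],
                   "Physics": [80, 100, 80, 100],
                   "Chemistry": [90, 90, 70, 90]})

# Rows at positions 1 and 2 (position 3 is excluded),
# columns at positions 0 and 2.
subset = df.iloc[1:3, [0, 2]]
print(subset)

# A negative position counts from the end: -1 is the last column.
last_column = df.iloc[:, [-1]]
print(list(last_column.columns))
```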
Select Specific Columns in a Dataframe Using the loc Attribute
The loc attribute in a pandas dataframe is used to select rows or columns by index label or column name, respectively. The loc attribute of a dataframe returns a _LocIndexer object. We can use this _LocIndexer object to select columns from the dataframe using the column names. To select specific columns using the loc object, we will use the following syntax.
df.loc[start_row_index:end_row_index, list_of_column_names]
Here,
- df is the input dataframe.
- The start_row_index variable contains the start index of the rows that we want to include in the output.
- The end_row_index variable contains the index of the last row that we want to include in the output.
- The list_of_column_names variable contains the names of the specific columns that we want to select from the dataframe.
As we want to select all the rows and specified columns, we will keep start_row_index and end_row_index empty. We will just pass the list of specific column names as list_of_column_names for selecting the columns from the dataframe, as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df.loc[:,["Maths", "Physics"]]
print(columns)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The columns are:
Maths Physics
0 100 80
1 80 100
2 90 80
3 100 100
4 90 90
5 80 70
In this example, we have selected specific columns from the dataframe using a list of column names and the loc attribute.
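Unlike position-based slicing, a loc slice includes its end point. The following sketch, on a smaller made-up dataframe with the default integer index, shows this for row labels and for a slice of column names. The names subset and name_slice are illustrative.

```python
import pandas as pd

# Smaller, made-up dataframe with the default integer index (0 to 3).
df = pd.DataFrame({"Roll": [1, 2, 3, 4],
                   "Maths": [100, 80, 90, 100],
                   "Physics": [80, 100, 80, 100],
                   "Chemistry": [90, 90, 70, 90]})

# Row labels 1 through 3; the row labelled 3 is included.
subset = df.loc[1:3, ["Maths", "Physics"]]
print(list(subset.index))

# loc also accepts a slice of column names; both end points are included.
name_slice = df.loc[:, "Maths":"Chemistry"]
print(list(name_slice.columns))
```

A slice of names only suits contiguous columns. For columns that are not next to each other, the list form shown in the example above is the one to use.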
Conclusion
In this article, we have discussed different ways to select specific columns in a pandas dataframe.
To learn more about Python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!