In the last article on dataframes in Python, we discussed how to iterate over rows in a pandas dataframe. This article will discuss different ways to select multiple columns in a pandas dataframe.
This article focuses on selecting contiguous columns from the dataframe. If you want to select columns at specific non-contiguous positions, you can read this article on how to select specific columns in a pandas dataframe.
Select Multiple Columns in the Pandas Dataframe Using Column Names
Pandas dataframes support selecting rows and columns using the indexing operator, much like a list in Python. To select a single column in a pandas dataframe, we can use the column name and the indexing operator as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df["Maths"]
print(columns)
Output:
The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
0    100
1     80
2     90
3    100
4     90
5     80
Name: Maths, dtype: int64
In the above example, we first converted a list of dictionaries to a dataframe using the DataFrame() function. Then, we selected the "Maths" column using the column name and the Python indexing operator.
To select multiple columns using the column names in pandas dataframe, you can pass a list of column names to the indexing operator as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df[["Maths", "Physics"]]
print(columns)
Output:
The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics
0    100       80
1     80      100
2     90       80
3    100      100
4     90       90
5     80       70
In this example, we have selected two columns from the dataframe using a list of column names and the indexing operator.
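Two details of list-based selection are easy to overlook: a single name returns a Series while a one-element list returns a dataframe, and the order of names in the list decides the order of columns in the result. The following is a minimal sketch on a shortened, two-row version of the table above; the variable names single_series, single_frame, and reordered are only illustrative.

```python
import pandas as pd

# Shortened, illustrative version of the marks table used above.
df = pd.DataFrame([
    {"Roll": 1, "Maths": 100, "Physics": 80, "Chemistry": 90},
    {"Roll": 2, "Maths": 80, "Physics": 100, "Chemistry": 90},
])

# A single name returns a Series; a one-element list returns a DataFrame.
single_series = df["Maths"]
single_frame = df[["Maths"]]
print(type(single_series).__name__)  # Series
print(type(single_frame).__name__)   # DataFrame

# The list controls both which columns come back and their order.
reordered = df[["Chemistry", "Roll"]]
print(list(reordered.columns))       # ['Chemistry', 'Roll']
```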
Using the columns Attribute
The columns attribute stores the column names in the pandas dataframe. If you don’t know the column names and want to select dataframe columns using their position, you can use the columns attribute and the indexing operator. For this, we will use the following steps.
- First, we will obtain the list of column names using the columns attribute in the data frame.
- Then, we get the name of the required columns using the positions of the columns and the indexing operator.
- Once we get the list of required columns, we will pass it to the indexing operator of the pandas dataframe. Thus, we will get the required columns as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
column_names=df.columns
reuired_columns=column_names[1:3]
print("The columns are:")
columns=df[reuired_columns]
print(columns)
Output:
The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics
0    100       80
1     80      100
2     90       80
3    100      100
4     90       90
5     80       70
In this example, we first stored the column names in the column_names variable using the columns attribute of the dataframe. Then, we used slicing to select the names of the columns at the required positions and stored them in the reuired_columns variable. Finally, we used these column names and the indexing operator to select the required columns.
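It may help to note that the columns attribute holds a pandas Index rather than a plain Python list, although it supports the same slicing rules. The sketch below, again on a shortened two-row table, shows a slice with an excluded end position and a slice with negative positions; the names picked, subset, and last_two are illustrative.

```python
import pandas as pd

# Shortened, illustrative version of the marks table used above.
df = pd.DataFrame([
    {"Roll": 1, "Maths": 100, "Physics": 80, "Chemistry": 90},
    {"Roll": 2, "Maths": 80, "Physics": 100, "Chemistry": 90},
])

column_names = df.columns          # a pandas Index, not a plain list
picked = column_names[1:3]         # positions 1 and 2; position 3 is excluded
print(picked.tolist())             # ['Maths', 'Physics']

subset = df[picked]
print(subset.shape)                # (2, 2)

# Negative positions count from the end, as with Python lists.
last_two = df[column_names[-2:]]
print(last_two.columns.tolist())   # ['Physics', 'Chemistry']
```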
Select Multiple Columns From Pandas Dataframe Using the iloc Attribute
The iloc attribute of a pandas dataframe returns an _iLocIndexer object. Using this _iLocIndexer object, we can select elements from a dataframe using their position.
To select multiple columns using the _iLocIndexer object, we will use the following syntax.
df.iloc[row_pos1:row_pos2,column_pos1:column_pos2]
Here,
- df is the input dataframe.
- The row_pos1 variable represents the position of the first row that we want to select from the dataframe.
- The row_pos2 variable represents the position at which the row selection ends. Keep in mind that the row at position row_pos2 is excluded from the output.
- The column_pos1 variable represents the position of the first column that we want to select from the dataframe.
- The column_pos2 variable represents the position at which the column selection ends. Again, the column at position column_pos2 is excluded from the output.
In the above syntax, we will keep row_pos1 and row_pos2 empty as we want to select all the rows of the dataframe. To select multiple columns, we will specify the positions of the columns in column_pos1 and column_pos2 as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df.iloc[:,1:3]
print(columns)
Output:
The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics
0    100       80
1     80      100
2     90       80
3    100      100
4     90       90
5     80       70
In this example, we have selected the columns at positions 1 and 2 using the iloc attribute of the pandas dataframe. The column at position 3 is excluded from the output.
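The same syntax also accepts row positions, and either end of a slice can be left out. As a rough sketch on a four-row version of the table (the names block and tail are illustrative), the following selects a block of rows and columns and then every column from position 2 onwards.

```python
import pandas as pd

# Four-row, illustrative version of the marks table used above.
df = pd.DataFrame({
    "Roll": [1, 2, 3, 4],
    "Maths": [100, 80, 90, 100],
    "Physics": [80, 100, 80, 100],
    "Chemistry": [90, 90, 70, 90],
})

# Rows at positions 1 and 2, columns at positions 1 and 2 (ends excluded).
block = df.iloc[1:3, 1:3]
print(block.shape)                 # (2, 2)
print(block.columns.tolist())      # ['Maths', 'Physics']

# Omitting the end position runs through to the last column.
tail = df.iloc[:, 2:]
print(tail.columns.tolist())       # ['Physics', 'Chemistry']
```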
Using the loc Attribute
If you want to select multiple columns in the pandas dataframe using the column names, you can use the loc attribute. The syntax for using the loc attribute is as follows.
df.loc[row_index1:row_index2,column_name1:column_name2]
Here,
- df is the input dataframe.
- The row_index1 variable represents the index of the first row that we want to select from the dataframe.
- The row_index2 variable represents the index of the last row that we want to select. Here, both indices are included in the output.
- The column_name1 variable represents the name of the first column that we want to select from the dataframe.
- The column_name2 variable represents the name of the last column that we want to select. Here too, both column names are included in the output.
In the above syntax, we will keep row_index1 and row_index2 empty as we want to select all the rows of the dataframe. To select multiple columns, we will specify the names of the columns in column_name1 and column_name2 as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The columns are:")
columns=df.loc[:,"Maths":"Chemistry"]
print(columns)
Output:
The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The columns are:
   Maths  Physics  Chemistry
0    100       80         90
1     80      100         90
2     90       80         70
3    100      100         90
4     90       90         80
5     80       70         70
In this example, we have selected multiple columns from the dataframe using the column names and the loc attribute. Here, you can observe that the program selects all the columns from the column "Maths" to the column "Chemistry". Hence, if we want to select contiguous columns using the column names, we can use this approach.
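Because loc includes both ends of a slice while iloc excludes the end position, similar-looking slices can return different shapes. The sketch below illustrates this on a four-row version of the table with the default integer index; the names by_label, by_position, start, stop, and same are illustrative, and get_loc() is used here only to translate a column name into its position.

```python
import pandas as pd

# Four-row, illustrative version of the marks table used above.
df = pd.DataFrame({
    "Roll": [1, 2, 3, 4],
    "Maths": [100, 80, 90, 100],
    "Physics": [80, 100, 80, 100],
    "Chemistry": [90, 90, 70, 90],
})

# loc is label-based and includes both ends of each slice.
by_label = df.loc[1:2, "Maths":"Chemistry"]
print(by_label.shape)       # (2, 3)

# iloc is position-based and excludes the end of each slice.
by_position = df.iloc[1:2, 1:3]
print(by_position.shape)    # (1, 2)

# Translating names to positions: add 1 to the end to keep it included.
start = df.columns.get_loc("Maths")
stop = df.columns.get_loc("Chemistry") + 1
same = df.iloc[:, start:stop].equals(df.loc[:, "Maths":"Chemistry"])
print(same)                 # True
```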
Conclusion
In this article, we have discussed different ways to select multiple columns in a pandas data frame.
To learn more about Python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!