Pandas dataframes are one of the most used data structures for data analysis and machine learning tasks in Python. In this article, we will discuss how to create and delete an index from a pandas dataframe. We will also discuss multilevel indexing in a pandas dataframe and how we can access elements from a dataframe using dataframe indices.
- What Is a Pandas Dataframe Index?
- Create an Index While Creating a Pandas Dataframe
- Create Dataframe Index While Loading a CSV File
- Create an Index After Creating a Pandas Dataframe
- Convert Column of a DataFrame into Index
- Change Index of a Pandas Dataframe
- Create Multilevel Index in a Pandas Dataframe
- Remove Index From a Pandas Dataframe
- Conclusion
What Is a Pandas Dataframe Index?
Just like a dataframe has column names, you can consider an index as a row label. When we create a dataframe without specifying an index, the rows are assigned indices from 0 to the number of rows minus one, as shown below.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
A B C
0 1 2 3
1 3 55 34
2 12 32 45
The index is:
[0, 1, 2]
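To make the default index described above a bit more concrete, here is a minimal sketch. The variable names (my_df, default_index, is_default) are illustrative and not part of the original example. It checks that the index pandas assigns by default is a RangeIndex covering 0 to the number of rows minus one.

```python
import pandas as pd

my_df = pd.DataFrame([[1, 2, 3], [3, 55, 34], [12, 32, 45]], columns=["A", "B", "C"])

# When no index is given, pandas assigns a RangeIndex (0, 1, ..., n-1).
default_index = my_df.index
print(type(default_index).__name__)
print(list(default_index))

# Compare against an explicitly built RangeIndex of the same length.
is_default = default_index.equals(pd.RangeIndex(len(my_df)))
print(is_default)
```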
Create an Index While Creating a Pandas Dataframe
You can also create custom indices while creating a dataframe. For this, you can use the index parameter of the DataFrame() function. The index parameter takes a list of values and assigns the values as indices of the rows in the dataframe. You can observe this in the following example.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"],index=[101,102,103])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
A B C
101 1 2 3
102 3 55 34
103 12 32 45
The index is:
[101, 102, 103]
In the above example, we have created the index of the dataframe using the list [101, 102, 103] and the index parameter of the DataFrame() function.
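Once a dataframe has custom index labels, those labels can be used to access rows. The following is a small sketch of that idea using the loc and iloc indexers. The variable names (row_by_label, row_by_position, cell) are assumptions made for illustration.

```python
import pandas as pd

my_df = pd.DataFrame(
    [[1, 2, 3], [3, 55, 34], [12, 32, 45]],
    columns=["A", "B", "C"],
    index=[101, 102, 103],
)

# loc selects by index label, iloc selects by integer position.
row_by_label = my_df.loc[102]      # row whose index label is 102
row_by_position = my_df.iloc[1]    # second row, whatever its label is
cell = my_df.loc[103, "B"]         # value at label 103 in column "B"

print(row_by_label)
print(cell)
```

Here, loc[102] and iloc[1] refer to the same row, because 102 is the label of the second row.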
The list passed to the index parameter must contain exactly as many elements as there are rows in the dataframe. Otherwise, the program will run into a ValueError exception as shown below.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"],index=[101,102,103,104])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
ValueError: Length of values (3) does not match length of index (4)
In the above example, we passed four elements in the list given to the index parameter. However, the dataframe has only three rows. Hence, the program runs into a ValueError exception.
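If the index labels come from user input or another data source, you may prefer to handle the length mismatch instead of letting the program stop. The following sketch wraps the call in a try-except block. The names rows, labels, and length_error are hypothetical, and the handling strategy (storing and printing the error) is only one possible choice.

```python
import pandas as pd

rows = [[1, 2, 3], [3, 55, 34], [12, 32, 45]]
labels = [101, 102, 103, 104]  # one label too many for three rows

try:
    my_df = pd.DataFrame(rows, columns=["A", "B", "C"], index=labels)
    length_error = None
except ValueError as err:
    # Keep the exception so that it can be reported or logged.
    my_df = None
    length_error = err

if length_error is not None:
    print("Could not build the dataframe:", length_error)
```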
Create Dataframe Index While Loading a CSV File
If you are creating a dataframe from a csv file and you want to use a column of the csv file as the dataframe index, you can use the index_col parameter in the read_csv() function.
The index_col parameter takes the name of the column as its input argument. After execution of the read_csv() function, the specified column is assigned as the index of the dataframe. You can observe this in the following example.
import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col="Class")
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Roll Name
Class
1 11 Aditya
1 12 Chris
1 13 Sam
2 1 Joel
2 22 Tom
2 44 Samantha
3 33 Tina
3 34 Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
You can also pass the position of a column instead of its name as an input argument to the index_col parameter. For instance, if you want to make the first column of the csv file the index of the dataframe, you can pass 0 to the index_col parameter in the read_csv() function as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=0)
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Roll Name
Class
1 11 Aditya
1 12 Chris
1 13 Sam
2 1 Joel
2 22 Tom
2 44 Samantha
3 33 Tina
3 34 Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
Here, the Class column is the first column in the csv file. Hence, it is converted into the index of the dataframe.
The index_col parameter also accepts a list of columns as its input. We have discussed this in the section on multilevel indexing in dataframes.
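The examples in this section read from samplefile.csv, which is not shown in the article. As a self-contained sketch, the snippet below assumes the file has the columns Class, Roll, and Name with the values reconstructed from the printed output, and reads them from an in-memory string using io.StringIO instead of a file on disk. The names csv_text, by_name, and by_position are illustrative.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the output above.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""

# Select the index column by name and by position.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Class")
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(by_name.index.name)
print(by_name.equals(by_position))
```

Both calls should produce the same dataframe, because Class is the first column in the assumed data.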
Create an Index After Creating a Pandas Dataframe
When a dataframe is created without a custom index, its rows are assigned indices from 0 to the number of rows minus one. However, we can create a custom index for an existing dataframe using the index attribute.
To create a custom index in a pandas dataframe, we will assign a list of index labels to the index attribute of the dataframe. After execution of the assignment statement, a new index is created for the dataframe as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf.index=[101,102,103,104,105,106,107,108]
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Class Roll Name
101 1 11 Aditya
102 1 12 Chris
103 1 13 Sam
104 2 1 Joel
105 2 22 Tom
106 2 44 Samantha
107 3 33 Tina
108 3 34 Amy
The index is:
[101, 102, 103, 104, 105, 106, 107, 108]
Here, you can see that we have assigned a list containing the numbers from 101 to 108 to the index attribute of the dataframe. Hence, the elements of the list are converted into indices of the rows in the dataframe.
Remember that the total number of index labels in the list should be equal to the number of rows in the dataframe. Otherwise, the program will run into a ValueError exception.
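One way to avoid the length mismatch mentioned above is to derive the labels from the number of rows instead of typing them by hand. The sketch below does this with range() and also shows that a list of the wrong length is rejected. The CSV contents are an assumption reconstructed from the printed output, and the names csv_text, new_labels, and mismatch_rejected are illustrative.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the output above.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""
my_df = pd.read_csv(io.StringIO(csv_text))

# Build the labels from the row count so the lengths always agree.
my_df.index = range(101, 101 + len(my_df))
new_labels = list(my_df.index)
print(new_labels)

# A list of the wrong length is rejected with a ValueError.
try:
    my_df.index = [1, 2, 3]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```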
Convert Column of a DataFrame into Index
We can also use a column as the index of the dataframe. For this, we can use the set_index() method. The set_index() method, when invoked on a dataframe, takes the column name as its input argument. After execution, it returns a new dataframe with the specified column as its index as shown in the following example.
import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Roll Name
Class
1 11 Aditya
1 12 Chris
1 13 Sam
2 1 Joel
2 22 Tom
2 44 Samantha
3 33 Tina
3 34 Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
In the above example, we have used the set_index() method to create the index from an existing column of the dataframe instead of a new sequence.
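By default, set_index() moves the chosen column out of the regular columns. If you also want to keep it as a normal column, the drop parameter of set_index() can be set to False. The sketch below illustrates this. The CSV contents are an assumption reconstructed from the printed output, and the names csv_text and indexed are illustrative.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the output above.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""
my_df = pd.read_csv(io.StringIO(csv_text))

# drop=False keeps "Class" as a column in addition to using it as the index.
indexed = my_df.set_index("Class", drop=False)
print(indexed)

# set_index() returns a new dataframe; my_df keeps its default index.
print(list(my_df.index))
```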
Change Index of a Pandas Dataframe
You can change the index column of a dataframe using the set_index() method. For this, you just need to pass the column name of the new index column as input to the set_index() method as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
print("The modified dataframe is:")
newDf=myDf.set_index("Roll")
print(newDf)
print("The index is:")
index=list(newDf.index)
print(index)
Output:
The dataframe is:
Roll Name
Class
1 11 Aditya
1 12 Chris
1 13 Sam
2 1 Joel
2 22 Tom
2 44 Samantha
3 33 Tina
3 34 Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
The modified dataframe is:
Name
Roll
11 Aditya
12 Chris
13 Sam
1 Joel
22 Tom
44 Samantha
33 Tina
34 Amy
The index is:
[11, 12, 13, 1, 22, 44, 33, 34]
If you want to assign a sequence as the new index to the dataframe, you can assign the sequence to the index attribute of the pandas dataframe as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
print("The modified dataframe is:")
myDf.index=[101, 102, 103, 104, 105, 106, 107, 108]
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Roll Name
Class
1 11 Aditya
1 12 Chris
1 13 Sam
2 1 Joel
2 22 Tom
2 44 Samantha
3 33 Tina
3 34 Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
The modified dataframe is:
Roll Name
101 11 Aditya
102 12 Chris
103 13 Sam
104 1 Joel
105 22 Tom
106 44 Samantha
107 33 Tina
108 34 Amy
The index is:
[101, 102, 103, 104, 105, 106, 107, 108]
When we change the index column of a dataframe, the existing index column is deleted from the dataframe. Therefore, you should first store the index column into a new column of the dataframe before changing the index column. Otherwise, you will lose data stored in the index column from your dataframe.
import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index("Class")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
print("The modified dataframe is:")
myDf["Class"]=myDf.index
myDf.index=[101, 102, 103, 104, 105, 106, 107, 108]
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Roll Name
Class
1 11 Aditya
1 12 Chris
1 13 Sam
2 1 Joel
2 22 Tom
2 44 Samantha
3 33 Tina
3 34 Amy
The index is:
[1, 1, 1, 2, 2, 2, 3, 3]
The modified dataframe is:
Roll Name Class
101 11 Aditya 1
102 12 Chris 1
103 13 Sam 1
104 1 Joel 2
105 22 Tom 2
106 44 Samantha 2
107 33 Tina 3
108 34 Amy 3
The index is:
[101, 102, 103, 104, 105, 106, 107, 108]
Here, you can observe that we have first stored the index into the Class column before changing the index of the dataframe. In the previous example, we hadn’t done that. Due to this, the data in the Class column was lost.
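As an alternative to copying the index into a column by hand, you can first call reset_index() to turn the current index back into a column and then call set_index() with the new column. This is a sketch of that approach, not the method used in the examples above. The CSV contents are an assumption reconstructed from the printed output, and the names csv_text and new_df are illustrative.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the output above.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""
my_df = pd.read_csv(io.StringIO(csv_text), index_col="Class")

# reset_index() restores "Class" as a column, then "Roll" becomes the index.
new_df = my_df.reset_index().set_index("Roll")
print(new_df)
```

With this approach, the Class values remain in the dataframe as a normal column after the index changes to Roll.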
Create Multilevel Index in a Pandas Dataframe
You can also create a multilevel index in a dataframe. Multilevel indices help you access hierarchical data such as census data that have different levels of abstraction. We can create multilevel indices while creating the dataframe as well as after creating the dataframe. This is discussed as follows.
Create a Multilevel Index While Creating a Dataframe
To create a multilevel index using different columns of a dataframe, you can use the index_col parameter in the read_csv() function. The index_col parameter takes a list of columns that have to be used as indices. The column names in the list are ordered from left to right, from the highest to the lowest level of the index. After execution of the read_csv() function, you will get a dataframe with a multilevel index as shown in the following example.
import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=["Class","Roll"])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Name
Class Roll
1 11 Aditya
12 Chris
13 Sam
2 1 Joel
22 Tom
44 Samantha
3 33 Tina
34 Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]
In the above example, the Class column contains the first level of the index and the Roll column contains the second level. To access a single row of the dataframe, you need to know its index label at both levels.
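To illustrate how rows can be accessed through a multilevel index, the following sketch uses the loc indexer with a tuple of labels, with a single first-level label, and the xs() method for a second-level label. The CSV contents are an assumption reconstructed from the printed output, and the names csv_text, name, class_two, and by_roll are illustrative.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the output above.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""
my_df = pd.read_csv(io.StringIO(csv_text), index_col=["Class", "Roll"])

# A tuple with one label per level identifies a single row.
name = my_df.loc[(1, 12), "Name"]

# A first-level label alone selects every row in that group.
class_two = my_df.loc[2]

# xs() selects rows by a label at a named level, here the second level.
by_roll = my_df.xs(22, level="Roll")

print(name)
print(class_two)
print(by_roll)
```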
Instead of the column names, you can also pass the positions of the columns in a list to the index_col parameter. For instance, you can assign the first and second columns of the csv file as the index of the dataframe as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=[0,1])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Name
Class Roll
1 11 Aditya
12 Chris
13 Sam
2 1 Joel
22 Tom
44 Samantha
3 33 Tina
34 Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]
Create a Multilevel Index After Creating a Dataframe
You can also create a multilevel index after creating a dataframe using the set_index() method. For this, you just need to pass a list of column names to the set_index() method. Again, the column names in the list given to the set_index() method are ordered from left to right, from the highest to the lowest level of the index, as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
myDf=myDf.set_index(["Class","Roll"])
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Name
Class Roll
1 11 Aditya
12 Chris
13 Sam
2 1 Joel
22 Tom
44 Samantha
3 33 Tina
34 Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]
You need to keep in mind that the set_index() method removes the existing index column. If you want to save the data stored in the index column, you should copy the data into another column before creating the new index.
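If the existing index should become part of the new multilevel index instead of being discarded, the append parameter of set_index() offers another option. The sketch below assumes a dataframe that already uses Class as its index and adds Roll as a second level. The CSV contents are an assumption reconstructed from the printed output, and the names csv_text and multi are illustrative.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the output above.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""
my_df = pd.read_csv(io.StringIO(csv_text), index_col="Class")

# append=True keeps the current index and adds "Roll" as a new lower level.
multi = my_df.set_index("Roll", append=True)
print(multi)
print(list(multi.index.names))
```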
Remove Index From a Pandas Dataframe
To remove the index from a pandas dataframe, you can use the reset_index() method. The reset_index() method, when invoked on a dataframe, returns a new dataframe with the default integer index, which runs from 0 to the number of rows minus one. If the existing index was created from one or more columns, those columns are converted back into normal columns as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=[0,1])
print("The dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
myDf=myDf.reset_index()
print("The modified dataframe is:")
print(myDf)
print("The index is:")
index=list(myDf.index)
print(index)
Output:
The dataframe is:
Name
Class Roll
1 11 Aditya
12 Chris
13 Sam
2 1 Joel
22 Tom
44 Samantha
3 33 Tina
34 Amy
The index is:
[(1, 11), (1, 12), (1, 13), (2, 1), (2, 22), (2, 44), (3, 33), (3, 34)]
The modified dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 2 1 Joel
4 2 22 Tom
5 2 44 Samantha
6 3 33 Tina
7 3 34 Amy
The index is:
[0, 1, 2, 3, 4, 5, 6, 7]
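The reset_index() method also has parameters for two related cases: drop=True discards the old index instead of converting it into columns, and level resets only the named level of a multilevel index. The sketch below shows both. The CSV contents are an assumption reconstructed from the printed output, and the names csv_text, dropped, and partial are illustrative.

```python
import io
import pandas as pd

# Assumed contents of samplefile.csv, reconstructed from the output above.
csv_text = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""
my_df = pd.read_csv(io.StringIO(csv_text), index_col=["Class", "Roll"])

# drop=True throws the old index away instead of restoring it as columns.
dropped = my_df.reset_index(drop=True)

# level="Roll" moves only that level back to a column; "Class" stays the index.
partial = my_df.reset_index(level="Roll")

print(dropped)
print(partial)
```

Note that drop=True loses the Class and Roll values, so it is suitable only when the old index carries no information you need.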
Conclusion
In this article, we have discussed how to create a pandas dataframe index. Additionally, we have also created multilevel indices and learned how to remove the index from a pandas dataframe. To learn more about Python programming, you can read this article on list comprehension in Python. If you are into machine learning, you can read this article on regular expressions in machine learning.
Stay tuned for more informative articles.
Happy Learning!