Pandas dataframes are used to handle tabular data in Python. Many times, we need to sort the dataframe based on a column. In this article, we will discuss different ways to sort a pandas dataframe in Python.
- The sort_values() Method
- Sort Rows of a Dataframe by a Column in Python
- Sort Rows of a Dataframe by Multiple Columns
- Sort Values in Descending Order in a Pandas DataFrame
- Sort Dataframe With NaN Values in Python
- Sort Columns of a Dataframe By a Row in Python
- The sort_index() Method
- Sort Pandas Dataframe by Index
- Sort Pandas Dataframe by Multiple Indices in Python
- Sort Pandas Dataframe by Index in Descending Order
- Conclusion
The sort_values() Method
The sort_values() method sorts a pandas dataframe by its values: either the rows by one or more columns, or the columns by one or more rows. It has the following syntax.
DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
Here,
- The by parameter takes a string or a list of strings as its input argument. The input depends on whether we want to sort the rows or the columns of a dataframe. To sort the rows of a dataframe based on a column, we can pass a column name or a list of column names to the by parameter. To sort the columns of a dataframe based on a row, we can pass the row index label or a list of row index labels to the by parameter.
- The axis parameter decides whether we sort the rows or the columns of the dataframe. To sort the rows of a dataframe based on a column or list of columns, we can pass the value 0 to the axis parameter, which is its default value. To sort the columns of a dataframe based on a row or multiple rows, we can pass the value 1 to the axis parameter.
- The ascending parameter decides whether the dataframe is sorted in ascending or descending order. By default, it is True, denoting that sorting occurs in ascending order. You can set it to False to sort the dataframe in descending order. If sorting is done by multiple columns, you can pass a list of True and False values, one per column, to choose ascending or descending order for each column.
- The inplace parameter decides whether we modify the original dataframe or create a new dataframe after sorting. By default, inplace is set to False. Hence, the sort_values() method doesn't modify the original dataframe and returns a new sorted dataframe. If you want to modify the original dataframe while sorting, you can set inplace to True.
- The kind parameter selects the sorting algorithm. By default, the sort_values() method uses the quicksort algorithm. If you know that the input data has a definite pattern and another algorithm can reduce the sorting time, you can use 'mergesort', 'heapsort', or 'stable' instead. Of these, 'mergesort' and 'stable' keep rows with equal values in their original relative order. According to the pandas documentation, this option is applied only when sorting on a single column or label.
- The na_position parameter decides the position of rows having NaN values. By default, it has the value 'last', denoting that the rows with NaN values are placed at the end of the sorted dataframe. You can set it to 'first' if you want to have rows with NaN values at the top of the sorted dataframe.
- The ignore_index parameter decides whether the index labels of the rows in the input dataframe are preserved in the sorted dataframe. By default, it is False, denoting that the index labels are preserved and move along with their rows. If you want to discard the index of the initial dataframe and label the sorted rows 0, 1, 2, and so on, you can set ignore_index to True.
- The key parameter is used to transform the values before sorting. It takes a vectorized function as its input argument. The function provided to the key parameter must take a pandas series as its input argument and return a pandas series of the same length. Before sorting, the function is applied independently to each column named in the by parameter.
After execution, the sort_values() method returns the sorted dataframe if the inplace parameter is set to False. If inplace is set to True, the sort_values() method returns None.
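The key parameter and the None return value described above are not exercised by the later examples, so here is a minimal sketch of both. The sample names and marks are made up for illustration and do not come from grade.csv.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate key and inplace.
df = pd.DataFrame({"Name": ["bob", "Alice", "carol", "Dave"],
                   "Marks": [70, 85, 60, 90]})

# Default string sorting is case-sensitive: uppercase letters sort first.
default_order = df.sort_values(by="Name")["Name"].tolist()

# key receives the "Name" column as a Series and must return a Series of the
# same length; here it lowercases the values before they are compared.
folded_order = df.sort_values(by="Name", key=lambda s: s.str.lower())["Name"].tolist()

# With inplace=True the dataframe itself is sorted and None is returned.
result = df.sort_values(by="Marks", inplace=True)

print(default_order)   # ['Alice', 'Dave', 'bob', 'carol']
print(folded_order)    # ['Alice', 'bob', 'carol', 'Dave']
print(result)          # None
```

Note that key changes only the values used for comparison; the dataframe still contains the original strings.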
Sort Rows of a Dataframe by a Column in Python
To sort a dataframe by a column, we will invoke the sort_values() method on the dataframe and pass the name of the column as the input argument to the by parameter. After execution, the sort_values() method will return the sorted dataframe. The examples in this section create the dataframe from a CSV file named grade.csv, whose contents are printed as the input dataframe.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
sorted_df=grades.sort_values(by="Marks")
print("The sorted dataframe is")
print(sorted_df)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
The sorted dataframe is
Class Roll Name Marks Grade
15 3 11 Bobby 50 D
4 1 15 Harry 55 C
8 2 27 Harsh 55 C
13 3 27 Aditya 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
9 2 23 Clara 78 B
12 3 15 Prashant 78 B
14 3 23 Radheshyam 78 B
7 2 15 Golu 79 B
10 3 33 Tina 82 A
0 1 11 Aditya 85 A
11 3 34 Amy 88 A
1 1 12 Chris 95 A
In the above example, we first read the CSV file into a dataframe using the read_csv() function. The read_csv() function takes the file name of the CSV file and returns a dataframe. After obtaining the dataframe, we sorted it by "Marks" using the sort_values() method.
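If you do not have grade.csv at hand, the same call can be tried on a dataframe built in memory. The sketch below copies five rows from the output above; constructing them with pd.DataFrame() is an assumption made only so that the snippet runs on its own.

```python
import pandas as pd

# Five rows copied from the printed input dataframe; building them in memory
# (instead of reading grade.csv) is an assumption for this sketch.
grades = pd.DataFrame({
    "Class": [1, 1, 1, 2, 3],
    "Roll": [11, 12, 14, 1, 11],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Bobby"],
    "Marks": [85, 95, 75, 68, 50],
    "Grade": ["A", "A", "B", "B", "D"],
})

sorted_df = grades.sort_values(by="Marks")

print(sorted_df)
# The original index labels travel with their rows.
print(sorted_df.index.tolist())   # [4, 3, 2, 0, 1]
# The input dataframe is left unchanged.
print(grades["Marks"].tolist())   # [85, 95, 75, 68, 50]
```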
Here, the sort_values() method returns a new sorted dataframe and leaves the original dataframe unchanged. If you want to sort the original dataframe, you can set the inplace parameter to True as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
The sorted dataframe is
Class Roll Name Marks Grade
15 3 11 Bobby 50 D
4 1 15 Harry 55 C
8 2 27 Harsh 55 C
13 3 27 Aditya 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
9 2 23 Clara 78 B
12 3 15 Prashant 78 B
14 3 23 Radheshyam 78 B
7 2 15 Golu 79 B
10 3 33 Tina 82 A
0 1 11 Aditya 85 A
11 3 34 Amy 88 A
1 1 12 Chris 95 A
You can observe that the original dataframe has been sorted after setting inplace to True.
In the above examples, the index labels move along with the rows, so the index of the sorted dataframe is no longer in order. This is not desired sometimes. To label the sorted rows 0, 1, 2, and so on, you can set the ignore_index parameter to True.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
The sorted dataframe is
Class Roll Name Marks Grade
0 3 11 Bobby 50 D
1 1 15 Harry 55 C
2 2 27 Harsh 55 C
3 3 27 Aditya 55 C
4 2 1 Joel 68 B
5 2 22 Tom 73 B
6 1 14 Sam 75 B
7 1 16 Aditya 78 B
8 2 23 Clara 78 B
9 3 15 Prashant 78 B
10 3 23 Radheshyam 78 B
11 2 15 Golu 79 B
12 3 33 Tina 82 A
13 1 11 Aditya 85 A
14 3 34 Amy 88 A
15 1 12 Chris 95 A
In the above example, you can observe that the rows of the sorted dataframe are labeled 0 to 15 in their new order. The original index labels are discarded instead of being shuffled along with the rows. This is because we have set the ignore_index parameter to True.
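As a rough illustration of what ignore_index=True does, the sketch below compares it with sorting first and then calling reset_index(drop=True). The three-row dataframe is a made-up sample, not the contents of grade.csv.

```python
import pandas as pd

# Hypothetical three-row sample for illustration.
df = pd.DataFrame({"Name": ["Sam", "Bobby", "Chris"], "Marks": [75, 50, 95]})

kept = df.sort_values(by="Marks")                            # labels follow the rows
renumbered = df.sort_values(by="Marks", ignore_index=True)   # labels become 0, 1, 2
via_reset = df.sort_values(by="Marks").reset_index(drop=True)

print(kept.index.tolist())           # [1, 0, 2]
print(renumbered.index.tolist())     # [0, 1, 2]
print(renumbered.equals(via_reset))  # True
```

Both approaches give the same dataframe here; ignore_index=True simply avoids the extra call.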
Sort Rows of a Dataframe by Multiple Columns
Instead of sorting the dataframe by just one column, we can also sort the rows of a dataframe by multiple columns.
To sort the rows of a pandas dataframe by multiple columns, you can pass the list of column names as the input argument to the by parameter. When we pass a list of column names, the rows are sorted by the first column in the list. Rows that have equal values in that column are then ordered by the second column in the list, and so on. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by=["Class","Marks"],inplace=True,ignore_index=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
The sorted dataframe is
Class Roll Name Marks Grade
0 1 15 Harry 55 C
1 1 14 Sam 75 B
2 1 16 Aditya 78 B
3 1 11 Aditya 85 A
4 1 12 Chris 95 A
5 2 27 Harsh 55 C
6 2 1 Joel 68 B
7 2 22 Tom 73 B
8 2 23 Clara 78 B
9 2 15 Golu 79 B
10 3 11 Bobby 50 D
11 3 27 Aditya 55 C
12 3 15 Prashant 78 B
13 3 23 Radheshyam 78 B
14 3 33 Tina 82 A
15 3 34 Amy 88 A
In the above example, we have sorted the dataframe by two columns, i.e. Class and Marks. For this, we have passed the list ["Class", "Marks"] to the by parameter in the sort_values() method.
Here, the dataframe is sorted by the order of the column names in the by parameter. First, the dataframe is sorted by the "Class" column. When two or more rows have the same values in the "Class" column, the rows are then sorted by the "Marks" column.
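One way to picture the primary and secondary keys described above is as two single-column sorts applied in reverse order with a stable algorithm. The following sketch, on a small made-up dataframe, checks that both routes give the same result; it is meant as an illustration of the ordering, not as a description of how pandas implements multi-column sorting.

```python
import pandas as pd

# Hypothetical sample with no duplicate (Class, Marks) pairs.
df = pd.DataFrame({
    "Class": [2, 1, 2, 1, 3],
    "Marks": [68, 95, 79, 55, 82],
})

# One call: Class is the primary key, Marks breaks ties within a class.
combined = df.sort_values(by=["Class", "Marks"])

# Equivalent chain: sort by the secondary key first, then by the primary key
# with a stable algorithm so the Marks order survives inside each class.
chained = (df.sort_values(by="Marks", kind="stable")
             .sort_values(by="Class", kind="stable"))

print(combined.index.tolist())   # [3, 1, 0, 2, 4]
print(combined.equals(chained))  # True
```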
Sort Values in Descending Order in a Pandas DataFrame
By default, the sort_values() method sorts the dataframe in ascending order. To sort the values in descending order, you can set the ascending parameter to False. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True,ascending=False)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
The sorted dataframe is
Class Roll Name Marks Grade
0 1 12 Chris 95 A
1 3 34 Amy 88 A
2 1 11 Aditya 85 A
3 3 33 Tina 82 A
4 2 15 Golu 79 B
5 1 16 Aditya 78 B
6 2 23 Clara 78 B
7 3 15 Prashant 78 B
8 3 23 Radheshyam 78 B
9 1 14 Sam 75 B
10 2 22 Tom 73 B
11 2 1 Joel 68 B
12 1 15 Harry 55 C
13 2 27 Harsh 55 C
14 3 27 Aditya 55 C
15 3 11 Bobby 50 D
In this example, we have set the ascending parameter to False. Due to this, the rows of the dataframe are sorted by Marks in descending order.
We can also sort the dataframe in descending order if we are sorting it by multiple columns as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by=["Class","Marks"],inplace=True,ignore_index=True,ascending=False)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
The sorted dataframe is
Class Roll Name Marks Grade
0 3 34 Amy 88 A
1 3 33 Tina 82 A
2 3 15 Prashant 78 B
3 3 23 Radheshyam 78 B
4 3 27 Aditya 55 C
5 3 11 Bobby 50 D
6 2 15 Golu 79 B
7 2 23 Clara 78 B
8 2 22 Tom 73 B
9 2 1 Joel 68 B
10 2 27 Harsh 55 C
11 1 12 Chris 95 A
12 1 11 Aditya 85 A
13 1 16 Aditya 78 B
14 1 14 Sam 75 B
15 1 15 Harry 55 C
In the above example, the dataframe is first sorted by the Class column in descending order. If the rows have the same values for the Class column, such rows are sorted by Marks in descending order.
While sorting a dataframe by multiple columns, you can pass a list of True and False values to the ascending parameter. This helps us sort the dataframe by one column in ascending order and by another column in descending order. For instance, consider the following example.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by=["Class","Marks"],inplace=True,ignore_index=True,ascending=[True,False])
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
The sorted dataframe is
Class Roll Name Marks Grade
0 1 12 Chris 95 A
1 1 11 Aditya 85 A
2 1 16 Aditya 78 B
3 1 14 Sam 75 B
4 1 15 Harry 55 C
5 2 15 Golu 79 B
6 2 23 Clara 78 B
7 2 22 Tom 73 B
8 2 1 Joel 68 B
9 2 27 Harsh 55 C
10 3 34 Amy 88 A
11 3 33 Tina 82 A
12 3 15 Prashant 78 B
13 3 23 Radheshyam 78 B
14 3 27 Aditya 55 C
15 3 11 Bobby 50 D
In this example, we have sorted the dataframe by the Class and Marks columns. In the ascending parameter, we have given the list [True, False]. Due to this, the dataframe is first sorted by the Class column in ascending order. If the rows have the same values for the Class column, such rows are sorted by Marks in descending order.
Sort Dataframe With NaN Values in Python
In pandas, NaN is a floating-point value, which is why the Marks column is displayed with decimal points in the output below. When we sort the rows of a dataframe containing NaN values using the sort_values() method, the rows with NaN values are placed at the bottom of the dataframe by default, as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris 95.0 A
2 1 14 Sam 75.0 B
3 1 16 Aditya 78.0 B
4 1 15 Harry NaN C
5 2 1 Joel 68.0 B
6 2 22 Tom 73.0 B
7 2 15 Golu 79.0 B
8 2 27 Harsh 55.0 C
9 2 23 Clara NaN B
10 3 33 Tina 82.0 A
11 3 34 Amy 88.0 A
12 3 15 Prashant NaN B
13 3 27 Aditya 55.0 C
14 3 23 Radheshyam 78.0 B
15 3 11 Bobby 50.0 D
The sorted dataframe is
Class Roll Name Marks Grade
0 3 11 Bobby 50.0 D
1 2 27 Harsh 55.0 C
2 3 27 Aditya 55.0 C
3 2 1 Joel 68.0 B
4 2 22 Tom 73.0 B
5 1 14 Sam 75.0 B
6 1 16 Aditya 78.0 B
7 3 23 Radheshyam 78.0 B
8 2 15 Golu 79.0 B
9 3 33 Tina 82.0 A
10 1 11 Aditya 85.0 A
11 3 34 Amy 88.0 A
12 1 12 Chris 95.0 A
13 1 15 Harry NaN C
14 2 23 Clara NaN B
15 3 15 Prashant NaN B
In this example, you can observe that the Marks column contains some NaN values. When we sort the dataframe by the Marks column, the rows with NaN values in the Marks column are placed at the bottom of the sorted dataframe.
If you want to place the rows with NaN values at the top of the dataframe, you can set the na_position parameter to "first" in the sort_values() method as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True,na_position="first")
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Roll Name Marks Grade
0 1 11 Aditya 85.0 A
1 1 12 Chris 95.0 A
2 1 14 Sam 75.0 B
3 1 16 Aditya 78.0 B
4 1 15 Harry NaN C
5 2 1 Joel 68.0 B
6 2 22 Tom 73.0 B
7 2 15 Golu 79.0 B
8 2 27 Harsh 55.0 C
9 2 23 Clara NaN B
10 3 33 Tina 82.0 A
11 3 34 Amy 88.0 A
12 3 15 Prashant NaN B
13 3 27 Aditya 55.0 C
14 3 23 Radheshyam 78.0 B
15 3 11 Bobby 50.0 D
The sorted dataframe is
Class Roll Name Marks Grade
0 1 15 Harry NaN C
1 2 23 Clara NaN B
2 3 15 Prashant NaN B
3 3 11 Bobby 50.0 D
4 2 27 Harsh 55.0 C
5 3 27 Aditya 55.0 C
6 2 1 Joel 68.0 B
7 2 22 Tom 73.0 B
8 1 14 Sam 75.0 B
9 1 16 Aditya 78.0 B
10 3 23 Radheshyam 78.0 B
11 2 15 Golu 79.0 B
12 3 33 Tina 82.0 A
13 1 11 Aditya 85.0 A
14 3 34 Amy 88.0 A
15 1 12 Chris 95.0 A
In the above example, we have set the parameter na_position to "first". Due to this, the rows in which the Marks column has a NaN value are placed at the top of the sorted dataframe.
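A related point that is easy to miss is that the position of NaN rows is controlled by na_position alone and not by the sort direction. The sketch below, using a small made-up dataframe, sorts in descending order with both settings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing mark.
df = pd.DataFrame({"Name": ["Harry", "Sam", "Bobby", "Chris"],
                   "Marks": [np.nan, 75.0, 50.0, 95.0]})

# Descending order does not move NaN to the top: na_position still decides.
desc_last = df.sort_values(by="Marks", ascending=False)["Name"].tolist()
desc_first = df.sort_values(by="Marks", ascending=False,
                            na_position="first")["Name"].tolist()

print(desc_last)    # ['Chris', 'Sam', 'Bobby', 'Harry']
print(desc_first)   # ['Harry', 'Chris', 'Sam', 'Bobby']
```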
Sort Columns of a Dataframe By a Row in Python
We can also sort the columns of a dataframe based on the values in a row. We can use the axis parameter in the sort_values() method for this.
To sort the columns of a dataframe by a row, we will pass the index label of the row as an input argument to the by parameter. Additionally, we will set the axis parameter to 1 in the sort_values() method. After execution, the sort_values() method will return a dataframe with columns sorted by the given row. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("StudentMarks.csv",index_col="Student")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Aditya",axis=1,inplace=True,ignore_index=True,na_position="first")
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Physics Chemistry Math Biology Arts
Student
Aditya 92 76 95 73 91
Chris 95 96 79 71 93
Sam 65 62 75 95 63
Harry 68 92 69 66 98
Golu 74 95 96 76 64
Joel 99 79 77 91 61
Tom 72 94 61 65 69
Harsh 98 99 93 95 91
Clara 93 67 78 79 71
Tina 99 76 78 94 95
The sorted dataframe is
0 1 2 3 4
Student
Aditya 73 76 91 92 95
Chris 71 96 93 95 79
Sam 95 62 63 65 75
Harry 66 92 98 68 69
Golu 76 95 64 74 96
Joel 91 79 61 99 77
Tom 65 94 69 72 61
Harsh 95 99 91 98 93
Clara 79 67 71 93 78
Tina 94 76 95 99 78
In the above example, we have sorted the columns of the dataframe based on the row with the index "Aditya". For this, we have set the axis parameter to 1 and passed the index name to the by parameter of the sort_values() method.
In the output dataframe above, you can observe that the column names have been replaced by the numbers 0 to 4. This is because we have set the ignore_index parameter to True.
If you want to preserve the column names, you can either remove the ignore_index parameter or set it to False as shown below.
import pandas as pd
grades=pd.read_csv("StudentMarks.csv",index_col="Student")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Aditya",axis=1,inplace=True,na_position="first")
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Physics Chemistry Math Biology Arts
Student
Aditya 92 76 95 73 91
Chris 95 96 79 71 93
Sam 65 62 75 95 63
Harry 68 92 69 66 98
Golu 74 95 96 76 64
Joel 99 79 77 91 61
Tom 72 94 61 65 69
Harsh 98 99 93 95 91
Clara 93 67 78 79 71
Tina 99 76 78 94 95
The sorted dataframe is
Biology Chemistry Arts Physics Math
Student
Aditya 73 76 91 92 95
Chris 71 96 93 95 79
Sam 95 62 63 65 75
Harry 66 92 98 68 69
Golu 76 95 64 74 96
Joel 91 79 61 99 77
Tom 65 94 69 72 61
Harsh 95 99 91 98 93
Clara 79 67 71 93 78
Tina 94 76 95 99 78
In this example, you can observe that we have retained the column names of the dataframe. This is because we have removed the ignore_index parameter, so it takes its default value False.
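The column-wise sort can also be tried without StudentMarks.csv. The sketch below rebuilds the first two rows and four of the columns from the output above in memory (an assumption made so the snippet is self-contained) and sorts the columns by each row in turn.

```python
import pandas as pd

# Two rows and four columns copied from the printed input dataframe.
marks = pd.DataFrame(
    {"Physics": [92, 95], "Chemistry": [76, 96],
     "Math": [95, 79], "Biology": [73, 71]},
    index=["Aditya", "Chris"],
)

# Order the columns by the values in the "Aditya" row, smallest first.
by_aditya = marks.sort_values(by="Aditya", axis=1)

# Order the columns by the values in the "Chris" row, largest first.
by_chris_desc = marks.sort_values(by="Chris", axis=1, ascending=False)

print(by_aditya.columns.tolist())      # ['Biology', 'Chemistry', 'Physics', 'Math']
print(by_chris_desc.columns.tolist())  # ['Chemistry', 'Physics', 'Math', 'Biology']
```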
The sort_index() Method
The sort_index() method is used to sort a pandas dataframe by its index labels. It has the following syntax.
DataFrame.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
- The axis parameter decides which axis is sorted. With the default value 0, the rows of the dataframe are sorted by their row index labels. With the value 1, the columns of the dataframe are sorted by their column labels.
- The level parameter decides the index level by which the dataframe is to be sorted. It has the default value None, denoting that sorting happens by all the index levels. If you want to sort the dataframe by a specific index level, you can pass the index level or index name to the level parameter. To sort the dataframe by multiple indices, you can give a list of index names or index levels to the level parameter.
- The ascending parameter determines whether the dataframe is sorted in ascending or descending order. By default, it is True, denoting that sorting occurs in ascending order. You can set it to False to sort the dataframe in descending order. For dataframes having multilevel indices, you can pass a list of True and False values to decide which level is sorted in ascending order and which level in descending order.
- The inplace parameter decides whether we modify the original dataframe or create a new dataframe after sorting. By default, inplace is set to False. Hence, the sort_index() method doesn't modify the original dataframe and returns a new sorted dataframe. If you want to modify the original dataframe while sorting, you can set inplace to True.
- The kind parameter selects the sorting algorithm. By default, the sort_index() method uses the quicksort algorithm. If you know that the input data has a definite pattern and another algorithm can reduce the sorting time, you can use 'mergesort', 'heapsort', or 'stable' instead.
- The na_position parameter decides the position of rows having NaN values in the index. By default, it has the value 'last', denoting that the rows with NaN values are placed at the end of the sorted dataframe. You can set it to 'first' if you want to have rows with NaN values at the top of the sorted dataframe.
- The sort_remaining parameter is used for dataframes having multilevel indices. By default, it is True, denoting that after sorting by the levels given in the level parameter, the dataframe is also sorted by the remaining levels. If you don't want to sort the dataframe by the remaining levels, you can set sort_remaining to False.
- The ignore_index parameter decides whether the sorted axis keeps its labels. By default, it is False. If you set it to True, the sorted axis is labeled 0, 1, 2, and so on.
- The key parameter is used to transform the index before sorting. It takes a vectorized function as its input argument. The function provided to the key parameter must take an Index object as its input argument and return an Index object of the same length. For a multilevel index, the function is applied to each level independently.
After execution, the sort_index() method returns the sorted dataframe if the inplace parameter is set to False. If inplace is set to True, the sort_index() method returns None.
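The examples that follow sort rows by the row index only, so here is a minimal sketch of the axis and key parameters of sort_index(). The row labels and marks are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with mixed-case row labels.
df = pd.DataFrame(
    {"Physics": [92, 95, 65], "Chemistry": [76, 96, 62], "Math": [95, 79, 75]},
    index=["bob", "Alice", "Carol"],
)

# axis=1 orders the columns by their labels, not by any cell values.
by_columns = df.sort_index(axis=1)

# Default row sorting is case-sensitive: uppercase labels come first.
by_rows = df.sort_index()

# key receives the Index and returns an Index; lowercasing the labels makes
# the comparison case-insensitive.
by_rows_folded = df.sort_index(key=lambda idx: idx.str.lower())

print(by_columns.columns.tolist())    # ['Chemistry', 'Math', 'Physics']
print(by_rows.index.tolist())         # ['Alice', 'Carol', 'bob']
print(by_rows_folded.index.tolist())  # ['Alice', 'bob', 'Carol']
```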
Sort Pandas Dataframe by Index
To sort a pandas dataframe by index, you can use the sort_index() method on the dataframe. For this, we first need to create a dataframe with an index. Then, we can invoke the sort_index() method on the dataframe. After execution, the sort_index() method returns a sorted dataframe. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col="Roll")
print("The input dataframe is")
print(grades)
sorted_df=grades.sort_index()
print("The sorted dataframe is")
print(sorted_df)
Output:
The input dataframe is
Class Name Marks Grade
Roll
11 1 Aditya 85.0 A
12 1 Chris 95.0 A
14 1 Sam 75.0 B
16 1 Aditya 78.0 B
15 1 Harry NaN C
1 2 Joel 68.0 B
22 2 Tom 73.0 B
15 2 Golu 79.0 B
27 2 Harsh 55.0 C
23 2 Clara NaN B
33 3 Tina 82.0 A
34 3 Amy 88.0 A
15 3 Prashant NaN B
27 3 Aditya 55.0 C
23 3 Radheshyam 78.0 B
11 3 Bobby 50.0 D
The sorted dataframe is
Class Name Marks Grade
Roll
1 2 Joel 68.0 B
11 1 Aditya 85.0 A
11 3 Bobby 50.0 D
12 1 Chris 95.0 A
14 1 Sam 75.0 B
15 1 Harry NaN C
15 2 Golu 79.0 B
15 3 Prashant NaN B
16 1 Aditya 78.0 B
22 2 Tom 73.0 B
23 2 Clara NaN B
23 3 Radheshyam 78.0 B
27 2 Harsh 55.0 C
27 3 Aditya 55.0 C
33 3 Tina 82.0 A
34 3 Amy 88.0 A
In the above example, we first read a CSV file using the read_csv() function. In the read_csv() function, we have used the index_col parameter to specify that the "Roll" column should be used as the index of the dataframe. When we invoke the sort_index() method on the dataframe returned by the read_csv() function, it returns a dataframe sorted by the index column.
In the above example, the original dataframe isn't modified. If you want to modify the original dataframe, you can set the inplace parameter to True in the sort_index() method. After execution, the original dataframe will be modified. You can observe this in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col="Roll")
print("The input dataframe is")
print(grades)
grades.sort_index(inplace=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Class Name Marks Grade
Roll
11 1 Aditya 85.0 A
12 1 Chris 95.0 A
14 1 Sam 75.0 B
16 1 Aditya 78.0 B
15 1 Harry NaN C
1 2 Joel 68.0 B
22 2 Tom 73.0 B
15 2 Golu 79.0 B
27 2 Harsh 55.0 C
23 2 Clara NaN B
33 3 Tina 82.0 A
34 3 Amy 88.0 A
15 3 Prashant NaN B
27 3 Aditya 55.0 C
23 3 Radheshyam 78.0 B
11 3 Bobby 50.0 D
The sorted dataframe is
Class Name Marks Grade
Roll
1 2 Joel 68.0 B
11 1 Aditya 85.0 A
11 3 Bobby 50.0 D
12 1 Chris 95.0 A
14 1 Sam 75.0 B
15 1 Harry NaN C
15 2 Golu 79.0 B
15 3 Prashant NaN B
16 1 Aditya 78.0 B
22 2 Tom 73.0 B
23 2 Clara NaN B
23 3 Radheshyam 78.0 B
27 2 Harsh 55.0 C
27 3 Aditya 55.0 C
33 3 Tina 82.0 A
34 3 Amy 88.0 A
In this example, you can observe that the original dataframe has been sorted. This is because we have set the inplace parameter to True in the sort_index() method.
If you have multilevel indices in your dataframe and you want to sort the dataframe by a specific index, you can pass the index level to the level parameter in the sort_index() method.
In the following example, both the Class and Roll columns have been used as indexes. The Class column is used as the primary index while the Roll column is used as the secondary index. To sort the dataframe only by the Roll column, we will use the level parameter and set it to 1. In this way, the input dataframe will be sorted by the Roll column.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=1,inplace=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
16 Aditya 78.0 B
15 Harry NaN C
2 1 Joel 68.0 B
22 Tom 73.0 B
15 Golu 79.0 B
27 Harsh 55.0 C
23 Clara NaN B
3 33 Tina 82.0 A
34 Amy 88.0 A
15 Prashant NaN B
27 Aditya 55.0 C
23 Radheshyam 78.0 B
11 Bobby 50.0 D
The sorted dataframe is
Name Marks Grade
Class Roll
2 1 Joel 68.0 B
1 11 Aditya 85.0 A
3 11 Bobby 50.0 D
1 12 Chris 95.0 A
14 Sam 75.0 B
15 Harry NaN C
2 15 Golu 79.0 B
3 15 Prashant NaN B
1 16 Aditya 78.0 B
2 22 Tom 73.0 B
23 Clara NaN B
3 23 Radheshyam 78.0 B
2 27 Harsh 55.0 C
3 27 Aditya 55.0 C
33 Tina 82.0 A
34 Amy 88.0 A
In the above example, we have used the index level as an input argument to the level parameter. Alternatively, you can also pass the name of the index level to the level parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level="Roll",inplace=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
16 Aditya 78.0 B
15 Harry NaN C
2 1 Joel 68.0 B
22 Tom 73.0 B
15 Golu 79.0 B
27 Harsh 55.0 C
23 Clara NaN B
3 33 Tina 82.0 A
34 Amy 88.0 A
15 Prashant NaN B
27 Aditya 55.0 C
23 Radheshyam 78.0 B
11 Bobby 50.0 D
The sorted dataframe is
Name Marks Grade
Class Roll
2 1 Joel 68.0 B
1 11 Aditya 85.0 A
3 11 Bobby 50.0 D
1 12 Chris 95.0 A
14 Sam 75.0 B
15 Harry NaN C
2 15 Golu 79.0 B
3 15 Prashant NaN B
1 16 Aditya 78.0 B
2 22 Tom 73.0 B
23 Clara NaN B
3 23 Radheshyam 78.0 B
2 27 Harsh 55.0 C
3 27 Aditya 55.0 C
33 Tina 82.0 A
34 Amy 88.0 A
In the above example, we use the parameter level="Roll" instead of level=1 to sort the input dataframe. In both cases, the output will be the same.
After sorting by the specified index, the sort_index() method also sorts the dataframe by the remaining indices. To stop that, you can set the sort_remaining parameter to False as shown in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level="Roll",inplace=True,sort_remaining=False)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
16 Aditya 78.0 B
15 Harry NaN C
2 1 Joel 68.0 B
22 Tom 73.0 B
15 Golu 79.0 B
27 Harsh 55.0 C
23 Clara NaN B
3 33 Tina 82.0 A
34 Amy 88.0 A
15 Prashant NaN B
27 Aditya 55.0 C
23 Radheshyam 78.0 B
11 Bobby 50.0 D
The sorted dataframe is
Name Marks Grade
Class Roll
2 1 Joel 68.0 B
1 11 Aditya 85.0 A
3 11 Bobby 50.0 D
1 12 Chris 95.0 A
14 Sam 75.0 B
15 Harry NaN C
2 15 Golu 79.0 B
3 15 Prashant NaN B
1 16 Aditya 78.0 B
2 22 Tom 73.0 B
23 Clara NaN B
3 23 Radheshyam 78.0 B
2 27 Harsh 55.0 C
3 27 Aditya 55.0 C
33 Tina 82.0 A
34 Amy 88.0 A
In the above example, several rows have the same value in the "Roll" level. If the sort_remaining parameter is left at its default value True, the sort_index() method orders such rows by the Class level. To stop the sort_index() method from doing so, we have set the sort_remaining parameter to False. The output above matches the previous example because, in this input, rows with equal "Roll" values already appear in ascending order of Class.
Sort Pandas Dataframe by Multiple Indices in Python
To sort a pandas dataframe by multiple indices, you can pass the list of index levels to the level parameter of the sort_index() method as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=[0,1],inplace=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
16 Aditya 78.0 B
15 Harry NaN C
2 1 Joel 68.0 B
22 Tom 73.0 B
15 Golu 79.0 B
27 Harsh 55.0 C
23 Clara NaN B
3 33 Tina 82.0 A
34 Amy 88.0 A
15 Prashant NaN B
27 Aditya 55.0 C
23 Radheshyam 78.0 B
11 Bobby 50.0 D
The sorted dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
15 Harry NaN C
16 Aditya 78.0 B
2 1 Joel 68.0 B
15 Golu 79.0 B
22 Tom 73.0 B
23 Clara NaN B
27 Harsh 55.0 C
3 11 Bobby 50.0 D
15 Prashant NaN B
23 Radheshyam 78.0 B
27 Aditya 55.0 C
33 Tina 82.0 A
34 Amy 88.0 A
In the above example, we have the Class and Roll columns as indices. When we pass level=[0,1], the sort_index() method first sorts the input dataframe by the Class column. If two rows have the same value for the Class column, it sorts them according to the Roll column.
Instead of index levels, you can also pass the names of the index levels to the level parameter as shown in the following example.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=["Class","Roll"],inplace=True)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
16 Aditya 78.0 B
15 Harry NaN C
2 1 Joel 68.0 B
22 Tom 73.0 B
15 Golu 79.0 B
27 Harsh 55.0 C
23 Clara NaN B
3 33 Tina 82.0 A
34 Amy 88.0 A
15 Prashant NaN B
27 Aditya 55.0 C
23 Radheshyam 78.0 B
11 Bobby 50.0 D
The sorted dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
15 Harry NaN C
16 Aditya 78.0 B
2 1 Joel 68.0 B
15 Golu 79.0 B
22 Tom 73.0 B
23 Clara NaN B
27 Harsh 55.0 C
3 11 Bobby 50.0 D
15 Prashant NaN B
23 Radheshyam 78.0 B
27 Aditya 55.0 C
33 Tina 82.0 A
34 Amy 88.0 A
In the above example, we use the parameter level=["Class", "Roll"] instead of level=[0, 1] to sort the input dataframe. In both cases, the output will be the same.
Sort Pandas Dataframe by Index in Descending Order
To sort a dataframe by index in descending order, you can set the ascending parameter in the sort_index() method to False as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level="Roll",inplace=True,ascending=False)
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
16 Aditya 78.0 B
15 Harry NaN C
2 1 Joel 68.0 B
22 Tom 73.0 B
15 Golu 79.0 B
27 Harsh 55.0 C
23 Clara NaN B
3 33 Tina 82.0 A
34 Amy 88.0 A
15 Prashant NaN B
27 Aditya 55.0 C
23 Radheshyam 78.0 B
11 Bobby 50.0 D
The sorted dataframe is
Name Marks Grade
Class Roll
3 34 Amy 88.0 A
33 Tina 82.0 A
27 Aditya 55.0 C
2 27 Harsh 55.0 C
3 23 Radheshyam 78.0 B
2 23 Clara NaN B
22 Tom 73.0 B
1 16 Aditya 78.0 B
3 15 Prashant NaN B
2 15 Golu 79.0 B
1 15 Harry NaN C
14 Sam 75.0 B
12 Chris 95.0 A
3 11 Bobby 50.0 D
1 11 Aditya 85.0 A
2 1 Joel 68.0 B
In the above example, the sort_index() method sorts the input dataframe by the Roll level in descending order. Rows with the same Roll value are then ordered by the Class level, also in descending order.
While sorting a dataframe by multiple indices, you can pass a list of True and False values to the ascending parameter as shown below.
import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=["Class","Roll"],inplace=True,ascending=[False,True])
print("The sorted dataframe is")
print(grades)
Output:
The input dataframe is
Name Marks Grade
Class Roll
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
16 Aditya 78.0 B
15 Harry NaN C
2 1 Joel 68.0 B
22 Tom 73.0 B
15 Golu 79.0 B
27 Harsh 55.0 C
23 Clara NaN B
3 33 Tina 82.0 A
34 Amy 88.0 A
15 Prashant NaN B
27 Aditya 55.0 C
23 Radheshyam 78.0 B
11 Bobby 50.0 D
The sorted dataframe is
Name Marks Grade
Class Roll
3 11 Bobby 50.0 D
15 Prashant NaN B
23 Radheshyam 78.0 B
27 Aditya 55.0 C
33 Tina 82.0 A
34 Amy 88.0 A
2 1 Joel 68.0 B
15 Golu 79.0 B
22 Tom 73.0 B
23 Clara NaN B
27 Harsh 55.0 C
1 11 Aditya 85.0 A
12 Chris 95.0 A
14 Sam 75.0 B
15 Harry NaN C
16 Aditya 78.0 B
In this example, we have sorted the dataframe by the Class and Roll index levels. In the ascending parameter, we have given the list [False, True]. Due to this, the dataframe is first sorted by the Class level in descending order. If the rows have the same values for the Class level, such rows are sorted by Roll in ascending order.
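The same mixed ordering can be reproduced without grade.csv. The following minimal sketch builds a small two-level index by hand with pd.MultiIndex.from_tuples(); the five (Class, Roll) pairs are picked from the data above, and constructing the index this way is an assumption made for illustration.

```python
import pandas as pd

# Five (Class, Roll) pairs taken from the example data.
index = pd.MultiIndex.from_tuples(
    [(1, 12), (2, 1), (1, 11), (3, 15), (2, 22)],
    names=["Class", "Roll"],
)
df = pd.DataFrame({"Name": ["Chris", "Joel", "Aditya", "Prashant", "Tom"]},
                  index=index)

# Class in descending order, Roll in ascending order inside each class.
mixed = df.sort_index(level=["Class", "Roll"], ascending=[False, True])

print(mixed.index.tolist())
# [(3, 15), (2, 1), (2, 22), (1, 11), (1, 12)]
print(mixed["Name"].tolist())
# ['Prashant', 'Joel', 'Tom', 'Aditya', 'Chris']
```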
Conclusion
In this article, we have discussed different ways to sort a pandas dataframe in Python using the sort_values() and the sort_index() methods.