The pandas dataframe is a great tool for handling tabular data in Python. In this article, we will discuss different ways to check if a column is sorted in a pandas dataframe.
Check if a Column Is Sorted Using Column Attributes
To check if a column is sorted in ascending order in a pandas dataframe, we can use the is_monotonic attribute of the column. The is_monotonic attribute evaluates to True if a column is sorted in ascending order, i.e. if the values in the column are monotonically increasing.
For instance, if a dataframe is sorted in ascending order by a column, the is_monotonic attribute of that column will evaluate to True as shown below.
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
7 3 11 Bobby 50 D
0 2 27 Harsh 55 C
5 3 27 Aditya 55 C
1 2 23 Clara 78 B
4 3 15 Prashant 78 B
6 3 23 Radheshyam 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
The 'Marks' column is sorted: True
In the above example, we first loaded a CSV file into a dataframe using the read_csv() function. After that, we sorted the dataframe by the "Marks" column using the sort_values() method. After sorting, you can observe that the is_monotonic attribute of the column returns True. It denotes that the column is sorted in ascending order.
If a column is sorted in descending order, the is_monotonic attribute will evaluate to False.
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
3 3 34 Amy 88 A
2 3 33 Tina 82 A
1 2 23 Clara 78 B
4 3 15 Prashant 78 B
6 3 23 Radheshyam 78 B
0 2 27 Harsh 55 C
5 3 27 Aditya 55 C
7 3 11 Bobby 50 D
The 'Marks' column is sorted: False
In this example, we have sorted the "Marks" column in descending order. Due to this, the is_monotonic attribute evaluates to False.
If a column in the dataframe is not sorted, the is_monotonic attribute will evaluate to False. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
The 'Marks' column is sorted: False
Here, you can observe that we have accessed the is_monotonic attribute without sorting the dataframe by the "Marks" column. Hence, the "Marks" column is unsorted and the is_monotonic attribute evaluates to False.
The is_monotonic attribute doesn't work with NaN values. If a column contains NaN values, the is_monotonic attribute always evaluates to False. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
6 2 27 Harsh 55.0 C
10 3 27 Aditya 55.0 C
4 2 22 Tom 73.0 B
2 1 14 Sam 75.0 B
5 2 15 Golu 79.0 B
0 1 11 Aditya 85.0 A
8 3 34 Amy 88.0 A
1 1 12 Chris NaN A
3 1 15 Harry NaN NaN
7 2 23 Clara NaN B
9 3 15 Prashant NaN B
11 3 23 Radheshyam NaN NaN
The 'Marks' column is sorted: False
In this example, you can observe that the "Marks" column contains NaN values. Due to this, even after sorting, the is_monotonic attribute evaluates to False. You may argue that this happens only because the NaN values are at the end of the column.
However, if we put the rows having NaN values at the top of the dataframe, the is_monotonic attribute will again evaluate to False. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True,na_position="first")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
1 1 12 Chris NaN A
3 1 15 Harry NaN NaN
7 2 23 Clara NaN B
9 3 15 Prashant NaN B
11 3 23 Radheshyam NaN NaN
6 2 27 Harsh 55.0 C
10 3 27 Aditya 55.0 C
4 2 22 Tom 73.0 B
2 1 14 Sam 75.0 B
5 2 15 Golu 79.0 B
0 1 11 Aditya 85.0 A
8 3 34 Amy 88.0 A
The 'Marks' column is sorted: False
In this example, we have put the NaN values at the start of the sorted "Marks" column. Even after this, the is_monotonic attribute evaluates to False. Thus, we can conclude that the is_monotonic attribute cannot be used with columns having NaN values.
While using the is_monotonic attribute in recent 1.x versions of pandas, you will get a FutureWarning with the message “FutureWarning: is_monotonic is deprecated and will be removed in a future version. Use is_monotonic_increasing instead.” So, the is_monotonic attribute is already deprecated, and it has been removed in pandas 2.0. As an alternative, we can use the is_monotonic_increasing and is_monotonic_decreasing attributes to check if a column is sorted in a pandas dataframe.
Check if a Column Is Sorted in Ascending Order in a Dataframe
To check if a column in a dataframe is sorted in ascending order, we can use the is_monotonic_increasing attribute. The is_monotonic_increasing attribute evaluates to True if a column is sorted in ascending order. Otherwise, it evaluates to False. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=True)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
7 3 11 Bobby 50 D
0 2 27 Harsh 55 C
5 3 27 Aditya 55 C
1 2 23 Clara 78 B
4 3 15 Prashant 78 B
6 3 23 Radheshyam 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
The 'Marks' column is sorted: True
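Note that the "Marks" column above contains repeated values (55 and 78), and the attribute still evaluates to True, because it tests for non-decreasing order rather than strictly increasing order. If you need strictly increasing values, one possible approach is to combine it with the is_unique attribute. The following is a minimal sketch that uses a small made-up series instead of the grade2.csv file.

```python
import pandas as pd

# Hypothetical data for illustration; it contains one tie (55, 55).
marks = pd.Series([50, 55, 55, 78, 82])

# True when no value is smaller than the one before it (ties allowed).
non_decreasing = marks.is_monotonic_increasing

# True only when the values are non-decreasing AND contain no duplicates.
strictly_increasing = marks.is_monotonic_increasing and marks.is_unique

print("Non-decreasing:", non_decreasing)
print("Strictly increasing:", strictly_increasing)
```

Here, the first check passes while the second one fails because of the repeated value.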
If a column is not sorted, the is_monotonic_increasing attribute evaluates to False.
import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
The 'Marks' column is sorted: False
Also, if a column is sorted in descending order, the is_monotonic_increasing attribute evaluates to False.
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
3 3 34 Amy 88 A
2 3 33 Tina 82 A
1 2 23 Clara 78 B
4 3 15 Prashant 78 B
6 3 23 Radheshyam 78 B
0 2 27 Harsh 55 C
5 3 27 Aditya 55 C
7 3 11 Bobby 50 D
The 'Marks' column is sorted: False
The is_monotonic_increasing attribute cannot be used with columns having NaN values. It always evaluates to False if a column has NaN values. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True,na_position="last")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
6 2 27 Harsh 55.0 C
10 3 27 Aditya 55.0 C
4 2 22 Tom 73.0 B
2 1 14 Sam 75.0 B
5 2 15 Golu 79.0 B
0 1 11 Aditya 85.0 A
8 3 34 Amy 88.0 A
1 1 12 Chris NaN A
3 1 15 Harry NaN NaN
7 2 23 Clara NaN B
9 3 15 Prashant NaN B
11 3 23 Radheshyam NaN NaN
The 'Marks' column is sorted: False
Even if we put the rows having NaN values at the top of the dataframe, the is_monotonic_increasing attribute will evaluate to False.
import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True,na_position="first")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
1 1 12 Chris NaN A
3 1 15 Harry NaN NaN
7 2 23 Clara NaN B
9 3 15 Prashant NaN B
11 3 23 Radheshyam NaN NaN
6 2 27 Harsh 55.0 C
10 3 27 Aditya 55.0 C
4 2 22 Tom 73.0 B
2 1 14 Sam 75.0 B
5 2 15 Golu 79.0 B
0 1 11 Aditya 85.0 A
8 3 34 Amy 88.0 A
The 'Marks' column is sorted: False
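If the missing values can simply be ignored, one option is to drop them with the dropna() method before checking the order. The following is a minimal sketch that uses a made-up series rather than the grade.csv file; whether ignoring NaN values is acceptable depends on your use case.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration; NaN marks missing scores.
marks = pd.Series([55.0, 73.0, np.nan, 79.0, 88.0, np.nan])

# The NaN values make the direct check evaluate to False.
with_nan = marks.is_monotonic_increasing

# Dropping the NaN values first checks only the values that are present.
without_nan = marks.dropna().is_monotonic_increasing

print("Checked with NaN values:", with_nan)
print("Checked after dropna():", without_nan)
```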
Check if a Column Is Sorted in Descending Order in a Pandas Dataframe
To check if a column is sorted in descending order in a pandas dataframe, we will use the is_monotonic_decreasing attribute. The is_monotonic_decreasing attribute evaluates to True if a column is sorted in descending order. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
3 3 34 Amy 88 A
2 3 33 Tina 82 A
1 2 23 Clara 78 B
4 3 15 Prashant 78 B
6 3 23 Radheshyam 78 B
0 2 27 Harsh 55 C
5 3 27 Aditya 55 C
7 3 11 Bobby 50 D
The 'Marks' column is sorted: True
If a column is unsorted or is sorted in ascending order, the is_monotonic_decreasing attribute evaluates to False as shown below.
import pandas as pd
df=pd.read_csv("grade2.csv")
#df.sort_values(by="Marks",inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
The 'Marks' column is sorted: False
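Since the two attributes answer complementary questions, they can be combined to report the direction in which a column is sorted. The sort_order() function below is a hypothetical helper written for illustration, not a part of pandas, and it is tried on made-up series.

```python
import pandas as pd

def sort_order(column):
    """Describe the order of a pandas Series (hypothetical helper)."""
    increasing = column.is_monotonic_increasing
    decreasing = column.is_monotonic_decreasing
    if increasing and decreasing:
        # Both hold only when all values are equal
        # (or the column has at most one value).
        return "constant"
    if increasing:
        return "ascending"
    if decreasing:
        return "descending"
    return "unsorted"

print(sort_order(pd.Series([50, 55, 55, 78])))
print(sort_order(pd.Series([88, 82, 78, 78])))
print(sort_order(pd.Series([55, 78, 50])))
```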
The is_monotonic_decreasing attribute cannot be used with columns having NaN values. It always evaluates to False if a column has NaN values. You can observe this in the following example.
import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=False,na_position="last")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
8 3 34 Amy 88.0 A
0 1 11 Aditya 85.0 A
5 2 15 Golu 79.0 B
2 1 14 Sam 75.0 B
4 2 22 Tom 73.0 B
6 2 27 Harsh 55.0 C
10 3 27 Aditya 55.0 C
1 1 12 Chris NaN A
3 1 15 Harry NaN NaN
7 2 23 Clara NaN B
9 3 15 Prashant NaN B
11 3 23 Radheshyam NaN NaN
The 'Marks' column is sorted: False
Even if we put the rows having NaN values at the top of the dataframe, the is_monotonic_decreasing attribute will evaluate to False.
import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=False,na_position="first")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Marks Grade
1 1 12 Chris NaN A
3 1 15 Harry NaN NaN
7 2 23 Clara NaN B
9 3 15 Prashant NaN B
11 3 23 Radheshyam NaN NaN
8 3 34 Amy 88.0 A
0 1 11 Aditya 85.0 A
5 2 15 Golu 79.0 B
2 1 14 Sam 75.0 B
4 2 22 Tom 73.0 B
6 2 27 Harsh 55.0 C
10 3 27 Aditya 55.0 C
The 'Marks' column is sorted: False
Check if a Column Is Sorted in a Dataframe Using the Numpy Module
The numpy module in Python provides us with different functions to perform operations on numeric data. One such function is the diff() function. The diff() function takes an array-like object as its input argument and returns an array containing the first-order difference of the array elements as shown in the following example.
import numpy as np
import pandas as pd
df=pd.read_csv("grade2.csv")
marks=df["Marks"]
print("The Marks column is:")
print(marks)
temp=np.diff(marks)
print("Array returned by diff() is:")
print(temp)
Output:
The Marks column is:
0 55
1 78
2 82
3 88
4 78
5 55
6 78
7 50
Name: Marks, dtype: int64
Array returned by diff() is:
[ 23 4 6 -10 -23 23 -28]
Here, you can observe that the first-order difference is calculated as the difference between the (n+1)th and nth elements of the input array. For example, the first element of the output array is the difference between the second element and the first element of the input "Marks" column. The second element in the output array is the difference between the third element and the second element of the "Marks" column.
From this, we can conclude that if the "Marks" column is sorted in ascending order, all the values in the output array will be greater than or equal to 0. Similarly, if the "Marks" column is sorted in descending order, all the elements in the output array will be less than or equal to 0. We will use this conclusion to check if the column is sorted in ascending or descending order.
To check if a column of a pandas dataframe is sorted in ascending order, we will use the following steps.
- First, we will calculate the first-order difference of the specified column. For this, we will pass the column to the diff() function as an input argument.
- After that, we will check if all the elements in the output array are greater than or equal to 0. For this, we will use the comparison operator and the all() method. When we use the comparison operator on a numpy array, we get an array of boolean values. The all() method, when invoked on an array containing boolean values, returns True if all the elements are True.
- If the all() method returns True, we can conclude that the column is sorted in ascending order.
You can observe this in the following example.
import numpy as np
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=True)
marks=df["Marks"]
print("The dataframe is:")
print(df)
temp=np.diff(marks)
print("Array returned by diff() is:")
print(temp)
boolean_array= temp>=0
print("Boolean array is:")
print(boolean_array)
result=boolean_array.all()
if result:
    print("The marks column is sorted.")
else:
    print("The marks column is not sorted.")
Output:
The dataframe is:
Class Roll Name Marks Grade
7 3 11 Bobby 50 D
0 2 27 Harsh 55 C
5 3 27 Aditya 55 C
1 2 23 Clara 78 B
4 3 15 Prashant 78 B
6 3 23 Radheshyam 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
Array returned by diff() is:
[ 5 0 23 0 0 4 6]
Boolean array is:
[ True True True True True True True]
The marks column is sorted.
To check if a column is sorted in descending order, we will check if all the elements in the output array of the diff() function are less than or equal to 0. For this, we will again use the comparison operator and the all() method.
If the all() method returns True, we can conclude that the column is sorted in descending order. You can observe this in the following example.
import numpy as np
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
marks=df["Marks"]
print("The dataframe is:")
print(df)
temp=np.diff(marks)
print("Array returned by diff() is:")
print(temp)
boolean_array= temp<=0
print("Boolean array is:")
print(boolean_array)
result=boolean_array.all()
if result:
    print("The marks column is sorted.")
else:
    print("The marks column is not sorted.")
Output:
The dataframe is:
Class Roll Name Marks Grade
3 3 34 Amy 88 A
2 3 33 Tina 82 A
1 2 23 Clara 78 B
4 3 15 Prashant 78 B
6 3 23 Radheshyam 78 B
0 2 27 Harsh 55 C
5 3 27 Aditya 55 C
7 3 11 Bobby 50 D
Array returned by diff() is:
[ -6 -4 0 0 -23 0 -5]
Boolean array is:
[ True True True True True True True]
The marks column is sorted.
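The two checks above can be wrapped into a single function. The is_sorted() function below is a hypothetical helper written for illustration, not a part of numpy or pandas. It is a minimal sketch that assumes the input holds signed integers or floats.

```python
import numpy as np

def is_sorted(values, ascending=True):
    """Check the order of numeric values using first-order differences."""
    # diffs[i] holds values[i + 1] - values[i].
    diffs = np.diff(np.asarray(values))
    if ascending:
        # Sorted in ascending order when no difference is negative.
        return bool(np.all(diffs >= 0))
    # Sorted in descending order when no difference is positive.
    return bool(np.all(diffs <= 0))

print(is_sorted([50, 55, 55, 78, 82]))
print(is_sorted([88, 82, 78, 78], ascending=False))
print(is_sorted([55, 78, 50]))
```

Any difference that involves a NaN value is itself NaN, and comparing NaN with 0 gives False, so this sketch also reports an input containing NaN values as not sorted. For unsigned integer arrays, the differences wrap around instead of becoming negative, so it is safer to convert such data to a signed type first.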
Check if the Index Column Is Sorted in a Dataframe
To check if the index of a dataframe is sorted in ascending order, we can use the index attribute of the dataframe together with the is_monotonic attribute as shown below. As with columns, using is_monotonic on the index is deprecated.
import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
df.sort_index(inplace=True,ascending=True)
print("The dataframe is:")
print(df)
temp=df.index.is_monotonic
print("The Index is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Grade
Marks
50 3 11 Bobby D
55 2 27 Harsh C
55 3 27 Aditya C
78 2 23 Clara B
78 3 15 Prashant B
78 3 23 Radheshyam B
82 3 33 Tina A
88 3 34 Amy A
The Index is sorted: True
Instead of the is_monotonic attribute, we can use the index attribute together with the is_monotonic_increasing attribute to check if the index is sorted in ascending order, as shown below.
import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
df.sort_index(inplace=True,ascending=True)
print("The dataframe is:")
print(df)
temp=df.index.is_monotonic_increasing
print("The Index is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Grade
Marks
50 3 11 Bobby D
55 2 27 Harsh C
55 3 27 Aditya C
78 2 23 Clara B
78 3 15 Prashant B
78 3 23 Radheshyam B
82 3 33 Tina A
88 3 34 Amy A
The Index is sorted: True
To check if the index of a dataframe is sorted in descending order, we can use the index attribute and the is_monotonic_decreasing attribute as shown below.
import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
df.sort_index(inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df.index.is_monotonic_decreasing
print("The Index is sorted:",temp)
Output:
The dataframe is:
Class Roll Name Grade
Marks
88 3 34 Amy A
82 3 33 Tina A
78 2 23 Clara B
78 3 15 Prashant B
78 3 23 Radheshyam B
55 2 27 Harsh C
55 3 27 Aditya C
50 3 11 Bobby D
The Index is sorted: True
You need to keep in mind that the is_monotonic, is_monotonic_increasing, and is_monotonic_decreasing attributes always return False if the index contains NaN values. Therefore, you cannot use these attributes directly to check if the index is sorted when it contains NaN values.
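If the NaN labels in the index can be ignored, one possible workaround is to call the dropna() method on the index before checking the order. The following is a minimal sketch with a made-up dataframe; the names and values are for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe whose "Marks" index contains one NaN label.
df = pd.DataFrame(
    {"Name": ["Harsh", "Tom", "Chris", "Amy"]},
    index=pd.Index([55.0, 73.0, np.nan, 88.0], name="Marks"),
)

# The NaN label makes the direct check evaluate to False.
with_nan = df.index.is_monotonic_increasing

# Index.dropna() returns a new index without the NaN labels.
without_nan = df.index.dropna().is_monotonic_increasing

print("Checked with NaN labels:", with_nan)
print("Checked after dropna():", without_nan)
```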
Conclusion
In this article, we have discussed different ways to check if a column is sorted in a pandas dataframe. For this, we have used the pandas library as well as the numpy module. We have also checked if the index of a pandas dataframe is sorted or not.
To learn more about python programming, you can read this article on dictionary comprehension in python. You might like this article on list comprehension in python too.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!