The inner join operation is used in database management to join two or more tables. We can also perform inner join operations on two pandas dataframes as they contain tabular values. In this article, we will discuss how we can perform an inner join operation on two dataframes in python.
What is Inner Join Operation?
The inner join operation is used to find the intersection between two tables. For instance, consider that we have a table that contains the personal details of students and another table that contains grades of the students. If both of the tables have a common column, say ‘Name’
, then, we can create another table that has the details of the students as well as their marks in each row.
To perform the inner join operation in python, we can use the pandas dataframes along with the join()
method or the merge()
method. Let us discuss them one by one.
The files used in the programs can be downloaded using the below links.
Inner Join Two DataFrames Using the merge() Method
We can use the merge()
method to perform inner join operation on two dataframes in python. The merge()
method, when invoked on a dataframe, takes another dataframe as its first input argument. Along with that, it takes the value ‘inner’
as an input argument for the ‘how’
parameter. It also takes the column name that is common between the two dataframes as the input argument for the ‘on’
parameter. After execution, it returns a dataframe which is the intersection of both the dataframes and contains columns from both the dataframes. You can observe this in the following example.
import pandas as pd
import numpy as np
names=pd.read_csv("name.csv")
grades=pd.read_csv("grade.csv")
resultdf=names.merge(grades,how="inner",on="Name")
print("The resultant dataframe is:")
print(resultdf)
Output:
The resultant dataframe is:
Class_x Roll_x Name Class_y Roll_y Grade
0 1 11 Aditya 1 11 A
1 1 12 Chris 1 12 A+
2 2 1 Joel 2 1 B
3 2 22 Tom 2 22 B+
4 3 33 Tina 3 33 A-
5 3 34 Amy 3 34 A
You should keep in mind that the output dataframe will only contain those rows from both the tables in which the column given as input to the ‘on’
parameter is the same. All the other rows from both the dataframes will be omitted from the output dataframe.
If there are columns with the same name, the python interpreter adds _x
and _y
suffixes to the column names. To identify the columns from the dataframe on which the merge()
method in invoked, _x
suffix is added. For the dataframe that is passed as the input argument to the merge()
method, _y
suffix is used.
Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on k-means clustering with numerical example.
Inner Join Two DataFrames Using the join() Method
Instead of using the merge()
method, we can use the join()
method to perform the inner join operation on the dataframes.
The join()
method, when invoked on a dataframe, takes another dataframe as its first input argument. Along with that, it takes the value ‘inner’
as an input argument for the ‘how’
parameter. It also takes the column name that is common between the two dataframes as the input argument for the ‘on’
parameter. After execution, the join()
method returns the output dataframe as shown below.
import pandas as pd
import numpy as np
names=pd.read_csv("name.csv")
grades=pd.read_csv("grade.csv")
grades=grades.set_index("Name")
resultdf=names.join(grades,how="inner",on="Name",lsuffix='_names', rsuffix='_grades')
print("The resultant dataframe is:")
print(resultdf)
Output:
The resultant dataframe is:
Class_names Roll_names Name Class_grades Roll_grades Grade
0 1 11 Aditya 1 11 A
1 1 12 Chris 1 12 A+
3 2 1 Joel 2 1 B
4 2 22 Tom 2 22 B+
6 3 33 Tina 3 33 A-
7 3 34 Amy 3 34 A
While using the join()
method, you also need to keep in mind that the column on which the join operation is to be performed should be the index of the dataframe that is passed as input argument to the join()
method. If the dataframes have same column names for some columns, you need to specify the suffix for column names using the lsuffix
and rsuffix
parameters. The values passed to these parameters help us identify which column comes from which dataframe if the column names are the same.
Conclusion
In this article, we have discussed two ways to perform an inner join operation on two dataframes in python. To know more about python programming, you can read this article on dictionary comprehension in python. You might also like this article on list comprehension in python.
Recommended Python Training
Course: Python 3 For Beginners
Over 15 hours of video content with guided instruction for beginners. Learn how to create real world applications and master the basics.