Pandas dataframes are the primary tools to analyze tabular data in python. In this article, we will discuss different ways to concatenate, merge, and join pandas dataframes using the merge()
function, the join()
function, and the concat()
function.
- The Pandas merge() Function
- Merge or Inner Join DataFrames Using the merge() Function
- Full Outer Join Pandas DataFrames Using the merge() Function
- The Pandas concat() Function
- The Pandas join() Method
- Left Join Pandas DataFrames Using the Join() Method
- Right Join Pandas DataFrames Using the join() Method
- Inner Join DataFrames Using the join() Method
- Outer Join DataFrames Using the join() Method
- Join DataFrames With Common Column Names
- Join Dataframes Using a Column Name as Join Key
- Join Multiple DataFrames Using the join() Method
- Conclusion
The Pandas merge() Function
The merge()
function is used to merge pandas dataframes in Python. The merge happens in a similar manner to the join operations in database columns. The syntax for the merge()
function is as follows.
pandas.merge(left_df, right_df, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, suffixes=('_x', '_y'),indicator=False)
Here,
- The
left_df
andright_df
parameters take a dataframe each as their input argument. The position of the dataframe affects the output. Hence, we have explicitly named the positions of the dataframe. - The
“how”
parameter is used to decide the join operation that will be performed onleft_df
andright_df
. It has the default value“inner”
which shows that the inner join operation will be performed. You can also use the literal“outer”
,“left”
, or“right”
to perform full outer join, left join, or right join respectively on the dataframes. - The
on
parameter is used to decide the column that is used as the key for the join operations. Here, the column name provided to the on parameter must be present in bothleft_df
andright_df
. If you are using different column names from both the dataframes as join keys, the on parameter is set toNone
by default. - The
left_on
parameter is used to specify the column name to be used as the join key from theleft_df
. - The
right_on
parameter is used to specify the column name to be used as the join key from theright_df
. - If the input dataframe have any of the columns as indices, you can use the
left_index
andright_index
parameter to use the indices as join keys. - The
left_index
parameter is set toTrue
if you want to use the index ofleft_df
as the join key. It has the default value ofFalse
. - The
right_index
parameter is set toTrue
if you want to use the index ofright_df
as the join key. It has the default value ofFalse
. - The suffixes parameter comes into use when
left_df
andright_df
have common column names. Ifleft_df
andright_df
have common column names,_x
is added to the respective column name inleft_df
and_y
is added to the same column name inright_df
. You can also specify suffixes manually using thesuffixes
parameter. - The
indicator
parameter is used to indicate if the join keys were present in the input dataframes. If theindicator
is set toTrue
, an additional column named“_merge”
is added to the output dataframe. For each row, if the join key were present in bothleft_df
andright_df
, the output is_merge
column inboth
. If the join key were present only in theleft_df
orright_df
, the value in the_merge
column will beleft_only
orright_only
respectively.
After execution, the merge()
function returns the output dataframe.
Merge or Inner Join DataFrames Using the merge() Function
We can merge two dataframes using the merge()
function. The merge functionally works as database join operation. The columns that are compared while joining the dataframes are passed to left_on
and the right_on
parameter.
After comparing the values in the left_on
column in left dataframe and right_on
column in the right dataframe, the rows are merged to produce the output dataframe.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 19 75 B
7 20 72 B
8 24 92 A
9 25 95 A
second dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 1 14 Joel
4 1 15 Tom
5 1 16 Samantha
6 1 17 Pablo
7 1 20 Tina
8 1 24 Amy
9 1 30 Justin
10 1 31 Karl
Merged dataframe is:
Roll Marks Grade Class Name
0 11 85 A 1 Aditya
1 12 95 A 1 Chris
2 13 75 B 1 Sam
3 14 75 B 1 Joel
4 16 78 B 1 Samantha
5 15 55 C 1 Tom
6 20 72 B 1 Tina
7 24 92 A 1 Amy
In the above example, we have performed inner join on pandas dataframes using the "Roll"
attribute. From both the dataframes, the rows having the same value for the "Roll"
attribute are merged together to form the rows of the output dataframe.
If the input dataframes have common columns, the suffix _x
and _y
are added to the column names of left and right pandas dataframes respectively after the join operation. You can observe this in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_name.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
second dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 1 14 Joel
4 1 15 Tom
5 1 16 Samantha
6 1 17 Pablo
7 1 20 Tina
8 1 24 Amy
9 1 30 Justin
10 1 31 Karl
Merged dataframe is:
Roll Name_x Marks Grade Class Name_y
0 11 Aditya 85 A 1 Aditya
1 12 Chris 95 A 1 Chris
2 13 Sam 75 B 1 Sam
3 14 Joel 75 B 1 Joel
4 16 Tom 78 B 1 Samantha
5 15 Samantha 55 C 1 Tom
6 20 Tina 72 B 1 Tina
7 24 Amy 92 A 1 Amy
In this example, both the input dataframes have the "Name
” attribute. Hence, the Name
column of the first dataframe gets _x
suffix in the output dataframe. Similarly, the Name
column of the second dataframe gets the suffix _y
in the output dataframe.
Instead of the default suffix, we can also change the suffixes as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_name.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
second dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 1 14 Joel
4 1 15 Tom
5 1 16 Samantha
6 1 17 Pablo
7 1 20 Tina
8 1 24 Amy
9 1 30 Justin
10 1 31 Karl
Merged dataframe is:
Roll Name_left Marks Grade Class Name_right
0 11 Aditya 85 A 1 Aditya
1 12 Chris 95 A 1 Chris
2 13 Sam 75 B 1 Sam
3 14 Joel 75 B 1 Joel
4 16 Tom 78 B 1 Samantha
5 15 Samantha 55 C 1 Tom
6 20 Tina 72 B 1 Tina
7 24 Amy 92 A 1 Amy
In the above example, we have used the _left
suffix for the first dataframe and the _right
suffix for the second dataframe. You can also change the suffixes using the "suffixes"
parameter.
Left Join Pandas DataFrames Using the merge() Function
By default the merge()
function performs inner join operation on the input dataframes. We can also merge dataframes using the left join operation. For this, we can pass the literal "left"
to the how
parameter.
When we perform left join operation on pandas dataframes, all the rows of the left dataframe are shown in the output dataframe. For the rows in the left dataframe that don’t have a corresponding row in the right dataframe, we get NaN
values in the rows of the columns corresponding to the right dataframe.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="left",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 19 75 B
7 20 72 B
8 24 92 A
9 25 95 A
second dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 1 14 Joel
4 1 15 Tom
5 1 16 Samantha
6 1 17 Pablo
7 1 20 Tina
8 1 24 Amy
9 1 30 Justin
10 1 31 Karl
Merged dataframe is:
Roll Marks Grade Class Name
0 11 85 A 1.0 Aditya
1 12 95 A 1.0 Chris
2 13 75 B 1.0 Sam
3 14 75 B 1.0 Joel
4 16 78 B 1.0 Samantha
5 15 55 C 1.0 Tom
6 19 75 B NaN NaN
7 20 72 B 1.0 Tina
8 24 92 A 1.0 Amy
9 25 95 A NaN NaN
In the above example, we don’t have rows having Roll 19
and Roll 25
in the right dataframe. But, they exist in the left dataframe. Hence, after a left join operation, we get rows having Roll 19 and 25 in the output dataframe. However, the columns from right dataframe have NaN
values in the respective rows for Roll 19 and 25.
Right Join Pandas DataFrames Using the merge() Function
Similar to the left join operation, we can also perform right join operation on pandas dataframes using the merge()
function. We can merge dataframes using right join operation by passing the literal "right"
to the "how"
parameter.
When we perform right join operation on pandas dataframes, all the rows of the right dataframe are shown in the output dataframe. For the rows in the right dataframe that don’t have a corresponding row in the left dataframe, we get NaN
values in the rows of the columns corresponding to the left dataframe.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="right",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 19 75 B
7 20 72 B
8 24 92 A
9 25 95 A
second dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 1 14 Joel
4 1 15 Tom
5 1 16 Samantha
6 1 17 Pablo
7 1 20 Tina
8 1 24 Amy
9 1 30 Justin
10 1 31 Karl
Merged dataframe is:
Roll Marks Grade Class Name
0 11 85.0 A 1 Aditya
1 12 95.0 A 1 Chris
2 13 75.0 B 1 Sam
3 14 75.0 B 1 Joel
4 15 55.0 C 1 Tom
5 16 78.0 B 1 Samantha
6 17 NaN NaN 1 Pablo
7 20 72.0 B 1 Tina
8 24 92.0 A 1 Amy
9 30 NaN NaN 1 Justin
10 31 NaN NaN 1 Karl
In this example, we don’t have rows having Roll 17, 30 and 31 in the left dataframe. But, they exist in the right dataframe. Hence, after a right join operation, we get rows having Roll 17, 30 and 31 in the output dataframe. However, the columns from left dataframe have NaN
values in the respective rows for Roll 17, 30 and 31.
Full Outer Join Pandas DataFrames Using the merge() Function
Inner join, left join, and right join operations on the pandas dataframes lead to loss in data. If you want to retain all the input data, you can perform full outer join on the pandas dataframes. For this, you need to pass the literal "outer"
to the "how"
parameter in the merge()
function.
In the outer join operation, all the rows of the left dataframe are shown in the output dataframe. For the rows in the left dataframe that don’t have a corresponding row in the right dataframe, we get NaN
values in the rows of the columns corresponding to the right dataframe.
Similarly, all the rows of the right dataframe are shown in the output dataframe. For the rows in the right dataframe that don’t have a corresponding row in the left dataframe, we get NaN
values in the rows of the columns corresponding to the left dataframe.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="outer",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 19 75 B
7 20 72 B
8 24 92 A
9 25 95 A
second dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 1 14 Joel
4 1 15 Tom
5 1 16 Samantha
6 1 17 Pablo
7 1 20 Tina
8 1 24 Amy
9 1 30 Justin
10 1 31 Karl
Merged dataframe is:
Roll Marks Grade Class Name
0 11 85.0 A 1.0 Aditya
1 12 95.0 A 1.0 Chris
2 13 75.0 B 1.0 Sam
3 14 75.0 B 1.0 Joel
4 16 78.0 B 1.0 Samantha
5 15 55.0 C 1.0 Tom
6 19 75.0 B NaN NaN
7 20 72.0 B 1.0 Tina
8 24 92.0 A 1.0 Amy
9 25 95.0 A NaN NaN
10 17 NaN NaN 1.0 Pablo
11 30 NaN NaN 1.0 Justin
12 31 NaN NaN 1.0 Karl
In the above example, we don’t have rows having Roll 19 and Roll 25 in the right dataframe. But, they exist in the left dataframe. Hence, after the outer join operation, we get rows having Roll 19 and 25 in the output dataframe. However, the columns from right dataframe have NaN
values in the respective rows for Roll 19 and 25.
Similarly, we don’t have rows having Roll 17, 30 and 31 in the left dataframe. But, they exist in the right dataframe. Hence, after a n outer join operation, we get rows having Roll 17, 30 and 31 in the output dataframe. However, the columns from left dataframe have NaN
values in the respective rows for Roll 17, 30 and 31.
Merge DataFrames Using Indices as Join Keys
We can also merge the dataframes using indices as the join keys. For this, we need to set the left_index
and right_index
parameter to True based on the situation.
For instance, we can use the index of the left dataframe as the join key as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_index=True, right_on="Roll")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 1 14 Joel
4 1 15 Tom
5 1 16 Samantha
6 1 17 Pablo
7 1 20 Tina
8 1 24 Amy
9 1 30 Justin
10 1 31 Karl
Merged dataframe is:
Marks Grade Class Roll Name
0 85 A 1 11 Aditya
1 95 A 1 12 Chris
2 75 B 1 13 Sam
3 75 B 1 14 Joel
5 78 B 1 16 Samantha
4 55 C 1 15 Tom
7 72 B 1 20 Tina
8 92 A 1 24 Amy
In this example, the Roll
column is used as an index in the left dataframe. Hence, we don’t need to pass the column name to the left_on
parameter. Setting left_index=True
does the work for us.
We can also use the index of right dataframe as the join key as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_index=True)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 19 75 B
7 20 72 B
8 24 92 A
9 25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Roll Marks Grade Class Name
0 11 85 A 1 Aditya
1 12 95 A 1 Chris
2 13 75 B 1 Sam
3 14 75 B 1 Joel
4 16 78 B 1 Samantha
5 15 55 C 1 Tom
7 20 72 B 1 Tina
8 24 92 A 1 Amy
In this example, the Roll
column is used as an index in the right dataframe. Hence, we don’t need to pass the column name to the right_on
parameter. Setting right_index=True
does the work for us. For left dataframe, we have used the left_on
parameter to specify the join key.
To use indices of both the dataframes, we can use the left_index
and right_index
parameters as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_index=True, right_index=True)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Marks Grade Class Name
Roll
11 85 A 1 Aditya
12 95 A 1 Chris
13 75 B 1 Sam
14 75 B 1 Joel
16 78 B 1 Samantha
15 55 C 1 Tom
20 72 B 1 Tina
24 92 A 1 Amy
In this example, both the input dataframes have Roll
column as their index. Hence, we have set left_index
and right_index
to True
to specify the indices as join keys.
Merge Pandas DataFrames With Indicator
While merging dataframes, we can add some metadata in the output dataframes. For instance, we can specify if the join key was present in the left dataframe, the right dataframe or both the input dataframes.
For this, we can use the indicator parameter. When the indicator is set to True
, the merge()
function adds an additional column named _merge
in the output dataframe.
For each row in the _merge
column, if the join key were present in both the input dataframes, the output value in the _merge
column in "both"
. If the join key is present only in the left dataframe, the output in the _merge
column is "left_only"
. Similarly, if the join column is only present in the right dataframe, the value in the _merge
column for the corresponding row will be "right_only"
. You can observe this in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="outer",left_index=True, right_index=True,indicator=True)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Marks Grade Class Name _merge
Roll
11 85.0 A 1.0 Aditya both
12 95.0 A 1.0 Chris both
13 75.0 B 1.0 Sam both
14 75.0 B 1.0 Joel both
15 55.0 C 1.0 Tom both
16 78.0 B 1.0 Samantha both
17 NaN NaN 1.0 Pablo right_only
19 75.0 B NaN NaN left_only
20 72.0 B 1.0 Tina both
24 92.0 A 1.0 Amy both
25 95.0 A NaN NaN left_only
30 NaN NaN 1.0 Justin right_only
31 NaN NaN 1.0 Karl right_only
Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on k-means clustering with numerical example.
The Pandas concat() Function
We can concatenate pandas dataframes horizontally or vertically using the concat()
function. The syntax for the concat()
function is as follows.
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,names=None)
Here,
- The
objs
parameter is a list or tuple of dataframes that need to be concatenated. - The
axis
parameter is used to decide whether the input dataframes are joined horizontally or vertically. If you want to concatenate the dataframes vertically, the axis is set to 0 which is its default value. To concatenate dataframes horizontally, the axis is set to 1. - The
join
parameter is used to decide how to handle indices on other indices. If we are concatenating the dataframes vertically, the join parameter decides what columns will be included in the output dataframe. If join is set to“inner”
, the output dataframe contains only those columns that are present in all the input dataframes. If join is set to“outer”,
which is its default value, all the columns from the input dataframes are included in the output dataframe.If we are concatenating the dataframes horizontally, the join operation is performed with indices of the input dataframes. - The
ignore_index
parameter is used to decide whether the output dataframe stores the indices from the input dataframes. By default, it is set toFalse
due to which the indices of the input dataframes are preserved in the output dataframe. Whenignore_index
is set toTrue
, the indices of the input dataframes are ignored. - The
keys
parameter is used to create an additional index level in the output dataframe. Generally, we use this parameter to identify the input dataframes to which the rows or columns in the output dataframe belong. This is done by specifying the names of the input dataframes as keys. - The
names
parameter is used to name the index columns that are created using thekeys
parameter.
After execution, the concat()
function returns the concatenated dataframe.
Concatenate DataFrames Vertically Using the concat() Function
If we have two dataframes with similar data, we can concatenate them vertically using the concat()
function.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2])
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
second dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
Merged dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
In the above example, we have created the output dataframe from the input dataframes after merging all the rows. You can observe that the indices of the input dataframes have also been concatenated in the output dataframe.
To ignore the indices of the input dataframes and create new index, you can set the ignore_index
parameter to True
in the concat()
function as shown in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],ignore_index=True)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
second dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
Merged dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
8 2 27 Harsh 55 C
9 2 23 Clara 78 B
10 3 33 Tina 82 A
11 3 34 Amy 88 A
12 3 15 Prashant 78 B
13 3 27 Aditya 55 C
14 3 23 Radheshyam 78 B
15 3 11 Bobby 50 D
Here, you can observe that all the rows have been assigned new indices to them.
Sometimes, the dataframes that are being concatenated horizontally many not have the same columns. In such a case, the columns in the output dataframe is the union of all the columns in the input dataframes. For instance, consider the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2])
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 20 72 B
7 24 92 A
second dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
Merged dataframe is:
Roll Marks Grade Name
0 11 85 A NaN
1 12 95 A NaN
2 13 75 B NaN
3 14 75 B NaN
4 16 78 B NaN
5 15 55 C NaN
6 20 72 B NaN
7 24 92 A NaN
0 11 85 A Aditya
1 12 95 A Chris
2 13 75 B Sam
3 14 75 B Joel
4 16 78 B Tom
5 15 55 C Samantha
6 20 72 B Tina
7 24 92 A Amy
In the above example, the first dataframe contains the columns "Roll", "Marks",
and "Grade"
. The second dataframe contains the columns "Roll", "Name", "Marks",
and "Grade"
. Hence, the output dataframe contains the columns "Roll", "Name", "Marks",
and "Grade"
.
The value in the “Name
” column in the rows corresponding to the first dataframe will contain NaN
values as the first dataframe doesn’t contain the "Name"
column. Similarly, if the second dataframe would have had an extra column, the rows corresponding to the first dataframe will contain NaN
values for the respective column.
Concat DataFrames Horizontally Using the concat() Function
We can also concatenate dataframes horizontally using the concat()
function. For this, we need to use the parameter axis=1
in the concat()
function.
When concatenate two dataframes horizontally using the concat()
function, the rows of the input dataframes are merged using the index values of the dataframes. After execution of the concat()
function, we get the output dataframe as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],axis=1)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 20 72 B
7 24 92 A
second dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
Merged dataframe is:
Roll Marks Grade Roll Name Marks Grade
0 11 85 A 11 Aditya 85 A
1 12 95 A 12 Chris 95 A
2 13 75 B 13 Sam 75 B
3 14 75 B 14 Joel 75 B
4 16 78 B 16 Tom 78 B
5 15 55 C 15 Samantha 55 C
6 20 72 B 20 Tina 72 B
7 24 92 A 24 Amy 92 A
If any of the input dataframes contains extra rows or indices that are not present in the other dataframe, the output dataframe contains NaN
values in the columns corresponding to the other dataframe in the respective rows. You can observe this in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],axis=1)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 20 72 B
7 24 92 A
second dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
8 25 Sinu 95 A
Merged dataframe is:
Roll Marks Grade Roll Name Marks Grade
0 11.0 85.0 A 11 Aditya 85 A
1 12.0 95.0 A 12 Chris 95 A
2 13.0 75.0 B 13 Sam 75 B
3 14.0 75.0 B 14 Joel 75 B
4 16.0 78.0 B 16 Tom 78 B
5 15.0 55.0 C 15 Samantha 55 C
6 20.0 72.0 B 20 Tina 72 B
7 24.0 92.0 A 24 Amy 92 A
8 NaN NaN NaN 25 Sinu 95 A
In this example, you can observe that the last row contains NaN
values for columns corresponding to the first dataframe. This is due to the reason that the first dataframe contain 1 row lesser that the second dataframe.
Concatenate DataFrames Horizontally With Only Common Row Indices
If you don’t want NaN
values to be present in the output dataframe, you can use the join="inner"
parameter as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],join="inner",axis=1)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 20 72 B
7 24 92 A
second dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
8 25 Sinu 95 A
Merged dataframe is:
Roll Marks Grade Roll Name Marks Grade
0 11 85 A 11 Aditya 85 A
1 12 95 A 12 Chris 95 A
2 13 75 B 13 Sam 75 B
3 14 75 B 14 Joel 75 B
4 16 78 B 16 Tom 78 B
5 15 55 C 15 Samantha 55 C
6 20 72 B 20 Tina 72 B
7 24 92 A 24 Amy 92 A
Concatenate DataFrames Horizontally With Custom Indices
When we join the dataframes horizontally, the rows are matched using the default indices. If we specify the indices of the input dataframes, the rows of the dataframes are matched using index values as keys. You can observe this in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],join="inner",axis=1)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
20 72 B
24 92 A
second dataframe is:
Name Marks Grade
Roll
11 Aditya 85 A
12 Chris 95 A
13 Sam 75 B
14 Joel 75 B
16 Tom 78 B
15 Samantha 55 C
20 Tina 72 B
24 Amy 92 A
25 Sinu 95 A
Merged dataframe is:
Marks Grade Name Marks Grade
Roll
11 85 A Aditya 85 A
12 95 A Chris 95 A
13 75 B Sam 75 B
14 75 B Joel 75 B
16 78 B Tom 78 B
15 55 C Samantha 55 C
20 72 B Tina 72 B
24 92 A Amy 92 A
Add Identifiers to the Rows After Concatenating DataFrames
When we concatenate two dataframes, the output dataframe doesn’t specify which input dataframe a particular row belongs to. To specify this, we can use the "keys"
parameter. The "keys"
parameter takes a list of strings as its input. After execution, it adds the keys as an extra level of index in the output dataframe. Each key specified in the "keys"
parameter corresponds to a specific dataframe. You can observe this in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],keys=["Dataframe1","dataframe2"])
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
second dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
Merged dataframe is:
Class Roll Name Marks Grade
Dataframe1 0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
dataframe2 0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
In the above example, you can observe that we have added “dataframe1
” and “dataframe2
” as extra level of index in the output dataframe. You can also give a name to the index column using the “names
” parameter as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],keys=["Dataframe1","dataframe2"],names=["Dataframe","index"])
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
second dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
Merged dataframe is:
Class Roll Name Marks Grade
Dataframe index
Dataframe1 0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
dataframe2 0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
The Pandas join() Method
Instead of the merge()
function, we can also use the join()
method to perform joins on two dataframes.
The syntax for the join()
method is as follows.
df.join(other, on=None, how='left', lsuffix='', rsuffix='’)
Here,
df
is our left dataframe.- The parameter
“other”
denotes the right dataframe to join or a list of dataframes if we want to join multiple dataframes. - The
on
parameter is used to specify the column that is used as the join key. - The “
how
” parameter is used to specify whether left, right, inner, or full outer join will be performed. By default, it has the value “left
”. - When the left and right dataframes have common column names, the
lsuffix
andrsuffix
parameters are used to specify suffixes of columns from input dataframes in the output dataframe.
The join()
method works best when the join is performed with the indices of the dataframes as join keys.
Left Join Pandas DataFrames Using the Join() Method
By default, the join()
method performs left join operation. When invoked on a dataframe, it takes another dataframe as its input argument. After execution, it returns the joined dataframe. The input dataframes are joined using the index columns as the join key.
Here, I would like to emphasize that the join()
method works best with dataframes having custom index columns.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Marks Grade Class Name
Roll
11 85 A 1.0 Aditya
12 95 A 1.0 Chris
13 75 B 1.0 Sam
14 75 B 1.0 Joel
16 78 B 1.0 Samantha
15 55 C 1.0 Tom
19 75 B NaN NaN
20 72 B 1.0 Tina
24 92 A 1.0 Amy
25 95 A NaN NaN
Right Join Pandas DataFrames Using the join() Method
We can perform right join operation on pandas dataframes using the join()
method as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,how="right")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Marks Grade Class Name
Roll
11 85.0 A 1 Aditya
12 95.0 A 1 Chris
13 75.0 B 1 Sam
14 75.0 B 1 Joel
15 55.0 C 1 Tom
16 78.0 B 1 Samantha
17 NaN NaN 1 Pablo
20 72.0 B 1 Tina
24 92.0 A 1 Amy
30 NaN NaN 1 Justin
31 NaN NaN 1 Karl
Inner Join DataFrames Using the join() Method
We can perform inner join operation on pandas dataframes by passing the parameter how="inner"
to the join()
method as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,how="inner")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Marks Grade Class Name
Roll
11 85 A 1 Aditya
12 95 A 1 Chris
13 75 B 1 Sam
14 75 B 1 Joel
16 78 B 1 Samantha
15 55 C 1 Tom
20 72 B 1 Tina
24 92 A 1 Amy
Outer Join DataFrames Using the join() Method
We can perform outer join on pandas dataframes using the join()
method as shown in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,how="outer")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Marks Grade Class Name
Roll
11 85.0 A 1.0 Aditya
12 95.0 A 1.0 Chris
13 75.0 B 1.0 Sam
14 75.0 B 1.0 Joel
15 55.0 C 1.0 Tom
16 78.0 B 1.0 Samantha
17 NaN NaN 1.0 Pablo
19 75.0 B NaN NaN
20 72.0 B 1.0 Tina
24 92.0 A 1.0 Amy
25 95.0 A NaN NaN
30 NaN NaN 1.0 Justin
31 NaN NaN 1.0 Karl
Join DataFrames With Common Column Names
If the input dataframes have common column names, we need to specify the suffixes for the columns of the dataframes using the rsuffix
and lsuffix
parameter as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,on="Roll",how="outer",lsuffix="_left", rsuffix="_right")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 19 75 B
7 20 72 B
8 24 92 A
9 25 95 A
second dataframe is:
Name Marks Grade
Roll
11 Aditya 85 A
12 Chris 95 A
13 Sam 75 B
14 Joel 75 B
16 Tom 78 B
15 Samantha 55 C
20 Tina 72 B
24 Amy 92 A
25 Sinu 95 A
Merged dataframe is:
Roll Marks_left Grade_left Name Marks_right Grade_right
0 11 85 A Aditya 85.0 A
1 12 95 A Chris 95.0 A
2 13 75 B Sam 75.0 B
3 14 75 B Joel 75.0 B
4 16 78 B Tom 78.0 B
5 15 55 C Samantha 55.0 C
6 19 75 B NaN NaN NaN
7 20 72 B Tina 72.0 B
8 24 92 A Amy 92.0 A
9 25 95 A Sinu 95.0 A
In this example, the Marks
and Grade
column are present in both the input dataframes. Due to this we need to specify the suffixes for the column names.
In case you don’t specify the suffixes and the input dataframes have common column names, the program will run into error.
Join Dataframes Using a Column Name as Join Key
If the dataframe on which the join()
method is invoked doesn’t have the join key as the index column, you can specify the column name using the "on"
parameter. However, the dataframe passed as input argument needs to have the column being using as the join key as its index. You can observe this in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,on="Roll",how="outer")
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 19 75 B
7 20 72 B
8 24 92 A
9 25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
17 1 Pablo
20 1 Tina
24 1 Amy
30 1 Justin
31 1 Karl
Merged dataframe is:
Roll Marks Grade Class Name
0.0 11 85.0 A 1.0 Aditya
1.0 12 95.0 A 1.0 Chris
2.0 13 75.0 B 1.0 Sam
3.0 14 75.0 B 1.0 Joel
4.0 16 78.0 B 1.0 Samantha
5.0 15 55.0 C 1.0 Tom
6.0 19 75.0 B NaN NaN
7.0 20 72.0 B 1.0 Tina
8.0 24 92.0 A 1.0 Amy
9.0 25 95.0 A NaN NaN
NaN 17 NaN NaN 1.0 Pablo
NaN 30 NaN NaN 1.0 Justin
NaN 31 NaN NaN 1.0 Karl
Join Multiple DataFrames Using the join() Method
We can also join multiple dataframes using the join()
method. For this, we need to invoke the join()
method on a dataframe and pass other dataframes in a list as input to the join()
method.
Here, we need to make sure that all the dataframes should have the join key as their index column. Additionally, there should not be even a single common column between any two dataframes apart from the index column. Following these two conditions, we can join multiple pandas dataframes using the join()
method as shown below.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.read_csv("height_with_roll.csv",index_col="Roll")
print("Third dataframe is:")
print(df3)
df4=df1.join([df2,df3])
print("Merged dataframe is:")
print(df4)
Output:
First dataframe is:
Marks Grade
Roll
11 85 A
12 95 A
13 75 B
14 75 B
16 78 B
15 55 C
19 75 B
20 72 B
24 92 A
25 95 A
second dataframe is:
Class Name
Roll
11 1 Aditya
12 1 Chris
13 1 Sam
14 1 Joel
15 1 Tom
16 1 Samantha
20 1 Tina
24 1 Amy
Third dataframe is:
Height
Roll
11 170
12 165
13 155
14 180
16 140
15 162
19 175
20 163
24 154
25 161
Merged dataframe is:
Marks Grade Class Name Height
Roll
11 85 A 1.0 Aditya 170
12 95 A 1.0 Chris 165
13 75 B 1.0 Sam 155
14 75 B 1.0 Joel 180
16 78 B 1.0 Samantha 140
15 55 C 1.0 Tom 162
19 75 B NaN NaN 175
20 72 B 1.0 Tina 163
24 92 A 1.0 Amy 154
25 95 A NaN NaN 161
If the input dataframes have common column names, the program will run into ValueError
exception as shown in the following example.
import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.read_csv("height_with_roll.csv",index_col="Roll")
print("Third dataframe is:")
print(df3)
df4=df1.join([df2,df3])
print("Merged dataframe is:")
print(df4)
Output:
ValueError: Indexes have overlapping values: Index(['Marks', 'Grade'], dtype='object')
Conclusion
In this article, we have discussed how to merge, concatenate, and join pandas dataframes in python. . To know more about python programming, you can read this article on how to create a pandas dataframe. You might also like this article on string manipulation in Python.
Recommended Python Training
Course: Python 3 For Beginners
Over 15 hours of video content with guided instruction for beginners. Learn how to create real world applications and master the basics.