We use dataframes in Python to handle and analyze tabular data. In this article, we will discuss how to concatenate two or more dataframes in Python.
How to Concatenate DataFrames in Python?
To concatenate two or more dataframes in Python, we can use the concat() function defined in the pandas module. The concat() function takes a list of dataframes as its first argument and, by default, concatenates them vertically, stacking the rows of each dataframe below the rows of the previous one.
We can also concatenate dataframes horizontally using the axis parameter of concat(). The axis parameter has a default value of 0, which means the dataframes are concatenated vertically. To concatenate the dataframes horizontally, pass the value 1 to the axis parameter.
After execution, concat() returns the resultant dataframe.
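To make the effect of the axis parameter concrete, here is a minimal sketch. The two small dataframes, named left and right, are made up for illustration and are not part of the original examples; the sketch only compares the shapes of the results.

```python
import pandas as pd

# Two small, made-up dataframes with the same columns and the same index labels (0 and 1).
left = pd.DataFrame({"Roll": [11, 12], "Marks": [85, 95]})
right = pd.DataFrame({"Roll": [13, 14], "Marks": [75, 78]})

# axis=0 (the default) stacks the rows: 4 rows, 2 columns.
stacked = pd.concat([left, right])
print(stacked.shape)  # (4, 2)

# axis=1 places the dataframes side by side: 2 rows, 4 columns.
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side.shape)  # (2, 4)
```

Vertical concatenation grows the number of rows, while horizontal concatenation grows the number of columns.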
Concatenate DataFrames Vertically in Python
To concatenate two dataframes vertically, first import the pandas module using the import statement. After that, you can concatenate the dataframes using the concat() function as follows.
import pandas as pd
# Read the two dataframes from CSV files
df1 = pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2 = pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
# Stack the rows of df2 below the rows of df1 (axis=0 is the default)
df3 = pd.concat([df1, df2])
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
second dataframe is:
Class Roll Name Marks Grade
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
Merged dataframe is:
Class Roll Name Marks Grade
0 1 11 Aditya 85 A
1 1 12 Chris 95 A
2 1 14 Sam 75 B
3 1 16 Aditya 78 B
4 1 15 Harry 55 C
5 2 1 Joel 68 B
6 2 22 Tom 73 B
7 2 15 Golu 79 B
0 2 27 Harsh 55 C
1 2 23 Clara 78 B
2 3 33 Tina 82 A
3 3 34 Amy 88 A
4 3 15 Prashant 78 B
5 3 27 Aditya 55 C
6 3 23 Radheshyam 78 B
7 3 11 Bobby 50 D
If all the dataframes have the same column names, the resultant dataframe has the same columns as the input dataframes. You can observe this in the example above.
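In the merged output above, the index labels 0 to 7 appear twice, because concat() keeps the original index of each input dataframe. If you prefer a fresh index numbered from 0, you can pass ignore_index=True. The sketch below uses two tiny in-memory dataframes built from a few rows of the example data in place of the CSV files, so it runs without grade1.csv or grade2.csv.

```python
import pandas as pd

# Small in-memory stand-ins for grade1.csv and grade2.csv (a few rows from the example output).
df1 = pd.DataFrame({"Roll": [11, 12], "Name": ["Aditya", "Chris"], "Marks": [85, 95]})
df2 = pd.DataFrame({"Roll": [27, 23], "Name": ["Harsh", "Clara"], "Marks": [55, 78]})

# Default behaviour: the original index labels are kept, so 0 and 1 repeat.
kept = pd.concat([df1, df2])
print(kept.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True discards the input labels and builds a new 0..n-1 index.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Duplicate index labels are harmless for printing, but they can be surprising when you later select rows with df3.loc[0], which returns one row per matching label.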
However, if one dataframe lacks a column that the other dataframes have, that column will contain NaN in the resultant dataframe for the rows that came from that dataframe. You can observe this in the following example.
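import pandas as pd
# df1 has the columns Roll, Marks, and Grade
df1 = pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
# df2 has an additional Name column
df2 = pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
# Rows that come from df1 get NaN in the Name column, which only df2 has
df3 = pd.concat([df1, df2])
print("Merged dataframe is:")
print(df3)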
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 20 72 B
7 24 92 A
second dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
Merged dataframe is:
Roll Marks Grade Name
0 11 85 A NaN
1 12 95 A NaN
2 13 75 B NaN
3 14 75 B NaN
4 16 78 B NaN
5 15 55 C NaN
6 20 72 B NaN
7 24 92 A NaN
0 11 85 A Aditya
1 12 95 A Chris
2 13 75 B Sam
3 14 75 B Joel
4 16 78 B Tom
5 15 55 C Samantha
6 20 72 B Tina
7 24 92 A Amy
In general, when the dataframes have different column names, the resultant dataframe contains a separate column for every distinct column name. For rows that come from a dataframe that does not have a given column, the value in that column is NaN.
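This NaN-filling comes from the join parameter of concat(), which defaults to "outer" and keeps the union of all columns. If you only want the columns shared by every dataframe, you can pass join="inner". The following sketch uses two-row, in-memory stand-ins for grade_with_roll.csv and grade_with_name.csv, taken from the example output above, so it runs without the CSV files.

```python
import pandas as pd

# In-memory stand-ins for the first two rows of grade_with_roll.csv and grade_with_name.csv.
df1 = pd.DataFrame({"Roll": [11, 12], "Marks": [85, 95], "Grade": ["A", "A"]})
df2 = pd.DataFrame({"Roll": [11, 12], "Name": ["Aditya", "Chris"],
                    "Marks": [85, 95], "Grade": ["A", "A"]})

# join="outer" (the default) keeps every column; Name is NaN for rows from df1.
outer = pd.concat([df1, df2])
print(outer.columns.tolist())  # ['Roll', 'Marks', 'Grade', 'Name']

# join="inner" keeps only the columns present in all inputs, so Name is dropped.
inner = pd.concat([df1, df2], join="inner")
print(inner.columns.tolist())  # ['Roll', 'Marks', 'Grade']
```

The inner join avoids NaN values at the cost of discarding columns, so it suits cases where the extra columns are not needed.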
Concatenate DataFrames Horizontally in Python
To concatenate dataframes horizontally, pass the value 1 to the axis parameter of the concat() function. After execution, concat() returns the horizontally concatenated dataframe, as shown below.
import pandas as pd
df1 = pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2 = pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
# axis=1 places the columns of df2 to the right of the columns of df1,
# aligning the rows by their index labels
df3 = pd.concat([df1, df2], axis=1)
print("Merged dataframe is:")
print(df3)
Output:
First dataframe is:
Roll Marks Grade
0 11 85 A
1 12 95 A
2 13 75 B
3 14 75 B
4 16 78 B
5 15 55 C
6 20 72 B
7 24 92 A
second dataframe is:
Roll Name Marks Grade
0 11 Aditya 85 A
1 12 Chris 95 A
2 13 Sam 75 B
3 14 Joel 75 B
4 16 Tom 78 B
5 15 Samantha 55 C
6 20 Tina 72 B
7 24 Amy 92 A
Merged dataframe is:
Roll Marks Grade Roll Name Marks Grade
0 11 85 A 11 Aditya 85 A
1 12 95 A 12 Chris 95 A
2 13 75 B 13 Sam 75 B
3 14 75 B 14 Joel 75 B
4 16 78 B 16 Tom 78 B
5 15 55 C 15 Samantha 55 C
6 20 72 B 20 Tina 72 B
7 24 92 A 24 Amy 92 A
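Notice that the result contains duplicate column names (Roll, Marks, and Grade appear twice), because concat() does not rename overlapping columns. With axis=1, pandas aligns the rows of the dataframes by their index labels. If the dataframes have the same index labels, as in the above example, the resultant dataframe will not contain any new NaN values. However, if one dataframe has fewer rows than the other, the index labels missing from it get NaN values in its columns. This happens because the join parameter defaults to "outer", which keeps the union of all index labels; setting join="inner" keeps only the labels common to all dataframes.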
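Here is a minimal sketch of horizontal concatenation with unequal row counts. The dataframes left and right are made up for illustration: left has index labels 0 to 2, while right only has labels 0 and 1.

```python
import pandas as pd

# Made-up dataframes: left has three rows (index 0-2), right has two (index 0-1).
left = pd.DataFrame({"Roll": [11, 12, 13]})
right = pd.DataFrame({"Name": ["Aditya", "Chris"]})

# join="outer" (the default) keeps every index label; row 2 gets NaN for Name.
outer = pd.concat([left, right], axis=1)
print(outer)
print(outer.shape)  # (3, 2)

# join="inner" keeps only index labels present in both, so row 2 is dropped.
inner = pd.concat([left, right], axis=1, join="inner")
print(inner)
print(inner.shape)  # (2, 2)
```

Because alignment is by index label rather than by position, dataframes with the same number of rows but different index labels would also produce NaN values under the default outer join.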
Conclusion
In this article, we have discussed how to concatenate pandas dataframes in Python, both vertically and horizontally. To concatenate more than two dataframes, just add the extra dataframes to the list that you pass to the concat() function.
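As a short sketch of that, the three dataframes below (class1, class2, and class3 are hypothetical names, filled with a few rows from the earlier example data) are concatenated in a single call. ignore_index=True is used here only to give the result a clean 0..n-1 index.

```python
import pandas as pd

# Three small dataframes with the same columns (rows taken from the earlier example data).
class1 = pd.DataFrame({"Class": [1, 1], "Name": ["Aditya", "Chris"]})
class2 = pd.DataFrame({"Class": [2, 2], "Name": ["Joel", "Tom"]})
class3 = pd.DataFrame({"Class": [3, 3], "Name": ["Tina", "Amy"]})

# The dataframes are stacked in list order; ignore_index renumbers the rows.
combined = pd.concat([class1, class2, class3], ignore_index=True)
print(combined.shape)  # (6, 2)
print(combined["Class"].tolist())  # [1, 1, 2, 2, 3, 3]
```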