Pandas dataframes are used to handle tabular data in Python. We often need to create new columns in a dataframe for analysis. This article discusses how to add a column to a pandas dataframe in Python.
- Add An Empty Column to a Pandas DataFrame
- Add Columns at The End of a DataFrame in Python
- Add Multiple Columns at the End of a Pandas DataFrame
- Add Columns to DataFrame Using the assign() Method in Python
- Add Columns at a Specific Index in a Pandas DataFrame
- Add a Column Based on Another Column in a Pandas DataFrame
- Conclusion
Add An Empty Column to a Pandas DataFrame
You might think that we can add an empty list to the pandas dataframe to add an empty column. However, this isn’t true. If you assign an empty list to a dataframe column, the program will run into an error. You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=[]
print("The modified dataframe is:")
print(df)
Output:
ValueError: Length of values (0) does not match length of index (6)
In this example, we assigned an empty list to create the "Name" column in the dataframe. Because the list has zero values while the dataframe has six rows, the program runs into a Python ValueError exception.
In contrast, we can assign a scalar value like a string or a number to the dataframe column. In this case, the value is broadcast to all the rows of the dataframe. You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]="Aditya"
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name
0 1 100 80 90 Aditya
1 2 80 100 90 Aditya
2 3 90 80 70 Aditya
3 4 100 100 90 Aditya
4 5 90 90 80 Aditya
5 6 80 70 70 Aditya
In this example, we assigned the value "Aditya" to create the "Name" column. Although it's just a single value, it has been broadcast to all the rows of the dataframe. We will use this property of a dataframe to add an empty column to the pandas dataframe.
To add an empty column to a pandas dataframe, we can use an empty string, the name of the new column, and the python indexing operator using the following syntax.
dataframe[column_name]=""
After executing the above statement, a new empty column will be added to the dataframe. You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=""
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
In the above example, we have assigned an empty string to the "Name" column. The "Name" column in the output dataframe might look empty, but it isn't. Each value in the "Name" column is actually an empty string, which pandas treats as a regular value and not as a missing one.
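The difference between an empty string and a missing value can be checked directly. The following is a minimal sketch on a small, made-up dataframe; the variable names missing_count and empty_string_count are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3], "Maths": [100, 80, 90]})
df["Name"] = ""

# An empty string is a real value, so isna() does not flag it as missing.
missing_count = df["Name"].isna().sum()
# Every row holds the empty string that was broadcast to the column.
empty_string_count = (df["Name"] == "").sum()

print("Missing values:", missing_count)
print("Empty strings:", empty_string_count)
```

This matters if you later call methods such as dropna() or fillna(), because they will not touch a column filled with empty strings.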
If you want the new column to hold missing values instead of empty strings, you can assign np.nan, pd.NA, or None to the dataframe column as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=pd.NA
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name
0 1 100 80 90 <NA>
1 2 80 100 90 <NA>
2 3 90 80 70 <NA>
3 4 100 100 90 <NA>
4 5 90 90 80 <NA>
5 6 80 70 70 <NA>
In this example, we have assigned the pd.NA value instead of the empty string to the "Name" column. Each row of the new column is shown as <NA> in the output.
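For comparison, here is a small sketch that assigns all three missing-value markers side by side. The column names Name_nan, Name_na, and Name_none are made up for this illustration, and the exact dtypes printed may vary between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3], "Maths": [100, 80, 90]})

df["Name_nan"] = np.nan   # floating-point NaN
df["Name_na"] = pd.NA     # pandas' own missing-value marker
df["Name_none"] = None    # Python's None

# isna() treats all three markers as missing.
all_missing = df[["Name_nan", "Name_na", "Name_none"]].isna().all().all()

print(df.dtypes)
print("All new values are missing:", all_missing)
```

All three columns count as missing for isna(), but the np.nan column is stored as a float column, so the choice affects the column's data type.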
Create An Empty Column Using a Series in the Pandas DataFrame
Instead of assigning a scalar value, you can create an empty series and assign it to a column of the dataframe to add a new column. For this, we will first create an empty series. Then, we will assign the series to the dataframe column as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=pd.Series()
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name
0 1 100 80 90 NaN
1 2 80 100 90 NaN
2 3 90 80 70 NaN
3 4 100 100 90 NaN
4 5 90 90 80 NaN
5 6 80 70 70 NaN
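In this example, we have created an empty pandas series using the Series() function. Then, we assigned it to the "Name" column in the dataframe. Unlike a list, a series is aligned with the dataframe by its index labels during assignment. As the empty series has no labels that match the rows of the dataframe, every row in the new column gets the NaN value.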
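The index alignment that happens when a series is assigned can be seen more clearly with a series that covers only some rows. This is a minimal sketch on a made-up four-row dataframe; partial_names is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3, 4], "Maths": [100, 80, 90, 100]})

# The series only has values for index labels 0 and 2.
partial_names = pd.Series(["Aditya", "Sam"], index=[0, 2])

# Rows 0 and 2 receive the names; rows 1 and 3 have no match and get NaN.
df["Name"] = partial_names
print(df)
```

An empty series is simply the extreme case of this behavior, where no row has a matching label.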
Add Columns at The End of a DataFrame in Python
We can add a new column to a dataframe using different approaches. Let us discuss all these approaches one by one.
Add Columns at The End of a DataFrame Using Direct Assignment
To assign a new column to a dataframe, you can assign a list of values to the dataframe using the following syntax.
dataframe[column_name]=list_of_values
After executing the above statement, the new column will be added to the dataframe. You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
df["Name"]=names
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name
0 1 100 80 90 Aditya
1 2 80 100 90 Joel
2 3 90 80 70 Sam
3 4 100 100 90 Chris
4 5 90 90 80 Riya
5 6 80 70 70 Anne
In this example, we created the "Name"
column in the pandas dataframe by assigning a list of names to the dataframe.
In the above example, if the list of values doesn’t contain an equal number of values as the rows of the dataframe, the program will run into a python ValueError exception as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya"]
df["Name"]=names
print("The modified dataframe is:")
print(df)
Output:
ValueError: Length of values (5) does not match length of index (6)
In this example, the original dataframe has six rows. However, the list we assigned to the "Name" column has only five values. Due to this, the program runs into a ValueError exception.
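If a shorter list is expected and the remaining rows should simply stay empty, one possible workaround is to wrap the list in a series before assigning it. The sketch below assumes the dataframe uses the default 0, 1, 2, ... index; with a custom index, the labels would not line up in the same way.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3, 4, 5, 6]})
names = ["Aditya", "Joel", "Sam", "Chris", "Riya"]  # one value short

# pd.Series(names) gets index labels 0..4. Assignment aligns on the index,
# so row 5 has no matching label and receives NaN instead of raising an error.
df["Name"] = pd.Series(names)
print(df)
```

Whether silently padding with NaN is desirable depends on your data. The ValueError from a plain list is often the safer signal that something is wrong.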
Add Multiple Columns at the End of a Pandas DataFrame
To add multiple columns to the dataframe at the same time, you can assign a list containing the lists of values to several dataframe columns in a single statement, as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
heights=[180,170,164,177,167,175]
df["Name"],df["Height"]= [names,heights]
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name Height
0 1 100 80 90 Aditya 180
1 2 80 100 90 Joel 170
2 3 90 80 70 Sam 164
3 4 100 100 90 Chris 177
4 5 90 90 80 Riya 167
5 6 80 70 70 Anne 175
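In the above example, we have used unpacking to assign a list of lists to multiple columns in the pandas dataframe. Python unpacks the outer list, so the first inner list is assigned to the "Name" column and the second inner list is assigned to the "Height" column.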
Instead of directly assigning the list to the dataframe, you can use the loc attribute of the dataframe to add a column to the dataframe as shown below.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
heights=[180,170,164,177,167,175]
df.loc[:,"Name"],df.loc[:,"Height"]= [names,heights]
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name Height
0 1 100 80 90 Aditya 180
1 2 80 100 90 Joel 170
2 3 90 80 70 Sam 164
3 4 100 100 90 Chris 177
4 5 90 90 80 Riya 167
5 6 80 70 70 Anne 175
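As an alternative to unpacking, several columns can also be appended in one step with pd.concat(). This is a different technique from the ones shown above, sketched here on a small made-up dataframe; new_columns is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3], "Maths": [100, 80, 90]})

# Build the new columns as their own dataframe. Reusing df.index makes sure
# the rows line up when the two dataframes are joined side by side.
new_columns = pd.DataFrame(
    {"Name": ["Aditya", "Joel", "Sam"], "Height": [180, 170, 164]},
    index=df.index,
)

# axis=1 concatenates along the columns, so the new columns land at the end.
df = pd.concat([df, new_columns], axis=1)
print(df)
```

This approach can be convenient when the new columns already exist as a dataframe, for example after a separate computation.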
Add Columns to DataFrame Using the assign() Method in Python
The pandas module provides us with the assign() method to add a new column to a dataframe. The assign() method, when invoked on a dataframe, takes the name of the new column as a keyword argument and the list containing the values as the value of that argument. After execution, it returns a new dataframe that contains the added column. The original dataframe isn't modified, so we need to store the returned dataframe in a variable.
You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
df=df.assign(Name=names)
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name
0 1 100 80 90 Aditya
1 2 80 100 90 Joel
2 3 90 80 70 Sam
3 4 100 100 90 Chris
4 5 90 90 80 Riya
5 6 80 70 70 Anne
In the above example, we have passed the list of names as the input to the Name parameter in the assign() method. Hence, the assign() method adds a new column with the column name "Name" to the dataframe and returns the modified dataframe.
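Note that assign() does not change the dataframe it is invoked on. The following minimal sketch, using a made-up three-row dataframe and the illustrative name with_names, checks this.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3], "Maths": [100, 80, 90]})

# assign() returns a new dataframe; df itself is left untouched.
with_names = df.assign(Name=["Aditya", "Joel", "Sam"])

print("Name" in df.columns)          # False
print("Name" in with_names.columns)  # True
```

This is why the examples in this section reassign the result using df=df.assign(...).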
You can also add multiple columns to the dataframe using the assign() method. For this, we can pass each column name as a keyword argument and the associated list of values as its value to the assign() method. After execution of the assign() method, we will get the modified dataframe as shown in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
heights=[180,170,164,177,167,175]
df=df.assign(Name=names, Height=heights)
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Name Height
0 1 100 80 90 Aditya 180
1 2 80 100 90 Joel 170
2 3 90 80 70 Sam 164
3 4 100 100 90 Chris 177
4 5 90 90 80 Riya 167
5 6 80 70 70 Anne 175
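Besides lists, assign() also accepts functions. In that case, the function receives the dataframe and its result becomes the new column. The sketch below uses made-up column names Total and Average, and it relies on the fact that later keyword arguments can refer to columns created by earlier ones in the same call.

```python
import pandas as pd

df = pd.DataFrame({
    "Roll": [1, 2, 3],
    "Maths": [100, 80, 90],
    "Physics": [80, 100, 80],
    "Chemistry": [90, 90, 70],
})

result = df.assign(
    # Each lambda receives the dataframe being built.
    Total=lambda d: d["Maths"] + d["Physics"] + d["Chemistry"],
    # "Total" already exists at this point, so it can be reused here.
    Average=lambda d: d["Total"] / 3,
)
print(result)
```

This style is handy in method chains, where the intermediate dataframe has no variable name of its own.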
Add Columns at a Specific Index in a Pandas DataFrame
In all the above examples, the new column is added at the end of the dataframe. However, we can also add a column at a specified position in the dataframe. To add a column to a dataframe at a specific index, we can use the insert() method. It has the following syntax.
dataframe.insert(index, column_name, column_values)
Here,
- The index parameter takes the position at which the new column has to be inserted as its input argument.
- The column_name parameter takes the name of the column to be inserted into the dataframe.
- The column_values parameter takes the list containing the values in the new column.
After execution, the insert() method inserts a new column into the dataframe at the specified position. Remember that the original dataframe is modified when we add a column to a dataframe using the insert() method.
To add a column at a specific index in a pandas dataframe, we will invoke the insert() method on the dataframe. Here, we will pass the index, column name, and values as the first, second, and third input arguments to the insert() method respectively. After execution of the insert() method, the column is added to the dataframe in place.
You can observe this in the following example.
import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
df.insert(2, "Name", names)
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Name Physics Chemistry
0 1 100 Aditya 80 90
1 2 80 Joel 100 90
2 3 90 Sam 80 70
3 4 100 Chris 100 90
4 5 90 Riya 90 80
5 6 80 Anne 70 70
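A few more details of insert() are worth trying out: position 0 puts the column first, a position equal to the number of columns puts it last, and inserting a column name that already exists raises a ValueError by default. The following is a minimal sketch on a made-up dataframe; duplicate_rejected is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3], "Maths": [100, 80, 90]})

# Position 0 makes "Name" the first column.
df.insert(0, "Name", ["Aditya", "Joel", "Sam"])
# A position equal to the current number of columns appends at the end.
df.insert(len(df.columns), "Height", [180, 170, 164])

try:
    # "Name" already exists, so pandas refuses to insert it again.
    df.insert(1, "Name", ["A", "B", "C"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(list(df.columns))
print("Duplicate rejected:", duplicate_rejected)
```

Because insert() modifies the dataframe in place, there is no need to reassign its result to a variable.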
Add a Column Based on Another Column in a Pandas DataFrame
Instead of a completely new column, we can also add a column based on an existing column in a dataframe. For this, we can use the apply() method. The pandas apply() method, when invoked on a column of a dataframe, takes a function as its input argument. It then executes the function with each value of the column as its input and creates a new series object using the function outputs.
We can then assign the series object returned by the apply() method to add a new column based on another column in the pandas dataframe as shown below.
import pandas as pd
def grade_calculator(marks):
    if marks>90:
        return "A"
    elif marks>80:
        return "B"
    else:
        return "C"
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Physics Grade"]=df["Physics"].apply(grade_calculator)
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Physics Grade
0 1 100 80 90 C
1 2 80 100 90 A
2 3 90 80 70 C
3 4 100 100 90 A
4 5 90 90 80 B
5 6 80 70 70 C
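For larger dataframes, the same grading rule can be expressed without calling a Python function for every value. The sketch below swaps in NumPy's np.select() function, which is a different technique from apply(); it uses the same thresholds as grade_calculator() on a reduced copy of the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Roll": [1, 2, 3, 4, 5, 6],
    "Physics": [80, 100, 80, 100, 90, 70],
})

# Conditions are checked in order; the first one that is True wins.
conditions = [df["Physics"] > 90, df["Physics"] > 80]
choices = ["A", "B"]

# Rows that match no condition fall back to the default grade "C".
df["Physics Grade"] = np.select(conditions, choices, default="C")
print(df)
```

The resulting grades match the output of the apply() example, since both use the same cut-offs.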
Instead of using the apply() method, we can also use the map() method to add a new column in a dataframe based on an existing column as shown in the following example.
import pandas as pd
def grade_calculator(marks):
    if marks>90:
        return "A"
    elif marks>80:
        return "B"
    else:
        return "C"
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Physics Grade"]=df["Physics"].map(grade_calculator)
print("The modified dataframe is:")
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The modified dataframe is:
Roll Maths Physics Chemistry Physics Grade
0 1 100 80 90 C
1 2 80 100 90 A
2 3 90 80 70 C
3 4 100 100 90 A
4 5 90 90 80 B
5 6 80 70 70 C
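The map() method also accepts a dictionary instead of a function, which is convenient when the new column is a simple lookup on an existing one. In the following sketch, the "Remark" column and the remark texts are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3], "Physics Grade": ["C", "A", "B"]})

# Hypothetical lookup table from grade to remark.
remarks = {"A": "Excellent", "B": "Good", "C": "Needs improvement"}

# Each grade is replaced by its dictionary value; grades missing from
# the dictionary would become NaN.
df["Remark"] = df["Physics Grade"].map(remarks)
print(df)
```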
Conclusion
In this article, we have discussed different methods to add a column to a pandas dataframe. To learn more about pandas dataframes, you can read this article on how to check for not null values in pandas. You might also like this article on how to select multiple columns in a pandas dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!