Add Column to Pandas DataFrame in Python

Pandas dataframes are used to handle tabular data in Python. Sometimes, we need to create new columns in a dataframe for analysis. This article discusses how to add a column to a pandas dataframe in Python.

Table of Contents

Add An Empty Column to a Pandas DataFrame
1. Create An Empty Column Using a Series in the Pandas DataFrame
Add Columns at The End of a DataFrame in Python
1. Add Columns at The End of a DataFrame Using Direct Assignment
Add Multiple Columns at the End of a Pandas DataFrame
Add Columns to DataFrame Using the assign() Method in Python
Add Columns at a Specific Index in a Pandas DataFrame
Add a Column Based on Another Column in a Pandas DataFrame
Conclusion

Add An Empty Column to a Pandas DataFrame

You might think that we can add an empty list to the pandas dataframe to add an empty column. However, this isn’t true. If you assign an empty list to a dataframe column, the program will run into an error. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=[]
print("The modified dataframe is:")
print(df)

Output:

ValueError: Length of values (0) does not match length of index (6)

In this example, you can observe that we have assigned an empty list to create the "Name" column in the dataframe. As the list is empty, the program runs into a Python ValueError exception.

In contrast, we can assign a scalar value like a string or a number to the dataframe column. In this case, the value is broadcast to all the rows of the dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]="Aditya"
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry    Name
0     1    100       80         90  Aditya
1     2     80      100         90  Aditya
2     3     90       80         70  Aditya
3     4    100      100         90  Aditya
4     5     90       90         80  Aditya
5     6     80       70         70  Aditya

In this example, we have assigned the value "Aditya" to create the "Name" column. Although it’s just a single value, it has been broadcast to all the rows of the dataframe. We will use this property of a dataframe to add an empty column to the pandas dataframe.

To add an empty column to a pandas dataframe, we can use an empty string, the name of the new column, and the python indexing operator using the following syntax.

dataframe[column_name]=""

After executing the above statement, a new empty column will be added to the dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=""
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry Name
0     1    100       80         90     
1     2     80      100         90     
2     3     90       80         70     
3     4    100      100         90     
4     5     90       90         80     
5     6     80       70         70

In the above example, we have assigned an empty string to the "Name" column. The "Name" column in the output dataframe might look empty to you, but it isn’t. Each value in the "Name" column is actually an empty string, which pandas doesn’t treat as a missing value.
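To see the difference between an empty string and a missing value, the following minimal sketch checks the column with isna(). The small dataframe and the variable names all_empty_strings and any_missing are made up for illustration.

```python
import pandas as pd

# Small illustrative dataframe, not the one used in the article.
df = pd.DataFrame({"Roll": [1, 2, 3]})
df["Name"] = ""

# Every row holds a zero-length string ...
all_empty_strings = (df["Name"] == "").all()
# ... and none of the rows is reported as missing.
any_missing = df["Name"].isna().any()

print(all_empty_strings, any_missing)
```

This matters if you later call methods such as dropna() or fillna() on the column, because they act only on missing values and leave empty strings untouched.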

If you want the new column to hold missing values instead of empty strings, you can assign np.nan (after importing numpy as np), pd.NA, or None to the dataframe column as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=pd.NA
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry  Name
0     1    100       80         90  <NA>
1     2     80      100         90  <NA>
2     3     90       80         70  <NA>
3     4    100      100         90  <NA>
4     5     90       90         80  <NA>
5     6     80       70         70  <NA>

In this example, we have passed the pd.NA value instead of the empty string to the "Name" column. You can observe this in the output.
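The three missing-value markers can be compared side by side. The sketch below is only an illustration with made-up column names "A", "B", and "C". It counts the missing entries in each new column and prints the resulting dtypes, which may differ between the markers and between pandas versions.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe, not the one used in the article.
df = pd.DataFrame({"Roll": [1, 2, 3]})

df["A"] = np.nan  # NumPy's floating-point "not a number"
df["B"] = pd.NA   # pandas' own missing-value marker
df["C"] = None    # Python's None

# isna() reports all three markers as missing.
missing_counts = df[["A", "B", "C"]].isna().sum()
print(missing_counts)

# The column dtypes are printed rather than assumed.
print(df.dtypes)
```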

Create An Empty Column Using a Series in the Pandas DataFrame

Instead of assigning a scalar value, you can create an empty series and assign it to a dataframe column to add a new column. Because the empty series has no value for any row label of the dataframe, every row in the new column is filled with NaN. For this, we will create an empty series. Then, we will assign the series to the dataframe column as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Name"]=pd.Series(dtype="float64")  # explicit dtype avoids the default-dtype warning in older pandas versions
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry  Name
0     1    100       80         90   NaN
1     2     80      100         90   NaN
2     3     90       80         70   NaN
3     4    100      100         90   NaN
4     5     90       90         80   NaN
5     6     80       70         70   NaN

In this example, we have created a pandas series using the Series() function. Then, we assigned it to the "Name" column in the dataframe.
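The NaN values appear because pandas aligns an assigned series with the dataframe on index labels instead of on position. The following sketch, with a hypothetical series named partial_names, shows the same alignment when the series has values for only some of the rows.

```python
import pandas as pd

# Small illustrative dataframe with the default index 0, 1, 2, 3.
df = pd.DataFrame({"Roll": [1, 2, 3, 4]})

# Hypothetical partial data: names are known only for index labels 0 and 2.
partial_names = pd.Series(["Aditya", "Sam"], index=[0, 2])

# Rows whose labels are absent from the series receive NaN.
df["Name"] = partial_names

print(df)
```

An empty series is simply the extreme case in which no label matches, so the whole column is filled with NaN.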

Add Columns at The End of a DataFrame in Python

We can add a new column to a dataframe using different approaches. Let us discuss all these approaches one by one.

Add Columns at The End of a DataFrame Using Direct Assignment

To add a new column at the end of a dataframe, you can assign a list of values to a new column name using the following syntax.

dataframe[column_name]=list_of_values

After executing the above statement, the new column will be added to the dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
df["Name"]=names
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry    Name
0     1    100       80         90  Aditya
1     2     80      100         90    Joel
2     3     90       80         70     Sam
3     4    100      100         90   Chris
4     5     90       90         80    Riya
5     6     80       70         70    Anne

In this example, we created the "Name" column in the pandas dataframe by assigning a list of names to the dataframe.

In the above example, if the number of values in the list doesn’t match the number of rows in the dataframe, the program will run into a Python ValueError exception as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya"]
df["Name"]=names
print("The modified dataframe is:")
print(df)

Output:

ValueError: Length of values (5) does not match length of index (6)

In this example, the original dataframe has six rows. However, the list we assigned to the "Name" column has only five values. Due to this, the program runs into a ValueError exception.
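If a short list is expected in your data, one possible way to handle it is to catch the exception and pad the list before assigning it again. This is only a sketch under that assumption. Padding with None is one choice among several, and the small dataframe is made up for illustration.

```python
import pandas as pd

# Small illustrative dataframe with three rows.
df = pd.DataFrame({"Roll": [1, 2, 3]})
names = ["Aditya", "Joel"]  # one value short on purpose

try:
    df["Name"] = names
except ValueError as error:
    print("Could not add column:", error)
    # Pad with None so that the list has one value per row.
    names = names + [None] * (len(df) - len(names))
    df["Name"] = names

print(df)
```

The failed assignment leaves the dataframe unchanged, so the second assignment creates the "Name" column with a missing value in the last row.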

Add Multiple Columns at the End of a Pandas DataFrame

To add multiple columns to the dataframe at the same time, you can write the target columns on the left side of the assignment and a list containing the lists of values on the right side, as shown in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
heights=[180,170,164,177,167,175]
df["Name"],df["Height"]= [names,heights]
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry    Name  Height
0     1    100       80         90  Aditya     180
1     2     80      100         90    Joel     170
2     3     90       80         70     Sam     164
3     4    100      100         90   Chris     177
4     5     90       90         80    Riya     167
5     6     80       70         70    Anne     175

In the above example, we have used unpacking to assign each inner list of a list of lists to its own column in the pandas dataframe.
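The unpacking assignment is equivalent to one direct assignment per column. When the new columns are already collected in a dictionary, a loop may be easier to read. The sketch below assumes a hypothetical dictionary named new_columns and a small made-up dataframe.

```python
import pandas as pd

# Small illustrative dataframe, not the one used in the article.
df = pd.DataFrame({"Roll": [1, 2, 3], "Maths": [100, 80, 90]})

# Hypothetical mapping of new column names to their values.
new_columns = {
    "Name": ["Aditya", "Joel", "Sam"],
    "Height": [180, 170, 164],
}

# Each iteration performs one direct assignment,
# so the columns are appended in dictionary order.
for column_name, values in new_columns.items():
    df[column_name] = values

print(df)
```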

Instead of directly assigning the list to the dataframe, you can use the loc attribute of the dataframe to add a column to the dataframe as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
heights=[180,170,164,177,167,175]
df.loc[:,"Name"],df.loc[:,"Height"]= [names,heights]
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry    Name  Height
0     1    100       80         90  Aditya     180
1     2     80      100         90    Joel     170
2     3     90       80         70     Sam     164
3     4    100      100         90   Chris     177
4     5     90       90         80    Riya     167
5     6     80       70         70    Anne     175

Add Columns to DataFrame Using the assign() Method in Python

Pandas dataframes also provide the assign() method to add a new column. The assign() method, when invoked on a dataframe, takes the name of the new column as a keyword parameter and the list of values as its argument. After execution, it returns a new dataframe that contains the added column. The original dataframe isn’t modified.

You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
df=df.assign(Name=names)
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry    Name
0     1    100       80         90  Aditya
1     2     80      100         90    Joel
2     3     90       80         70     Sam
3     4    100      100         90   Chris
4     5     90       90         80    Riya
5     6     80       70         70    Anne

In the above example, we have passed the list of names as the input to the Name parameter in the assign() method. Hence, the assign() method adds a new column with the column name "Name" to the dataframe and returns the modified dataframe.
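Note that assign() leaves the dataframe it is invoked on unchanged, which is why the example stores the result back in df. The following sketch keeps both dataframes to make this visible. The name with_names and the small dataframe are made up for illustration.

```python
import pandas as pd

# Small illustrative dataframe, not the one used in the article.
df = pd.DataFrame({"Roll": [1, 2, 3]})

# assign() returns a new dataframe with the extra column.
with_names = df.assign(Name=["Aditya", "Joel", "Sam"])

print("Name" in df.columns)          # checks the original dataframe
print("Name" in with_names.columns)  # checks the returned dataframe
```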

You can also add multiple columns to the dataframe using the assign() method. For this, pass each new column name as a keyword parameter with its list of values as the argument. After execution of the assign() method, we will get the modified dataframe as shown in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
heights=[180,170,164,177,167,175]
df=df.assign(Name=names, Height=heights)
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry    Name  Height
0     1    100       80         90  Aditya     180
1     2     80      100         90    Joel     170
2     3     90       80         70     Sam     164
3     4    100      100         90   Chris     177
4     5     90       90         80    Riya     167
5     6     80       70         70    Anne     175
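The assign() method also accepts a function instead of a list, and column names that aren’t valid Python identifiers can be passed through dictionary unpacking. The sketch below illustrates both on a small made-up dataframe. The "Total" column and the grade values are assumptions for the example.

```python
import pandas as pd

# Small illustrative dataframe, not the one used in the article.
df = pd.DataFrame(
    {"Maths": [100, 80], "Physics": [80, 100], "Chemistry": [90, 90]}
)

df = df.assign(
    # A callable receives the dataframe and returns the new column.
    Total=lambda d: d["Maths"] + d["Physics"] + d["Chemistry"],
    # Dictionary unpacking allows a column name that contains a space.
    **{"Physics Grade": ["C", "A"]},
)

print(df)
```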

Add Columns at a Specific Index in a Pandas DataFrame

In all the above examples, the new column is added at the end of the dataframe. However, we can also add a column at a specified position in the dataframe. To add a column to a dataframe at a specific index, we can use the insert() method. It has the following syntax.

dataframe.insert(index, column_name, column_values)

Here,

The index parameter takes the zero-based position at which the new column has to be inserted. It must lie between 0 and the current number of columns.
The column_name parameter takes the name of the column to be inserted into the dataframe.
The column_values parameter takes the values of the new column, for example a list, a series, or a scalar.

After execution, the insert() method inserts a new column into the dataframe at the specified position. Remember that insert() modifies the original dataframe in place and returns None, so you shouldn’t assign its result back to the dataframe variable.

To add a column at a specific index in a pandas dataframe, we will invoke the insert() method on the dataframe. Here, we will pass the index, column name, and values as the first, second, and third input arguments to the insert() method respectively. After execution of the insert() method, the dataframe contains the new column at the given position.

You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
names=["Aditya","Joel", "Sam", "Chris", "Riya", "Anne"]
df.insert(2, "Name", names)
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths    Name  Physics  Chemistry
0     1    100  Aditya       80         90
1     2     80    Joel      100         90
2     3     90     Sam       80         70
3     4    100   Chris      100         90
4     5     90    Riya       90         80
5     6     80    Anne       70         70
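Two details of insert() are worth trying out: position 0 places the new column first, and inserting a column name that already exists raises a ValueError by default. The following sketch demonstrates both on a small made-up dataframe. The flag duplicate_rejected is a hypothetical name used only to record the outcome.

```python
import pandas as pd

# Small illustrative dataframe, not the one used in the article.
df = pd.DataFrame({"Roll": [1, 2, 3], "Maths": [100, 80, 90]})
names = ["Aditya", "Joel", "Sam"]

# Position 0 places the new column before all existing columns.
df.insert(0, "Name", names)

# Inserting the same column name again is rejected by default.
try:
    df.insert(1, "Name", names)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df.columns.tolist(), duplicate_rejected)
```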

Add a Column Based on Another Column in a Pandas DataFrame

Instead of a completely new column, we can also add a column based on an existing column in a dataframe. For this, we can use the apply() method. The pandas apply method, when invoked on the column of a dataframe, takes a function as its input argument. It then executes the function with each value of the column as its input and creates a new series object using the function outputs.

We can then assign the series object returned by the apply() method to add a new column based on another column in the pandas dataframe as shown below.

import pandas as pd
def grade_calculator(marks):
    if marks>90:
        return "A"
    elif marks>80:
        return "B"
    else:
        return "C"
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Physics Grade"]=df["Physics"].apply(grade_calculator)
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry Physics Grade
0     1    100       80         90             C
1     2     80      100         90             A
2     3     90       80         70             C
3     4    100      100         90             A
4     5     90       90         80             B
5     6     80       70         70             C

Instead of using the apply() method, we can also use the map() method to add a new column in a dataframe based on an existing column as shown in the following example.

import pandas as pd
def grade_calculator(marks):
    if marks>90:
        return "A"
    elif marks>80:
        return "B"
    else:
        return "C"
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Physics Grade"]=df["Physics"].map(grade_calculator)
print("The modified dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The modified dataframe is:
   Roll  Maths  Physics  Chemistry Physics Grade
0     1    100       80         90             C
1     2     80      100         90             A
2     3     90       80         70             C
3     4    100      100         90             A
4     5     90       90         80             B
5     6     80       70         70             C
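For larger dataframes, calling a Python function once per value can be slow. As an alternative to apply() and map(), and not something the examples above use, the same grading rule can be written with NumPy's select() function, which evaluates the conditions on the whole column at once. The names conditions and grades are made up for this sketch.

```python
import numpy as np
import pandas as pd

# The "Physics" marks from the article's example dataframe.
df = pd.DataFrame({"Physics": [80, 100, 80, 100, 90, 70]})

# Conditions are checked in order; the first one that holds decides the grade.
conditions = [df["Physics"] > 90, df["Physics"] > 80]
grades = ["A", "B"]

# Rows that satisfy no condition receive the default grade "C".
df["Physics Grade"] = np.select(conditions, grades, default="C")

print(df)
```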

Conclusion

In this article, we have discussed different methods to add a column to a pandas dataframe. To learn more about pandas dataframes, you can read this article on how to check for not null values in pandas. You might also like this article on how to select multiple columns in a pandas dataframe.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!
