Pandas dataframes provide us with various methods to perform data manipulation. Two of those methods are the map() method and the apply() method. This article discusses pandas map vs apply to compare both methods.
The map() Method
The pandas map() method executes a function on a pandas series or on a single column of a dataframe. When invoked on a series, the map() method takes a function, another series, or a Python dictionary as its input argument.
- If we pass a function to the map() method, the function is executed on each element of the series, and a new series is created from the outputs.
- If we pass a dictionary to the map() method, the keys of the dictionary should be the current elements of the series and the values should be the desired replacements. The map() method returns a new series in which each element is replaced according to the dictionary. Elements that have no matching key are mapped to NaN.
- If we pass another series to the map() method, the index of the input series should contain the current values of the series and its elements should be the desired replacements. The map() method returns a new series in which each element is replaced according to the input series.
The following example passes a function to the map() method.
import pandas as pd
import numpy as np
series=pd.Series([1,2,3,4,5,6,7])
print("The series is:")
print(series)
series=series.map(np.sqrt)
print("The modified series is:")
print(series)
Output:
The series is:
0 1
1 2
2 3
3 4
4 5
5 6
6 7
dtype: int64
The modified series is:
0 1.000000
1 1.414214
2 1.732051
3 2.000000
4 2.236068
5 2.449490
6 2.645751
dtype: float64
In the above example, we first created a series using the Series() function. Then, we passed the numpy.sqrt function to the map() method. You can observe that the function is applied to each element of the input series, and the results form the output series.
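The example above covers only the function case. The dictionary and series cases can be sketched in the same way. The grade data and the variable names below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data used only for illustration.
grades = pd.Series(["A", "B", "C", "A"])

# Mapping with a dictionary: keys are the current values,
# values are the replacements.
grade_points = {"A": 4, "B": 3, "C": 2}
mapped_by_dict = grades.map(grade_points)
print(mapped_by_dict)

# Mapping with another series: its index holds the current values
# and its elements hold the replacements.
lookup = pd.Series(["Excellent", "Good", "Fair"], index=["A", "B", "C"])
mapped_by_series = grades.map(lookup)
print(mapped_by_series)

# A value without a matching key is mapped to NaN.
partial = grades.map({"A": 4})
print(partial)
```

In each case the original series is left unchanged and a new series is returned.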
The map() method applies its function to each element of the series, so you cannot use aggregate functions with it. If we pass an aggregate function such as sum() to the map() method, the program will run into an error. You can observe this in the following example.
import pandas as pd
series=pd.Series([1,2,3,4,5,6,7])
print("The series is:")
print(series)
series=series.map(sum)
print("The modified series is:")
print(series)
Output:
The series is:
0 1
1 2
2 3
3 4
4 5
5 6
6 7
dtype: int64
TypeError: 'int' object is not iterable
In this example, we passed the sum() function to the map() method. You can observe that the program runs into a Python TypeError exception because sum() receives a single integer from the series, and an integer is not iterable.
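If the goal is simply to aggregate a series, one option is to call the aggregation on the series itself instead of going through map(). The following is a minimal sketch that catches the error and then uses the Series.sum() method. The variable names are assumptions made for this example.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5, 6, 7])

# map() hands sum() one element at a time, which raises a TypeError.
try:
    series.map(sum)
    error_message = None
except TypeError as error:
    error_message = str(error)
print("map(sum) failed with:", error_message)

# Aggregating the whole series works with the Series.sum() method.
total = series.sum()
print("The sum of the series is:", total)
```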
The apply() Method
We use the pandas apply() method to apply functions on a series or a dataframe. When invoked on a series, the apply() method takes a function as its input and executes it on each element. When invoked on a dataframe, the apply() method passes each column (or each row, if axis=1 is specified) to the function as a series.
- If the function is vectorized, as the numpy.sqrt function is, it operates on every value in the column it receives, so the output has the same shape as the input dataframe.
- If the function is an aggregate function such as the sum function, it reduces each column or row to a single value.
You can observe the above behavior in the following code.
import pandas as pd
import numpy as np
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The output dataframe is:")
df=df.apply(np.sqrt)
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The output dataframe is:
Roll Maths Physics Chemistry
0 1.000000 10.000000 8.944272 9.486833
1 1.414214 8.944272 10.000000 9.486833
2 1.732051 9.486833 8.944272 8.366600
3 2.000000 10.000000 10.000000 9.486833
4 2.236068 9.486833 9.486833 8.944272
5 2.449490 8.944272 8.366600 8.366600
In this example, we passed the numpy.sqrt function to the apply() method. You can observe that the function is applied to all the elements of the dataframe to produce the output. This is because the apply() method passes each column to the function as a series, and the sqrt() function is vectorized, so it computes the square root of every element in the column. A function that only accepts a single scalar value, such as math.sqrt, will run into an error when it receives a whole column.
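To see what a user-defined function actually receives from apply() on a dataframe, the sketch below records the type of each argument. The function name scale_to_unit and the small dataframe are hypothetical. The function works here only because it uses series arithmetic that operates on a whole column at once.

```python
import pandas as pd

# Hypothetical marks used only for illustration.
df = pd.DataFrame({"Maths": [100, 80, 90], "Physics": [80, 100, 80]})

received_types = []

def scale_to_unit(column):
    # Record what pandas passes in: a whole column, not a single value.
    received_types.append(type(column).__name__)
    # Series arithmetic divides every element by the column maximum.
    return column / column.max()

scaled = df.apply(scale_to_unit)
print(received_types)
print(scaled)
```

Because every call receives a Series, the function has to be written in terms of operations that a series supports.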
When we pass an aggregate function to the apply() method as its input, it works on each column of the dataframe as shown below.
import pandas as pd
import numpy as np
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
{"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
{"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
{"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
{"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
{"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The output dataframe is:")
df=df.apply(sum)
print(df)
Output:
The input dataframe is:
Roll Maths Physics Chemistry
0 1 100 80 90
1 2 80 100 90
2 3 90 80 70
3 4 100 100 90
4 5 90 90 80
5 6 80 70 70
The output dataframe is:
Roll 21
Maths 540
Physics 520
Chemistry 490
dtype: int64
In this example, we passed the sum() function to the apply() method. You can observe that the output is a series that contains the sum of the values in each column of the input dataframe.
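The apply() method can also aggregate across rows instead of columns by using the axis parameter. The following is a minimal sketch with a smaller, hypothetical dataframe. The names marks, column_totals, and row_totals are assumptions made for this example.

```python
import pandas as pd

# Hypothetical marks used only for illustration.
marks = pd.DataFrame({
    "Maths": [100, 80, 90],
    "Physics": [80, 100, 80],
    "Chemistry": [90, 90, 70],
})

# Default axis=0: the function receives one column at a time.
column_totals = marks.apply(sum)
print(column_totals)

# axis=1: the function receives one row at a time.
row_totals = marks.apply(sum, axis=1)
print(row_totals)
```

With axis=1, the result is indexed by the row labels of the dataframe instead of by its column names.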
Pandas Map vs Apply in Python
Although the pandas map() and apply() methods look similar, they differ in several ways. Following are some of the differences between the two methods.
The map() method | The apply() method |
The map() method is defined for Series objects. Pandas 2.1 and later also provide a DataFrame.map() method for element-wise operations. | The apply() method is defined for Series as well as DataFrames. |
It works with a function, series, or dictionary as its input argument. | It works with only a function as its input argument. |
The map() method executes the function on one element at a time. | On a series, the apply() method executes the function on one element at a time. On a dataframe, it passes one column or row at a time to the function, so only vectorized functions produce element-wise output. |
If you pass an aggregate function such as sum() as input, the map() method will throw an error saying that the elements of the series are not iterable. | On a dataframe, aggregate functions work on a column or row as a whole to produce the output. On a series, the apply() method passes single elements to the function, so sum() fails in the same way as with the map() method. |
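For an element-wise function, the two methods give the same result on a series. One practical difference is that Series.apply() forwards extra keyword arguments to the function. The following is a small sketch, and the add_offset function is hypothetical.

```python
import pandas as pd

series = pd.Series([1, 2, 3])

def add_offset(value, offset=0):
    # Receives one element of the series at a time.
    return value + offset

# With an element-wise function, map() and apply() agree on a series.
via_map = series.map(add_offset)
via_apply = series.apply(add_offset)
print(via_map.equals(via_apply))

# apply() forwards extra keyword arguments to the function.
shifted = series.apply(add_offset, offset=10)
print(shifted)
```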
Conclusion
In this article, we discussed the differences between the pandas apply vs map method in Python. To learn more about Python programming, you can read this article on tuple index out of range error in Python. You might also like this article on string manipulation in Python.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy learning!