A pandas dataframe is the primary data structure for handling tabular data in Python. In this article, we will discuss different ways to create a dataframe in Python using the pandas module.
Create an Empty Dataframe in Python
To create an empty dataframe, you can use the DataFrame() function. When executed without any input arguments, the DataFrame() function returns an empty dataframe without any columns or rows. You can observe this in the following example.
import pandas as pd
myDf=pd.DataFrame()
print(myDf)
Output:
Empty DataFrame
Columns: []
Index: []
To create an empty dataframe with specified column names, you can use the columns parameter of the DataFrame() function. The columns parameter takes a list as its input argument and uses the list elements as the column names of the dataframe, as shown below.
import pandas as pd
myDf=pd.DataFrame(columns=["A", "B", "C"])
print(myDf)
Output:
Empty DataFrame
Columns: [A, B, C]
Index: []
Here, we have created a dataframe with columns A, B, and C without any data in the rows.
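To confirm that such a dataframe really has no rows, you can inspect its empty and shape attributes. The following is a small sketch; the attribute names are standard pandas, and the column names are the same ones used above.

```python
import pandas as pd

# An empty dataframe that only has column labels
myDf = pd.DataFrame(columns=["A", "B", "C"])

# .empty is True when the dataframe contains no elements
print(myDf.empty)          # True
# .shape is (number of rows, number of columns)
print(myDf.shape)          # (0, 3)
print(list(myDf.columns))  # ['A', 'B', 'C']
```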
Create Pandas Dataframe From Dict
You can create a pandas dataframe from Python dictionaries using the DataFrame() function. One way is to first create a list of dictionaries, where each dictionary holds the values of one row. After that, you can pass the list of dictionaries to the DataFrame() function. After execution, the DataFrame() function returns a new dataframe, as shown in the following example.
import pandas as pd
dict1={"A":1,"B":12,"C":14}
dict2={"A":13,"B":17,"C":12}
dict3={"A":2,"B":11,"C":14}
dictList=[dict1,dict2,dict3]
myDf=pd.DataFrame(dictList)
print(myDf)
Output:
A B C
0 1 12 14
1 13 17 12
2 2 11 14
When you create a dataframe from a list of dictionaries, the keys of the dictionaries are used as the column names of the dataframe. If the dictionaries do not all contain the same keys, the row corresponding to a dictionary will contain NaN values in the columns whose names are not keys of that dictionary. You can observe this in the following example.
import pandas as pd
dict1={"A":1,"B":12,"C":14}
dict2={"A":13,"B":17,"C":12}
dict3={"A":2,"B":11,"C":14,"D":1117}
dictList=[dict1,dict2,dict3]
myDf=pd.DataFrame(dictList)
print(myDf)
Output:
A B C D
0 1 12 14 NaN
1 13 17 12 NaN
2 2 11 14 1117.0
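In this example, the first and second rows correspond to dictionaries that do not have D as a key. Due to this, these rows contain NaN values in column D. Because NaN is a floating-point value, pandas stores column D as floats, which is why 1117 is printed as 1117.0.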
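You can also pass a single dictionary to the DataFrame() function when the data is organized by column instead of by row. In the sketch below, each key becomes a column name and each list holds that column's values; it rebuilds the dataframe from the first dictionary example. All the lists must have the same length, otherwise pandas raises a ValueError.

```python
import pandas as pd

# Column-oriented data: one key per column, one list of values per column
data = {"A": [1, 13, 2], "B": [12, 17, 11], "C": [14, 12, 14]}
myDf = pd.DataFrame(data)
print(myDf)

# Row-oriented version of the same data, built from a list of dictionaries
rowsDf = pd.DataFrame([
    {"A": 1, "B": 12, "C": 14},
    {"A": 13, "B": 17, "C": 12},
    {"A": 2, "B": 11, "C": 14},
])
print(myDf.equals(rowsDf))  # True
```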
Create Pandas Dataframe From Series in Python
A dataframe is made up of pandas series objects as its columns. You can also pass a list of series objects to the DataFrame() function to create a dataframe, as shown below.
import pandas as pd
series1 = pd.Series([1,2,3])
series2 = pd.Series([4,12,34])
series3 = pd.Series([22,33,44])
seriesList=[series1,series2,series3]
myDf=pd.DataFrame(seriesList)
print(myDf)
Output:
0 1 2
0 1 2 3
1 4 12 34
2 22 33 44
As you can observe, each series becomes a row of the dataframe, and the index labels of the series objects become the column names. Hence, if the series objects given as input have different index labels, the column names of the resultant dataframe will be the union of the index labels of all the series objects. Also, the row corresponding to a series will contain NaN values in the columns whose names are not index labels of that series. You can observe this in the following example.
import pandas as pd
series1 = pd.Series({"A":1,"B":12,"C":14})
series2 = pd.Series({"A":13,"B":17,"C":12})
series3 = pd.Series({"A":2,"B":11,"C":14,"D":1117})
seriesList=[series1,series2,series3]
myDf=pd.DataFrame(seriesList)
print(myDf)
Output:
A B C D
0 1.0 12.0 14.0 NaN
1 13.0 17.0 12.0 NaN
2 2.0 11.0 14.0 1117.0
Here, the first and second rows correspond to the series that do not have D as an index label. Due to this, these rows contain NaN values in column D.
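If you want each series to become a column instead of a row, one option is to pass a dictionary of series objects to the DataFrame() function. In this sketch the keys x, y, and z are hypothetical column names; the series are the ones from the first series example, and their index labels become the row index.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3])
series2 = pd.Series([4, 12, 34])
series3 = pd.Series([22, 33, 44])

# Dictionary keys become column names; each series supplies one column
myDf = pd.DataFrame({"x": series1, "y": series2, "z": series3})
print(myDf)
print(myDf["y"].tolist())  # [4, 12, 34]
```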
List of Lists to Dataframe
You can also create a dataframe from a list of lists in Python. For this, you can pass the list of lists as an input argument to the DataFrame() function, as shown below. Each inner list becomes a row of the dataframe.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList)
print(myDf)
Output:
0 1 2
0 1 2 3
1 3 55 34
2 12 32 45
In the above example, you can observe that the column names and the row indices have both been assigned automatically as integers starting from 0. You can also observe that the number of columns in the dataframe is equal to the length of the input lists. If the lists have unequal numbers of elements, the rows created from the shorter lists are filled with NaN values, as shown below.
import pandas as pd
list1=[1,2,3,4,55]
list2=[3,55,34]
list3=[12,32,45,32]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList)
print(myDf)
Output:
0 1 2 3 4
0 1 2 3 4.0 55.0
1 3 55 34 NaN NaN
2 12 32 45 32.0 NaN
Here, the number of columns in the dataframe is equal to the maximum length of the input lists. The rows corresponding to the shorter lists contain NaN values in the rightmost columns, which is why those columns are stored as floats.
If you have a list of lists with equal lengths, you can also use the columns parameter of the DataFrame() function to give column names to the dataframe. For this, you can pass a list of column names to the columns parameter, as shown in the following example.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"])
print(myDf)
Output:
A B C
0 1 2 3
1 3 55 34
2 12 32 45
In the above example, make sure that the number of names given in the columns parameter is equal to the number of columns in the data, that is, the length of the longest input list. Otherwise, the program will run into an error. You can observe this in the following example.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C", "D"])
print(myDf)
Output:
ValueError: 4 columns passed, passed data had 3 columns
In the above example, the maximum length of the input lists is 3. However, we have passed four values to the columns parameter. Due to this, the program raises a ValueError exception.
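The check works in both directions: passing too few names fails just like passing too many. The sketch below tries both cases and then derives the names from the data so the lengths always match. The col0, col1, ... naming scheme is only a hypothetical choice for illustration.

```python
import pandas as pd

myList = [[1, 2, 3], [3, 55, 34], [12, 32, 45]]
errors = []

# A column list that is too long and one that is too short both fail
for names in (["A", "B", "C", "D"], ["A", "B"]):
    try:
        pd.DataFrame(myList, columns=names)
    except ValueError as err:
        errors.append(str(err))
        print(f"{len(names)} names -> ValueError: {err}")

# Deriving the names from the data keeps the two lengths in sync
numCols = max(len(row) for row in myList)
myDf = pd.DataFrame(myList, columns=[f"col{i}" for i in range(numCols)])
print(myDf.columns.tolist())  # ['col0', 'col1', 'col2']
```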
Create Dataframe From CSV File in Python
To create a pandas dataframe from a csv file, you can use the read_csv() function. The read_csv() function takes the path of the csv file as its input argument. After execution, it returns a pandas dataframe, as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print(myDf)
Output:
Class Roll Name
0 1 11 Aditya
1 1 12 Chris
2 1 13 Sam
3 2 1 Joel
4 2 22 Tom
5 2 44 Samantha
6 3 33 Tina
7 3 34 Amy
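The contents of samplefile.csv are not shown in this article. Assuming it is a comma-separated file with a header row holding the data printed above, the following sketch reproduces the example without a file on disk, because read_csv() also accepts a file-like object such as io.StringIO. The csvText string is a hypothetical stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for samplefile.csv, rebuilt from the printed output
csvText = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""

# read_csv() accepts a file-like object as well as a file path
myDf = pd.read_csv(io.StringIO(csvText))
print(myDf)
print(myDf.shape)  # (8, 3)
```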
Create a Pandas Dataframe With Indices
By default, the rows of a pandas dataframe are indexed using integers starting from 0. However, we can create custom indices for the rows in the dataframe. For this, we need to pass a list of index labels to the index parameter of the DataFrame() function, as shown below.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"],index=["a","b","c"])
print(myDf)
Output:
A B C
a 1 2 3
b 3 55 34
c 12 32 45
In this example, we have passed the list ["a", "b", "c"] to the index parameter of the DataFrame() function. After execution, the values a, b, and c are assigned to the rows as their index labels. The list must contain exactly one label per row.
You can also set the row index of a dataframe after creating it. For this, you can assign the list of index labels to the index attribute of the dataframe, as shown in the following example.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"])
myDf.index=["a","b","c"]
print(myDf)
Output:
A B C
a 1 2 3
b 3 55 34
c 12 32 45
In this example, instead of passing the list of indices to the index parameter, we have assigned it to the index attribute of the dataframe after creating the dataframe.
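When the labels you want already live in one of the columns, the set_index() method is another option. The sketch below uses column A from the earlier examples as the row index; set_index() returns a new dataframe and leaves myDf unchanged by default.

```python
import pandas as pd

myList = [[1, 2, 3], [3, 55, 34], [12, 32, 45]]
myDf = pd.DataFrame(myList, columns=["A", "B", "C"])

# Use the values of column "A" as row labels; "A" is removed from the columns
newDf = myDf.set_index("A")
print(newDf)
print(newDf.index.tolist())  # [1, 3, 12]
```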
If you are creating a dataframe from a csv file, you can assign one of the columns as the index using the index_col parameter, as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col="Class")
print(myDf)
Output:
Roll Name
Class
1 11 Aditya
1 12 Chris
1 13 Sam
2 1 Joel
2 22 Tom
2 44 Samantha
3 33 Tina
3 34 Amy
You can also create a multilevel index. For this, you need to pass a list of column names to the index_col parameter, as shown below.
import pandas as pd
myDf=pd.read_csv("samplefile.csv",index_col=["Class","Roll"])
print(myDf)
Output:
                Name
Class Roll
1     11      Aditya
      12       Chris
      13         Sam
2     1         Joel
      22         Tom
      44    Samantha
3     33        Tina
      34         Amy
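With a multilevel index, you can select rows by the outer label alone or by a tuple of labels for both levels using loc. The sketch below assumes samplefile.csv is a comma-separated file containing the data shown earlier, and replaces it with a hypothetical in-memory string so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of samplefile.csv
csvText = """Class,Roll,Name
1,11,Aditya
1,12,Chris
1,13,Sam
2,1,Joel
2,22,Tom
2,44,Samantha
3,33,Tina
3,34,Amy
"""
myDf = pd.read_csv(io.StringIO(csvText), index_col=["Class", "Roll"])

# One outer label returns all rows of that class, indexed by Roll
print(myDf.loc[2])
# A (Class, Roll) tuple selects a single row
print(myDf.loc[(3, 34), "Name"])  # Amy
```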
Transpose a Pandas Dataframe in Python
We can also create a pandas dataframe by transposing another dataframe. For this, you can use the T attribute. When accessed on a dataframe, the T attribute returns the transpose of the original dataframe, as shown below.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"])
myDf.index=["a","b","c"]
newDf=myDf.T
print(newDf)
Output:
a b c
A 1 3 12
B 2 55 32
C 3 34 45
In the output, you can observe that the rows of the original dataframe have become the columns of the new dataframe. In other words, the index labels of the original dataframe become the column names of the new dataframe, and the column names of the original dataframe become the index labels of the new dataframe.
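The T attribute is shorthand for the transpose() method. The sketch below uses a small non-square dataframe (its values are made up for illustration) so that the swap of the two dimensions is visible in the shapes, and shows that transposing twice gives back the original.

```python
import pandas as pd

# A 3 x 2 dataframe with labelled rows and columns
myDf = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["A", "B"], index=["a", "b", "c"])

# transpose() does the same thing as .T
newDf = myDf.transpose()
print(myDf.shape, newDf.shape)  # (3, 2) (2, 3)
print(newDf.equals(myDf.T))     # True

# Transposing twice restores the original layout
print(newDf.T.equals(myDf))     # True
```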
Copy a Dataframe in Python
You can also create a pandas dataframe by copying an existing dataframe. For this, you can use the copy() method. When invoked on a dataframe, the copy() method returns a new dataframe containing the same data as the original dataframe, as shown below. By default, it makes a deep copy, so changes to the new dataframe do not affect the original one.
import pandas as pd
list1=[1,2,3]
list2=[3,55,34]
list3=[12,32,45]
myList=[list1,list2,list3]
myDf=pd.DataFrame(myList,columns=["A", "B", "C"])
myDf.index=["a","b","c"]
newDf=myDf.copy()
print(newDf)
Output:
A B C
a 1 2 3
b 3 55 34
c 12 32 45
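The difference between copy() and a plain assignment shows up when you modify the data. In the sketch below, editing the copy leaves the original untouched, while editing through a name created by assignment (the hypothetical variable alias) changes the original, because both names point to the same dataframe object.

```python
import pandas as pd

myDf = pd.DataFrame([[1, 2, 3], [3, 55, 34], [12, 32, 45]],
                    columns=["A", "B", "C"], index=["a", "b", "c"])

newDf = myDf.copy()  # deep copy by default (deep=True)
alias = myDf         # plain assignment: another name for the same object

newDf.loc["a", "A"] = 100
afterCopyEdit = myDf.loc["a", "A"]
print(afterCopyEdit)        # 1, the original is unchanged

alias.loc["a", "A"] = 100
print(myDf.loc["a", "A"])   # 100, alias and myDf are the same dataframe
```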
Conclusion
In this article, we have discussed different ways to create a pandas dataframe in Python. To learn more about Python programming, you can read this article on how to create a chat app in Python. You might also like this article on linear regression using the sklearn module in Python.
Stay tuned for more informative articles.
Happy Learning!