CSV files are one of the most common ways to store tabular data in the file system. Sometimes a csv file contains columns that we don’t need for analysis. In this article, we will discuss how we can read specific columns from a csv file in Python.
Read Specific Columns From CSV File Using Pandas Dataframe
To read a csv file in Python, we use the read_csv() function provided in the pandas module. The read_csv() function takes the name of the csv file as its input argument. After execution, it returns a dataframe containing the data of the csv file. You can observe this in the following example.
import pandas as pd

# Read every column of the csv file into a dataframe.
df = pd.read_csv("demo_file.csv")
print("The dataframe is:")
print(df)
Output:
The dataframe is:
     Name  Roll    Language
0  Aditya     1      Python
1     Sam     2        Java
2   Chris     3         C++
3    Joel     4  TypeScript
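The examples in this article read a file named demo_file.csv. To follow along, the file can be created with a short script such as the one below. The file contents are an assumption, reconstructed from the printed output; the original file is not shown in the article.

```python
import pandas as pd

# Assumed contents of demo_file.csv, reconstructed from the printed output.
rows = {
    "Name": ["Aditya", "Sam", "Chris", "Joel"],
    "Roll": [1, 2, 3, 4],
    "Language": ["Python", "Java", "C++", "TypeScript"],
}

# index=False keeps the row labels (0, 1, 2, ...) out of the file.
pd.DataFrame(rows).to_csv("demo_file.csv", index=False)

# Read the file back to check what was written.
df = pd.read_csv("demo_file.csv")
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header row
```

The first line of the file becomes the header row, which is where the column names used in the rest of the article come from.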
By default, the read_csv() function returns a dataframe with all the columns of the csv file. To read a specific column from the dataframe, we can use the column name as an index, much as we use a key to obtain a value from a dictionary. For this, we pass the column name in square brackets after the dataframe, as shown in the example.
import pandas as pd

df = pd.read_csv("demo_file.csv")
print("The dataframe is:")
print(df)

# Select a single column by its name.
specific_column = df["Name"]
print("The column is:")
print(specific_column)
Output:
The dataframe is:
     Name  Roll    Language
0  Aditya     1      Python
1     Sam     2        Java
2   Chris     3         C++
3    Joel     4  TypeScript
The column is:
0    Aditya
1       Sam
2     Chris
3      Joel
Name: Name, dtype: object
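The output above ends with "Name: Name, dtype: object" because indexing with a single column name returns a pandas Series rather than a dataframe. The sketch below contrasts the two forms of indexing. To stay self-contained it reads the data from an in-memory string instead of demo_file.csv; the CSV text is an assumption reconstructed from the printed output.

```python
import io

import pandas as pd

# Assumed contents of demo_file.csv, reconstructed from the printed output.
CSV_TEXT = """Name,Roll,Language
Aditya,1,Python
Sam,2,Java
Chris,3,C++
Joel,4,TypeScript
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

as_series = df["Name"]    # a column name in the brackets: one-dimensional Series
as_frame = df[["Name"]]   # a list in the brackets: one-column dataframe

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_series.tolist())
```

Use the list form when later code expects a dataframe, even if only one column is needed.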
To read multiple columns from the dataframe, we can pass a list of column names in the square brackets as shown below.
import pandas as pd

df = pd.read_csv("demo_file.csv")
print("The dataframe is:")
print(df)

# Select several columns by passing a list of column names.
specific_columns = df[["Name", "Roll"]]
print("The columns are:")
print(specific_columns)
Output:
The dataframe is:
     Name  Roll    Language
0  Aditya     1      Python
1     Sam     2        Java
2   Chris     3         C++
3    Joel     4  TypeScript
The columns are:
     Name  Roll
0  Aditya     1
1     Sam     2
2   Chris     3
3    Joel     4
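Two details of list-based selection are worth seeing in code: the result follows the order of the list, not the order of the file, and asking for a column that does not exist raises a KeyError. The following is a minimal sketch; the in-memory CSV text is an assumption reconstructed from the printed output, and "Grade" is a made-up column name used only to trigger the error.

```python
import io

import pandas as pd

# Assumed contents of demo_file.csv, reconstructed from the printed output.
CSV_TEXT = """Name,Roll,Language
Aditya,1,Python
Sam,2,Java
Chris,3,C++
Joel,4,TypeScript
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# The selected columns come back in the order given in the list.
reordered = df[["Roll", "Name"]]
print(list(reordered.columns))

# "Grade" is a hypothetical column that is not present in the data.
missing_error = None
try:
    df[["Name", "Grade"]]
except KeyError as err:
    missing_error = err
    print("Missing column:", err)
```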
Read Specific Columns From CSV File Using The ‘usecols’ Parameter
Reading the entire file and then extracting columns from the dataframe is wasteful. In the above approach, the unwanted columns are also loaded into the dataframe before we select the ones we need. We can avoid this by using the ‘usecols’ parameter of the read_csv() function. To read specific columns from the csv file, we pass the list of columns to be read to the ‘usecols’ parameter. After execution, the read_csv() function returns a dataframe containing only those columns, as shown in the following example.
import pandas as pd

# Read only the "Name" column from the csv file.
df = pd.read_csv("demo_file.csv", usecols=["Name"])
print("The dataframe is:")
print(df)
Output:
The dataframe is:
     Name
0  Aditya
1     Sam
2   Chris
3    Joel
As you can see above, we have read only the specified column from the csv file using the ‘usecols’ parameter.
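The ‘usecols’ parameter also accepts more than a single name. The sketch below shows three forms that read_csv() supports: a list of column names, a list of integer positions, and a function that is called with each column name and returns True for the columns to keep. The in-memory CSV text is an assumption reconstructed from the printed output, and "Grade" is a made-up column name.

```python
import io

import pandas as pd

# Assumed contents of demo_file.csv, reconstructed from the printed output.
CSV_TEXT = """Name,Roll,Language
Aditya,1,Python
Sam,2,Java
Chris,3,C++
Joel,4,TypeScript
"""

# A list of names. The order of the list is ignored, so the result
# keeps the order of the columns in the file.
by_name = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["Roll", "Name"])
print(list(by_name.columns))

# A list of zero-based column positions.
by_position = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[0, 2])
print(list(by_position.columns))

# A function that returns True for each column name to keep.
by_callable = pd.read_csv(
    io.StringIO(CSV_TEXT), usecols=lambda name: name != "Language"
)
print(list(by_callable.columns))

# Index the result with a list to get a particular column order.
roll_first = by_name[["Roll", "Name"]]
print(list(roll_first.columns))

# "Grade" is a hypothetical column that is not present in the data.
usecols_error = None
try:
    pd.read_csv(io.StringIO(CSV_TEXT), usecols=["Grade"])
except ValueError as err:
    usecols_error = err
    print("Could not read:", err)
```

If a requested name is not present in the header, read_csv() raises a ValueError instead of silently returning fewer columns, so a misspelled column name is caught when the file is read.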
Conclusion
In this article, we have discussed how to read specific columns from a csv file in Python. We can either read the whole file with read_csv() and select columns from the resulting dataframe, or pass the column names to the ‘usecols’ parameter so that only those columns are read.