This tutorial examines the various ways to compare two files in Python. We’ll cover reading two files and comparing them line by line, as well as using built-in modules to complete this common task.
There are many ways of comparing two files in Python. Python comes with modules for this very purpose, including the filecmp and difflib modules.
The following Python 3 examples contrast the various methods of determining whether or not two files contain the same data. We’ll use functions and modules that come built-in with Python 3, so there’s no need to download additional packages.
Compare Two Text Files Line by Line
We can compare two text files by using the open() function to read the data they contain. Given a relative file name such as "emails_A.txt", open() looks for the file in the current working directory and opens it for reading.
For this example, we’ll compare two files that contain email data. These two lists of emails, we’re told, may not be identical. We’ll let Python check the files for us. Using the readlines() method, it’s possible to extract the lines from the text file.
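To see what readlines() actually returns, here is a minimal sketch that writes a small throwaway file (its name and contents are made up for this demo) and reads it back. Each list element is one line of text, and each keeps its trailing newline character, so a final line with no newline will not compare equal to the same text followed by "\n".

```python
import os
import tempfile

# Create a small throwaway file; the contents are made up for this demo
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

# readlines() returns a list with one string per line, newline included
with open(path, "r") as f:
    lines = f.readlines()

print(lines)  # ['first line\n', 'second line\n']

# Clean up the temporary file
os.remove(path)
```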
emails_A.txt
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
emails_B.txt
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Once the data is extracted, a for loop is used to compare the files line by line. If the lines don’t match, the user receives a message telling them where the mismatch occurred. We’ll include the data itself so the user can easily track down the different lines.
Example: Using Python to compare email lists
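file1 = open("emails_A.txt", 'r')
file2 = open("emails_B.txt", 'r')
file1_lines = file1.readlines()
file2_lines = file2.readlines()
# only compare as many lines as the shorter file has, to avoid an IndexError
for i in range(min(len(file1_lines), len(file2_lines))):
    if file1_lines[i] != file2_lines[i]:
        print("Line " + str(i+1) + " doesn't match.")
        print("------------------------")
        # rstrip("\n") removes the newline that readlines() keeps on each line
        print("File1: " + file1_lines[i].rstrip("\n"))
        print("File2: " + file2_lines[i].rstrip("\n"))
file1.close()
file2.close()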
Output
Line 1 doesn't match.
------------------------
File1: [email protected]
File2: [email protected]
Line 3 doesn't match.
------------------------
File1: [email protected]
File2: [email protected]
Line 4 doesn't match.
------------------------
File1: [email protected]
File2: [email protected]
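A line-by-line loop driven by one file’s length can miss extra lines at the end of the longer file, or fail with an IndexError when the other file is shorter. As a sketch of one way around that, the hypothetical helper compare_lines() below uses itertools.zip_longest, which pads the shorter file with None so that missing lines are reported too. It also uses splitlines(), which drops the trailing newline from each line.

```python
from itertools import zip_longest


def compare_lines(path1, path2):
    """Return (line_number, line1, line2) for every line that differs.

    compare_lines is a hypothetical helper name used for illustration.
    A line missing from the shorter file is reported as None.
    """
    with open(path1, "r") as f1, open(path2, "r") as f2:
        # splitlines() splits on line breaks and drops the newline characters
        lines1 = f1.read().splitlines()
        lines2 = f2.read().splitlines()

    mismatches = []
    # zip_longest pads the shorter list with None instead of stopping early
    for number, (line1, line2) in enumerate(zip_longest(lines1, lines2), start=1):
        if line1 != line2:
            mismatches.append((number, line1, line2))
    return mismatches
```

Calling compare_lines("emails_A.txt", "emails_B.txt") returns a list of (line number, line from file 1, line from file 2) tuples, which can be printed in the same format as the example above.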
Using the filecmp Module to Compare Files
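The filecmp module provides functions for comparing files and directories in Python. To compare two files, we can use the filecmp.cmp() function, which returns True if the files match, or False if they don’t. By default, filecmp.cmp() performs a shallow comparison: if both files have identical os.stat() signatures (file type, size, and modification time), they are treated as equal without their contents being read; otherwise, their contents are compared.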
This example uses three files. The first and third are identical, while the second is slightly different. We’ll use the filecmp.cmp() method to compare the files using Python.
punctuation1.txt
Eat your dinner.
I’d like to thank my parents, Janet and God.
I’m sorry I care about you.
She’s really into cooking, her family, and her cats.
punctuation2.txt
Eat. You’re dinner!
I’d like to thank my parents, Janet, and God.
I’m sorry. I care about you.
She’s really into cooking her family and her cats.
punctuation3.txt
Eat your dinner.
I’d like to thank my parents, Janet and God.
I’m sorry I care about you.
She’s really into cooking, her family, and her cats.
Before we can use the filecmp module, we’ll need to import it. The file paths in this example are absolute Windows paths, so every backslash must be doubled to escape it inside a regular Python string. A custom function handles the comparison: it checks whether the data matches and then tells the user the outcome.
Example: Compare two files with filecmp.cmp()
import filecmp
# notice the two backslashes
file1 = "C:\\Users\\jpett\\Desktop\\PythonForBeginners\\2Files\\punctuation1.txt"
file2 = "C:\\Users\\jpett\\Desktop\\PythonForBeginners\\2Files\\punctuation2.txt"
file3 = "C:\\Users\\jpett\\Desktop\\PythonForBeginners\\2Files\\punctuation3.txt"
def compare_files(file1, file2):
    # filecmp.cmp() returns True if the files match, False otherwise
    compare = filecmp.cmp(file1, file2)
    if compare:
        print("The files are the same.")
    else:
        print("The files are different.")
compare_files(file1, file2)
compare_files(file1, file3)
Output
The files are different.
The files are the same.
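filecmp.cmp() also accepts a shallow parameter, which defaults to True. Passing shallow=False makes it compare the file contents even when the files’ os.stat() signatures match. The minimal sketch below wraps that in a hypothetical helper, files_identical(), and tries it on throwaway copies of the first line of each punctuation file (the temporary files are created just for the demo).

```python
import filecmp
import os
import tempfile


def files_identical(path1, path2):
    """Return True only if the two files have identical contents.

    files_identical is a hypothetical helper name used for illustration.
    """
    # shallow=False makes filecmp.cmp() compare the file contents instead of
    # trusting matching os.stat() signatures (type, size, modification time)
    return filecmp.cmp(path1, path2, shallow=False)


# Demo with throwaway files created in a temporary directory
with tempfile.TemporaryDirectory() as folder:
    paths = []
    for name, text in [("punctuation1.txt", "Eat your dinner.\n"),
                       ("punctuation2.txt", "Eat. You're dinner!\n"),
                       ("punctuation3.txt", "Eat your dinner.\n")]:
        path = os.path.join(folder, name)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)

    same_1_2 = files_identical(paths[0], paths[1])
    same_1_3 = files_identical(paths[0], paths[2])

print(same_1_2)  # False
print(same_1_3)  # True
```

A deep comparison reads both files, so it is slower on large files, but it can’t be fooled by two different files that happen to share a size and modification time.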
Compare Two Files Using the difflib Module
The difflib module is useful for comparing texts and finding the differences between them. It is part of Python’s standard library and contains several helpful classes and functions for comparing sequences of text.
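As one example of those tools, difflib.SequenceMatcher can score how similar two strings are. This minimal sketch compares two versions of one of the student records used below; ratio() returns a float between 0.0 and 1.0, where 1.0 means the sequences are identical.

```python
import difflib

old = "Rafael Rogers 3.6"
new = "Rafael Rogers 3.7"

# SequenceMatcher finds matching blocks; ratio() is 2 * matches / total length
matcher = difflib.SequenceMatcher(None, old, new)
similarity = matcher.ratio()
print(round(similarity, 2))  # 0.94
```

Here 16 of the 17 characters line up, so the ratio is 2 × 16 / 34, or about 0.94.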
We’ll use the unified_diff() function to pinpoint mismatches between two data files. These files contain records for fictitious students, including their names and grade point averages (GPAs), so the comparison shows how each student’s GPA changed from 2019 to 2020. The following example uses the with statement to read the file data; the with statement closes each file automatically when its block ends, even if an error occurs.
student_gpa_2019.txt
Chelsea Walker 3.3
Caroline Bennett 2.8
Garry Holmes 3.7
Rafael Rogers 3.6
Patrick Nelson 2.1
student_gpa_2020.txt
Chelsea Walker 3.6
Caroline Bennett 2.7
Garry Holmes 3.7
Rafael Rogers 3.7
Patrick Nelson 2.1
Example: Comparing Student GPAs
import difflib
# splitlines() drops each line's trailing newline, which pairs with lineterm=''
with open("student_gpa_2019.txt", 'r') as file1:
    file1_contents = file1.read().splitlines()
with open("student_gpa_2020.txt", 'r') as file2:
    file2_contents = file2.read().splitlines()
diff = difflib.unified_diff(
    file1_contents, file2_contents, fromfile="file1.txt",
    tofile="file2.txt", lineterm='')
for line in diff:
    print(line)
Output
--- file1.txt
+++ file2.txt
@@ -1,5 +1,5 @@
-Chelsea Walker 3.3
-Caroline Bennett 2.8
+Chelsea Walker 3.6
+Caroline Bennett 2.7
Garry Holmes 3.7
-Rafael Rogers 3.6
+Rafael Rogers 3.7
Patrick Nelson 2.1
Looking at the output, we can see that unified_diff() does more than flag mismatched lines. Removed lines are prefixed with -, added lines with +, and unchanged lines are shown as context. The hunk header @@ -1,5 +1,5 @@ means the changes span five lines starting at line 1 in both files.
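The amount of context is controlled by unified_diff()’s n parameter, which defaults to 3. As a sketch, setting n=0 on the same GPA records (held in lists here instead of files) drops the unchanged lines and splits the changes into separate hunks.

```python
import difflib

# The GPA records from the example above, as lists of lines without newlines
gpa_2019 = ["Chelsea Walker 3.3", "Caroline Bennett 2.8", "Garry Holmes 3.7",
            "Rafael Rogers 3.6", "Patrick Nelson 2.1"]
gpa_2020 = ["Chelsea Walker 3.6", "Caroline Bennett 2.7", "Garry Holmes 3.7",
            "Rafael Rogers 3.7", "Patrick Nelson 2.1"]

# n=0 removes the unchanged context lines, so each hunk shows only changes
diff = list(difflib.unified_diff(gpa_2019, gpa_2020, fromfile="file1.txt",
                                 tofile="file2.txt", n=0, lineterm=""))
for line in diff:
    print(line)
```

With no context, the unchanged Garry Holmes line no longer joins the two groups of changes, so the output contains two hunks: one for the first two students and one for Rafael Rogers.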
Compare Two .csv Files in Python Line by Line
Comma-separated values (CSV) files are commonly used to exchange data between programs. Python’s csv module lets us quickly access the data within a CSV file.
Using the csv module, we’ll compare two files of data and identify the lines that don’t match. These files contain employee records, including the first name, last name, and email of each employee. The data was generated randomly, but we’ll pretend our employer urgently needs us to complete the comparison.
employeesA.csv
"First Name","Last Name","Email"
"David","Crawford","[email protected]"
"Sarah","Payne","[email protected]"
"Robert","Cooper","[email protected]"
"Aida","Alexander","[email protected]"
"Valeria","Douglas","[email protected]"
employeesB.csv
"First Name","Last Name","Email"
"Andrew","Crawford","[email protected]"
"Sarah","Payne","[email protected]"
"Robert","Cooper","[email protected]"
"Agata","Anderson","[email protected]"
"Miley","Holmes","[email protected]"
Once we have the employee data, we can read it with the reader() function from the csv module, which parses each line of CSV data into a list of field values. With the data collected, we can then convert the reader’s rows into a Python list.
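Here is a minimal sketch of what csv.reader() produces, using an in-memory sample (io.StringIO stands in for a real file, and the two rows are borrowed from the employee data). Each row comes back as a list of strings with the surrounding quotes removed.

```python
import csv
import io

# A two-line CSV sample held in memory; io.StringIO behaves like an open file
sample = io.StringIO('"First Name","Last Name"\n"Sarah","Payne"\n')

# csv.reader yields one list of strings per row, with the quotes removed
rows = list(csv.reader(sample))
print(rows)  # [['First Name', 'Last Name'], ['Sarah', 'Payne']]
```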
Finally, using a for loop, we’ll compare the elements of the two lists. Each element will hold a line from the employee data files. This way, we can iterate over the lists and discover which lines aren’t identical.
Because the program compares the files line by line, it reports each row position where the two employee data files differ.
Example: Using the csv module to compare employee data files
import csv
# the csv module docs recommend opening CSV files with newline=''
with open("employeesA.csv", 'r', newline='') as file1, open("employeesB.csv", 'r', newline='') as file2:
    data_read1 = csv.reader(file1)
    data_read2 = csv.reader(file2)
    # convert the data to a list
    data1 = list(data_read1)
    data2 = list(data_read2)
# row 0 is the header, so "Line 1" is the first employee record
for i in range(min(len(data1), len(data2))):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")
Output
Line 1 is a mismatch.
['David', 'Crawford', '[email protected]'] doesn't match ['Andrew', 'Crawford', '[email protected]']
Line 4 is a mismatch.
['Aida', 'Alexander', '[email protected]'] doesn't match ['Agata', 'Anderson', '[email protected]']
Line 5 is a mismatch.
['Valeria', 'Douglas', '[email protected]'] doesn't match ['Miley', 'Holmes', '[email protected]']
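Comparing rows by position reports a mismatch whenever a record simply moves to a different line. If row order doesn’t matter, one alternative is to compare the rows as sets. The sketch below is an assumption-laden variation, not the method used above: the hypothetical helpers load_rows() and diff_csv() skip the header row, and because they use sets, duplicate rows collapse into one and their order is ignored.

```python
import csv


def load_rows(path):
    """Read a CSV file into a set of row tuples, skipping the header row.

    load_rows is a hypothetical helper name used for illustration.
    """
    with open(path, "r", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if there is one
        # tuples are hashable, so rows can be stored in a set
        return {tuple(row) for row in reader}


def diff_csv(path1, path2):
    """Return (rows only in path1, rows only in path2), each sorted."""
    rows1 = load_rows(path1)
    rows2 = load_rows(path2)
    # set difference finds records present in one file but not the other,
    # regardless of where they appear in the file
    return sorted(rows1 - rows2), sorted(rows2 - rows1)
```

For the employee files, only_a, only_b = diff_csv("employeesA.csv", "employeesB.csv") would list the records that appear in just one of the two files.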
In Conclusion
Python provides many tools for comparing two text files, including csv files. In this post, we’ve discussed many of the functions and modules that come with Python 3. Moreover, we’ve seen how to use them to compare files line by line in Python.
By discovering new modules, we can write programs that make our lives easier. Many of the programs and web apps that we use on a daily basis are powered by Python.