The XML file format is used extensively to store and transmit data. When processing XML files in Python, we often need to convert them into a Python object such as a dictionary. This article discusses how to convert an XML file or string to a Python dictionary.
What is XML?
XML (eXtensible Markup Language) is a markup language that is used to store and transmit data. It is a flexible text format similar to HTML but used for a different purpose. We use HTML to structure and display data in a web browser. On the other hand, XML is used to describe and exchange data between systems and for storing data.
The syntax of XML is based on elements and attributes, where elements define the structure of the data and attributes provide additional information about the elements. An XML document usually starts with an optional XML declaration, which specifies the version of XML used, and is followed by a single root element that contains all the other elements in the document.
The syntax for storing data in XML format is as follows.
<field_name> value </field_name>
A field can have one or more fields inside itself. For instance, consider the following example.
<?xml version="1.0"?>
<employee>
<name>John Doe</name>
<age>35</age>
<job>
<title>Software Engineer</title>
<department>IT</department>
<years_of_experience>10</years_of_experience>
</job>
<address>
<street>123 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>94102</zip>
</address>
</employee>
In the above file,
- The root element is "employee". The "employee" element contains several sub-elements, including "name", "age", "job", and "address".
- The "job" element, in turn, contains sub-elements such as "title", "department", and "years_of_experience".
- The "address" element contains sub-elements such as "street", "city", "state", and "zip".
These elements define the structure and content of the data in the XML file and provide information about the employee, such as name, age, job title, department, years of experience, and address.
The corresponding Python dictionary for the above file, built from the children of the root element, is as follows.
{'name': 'John Doe',
'age': '35',
'job': {'title': 'Software Engineer',
'department': 'IT',
'years_of_experience': '10'},
'address': {'street': '123 Main St.',
'city': 'San Francisco',
'state': 'CA',
'zip': '94102'}}
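Every leaf value in this dictionary is a string, because XML text carries no type information. The following is a minimal sketch of reading nested values and converting them where a number is needed. The variable names (employee, city, age, years) are illustrative choices, not part of any library.

```python
# The dictionary shown above, stored under an illustrative name.
employee = {
    'name': 'John Doe',
    'age': '35',
    'job': {'title': 'Software Engineer',
            'department': 'IT',
            'years_of_experience': '10'},
    'address': {'street': '123 Main St.',
                'city': 'San Francisco',
                'state': 'CA',
                'zip': '94102'},
}

# Nested elements become nested dictionaries, so keys are chained.
city = employee['address']['city']

# Values are strings; convert explicitly when a number is needed.
age = int(employee['age'])
years = int(employee['job']['years_of_experience'])

print(city, age, years)
```

Keeping the zip code as a string is usually the safer choice, since converting it to an integer would drop any leading zeros.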
Now, we will discuss how to convert an XML file or string to a Python dictionary. For this, we will use the xmltodict module and the ElementTree module. The ElementTree module is included in the Python standard library. You need to install the xmltodict module separately.
You can install the xmltodict module using pip as shown below.
pip3 install xmltodict
If the pip3 command is not available on your system, the pip command usually refers to the same installer, so you can use the following command instead.
pip install xmltodict
XML String to Python Dictionary Using the ElementTree Module
The ElementTree module provides us with the fromstring() function that we can use to convert an XML string to an element tree. The function returns the root element of the tree. An element tree is a tree data structure in which each node represents an element of the XML file. For the example XML file, the root node is employee, its children are name, age, job, and address, and the job and address nodes have children of their own.
In this element tree,
- Each node contains the tag attribute. The tag attribute is the name of the node i.e. name, job, title, etc.
- If the element has no children, it contains a text attribute having the value for its tag. For instance, the age node will have the tag attribute with the value “age” and the text attribute with the value “35”.
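The tag and text attributes described above can be inspected directly. The following is a small sketch that uses a shortened version of the employee data; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A shortened version of the employee example.
xml_string = """<employee>
    <name>John Doe</name>
    <age>35</age>
    <job>
        <title>Software Engineer</title>
    </job>
</employee>"""

# fromstring() returns the root element of the parsed tree.
root = ET.fromstring(xml_string)

# Iterating over an element yields its direct children.
# len(child) is the number of sub-elements the child has.
for child in root:
    print(child.tag, len(child), repr(child.text))

# find() returns the first direct child with the given tag.
age = root.find('age')
print(age.tag, age.text)
```

An element that has children, such as job, still has a text attribute, but here it only holds the whitespace between the opening tag and the first child. This is why the conversion function checks the number of children instead of relying on the text attribute.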
To convert the element tree obtained from the XML string to a Python dictionary, we will write a recursive function with the following properties.
- The function takes an XML string as its input argument.
- It first obtains the root element of the element tree from the string using the fromstring() function and creates an empty dictionary.
- For each child of the root, we check whether the child has any children of its own. For this, we pass the child to the len() function, which returns the number of its direct sub-elements.
- If the child has no children, we add its tag and text attributes to the dictionary as a key-value pair.
- If the child has one or more children, we convert its sub-tree into an XML string using the tostring() function. The tostring() function takes an element and returns its XML serialization, by default as a bytes object that fromstring() also accepts. Then, we recursively convert this string into a Python dictionary. When the recursive call returns a dictionary, we assign it to the result dictionary using the child's tag attribute as the key.
After execution of the above function, you will get the Python dictionary from the XML string. You can observe this in the following example.
import xml.etree.ElementTree as ET
xml_string = '''<?xml version="1.0"?>
<employee>
<name>John Doe</name>
<age>35</age>
<job>
<title>Software Engineer</title>
<department>IT</department>
<years_of_experience>10</years_of_experience>
</job>
<address>
<street>123 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>94102</zip>
</address>
</employee>
'''
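def xml_to_dict(xml_string):
    # Parse the XML string and get the root element.
    root = ET.fromstring(xml_string)
    result = {}
    for child in root:
        if len(child) == 0:
            # Leaf element: store its text under its tag.
            result[child.tag] = child.text
        else:
            # Element with children: serialize the sub-tree and recurse.
            result[child.tag] = xml_to_dict(ET.tostring(child))
    return result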
print("The XML string is:")
print(xml_string)
python_dict = xml_to_dict(xml_string)
print("The python dictionary is:")
print(python_dict)
Output:
The XML string is:
<?xml version="1.0"?>
<employee>
<name>John Doe</name>
<age>35</age>
<job>
<title>Software Engineer</title>
<department>IT</department>
<years_of_experience>10</years_of_experience>
</job>
<address>
<street>123 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>94102</zip>
</address>
</employee>
The python dictionary is:
{'name': 'John Doe', 'age': '35', 'job': {'title': 'Software Engineer', 'department': 'IT', 'years_of_experience': '10'}, 'address': {'street': '123 Main St.', 'city': 'San Francisco', 'state': 'CA', 'zip': '94102'}}
In this example, we have used the ElementTree module with recursion in python to convert an XML string to a dictionary.
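The function above serializes each sub-tree back to a string before every recursive call. As an alternative, the recursion can work on the element objects directly, so that the XML is parsed only once. The following is a sketch of that variant, not the approach used in the example above; the names element_to_dict and xml_to_dict_direct are hypothetical.

```python
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Convert the children of an element into a nested dictionary."""
    result = {}
    for child in element:
        if len(child) == 0:
            # Leaf element: store its text under its tag.
            result[child.tag] = child.text
        else:
            # Recurse on the child element itself, without re-serializing it.
            result[child.tag] = element_to_dict(child)
    return result

def xml_to_dict_direct(xml_string):
    # Parse once, then walk the tree.
    return element_to_dict(ET.fromstring(xml_string))

sample = ("<employee><name>John Doe</name>"
          "<job><title>Software Engineer</title></job></employee>")
converted = xml_to_dict_direct(sample)
print(converted)
```

Both versions share the same limitations: XML attributes are ignored, and if two sibling elements have the same tag, the later one overwrites the earlier one in the dictionary.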
Convert XML String to Python Dictionary Using xmltodict Module
The xmltodict module provides us with a simpler way to convert an XML string to a Python dictionary. For this, we can use the parse() function defined in the xmltodict module.
The parse() function takes an XML string as its input argument and returns a Python dictionary. You can observe this in the following example.
import xmltodict
xml_string = '''<?xml version="1.0"?>
<employee>
<name>John Doe</name>
<age>35</age>
<job>
<title>Software Engineer</title>
<department>IT</department>
<years_of_experience>10</years_of_experience>
</job>
<address>
<street>123 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>94102</zip>
</address>
</employee>
'''
print("The XML string is:")
print(xml_string)
python_dict = xmltodict.parse(xml_string)
print("The python dictionary is:")
print(python_dict)
Output:
The XML string is:
<?xml version="1.0"?>
<employee>
<name>John Doe</name>
<age>35</age>
<job>
<title>Software Engineer</title>
<department>IT</department>
<years_of_experience>10</years_of_experience>
</job>
<address>
<street>123 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>94102</zip>
</address>
</employee>
The python dictionary is:
{'employee': {'name': 'John Doe', 'age': '35', 'job': {'title': 'Software Engineer', 'department': 'IT', 'years_of_experience': '10'}, 'address': {'street': '123 Main St.', 'city': 'San Francisco', 'state': 'CA', 'zip': '94102'}}}
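In this example, you can observe that we have converted the XML string to a dictionary with a single function call, i.e., the parse() function defined in the xmltodict module. Note that, unlike the output of the ElementTree-based function, this dictionary keeps the root element employee as its top-level key. Hence, the xmltodict module makes it easy for us to convert XML data to a dictionary.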
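In the xmltodict output above, the root element employee appears as the single top-level key, while the ElementTree-based function returned only the contents of the root. The following sketch shows one way to unwrap that top level. To keep the snippet runnable without xmltodict installed, the dictionary is copied in as a literal; the variable names are illustrative.

```python
# A dictionary shaped like the xmltodict output shown above.
parsed = {'employee': {'name': 'John Doe',
                       'age': '35',
                       'job': {'title': 'Software Engineer',
                               'department': 'IT',
                               'years_of_experience': '10'},
                       'address': {'street': '123 Main St.',
                                   'city': 'San Francisco',
                                   'state': 'CA',
                                   'zip': '94102'}}}

# The root element is the only top-level key, so take the first item.
root_tag, employee = next(iter(parsed.items()))

print(root_tag)
print(employee['job']['title'])
```

When the root tag is known in advance, parsed['employee'] gives the same inner dictionary more directly.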
Convert XML File to Python Dictionary
To convert an XML file to a Python dictionary, we will first open the file in read mode. For this, we will pass the file name as the first input argument and the string "r" as the second argument to the open() function. After execution, the open() function returns a file object.
We will use a file named employee.xml that contains the employee data shown in the examples above.
Using the file object, we obtain the XML string. For this, we will invoke the read() method on the file object. The read() method returns the contents of the XML file as a string.
Once we get the XML string from the file, we can convert it to a Python dictionary as shown below.
import xml.etree.ElementTree as ET
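# Read the whole file into a string; the with block closes the file afterwards.
with open("employee.xml", "r") as xml_file:
    xml_string = xml_file.read()
def xml_to_dict(xml_string):
    root = ET.fromstring(xml_string)
    result = {}
    for child in root:
        if len(child) == 0:
            result[child.tag] = child.text
        else:
            # Serialize the sub-tree and convert it recursively.
            result[child.tag] = xml_to_dict(ET.tostring(child))
    return result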
print("The XML string is:")
print(xml_string)
python_dict = xml_to_dict(xml_string)
print("The python dictionary is:")
print(python_dict)
Output:
The XML string is:
<?xml version="1.0"?>
<employee>
<name>John Doe</name>
<age>35</age>
<job>
<title>Software Engineer</title>
<department>IT</department>
<years_of_experience>10</years_of_experience>
</job>
<address>
<street>123 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>94102</zip>
</address>
</employee>
The python dictionary is:
{'name': 'John Doe', 'age': '35', 'job': {'title': 'Software Engineer', 'department': 'IT', 'years_of_experience': '10'}, 'address': {'street': '123 Main St.', 'city': 'San Francisco', 'state': 'CA', 'zip': '94102'}}
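The ElementTree module can also read a file directly through its parse() function, which avoids the explicit open() and read() calls. The following is a minimal sketch under a few assumptions: it writes a small stand-in employee.xml to a temporary directory so that it is self-contained, and the helper name to_dict is hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def to_dict(element):
    # Leaf children map to their text; others are converted recursively.
    return {c.tag: (to_dict(c) if len(c) else c.text) for c in element}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in file with a shortened version of the employee data.
    path = os.path.join(tmp_dir, "employee.xml")
    with open(path, "w", encoding="utf-8") as xml_file:
        xml_file.write('<?xml version="1.0"?>\n'
                       '<employee><name>John Doe</name>'
                       '<job><title>Software Engineer</title></job>'
                       '</employee>')

    # parse() reads the file and returns an ElementTree object;
    # getroot() gives its root element.
    root = ET.parse(path).getroot()
    file_dict = to_dict(root)

print(file_dict)
```

With a real file, only the ET.parse() call with the file's path is needed; the temporary directory exists here solely to make the sketch runnable on its own.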
Instead of using the element tree approach, you can also use the xmltodict module to convert an XML file to a python dictionary as shown below.
import xmltodict
# Read the whole file into a string; the with block closes the file afterwards.
with open("employee.xml", "r") as xml_file:
    xml_string = xml_file.read()
print("The XML string is:")
print(xml_string)
python_dict = xmltodict.parse(xml_string)
print("The python dictionary is:")
print(python_dict)
Output:
The XML string is:
<?xml version="1.0"?>
<employee>
<name>John Doe</name>
<age>35</age>
<job>
<title>Software Engineer</title>
<department>IT</department>
<years_of_experience>10</years_of_experience>
</job>
<address>
<street>123 Main St.</street>
<city>San Francisco</city>
<state>CA</state>
<zip>94102</zip>
</address>
</employee>
The python dictionary is:
{'employee': {'name': 'John Doe', 'age': '35', 'job': {'title': 'Software Engineer', 'department': 'IT', 'years_of_experience': '10'}, 'address': {'street': '123 Main St.', 'city': 'San Francisco', 'state': 'CA', 'zip': '94102'}}}
Conclusion
In this article, we have discussed how to convert an XML file or string to a Python dictionary. Although we have discussed two approaches, I suggest you use the xmltodict module to convert an XML file or string to a dictionary, as it is the easier approach.
To learn more about Python programming, you can read this article on how to convert a dictionary to YAML in Python. You might also like this article on custom JSON encoders in Python.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!