Processing XML using DOM

How to Process XML using DOM

The Document Object Model (DOM) approach to parsing XML is essentially a method for turning your XML document into an object tree. Because all XML documents are built like a tree, accessing an individual element by its branch is a logical step.

Lots of different DOM parsers are supported under Perl, including XML::DOM, XML::Simple, and XML::Twig. Of these, my personal favorite is XML::Grove, written by Ken MacLeod. XML::Grove is not strictly a DOM parser: it doesn’t adhere to the W3C’s DOM API, but it does provide a very similar interface. For a genuine DOM parser, use the XML::DOM module.

The XML::Grove module provides an easy way to work with an entire XML document: it loads the document into memory and converts it into a tree of objects that can be accessed just like any other set of nested references. To demonstrate the tree format offered by XML::Grove, let’s look at a sample XML document.

We’ll use a contact entry within an address book, a structure most people are familiar with. If we think about a single record within a contact database, then the base of the XML document will be the contact. We’ll use a fictional version of me for our example.

A Contact Record Written in XML

<Contact>
  <Name>Martin Brown</Name>
  <Address>
    <Description>Main Address</Description>
    <Addressline>The House, The Street, The Town</Addressline>
  </Address>
  <Address>
    <Description>Holiday Chalet</Description>
    <Addressline>The Chalet, The Hillside, The Forest</Addressline>
  </Address>
</Contact>

The grove.pl sample script that comes with the XML::Grove module kit can convert this document into a textual tree. This version has been modified slightly so that it also outputs the array reference number of each branch.

Because we can access individual tags within a DOM-parsed XML document, DOM parsers are particularly useful when we want to update the contents of an XML document. Using SAX to process the document sequentially rather than using the tree model offered by a DOM parser is far from ideal, because it means reading in the content, identifying which bits you want to change as they are triggered, and then regenerating the result.

For example, if we wanted to update my Holiday Chalet address using SAX, we’d have to read in the content, identify first that we were in the Address branch, and then that we were in the correct Addressline branch. Only then could we replace the information in the output.

Using DOM, we parse the entire document, update the address within the branch we want to change, and then dump the XML document back out again. Updating the branch is just a case of referencing the branch’s location within the DOM structure.
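As a rough illustration of that parse, update, and dump cycle, the sketch below works on a hand-built tree of nested hashes and arrays rather than on real parser output. It assumes elements are hashes with Name and Contents keys and that character data is a hash with a Data key. The to_xml helper and the replacement address are made up for the example, and a real program would obtain the tree from a parser instead of writing it out by hand.

```perl
use strict;
use warnings;

# Hand-built stand-in for a parsed document (an assumption, not parser output):
# elements are hashes with Name and Contents keys, text is a hash with Data.
my $doc = {
    Contents => [
        { Name => 'Contact', Contents => [
            { Name => 'Name', Contents => [ { Data => 'Martin Brown' } ] },
            { Name => 'Address', Contents => [
                { Name => 'Description', Contents => [ { Data => 'Holiday Chalet' } ] },
                { Name => 'Addressline', Contents => [ { Data => 'The Chalet, The Hillside, The Forest' } ] },
            ] },
        ] },
    ],
};

# Update: reference the branch by its location and assign new data.
# The new address is invented purely for illustration.
my $address = $doc->{Contents}[0]{Contents}[1];
$address->{Contents}[1]{Contents}[0]{Data} = 'The Cabin, The Lakeside, The Valley';

# Hypothetical serializer: turn a node back into XML text.
# It does not escape &, <, or >, so it is for illustration only.
sub to_xml {
    my ($node) = @_;
    return $node->{Data} unless exists $node->{Name};
    my $inner = join '', map { to_xml($_) } @{ $node->{Contents} };
    return "<$node->{Name}>$inner</$node->{Name}>";
}

# Dump the whole document back out.
my $xml = join '', map { to_xml($_) } @{ $doc->{Contents} };
print "$xml\n";
```

The point of the sketch is that the update itself is a single assignment once the branch has been located; everything else is building and printing the tree.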

XML::Grove converts your XML document into a series of nested arrays and hashes. The arrays contain a list of elements within the current branch, and the hashes are used to supply the element type, name, and data (if applicable) for that branch. Because there are different element types, the numbers don’t always match what you would normally expect.

In the listing, you will notice the array reference numbers required to access each branch. To access the contents of a branch, you access the Contents element from the enclosed hash, and you get the data contained in a branch using the Data key. Finally, the Name key returns the tag name for a given branch, and the Attributes key returns the attributes for the tag.
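The layout described above can be modelled with plain Perl data. The sketch below is hand-built, not produced by XML::Grove. It assumes that elements carry Name and Contents keys, that character data carries a Data key, and that whitespace between tags appears as its own character-data entry, which is one way the index numbers can end up different from what you expect. The helpers chars, elem, and dump_tree are hypothetical names, and the index numbers apply only to this cut-down tree; they need not match those of a real grove.

```perl
use strict;
use warnings;

# Hypothetical helpers that build grove-like nodes by hand.
sub chars { return +{ Data => $_[0] } }
sub elem  { my ($name, @kids) = @_; return +{ Name => $name, Contents => [@kids] } }

# A cut-down contact record. The "\n" entries stand in for the
# whitespace between tags (an assumption about how a parser reports it).
my $grove = {
    Contents => [
        elem('Contact',
            chars("\n"),
            elem('Name', chars('Martin Brown')),
            chars("\n"),
            elem('Address',
                elem('Description', chars('Main Address')),
            ),
            chars("\n"),
        ),
    ],
};

# Walk the tree, returning one line per branch with its array index.
sub dump_tree {
    my ($node, $depth) = @_;
    my @lines;
    my $i = 0;
    for my $child (@{ $node->{Contents} }) {
        my $label;
        if (exists $child->{Name}) {
            $label = "Element: $child->{Name}";
        }
        else {
            # Show newlines visibly so whitespace entries stand out.
            (my $text = $child->{Data}) =~ s/\n/\\n/g;
            $label = "Data: '$text'";
        }
        push @lines, ('  ' x $depth) . "[$i] $label";
        push @lines, dump_tree($child, $depth + 1) if exists $child->{Name};
        $i++;
    }
    return @lines;
}

my @tree = dump_tree($grove, 0);
print "$_\n" for @tree;

# In this tree, Name sits at index 1 of Contact, not 0,
# because the whitespace entry comes first.
my $name = $grove->{Contents}[0]{Contents}[1]{Contents}[0]{Data};
print "Name: $name\n";
```

Printing the indexes alongside each branch is a quick way to find the path you need before hard-coding it in an access expression.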

To get the data from the Name XML tag, we’d need to access the Data key from branch 0:
print 'Name: ', $grove->{Contents}[0]->{Contents}[2]->{Contents}->[0]->{Data}, "\n";
