XML remains a core technology in many production systems. It powers SOAP services, enterprise integrations, configuration files, document standards, and structured data exchange in regulated industries. Even in ecosystems where JSON dominates APIs, XML continues to play a critical role in backend systems and legacy integrations.

For developers working across stacks, understanding how to read and write XML in PHP, Python, and Java is essential. While the syntax of XML remains the same across languages, each ecosystem provides different libraries and parsing models. This guide explains the shared concepts and shows how they are implemented in practice.

XML Concepts You Need Before Coding

Before writing code, it is important to understand a few core XML principles:

  • Well-formed documents follow XML syntax rules.
  • Valid documents conform to a schema such as XSD or DTD.
  • XML elements may contain attributes, text, or nested elements.
  • Namespaces help avoid naming conflicts.
  • The character encoding declared in the document (typically UTF-8) must match the bytes actually written.
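
As a minimal sketch of the first and fourth points, the Python snippet below checks well-formedness with the standard-library ElementTree parser and then looks up a namespaced element. The helper name `is_well_formed` and the namespace URI `http://example.com/orders` are illustrative assumptions, not part of any real schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical namespace URI, used only for illustration.
NS = {"o": "http://example.com/orders"}

doc = '<order xmlns="http://example.com/orders"><total>99.99</total></order>'
root = ET.fromstring(doc)

# A namespaced element is stored as "{uri}localname", so a bare name finds nothing.
missing = root.find("total")
# Mapping a prefix to the URI makes the lookup succeed.
found = root.find("o:total", NS)
```

Note that this only tests well-formedness. Validity against an XSD or DTD needs a validating library.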

Many XML errors in production systems stem from namespace confusion, encoding mismatches, or improper escaping.
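
The escaping and encoding problems can be illustrated with a short Python sketch. It assumes a made-up customer name containing reserved characters and uses only standard-library calls. The `xml_declaration` argument to `tostring` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Made-up value containing characters that are reserved in XML.
raw_name = 'Smith & Sons <"Best" Laptops>'

# When building XML by hand, text and attribute values must be escaped explicitly.
escaped_text = escape(raw_name)
quoted_attr = quoteattr(raw_name)

# A tree API escapes on serialization, so the raw string can be assigned directly.
customer = ET.Element("customer", {"note": raw_name})
customer.text = raw_name

# Requesting an encoding returns bytes; xml_declaration=True adds a
# declaration that names the same encoding.
serialized = ET.tostring(customer, encoding="utf-8", xml_declaration=True)

# Parsing the bytes back restores the original string unchanged.
round_trip = ET.fromstring(serialized)
```

Letting the library serialize the tree is usually safer than concatenating strings, because escaping and the encoding declaration stay consistent.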

Choosing a Parsing Strategy

There are two main parsing approaches across all three languages:

DOM (Document Object Model)

The entire XML document is loaded into memory as a tree structure. This allows random access and modification but can consume significant memory for large files.

Streaming Parsers (SAX, StAX, iterparse, XMLReader)

Streaming parsers read XML sequentially. They are more memory-efficient and suitable for very large documents but offer less flexibility for random access.

Choosing between DOM and streaming depends on document size and processing requirements.
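
To make the trade-off concrete, here is a hedged Python sketch that extracts the same data both ways from a small inline document. The function names `skus_from_tree` and `skus_from_stream` are hypothetical, and ElementTree stands in for the tree-based approach.

```python
import io
import xml.etree.ElementTree as ET

XML_BYTES = b"""<order>
    <items>
        <item sku="A1">Laptop</item>
        <item sku="B2">Mouse</item>
    </items>
</order>"""


def skus_from_tree(data):
    """Tree approach: build the whole document in memory, then query it."""
    root = ET.fromstring(data)
    return [item.get("sku") for item in root.iter("item")]


def skus_from_stream(source):
    """Streaming approach: handle each element as soon as its end tag is read."""
    skus = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            skus.append(elem.get("sku"))
            elem.clear()  # release the element's contents once handled
    return skus


tree_result = skus_from_tree(XML_BYTES)
stream_result = skus_from_stream(io.BytesIO(XML_BYTES))
```

Both calls return the same list. The difference is that the streaming version never needs the full document available at once.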

Shared Example XML

Throughout this article, assume the following XML structure:

<order>
    <customer id="123">John Smith</customer>
    <total currency="USD">99.99</total>
    <items>
        <item sku="A1">Laptop</item>
        <item sku="B2">Mouse</item>
    </items>
</order>

We will demonstrate how to read and generate similar XML across languages.

Reading and Writing XML in PHP

SimpleXML

SimpleXML provides an easy interface for small documents.

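$xml = simplexml_load_file("order.xml");
if ($xml === false) {
    exit("Could not parse order.xml"); // returns false on failure
}
echo $xml->customer, PHP_EOL; // John Smith
echo $xml->total, PHP_EOL;    // 99.99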

Accessing attributes:

echo $xml->customer['id'];

SimpleXML is intuitive but limited for complex transformations.

DOMDocument

DOMDocument offers more control and supports XPath queries.

$dom = new DOMDocument();
$dom->load("order.xml");
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//item");

Creating XML:

$doc = new DOMDocument("1.0", "UTF-8");
$order = $doc->createElement("order");
$doc->appendChild($order);
$doc->save("new_order.xml");

XMLReader (Streaming)

XMLReader is suitable for large files.

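$reader = new XMLReader();
$reader->open("large.xml");
// read() advances one node at a time; only the current node is held in memory.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        echo $reader->name, PHP_EOL;
    }
}
$reader->close();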

Reading and Writing XML in Python

ElementTree (Standard Library)

import xml.etree.ElementTree as ET

tree = ET.parse("order.xml")
root = tree.getroot()

customer = root.find("customer").text
total = root.find("total").text
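
The lookups above return only element text. A slightly fuller sketch, assuming the shared order document is inlined as a string so the snippet runs on its own, also reads attributes, iterates over the items, and guards against a missing element. The `discount` element is hypothetical and deliberately absent.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER_XML = """<order>
    <customer id="123">John Smith</customer>
    <total currency="USD">99.99</total>
    <items>
        <item sku="A1">Laptop</item>
        <item sku="B2">Mouse</item>
    </items>
</order>"""

root = ET.fromstring(ORDER_XML)

customer = root.find("customer")
customer_id = customer.get("id")      # attributes are read with .get()

total = root.find("total")
amount = Decimal(total.text)          # text arrives as a string; convert explicitly
currency = total.get("currency")

# findall() with a path returns every match in document order.
items = [(item.get("sku"), item.text) for item in root.findall("items/item")]

# find() returns None for a missing element, so guard before reading .text.
discount = root.find("discount")
discount_text = discount.text if discount is not None else None
```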

Creating XML:

order = ET.Element("order")
customer = ET.SubElement(order, "customer")
customer.text = "John Smith"

tree = ET.ElementTree(order)
tree.write("new_order.xml", encoding="utf-8", xml_declaration=True)
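
Extending the snippet above to the full structure of the shared example might look like the following sketch. It serializes to a string instead of a file so the result can be inspected directly, and it assumes Python 3.9 or later for `ET.indent`.

```python
import xml.etree.ElementTree as ET

order = ET.Element("order")

# Attributes can be passed as keyword arguments to SubElement.
customer = ET.SubElement(order, "customer", id="123")
customer.text = "John Smith"

total = ET.SubElement(order, "total", currency="USD")
total.text = "99.99"

items = ET.SubElement(order, "items")
for sku, name in [("A1", "Laptop"), ("B2", "Mouse")]:
    item = ET.SubElement(items, "item", sku=sku)
    item.text = name

ET.indent(order, space="    ")  # pretty-print in place (Python 3.9+)
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```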

lxml (Advanced Features)

lxml is a third-party package that adds full XPath 1.0 support and XSD validation.

from lxml import etree

tree = etree.parse("order.xml")
items = tree.xpath("//item")
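
For comparison, the standard-library ElementTree supports a limited XPath subset through `find` and `findall`, which is often enough for simple lookups. The sketch below assumes a shortened copy of the order document shown earlier, inlined so the snippet runs without lxml.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order>
    <items>
        <item sku="A1">Laptop</item>
        <item sku="B2">Mouse</item>
    </items>
</order>"""

root = ET.fromstring(ORDER_XML)

# ".//item" selects item elements at any depth below the root.
all_items = root.findall(".//item")

# An attribute predicate narrows the match to a single SKU.
mouse = root.find(".//item[@sku='B2']")

# A positional predicate selects by order among siblings (1-based).
first_item = root.find("items/item[1]")
```

Expressions beyond this subset, such as XPath functions or axes, are where lxml becomes necessary.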

Validation with XSD:

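# Parse the schema from the file so lxml honors its encoding declaration.
schema = etree.XMLSchema(etree.parse("schema.xsd"))
parser = etree.XMLParser(schema=schema)
# Raises etree.XMLSyntaxError if order.xml does not conform to the schema.
etree.parse("order.xml", parser)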

Streaming with iterparse

for event, elem in ET.iterparse("large.xml"):
    if elem.tag == "item":
        print(elem.text)
        elem.clear()
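
One caveat with the loop above: `elem.clear()` empties each element, but the emptied elements stay attached to their parent. A common variation, sketched here under the assumption that the `item` elements are direct children of the root, keeps a reference to the root and clears it after each item. The generator name `iter_item_texts` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_item_texts(source):
    """Yield the text of each <item>, keeping memory use roughly constant."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first "start" event carries the root
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            yield elem.text
            # Clearing the root drops references to finished children,
            # which elem.clear() alone would leave attached to the tree.
            root.clear()


sample = io.BytesIO(
    b"<items><item>Laptop</item><item>Mouse</item></items>"
)
texts = list(iter_item_texts(sample))
```

If the items sit deeper in the tree, the same idea applies to their immediate parent instead of the root.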

Reading and Writing XML in Java

DOM Parsing

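DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Reject DOCTYPE declarations to guard against XXE in untrusted input.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("order.xml"));

NodeList items = doc.getElementsByTagName("item");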

SAX Parsing

SAX is a push model: the parser calls handler methods such as startElement and endElement as it reads, so large documents can be streamed without building a tree.

StAX (Pull Parser)

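StAX is a pull model: application code asks an XMLStreamReader or XMLEventReader for the next event, which keeps control of the loop in the caller.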

JAXB for Object Mapping

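JAXB maps XML to Java objects using annotations such as @XmlRootElement and @XmlAttribute, which simplifies complex XML handling in enterprise systems. Since Java 11 it is no longer bundled with the JDK and must be added as a separate dependency. The example below assumes an annotated Order class.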

JAXBContext context = JAXBContext.newInstance(Order.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
Order order = (Order) unmarshaller.unmarshal(new File("order.xml"));

Cross-Language Comparison

Language | Simple Parsing | Streaming  | XPath Support | Schema Validation
PHP      | SimpleXML      | XMLReader  | DOMXPath      | Supported via DOM
Python   | ElementTree    | iterparse  | lxml          | lxml
Java     | DOM            | SAX / StAX | XPath API     | JAXP / JAXB

Common Pitfalls

  • Incorrect namespace handling
  • Encoding mismatches
  • Unescaped special characters
  • Loading very large XML files fully into memory
  • Failing to disable external entity resolution (XXE risk)

Security Best Practices

Always treat XML as untrusted input. Disable external entity resolution and validate against schemas when processing external documents. Configure parsers to prevent entity expansion attacks.
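
One conservative way to apply this advice in Python is to refuse any document that carries a DOCTYPE at all, since entity declarations can only appear there. The sketch below uses only the standard library. The names `parse_untrusted` and `DoctypeForbidden` are hypothetical, and a maintained hardening package such as defusedxml may be a better fit for production code.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an untrusted document contains a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML only if it declares no DOCTYPE, and therefore no entities."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(data, True)

    # Second pass: the document is DOCTYPE-free, so build the tree.
    return ET.fromstring(data)
```

Parsing twice costs extra time, which is the price of keeping the sketch short. The check runs before any entity in the document would be expanded.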

Conclusion

Reading and writing XML in PHP, Python, and Java follows the same conceptual model: parse the document, navigate its structure, modify or extract data, and serialize the result. The differences lie in the libraries and APIs provided by each language.

For small documents, DOM-based solutions are convenient and expressive. For large datasets, streaming parsers are essential. When working in enterprise systems, schema validation and secure parser configuration should always be part of the implementation strategy.