Parsing XML with SAX and DOM
The Simple API for XML (SAX) interface was originally developed under Java, although interfaces now exist under most languages. Python 2 supports SAX version 2 through its xml.sax package, and the interface is extensive.
Python provides a basic interface to the SAX parser, an exception-handling system, a set of base classes for creating SAX handlers, and a low-level interface to the SAX system for building your own SAX-based parsers.
SAX works by accepting a content handler class that you have previously created to handle the different elements. The method is similar in principle to Expat, except that the class you create is entirely devoted to supporting the handler methods for the different elements. SAX handles all of the data reading and the feeding of the information to the parser.
Keeping with the basic theme for the moment, the following listing shows a script that uses SAX to output the start and end tags from a sample file.
A Simple SAX Parser
import sys
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

# Define a new content handler class; the
# defined methods will be triggered when the
# individual elements are found in the XML document
class FindStartEnd(ContentHandler):
    def startElement(self, name, attrs):
        print('Start: %s %s' % (name, dict(attrs.items())))

    def endElement(self, name):
        print('End: %s' % name)

# Make a new parser
parser = make_parser()
# Create a new handler instance based on our class
sehandler = FindStartEnd()
# Set up the content handler for using our handler
parser.setContentHandler(sehandler)

# The name of the XML file must be given on the command line
if len(sys.argv) < 2:
    print('You must supply the name of the file')
    sys.exit(1)

# We pass off the name of the file to
# the parsing engine
parser.parse(sys.argv[1])
Aside from not printing out our data sections, the output from this script is identical to the previous examples. Also note that we no longer have to supply the data in discrete segments to the parser: the SAX interface opens a file by name and handles all of the reading internally.
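If the data sections are wanted as well, one way to get them is to add a characters() method to the handler. The following is a minimal sketch rather than part of the original listing: the class name FindStartEndData and the sample document are assumptions made for illustration, and xml.sax.parseString() is used so that the example does not depend on a file.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class FindStartEndData(ContentHandler):
    """Records start tags, end tags, and non-blank character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append('Start: %s' % name)

    def endElement(self, name):
        self.events.append('End: %s' % name)

    def characters(self, content):
        # The parser is allowed to deliver one run of text in several
        # chunks, so each non-blank chunk is recorded as it arrives.
        if content.strip():
            self.events.append('Data: %s' % content.strip())


# Hypothetical sample document, not taken from the article.
SAMPLE = b'<video><title>Alien</title><year>1979</year></video>'

handler = FindStartEndData()
xml.sax.parseString(SAMPLE, handler)
for event in handler.events:
    print(event)
```

Because character data can arrive in pieces, a handler that needs the complete text of an element would normally collect the chunks and join them when the end tag is seen.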
Because of the way SAX works, it’s ideally suited to situations where we want to pick out specific elements while processing a document. For example, we can install triggers to identify specific tags and/or data sections in a simpler way than that offered by the DOM techniques.
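As a rough illustration of such a trigger, the sketch below collects the text of one particular tag and ignores everything else. The tag name title, the class name TitleFinder, and the sample document are all hypothetical choices for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleFinder(ContentHandler):
    """Collects the text of every <title> element and ignores the rest."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.in_title = False
        self.buffer = []
        self.titles = []

    def startElement(self, name, attrs):
        # Trigger: start collecting text when the tag of interest opens
        if name == 'title':
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so keep every piece
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        # Trigger: the tag has closed, so the collected text is complete
        if name == 'title':
            self.titles.append(''.join(self.buffer))
            self.in_title = False


# Hypothetical sample document, not taken from the article.
SAMPLE = b"""<videos>
  <video><title>Alien</title><year>1979</year></video>
  <video><title>Brazil</title><year>1985</year></video>
</videos>"""

finder = TitleFinder()
xml.sax.parseString(SAMPLE, finder)
print(finder.titles)
```

The handler never holds more than the text of the element it is currently reading, which is what makes this approach attractive for large documents.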
SAX can also be a great way of serializing documents into another format, because we can act on each element as it’s extracted from the original XML source.
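One possible shape for such a conversion is sketched below: each record element is turned into a line of JSON as soon as its end tag is seen. The target format, the class name RecordsToJSONLines, and the sample document are assumptions for illustration, and the sketch only handles flat records whose child elements contain plain text.

```python
import json
import xml.sax
from xml.sax.handler import ContentHandler


class RecordsToJSONLines(ContentHandler):
    """Converts each flat record element into one line of JSON."""

    def __init__(self, record_tag):
        ContentHandler.__init__(self)
        self.record_tag = record_tag
        self.record = None   # dict for the record being built, or None
        self.field = None    # name of the child element being read
        self.text = []
        self.lines = []

    def startElement(self, name, attrs):
        if name == self.record_tag:
            # Attributes of the record element become its first fields
            self.record = dict(attrs.items())
        elif self.record is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.record_tag:
            # The record is complete, so it can be serialized immediately
            self.lines.append(json.dumps(self.record, sort_keys=True))
            self.record = None
        elif name == self.field:
            self.record[name] = ''.join(self.text).strip()
            self.field = None


# Hypothetical sample document, not taken from the article.
SAMPLE = b"""<videos>
  <video id="1"><title>Alien</title><year>1979</year></video>
  <video id="2"><title>Brazil</title><year>1985</year></video>
</videos>"""

converter = RecordsToJSONLines('video')
xml.sax.parseString(SAMPLE, converter)
for line in converter.lines:
    print(line)
```

In a real converter the lines would more likely be written straight to an output file inside endElement() rather than kept in a list, so that memory use stays constant however large the source document is.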