Introduction to VTD-XML
VTD-XML is a new, open-source, non-validating, non-extractive eXtensible Markup Language (XML) processing Application Programming Interface (API) written in Java. It is presented here as the best alternative to the Simple API for XML (SAX) and the Document Object Model (DOM), because it does not force you to trade processing performance for usability.
The Java-based, non-validating VTD-XML parser is faster than DOM and easier to use than SAX. Unlike those XML processing technologies, VTD-XML is designed to be random-access capable without incurring excessive resource overhead.
Because VTD (Virtual Token Descriptor) records are constant in length, the memory buffers that store them can be allocated in bulk. This avoids creating the multitude of string and node objects usually associated with other XML processing technologies. As a result, VTD-XML greatly reduces both memory usage and object creation cost, which leads to significantly higher processing performance. For example, on a 1.5 GHz Athlon machine, VTD-XML delivers random access at 25 to 35 MB/sec, outperforming most SAX parsers with null content handlers. An in-memory VTD-XML document typically consumes only 1.3 to 1.5 times the size of the XML document.
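The idea of constant-length records stored in a bulk-allocated buffer can be sketched in a few lines of Java. This is a toy model, not VTD-XML's implementation: the class name RecordBuffer and the 64-bit layout (32-bit offset, 24-bit length, 8-bit depth) are assumptions chosen only for illustration.

```java
import java.util.Arrays;

// Toy model only: class name and bit layout are illustrative assumptions.
final class RecordBuffer {
    // One primitive array holds every record, so no per-token objects are created.
    private long[] records = new long[16];
    private int count = 0;

    /** Packs one token description into a single long and returns its index. */
    int add(int offset, int length, int depth) {
        if (count == records.length) {
            records = Arrays.copyOf(records, count * 2); // grow in bulk
        }
        records[count] = ((long) (depth & 0xFF) << 56)
                | ((long) (length & 0xFFFFFF) << 32)
                | (offset & 0xFFFFFFFFL);
        return count++;
    }

    // Because every record has the same size, an index is all that is needed to read one.
    int offset(int index) { return (int) records[index]; }
    int length(int index) { return (int) ((records[index] >>> 32) & 0xFFFFFF); }
    int depth(int index)  { return (int) (records[index] >>> 56); }
    int size()            { return count; }
}
```

Each record describes where a token sits in the original document rather than holding a copy of it, which is why the buffer stays small relative to a tree of string and node objects.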
VTD-XML provides several benefits for software developers. Suppose you need to choose a processing model before starting work on a project involving XML. DOM is slow and consumes too much memory, particularly for large documents. SAX is difficult to use, especially for XML documents with complex structures. As a result, the best option is VTD-XML, as it does not force you to trade processing performance for usability: it combines random-access capability with high performance. Even though SAX is fast due to its forward-only nature, it does not suit every situation.
VTD-XML can be used for an XML project only if two criteria are met. The first criterion is that the documents do not depend on entity declarations in document type definitions (DTDs), because the current version of VTD-XML does not support them. VTD-XML recognizes only the five built-in entities: &amp;, &apos;, &lt;, &gt;, and &quot;. VTD-XML works well when Simple Object Access Protocol (SOAP), Resource Description Framework (RDF), Financial Information Exchange Markup Language (FIXML), or Really Simple Syndication (RSS) are used in the XML project. The second criterion is sufficient RAM: VTD-XML's internal parsed representation of XML is slightly larger than the XML itself, and the document needs to be held in memory to provide true random access to all of it. When both criteria are met, VTD-XML is the most efficient XML processing API.
The Java API of VTD-XML consists of three essential components. VTDGen (VTD generator) encapsulates the parsing routine that produces the internal parsed representation of XML. VTDNav (VTD navigator) is a cursor-based API that allows DOM-like random access to the hierarchical structure of XML. AutoPilot is the class that allows document-order element traversal.
At the start of navigation, the cursor of the VTDNav instance points at the root element of the XML document. You can use one of the overloaded versions of the toElement() method to move the cursor manually to different positions in the hierarchy. The version declared as toElement(int direction) takes an integer that indicates the direction in which the cursor moves. The six possible values of this integer, defined as class variables of VTDNav, are ROOT, PARENT, FIRST_CHILD, LAST_CHILD, NEXT_SIBLING, and PREV_SIBLING; their respective short forms are R, P, FC, LC, NS, and PS. toElement() returns a boolean value indicating the status of the operation. It returns true when the cursor has moved successfully. When the requested location does not exist, for example the first child of a childless element, the cursor does not move and toElement() returns false.
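The cursor behaviour described above can be modelled with a small self-contained sketch. ToyElement and ToyNav are hypothetical classes written for this illustration, and the numeric values of the direction constants are arbitrary here. The sketch does not use the VTD-XML library; it only mimics the contract that a move either succeeds and returns true, or leaves the cursor in place and returns false.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an element-only hierarchy; not the VTD-XML implementation.
final class ToyElement {
    final String name;
    ToyElement parent;
    final List<ToyElement> children = new ArrayList<>();

    ToyElement(String name) { this.name = name; }

    ToyElement add(ToyElement child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

final class ToyNav {
    // Named after the directions in the text; the values are assumptions.
    static final int ROOT = 0, PARENT = 1, FIRST_CHILD = 2,
                     LAST_CHILD = 3, NEXT_SIBLING = 4, PREV_SIBLING = 5;

    private final ToyElement root;
    private ToyElement cursor;

    ToyNav(ToyElement root) { this.root = root; this.cursor = root; }

    String currentName() { return cursor.name; }

    /** Moves the cursor if the target exists; otherwise leaves it alone and returns false. */
    boolean toElement(int direction) {
        ToyElement target = null;
        switch (direction) {
            case ROOT: target = root; break;
            case PARENT: target = cursor.parent; break;
            case FIRST_CHILD:
                if (!cursor.children.isEmpty()) target = cursor.children.get(0);
                break;
            case LAST_CHILD:
                if (!cursor.children.isEmpty()) target = cursor.children.get(cursor.children.size() - 1);
                break;
            case NEXT_SIBLING: target = sibling(+1); break;
            case PREV_SIBLING: target = sibling(-1); break;
            default: throw new IllegalArgumentException("unknown direction " + direction);
        }
        if (target == null) return false;
        cursor = target;
        return true;
    }

    private ToyElement sibling(int step) {
        if (cursor.parent == null) return null;
        List<ToyElement> siblings = cursor.parent.children;
        int i = siblings.indexOf(cursor) + step;
        return (i >= 0 && i < siblings.size()) ? siblings.get(i) : null;
    }

    /** Common idiom: visit every child of the current element, then return to it. */
    List<String> childNames() {
        List<String> names = new ArrayList<>();
        if (toElement(FIRST_CHILD)) {
            do { names.add(currentName()); } while (toElement(NEXT_SIBLING));
            toElement(PARENT);
        }
        return names;
    }
}
```

The childNames() helper shows why the boolean return value matters: the loop ends as soon as a move to the next sibling fails, without any separate bounds check.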
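The method getAttrVal(String attrName) retrieves the value of the named attribute of the element at the cursor position. Like other VTDNav methods, it returns the VTD record index of that value rather than a string.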
Now let us look at some of the properties that set VTD-XML apart from similar XML APIs, such as DOM and XMLCursor. The hierarchy of VTD-XML consists exclusively of element nodes. This is very different from DOM, which treats every node, whether it is an attribute node or a text node, as a part of the hierarchy. In VTD-XML, every instance of VTDNav has only one cursor. The cursor can be moved back and forth in the hierarchy, but you cannot duplicate it. However, you can temporarily save the location of the cursor on a global stack. VTDNav has two stack access methods: push(), which saves the cursor state, and pop(), which restores it. For example, suppose you are somewhere in the element hierarchy and want to visit a different area of the document and then continue from where you were. To accomplish this, you first push() the current location onto the stack. After moving the cursor to a different part of the document, you can very quickly jump back to the saved location by popping it off the stack.
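The save-and-restore pattern can be illustrated with a minimal sketch. ToyCursor is a hypothetical class in which the cursor is reduced to a single int position and the stack belongs to the cursor object; it shows only the push/pop discipline, not how VTDNav stores its state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model: the cursor is just an int position; all names are hypothetical.
final class ToyCursor {
    private int position;                                   // the single cursor
    private final Deque<Integer> saved = new ArrayDeque<>(); // saved locations, last in first out

    int position() { return position; }

    void moveTo(int newPosition) { position = newPosition; }

    /** Saves the current cursor location on the stack. */
    void push() { saved.push(position); }

    /** Restores the most recently saved location; returns false if nothing was saved. */
    boolean pop() {
        if (saved.isEmpty()) return false;
        position = saved.pop();
        return true;
    }
}
```

Because the stack is last in, first out, nested excursions unwind in reverse order: each pop() returns to the location saved by the matching push().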
One of the aspects of VTD-XML that most clearly distinguishes it from other XML processing APIs is its non-extractive tokenization based on the Virtual Token Descriptor. Non-extractive parsing is what enables VTD-XML to achieve its processing and memory efficiency, and it manifests itself in the following ways. Most of the member methods of VTDNav, such as getAttrVal(), getCurrentIndex(), and getText(), return an integer. This integer is a VTD record index that identifies the token requested by the calling function. After parsing, VTD-XML produces a linear buffer filled with VTD records. You can access any VTD record in the buffer if you know its index value, as all the VTD records have the same length. In addition, the VTD records cannot be addressed using pointers, as the records are not objects. When a VTDNav function does not evaluate to any meaningful value, it returns -1, which is more or less equivalent to a null reference in DOM.
Because the parsing process does not create any string objects, VTD-XML implements its own set of comparison functions that operate directly on VTD records. For example, the matchElement() method of VTDNav tests whether the element name, which is effectively the VTD record at the cursor, matches a given string. Similarly, the matchTokenString(), matchRawTokenString(), and matchNormalizedTokenString() methods of VTDNav perform a direct comparison between a string and a VTD record. This is advantageous because you need not pull tokens out into string objects, which are expensive to create, especially in large numbers. Bypassing excessive object creation is the main reason why VTD-XML significantly outperforms DOM and SAX. VTD-XML also implements its own set of string-to-numeric data conversion functions that operate directly on VTD records. VTDNav has four such member methods: parseInt(), parseLong(), parseFloat(), and parseDouble(). These functions take a VTD record index value and convert the corresponding token directly into a numeric data type.
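To make the benefit of comparing and converting tokens in place more concrete, here is a rough sketch of the general technique. TokenOps is a hypothetical helper, a token is assumed to be an (offset, length) pair into the original document bytes, and only ASCII text is handled. VTD-XML's own methods take a record index and deal with encodings and entities, which this sketch does not attempt.

```java
// Toy helpers, not the VTD-XML implementation: nothing is copied out of the buffer.
final class TokenOps {
    private TokenOps() {}

    /** Compares the token bytes with an ASCII string without creating a new String. */
    static boolean matches(byte[] doc, int offset, int length, String expected) {
        if (expected.length() != length) return false;
        for (int i = 0; i < length; i++) {
            char c = expected.charAt(i);
            if (c > 0x7F || doc[offset + i] != (byte) c) return false; // ASCII-only assumption
        }
        return true;
    }

    /** Parses an optionally signed decimal integer straight from the token bytes. */
    static int parseInt(byte[] doc, int offset, int length) {
        int i = offset;
        int end = offset + length;
        boolean negative = false;
        if (i < end && (doc[i] == '-' || doc[i] == '+')) {
            negative = doc[i] == '-';
            i++;
        }
        if (i == end) throw new NumberFormatException("no digits");
        long value = 0;
        for (; i < end; i++) {
            int digit = doc[i] - '0';
            if (digit < 0 || digit > 9) throw new NumberFormatException("not a digit at " + i);
            value = value * 10 + digit;
            if (value > (long) Integer.MAX_VALUE + 1) throw new NumberFormatException("overflow");
        }
        if (negative) value = -value;
        if (value > Integer.MAX_VALUE) throw new NumberFormatException("overflow");
        return (int) value;
    }
}
```

Both helpers read the document bytes directly, so a loop that checks thousands of element names or attribute values allocates no intermediate strings.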