A Document Type Definition (DTD) defines the legal structure of an XML document. It specifies which elements, attributes, and entities are allowed and how elements may be nested.
A DTD can be declared either internally within an XML document or externally as a separate file. Using a DTD helps ensure that XML data follows a consistent and valid structure.
Why DTD Is Important
A DTD gives everyone who exchanges XML a shared description of the format, so each file carries its structured information in an agreed layout. An application can also use a standard DTD to validate XML data received from external sources before processing it, which protects data integrity and consistency.
Types of DTD Declarations
DTD declarations can be included in an XML document in two ways:
- Internal DTD – declared inside the XML document
- External DTD – stored in a separate file and referenced by the XML document
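The two forms can be sketched as follows. This is an illustrative example, not a reference layout: the element declarations are written to match the Wish document used later in this article, and the file name wish.dtd is a hypothetical placeholder. Python's standard xml.etree.ElementTree is used only to show that both documents parse.

```python
import xml.etree.ElementTree as ET

# Internal DTD: the declarations sit inside the DOCTYPE, between [ and ].
INTERNAL = """<?xml version="1.0"?>
<!DOCTYPE Wish [
  <!ELEMENT Wish (To, From, Heading, Body)>
  <!ELEMENT To (#PCDATA)>
  <!ELEMENT From (#PCDATA)>
  <!ELEMENT Heading (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
]>
<Wish>
  <To>John</To>
  <From>Jill</From>
  <Heading>Reminder</Heading>
  <Body>HAPPY BIRTHDAY</Body>
</Wish>"""

# External DTD: the same declarations would live in a separate file.
# "wish.dtd" is a hypothetical file name; the DOCTYPE only points to it.
EXTERNAL = """<?xml version="1.0"?>
<!DOCTYPE Wish SYSTEM "wish.dtd">
<Wish>
  <To>John</To>
  <From>Jill</From>
  <Heading>Reminder</Heading>
  <Body>HAPPY BIRTHDAY</Body>
</Wish>"""

internal_root = ET.fromstring(INTERNAL)
external_root = ET.fromstring(EXTERNAL)

# Both forms describe the same document structure.
internal_children = [child.tag for child in internal_root]
external_children = [child.tag for child in external_root]
```

Note that xml.etree.ElementTree checks only that the document is well-formed. It does not validate the content against the DTD and does not fetch the external file, so a validating parser would be needed to enforce the declarations.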
Basic Building Blocks of XML
The following components form the basic structure of XML documents:
- Tags
- Elements
- Entities
- Attributes
- CDATA
- PCDATA
Tags
Tags are used to mark up elements in an XML document. They define the start and end of an element and help determine how data is structured.
Example:
<Wish>
<To>John</To>
<From>Jill</From>
<Heading>Reminder</Heading>
<Body>HAPPY BIRTHDAY</Body>
</Wish>
In this example, <Wish> and </Wish> are the start and end tags of the root element, Wish.
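Because every start tag needs a matching end tag, a parser rejects a document whose tags do not pair up. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

balanced = "<Wish><To>John</To><From>Jill</From></Wish>"
missing_end = "<Wish><To>John<From>Jill</From></Wish>"   # </To> is missing
wrong_case = "<Wish><To>John</to></Wish>"                # tag names are case-sensitive

results = [is_well_formed(doc) for doc in (balanced, missing_end, wrong_case)]
```

Only the first document is accepted; the other two fail because an end tag is missing or does not match its start tag exactly.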
Elements
Elements are the main building blocks of XML documents. An element can contain text, child elements, a mix of both, or nothing at all, and its start tag can carry attributes.
Example:
<To>John</To>
<From>Jill</From>
<Body>HAPPY BIRTHDAY</Body>
Here, To, From, and Body are elements.
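To see how a program views these elements, the sketch below parses the Wish document and reads each child element's name and text. It uses Python's standard xml.etree.ElementTree; the empty element Separator is a hypothetical name added only to show an element with no content.

```python
import xml.etree.ElementTree as ET

wish = ET.fromstring(
    "<Wish>"
    "<To>John</To><From>Jill</From>"
    "<Heading>Reminder</Heading><Body>HAPPY BIRTHDAY</Body>"
    "</Wish>"
)

# Map each child element's name to its text content.
fields = {child.tag: child.text for child in wish}

# An empty element has no text and no children.
separator = ET.fromstring("<Separator/>")
separator_is_empty = separator.text is None and len(separator) == 0
```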
Entities
Entities act as variables that represent commonly used characters or strings. They help avoid repetition and ensure proper character handling.
Some predefined XML entities include:
| Entity Reference | Character |
|---|---|
| &gt; | > |
| &lt; | < |
| &quot; | " |
| &amp; | & |
| &apos; | ' |
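The sketch below shows entity expansion in practice: the five predefined references are replaced by their characters during parsing, and a custom entity declared in an internal DTD expands the same way. The entity name greeting is a hypothetical choice for this example. It relies on Python's standard xml.etree.ElementTree and xml.sax.saxutils.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined references are expanded while parsing.
predefined = ET.fromstring("<Body>&lt; &gt; &amp; &quot; &apos;</Body>").text

# A custom entity declared in an internal DTD.
# The name "greeting" is hypothetical, chosen for this illustration.
doc = """<!DOCTYPE Wish [
  <!ENTITY greeting "HAPPY BIRTHDAY">
]>
<Wish><Body>&greeting;, John</Body></Wish>"""
custom = ET.fromstring(doc).find("Body").text

# Going the other way: escape() replaces &, < and > with entity references
# so that plain text can be placed safely inside an element.
escaped = escape("Jack & Jill <3")
```

Here predefined holds the five literal characters separated by spaces, custom holds the expanded greeting followed by ", John", and escaped holds the text with & and < written as references.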
Attributes
Attributes provide additional information about elements. They are written inside the start tag and consist of name-value pairs.
Example:
<Img src="computer.gif" />
In this example, the src attribute specifies the source of the image.
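A short sketch of reading that attribute from a program, using Python's standard xml.etree.ElementTree. The ATTLIST line in the comment shows how a DTD could declare the attribute; it is an assumed declaration for illustration, and the alt attribute is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# A DTD could declare the attribute like this (assumed for illustration):
#   <!ATTLIST Img src CDATA #REQUIRED>
img = ET.fromstring('<Img src="computer.gif" />')

src = img.get("src")                      # value of an attribute that is present
alt = img.get("alt", "no alt text")       # fallback for an attribute that is absent
all_attributes = dict(img.attrib)         # every name-value pair on the element
```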
CDATA
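CDATA stands for Character Data. Text inside a CDATA section, which starts with <![CDATA[ and ends with ]]>, is not parsed for markup: tags are not recognized and entity references are not expanded, so the parser passes the text through unchanged.
This is useful when the text contains characters such as < or & that would otherwise be interpreted as markup. In a DTD, CDATA is also the attribute type used for plain string values.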
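As a minimal sketch, the example below wraps text that looks like markup in a CDATA section and shows that the parser returns it untouched, while the same text without the section is rejected. It uses Python's standard xml.etree.ElementTree, and the sample text is made up for this illustration.

```python
import xml.etree.ElementTree as ET

raw_text = "if (a < b && b > c) { send(<To>); }"

# Inside a CDATA section the characters <, > and & are ordinary text.
with_cdata = ET.fromstring("<Body><![CDATA[" + raw_text + "]]></Body>")
cdata_text = with_cdata.text
child_count = len(with_cdata)   # <To> was not treated as a child element

# Without the CDATA section the same text is not well-formed XML.
try:
    ET.fromstring("<Body>" + raw_text + "</Body>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
```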
PCDATA
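PCDATA stands for Parsed Character Data. Text inside PCDATA is parsed by the XML parser: entity references are expanded and any tags are treated as markup.
PCDATA is the ordinary text content of XML elements. In a DTD, an element that holds only text is declared with #PCDATA, for example <!ELEMENT To (#PCDATA)>.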
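The difference between parsed and unparsed text can be sketched by feeding the same characters to the parser both ways. The example uses Python's standard xml.etree.ElementTree; the sample strings and the B element are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the entity reference is expanded.
pcdata_text = ET.fromstring("<Body>Tom &amp; Jerry</Body>").text

# The same characters in a CDATA section are left exactly as written.
cdata_text = ET.fromstring("<Body><![CDATA[Tom &amp; Jerry]]></Body>").text

# In parsed text, a tag is recognized as a child element rather than as text.
mixed = ET.fromstring("<Body>HAPPY <B>BIRTHDAY</B></Body>")
leading_text = mixed.text
child_tag = mixed[0].tag
child_text = mixed[0].text
```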
Conclusion
A DTD plays a crucial role in defining the structure and validity of XML documents. By understanding the building blocks it describes, such as elements, attributes, entities, CDATA, and PCDATA, developers can create well-structured and reliable XML data.