A Document Type Definition (DTD) defines the legal structure of an XML document by specifying which elements, attributes, and entities are allowed, and how elements may be nested.

A DTD can be declared either internally within an XML document or externally as a separate file. Using a DTD helps ensure that XML data follows a consistent and valid structure.

Why DTD Is Important

With the help of a DTD, XML files can carry structured information in an agreed format. A shared DTD can also be used to validate XML data received from external sources, so that data which does not match the expected structure can be detected and rejected.

Types of DTD Declarations

DTD declarations can be included in an XML document in two ways:

  • Internal DTD – declared inside the XML document
  • External DTD – stored in a separate file and referenced by the XML document
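As a minimal sketch of the internal form, the Wish document used in the Tags section of this article could carry its own DTD as follows. The content models shown here (every child holds text only, and all four children appear in this order) are assumptions made for illustration, not rules stated by the article.

```xml
<?xml version="1.0"?>
<!-- Internal DTD: the declarations sit inside the DOCTYPE, between [ and ] -->
<!DOCTYPE Wish [
  <!ELEMENT Wish (To, From, Heading, Body)>
  <!ELEMENT To (#PCDATA)>
  <!ELEMENT From (#PCDATA)>
  <!ELEMENT Heading (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
]>
<Wish>
  <To>John</To>
  <From>Jill</From>
  <Heading>Reminder</Heading>
  <Body>HAPPY BIRTHDAY</Body>
</Wish>
```

For the external form, the same five declarations would move into a separate file (for example wish.dtd, a hypothetical name), and the document would reference it with <!DOCTYPE Wish SYSTEM "wish.dtd"> instead of listing the declarations inline.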

Basic Building Blocks of XML

The following components form the basic structure of XML documents:

  • Tags
  • Elements
  • Entities
  • Attributes
  • CDATA
  • PCDATA

Tags

Tags are used to mark up elements in an XML document. They define the start and end of an element and help determine how data is structured.

Example:


<Wish>
  <To>John</To>
  <From>Jill</From>
  <Heading>Reminder</Heading>
  <Body>HAPPY BIRTHDAY</Body>
</Wish>

In this example, <Wish> and </Wish> are the start and end tags of the root element, Wish.

Elements

Elements are the main building blocks of XML documents. They can contain text, attributes, child elements, or be empty.

Example:


<To>John</To>
<From>Jill</From>
<Body>HAPPY BIRTHDAY</Body>

Here, To, From, and Body are elements.
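In a DTD, each kind of element content described above has its own declaration form. The fragment below is a sketch of what such declarations might look like in a DTD file. Making Heading optional and reusing the Img element from the Attributes section are assumptions added for illustration.

```xml
<!-- Child elements: an ordered sequence; ? marks Heading as optional -->
<!ELEMENT Wish (To, From, Heading?, Body)>

<!-- Text only: the element holds parsed character data -->
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Heading (#PCDATA)>
<!ELEMENT Body (#PCDATA)>

<!-- Empty: the element has no content, only (optionally) attributes -->
<!ELEMENT Img EMPTY>
```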

Entities

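Entities act as variables that stand for characters or strings. An entity reference has the form &name; and is replaced by the entity's value when the document is parsed. This avoids repetition and makes it possible to write characters such as < and &, which would otherwise be read as markup.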

Some predefined XML entities include:

Entity Reference    Character
&gt;                >
&lt;                <
&quot;              "
&amp;               &
&apos;              '
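Besides the predefined entities, a DTD can declare its own. The sketch below assumes a hypothetical entity named sender and a shortened Wish document. A parser that processes the internal DTD replaces &sender; with Jill and expands the predefined references in Body.

```xml
<?xml version="1.0"?>
<!DOCTYPE Wish [
  <!ELEMENT Wish (From, Body)>
  <!ELEMENT From (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
  <!-- Hypothetical entity: "sender" stands for the text "Jill" -->
  <!ENTITY sender "Jill">
]>
<Wish>
  <From>&sender;</From>
  <Body>Tom &amp; Jerry say &quot;HAPPY BIRTHDAY&quot;</Body>
</Wish>
```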

Attributes

Attributes provide additional information about elements. They are written inside the start tag and consist of name-value pairs.

Example:


<Img src="computer.gif" />

In this example, the src attribute specifies the source of the image.
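A DTD declares the attributes of an element with ATTLIST. The following sketch declares the Img element from the example above. The optional alt attribute is a hypothetical addition, included only to show the difference between #REQUIRED and #IMPLIED.

```xml
<?xml version="1.0"?>
<!DOCTYPE Img [
  <!-- Img has no content; all of its information is in attributes -->
  <!ELEMENT Img EMPTY>
  <!-- src must be present; alt (hypothetical) may be omitted -->
  <!ATTLIST Img
    src CDATA #REQUIRED
    alt CDATA #IMPLIED>
]>
<Img src="computer.gif" />
```

Here CDATA is used as an attribute type and means that the attribute value is plain character data.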

CDATA

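CDATA stands for Character Data. Text inside a CDATA section, written between <![CDATA[ and ]]>, is not parsed for markup: tags in it are not recognized, entity references are not expanded, and the text is passed through as plain text.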

This is useful when the text contains characters that would otherwise be interpreted as markup.
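As a small illustration, the document below wraps a code snippet in a CDATA section. The element names Note and Code are hypothetical and chosen only for this example.

```xml
<?xml version="1.0"?>
<Note>
  <!-- Without CDATA, < and & would have to be written as &lt; and &amp; -->
  <Code><![CDATA[if (a < b && b > 0) { print("<ok>"); }]]></Code>
</Note>
```

The only character sequence that cannot appear inside a CDATA section is ]]>, because it marks the end of the section.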

PCDATA

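PCDATA stands for Parsed Character Data. It is the text found between an element's start tag and end tag. The XML parser examines this text: tags in it are treated as markup, and entity references are expanded.

Ordinary text content of an element is PCDATA. In a DTD, it is declared with the #PCDATA keyword.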
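The following sketch shows a text-only element and how its content is parsed. The message text is invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Body [
  <!-- Body may contain parsed character data only, no child elements -->
  <!ELEMENT Body (#PCDATA)>
]>
<!-- After parsing, the content of Body is: John & Jill say 1 < 2 -->
<Body>John &amp; Jill say 1 &lt; 2</Body>
```

Writing a bare & or < in this text would be an error, which is the main practical difference from a CDATA section.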

Conclusion

A DTD defines the structure of XML documents and makes it possible to check their validity. By understanding the building blocks it describes, such as elements, attributes, entities, CDATA, and PCDATA, developers can create well-structured and reliable XML data.