The Extensible Markup Language (XML)
XML is a subset of Standard Generalized Markup Language (SGML); its design goal
is to be as powerful and flexible as SGML with less complexity. If you’ve ever worked
with Hypertext Markup Language (HTML), then you’re familiar with an application
of SGML. If you’ve ever worked with Extensible Hypertext Markup Language
(XHTML), then you’re familiar with an application of XML, since XHTML is a reformulation
of HTML 4 as XML.
It is beyond the scope of this book to provide a complete primer on XML. As such, we
assume that you are familiar with the XML and XPath languages and their associated
technologies.
In order to understand the concepts that follow in this chapter, it is important that
you know some basic principles about XML and how to create well-formed and valid
XML documents. To that end, a few terms need defining before we proceed:
• Entity: An entity is a named unit of storage. In XML, entities can be used for a
variety of purposes, such as providing convenient “variables” to hold data or
representing characters that cannot normally be part of an XML document
(for example, angle brackets and ampersand characters). Entity definitions
can be either embedded directly in an XML document or included from an
external file.
• Element: A data object that is part of an XML document. Elements can contain
other elements or raw textual data, as well as feature zero or more attributes.
• Document Type Definition (DTD): A set of declarations that describes the accepted
structure and content of an XML file. A document refers to its DTD through a
document type declaration (<!DOCTYPE ...>). Like entities, DTDs can either be
externally defined or embedded.
• Well-formed: An XML document is considered well-formed when it contains a
single root-level element, all tags are opened and closed properly, and all special
characters (<, >, &, ', ") are escaped properly. Specifically, it must conform to all
“well-formedness” constraints as defined by the W3C XML Recommendation.
• Valid: An XML document is valid when it is both well-formed and obeys a
referenced DTD. An XML document can be well-formed and not valid, but it
can never be valid and not well-formed.
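The escaping requirement in the well-formedness definition above can be sketched in PHP. This is a minimal illustration, not a complete serializer: buildMessage() is a hypothetical helper name, the sample text is made up, and the ENT_XML1 flag assumes PHP 5.4 or later.

```php
<?php
// Hypothetical helper: wrap arbitrary text in a <message> element,
// escaping the characters that would otherwise break well-formedness.
function buildMessage($text)
{
    // ENT_XML1 selects XML 1.0 escaping rules; ENT_QUOTES also escapes ' and ".
    $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return '<?xml version="1.0"?>' . "\n" . '<message>' . $escaped . '</message>';
}

// Sample input containing &, <, > and double quotes (an assumption for illustration).
$original = 'Tom & Jerry <say> "hi"';
$xml = buildMessage($original);
echo $xml, "\n";

// Parsing the result back should recover the original, unescaped text.
$doc = new DOMDocument();
$doc->loadXML($xml);
echo $doc->documentElement->textContent, "\n";
```

Without the call to htmlspecialchars(), the raw ampersand and angle brackets in the sample text would make the document malformed, and loadXML() would fail.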
A well-formed XML document can be as simple as:
<?xml version="1.0"?>
<message>Hello, World!</message>
This example conforms fully to the definition described earlier: it has a single root
element, and that element is delimited by start and end tags. However, it is not valid,
because it doesn’t reference a DTD. Here is an example of a valid version of the same
document:
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>Hello, World!</message>
In this case, an external DTD is loaded from local storage, but the declarations may
also be embedded in the document itself:
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA)>
]>
<message>Hello, World!</message>
In practice, most XML documents you work with will not contain a DTD and, therefore,
will not be valid. In fact, a DTD is required only to validate the structure of a
document, which may not even be necessary for your particular
needs. However, all XML documents must be well-formed for PHP’s XML functionality
to properly parse them, as XML itself is a strict language.
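To make the distinction between well-formed and valid concrete, here is a minimal sketch using PHP's DOM extension. It assumes the extension is enabled; classifyXml() is a hypothetical helper name introduced only for illustration. DOMDocument::loadXML() fails on malformed input, and DOMDocument::validate() checks a parsed document against its DTD.

```php
<?php
// Hypothetical helper: report whether an XML string is malformed,
// merely well-formed, or valid against the DTD it declares.
function classifyXml($xml)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $wellFormed = $doc->loadXML($xml);          // false if not well-formed
    $valid = $wellFormed && $doc->validate();   // false if no DTD or DTD violated

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$wellFormed) {
        return 'not well-formed';
    }
    return $valid ? 'valid' : 'well-formed only';
}

$malformed = '<?xml version="1.0"?><message>Hello, World!';
$noDtd     = '<?xml version="1.0"?><message>Hello, World!</message>';
$inlineDtd = '<?xml version="1.0"?>'
           . '<!DOCTYPE message [<!ELEMENT message (#PCDATA)>]>'
           . '<message>Hello, World!</message>';

echo classifyXml($malformed), "\n"; // not well-formed
echo classifyXml($noDtd), "\n";     // well-formed only
echo classifyXml($inlineDtd), "\n"; // valid
```

The sketch uses the inline DTD rather than an external message.dtd so that it runs without any additional files.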