Parsing XML Documents in PHP

Sep 10, 2009 Author: Developer

SimpleXML does all of its XML parsing internally using the DOM parsing model, so there are no special calls or tricks you need to perform to parse a document. The only constraint is that the XML document must be well-formed; otherwise, SimpleXML will emit warnings and fail to parse it. Also, although the W3C has published a recommended specification for XML 1.1, SimpleXML supports only version 1.0 documents. Again, SimpleXML will emit a warning and fail to parse the document if it encounters an XML document with a version of 1.1.

All objects created by SimpleXML are instances of the SimpleXMLElement class. Thus, when parsing a document or XML string, you will need to create a new SimpleXMLElement, and there are several ways to do this. The first two involve procedural code: functions that return SimpleXMLElement objects. One such function, simplexml_load_string(), loads an XML document from a string, while the other, simplexml_load_file(), loads an XML document from a file path. The following example illustrates the use of each, pairing file_get_contents() with simplexml_load_string(); however, in a real-world scenario, it would make much more sense to simply use simplexml_load_file():

// Load an XML string
$xmlstr = file_get_contents('library.xml');
$library = simplexml_load_string($xmlstr);

// Load an XML file
$library = simplexml_load_file('library.xml');
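To make the two procedural calls concrete without depending on an existing library.xml, the sketch below writes a small, hypothetical library document to a temporary file (via tempnam()), loads it both ways, and then shows what happens with a malformed string. The document contents and variable names are illustrative assumptions, and the use of libxml_use_internal_errors() to collect parser errors instead of printing them as warnings is an addition not covered in the text above.

```php
<?php
// Hypothetical sample document; any well-formed XML 1.0 document would do.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>Fahrenheit 451</title>
    <author>Ray Bradbury</author>
  </book>
</library>
XML;

// Write it to a temporary file so both loaders have something to read.
$path = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents($path, $xml);

// Procedural loading: from a string, and directly from a path.
$fromString = simplexml_load_string(file_get_contents($path));
$fromFile   = simplexml_load_file($path);

// Both functions return SimpleXMLElement objects.
var_dump($fromString instanceof SimpleXMLElement); // bool(true)
echo $fromFile->book->title, PHP_EOL;              // Fahrenheit 451

// A malformed document makes the loader return false. Collecting the
// libxml errors ourselves suppresses the warnings that would otherwise print.
libxml_use_internal_errors(true);
$broken = simplexml_load_string('<library><book></library>');
if ($broken === false) {
    foreach (libxml_get_errors() as $error) {
        echo 'Parse error: ', trim($error->message), PHP_EOL;
    }
    libxml_clear_errors();
}

unlink($path);
```

Checking for a strict false return is the important part here: the procedural loaders report failure through their return value rather than by throwing.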

Since it was designed to work in an object-oriented environment, SimpleXML also supports an OOP-centric approach to loading a document. In the following example, the first method loads an XML string into a SimpleXMLElement, while the second loads an external document, which can be a local file path or a valid URL.

// Load an XML string
$xmlstr = file_get_contents('library.xml');
$library = new SimpleXMLElement($xmlstr);

// Load an XML file
$library = new SimpleXMLElement('library.xml', 0, true);

The second method also passes two additional arguments to SimpleXMLElement's constructor. The second argument, $options, optionally specifies additional libxml parameters (a bitmask of LIBXML_* constants) that influence how the library parses the XML. It is not necessary to set any of these parameters at this point, so we pass 0, the default, meaning no extra options. (Older code often passes NULL here; on PHP 8.1 and later that triggers a deprecation notice, because the parameter is typed as an integer.) The third parameter is important, though, because it informs the constructor that the first argument is the path to a file (or a URL), rather than a string that contains the XML data itself.
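The sketch below exercises the constructor in both modes and shows why the third argument matters. It assumes a hypothetical inline document in place of library.xml and writes it to a temporary file; LIBXML_NONET is chosen only as an example value for the options bitmask (it stops libxml from fetching network resources). Note that, unlike simplexml_load_string(), the constructor signals a parse failure by throwing an Exception rather than returning false.

```php
<?php
// Collect libxml errors instead of letting them print as PHP warnings.
libxml_use_internal_errors(true);

// Loading from a string: the constructor parses its first argument as XML.
// The document below is a hypothetical stand-in for library.xml.
$xmlstr  = '<library><book><title>Emma</title></book></library>';
$library = new SimpleXMLElement($xmlstr);
echo $library->book->title, PHP_EOL; // Emma

// Loading from a file: write the same document to a temporary file, then
// pass true as the third argument so the first argument is treated as a
// path (or URL). LIBXML_NONET is only an example option.
$path = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents($path, $xmlstr);
$fromFile = new SimpleXMLElement($path, LIBXML_NONET, true);
echo $fromFile->book->title, PHP_EOL; // Emma

// Without the third argument, the path string itself is parsed as XML,
// which fails, so the constructor throws an Exception.
$caught = false;
try {
    new SimpleXMLElement($path);
} catch (Exception $e) {
    $caught = true;
    echo 'Could not parse: ', $e->getMessage(), PHP_EOL;
}
libxml_clear_errors();

unlink($path);
```

Several LIBXML_* constants can be combined with the bitwise OR operator when more than one option is needed.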
