XML Syntax

XML's structural rules are reflected in its syntax, and this section looks at how some of those rules play out when you write the components of the language. Because most readers are probably more familiar with HTML than with SGML, HTML serves here as the point of comparison for the XML examples. HTML is an application of SGML, and XML is a simplified subset of SGML. Because both languages derive from SGML, the similarities in their syntax are obvious, but those similarities do not go very deep.

Opening and Closing Tags

In HTML code, an element usually contains both opening and closing tags. XML, unlike HTML, requires that a closing tag be used for every element. Consider, for example, the HTML Paragraph element, which would normally include an opening tag, some content, and a closing tag as shown here:

<P>This is an HTML Paragraph element.</P>

If you have written much code in HTML, you might be thinking, "Wait a minute, I never use the closing Paragraph tag in my pages!" You might not be aware that a closing Paragraph tag even exists, because HTML (like its parent, SGML) allows tagging shortcuts. That is, you can omit certain closing tags and the code is still valid.

HTML is based on a predefined structure that allows processors to assume where certain tags should be located in a document. Because a paragraph in HTML cannot be nested inside another paragraph, the processor can read an opening Paragraph tag and assume that it also marks the end of the preceding paragraph. Such minimization techniques are not allowed in XML, and this represents the most obvious syntactical difference between the two languages.
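The difference can be checked with any XML parser. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the BODY wrapper element are illustrative only and are not part of the sample document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every Paragraph element is explicitly closed, as XML requires.
closed = "<BODY><P>First paragraph.</P><P>Second paragraph.</P></BODY>"

# HTML-style minimization: the closing Paragraph tags are omitted.
minimized = "<BODY><P>First paragraph.<P>Second paragraph.</BODY>"

print("closed:", is_well_formed(closed))
print("minimized:", is_well_formed(minimized))
```

The minimized form fails because the parser reaches the closing BODY tag while two Paragraph elements are still open; an XML processor does not infer where they should have ended.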

The Empty-Element Tag

Even though XML requires that closing tags be used, it does support a shortcut for empty elements called, well, the empty-element tag. The empty-element tag effectively combines the opening and closing tags for an element containing no content. It uses a special format: <TAGNAME/>. Notice that the forward slash follows the tag name; this form is not supported in HTML. Suppose, for example, that we created an element called GENUS. If the Genus element contained no data, we could write it using opening and closing tags, as shown below:

<GENUS></GENUS>

Or we could write it using an empty-element tag:

<GENUS/>
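To confirm that a parser treats the two spellings the same way, here is a small sketch that again assumes Python's standard xml.etree.ElementTree module. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same empty element, written with both tags and with the shortcut.
long_form = ET.fromstring("<GENUS></GENUS>")
short_form = ET.fromstring("<GENUS/>")

for element in (long_form, short_form):
    # Report the element name, its character data, and its child count.
    print(element.tag, repr(element.text), len(element))

# Serializing either parsed element produces identical output.
print(ET.tostring(long_form) == ET.tostring(short_form))
```

In both cases the parser reports an element named GENUS with no character data and no child elements, so the choice between the two forms is a matter of convenience.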

Attributes

Attributes provide a way to associate values with an element without making those values part of the element's content. For example, let's look at a common HTML element and how it uses an attribute:

<A HREF="http://www.microsoft.com">Microsoft Home Page</A>

Here, the Anchor element indicated by the <A> tag contains an attribute with the name HREF. The value for the attribute is http://www.microsoft.com. While the value of this attribute is never displayed to the user, it contains important information about the element and provides the destination for the anchor. This name/value format demonstrates the way attributes are used in XML.

This example adds an attribute to one of the elements in the sample document:

<?xml version="1.0"?>
<!DOCTYPE Wildflowers SYSTEM "Wldflr.dtd">

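<PLANT ZONE="3">
  <COMMON>Columbine</COMMON>
  <BOTANICAL>Aquilegia canadensis</BOTANICAL>
</PLANT>

Notice that the ZONE attribute in the opening <PLANT> tag follows the name/value format and that its value is enclosed in quotation marks. Unlike HTML, XML requires every attribute value to be quoted, with either single or double quotation marks.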
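As a rough illustration of how a program reads the attribute, the sketch below parses the Plant element with Python's standard xml.etree.ElementTree module (an assumption of this example, not a tool the sample document depends on). It also tries an unquoted value, which XML does not permit. The DOCTYPE line is left out because the parser used here does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

quoted = """<PLANT ZONE="3">
  <COMMON>Columbine</COMMON>
  <BOTANICAL>Aquilegia canadensis</BOTANICAL>
</PLANT>"""

plant = ET.fromstring(quoted)

# Attribute values are always returned as strings.
zone = plant.get("ZONE")
common = plant.findtext("COMMON")
print(common, "grows in zone", zone)

# An unquoted attribute value is legal in HTML but not in XML.
try:
    ET.fromstring("<PLANT ZONE=3></PLANT>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print("unquoted value accepted:", unquoted_accepted)
```

The attribute travels with the element but stays separate from its content: the COMMON and BOTANICAL child elements are reached through the element tree, while ZONE is looked up by name.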

NOTE
Although not demonstrated in the example above, an important aspect of attribute values is that they can contain almost any character. The exceptions are the left angle bracket (<) and the ampersand (&), which must be written as the entity references &lt; and &amp;, and the quotation mark used to delimit the value, which must be written as &quot; or &apos;. A validating XML processor checks that each attribute has been declared in the DTD and that its value matches the declared type, but beyond that check it does not care what the value is. (For more information about writing attribute declarations, see Chapter 4.)
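When attribute values are generated by a program, markup characters are usually handled by an escaping helper instead of by hand. The sketch below assumes Python's standard library (xml.sax.saxutils.quoteattr and xml.etree.ElementTree); the NOTE attribute is hypothetical and does not appear in the sample document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# A value containing two characters that are reserved for markup.
raw_value = "zone < 4 & shade"

# quoteattr escapes the reserved characters and adds the surrounding quotes.
attribute = quoteattr(raw_value)
document = f"<PLANT NOTE={attribute}/>"
print(document)

# Parsing the document returns the original, unescaped value.
round_trip = ET.fromstring(document).get("NOTE")
print(round_trip)

# Writing the reserved characters literally is rejected by the parser.
try:
    ET.fromstring('<PLANT NOTE="zone < 4 & shade"/>')
    literal_accepted = True
except ET.ParseError:
    literal_accepted = False
print("literal form accepted:", literal_accepted)
```

The parser hands back the value with the entity references already resolved, so application code works with the plain text and only the serialized document carries the escaped form.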