Two of XML's most valuable features are its ability to provide structure to documents and to make data self-describing. These features would not be of much use if you could not enforce the structural and grammatical rules. If you have created SGML documents, you should understand the idea of a valid document. If you're familiar with HTML, you should understand the concept of a well-formed document. The next two sections discuss these terms.
As you learned earlier in the chapter, the DTD specified in the prolog outlines all the rules for the document. A valid XML document strictly obeys all these rules. (The next chapter looks at the parts of a DTD in detail.) A valid document also obeys all the validity constraints identified in the XML specification.
Here is an example of a validity constraint for attribute defaults from section 3.3.2 of the XML specification:
Validity Constraint: Required Attribute
If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.
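To make this constraint concrete, here is a minimal Python sketch. It is not a validator: the standard library's ElementTree parser is non-validating, so the sketch checks the required attribute by hand. The ZONE attribute, the internal DTD subset, and the REQUIRED table are assumptions invented for this illustration; they do not come from the chapter's Wldflr.dtd.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares ZONE as #REQUIRED,
# but the PLANT element below omits it.
DOC = """<?xml version="1.0"?>
<!DOCTYPE PLANT [
  <!ELEMENT PLANT (COMMON)>
  <!ELEMENT COMMON (#PCDATA)>
  <!ATTLIST PLANT ZONE CDATA #REQUIRED>
]>
<PLANT>
  <COMMON>Columbine</COMMON>
</PLANT>"""

# Hand-written table mirroring the ATTLIST above (an assumption of this
# sketch; a real validating processor would read it from the DTD).
REQUIRED = {"PLANT": ["ZONE"]}


def missing_required(root, required):
    """Return (element, attribute) pairs for each missing required attribute."""
    problems = []
    for elem in root.iter():
        for name in required.get(elem.tag, []):
            if name not in elem.attrib:
                problems.append((elem.tag, name))
    return problems


# The non-validating parser accepts the document without complaint...
root = ET.fromstring(DOC)
# ...but the manual check finds the violation.
violations = missing_required(root, REQUIRED)
print(violations)  # [('PLANT', 'ZONE')]
```

The document parses without error, yet it breaks the Required Attribute constraint, so a validating processor would be expected to report it.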
The processor must understand the validity constraints of the XML specification and check the document for possible violations. If the processor finds any errors, it must report them to the XML application. The processor must also read the DTD, validate the document against it, and again report any violations to the XML application. Because all of this processing and checking can take time (not to mention bandwidth) and because validation might not always be necessary, XML supports the notion of the well-formed document.
Even though being well formed means that some rules must be obeyed, these rules are not nearly as strict as the constraints required for validity. The XML specification addresses the concept of a well-formed document that is not validated. Fortunately, checking the rules for well-formedness is the XML processor's job. If you, as an XML document author, don't follow the rules, the processor will let you know!
NOTE
Although a well-formed document isn't required to adhere to validity constraints, a valid document must adhere to all the rules for well-formedness as well as to all the validity constraints.
Why does XML allow an author to simply follow the syntax rules and create content without worrying about a DTD? While this might seem like an invitation to chaos, that is not the intention. Remember from Chapter 2 that one of XML's goals is to make XML documents easy to create. Well-formedness helps meet that goal by not requiring additional work to create a DTD. Following are some other ways that well-formedness provides a benefit:
According to the XML specification, a well-formed document must meet the following criteria:
It matches the definition of a document. Matching the definition of a document means that:
Let's look again at the sample document created earlier in the chapter:
<?xml version="1.0"?>
<!DOCTYPE Wildflowers SYSTEM "Wldflr.dtd">
<PLANT>
  <COMMON>Columbine</COMMON>
  <BOTANICAL>Aquilegia canadensis</BOTANICAL>
</PLANT>
This document contains the PLANT element as the single document element, and the COMMON and BOTANICAL elements are nested inside it. By contrast, the following example is not well-formed XML because it contains two elements at the root:
<?xml version="1.0"?>
<!DOCTYPE Wildflowers SYSTEM "Wldflr.dtd">
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>
The COMMON and BOTANICAL elements are both located at the root level of the document; in other words, these two complete elements immediately follow the prolog. Each element has opening and closing tags, but neither element is nested within the other, so the document has two root elements instead of one.
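One way to see the difference between these two samples is to hand them to a non-validating parser. The sketch below uses Python's standard xml.etree.ElementTree module. The is_well_formed helper is a name made up for this illustration, and the parser only checks well-formedness; it does not read Wldflr.dtd or validate anything.

```python
import xml.etree.ElementTree as ET

# The sample with a single document element.
GOOD = """<?xml version="1.0"?>
<!DOCTYPE Wildflowers SYSTEM "Wldflr.dtd">
<PLANT>
  <COMMON>Columbine</COMMON>
  <BOTANICAL>Aquilegia canadensis</BOTANICAL>
</PLANT>"""

# The sample with two elements at the root level.
BAD = """<?xml version="1.0"?>
<!DOCTYPE Wildflowers SYSTEM "Wldflr.dtd">
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>"""


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```

The second document is rejected as soon as the parser meets the BOTANICAL start tag, because content other than comments, processing instructions, and white space is not allowed after the document element has closed.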
It observes the constraints for a well-formed document as defined by the XML specification. The XML specification identifies certain constraints by which a document must abide to be considered well formed. Anyone creating a processor needs to understand these constraints and enforce them in the processor. Following is a sample constraint from the XML specification:
Well-Formedness Constraint: Legal Character
Characters referred to using character references must be legal according to the nonterminal Char.
If the character reference begins with "&#x", the digits and letters up to the terminating ";" provide a hexadecimal representation of the character's value in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ";" provide a decimal representation of the character's value.
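The following Python sketch shows how a processor might apply this constraint. The function names are invented for this illustration, and the ranges in is_legal_char follow the Char production of the XML 1.0 specification (tab, line feed, carriage return, and the ranges #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF).

```python
import re

# "&#x" followed by hex digits, or "&#" followed by decimal digits, then ";".
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def char_ref_code_point(ref):
    """Return the code point named by a character reference, or None if malformed."""
    match = _CHAR_REF.fullmatch(ref)
    if match is None:
        return None
    hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        return int(hex_digits, 16)
    return int(dec_digits, 10)


def is_legal_char(code_point):
    """Check a code point against the Char production of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def is_legal_char_ref(ref):
    """Return True if ref is a syntactically correct reference to a legal character."""
    code_point = char_ref_code_point(ref)
    return code_point is not None and is_legal_char(code_point)


print(is_legal_char_ref("&#x41;"))  # True: hexadecimal reference to "A"
print(is_legal_char_ref("&#65;"))   # True: decimal reference to "A"
print(is_legal_char_ref("&#0;"))    # False: the null character is not a legal Char
```

Note that the hexadecimal prefix must use a lowercase "x", so this sketch treats a reference such as "&#X41;" as malformed.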
All of the parsed entities referenced in the document are well formed. Since parsed entities become part of the document once they are parsed by the XML processor, they must also be well formed for the document to be considered well formed. This is important to note if you are using external entities created by someone else. If the creator of the external entity you are using did not meet the well-formedness constraints, the entity might cause errors in your document.
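A short Python sketch can illustrate the effect. To stay self-contained it uses internal entities declared in the document's own DTD subset rather than external ones, but the principle is the same: the replacement text becomes part of the document and has to be well formed. The entity name flower and the is_well_formed helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The entity's replacement text is a complete, balanced element.
GOOD_ENTITY = """<?xml version="1.0"?>
<!DOCTYPE PLANT [
  <!ENTITY flower "<COMMON>Columbine</COMMON>">
]>
<PLANT>&flower;</PLANT>"""

# The entity's replacement text opens COMMON but never closes it.
BAD_ENTITY = """<?xml version="1.0"?>
<!DOCTYPE PLANT [
  <!ENTITY flower "<COMMON>Columbine">
]>
<PLANT>&flower;</PLANT>"""


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(GOOD_ENTITY))  # True
print(is_well_formed(BAD_ENTITY))   # False
```

In the second document, the text outside the entity is perfectly balanced; the error comes entirely from the entity, which is why an entity written by someone else can break an otherwise correct document.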