What Is XML?

As noted in Chapter 1, XML is derived from SGML. Unlike HTML, however, XML is not an application of SGML; it is a subset, or profile, of it. XML is therefore a metalanguage in much the same way as SGML: other languages, or vocabularies, can be developed in XML (more on vocabularies in Chapter 5). As mentioned in Chapter 1, anything that can be done in XML can also be done in SGML. So why is XML needed?

The Case for XML

Because XML is optimized for use on the World Wide Web, it offers some benefits that are not found in SGML. XML can also work with HTML for data display and presentation. As a result, XML provides several advantages over SGML for Web-delivered data:

NOTE
XLL and XSL are two powerful additions to the XML family of languages. XLL is discussed in Chapter 7, and XSL is covered in Chapter 8.

To put it simply, XML provides 80 percent of the features and functionality of SGML with 20 percent of the complexity.

XML Is About Data

If HTML is about displaying information, XML is about describing information. XML is a standard language for structuring and describing data so that different applications can understand it. The power of XML is its ability to separate the user interface from the data. Let's rewrite the memo document from Chapter 1 and see how this works. The XML code for the new document is shown here:

<?xml version="1.0"?>
<MEMO>
  <TO>Jodie</TO>
  <FROM>Bill</FROM>
  <CC>Philip</CC>
  <SUBJECT>Chapter 2</SUBJECT>
  <BODY>This is where we start getting into some XML code!</BODY>
</MEMO>

You'll notice that the code above looks similar to the SGML version of the document in Chapter 1, except that every element has a closing tag (more on that later). Notice also that nothing inherent in the document indicates how the data should look. In other words, no formatting information (such as bold or italic fonts, text indent, and font size) is included. Instead, the markup describes what the data is. A human reader could easily look at this code and understand what the document is about and how it is structured.
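To make the separation of data and presentation concrete, here is a minimal sketch in Python that reads the memo once and renders it two ways. The function names (memo_fields, render_text, render_html) and the choice of output formats are assumptions made for illustration; they are not part of the original example.

```python
import xml.etree.ElementTree as ET
from html import escape

# The memo from the listing above, unchanged.
MEMO_XML = """<?xml version="1.0"?>
<MEMO>
  <TO>Jodie</TO>
  <FROM>Bill</FROM>
  <CC>Philip</CC>
  <SUBJECT>Chapter 2</SUBJECT>
  <BODY>This is where we start getting into some XML code!</BODY>
</MEMO>"""


def memo_fields(xml_text):
    """Return the memo's child elements as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


def render_text(fields):
    """Hypothetical plain-text presentation of the memo data."""
    header = "\n".join(
        f"{name.title()}: {fields[name]}"
        for name in ("TO", "FROM", "CC", "SUBJECT")
    )
    return f"{header}\n\n{fields['BODY']}"


def render_html(fields):
    """Hypothetical HTML presentation of the same memo data."""
    return (
        f"<h1>{escape(fields['SUBJECT'])}</h1>\n"
        f"<p><b>To:</b> {escape(fields['TO'])}<br>"
        f"<b>From:</b> {escape(fields['FROM'])}<br>"
        f"<b>Cc:</b> {escape(fields['CC'])}</p>\n"
        f"<p>{escape(fields['BODY'])}</p>"
    )


if __name__ == "__main__":
    fields = memo_fields(MEMO_XML)
    print(render_text(fields))
    print(render_html(fields))
```

Both renderers work from the same dictionary, so changing how the memo looks never requires touching the XML itself.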

XML documents are also said to be self-describing. That is, each document can contain, or point to, the set of rules to which its data must conform. Because any set of rules can be reused in another document, other authors can easily create the same class of document, if necessary.
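One common way to carry such rules inside a document is a document type definition (DTD) placed in the internal subset. The sketch below assumes that approach; the element declarations are written for illustration and are not taken from the memo in Chapter 1, and describe is a hypothetical helper. Note that the parsers in Python's standard library read the declarations but do not validate the document against them.

```python
from xml.dom import minidom

# Assumption: the memo's rules are expressed as a DTD in the internal subset.
# The declarations say a MEMO holds exactly these five text-only elements,
# in this order.
MEMO_WITH_RULES = """<?xml version="1.0"?>
<!DOCTYPE MEMO [
  <!ELEMENT MEMO (TO, FROM, CC, SUBJECT, BODY)>
  <!ELEMENT TO (#PCDATA)>
  <!ELEMENT FROM (#PCDATA)>
  <!ELEMENT CC (#PCDATA)>
  <!ELEMENT SUBJECT (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
<MEMO>
  <TO>Jodie</TO>
  <FROM>Bill</FROM>
  <CC>Philip</CC>
  <SUBJECT>Chapter 2</SUBJECT>
  <BODY>This is where we start getting into some XML code!</BODY>
</MEMO>
"""


def describe(xml_text):
    """Return (declared document type, root tag, child tags in order).

    minidom is a non-validating parser: it records the DOCTYPE but does
    not check the content against the element declarations.
    """
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    children = [
        node.tagName
        for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]
    return doc.doctype.name, root.tagName, children


if __name__ == "__main__":
    print(describe(MEMO_WITH_RULES))
```

Another author could copy the same declarations into a new memo, which is what makes the rules reusable across documents of the same class.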

NOTE
Document classes are discussed in Chapter 4. The class concept was borrowed from object-oriented programming, in which each class is used to describe a group of objects that have a common set of characteristics. Classing documents is a powerful way to group documents based on the kind of content they contain.

Some other ways that XML can be used to work with data include the following:

As you will see, XML can be an extremely powerful way to author and store data, not only for use on the Web but for use in other applications as well.