What Is XML?
As noted in Chapter 1, XML is derived from SGML. Unlike HTML, however, XML is not an
application of SGML; it is a subset, or profile, of SGML. That being the case, XML is a
metalanguage in much the same way as SGML. That is, other languages, or
vocabularies, can be developed in XML (more on vocabularies in Chapter 5). As mentioned in
Chapter 1, anything that can be done in XML can also be done in SGML. So why is XML needed?
The Case for XML
Because XML is optimized for use on the World Wide Web, it offers some benefits that
are not found in SGML. XML is designed to work alongside HTML for data display and presentation,
and it provides several advantages over SGML for Web-delivered data:
- XML is a smaller language than SGML. The designers of
XML tried to cut out everything in SGML that was not needed for Web delivery.
The result is a much simpler, slimmer language. (The specifications
illustrate this: the basic SGML specification is about 155 pages long, while the
XML specification is only about 35 pages long.)
- XML includes a specification for a hyperlinking scheme, which is described as
a separate language called Extensible Linking
Language (XLL). Not only does XML support the basic hyperlinking found in HTML, but it takes the concept further
with extended linking. (Extended linking is covered in detail in Chapter 7.) While
SGML allows a hyperlinking mechanism to be defined, it does not include
hyperlinking as part of its original specification.
- XML includes a specification for a style language called
Extensible Stylesheet Language (XSL). This
language provides support for a style-sheet mechanism, also something that is
not found in SGML. Style sheets allow an author to create a template of
various styles (such as bold, italic, and so on) or combinations of styles and
apply them to elements in a document.
NOTE
XLL and XSL are two powerful additions to the XML family of languages. XLL is
discussed in Chapter 7, and XSL is covered in Chapter 8.
To put it simply, XML provides 80 percent of the features and functionality of
SGML with 20 percent of the complexity.
XML Is About Data
If HTML is about displaying information, XML is about
describing information. XML is a standard language used to structure and describe data that can be understood by
different applications. The power of XML is its ability to separate the user interface from the
data. Let's rewrite the memo document from Chapter 1 and see how this works. The XML
code for the new document is shown here:
<?xml version="1.0"?>
<MEMO>
<TO>Jodie</TO>
<FROM>Bill</FROM>
<CC>Philip</CC>
<SUBJECT>Chapter 2</SUBJECT>
<BODY>This is where we start getting into some XML code!</BODY>
</MEMO>
You'll notice that the code above looks similar to the SGML version of the
document in Chapter 1, except that every element has a closing tag (more on
that later). Notice also that nothing inherent in the document indicates how the data should
look. In other words, no formatting information (such as bold or italic fonts, text indent, or
font size) is included. Instead, the markup describes what the data
is. A human reader could easily look at this code and understand what the document is about and how it is structured.
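A program can read this structure as easily as a person can. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. Python and the function name read_memo are our own choices for illustration, not part of the memo example. The sketch simply looks up each field by its element name and never consults any formatting information.

```python
import xml.etree.ElementTree as ET

# The memo document from above, held in a string for this sketch.
# The XML declaration must be the very first thing in the string.
MEMO_XML = """<?xml version="1.0"?>
<MEMO>
<TO>Jodie</TO>
<FROM>Bill</FROM>
<CC>Philip</CC>
<SUBJECT>Chapter 2</SUBJECT>
<BODY>This is where we start getting into some XML code!</BODY>
</MEMO>"""


def read_memo(xml_text):
    """Return the memo's fields as a dict keyed by element name (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Iterating over the root yields its direct children in document order;
    # each tag name says what the data is, not how it should look.
    return {child.tag: child.text for child in root}


memo = read_memo(MEMO_XML)
print(memo["TO"], memo["SUBJECT"])  # prints: Jodie Chapter 2
```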
XML documents are also said to be self-describing. That is, a document can contain (or refer to)
the set of rules to which its data must conform. Because any set of rules can be reused in
another document, other authors can easily create the same
class of document, if necessary.
NOTE
Document classes are discussed in Chapter 4. The class concept was borrowed
from object-oriented programming, in which each class is used to describe a group of
objects that have a common set of characteristics. Classing documents is a powerful way
to group documents based on the kind of content they contain.
Some other ways that XML can be used to work with data include the following:
- Using XML as a data interchange format. Many systems that have been in use
for a while, called legacy systems, can contain data in disparate formats, and
developers are doing a lot of work to connect these systems using the Internet. One of
their challenges is to be able to exchange data between systems that ordinarily are
not compatible. XML might be the answer. Because XML is a standards-based text
format that many applications can understand, data can be converted to XML and
then easily read by another system or application.
- Using XML for Web data. Imagine having an HTML page
in which none of the content is located on the page itself. Instead, the
content is stored in an XML file, and the HTML page is used simply for
formatting and display. The content can be updated, translated into another
language, or otherwise modified without an author ever having to touch the
HTML code.
- Using XML to create a common data store for
information that might get used in many different ways. Suppose, for example,
that you are writing an article for a magazine. The publisher also wants to
include the article on a Web site and then submit it for inclusion in a book
or journal. If the original article was authored in a proprietary format, such
as RTF, the article would have to be reworked for the Web posting and then
probably reworked again for the book or journal. If the article was written in
XML, however, it could be published to the three different environments
simultaneously because the data of the article is independent of how it is
being displayed. The formatting, layout, and so on depend on the
application using the data and are not attached to the content itself.
Furthermore, the application code that displays the data needs to be written
only once, and it then can be used to display any number of articles.
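The common-data-store idea in the list above can be sketched in Python with the standard library's xml.etree.ElementTree module. One XML source is rendered two ways: as an HTML fragment for a Web page and as plain text for another outlet. The article's element names, the HTML_STYLES mapping and the render functions are all hypothetical. The mapping is only a simple stand-in for the idea of a style sheet and is not XSL.

```python
import xml.etree.ElementTree as ET
from html import escape

# A single XML source for an article; the element names are hypothetical.
ARTICLE_XML = """<ARTICLE>
<TITLE>Why XML?</TITLE>
<AUTHOR>Bill</AUTHOR>
<PARA>XML describes data.</PARA>
<PARA>Applications decide how to display it.</PARA>
</ARTICLE>"""

# Hypothetical "style" table for the Web: element name -> HTML tag.
HTML_STYLES = {"TITLE": "h1", "AUTHOR": "em", "PARA": "p"}


def render_html(xml_text):
    """Render the article as an HTML fragment for a Web page."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        # Unknown elements fall back to a generic <div>.
        tag = HTML_STYLES.get(child.tag, "div")
        parts.append(f"<{tag}>{escape(child.text)}</{tag}>")
    return "\n".join(parts)


def render_text(xml_text):
    """Render the same article as plain text, for example for a print workflow."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        if child.tag == "TITLE":
            lines.append(child.text.upper())
        elif child.tag == "AUTHOR":
            lines.append(f"by {child.text}")
        else:
            lines.append(child.text)
    return "\n".join(lines)


print(render_html(ARTICLE_XML))
print(render_text(ARTICLE_XML))
```

The article text is never edited to produce either output. Only the rendering code differs, and each renderer is written once and can be reused for any article with the same structure.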
As you will see, XML can be an extremely powerful way to author and store data,
not only for use on the Web but for use in other applications as well.
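The data interchange scenario from the list above can also be made concrete. The following is a minimal Python sketch using xml.etree.ElementTree. It assumes a hypothetical legacy system that exports one record per line with fields separated by "|". The sketch converts those records to XML and then reads them back the way a receiving system might. The field names, the export format and both function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy export: one customer per line, fields separated by "|".
LEGACY_EXPORT = "1001|Jodie|Seattle\n1002|Philip|Redmond"
FIELDS = ("ID", "NAME", "CITY")  # assumed field order of the legacy format


def legacy_to_xml(export):
    """Convert pipe-delimited legacy records into an XML string."""
    root = ET.Element("CUSTOMERS")
    for line in export.splitlines():
        record = ET.SubElement(root, "CUSTOMER")
        for name, value in zip(FIELDS, line.split("|")):
            ET.SubElement(record, name).text = value
    # encoding="unicode" returns a str (without an XML declaration).
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text):
    """Read the XML back; the receiver relies only on the element names."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in customer}
            for customer in root.findall("CUSTOMER")]


xml_text = legacy_to_xml(LEGACY_EXPORT)
print(xml_text)
print(xml_to_records(xml_text))
```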