Data Typing in XML

If you are familiar with databases or programming, you know that generally a piece of data possesses certain characteristics or conforms to a specific format that brands it a string, a number, a date, a currency, or another data type. So far, the XML data we've worked with in previous chapters has been of the single type string, or text data. However, XML data types allow authors to specify element data as objects that can be interpreted as different types.

It is important to distinguish the element type from its data type. Chapter 4 looked at how to define element types in a Document Type Definition (DTD). You will remember that an element can contain data that is either parsed character data (#PCDATA) or character data (CDATA). The DTD can further define the element by its context or position in the structure. This method of element typing indicates the semantics of the element (such as Age representing how old a person is) but does not describe the type of data the element contains (such as Age containing a number). Parsed character data and character data are both string data types. You'll recall from Chapter 4 that additional types can be specified for XML attributes. These types are restated in the table below:

Attribute Type	Usage
CDATA	Only character data can be used in the attribute.
ENTITY	Attribute value must refer to an external binary entity declared in the DTD.
ENTITIES	Same as ENTITY, but allows multiple values separated by white space.
ID	Attribute value must be a unique identifier. If a document contains ID attributes with the same value, the processor should generate an error.
IDREF	Value must be a reference to an ID declared elsewhere in the document. If the attribute does not match the referenced ID value, the processor should generate an error.
IDREFS	Same as IDREF, but allows multiple values separated by white space.
NMTOKEN	Attribute value is any mixture of name token characters, which must be letters, numbers, periods, dashes, colons, or underscores.
NMTOKENS	Same as NMTOKEN, but allows multiple values separated by white space.
NOTATION	Attribute value must refer to a notation declared elsewhere in the DTD. Declaration can also be a list of notations. The value must be one of the notations in the list. Each notation must have its own declaration in the DTD.
Enumerated	Attribute value must match one of the included values. For example: <!ATTLIST MyElement MyAttribute (content1|content2) #REQUIRED>.

The attribute types in the above table are known as primitive data types. The Microsoft XML processor (Msxml) also supports rich data types, which represent the kinds of data types found in traditional programming languages and database systems. Data typing, as discussed in this chapter, focuses on rich data types and indicates the parser class that is used to interpret the data as something other than just a string.

NOTE
The attribute types in the above table are identified in section 3.3.1 of the XML 1.0 specification (included on the companion CD).

Strong Typing vs. Weak Typing

Data typing falls into two basic contexts: strong typing and weak typing. In strong typing, an element must always contain a single type of data. The content in the element must conform to a strict set of rules regarding its type. For example, you might have an element named Part that must always contain an integer. It could not contain a string, a date, or even a decimal. Data is usually strongly typed in database APIs such as ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity).

Weak typing, as you might guess, allows multiple types of data to exist in a single element. So if our Part element were specified to have a weak type, it might contain an integer, a string, a name, a date, or some combination of types.
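The difference between the two contexts can be sketched in a few lines of JavaScript. This is only an illustration and is not part of any XML processor: the names typePatterns, matchesType, acceptsStrong, and acceptsWeak, and the simple patterns used to recognize each type, are assumptions made for this example.

```javascript
// Hypothetical patterns for three simple types (illustration only).
var typePatterns = {
  "int": /^[+-]?\d+$/,              // whole number, optional sign
  "date": /^\d{4}-\d{2}-\d{2}$/,    // yyyy-mm-dd
  "string": /^[\s\S]*$/             // any text at all
};

// Returns true if the text fits the named type.
function matchesType(text, type) {
  var pattern = typePatterns[type];
  if (!pattern) {
    throw new Error("Unknown type: " + type);
  }
  return pattern.test(text);
}

// Strong typing: the element is tied to exactly one type.
function acceptsStrong(text, type) {
  return matchesType(text, type);
}

// Weak typing: the element may hold any one of several types.
function acceptsWeak(text, types) {
  for (var i = 0; i < types.length; i++) {
    if (matchesType(text, types[i])) {
      return true;
    }
  }
  return false;
}

console.log(acceptsStrong("4535", "int"));                 // true
console.log(acceptsStrong("1999-03-15", "int"));           // false
console.log(acceptsWeak("1999-03-15", ["int", "date"]));   // true
```

A strongly typed Part element corresponds to the acceptsStrong call with a single type, while a weakly typed Part corresponds to acceptsWeak with a list of allowed types.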

Specifying Data Types

You specify data types in an XML document using the dt:dt attribute in an element. The syntax is dt:dt="datatype", where datatype represents one of the supported data types. In the following example, the Id element is of type number.

<?xml version="1.0"?>
<PRODUCT xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <PART>
    <ID dt:dt="number">4535645.234</ID>
    <NAME>widget</NAME>
  </PART>
</PRODUCT>

NOTE
The dt:dt attribute looks unusual because data types are defined in their own namespace. The namespace declaration in the second line of the above code declares the datatypes namespace and binds it to the dt prefix. Namespaces use prefixes to uniquely identify elements and attributes. See the section "XML Namespaces" later in this chapter.

The table below identifies some of the more common data types. For a complete list of supported data types, see the section "Supported Data Types" in Appendix B of this book.

Data Type	Description	Example
boolean	1 or 0	1; 0
char	String (one character only)	a
float	A signed or unsigned whole number or fraction. No effective limit on number of digits. Can contain an exponent.	34234.376; 477
int	An unsigned whole number, no exponent.	345
number	A signed or unsigned whole number or fraction. No effective limit on number of digits. Can contain an exponent.	-23; 567556; 443.34; 67E12
string	#PCDATA	This is a string.
uri	Uniform Resource Identifier (URI)	http://mspress.microsoft.com
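To make the idea of a typed value concrete, here is a minimal JavaScript sketch that turns element text into a value according to a data type name. The function toTypedValue is hypothetical and covers only a few of the types in the table; it illustrates the concept and does not reproduce how Msxml performs the conversion.

```javascript
// Converts element text to a JavaScript value according to a data type name.
// Hypothetical helper for illustration; handles only a few types.
function toTypedValue(text, dataType) {
  switch (dataType) {
    case "boolean":
      if (text === "1") return true;
      if (text === "0") return false;
      throw new Error("Not a boolean: " + text);
    case "number":
    case "float":
      var n = Number(text);
      // Reject empty or blank text, which Number() would turn into 0.
      if (/^\s*$/.test(text) || isNaN(n)) {
        throw new Error("Not a number: " + text);
      }
      return n;
    case "char":
      if (text.length !== 1) {
        throw new Error("Not a single character: " + text);
      }
      return text;
    case "string":
      return text;
    default:
      throw new Error("Unsupported data type: " + dataType);
  }
}

console.log(toTypedValue("1", "boolean"));      // true
console.log(toTypedValue("-23", "number") + 1); // -22
console.log(toTypedValue("-23", "string") + 1); // -231
```

The last two lines show why the type matters: the same text behaves as a number in one case and as a string in the other.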

Whenever a node is assigned a type, the data in that node will conform to the specified type, despite the data's original format. Consider this Number element, for example:

<NUMBER dt:dt="int">-255</NUMBER>

This element will be processed as its typed value: the integer 255. It will not be processed as -255. This type conversion is important to understand when working with data types, because unexpected results can occur if you are not careful.

Working with Data Types in Script

The XML object model allows scripts access to data types. As is true of other objects in the object model, the data type object is exposed and available to the script code. The element node properties dataType and nodeTypedValue allow the content author to access the data types of any node in the content tree. Let's look at how these properties work in script code, using the XML document shown in Code Listing 6-1.

NOTE
Throughout the rest of this chapter we will work with an XML document containing information for a wildflower plant catalog. A copy of the catalog document is included on the companion CD in the Chap06\Lst6_1.xml file. A portion of the file is shown in Code Listing 6-1.

Code Listing 6-1.

<CATALOG xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <PLANT>
    <COMMON>Bloodroot</COMMON>
    <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
    <ZONE>4</ZONE>
    <LIGHT>Mostly Shady</LIGHT>
    <PRICE dt:dt="fixed.14.4">2.44</PRICE>
    <AVAILABILITY dt:dt="dateTime">1999-03-15</AVAILABILITY>
  </PLANT>
  
  <PLANT>
    <COMMON>Columbine</COMMON>
    <BOTANICAL>Aquilegia canadensis</BOTANICAL>
    <ZONE>3</ZONE>
    <LIGHT>Mostly Shady</LIGHT>
    <PRICE dt:dt="fixed.14.4">9.37</PRICE>
    <AVAILABILITY dt:dt="dateTime">1999-03-06</AVAILABILITY>
  </PLANT>
  
  <PLANT>
    <COMMON>Marsh Marigold</COMMON>
    <BOTANICAL>Caltha palustris</BOTANICAL>
    <ZONE>4</ZONE>
    <LIGHT>Mostly Sunny</LIGHT>
    <PRICE dt:dt="fixed.14.4">6.81</PRICE>
    <AVAILABILITY dt:dt="dateTime">1999-05-17</AVAILABILITY>
  </PLANT>
</CATALOG>

Next we'll create an HTML page to work with the data in our XML document. Code Listing 6-2 (Chap06\Lst6_2.htm on the companion CD) uses the XML object model to walk through elements in the document tree and pull out the data we want.

NOTE
For more detail on the XML object model, see Appendix A, "The XML Object Model."

The dataType Property

In Code Listing 6-2, the dataType property is used to get the data type of the Price node.

Code Listing 6-2.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>

  <HEAD>
    <SCRIPT LANGUAGE="JavaScript" FOR=window EVENT=onload>
      loadDoc();
    </SCRIPT>

    <SCRIPT LANGUAGE="JavaScript">
      var xmlDoc = new ActiveXObject("microsoft.xmldom");
      xmlDoc.load("Lst6_1.xml");

      function loadDoc()
        {
        if (xmlDoc.readyState == "4")
          start();
        else
          window.setTimeout("loadDoc()", 4000);
        }

      function start()
        {
        var rootElem = xmlDoc.documentElement;
        var plantNode = rootElem.childNodes.item(0);
        var plantLength = plantNode.childNodes.length;
        for (var cl=0; cl<plantLength; cl++)
          {
          var currNode = plantNode.childNodes.item(cl);
          switch (currNode.nodeName)
            {
            case "PRICE":
              alert("The data type of this node is " +
                currNode.dataType + ".");
              break;
            }
          }
        }
    </SCRIPT>

    <TITLE>Code Listing 6-2</TITLE>
  </HEAD>

  <BODY>
  </BODY>

</HTML>

When you run the code in Code Listing 6-2, the start function displays the value fixed.14.4 because fixed.14.4 is the data type specified in the XML document.
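The loop in the start function can also be packaged as a small reusable helper. The sketch below is one possible way to do that; findChildByName and makeNodeList are hypothetical names, and the plantNode object is a plain JavaScript stand-in that only mimics the shape of the first Plant element so that the example runs without Msxml.

```javascript
// Returns the first child of parentNode whose nodeName matches, or null.
// Works with any object that exposes childNodes.length and childNodes.item(i).
function findChildByName(parentNode, name) {
  var children = parentNode.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children.item(i);
    if (child.nodeName === name) {
      return child;
    }
  }
  return null;
}

// Builds a minimal node list with the length/item interface used above.
function makeNodeList(nodes) {
  return {
    length: nodes.length,
    item: function (i) { return nodes[i]; }
  };
}

// Mock objects for illustration only; these are not Msxml nodes.
var plantNode = {
  nodeName: "PLANT",
  childNodes: makeNodeList([
    { nodeName: "COMMON", dataType: null },
    { nodeName: "PRICE", dataType: "fixed.14.4" },
    { nodeName: "AVAILABILITY", dataType: "dateTime" }
  ])
};

var priceNode = findChildByName(plantNode, "PRICE");
console.log("The data type of this node is " + priceNode.dataType + ".");
// The data type of this node is fixed.14.4.
```

Because the helper relies only on nodeName, childNodes.length, and childNodes.item, the same function should also accept a real element node from the object model, although that has not been verified here.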

The nodeTypedValue Property

The nodeTypedValue property returns the typed value of a node, which may differ from the value as it is formatted in the document. Code Listing 6-3 (Chap06\Lst6_3.htm on the companion CD) uses the nodeTypedValue property to display the typed value.

Code Listing 6-3.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>

  <HEAD>
    <SCRIPT LANGUAGE="JavaScript" FOR=window EVENT=onload>
      loadDoc();
    </SCRIPT>

    <SCRIPT LANGUAGE="JavaScript">
      var xmlDoc = new ActiveXObject("microsoft.xmldom");
      xmlDoc.load("Lst6_1.xml");

      function loadDoc()
        {
        if (xmlDoc.readyState == "4")
          start();
        else
          window.setTimeout("loadDoc()", 4000);
        }

      function start()
        {
        var rootElem = xmlDoc.documentElement;
        var plantNode = rootElem.childNodes.item(0);
        var plantLength = plantNode.childNodes.length;
        for (var cl=0; cl<plantLength; cl++)
          {
          var currNode = plantNode.childNodes.item(cl);
          switch (currNode.nodeName)
            {
            case "AVAILABILITY":
              alert("The typed value of this node is " +
                currNode.nodeTypedValue + ".");
              break;
            }
          }
        }
    </SCRIPT>

    <TITLE>Code Listing 6-3</TITLE>
  </HEAD>

  <BODY>
  </BODY>

</HTML>

This code displays the value Mon Mar 15 00:00:00 PST 1999 rather than 1999-03-15 because the type specified is dateTime. Even though the data can appear as a date (without a time) when the XML document is displayed, the nodeTypedValue property for the node is its typed value, not just a date.
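The jump from the text 1999-03-15 to a full date and time can be imitated in plain JavaScript. The sketch below is an approximation for illustration, not the parser Msxml uses: parseDateTime is a hypothetical helper that accepts only the yyyy-mm-dd form with an optional Thh:mm:ss part and interprets it in local time.

```javascript
// Parses "yyyy-mm-dd" or "yyyy-mm-ddThh:mm:ss" into a JavaScript Date
// in local time. A missing time part defaults to midnight.
function parseDateTime(text) {
  var m = /^(\d{4})-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2}):(\d{2}))?$/.exec(text);
  if (!m) {
    throw new Error("Not a dateTime value: " + text);
  }
  return new Date(
    Number(m[1]), Number(m[2]) - 1, Number(m[3]),   // months are zero-based
    Number(m[4] || 0), Number(m[5] || 0), Number(m[6] || 0)
  );
}

var available = parseDateTime("1999-03-15");
console.log(available.getFullYear(), available.getMonth() + 1, available.getDate());
// 1999 3 15
console.log(available.getHours());
// 0
```

The resulting object carries a time of day (midnight) even though the source text contained only a date, which mirrors the behavior described above.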

Changing the Data Type

You can change the data type of an element or attribute through a process called data type coercion, or casting. However, you can convert only from a primitive data type to a rich data type. XML does not support conversions between different rich data types. To change the data type of an element or attribute, you set its dataType property to a different rich data type. You can also use the dataType property to retrieve the current rich data type of an element or attribute. Note that the dataType property can be used to retrieve a rich data type only. If an element or attribute is a primitive data type, its dataType property will be the null value. Code Listing 6-4 (Chap06\Lst6_4.htm on the companion CD) shows an example of how this works.

Code Listing 6-4.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>

  <HEAD>
    <SCRIPT LANGUAGE="JavaScript" FOR=window EVENT=onload>
      loadDoc();
    </SCRIPT>

    <SCRIPT LANGUAGE="JavaScript">
      var xmlDoc = new ActiveXObject("microsoft.xmldom");
      xmlDoc.load("Lst6_1.xml");

      function loadDoc()
        {
        if (xmlDoc.readyState == "4")
          start();
        else
          window.setTimeout("loadDoc()", 4000);
        }

      function start()
        {
        var rootElem = xmlDoc.documentElement;
        var plantNode = rootElem.childNodes.item(0);
        var plantLength = plantNode.childNodes.length;
        for (var cl=0; cl<plantLength; cl++)
          {
          var currNode = plantNode.childNodes.item(cl);
          switch (currNode.nodeName)
            {
            case "ZONE":
              alert("The data type of this node is " +
                currNode.dataType + ".");
              currNode.dataType = "number";
              alert("The new data type of this node is " +
                currNode.dataType + ".");
              break;
            }
          }
        }
    </SCRIPT>

    <TITLE>Code Listing 6-4</TITLE>
  </HEAD>

  <BODY>
  </BODY>

</HTML>

This example changes the data type of the Zone element from its default primitive data type (#PCDATA) to a number. The processor will now treat the value of this element as a numeric value instead of just text.
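The practical effect of this coercion can be sketched with a plain JavaScript object standing in for the Zone node. Everything here is an assumption made for illustration: makeNode and typedValueOf are hypothetical helpers, the mock handles only the number type, and it does not reproduce Msxml's own conversion rules.

```javascript
// Mock node: text holds the raw content; dataType is null until
// a rich data type is assigned.
function makeNode(name, text) {
  return { nodeName: name, text: text, dataType: null };
}

// Returns the node's value as interpreted under its current dataType.
// Untyped nodes return their text unchanged.
function typedValueOf(node) {
  if (node.dataType === null) {
    return node.text;
  }
  if (node.dataType === "number") {
    var n = Number(node.text);
    if (isNaN(n)) {
      throw new Error("Cannot treat as number: " + node.text);
    }
    return n;
  }
  throw new Error("Unsupported data type: " + node.dataType);
}

var zone = makeNode("ZONE", "4");
console.log(typeof typedValueOf(zone));   // string
console.log(typedValueOf(zone) + 1);      // 41 (string concatenation)

zone.dataType = "number";
console.log(typeof typedValueOf(zone));   // number
console.log(typedValueOf(zone) + 1);      // 5
```

Before the assignment, the content behaves as text; afterward, the same content takes part in arithmetic.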

The table below lists the rich data type coercions supported by Msxml. Note that these coercions might not work with other XML processors.

Rich Data Type	Description
bin.base64	MIME-style Base64 encoded binary block.
bin.hex	Hexadecimal digits representing octets.
boolean	0 (false) or 1 (true).
char	One-character string.
date	Date, in a subset of the ISO 8601 format, without the time data. Example: 1998-11-02.
dateTime	Date, in a subset of the ISO 8601 format, with optional time but no zone. Fractional seconds can be as precise as nanoseconds. Example: 1988-07-09T18:39:09.
dateTime.tz	Date, in a subset of the ISO 8601 format, with optional time and optional zone. Fractional seconds can be as precise as nanoseconds. Example: 1988-07-09T18:39:09-08:00.
fixed.14.4	Same as number, but supports no more than 14 digits to the left of the decimal point and no more than 4 to the right.
float	Real number with essentially no limit on the number of digits. Optionally contains a leading sign, fractional digits, and/or an exponent. Values from 1.7976931348623157E+308 to 2.2250738585072014E-308.
int	Number with optional sign, no fractions, and no exponent.
number	Number with essentially no limit on the number of digits. Optionally contains a leading sign, fractional digits, and/or an exponent.
time	Time, in a subset of the ISO 8601 format, with no date and no time zone. Example: 06:18:35.
time.tz	Time, in a subset of the ISO 8601 format, with no date but optional time zone. Example: 03:15:25-04:00.
i1	Integer represented in 1 byte. A number with optional sign, no fractions, and no exponent. Examples: 1, 34, -128.
i2	Integer represented in 2 bytes. A number with optional sign, no fractions, and no exponent. Examples: 1, 244, -32768.
i4	Integer represented in 4 bytes. A number with optional sign, no fractions, and no exponent. Examples: 1, 556, -34234, 156645, -2005000700.
i8	Integer represented in 8 bytes. A number with optional sign, no fractions, and no exponent. Examples: 1, 646, -65333, 2666345433454, -2007000800090090.
r4	Real number with essentially no limit on the number of digits. Optionally contains a leading sign, fractional digits, and/or an exponent. Values from 3.40282347E+38F to 1.17549435E-38F.
r8	Same as float.
ui1	Unsigned 1-byte integer. An unsigned number with no fractions and no exponent. Examples: 1, 255.
ui2	Unsigned 2-byte integer. An unsigned number with no fractions and no exponent. Examples: 1, 255, 65535.
ui4	Unsigned 4-byte integer. An unsigned number with no fractions and no exponent. Examples: 1, 660, 2005000000.
ui8	Unsigned 8-byte integer. An unsigned number with no fractions and no exponent. Example: 1582437474934.
uri	Uniform Resource Identifier (URI). Example: urn:schemas-flowers-com:wildflowers.
uuid	Hexadecimal digits that represent octets. Optionally contains embedded hyphens that are ignored. Example: 333C7BC4-460F-11D0-BC04-0080C7055A83.
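The size limits in the table translate directly into range checks. The following JavaScript sketch shows one way to test whether a piece of text would fit a given fixed-size integer type or the fixed.14.4 type. The names integerRanges, fitsIntegerType, and fitsFixed14_4 are hypothetical, the ranges are the standard two's-complement limits for each byte size, and the patterns are assumptions made for this example rather than Msxml's actual validation rules.

```javascript
// Inclusive ranges for the fixed-size integer types that fit safely
// in a JavaScript number (i8 and ui8 are left out for that reason).
var integerRanges = {
  "i1": [-128, 127],
  "i2": [-32768, 32767],
  "i4": [-2147483648, 2147483647],
  "ui1": [0, 255],
  "ui2": [0, 65535],
  "ui4": [0, 4294967295]
};

// Returns true if text is a whole number within the range of the type.
function fitsIntegerType(text, type) {
  var range = integerRanges[type];
  if (!range) {
    throw new Error("Unsupported type: " + type);
  }
  if (!/^[+-]?\d+$/.test(text)) {
    return false;   // fractions and exponents are not allowed
  }
  var n = Number(text);
  return n >= range[0] && n <= range[1];
}

// Returns true if text has at most 14 digits before the decimal point
// and at most 4 after it.
function fitsFixed14_4(text) {
  return /^[+-]?\d{1,14}(\.\d{1,4})?$/.test(text);
}

console.log(fitsIntegerType("-128", "i1"));    // true
console.log(fitsIntegerType("-165", "i1"));    // false
console.log(fitsIntegerType("65535", "ui2"));  // true
console.log(fitsFixed14_4("2.44"));            // true
console.log(fitsFixed14_4("2.44444"));         // false
```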