If you are familiar with databases or programming, you know that a piece of data generally has characteristics or a format that identify it as a string, a number, a date, a currency amount, or another data type. So far, all the XML data we've worked with in previous chapters has been of a single type: string, or text, data. XML data types, however, let authors specify that element content should be interpreted as some other type, such as a number or a date.
It is important to distinguish the element type from its data type. Chapter 4 looked at how to define element types in a Document Type Definition (DTD). You will remember that an element can contain data that is either parsed character data (#PCDATA) or character data (CDATA). The DTD can further define the element by its context or position in the structure. This method of element typing indicates the semantics of the element (such as Age representing how old a person is) but does not describe the type of data the element contains (such as Age containing a number). Parsed character data and character data are both string data types. You'll recall from Chapter 4 that additional types can be specified for XML attributes. These types are restated in the table below:
Attribute Type | Usage |
---|---|
CDATA | Only character data can be used in the attribute. |
ENTITY | Attribute value must be the name of an unparsed external entity (typically binary data) declared in the DTD. |
ENTITIES | Same as ENTITY, but allows multiple values separated by white space. |
ID | Attribute value must be a unique identifier. If a document contains ID attributes with the same value, the processor should generate an error. |
IDREF | Value must match the value of an ID attribute elsewhere in the document. If it does not match any ID value, the processor should generate an error. |
IDREFS | Same as IDREF, but allows multiple values separated by white space. |
NMTOKEN | Attribute value is any mixture of name token characters, which must be letters, numbers, periods, dashes, colons, or underscores. |
NMTOKENS | Same as NMTOKEN, but allows multiple values separated by white space. |
NOTATION | Attribute value must be one of the notation names listed in the attribute's declaration. Each notation in that list must have its own declaration elsewhere in the DTD. |
Enumerated | Attribute value must match one of the listed values. For example: <!ATTLIST MyElement MyAttribute (content1|content2) #REQUIRED>. |
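A validating processor enforces the ID and IDREF rules by keeping track of every ID value it encounters. The following sketch imitates that check in Python with the standard xml.etree.ElementTree module. ElementTree does not read attribute types from a DTD, so the sketch simply assumes which attributes hold IDs and ID references; the attribute names id and ref and the check_ids function are hypothetical. Because it splits reference values on white space, it also accepts IDREFS-style lists.

```python
import xml.etree.ElementTree as ET

def check_ids(xml_text, id_attr="id", idref_attr="ref"):
    """Report duplicate ID values and references that match no ID.

    ElementTree ignores DTD attribute types, so the caller names the
    attributes to treat as ID and IDREF(S); this is an assumption of the sketch.
    """
    root = ET.fromstring(xml_text)
    errors = []
    seen = set()
    # First pass: every ID value must be unique within the document.
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            errors.append(f"duplicate ID {value!r}")
        seen.add(value)
    # Second pass: every reference must match an ID somewhere in the document.
    for elem in root.iter():
        refs = elem.get(idref_attr)
        if refs is None:
            continue
        for token in refs.split():  # IDREFS values are white-space separated
            if token not in seen:
                errors.append(f"IDREF {token!r} matches no ID")
    return errors

doc = """<CATALOG>
  <PLANT id="p1"/>
  <PLANT id="p2"/>
  <PLANT id="p2"/>
  <NOTE ref="p1 p9"/>
</CATALOG>"""

print(check_ids(doc))  # ["duplicate ID 'p2'", "IDREF 'p9' matches no ID"]
```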
The attribute types in the above table are known as primitive data types. The Microsoft XML processor (Msxml) also supports rich data types, which correspond to the kinds of data types found in traditional programming languages and database systems. Data typing, as discussed in this chapter, focuses on rich data types, which tell the parser how to interpret data as something other than a string.
NOTE
The attribute types in the above table are identified in section 3.3.1 of the XML 1.0 specification (included on the companion CD).
Data typing falls into two basic contexts: strong typing and weak typing. In strong typing, an element must always contain a single type of data. The content in the element must conform to a strict set of rules regarding its type. For example, you might have an element named Part that must always contain an integer. It could not contain a string, a date, or even a decimal. Data is usually strongly typed in database APIs such as ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity).
Weak typing, as you might guess, allows multiple types of data to exist in a single element. So if our Part element were specified to have a weak type, it might contain an integer, a string, a name, a date, or some combination of types.
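The difference can be sketched in a few lines of Python. The hypothetical parse_strong function accepts only an integer for the Part element and rejects everything else, while parse_weak tries an integer, then a decimal number, then an ISO 8601 date, and falls back to a string. The order of those attempts is an arbitrary choice for illustration, not the behavior of any particular processor.

```python
from datetime import date

def parse_strong(text):
    """Strong typing: Part must contain an integer; anything else is an error."""
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"Part must contain an integer, not {text!r}") from None

def parse_weak(text):
    """Weak typing: use the first interpretation that fits, else keep the string."""
    for convert in (int, float, date.fromisoformat):
        try:
            return convert(text)
        except ValueError:
            continue
    return text

for sample in ("345", "4.5", "1999-03-15", "widget"):
    print(repr(sample), "->", repr(parse_weak(sample)))
```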
You specify data types in an XML document by using the dt:dt attribute on an element. The syntax is dt:dt="datatype", where datatype is one of the supported data types. In the following example, the ID element is of type number.
    <?xml version="1.0"?>
    <PRODUCT xmlns:dt="urn:schemas-microsoft-com:datatypes">
      <PART>
        <ID dt:dt="number">4535645.234</ID>
        <NAME>widget</NAME>
      </PART>
    </PRODUCT>
NOTE
The dt:dt attribute looks unusual because data types are defined in a particular namespace: the first dt is a namespace prefix, and the second is the attribute name. The prefix is bound to the datatypes namespace by the xmlns:dt declaration in the PRODUCT start tag of the example above. Namespaces use special prefixes to uniquely identify elements and attributes. See the section "XML Namespaces" later in this chapter.
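Because dt is only a prefix, a namespace-aware parser identifies the attribute by its namespace URI plus the local name dt, not by the literal text dt:dt. The Python sketch below, which uses the standard xml.etree.ElementTree module rather than Msxml, reads the data type of each element in the PRODUCT example. ElementTree reports a namespace-qualified attribute name in the form {URI}localname, so the lookup would still work if the document used a different prefix.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the dt prefix in the example document
DT_NS = "urn:schemas-microsoft-com:datatypes"
DT_ATTR = "{" + DT_NS + "}dt"

doc = """<?xml version="1.0"?>
<PRODUCT xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <PART>
    <ID dt:dt="number">4535645.234</ID>
    <NAME>widget</NAME>
  </PART>
</PRODUCT>"""

root = ET.fromstring(doc)
for elem in root.iter():
    # get() returns None when the element has no dt:dt attribute
    print(elem.tag, elem.get(DT_ATTR))
```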
The table below identifies some of the more common data types. For a complete list of supported data types, see the section "Supported Data Types" in Appendix B of this book.
Data Type | Description | Example |
---|---|---|
boolean | 1 or 0 | 1; 0 |
char | String (one character only) | a |
float | A signed or unsigned whole number or fraction. No effective limit on number of digits. Can contain an exponent. | 34234.376; 477 |
int | A whole number with an optional sign; no fraction and no exponent. | 345 |
number | A signed or unsigned whole number or fraction. No effective limit on number of digits. Can contain an exponent. | -23; 567556; 443.34; 67E12 |
string | #PCDATA | This is a string. |
uri | Uniform Resource Identifier | http://mspress.microsoft.com |
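One way to picture what a processor does with these type names is a table of conversion functions keyed by data type. The Python sketch below is only an illustration: the CONVERTERS mapping and typed_value function are hypothetical, Python's float stands in for both float and number, and the int rule follows the rich data type table later in this chapter (optional sign, no fraction, no exponent). It does not reproduce Msxml's precision or error reporting.

```python
import re

def to_boolean(text):
    # boolean content must be exactly "1" (true) or "0" (false)
    if text not in ("0", "1"):
        raise ValueError(f"boolean must be 0 or 1, not {text!r}")
    return text == "1"

def to_char(text):
    # char holds exactly one character
    if len(text) != 1:
        raise ValueError(f"char must be a single character, not {text!r}")
    return text

def to_int(text):
    # Whole number with an optional sign; no fraction and no exponent
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError(f"int must be a whole number, not {text!r}")
    return int(text)

CONVERTERS = {
    "boolean": to_boolean,
    "char": to_char,
    "float": float,   # Python float approximates both float and number
    "int": to_int,
    "number": float,
    "string": str,
    "uri": str,       # kept as text; URI syntax is not validated here
}

def typed_value(text, data_type):
    """Convert element text according to its data type name."""
    return CONVERTERS[data_type](text)

print(typed_value("1", "boolean"), typed_value("-23", "int"), typed_value("67E12", "number"))
```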
Whenever a node is assigned a type, the data in that node is interpreted according to that type, regardless of how the data is written in the document. Consider this NUMBER element, for example:
    <NUMBER dt:dt="int">-255</NUMBER>
This element is processed as its typed value, the integer -255, rather than as the four-character string "-255", so comparisons and arithmetic on its typed value are numeric rather than textual. This type conversion is important to understand when working with data types, because unexpected results can occur if you are not careful.
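To see why the typed value matters, compare the same element content treated as text and as numbers. In this Python sketch, which stands in for script code, sorting the raw text puts "10" and "100" ahead of "9", because strings are compared character by character; sorting the converted integers gives numeric order.

```python
import xml.etree.ElementTree as ET

doc = """<NUMBERS>
  <NUMBER>9</NUMBER>
  <NUMBER>10</NUMBER>
  <NUMBER>100</NUMBER>
</NUMBERS>"""

texts = [n.text for n in ET.fromstring(doc).iter("NUMBER")]
print(sorted(texts))                  # ['10', '100', '9']  (compared as text)
print(sorted(int(t) for t in texts))  # [9, 10, 100]        (compared as integers)
```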
The XML object model gives scripts access to data types. As with the rest of the object model, data type information is exposed to script code, here through two node properties: dataType, which holds the name of a node's data type, and nodeTypedValue, which holds the node's value converted to that type. Together they let the content author work with the data type of any node in the document tree. Let's look at how these properties work in script code, using the XML document shown in Code Listing 6-1.
NOTE
Throughout the rest of this chapter we will work with an XML document containing information for a wildflower plant catalog. A copy of the catalog document is included on the companion CD in the Chap06\Lst6_1.xml file. A portion of the file is shown in Code Listing 6-1.
Code Listing 6-1.
Next we'll create an HTML page to work with the data in our XML document. Code Listing 6-2 (Chap06\Lst6_2.htm on the companion CD) uses the XML object model to walk through elements in the document tree and pull out the data we want.
NOTE
For more detail on the XML object model, see Appendix A, "The XML Object Model."
In Code Listing 6-2, the dataType property is used to get the data type of the price node.
Code Listing 6-2.
When you run the code in Code Listing 6-2, the start function displays the value fixed.14.4 because fixed.14.4 is the data type specified in the XML document.
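The fixed.14.4 name encodes its limits: no more than 14 digits to the left of the decimal point and 4 to the right. The following Python sketch reads the dt:dt attribute, much as dataType does, and then checks a value against those limits using the standard decimal module so that no digits are lost to binary floating point. The PRICE value is made up for illustration (it is not taken from Code Listing 6-1), and parse_fixed_14_4 is a hypothetical helper that does not imitate Msxml's error handling.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

DT_ATTR = "{urn:schemas-microsoft-com:datatypes}dt"

def parse_fixed_14_4(text):
    """Return text as a Decimal if it fits fixed.14.4, else raise ValueError."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite number: {text!r}")
    _, digits, exponent = value.as_tuple()
    # A negative exponent means -exponent of the digits lie right of the point
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    if integer_digits > 14 or fraction_digits > 4:
        raise ValueError(f"{text!r} has more than 14 integer or 4 fraction digits")
    return value

# Made-up PRICE element for illustration
doc = """<PLANT xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <PRICE dt:dt="fixed.14.4">2.44</PRICE>
</PLANT>"""

price = ET.fromstring(doc).find("PRICE")
print(price.get(DT_ATTR))            # fixed.14.4
print(parse_fixed_14_4(price.text))  # 2.44
```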
The nodeTypedValue property contains the typed value of a node, which may differ from the value as it is written in the document. Code Listing 6-3 (Chap06\Lst6_3.htm on the companion CD) uses the nodeTypedValue property to display the typed value.
Code Listing 6-3.
This code displays the value Mon Mar 15 00:00:00 PST 1999 rather than 1999-03-15 because the specified type is dateTime. (The exact string depends on the script engine and on the computer's time zone.) Even though the content appears as a date without a time when the XML document is displayed, the nodeTypedValue property returns the typed value, a full date and time, rather than just a date.
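The same effect can be reproduced with Python's standard datetime module. Converting the date-only text 1999-03-15 to a date-and-time value supplies a midnight time, much as the dateTime typed value does. This is an analogy rather than Msxml behavior: the Python value carries no time zone, and its printed form differs from the script engine's.

```python
from datetime import datetime

text = "1999-03-15"                          # content as written in the document
typed = datetime.strptime(text, "%Y-%m-%d")  # read it as a date and time
print(repr(text))   # '1999-03-15'
print(repr(typed))  # datetime.datetime(1999, 3, 15, 0, 0)
```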
You can change the data type of an element or attribute through a process called data type coercion, or casting. However, you can convert only from a primitive data type to a rich data type; Msxml does not support conversions from one rich data type to another. To change the data type of an element or attribute, set its dataType property to the rich data type you want. The dataType property also returns the current data type of an element or attribute, but only if it is a rich data type: if an element or attribute has a primitive data type, its dataType property is null. Code Listing 6-4 (Chap06\Lst6_4.htm on the companion CD) shows how this works.
Code Listing 6-4.
This example changes the data type of the Zone element from its default primitive data type (#PCDATA) to a number. The processor will now treat the value of this element as a numeric value instead of just text.
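Msxml performs this coercion when you assign to the dataType property. A rough equivalent with Python's xml.etree.ElementTree, sketched below, sets the dt:dt attribute on a ZONE element and reads the value back through a hypothetical typed_value helper. ElementTree attaches no meaning to the attribute itself, so the conversion happens only in that helper, which handles just the number type.

```python
import xml.etree.ElementTree as ET

DT_NS = "urn:schemas-microsoft-com:datatypes"
DT_ATTR = "{%s}dt" % DT_NS

def typed_value(elem):
    # Hypothetical helper: only "number" is converted; everything else stays text
    if elem.get(DT_ATTR) == "number":
        return float(elem.text)
    return elem.text

zone = ET.fromstring("<ZONE>4</ZONE>")
before = typed_value(zone)   # untyped content is a string
zone.set(DT_ATTR, "number")  # coerce the element to the number type
after = typed_value(zone)    # now read as a number
print(repr(before), repr(after))

ET.register_namespace("dt", DT_NS)  # serialize the attribute with the dt prefix
print(ET.tostring(zone, encoding="unicode"))
```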
The table below lists the rich data type coercions supported by Msxml. Note that these coercions might not work with other XML processors.
Rich Data Type | Description |
---|---|
bin.base64 | MIME-style Base64 encoded binary block. |
bin.hex | Hexadecimal digits representing octets. |
boolean | 0 (false) or 1 (true). |
char | One-character string. |
date | Date, in a subset of the ISO 8601 format, without the time data. Example: 1998-11-02. |
dateTime | Date, in a subset of the ISO 8601 format, with optional time but no zone. Fractional seconds can be as precise as nanoseconds. Example: 1988-07-09T18:39:09. |
dateTime.tz | Date, in a subset of the ISO 8601 format, with optional time and optional zone. Fractional seconds can be as precise as nanoseconds. Example: 1988-07-09T18:39:09-08:00. |
fixed.14.4 | Same as number, but supports no more than 14 digits to the left of the decimal point and no more than 4 to the right. |
float | Real number with essentially no limit on the number of digits. Optionally contains a leading sign, fractional digits, and/or an exponent. Nonzero magnitudes range from 2.2250738585072014E-308 to 1.7976931348623157E+308. |
int | Number with optional sign, no fractions, and no exponent. |
number | Number with essentially no limit on the number of digits. Optionally contains a leading sign, fractional digits, and/or an exponent. |
time | Time, in a subset of the ISO 8601 format, with no date and no time zone. Example: 06:18:35. |
time.tz | Time, in a subset of the ISO 8601 format, with no date but optional time zone. Example: 03:15:25-04:00. |
i1 | Integer represented in 1 byte (-128 through 127). A number with optional sign, no fractions, and no exponent. Examples: 1, 34, -128. |
i2 | Integer represented in 2 bytes (-32768 through 32767). A number with optional sign, no fractions, and no exponent. Examples: 1, 244, -32768. |
i4 | Integer represented in 4 bytes. A number with optional sign, no fractions, and no exponent. Examples: 1, 556, -34234, 156645, -2005000700. |
i8 | Integer represented in 8 bytes. A number with optional sign, no fractions, and no exponent. Examples: 1, 646, -65333, 2666345433454, -2007000800090090. |
r4 | Real number with about seven significant digits. Optionally contains a leading sign, fractional digits, and/or an exponent. Nonzero magnitudes range from 1.17549435E-38 to 3.40282347E+38. |
r8 | Same as float. |
ui1 | Unsigned 1-byte integer. An unsigned number with no fractions and no exponent. Examples: 1, 255. |
ui2 | Unsigned 2-byte integer. An unsigned number with no fractions and no exponent. Examples: 1, 255, 65535. |
ui4 | Unsigned 4-byte integer. An unsigned number with no fractions and no exponent. Examples: 1, 660, 2005000000. |
ui8 | Unsigned 8-byte integer. An unsigned number with no fractions and no exponent. Example: 1582437474934. |
uri | Uniform Resource Identifier (URI). Example: urn:schemas-flowers-com:wildflowers. |
uuid | Hexadecimal digits that represent octets. Optionally contains embedded hyphens that are ignored. Example: 333C7BC4-460F-11D0-BC04-0080C7055A83. |
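The byte counts in the iN and uiN type names fix each type's range: a signed n-byte integer holds -2^(8n-1) through 2^(8n-1)-1, and an unsigned one holds 0 through 2^(8n)-1. The Python sketch below derives those bounds and checks a value against them; the INTEGER_TYPES table and the integer_range and check_integer functions are illustrative helpers, not part of Msxml.

```python
# (bytes, signed) for each sized integer type in the table above
INTEGER_TYPES = {
    "i1": (1, True), "i2": (2, True), "i4": (4, True), "i8": (8, True),
    "ui1": (1, False), "ui2": (2, False), "ui4": (4, False), "ui8": (8, False),
}

def integer_range(data_type):
    """Return the (minimum, maximum) value a sized integer type can hold."""
    size, signed = INTEGER_TYPES[data_type]
    bits = 8 * size
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

def check_integer(text, data_type):
    """Convert text to an int and confirm that it fits the named type."""
    value = int(text)  # raises ValueError for fractions such as "1.5"
    low, high = integer_range(data_type)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {data_type} range {low}..{high}")
    return value

for name in INTEGER_TYPES:
    print(name, integer_range(name))
```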