Appendix B

Data Types in XML

This appendix contains three tables that identify various data types in XML: two tables focus on XML content, and one focuses on the XML Document Object Model (DOM). The first table contains the primitive data types, the second table contains the supported rich data types, and the third table contains the DOM node types.

Primitive data types are those types identified in the XML 1.0 specification. These basic data types are used to identify different "pieces" of an XML document, and are not the data types you would typically find in a traditional programming language or database management system (DBMS). For example, the type entity tells the processor that the object is an entity and must therefore follow certain rules. Specifying this type, however, doesn't indicate whether the data is text, a number, or a date, because according to the XML 1.0 specification, all text that is not markup constitutes the character data of the document. So you can think of the primitive data types as different varieties of character data, or text. More information on how primitive types are used can be found in Chapter 4.

Rich data types are referenced in the XML-Data specification and are used within the datatypes namespace. These data types are more typical of those found in a programming language or DBMS, and include types such as int, char, and date. More on rich data types can be found in Chapters 6 and 10.

DOM provides support for typing nodes of the document tree. Many of these types map to the primitive XML data types, but others are included as well. A DOM node type identifies the type of node being worked with and might indicate the type of data contained in the node. For example, NODE_COMMENT indicates that the node in question is a comment. DOM identifies each node type by a numerical value. More information on DOM node types can be found in Chapter 9.

Primitive Types (Available for Attributes Only)

Data Type Name Example of Attribute Value Parse Type
entity entity1 ENTITY
entities entity1 entity2 ENTITIES
enumeration one ENUMERATION
id a ID
idref a IDREF
idrefs a b c IDREFS
nmtoken name1 NMTOKEN
nmtokens name1 name2 NMTOKENS
notation GIF NOTATION
string This is a string. PCDATA
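
As a rough illustration of how a processor sees these primitive types, the sketch below uses Python's standard xml.parsers.expat module to read the attribute types declared in an internal DTD subset. The sample document, its element and attribute names (catalog, item, key, ref, tags), and the helper declared_attribute_types are hypothetical and chosen only for this example; they do not come from the chapters referenced above.

```python
import xml.parsers.expat

# Hypothetical sample document: an internal DTD subset declares three
# attributes using primitive types from the table above.
DOC = """<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (item*)>
  <!ELEMENT item EMPTY>
  <!ATTLIST item
      key   ID       #REQUIRED
      ref   IDREF    #IMPLIED
      tags  NMTOKENS #IMPLIED>
]>
<catalog>
  <item key="a" tags="name1 name2"/>
  <item key="b" ref="a"/>
</catalog>
"""


def declared_attribute_types(xml_text):
    """Return {(element, attribute): declared type} from the DTD."""
    types = {}

    def on_attlist(elname, attname, type_, default, required):
        # expat reports the declared type as a string such as "ID".
        types[(elname, attname)] = type_

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return types


if __name__ == "__main__":
    for (element, attribute), type_ in declared_attribute_types(DOC).items():
        print(f"{element}/@{attribute}: {type_}")
```

Note that expat is a non-validating parser: it reports the declared types but does not check, for instance, that an IDREF value matches an existing ID.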

Rich Data Types (Available for Elements and Attributes)

Data Type Name Examples of Element/Attribute Values Parse Type
bin.base64 MIME-style Base64 encoded binary chunk.
bin.hex Hexadecimal digits representing octets.
boolean 0, 1 (0 equals false, 1 equals true) "0" or "1"
char x string
date 1994-11-05 A date in a subset of the ISO 8601 format with no time.
dateTime 1988-04-07T18:39:09 A date in a subset of the ISO 8601 format with optional time and no optional zone. Fractional seconds may be as precise as nanoseconds.
dateTime.tz 1988-04-07T18:39:09-08:00 A date in a subset of the ISO 8601 format with optional time and optional zone. Fractional seconds may be as precise as nanoseconds.
fixed.14.4 12.0044 Same as number but no more than 14 digits to the left of the decimal point, and no more than 4 digits to the right.
i1 1, 127, -128 A number with optional sign, no fractions, and no exponent.
i2 1, 703, -32768 A number with optional sign, no fractions, and no exponent.
i4 1, 703, -32768, 148343, -1000000000 A number with optional sign, no fractions, and no exponent.
i8 1, 703, -32768, 1483433434334, -1000000000000000 A number with optional sign, no fractions, and no exponent.
int 1, 58502, -13 A number with optional sign, no fractions, and no exponent.
number 15, 3.14, -123.456E+10 A number with essentially no limit on the number of digits. May potentially have a leading sign, fractional digits, and, optionally, an exponent. Punctuation as in U.S. English.
r4 .3141592E+1 Same parse type as number, but with approximate minimum value 1.17549435E-38F and approximate maximum value 3.40282347E+38F.
r8 .314159265358979E+1 Same parse type as number, but with approximate minimum value 2.2250738585072014E-308 and approximate maximum value 1.7976931348623157E+308.
string This is a string. PCDATA
time 08:15:27 A time in a subset of the ISO 8601 format with no date and no zone.
time.tz 08:15:27-05:00 A time in a subset of the ISO 8601 format with no date but optional zone.
ui1 1, 255 An unsigned number with no fractions and no exponent.
ui2 1, 255, 65535 An unsigned number with no fractions and no exponent.
ui4 1, 703, 3000000000 An unsigned number with no fractions and no exponent.
ui8 1483433434334 An unsigned number with no fractions and no exponent.
uri urn:schemas-microsoft-com Uniform Resource Identifier (URI)
user-defined type VT_UNKNOWN
uuid 333C7BC4-460F-11D0-BC04-0080C7055A83 Hexadecimal digits representing octets, with optional embedded hyphens that should be ignored.
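
To make the parse rules in this table more concrete, here is a minimal Python sketch that converts the text of an element or attribute into a native value for a handful of the rich types. The function parse_rich and the INT_RANGES table are hypothetical names introduced for this example. The integer ranges are an assumption inferred from the byte widths that the type names suggest (i1 as one signed byte, ui2 as two unsigned bytes, and so on); the table itself states only the lexical form. Python's own conversions are also slightly more lenient than the rules above, so treat this as an approximation rather than a conforming implementation.

```python
from datetime import date, datetime

# Assumed value ranges, inferred from the byte widths in the type names.
INT_RANGES = {
    "i1": (-2**7, 2**7 - 1),
    "i2": (-2**15, 2**15 - 1),
    "i4": (-2**31, 2**31 - 1),
    "i8": (-2**63, 2**63 - 1),
    "ui1": (0, 2**8 - 1),
    "ui2": (0, 2**16 - 1),
    "ui4": (0, 2**32 - 1),
    "ui8": (0, 2**64 - 1),
}


def parse_rich(type_name, text):
    """Convert character data to a Python value for a few rich types."""
    text = text.strip()
    if type_name in INT_RANGES:
        low, high = INT_RANGES[type_name]
        value = int(text)
        if not low <= value <= high:
            raise ValueError(f"{text!r} is out of range for {type_name}")
        return value
    if type_name == "boolean":
        if text not in ("0", "1"):
            raise ValueError(f"{text!r} is not a valid boolean")
        return text == "1"
    if type_name == "date":
        return date.fromisoformat(text)
    if type_name == "dateTime":
        return datetime.fromisoformat(text)
    if type_name in ("number", "r4", "r8"):
        return float(text)
    if type_name == "string":
        return text
    raise ValueError(f"unsupported type: {type_name}")


if __name__ == "__main__":
    print(parse_rich("i2", "-32768"))
    print(parse_rich("boolean", "1"))
    print(parse_rich("date", "1994-11-05"))
    print(parse_rich("dateTime", "1988-04-07T18:39:09"))
    print(parse_rich("r4", ".3141592E+1"))
```

Types not handled here (such as bin.base64, uuid, or the .tz variants) raise ValueError, which keeps the sketch short and makes the unsupported cases explicit.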

DOM Node Types

Node Type Name Value
NODE_ELEMENT 1
NODE_ATTRIBUTE 2
NODE_TEXT 3
NODE_CDATA_SECTION 4
NODE_ENTITY_REFERENCE 5
NODE_ENTITY 6
NODE_PROCESSING_INSTRUCTION 7
NODE_COMMENT 8
NODE_DOCUMENT 9
NODE_DOCUMENT_TYPE 10
NODE_DOCUMENT_FRAGMENT 11
NODE_NOTATION 12
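
The numeric values in this table can be observed with any DOM implementation. The sketch below uses Python's standard xml.dom.minidom, which names the same constants in the W3C style (ELEMENT_NODE rather than NODE_ELEMENT), and maps them back to the names used in this appendix. The sample document and the node_types helper are hypothetical and exist only for illustration.

```python
from xml.dom import Node, minidom

# Map Python's W3C-style constants to the names used in the table above.
NODE_TYPE_NAMES = {
    Node.ELEMENT_NODE: "NODE_ELEMENT",
    Node.ATTRIBUTE_NODE: "NODE_ATTRIBUTE",
    Node.TEXT_NODE: "NODE_TEXT",
    Node.CDATA_SECTION_NODE: "NODE_CDATA_SECTION",
    Node.ENTITY_REFERENCE_NODE: "NODE_ENTITY_REFERENCE",
    Node.ENTITY_NODE: "NODE_ENTITY",
    Node.PROCESSING_INSTRUCTION_NODE: "NODE_PROCESSING_INSTRUCTION",
    Node.COMMENT_NODE: "NODE_COMMENT",
    Node.DOCUMENT_NODE: "NODE_DOCUMENT",
    Node.DOCUMENT_TYPE_NODE: "NODE_DOCUMENT_TYPE",
    Node.DOCUMENT_FRAGMENT_NODE: "NODE_DOCUMENT_FRAGMENT",
    Node.NOTATION_NODE: "NODE_NOTATION",
}

# Hypothetical sample document for the walk below.
SAMPLE = '<root id="r"><!-- note -->text<?pi data?></root>'


def node_types(xml_text):
    """Return (name, numeric value) for each node, in document order."""
    found = []

    def visit(node):
        found.append((NODE_TYPE_NAMES[node.nodeType], node.nodeType))
        if node.nodeType == Node.ELEMENT_NODE:
            # Attributes are not child nodes, so record them separately.
            for i in range(node.attributes.length):
                attr = node.attributes.item(i)
                found.append((NODE_TYPE_NAMES[attr.nodeType], attr.nodeType))
        for child in node.childNodes:
            visit(child)

    visit(minidom.parseString(xml_text))
    return found


if __name__ == "__main__":
    for name, value in node_types(SAMPLE):
        print(f"{name} = {value}")
```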