The XML DTD for HDF5 is the standard DTD for describing HDF5 files and their contents. The HDF5 DTD specifies rules for the structure of an HDF5 XML document. It lists the elements, tags, attributes, and entities that an HDF5 XML document may contain, and their relationships to each other. NCSA tools, such as h5view, h5gen, and the HDF5 dumper, read and write XML that conforms to the rules defined by the DTD. Other tools should conform to this standard.
This document gives a brief tutorial introduction to how an HDF5 file is represented in XML.
Document Header
A header is required for a valid HDF5 XML document. The header tells HDF5 tools what the document type is and where to find the HDF5 DTD. Every XML file generated by h5view and h5dump has the following XML header.
<?xml version="1.0"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd" "/DTDs/HDF5-File.dtd">
<HDF5-File>
  <RootGroup OBJ-XID="root">
  </RootGroup>
</HDF5-File>
Table 1 shows the XML description of an empty HDF5 file. The first
elements are the XML declaration and the document type declaration:
<?xml version="1.0"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd"
"/DTDs/HDF5-File.dtd">
The URL "/DTDs/HDF5-File.dtd" identifies the official
HDF5 DTD at the NCSA-HDF web site. The URL does not have to point to the HDF
web site, but it must be the URL of a valid version of the DTD. For
interoperability, it is recommended that shared XML documents use the published
URL.
The <HDF5-File> element is the top level of the document. An HDF5 file always has a single root group, so the XML document must have one and only one <RootGroup> element.
Table 1 is a complete and valid XML description of an HDF5 file.
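The two structural rules above (a top-level <HDF5-File> element and exactly one <RootGroup>) can be checked with a few lines of Python. This is a minimal sketch, not part of any NCSA tool: the helper name check_top_level is hypothetical, and the standard-library parser used here reads the DOCTYPE declaration without fetching the DTD or validating against it.

```python
import xml.etree.ElementTree as ET

# The empty-file document from Table 1. ElementTree parses the DOCTYPE
# declaration but does not fetch or validate against the external DTD.
EMPTY_FILE = """<?xml version="1.0"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd" "/DTDs/HDF5-File.dtd">
<HDF5-File>
  <RootGroup OBJ-XID="root">
  </RootGroup>
</HDF5-File>
"""


def check_top_level(xml_text):
    """Return the single <RootGroup> element, or raise ValueError.

    Hypothetical helper: it checks only the top-level element name and
    the "exactly one root group" rule, not the full DTD.
    """
    top = ET.fromstring(xml_text)
    if top.tag != "HDF5-File":
        raise ValueError("top-level element must be <HDF5-File>")
    roots = top.findall("RootGroup")
    if len(roots) != 1:
        raise ValueError("expected exactly one <RootGroup>, found %d" % len(roots))
    return roots[0]


root_group = check_top_level(EMPTY_FILE)
print(root_group.get("OBJ-XID"))
```

Full validation against the DTD needs a validating parser; the check above only catches the most basic structural mistakes.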
Group Element
Objects in an HDF5 file are organized in Groups. Objects can be shared, i.e., they can be members of more than one group. In this case, the HDF5 file is a graph, not a tree, because some objects have more than one parent. It is also possible for a Group to directly or indirectly contain one of its ancestors. In other words, the graph can have a loop in it.
Although XML is general enough to describe almost any structure, there are some limitations in representing the structure of an HDF5 file in general XML notation. For the details of this issue, read the section Description of the Structure (Groups) in the Design Notes.
Since H5View displays the structure of an HDF5 file as a tree, the XML descriptions are trees, with exactly one root and each object nested in its parent. A group that is also its own direct or indirect ancestor (a loop) is treated as an empty group.
Table 2 shows an XML document which contains MyGroup with two member groups:
Group_A and Group_B.
<?xml version="1.0"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd" "/DTDs/HDF5-File.dtd">
<HDF5-File>
  <RootGroup OBJ-XID="root">
    <Group Name="MyGroup" OBJ-XID="/MyGroup" Parents="/">
      <Group Name="Group_A" OBJ-XID="/MyGroup/Group_A" Parents="/MyGroup">
      </Group>
      <Group Name="Group_B" OBJ-XID="/MyGroup/Group_B" Parents="/MyGroup">
      </Group>
    </Group>
  </RootGroup>
</HDF5-File>
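Because the XML description is a tree, a reader can recover the group hierarchy with a simple recursive walk. The sketch below is illustrative only: list_groups is a hypothetical helper, and the document is the one from Table 2 with the XML declaration and DOCTYPE left out for brevity.

```python
import xml.etree.ElementTree as ET

# Body of the Table 2 document (XML declaration and DOCTYPE omitted).
GROUPS = """<HDF5-File>
  <RootGroup OBJ-XID="root">
    <Group Name="MyGroup" OBJ-XID="/MyGroup" Parents="/">
      <Group Name="Group_A" OBJ-XID="/MyGroup/Group_A" Parents="/MyGroup">
      </Group>
      <Group Name="Group_B" OBJ-XID="/MyGroup/Group_B" Parents="/MyGroup">
      </Group>
    </Group>
  </RootGroup>
</HDF5-File>
"""


def list_groups(parent, depth=0):
    """Yield (depth, OBJ-XID) for every <Group> nested under parent."""
    for group in parent.findall("Group"):  # direct children only
        yield depth, group.get("OBJ-XID")
        yield from list_groups(group, depth + 1)


root_group = ET.fromstring(GROUPS).find("RootGroup")
for depth, xid in list_groups(root_group):
    # Indent each group by its nesting depth below the root group.
    print("  " * depth + xid)
```

The walk always terminates because a group that would form a loop is written as an empty group, so the XML nesting is finite.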
Dataset Elements
A dataset element in XML must have exactly one each of the <Dataspace>, <DataType>, and <Data> elements. A dataset may also have zero or more <Attribute> elements. The <Dataspace> and <DataType> elements describe the HDF5 data space of the dataset and the data type of its elements.
The <Data> element optionally includes the data values of the dataset in a single text element. The values are written in C memory order, as printed by the h5dump utility. Numerical data values are separated by spaces or new lines. String data values must be enclosed in double quotes. An escape character is inserted in front of XML reserved characters (", ', &, <, and >). Some tools may not write the values of compound or variable-length data. However, the data type and data space information is always written into the XML file. (In other words, in some cases the XML will describe the data but not contain the actual values.)
The value of an HDF5 object reference is represented by the full path of the object that the reference points to. For example, the value of the attribute "PALETTE" is the path of the palette dataset.
Table 3 is an example of datasets in XML. The XML document contains two
datasets, a string dataset and an integer dataset.
<?xml version="1.0"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd" "/DTDs/HDF5-File.dtd">
<HDF5-File>
  <RootGroup OBJ-XID="root">
    <Dataset Name="A note" OBJ-XID="/A note" Parents="/">
      <Dataspace>
        <SimpleDataspace Ndims="1">
          <Dimension DimSize="2" MaxDimSize="2" />
        </SimpleDataspace>
      </Dataspace>
      <DataType>
        <AtomicType>
          <StringType Cset="H5T_CSET_ASCII" StrSize="47" StrPad="H5T_STR_NULLPAD" />
        </AtomicType>
      </DataType>
      <Data>
        <DataFromFile>
          "This file was created for testing purpose. "
          "It contains groups, datatypes, datasets, links."
        </DataFromFile>
      </Data>
    </Dataset>
    <Dataset Name="int_array" OBJ-XID="/int_array" Parents="/">
      <Dataspace>
        <SimpleDataspace Ndims="2">
          <Dimension DimSize="5" MaxDimSize="5" />
          <Dimension DimSize="10" MaxDimSize="10" />
        </SimpleDataspace>
      </Dataspace>
      <DataType>
        <AtomicType>
          <IntegerType ByteOrder="LE" Sign="true" Size="4" />
        </AtomicType>
      </DataType>
      <Data>
        <DataFromFile>
          1 2 3 4 5 6 7 8 9 10
          11 12 13 14 15 16 17 18 19 20
          21 22 23 24 25 26 27 28 29 30
          31 32 33 34 35 36 37 38 39 40
          41 42 43 44 45 46 47 48 49 50
        </DataFromFile>
      </Data>
    </Dataset>
  </RootGroup>
</HDF5-File>
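Since numerical values are whitespace-separated and written in C memory order, a reader can rebuild the array by splitting the text and nesting it according to the <Dimension> sizes, with the last dimension varying fastest. The sketch below assumes an integer dataset with a simple dataspace of at least one dimension; the helper names read_int_dataset and reshape are hypothetical, and the dataset is a shortened 2 x 3 stand-in for int_array.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the int_array dataset (2 x 3 instead of 5 x 10).
DATASET = """<Dataset Name="int_array" OBJ-XID="/int_array" Parents="/">
  <Dataspace>
    <SimpleDataspace Ndims="2">
      <Dimension DimSize="2" MaxDimSize="2" />
      <Dimension DimSize="3" MaxDimSize="3" />
    </SimpleDataspace>
  </Dataspace>
  <DataType>
    <AtomicType>
      <IntegerType ByteOrder="LE" Sign="true" Size="4" />
    </AtomicType>
  </DataType>
  <Data>
    <DataFromFile>
      1 2 3
      4 5 6
    </DataFromFile>
  </Data>
</Dataset>
"""


def reshape(flat, dims):
    """Nest a flat, C-ordered (row-major) list according to dims."""
    if len(dims) == 1:
        return list(flat)
    step = len(flat) // dims[0]  # number of values in each slice along dims[0]
    return [reshape(flat[i * step:(i + 1) * step], dims[1:])
            for i in range(dims[0])]


def read_int_dataset(dataset):
    """Return (dims, nested values) for an integer <Dataset> element."""
    dims = [int(d.get("DimSize")) for d in dataset.iter("Dimension")]
    # split() with no argument handles both spaces and new lines.
    values = [int(tok) for tok in dataset.find("Data/DataFromFile").text.split()]
    expected = 1
    for size in dims:
        expected *= size
    if len(values) != expected:
        raise ValueError("found %d values, dataspace holds %d" % (len(values), expected))
    return dims, reshape(values, dims)


dims, rows = read_int_dataset(ET.fromstring(DATASET))
print(dims, rows)
```

String datasets need a different tokenizer, because quoted values may themselves contain spaces.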
Shared Objects and Links
For HDF5 objects that may be shared (Groups, Datasets, Named Datatypes), the XML element is defined to be either a description of the object or a "pointer" to an element that describes the object. A shared object should be described in exactly one element, and all other instances should point to that element.
Table 4 is an example of a link element. In the XML document, the dataset
HL_dset is a link to "/dset".
<?xml version="1.0"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd" "/DTDs/HDF5-File.dtd">
<HDF5-File>
  <RootGroup OBJ-XID="root">
    <Dataset Name="dset" OBJ-XID="/dset" Parents="/">
      <Dataspace>
        <SimpleDataspace Ndims="2">
          <Dimension DimSize="4" MaxDimSize="4" />
          <Dimension DimSize="6" MaxDimSize="6" />
        </SimpleDataspace>
      </Dataspace>
      <DataType>
        <AtomicType>
          <IntegerType ByteOrder="BE" Sign="true" Size="4" />
        </AtomicType>
      </DataType>
      <Data>
        <DataFromFile>
          1 2 3 4 5 6
          7 8 9 10 11 12
          13 14 15 16 17 18
          19 20 21 22 23 24
        </DataFromFile>
      </Data>
    </Dataset>
    <Group Name="glinks" OBJ-XID="/glinks" Parents="/">
      <Dataset Name="HL_dset" OBJ-XID="/glinks/HL_dset" Parents="/glinks">
        <DatasetPtr OBJ-XID="/dset" />
      </Dataset>
    </Group>
    <Dataset Name="HL_dset" OBJ-XID="/HL_dset" Parents="/">
      <DatasetPtr OBJ-XID="/dset" />
    </Dataset>
  </RootGroup>
</HDF5-File>
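A reader that meets a <DatasetPtr> has to look up the element that actually describes the dataset. One way to do this, sketched below under the assumption that the pointer's OBJ-XID matches the OBJ-XID of the describing element as in Table 4, is to index the describing elements first and then follow pointers through that index. The helper name resolve is hypothetical, and the document is a shortened version of Table 4.

```python
import xml.etree.ElementTree as ET

# Shortened version of Table 4: one described dataset and one link to it.
LINKS = """<HDF5-File>
  <RootGroup OBJ-XID="root">
    <Dataset Name="dset" OBJ-XID="/dset" Parents="/">
      <Dataspace>
        <SimpleDataspace Ndims="1">
          <Dimension DimSize="4" MaxDimSize="4" />
        </SimpleDataspace>
      </Dataspace>
      <DataType>
        <AtomicType>
          <IntegerType ByteOrder="BE" Sign="true" Size="4" />
        </AtomicType>
      </DataType>
      <Data>
        <DataFromFile> 1 2 3 4 </DataFromFile>
      </Data>
    </Dataset>
    <Group Name="glinks" OBJ-XID="/glinks" Parents="/">
      <Dataset Name="HL_dset" OBJ-XID="/glinks/HL_dset" Parents="/glinks">
        <DatasetPtr OBJ-XID="/dset" />
      </Dataset>
    </Group>
  </RootGroup>
</HDF5-File>
"""


def resolve(dataset, index):
    """Return the element that describes dataset, following a pointer if present."""
    ptr = dataset.find("DatasetPtr")
    if ptr is None:
        return dataset  # already a full description
    return index[ptr.get("OBJ-XID")]


top = ET.fromstring(LINKS)
# Index only the elements that describe a dataset rather than point to one.
index = {d.get("OBJ-XID"): d
         for d in top.iter("Dataset") if d.find("DatasetPtr") is None}
link = next(d for d in top.iter("Dataset")
            if d.get("OBJ-XID") == "/glinks/HL_dset")
target = resolve(link, index)
print(link.get("OBJ-XID"), "->", target.get("OBJ-XID"))
```

Building the index before resolving means the describing element may appear before or after the links that point to it.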
Last modified: August 15th 2007
