An HDF5 dataset is an array of data elements, arranged according to the specifications of the dataspace. In general, a data element is the smallest addressable unit of storage in the HDF5 file. (Compound datatypes are the exception to this rule.) The HDF5 datatype defines the storage format for a single data element (Figure 1).
The model for HDF5 attributes is extremely similar to datasets: an attribute has a dataspace and a datatype, as shown in Figure 1. The information in this chapter applies to both datasets and attributes.
|
|
| Figure 1 |
Abstractly, each data element within the dataset is a sequence of bits, interpreted as a single value from a set of values (e.g., a number or a character). For a given datatype, there is a standard or convention for representing the values as bits, and when the bits are stored in a particular storage medium they are laid out according to a specific storage scheme, e.g., as 8-bit bytes, with a specific ordering and alignment of bytes within the storage array.
HDF5 datatypes implement a flexible, extensible, and portable mechanism for specifying and discovering the storage layout of the data elements, determining how to interpret the elements (e.g., as floating point numbers), and for transferring data between different compatible layouts.
An HDF5 datatype describes one specific layout of bits. A dataset has a single datatype, which applies to every data element. The storage datatype is defined when the dataset is created and cannot be changed afterward.
When data is transferred (e.g., a read or write), each end point of the transfer has a datatype, which describes the correct storage for the elements. The source and destination may have different (but compatible) layouts, in which case the data elements are automatically transformed during the transfer.
HDF5 datatypes describe commonly used binary formats for numbers (integers and floating point) and characters (ASCII). A given computing architecture and programming language supports certain number and character representations. For example, a computer may support 8-, 16-, 32-, and 64-bit signed integers, stored in memory in little-endian byte order. These would presumably correspond to the C programming language types 'char', 'short', 'int', and 'long'.
When reading and writing from memory, the HDF5 library must know the appropriate datatype that describes the architecture specific layout. The HDF5 library provides the platform independent 'NATIVE' types, which are mapped to an appropriate datatype for each platform. So the type 'H5T_NATIVE_INT' is an alias for the appropriate descriptor for each platform.
Data in memory has a datatype
In addition to numbers and characters, an HDF5 datatype can describe more abstract classes of types, including enumerations, strings, bit strings, and references (pointers to objects in the HDF5 file). HDF5 supports several classes of composite datatypes, which are composed of one or more other datatypes. In addition to the standard predefined datatypes, users can define new datatypes within the datatype classes.
The HDF5 datatype model is very general and flexible
The HDF5 Library implements an object-oriented model of datatypes. HDF5 datatypes are organized as a logical set of base types, or datatype classes. Each datatype class defines a format for representing logical values as a sequence of bits. For example the H5T_INTEGER class is a format for representing twos complement integers of various sizes.
A datatype class is defined as a set of one or more datatype properties. A datatype property is a property of the bit string. The datatype properties are defined by the logical model of the datatype class. For example, the integer class (twos complement integers) has properties such as "signed or unsigned", "length", and "byte-order". The float class (IEEE floating point numbers) has these properties, plus "exponent bits", "exponent sign", etc.
A datatype is derived from one datatype class: a given datatype has a specific value for the datatype properties defined by the class. For example, for 32-bit signed integers, stored big-endian, the HDF5 datatype is a sub-type of integer, with the properties set to: signed=1, size=4 (bytes), byte-order=BE.
The HDF5 datatype API provides methods to create datatypes of different datatype classes, to set the datatype properties of a new datatype, and to discover the datatype properties of an existing datatype.
The datatype for a dataset is stored in the HDF5 file as part of the metadata for the dataset. A datatype can be shared by more than one dataset in the file. A datatype can optionally be stored as a named object in the file.
When transferring data (e.g., a read or write), the data elements of the source and destination storage must have compatible types. As a general rule, data elements with the same datatype class are compatible, while elements from different datatype classes are not compatible. When transferring data of one datatype to another compatible datatype, the HDF5 Library uses the datatype properties of the source and destination to automatically transform each data element. For example, when reading from data stored as 32-bit, signed integers, big-endian, into 32-bit signed integers, little-endian, the HDF5 Library will automatically swap the bytes.
Thus, data transfer operations (H5Dread, H5Dwrite, H5Aread, H5Awrite) require a datatype for both the source and the destination.
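The byte swap in the example above can be sketched in plain C, independent of the library. The helper names `load_be32`, `store_le32`, and `be32_to_le32` are hypothetical and are not part of the HDF5 API; they only illustrate the kind of per-element transformation that takes place between a big-endian source and a little-endian destination.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: interpret four bytes stored in big-endian order
 * as a 32-bit signed integer, regardless of the host byte order. */
static int32_t load_be32(const unsigned char src[4])
{
    uint32_t u;
    int32_t  v;

    assert(src != NULL);
    u = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
        ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
    memcpy(&v, &u, sizeof v);   /* reinterpret the bits as signed */
    return v;
}

/* Hypothetical helper: store a 32-bit signed integer in little-endian
 * order, least significant byte at the lowest address. */
static void store_le32(int32_t v, unsigned char dst[4])
{
    uint32_t u;

    assert(dst != NULL);
    memcpy(&u, &v, sizeof u);
    dst[0] = (unsigned char)(u & 0xFFu);
    dst[1] = (unsigned char)((u >> 8) & 0xFFu);
    dst[2] = (unsigned char)((u >> 16) & 0xFFu);
    dst[3] = (unsigned char)((u >> 24) & 0xFFu);
}

/* Convert one element from the big-endian layout to the little-endian
 * layout: the value is unchanged, only the order of the bytes differs. */
static void be32_to_le32(const unsigned char src[4], unsigned char dst[4])
{
    store_le32(load_be32(src), dst);
}
```

Both layouts belong to the same datatype class (integer) and differ only in the byte-order property, which is why the two are compatible.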
|
|
| Figure 2 |
The HDF5 Library defines a set of predefined datatypes, corresponding to commonly used storage formats, such as twos complement integers, IEEE Floating point numbers, etc., 4- and 8-byte sizes, big endian and little endian byte orders. In addition, a user can derive types with custom values for the properties. For example, a user program may create a datatype to describe a 6-bit integer, or a 600-bit floating point number.
In addition to atomic datatypes, the HDF5 Library supports composite datatypes. A composite datatype is an aggregation of one or more datatypes. Each class of composite datatypes has properties that describe the organization of the composite datatype (Figure 3). Composite datatypes include:
|
|
| Figure 3 |
Figure 4 shows the HDF5 datatype classes. Each class is defined to have a set of properties which describe layout of the data element and the interpretation of the bits. Table 1 lists the properties for the datatype classes.
|
|
| Figure 4 |
Table 1. Datatype Classes and their properties. |
|||
|
Class |
Description |
Properties |
Notes |
| Integer |
Twos complement integers |
Size (bytes), precision (bits), offset (bits), pad, byte order, signed/unsigned |
|
| Float |
Floating Point numbers |
Size (bytes), precision (bits), offset (bits), pad, byte order, sign position, exponent position, exponent size (bits), exponent sign, exponent bias, mantissa position, mantissa (size) bits, mantissa sign, mantissa normalization, internal padding |
See IEEE 754 for a definition of these properties. These properties describe non-IEEE 754 floating point formats as well. |
| Character |
Array of 1-byte character encoding |
Size (characters), Character set, byte order, pad/no pad, pad character |
Currently, only ASCII is supported. |
| Bitfield |
String of bits |
Size (bytes), precision (bits), offset (bits), pad, byte order |
When stored, are packed into bytes |
| Opaque |
Uninterpreted data |
Size (bytes), precision (bits), offset (bits), pad, byte order, tag |
A sequence of bytes, stored and retrieved as a block. The ‘tag’ is a string that can be used to label the value. |
| Enumeration |
A list of discrete values, with symbolic names in the form of strings. |
Number of elements, element names, element values |
Enumeration is a list of pairs, (name, value). The name is a string, the value is an unsigned integer. |
| Reference |
Reference to object or region within the HDF5 file |
See the Reference API, H5R |
|
| Array |
Array (1-4 dimensions) of data elements |
Number of dimensions, dimension sizes, base datatype |
The array is accessed atomically: no selection or subsetting. |
| Variable length |
A variable length 1-dimensional array of data elements |
Current size, base type |
|
| Compound |
A Datatype composed of a sequence of Datatypes |
Number of members, member names, member types, member offset, member class, member size, byte order |
|
The HDF5 library predefines a modest number of commonly used datatypes.
These types have standard symbolic names of the form
H5T_arch_base where arch is an architecture
name and base is a programming type name (Table 2). New types can
be derived from the predefined types by copying the predefined type (see
H5Tcopy()) and then modifying the result.
The base name of most types consists of a letter to indicate the class (Table 3), a precision in bits, and an indication of the byte order (Table 4).
Table 5 shows examples of predefined datatypes.
The full list can be found in the "HDF5 Predefined Datatypes" section
of the HDF5 Reference Manual.
Table 2 |
|
|
Architecture Name |
Description |
|
|
IEEE-754 standard floating point types in various byte orders. |
|
|
This is an architecture that contains semi-standard datatypes like signed two's complement integers, unsigned integers, and bitfields in various byte orders. |
|
|
Types which are specific to the C or Fortran
programming languages are defined in these architectures. For instance,
|
|
|
This architecture contains C-like datatypes for
the machine on which the library was compiled. The types were actually
defined by running the |
|
|
Cray architectures. These are word-addressable, big-endian systems with non-IEEE floating point. |
|
|
All Intel and compatible CPU's including 80286, 80386, 80486, Pentium, Pentium-Pro, and Pentium-II. These are little-endian systems with IEEE floating-point. |
|
|
All MIPS CPU's commonly used in SGI systems. These are big-endian systems with IEEE floating-point. |
|
|
All DEC Alpha CPU's, little-endian systems with IEEE floating-point. |
Table 3 |
| B | Bitfield |
| F | Floating point |
| I | Signed integer |
| R | References |
| S | Character string |
| U | Unsigned integer |
Table 4 |
| BE | Big endian |
| LE | Little endian |
Table 5 |
|
|
Example |
Description |
|
|
Eight-byte, little-endian, IEEE floating-point |
|
|
Four-byte, big-endian, IEEE floating point |
|
|
Four-byte, little-endian, signed two's complement integer |
|
|
Two-byte, big-endian, unsigned integer |
|
|
One-byte, null-terminated string of eight-bit characters |
|
|
Eight-byte bit field on an Intel CPU |
|
|
Eight-byte Cray floating point |
|
|
Reference to an entire object in a file |
The HDF5 Library predefines a set of NATIVE datatypes which
are similar to C type names. The native types are set to be an alias for the
appropriate HDF5 datatype for each platform. For example, H5T_NATIVE_INT
corresponds to a C int type. On an Intel based PC, this type is the same as
H5T_STD_I32LE, while on a MIPS system this would be equivalent to
H5T_STD_I32BE. Table 6 shows examples of NATIVE types and corresponding
C types for a common 32-bit workstation.
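Table 6 |
| Example | Corresponding C Type |
| H5T_NATIVE_CHAR | char |
| H5T_NATIVE_SCHAR | signed char |
| H5T_NATIVE_UCHAR | unsigned char |
| H5T_NATIVE_SHORT | short |
| H5T_NATIVE_USHORT | unsigned short |
| H5T_NATIVE_INT | int |
| H5T_NATIVE_UINT | unsigned |
| H5T_NATIVE_LONG | long |
| H5T_NATIVE_ULONG | unsigned long |
| H5T_NATIVE_LLONG | long long |
| H5T_NATIVE_ULLONG | unsigned long long |
| H5T_NATIVE_FLOAT | float |
| H5T_NATIVE_DOUBLE | double |
| H5T_NATIVE_LDOUBLE | long double |
| H5T_NATIVE_HSIZE | hsize_t |
| H5T_NATIVE_HSSIZE | hssize_t |
| H5T_NATIVE_HERR | herr_t |
| H5T_NATIVE_HBOOL | hbool_t |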
The HDF5 Library manages datatypes as objects. The HDF5 datatype API
manipulates the datatype objects through C function calls. New datatypes
can be created from scratch or copied from existing datatypes. When a
datatype is no longer needed its resources should be released by calling
H5Tclose().
The datatype object is used in several roles in the HDF5 data model and library. Essentially, a datatype is used whenever the format of data elements is needed. There are four major uses of datatypes in the HDF5 Library: at dataset creation, during data transfers, when discovering the contents of a file, and for specifying user defined data types (Table 7).
Table 7 |
|
|
Use |
Description |
| Dataset creation |
The datatype of the data elements must be declared when the dataset is created. |
| Data transfer |
The datatype (format) of the data elements must be defined for both the source and destination. |
| Discovery |
The datatype of a dataset can be interrogated to retrieve a complete description of the storage layout. |
| Creating User defined Datatypes |
Users can define their own datatypes by creating datatype objects and setting their properties. |
All the data elements of a dataset have the same datatype. When a dataset
is created (H5Dcreate), the datatype for the data elements must
be specified. The datatype of a dataset can never be changed. Figure 5 shows
the use of a datatype to create a dataset called "/dset". In this example,
the dataset will be stored as 32-bit signed integers, in big endian order.
|
| Figure 5 |
Probably the most common use of datatypes is to write or read data from a dataset or attribute. In these operations, each data element is transferred from the source to the destination (possibly rearranging the order of the elements). Since the source and destination do not need to be identical (i.e., one is disk and the other is memory) the transfer requires both the format of the source element and the destination element. Therefore, data transfers use two datatype objects, for the source and destination.
When data is written, the source is memory and the destination is disk (file). The memory datatype describes the format of the data element in the machine memory, and the file datatype describes the desired format of the data element on disk. Similarly, when reading, the source datatype describes the format of the data element on disk, and the destination datatype describes the format in memory.
In the most common cases, the file datatype is the datatype specified when the dataset was created, and the memory datatype should be the appropriate NATIVE type.
Figures 6 and 7, respectively, show examples of writing data to and reading data from a dataset. The data in memory is declared as C type 'int', and the datatype H5T_NATIVE_INT corresponds to this type. The datatype of the dataset should be of datatype class H5T_INTEGER.
|
| Figure 6 |
|
| Figure 7 |
The HDF5 Library enables a program to determine the datatype class and properties for any datatype. In order to discover the storage format of data in a dataset, the datatype is obtained, and the properties are determined by queries to the datatype object. Figure 8 shows an example of code that analyzes the datatype for an integer and prints out a description of its storage properties (byte order, sign, and size).
|
| Figure 8 |
Most programs will primarily use the predefined datatypes described above, possibly in composite datatypes such as compound or array datatypes. However, the HDF5 datatype model is extremely general; a user program can define a great variety of atomic datatypes (storage layouts). In particular, the datatype properties can define signed and unsigned integers of any size and byte order, and floating point numbers with different formats, size, and byte order. The HDF5 datatype API provides methods to set these properties.
User defined types can be used to define the layout of data in memory, e.g.,
to match some platform specific number format or application defined bit-field.
The user defined type can also describe data in the file, e.g., some
application-defined format. The user defined types can be translated to and
from standard types of the same class, as described above.
3. Datatype (H5T) Function Summaries
|
C Function F90 Function |
Purpose |
H5Tcreate
|
Creates a new datatype. |
H5Topen
|
Opens a named datatype. |
H5Tcommit
|
Commits a transient datatype to a file, creating a new named datatype. |
H5Tcommitted
|
Determines whether a datatype is a named type or a transient type. |
H5Tcopy
|
Copies an existing datatype. |
H5Tequal
|
Determines whether two datatype identifiers refer to the same datatype. |
H5Tlock
|
Locks a datatype. |
H5Tget_class
|
Returns the datatype class identifier. |
H5Tget_size
|
Returns the size of a datatype. |
H5Tget_super
|
Returns the base datatype from which a datatype is derived. |
H5Tget_native_type
|
Returns the native datatype of a specified datatype. |
H5Tdetect_class
|
Determines whether a datatype is of the given datatype class. |
H5Tclose
|
Releases a datatype. |
|
C Function F90 Function |
Purpose |
H5Tconvert
|
Converts data between specified datatypes. |
H5Tfind
|
Finds a conversion function. |
H5Tset_overflow
|
Sets the overflow handler to a specified function. |
H5Tget_overflow
|
Returns a pointer to the current global overflow function. |
H5Tregister
|
Registers a conversion function. |
H5Tunregister
|
Removes a conversion function from all conversion paths. |
|
C Function F90 Function |
Purpose |
H5Tset_size
|
Sets the total size for an atomic datatype. |
H5Tget_order
|
Returns the byte order of an atomic datatype. |
H5Tset_order
|
Sets the byte ordering of an atomic datatype. |
H5Tget_precision
|
Returns the precision of an atomic datatype. |
H5Tset_precision
|
Sets the precision of an atomic datatype. |
H5Tget_offset
|
Retrieves the bit offset of the first significant bit. |
H5Tset_offset
|
Sets the bit offset of the first significant bit. |
H5Tget_pad
|
Retrieves the padding type of the least and most-significant bit padding. |
H5Tset_pad
|
Sets the least and most-significant bits padding types. |
H5Tget_sign
|
Retrieves the sign type for an integer type. |
H5Tset_sign
|
Sets the sign property for an integer type. |
H5Tget_fields
|
Retrieves floating point datatype bit field information. |
H5Tset_fields
|
Sets locations and sizes of floating point bit fields. |
H5Tget_ebias
|
Retrieves the exponent bias of a floating-point type. |
H5Tset_ebias
|
Sets the exponent bias of a floating-point type. |
H5Tget_norm
|
Retrieves mantissa normalization of a floating-point datatype. |
H5Tset_norm
|
Sets the mantissa normalization of a floating-point datatype. |
H5Tget_inpad
|
Retrieves the internal padding type for unused bits in floating-point datatypes. |
H5Tset_inpad
|
Fills unused internal floating point bits. |
H5Tget_cset
|
Retrieves the character set type of a string datatype. |
H5Tset_cset
|
Sets character set to be used. |
H5Tget_strpad
|
Retrieves the storage mechanism for a string datatype. |
H5Tset_strpad
|
Defines the storage mechanism for character strings. |
|
C Function F90 Function |
Purpose |
H5Tenum_create
|
Creates a new enumeration datatype. |
H5Tenum_insert
|
Inserts a new enumeration datatype member. |
H5Tenum_nameof
|
Returns the symbol name corresponding to a specified member of an enumeration datatype. |
H5Tenum_valueof
|
Returns the value corresponding to a specified member of an enumeration datatype. |
H5Tget_member_value
|
Returns the value of an enumeration datatype member. |
H5Tget_nmembers
|
Retrieves the number of elements in a compound or enumeration datatype. |
H5Tget_member_name
|
Retrieves the name of a compound or enumeration datatype member. |
H5Tget_member_index
|
Retrieves the index of a compound or enumeration datatype member. |
|
C Function F90 Function |
Purpose |
H5Tget_nmembers
|
Retrieves the number of elements in a compound or enumeration datatype. |
H5Tget_member_class
|
Returns datatype class of compound datatype member. |
H5Tget_member_name
|
Retrieves the name of a compound or enumeration datatype member. |
H5Tget_member_index
|
Retrieves the index of a compound or enumeration datatype member. |
H5Tget_member_offset
|
Retrieves the offset of a field of a compound datatype. |
H5Tget_member_type
|
Returns the datatype of the specified member. |
H5Tinsert
|
Adds a new member to a compound datatype. |
H5Tpack
|
Recursively removes padding from within a compound datatype. |
|
C Function F90 Function |
Purpose |
H5Tarray_create
|
Creates an array datatype object. |
H5Tget_array_ndims
|
Returns the rank of an array datatype. |
H5Tget_array_dims
|
Returns sizes of array dimensions and dimension permutations. |
|
C Function F90 Function |
Purpose |
H5Tvlen_create
|
Creates a new variable-length datatype. |
H5Tis_variable_str
|
Determines whether datatype is a variable-length string. |
|
C Function F90 Function |
Purpose |
H5Tset_tag
|
Tags an opaque datatype. |
H5Tget_tag
|
Gets the tag associated with an opaque datatype. |
|
C Function F90 Function |
Purpose |
H5LTtext_to_dtype
|
Creates a datatype from a text description. |
H5LTdtype_to_text
|
Generates a text description of a datatype. |
The HDF5 Library implements an object-oriented model of datatypes. HDF5 datatypes are organized as a logical set of base types, or datatype classes. The HDF5 Library manages datatypes as objects. The HDF5 datatype API manipulates the datatype objects through C function calls. Figure 9 shows the abstract view of the datatype object. Table 8 shows the methods (C functions) that operate on a datatype object as a whole. New datatypes can be created from scratch or copied from existing datatypes.
|
| Figure 9. The datatype object |
Table 8. General operations on datatype objects |
|
| API function |
Description |
|
|
Create a new datatype object of datatype class class. The following datatype classes are supported with this function:
H5Tcopy(). |
|
|
Obtain a modifiable transient datatype which is a copy of type. If type is a dataset identifier then the type returned is a modifiable transient copy of the datatype of the specified dataset. |
|
|
Open a named datatype. The named datatype returned by this function is read-only. |
|
|
Determines if two types are equal. |
|
|
Releases resources associated with a datatype obtained from H5Tcopy, H5Topen, or H5Tcreate. It is illegal to close an immutable transient datatype (e.g., predefined types). |
|
|
Commit a transient datatype (not immutable) to a file to become a named datatype. Named datatypes can be shared. |
|
|
Test whether the datatype is transient or committed (named). |
|
|
Make a transient datatype immutable (read-only and not closable). Predefined types are locked. |
In order to use a datatype, the object must be created (H5Tcreate),
or a reference obtained by cloning from an existing type (H5Tcopy),
or opened (H5Topen). In addition, a reference to the datatype of
a dataset or attribute can be obtained with H5Dget_type or H5Aget_type.
For composite datatypes a reference to the datatype for members or base types
can be obtained (H5Tget_member_type, H5Tget_super).
When the datatype object is no longer needed, the reference is discarded with
H5Tclose.
Two datatype objects can be tested to see if they are the same with H5Tequal.
This function returns true if the two datatype identifiers describe the same
datatype, that is, the same datatype class with the same values for every
datatype property. The identifiers do not need to refer to the same datatype
object: a copy made with H5Tcopy, for example, compares equal to the type it
was copied from.
A datatype can be written to the file as a first class object (H5Tcommit).
Named datatypes can be used in the same way as any other datatype. Named datatypes
are explained below.
Any HDF5 datatype object can be queried to discover all of its datatype properties. For each datatype class, there is a set of API functions to retrieve the datatype properties for this class.
Table 9 lists the functions to discover the properties of atomic datatypes. Table 10 lists the queries relevant to specific numeric types. Table 11 gives the properties for atomic string datatype, and Table 12 gives the property of the opaque datatype.
Table 9 |
|
| Functions to Discover Properties of Atomic DataTypes |
Description |
|
|
The datatype class: |
|
|
The total size of the element in bytes, including padding which may appear on either side of the actual value. |
|
|
The byte order describes how the bytes of
the datatype are laid out in memory. If the lowest memory address contains
the least significant byte of the datum then it is said to be little-endian
or |
|
|
The |
|
|
The |
|
|
Padding is the bits of a data element which
are not significant as defined by the |
Table 10 |
|
| Properties of Atomic Numeric Types |
Description |
|
|
(INTEGER) Integer data can
be signed two's complement ( |
|
|
(FLOAT) A floating-point data element has bit fields which are the exponent and mantissa as well as a mantissa sign bit. These properties define the location (bit position of least significant bit of the field) and size (in bits) of each field. The sign bit is always of length one and none of the fields are allowed to overlap. |
|
|
(FLOAT) The exponent is stored
as a non-negative value which is |
|
|
(FLOAT) This property describes the normalization method of the mantissa.
|
|
|
(FLOAT) If any internal
bits (that is, bits between the sign bit, the mantissa field, and the
exponent field but within the precision field) are unused, then they will
be filled according to the value of this property. The padding can be:
H5T_PAD_NONE, |
Table 11 |
|
| Properties of Atomic String Datatypes |
Description |
|
|
The only character set currently supported
is |
|
|
The string datatype has a fixed length,
but the String may be shorter than the length. This property defines the
storage mechanism for the left over bytes. The options are: |
Table 12 |
|
| Properties of Opaque Atomic Datatypes |
Description |
| char *H5Tget_tag(hid_t type_id) |
A user defined string. |
The composite datatype classes can also be analyzed to discover their datatype properties and the datatypes that are members or base types of the composite datatype. The member or base type can, in turn, be analyzed. Table 13 lists the functions that can access the datatype properties of the different composite datatypes.
Table 13 |
|
| Properties of Composite Datatype |
Description |
|
|
(COMPOUND) The number of fields in the compound datatype. |
|
|
(COMPOUND) The datatype class
of compound datatype member |
|
|
(COMPOUND) The name of field
|
|
|
(COMPOUND) The byte offset of the beginning of a field within a compound datatype. |
|
|
(COMPOUND) The datatype of the specified member. |
|
|
(ARRAY) The number of dimensions (rank) of the array datatype object. |
|
|
(ARRAY) The sizes of the dimensions and the dimension permutations of the array datatype object. |
|
|
(ARRAY, VL, ENUM) The base datatype from which the datatype type is derived. |
|
|
(ENUM) The symbol name that corresponds to the specified value of the enumeration datatype |
|
|
(ENUM) The value that corresponds to the specified name of the enumeration datatype |
|
|
(ENUM) The value of the
enumeration datatype member |
The HDF5 Library enables user programs to create and modify datatypes. The essential steps are:
To create a user defined atomic datatype, the procedure is to clone a predefined
datatype of the appropriate datatype class (H5Tcopy). Then set
the datatype properties appropriate to the datatype class. For example, Table
14 shows how to create a datatype to describe a 1024-bit unsigned integer.
Table 14 |
|
hid_t new_type = H5Tcopy (H5T_NATIVE_INT);
H5Tset_precision(new_type, 1024);
H5Tset_sign(new_type, H5T_SGN_NONE);
|
|
Composite datatypes are created with a specific API call for each datatype class. Table 15 shows the creation method for each datatype class. A newly created datatype cannot be used until the datatype properties are set. For example, a newly created compound datatype has no members and cannot be used.
Table 15 |
| Datatype Class | Function to Create |
| COMPOUND | H5Tcreate |
| OPAQUE | H5Tcreate |
| ENUM | H5Tenum_create |
| ARRAY | H5Tarray_create |
| VL | H5Tvlen_create |
Once the datatype is created and the datatype properties set, the datatype object can be used.
Predefined datatypes are defined by the library during initialization using
the same mechanisms as described here. Each predefined datatype is locked (H5Tlock),
so that it cannot be changed or destroyed. User defined datatypes may also be
locked using H5Tlock.
Table 16 summarizes the API methods that set properties of atomic types. Table 17 shows properties specific to numeric types, Table 18 shows properties specific to the string datatype class. Note that offset, pad, etc. don't apply to strings. Table 19 shows the specific property of the OPAQUE datatype class.
Table 16 |
|
| Functions to set Properties of Atomic DataTypes |
Description |
|
|
Set the total size of the element in bytes, including padding which may appear on either side of the actual value. If this property is reset to a smaller value which would cause the significant part of the data to extend beyond the edge of the datatype then the offset property is decremented a bit at a time. If the offset reaches zero and the significant part of the data still extends beyond the edge of the datatype then the precision property is decremented a bit at a time. Decreasing the size of a datatype may fail if the H5T_FLOAT bit fields would extend beyond the significant part of the type. |
|
|
Set the byte order to little-endian ( |
|
|
Set the number of significant bits of a datatype.
The |
|
|
Set the bit location of the least significant
bit of a bit field whose length is |
|
|
Set the padding to zeros ( |
Table 17 |
|
| Properties of Numeric Types |
Description |
|
|
(INTEGER) Integer data can
be signed two's complement ( |
|
|
(FLOAT) Set the properties that define the location (bit position of least significant bit of the field) and size (in bits) of each field. The sign bit is always of length one and none of the fields are allowed to overlap. |
|
|
(FLOAT) The exponent is stored
as a non-negative value which is |
|
|
(FLOAT) This property describes the normalization method of the mantissa.
|
|
|
(FLOAT) If any internal
bits (that is, bits between the sign bit, the mantissa field, and the
exponent field but within the precision field) are unused, then they will
be filled according to the value of this property. The padding can be:
H5T_PAD_NONE, |
Table 18 |
|
| Properties of Atomic String Datatypes |
Description |
|
|
Set the length of the string, in bytes. The
precision is automatically set to 8* |
|
|
The precision must be a multiple of 8. |
|
|
The only character set currently supported
is |
|
|
The string datatype has a fixed length, but the string may be shorter than the length. This property defines the storage mechanism for the left over bytes. The method used to store character strings differs with the programming language:
Valid string padding values, as passed in the parameter strpad, are as follows:
|
Table 19 |
|
| Properties of Opaque Atomic Datatypes |
Description |
|
|
Tags the opaque datatype type_id with an ASCII identifier tag. |
Figure 10 shows an example of how to create a 128-bit, little-endian signed integer type. Increasing the precision of a type automatically increases the total size. Note that the proper procedure is to begin from a type of the intended datatype class, in this case, a NATIVE INT.
|
| Figure 10 |
Figure 11 shows the storage layout as the type is defined. The H5Tcopy creates a datatype that is the same as H5T_NATIVE_INT. In this example, suppose this is a 32-bit big endian number (Figure 11a). The precision is set to 128 bits, which automatically extends the size to 16 bytes (Figure 11b). Finally, the byte order is set to little-endian (Figure 11c).
|
| Figure 11 |
The significant bits of a data element can be offset from the beginning of
the memory for that element by an amount of padding. The offset
property specifies the number of bits of padding that appear to the "right of"
the value. Table 20 and Figure 12 show how a 32-bit unsigned integer with 16 bits
of precision having the value 0x1122 will be laid out in memory.
Table 20 |
| Byte Position | Big-Endian, Offset=0 | Big-Endian, Offset=16 | Little-Endian, Offset=0 | Little-Endian, Offset=16 |
| 0: | [pad] | [0x11] | [0x22] | [pad] |
| 1: | [pad] | [0x22] | [0x11] | [pad] |
| 2: | [0x11] | [pad] | [pad] | [0x22] |
| 3: | [0x22] | [pad] | [pad] | [0x11] |
| Figure 12 |
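The layouts in Table 20 can be reproduced with a few lines of plain C. The following is a minimal sketch that does not call the HDF5 library. The function name `layout_u16_in_u32` is hypothetical, and padding bytes are written as 0x00 purely for illustration; in a real datatype the pad bits depend on the padding properties.

```c
#include <stdint.h>

/* Place a 16-bit value inside a 4-byte element (illustrative helper,
 * not part of HDF5).
 *   offset_bits: number of padding bits to the "right of" the value (0..16)
 *   big_endian:  nonzero to store the most significant byte first
 * Padding bytes come out as 0x00 in this sketch. */
void layout_u16_in_u32(uint16_t value, unsigned offset_bits,
                       int big_endian, unsigned char out[4])
{
    uint32_t word = (uint32_t)value << offset_bits;

    for (int i = 0; i < 4; i++) {
        /* i counts bytes by significance: 0 is the least significant byte */
        unsigned char byte = (unsigned char)((word >> (8 * i)) & 0xFFu);
        if (big_endian)
            out[3 - i] = byte;  /* least significant byte at highest address */
        else
            out[i] = byte;      /* least significant byte at lowest address */
    }
}
```

Calling the function with the value 0x1122 and offsets 0 and 16, for both byte orders, yields the four columns of Table 20.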
If the offset is increased, the total size is also increased when necessary, so that the significant bits of the value do not hang over the edge of the datatype.
The bits of the entire data are numbered beginning at zero at the least significant
bit of the least significant byte (the byte at the lowest memory address for
a little-endian type or the byte at the highest address for a big-endian type).
The offset property defines the bit location of the least significant
bit of a bit field whose length is precision. If the offset is
increased so the significant bits "hang over" the edge of the datum, then the
size property is automatically incremented.
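The size rule can be expressed as simple arithmetic. The sketch below only restates the rule from the text (the element must be large enough to hold offset plus precision bits); it is not taken from the library's implementation, and both function names are hypothetical.

```c
#include <stddef.h>

/* Smallest number of whole bytes that can hold `precision_bits`
 * significant bits placed `offset_bits` above bit 0. */
size_t min_size_bytes(size_t precision_bits, size_t offset_bits)
{
    return (offset_bits + precision_bits + 7) / 8;
}

/* Size after a change of precision or offset: the size grows only when
 * the significant bits would otherwise hang over the edge of the datum. */
size_t size_after_change(size_t current_size_bytes,
                         size_t precision_bits, size_t offset_bits)
{
    size_t needed = min_size_bytes(precision_bits, offset_bits);
    return needed > current_size_bytes ? needed : current_size_bytes;
}
```

For example, a 4-byte type with 24 bits of precision at offset 3 still fits in 4 bytes, while 32 bits of precision at offset 3 would need 5 bytes, and 128 bits of precision at offset 0 would need 16 bytes.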
To illustrate the properties of the integer datatype class, Figure 13 shows how to create a user-defined datatype that describes a 24-bit signed integer that starts at bit 3 of a 32-bit word. The datatype is specialized from a 32-bit integer: the precision is set to 24 bits, and the offset is set to 3.
|
| Figure 13 |
Figure 14 shows the storage layout for a data element. Note that the unused
bits in the offset will be set to zero and the unused bits at the end will be
set to one, as specified in the H5Tset_pad call.
|
| Figure 14. A user-defined integer datatype: range -8,388,608 to 8,388,607 |
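The bit layout described for Figure 14 can be checked by hand in plain C. The following is a minimal sketch, independent of the HDF5 library, that assumes two's-complement values, zero padding in the three low bits, and one padding in the five high bits; all function names are hypothetical.

```c
#include <stdint.h>

/* Smallest and largest values of a two's-complement integer with
 * `bits` bits of precision (1 <= bits <= 63). */
int64_t signed_min(unsigned bits) { return -((int64_t)1 << (bits - 1)); }
int64_t signed_max(unsigned bits) { return ((int64_t)1 << (bits - 1)) - 1; }

/* Pack a 24-bit signed value at bit offset 3 of a 32-bit word.
 * Bits 0..2 (below the value) are zero; bits 27..31 (above it) are one. */
uint32_t pack_i24_at_offset3(int32_t value)
{
    uint32_t field = ((uint32_t)value & 0x00FFFFFFu) << 3;  /* bits 3..26 */
    uint32_t msb_pad = 0xF8000000u;                         /* bits 27..31 */
    return msb_pad | field;
}

/* Recover the signed value, ignoring the padding bits. */
int32_t unpack_i24_at_offset3(uint32_t word)
{
    uint32_t field = (word >> 3) & 0x00FFFFFFu;
    /* Sign-extend from bit 23 of the 24-bit field. */
    if (field & 0x00800000u)
        return (int32_t)field - 0x01000000;
    return (int32_t)field;
}
```

With 24 bits of precision, `signed_min(24)` and `signed_max(24)` give the representable range of the field, and packing followed by unpacking returns the original value for anything in that range.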
To illustrate a user-defined floating point number, Figure 15 shows how to create a 24-bit floating point number that starts 5 bits into a 4-byte word. The floating point number is defined to have a mantissa of 19 bits (bits 5-23), an exponent of 3 bits (bits 25-27), and a sign bit at bit 28. (Note that this is an illustration of what can be done, not necessarily a floating point format that a user would require.)
|
| Figure 15 |
|
| Figure 16. A User defined Floating Point Datatype. |
Figure 16 shows the storage layout of a data element for this datatype. Note that there is an unused bit (24) between the mantissa and the exponent. This bit is filled with the inpad value, in this case 0.
The sign bit is always of length one and none of the fields are allowed to overlap. When expanding a floating-point type one should set the precision first; when decreasing the size one should set the field positions and sizes first.
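The constraints stated above (a one-bit sign, no overlapping fields, all fields inside the significant bits) can be written as a small checker. This is only a sketch of the rules as described in the text, not the validation the library performs; it assumes bit positions are counted from bit 0 of the word, as in Figure 16, and the names `bitfield_t` and `float_layout_ok` are hypothetical.

```c
#include <stdbool.h>

/* A run of `size` bits starting at bit `pos` (bit 0 = least significant). */
typedef struct {
    unsigned pos;
    unsigned size;
} bitfield_t;

static bool fields_overlap(bitfield_t a, bitfield_t b)
{
    return a.pos < b.pos + b.size && b.pos < a.pos + a.size;
}

static bool field_in_range(bitfield_t f, unsigned offset, unsigned precision)
{
    return f.pos >= offset && f.pos + f.size <= offset + precision;
}

/* True when sign, exponent, and mantissa all lie inside the significant
 * bits [offset, offset + precision) and none of them overlap. */
bool float_layout_ok(unsigned offset, unsigned precision,
                     unsigned spos,
                     unsigned epos, unsigned esize,
                     unsigned mpos, unsigned msize)
{
    bitfield_t sign = { spos, 1 };          /* the sign is always one bit */
    bitfield_t expo = { epos, esize };
    bitfield_t mant = { mpos, msize };

    return field_in_range(sign, offset, precision)
        && field_in_range(expo, offset, precision)
        && field_in_range(mant, offset, precision)
        && !fields_overlap(sign, expo)
        && !fields_overlap(sign, mant)
        && !fields_overlap(expo, mant);
}
```

The layout of Figure 16 (offset 5, precision 24, sign at bit 28, exponent at bits 25-27, mantissa at bits 5-23) passes this check. Moving the sign to bit 27 fails because it collides with the exponent, which illustrates why the order of property changes matters when a type is shrunk.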
All composite datatypes must be user defined; there are no predefined composite datatypes.
The subsections below describe how to create a compound datatype and how to write and read data of compound datatype.
Compound datatypes are conceptually similar to a C struct or a Fortran 95 derived type. The compound datatype defines a contiguous sequence of bytes, which are formatted using from one up to 2^16 datatypes (members). A compound datatype may have any number of members, in any order, and the members may have any datatype, including compound. Thus, complex nested compound datatypes can be created. The total size of the compound datatype is greater than or equal to the sum of the sizes of its members, up to a maximum of 2^32 bytes. HDF5 does not support datatypes with distinguished records or the equivalent of C unions or the Fortran 95 EQUIVALENCE statement.
Usually a C struct or Fortran derived type will be defined to hold
a data point in memory, and the offsets of the members in memory will
be the offsets of the struct members from the beginning of an instance
of the struct. The HDF5 C library provides a macro
HOFFSET(s,m) to calculate the member's offset. HDF5
Fortran applications have to calculate offsets by using the sizes of the members'
datatypes and by taking into consideration the order of the members in the
Fortran derived type.
Note for C users: the standard C macro offsetof(s,m), defined in
stddef.h, does exactly the same thing as the HOFFSET(s,m) macro.
Note for Fortran users: Offsets of Fortran structure members correspond to the offsets within a packed datatype (see explanation below) stored in an HDF5 file.
Each member of a compound datatype must have a descriptive name, which is the key used to uniquely identify the member within the compound datatype. A member name in an HDF5 datatype does not necessarily have to be the same as the name of the member in the C struct or Fortran derived type, although this is often the case. Nor does one need to define all members of the C struct or Fortran derived type in the HDF5 compound datatype (or vice versa).
Unlike atomic datatypes which are derived from other atomic datatypes, compound datatypes are created from scratch. First, one creates an empty compound datatype and specifies its total size. Then members are added to the compound datatype in any order. Each member type is inserted at a designated offset. Each member has a name which is the key used to uniquely identify the member within the compound datatype.
Figure 17a shows an example of creating an HDF5 C compound datatype to
describe a complex number, which is a structure with two components,
"real" and "imaginary", each double. An equivalent C struct,
whose type is defined by the complex_t struct, is shown.
|
| Figure 17a |
Figure 17b shows an example of creating an HDF5 Fortran compound datatype to
describe a complex number, which is a Fortran derived type with two components,
"real" and "imaginary", each DOUBLE PRECISION. An equivalent Fortran TYPE,
whose type is defined by the TYPE complex_t, is shown.
|
| Figure 17b |
Important Note: The compound datatype is created with a size
sufficient to hold all its members. In the C example above, the size of
the C struct and the HOFFSET macro are used as a convenient
mechanism to determine the appropriate size and offset. Alternatively, the
size and offset could be manually determined, e.g., the size can be set to
16 with "real" at offset 0 and "imaginary" at offset 8. However, different
platforms and compilers have different sizes for "double", and may have
alignment restrictions which require additional padding within the structure.
It is much more portable to use the HOFFSET macro, which assures
that the values will be correct for any platform.
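The information needed for each member of the complex-number type (a name, an offset, and a size) can be gathered portably with the standard `sizeof` and `offsetof` operators, which the text above describes as equivalent to HOFFSET. The sketch below is plain C and does not call the HDF5 library; the `member_desc_t` record and the accessor functions are hypothetical helpers used only to show the values, and the struct field names `re` and `im` are assumptions.

```c
#include <stddef.h>

/* In-memory form of the complex number. */
typedef struct {
    double re;
    double im;
} complex_t;

/* Hypothetical record of what one compound member needs:
 * its name in the file, its byte offset, and its size in bytes. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} member_desc_t;

/* Offsets come from the compiler, not from hand-written constants. */
static const member_desc_t complex_members[] = {
    { "real",      offsetof(complex_t, re), sizeof(double) },
    { "imaginary", offsetof(complex_t, im), sizeof(double) },
};

/* Total size to request for the compound type. */
size_t complex_total_size(void) { return sizeof(complex_t); }

/* Descriptor of member i (0 = "real", 1 = "imaginary"). */
const member_desc_t *complex_member(size_t i) { return &complex_members[i]; }
```

On a platform with 8-byte doubles and no extra alignment padding, these values come out as size 16, offset 0, and offset 8, matching the manually determined numbers mentioned above, but the code does not depend on that.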
Figure 18 shows how the compound datatype would be laid out, assuming that
NATIVE_DOUBLE values are 64-bit numbers and that there are no alignment
requirements. The total size of the compound datatype will be 16 bytes,
the "real" component will start at byte 0, and "imaginary" will start at byte 8.
|
| Figure 18 |
The members of a compound datatype may be any HDF5 datatype, including compound, array, and VL. Figures 19 and 20 show an example which creates a compound datatype composed of two complex values, each of which is a compound datatype as in Figure 18 above.
|
| Figure 19 |
|
| Figure 20 |
Note that a similar result could be accomplished by creating a
compound datatype and inserting four fields (Figure 21). This
results in the same layout as above (Figure 19). The difference
would be how the fields are addressed. In the first case, the
real part of 'y' is called 'y.re'; in the second case it is 'y-re'.
|
| Figure 21 |
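The claim that the nested and the flat definitions produce the same layout can be checked on the in-memory structs. The following plain C sketch uses hypothetical struct names (`surf_nested_t`, `surf_flat_t`), and the flat member is spelled `y_re` because a hyphen is not valid in a C identifier; it does not call the HDF5 library.

```c
#include <stddef.h>

typedef struct {
    double re;
    double im;
} complex_t;

/* Nested form: two members, each itself a compound (complex) value. */
typedef struct {
    complex_t x;
    complex_t y;
} surf_nested_t;

/* Flat form: the same four doubles inserted as four separate members. */
typedef struct {
    double x_re;
    double x_im;
    double y_re;
    double y_im;
} surf_flat_t;

/* Offset of the real part of 'y' when addressed through the nested type. */
size_t nested_y_re_offset(void)
{
    return offsetof(surf_nested_t, y) + offsetof(complex_t, re);
}

/* Offset of the same field when addressed through the flat type. */
size_t flat_y_re_offset(void)
{
    return offsetof(surf_flat_t, y_re);
}
```

Both functions return the same offset, and both structs have the same size; only the way a field is named differs.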
The members of a compound datatype do not always
fill all the bytes. The HOFFSET macro
assures that the members will be laid out according
to the requirements of the platform and language.
Figure 22 shows an example of a C struct which requires
extra bytes of padding on many platforms. The second
element, 'b', is a 1-byte character, followed by an 8
byte double, 'c'. On many systems, the 8-byte value must
be stored on a 4- or 8-byte boundary, requiring the struct
to be larger than the sum of the size of its elements.
In Figure 22, sizeof and the
HOFFSET macro are used to assure that the
members are inserted at the correct offset to match the
memory conventions of the platform. Figure 23 shows how
this data element would be stored in memory, assuming the
double must start on a 4-byte boundary. Notice the extra
bytes between 'b' and 'c'.
|
| Figure 22 |
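The padding described above can be observed directly with `sizeof` and `offsetof`. In the sketch below the struct follows the description in the text ('b' is a char followed by a double 'c'); the first member, an int named 'a', and the type name `s1_t` are assumptions made for illustration. The code is plain C and does not call the HDF5 library.

```c
#include <stddef.h>

/* A struct that needs padding on many platforms. */
typedef struct {
    int    a;   /* assumed first member */
    char   b;   /* 1-byte character */
    double c;   /* 8-byte value that usually needs 4- or 8-byte alignment */
} s1_t;

/* Number of unused bytes the compiler placed between 'b' and 'c'. */
size_t s1_padding_before_c(void)
{
    return offsetof(s1_t, c) - (offsetof(s1_t, b) + sizeof(char));
}

/* Sum of the member sizes, ignoring any padding. */
size_t s1_sum_of_members(void)
{
    return sizeof(int) + sizeof(char) + sizeof(double);
}
```

Whenever `s1_padding_before_c()` is nonzero, `sizeof(s1_t)` exceeds `s1_sum_of_members()`, which is why member offsets should be taken from the compiler rather than computed by adding sizes.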
|
|