Chapter 6
HDF5 Datatypes

1. Introduction

1.1 Introduction and Definitions

An HDF5 dataset is an array of data elements, arranged according to the specifications of the dataspace. In general, a data element is the smallest addressable unit of storage in the HDF5 file. (Compound datatypes are the exception to this rule.) The HDF5 datatype defines the storage format for a single data element (Figure 1).

The model for HDF5 attributes is extremely similar to datasets: an attribute has a dataspace and a datatype, as shown in Figure 1. The information in this chapter applies to both datasets and attributes.

Figure 1

Abstractly, each data element within the dataset is a sequence of bits, interpreted as a single value from a set of values (e.g., a number or a character). For a given datatype, there is a standard or convention for representing the values as bits. When the bits are stored, they are laid out according to a specific storage scheme, e.g., as 8-bit bytes with a specific ordering and alignment of bytes within the storage array.

HDF5 datatypes implement a flexible, extensible, and portable mechanism for specifying and discovering the storage layout of the data elements, for determining how to interpret the elements (e.g., as floating point numbers), and for transferring data between different compatible layouts.

An HDF5 datatype describes one specific layout of bits. A dataset has a single datatype, which applies to every data element. The storage datatype is defined when the dataset is created and cannot be changed afterward.

When data is transferred (e.g., a read or write), each end point of the transfer has a datatype, which describes the correct storage for the elements. The source and destination may have different (but compatible) layouts, in which case the data elements are automatically transformed during the transfer.

HDF5 datatypes describe commonly used binary formats for numbers (integers and floating point) and characters (ASCII). A given computing architecture and programming language supports certain number and character representations. For example, a computer may support 8-, 16-, 32-, and 64-bit signed integers, stored in memory in little-endian byte order. These would presumably correspond to the C programming language types 'char', 'short', 'int', and 'long'.

When reading and writing from memory, the HDF5 library must know the appropriate datatype that describes the architecture-specific layout. The HDF5 library provides the platform-independent 'NATIVE' types, which are mapped to an appropriate datatype for each platform. For example, the type 'H5T_NATIVE_INT' is an alias for the descriptor that matches the C 'int' type on each platform.

Data in memory has a datatype

In addition to numbers and characters, an HDF5 datatype can describe more abstract classes of types, including enumerations, strings, bit strings, and references (pointers to objects in the HDF5 file). HDF5 supports several classes of composite datatypes, which are composed of one or more other datatypes. In addition to the standard predefined datatypes, users can define new datatypes within the datatype classes.

The HDF5 datatype model is very general and flexible

1.2 HDF5 Datatype Model

The HDF5 Library implements an object-oriented model of datatypes. HDF5 datatypes are organized as a logical set of base types, or datatype classes. Each datatype class defines a format for representing logical values as a sequence of bits. For example, the H5T_INTEGER class is a format for representing two's complement integers of various sizes.

A datatype class is defined as a set of one or more datatype properties. A datatype property is a property of the bit string. The datatype properties are defined by the logical model of the datatype class. For example, the integer class (two's complement integers) has properties such as "signed or unsigned", "length", and "byte-order". The float class (IEEE floating point numbers) has these properties, plus "exponent bits", "exponent sign", etc.

A datatype is derived from one datatype class: a given datatype has a specific value for the datatype properties defined by the class. For example, for 32-bit signed integers, stored big-endian, the HDF5 datatype is a sub-type of integer, with the properties set to: signed=1, size=4 (bytes), byte-order=BE.

The HDF5 datatype API provides methods to create datatypes of different datatype classes, to set the datatype properties of a new datatype, and to discover the datatype properties of an existing datatype.

The datatype for a dataset is stored in the HDF5 file as part of the metadata for the dataset. A datatype can be shared by more than one dataset in the file. A datatype can optionally be stored as a named object in the file.

When transferring data (e.g., a read or write), the data elements of the source and destination storage must have compatible types. As a general rule, data elements with the same datatype class are compatible, while elements from different datatype classes are not compatible. When transferring data of one datatype to another compatible datatype, the HDF5 Library uses the datatype properties of the source and destination to automatically transform each data element. For example, when reading from data stored as 32-bit, signed integers, big-endian, into 32-bit signed integers, little-endian, the HDF5 Library will automatically swap the bytes.

Thus, data transfer operations (H5Dread, H5Dwrite, H5Aread, H5Awrite) require a datatype for both the source and the destination.

Figure 2

The HDF5 Library defines a set of predefined datatypes corresponding to commonly used storage formats, such as two's complement integers and IEEE floating-point numbers, in 4- and 8-byte sizes and in big-endian and little-endian byte orders. In addition, a user can derive types with custom values for the properties. For example, a user program may create a datatype to describe a 6-bit integer, or a 600-bit floating point number.

In addition to atomic datatypes, the HDF5 Library supports composite datatypes. A composite datatype is an aggregation of one or more datatypes. Each class of composite datatypes has properties that describe the organization of the composite datatype (Figure 3). Composite datatypes include compound, array, and variable-length datatypes; Table 1 lists the properties of each class.


Figure 3

1.2.1 Datatype Classes and Properties

Figure 4 shows the HDF5 datatype classes. Each class is defined to have a set of properties that describe the layout of the data element and the interpretation of the bits. Table 1 lists the properties for the datatype classes.

Figure 4

Table 1. Datatype Classes and their properties.

| Class | Description | Properties | Notes |
|---|---|---|---|
| Integer | Two's complement integers | Size (bytes), precision (bits), offset (bits), pad, byte order, signed/unsigned | |
| Float | Floating-point numbers | Size (bytes), precision (bits), offset (bits), pad, byte order, sign position, exponent position, exponent size (bits), exponent sign, exponent bias, mantissa position, mantissa size (bits), mantissa sign, mantissa normalization, internal padding | See IEEE 754 for a definition of these properties. These properties describe non-IEEE 754 floating-point formats as well. |
| Character | Array of 1-byte character encoding | Size (characters), character set, byte order, pad/no pad, pad character | Currently, only ASCII is supported. |
| Bitfield | String of bits | Size (bytes), precision (bits), offset (bits), pad, byte order | When stored, the bits are packed into bytes. |
| Opaque | Uninterpreted data | Size (bytes), precision (bits), offset (bits), pad, byte order, tag | A sequence of bytes, stored and retrieved as a block. The 'tag' is a string that can be used to label the value. |
| Enumeration | A list of discrete values, with symbolic names in the form of strings | Number of elements, element names, element values | An enumeration is a list of (name, value) pairs. The name is a string; the value is an integer. |
| Reference | Reference to an object or region within the HDF5 file | | See the Reference API, H5R. |
| Array | Array (1-4 dimensions) of data elements | Number of dimensions, dimension sizes, base datatype | The array is accessed atomically: no selection or subsetting. |
| Variable length | A variable-length, 1-dimensional array of data elements | Current size, base type | |
| Compound | A datatype composed of a sequence of datatypes | Number of members, member names, member types, member offset, member class, member size, byte order | |

1.2.2 Predefined Datatypes

The HDF5 library predefines a modest number of commonly used datatypes. These types have standard symbolic names of the form H5T_arch_base where arch is an architecture name and base is a programming type name (Table 2). New types can be derived from the predefined types by copying the predefined type (see H5Tcopy()) and then modifying the result.

The base name of most types consists of a letter to indicate the class (Table 3), a precision in bits, and an indication of the byte order (Table 4).

Table 5 shows examples of predefined datatypes. The full list can be found in the "HDF5 Predefined Datatypes" section of the HDF5 Reference Manual.

Table 2

| Architecture Name | Description |
|---|---|
| IEEE | IEEE-754 standard floating point types in various byte orders. |
| STD | This is an architecture that contains semi-standard datatypes like signed two's complement integers, unsigned integers, and bitfields in various byte orders. |
| C, FORTRAN | Types which are specific to the C or Fortran programming languages are defined in these architectures. For instance, H5T_C_S1 defines a base string type with null termination which can be used to derive string types of other lengths. |
| NATIVE | This architecture contains C-like datatypes for the machine on which the library was compiled. The types were actually defined by running the H5detect program when the library was compiled. In order to be portable, applications should almost always use this architecture to describe things in memory. |
| CRAY | Cray architectures. These are word-addressable, big-endian systems with non-IEEE floating point. |
| INTEL | All Intel and compatible CPUs, including 80286, 80386, 80486, Pentium, Pentium-Pro, and Pentium-II. These are little-endian systems with IEEE floating-point. |
| MIPS | All MIPS CPUs commonly used in SGI systems. These are big-endian systems with IEEE floating-point. |
| ALPHA | All DEC Alpha CPUs. These are little-endian systems with IEEE floating-point. |



Table 3

| Letter | Class |
|---|---|
| B | Bitfield |
| F | Floating point |
| I | Signed integer |
| R | References |
| S | Character string |
| U | Unsigned integer |



Table 4

| Suffix | Byte order |
|---|---|
| BE | Big endian |
| LE | Little endian |



Table 5

| Example | Description |
|---|---|
| H5T_IEEE_F64LE | Eight-byte, little-endian, IEEE floating-point |
| H5T_IEEE_F32BE | Four-byte, big-endian, IEEE floating-point |
| H5T_STD_I32LE | Four-byte, little-endian, signed two's complement integer |
| H5T_STD_U16BE | Two-byte, big-endian, unsigned integer |
| H5T_C_S1 | One-byte, null-terminated string of eight-bit characters |
| H5T_INTEL_B64 | Eight-byte bit field on an Intel CPU |
| H5T_CRAY_F64 | Eight-byte Cray floating point |
| H5T_STD_ROBJ | Reference to an entire object in a file |


The HDF5 Library predefines a set of NATIVE datatypes whose names are similar to C type names. Each native type is an alias for the appropriate HDF5 datatype on each platform. For example, H5T_NATIVE_INT corresponds to a C int type. On an Intel-based PC, this type is the same as H5T_STD_I32LE, while on a MIPS system it would be equivalent to H5T_STD_I32BE. Table 6 shows examples of NATIVE types and corresponding C types for a common 32-bit workstation.

Table 6

| Example | Corresponding C Type |
|---|---|
| H5T_NATIVE_CHAR | char |
| H5T_NATIVE_SCHAR | signed char |
| H5T_NATIVE_UCHAR | unsigned char |
| H5T_NATIVE_SHORT | short |
| H5T_NATIVE_USHORT | unsigned short |
| H5T_NATIVE_INT | int |
| H5T_NATIVE_UINT | unsigned |
| H5T_NATIVE_LONG | long |
| H5T_NATIVE_ULONG | unsigned long |
| H5T_NATIVE_LLONG | long long |
| H5T_NATIVE_ULLONG | unsigned long long |
| H5T_NATIVE_FLOAT | float |
| H5T_NATIVE_DOUBLE | double |
| H5T_NATIVE_LDOUBLE | long double |
| H5T_NATIVE_HSIZE | hsize_t |
| H5T_NATIVE_HSSIZE | hssize_t |
| H5T_NATIVE_HERR | herr_t |
| H5T_NATIVE_HBOOL | hbool_t |

2. How Datatypes Are Used

2.1 The Datatype object and the HDF5 Datatype API

The HDF5 Library manages datatypes as objects. The HDF5 datatype API manipulates the datatype objects through C function calls. New datatypes can be created from scratch or copied from existing datatypes. When a datatype is no longer needed its resources should be released by calling H5Tclose().

The datatype object is used in several roles in the HDF5 data model and library. Essentially, a datatype is used whenever the format of data elements is needed. There are four major uses of datatypes in the HDF5 Library: at dataset creation, during data transfers, when discovering the contents of a file, and for specifying user-defined datatypes (Table 7).

Table 7

| Use | Description |
|---|---|
| Dataset creation | The datatype of the data elements must be declared when the dataset is created. |
| Data transfer | The datatype (format) of the data elements must be defined for both the source and destination. |
| Discovery | The datatype of a dataset can be interrogated to retrieve a complete description of the storage layout. |
| Creating user-defined datatypes | Users can define their own datatypes by creating datatype objects and setting their properties. |

2.2 Dataset creation

All the data elements of a dataset have the same datatype. When a dataset is created (H5Dcreate), the datatype for the data elements must be specified. The datatype of a dataset can never be changed. Figure 5 shows the use of a datatype to create a dataset called "/dset". In this example, the dataset will be stored as 32-bit signed integers, in big endian order.


  hid_t dt;
  dt = H5Tcopy(H5T_STD_I32BE);
  dataset_id = H5Dcreate(file_id, "/dset", dt, dataspace_id,   
      H5P_DEFAULT);
Figure 5

2.3 Data transfer (Read and Write)

Probably the most common use of datatypes is to write or read data from a dataset or attribute. In these operations, each data element is transferred from the source to the destination (possibly rearranging the order of the elements). Since the source and destination do not need to be identical (e.g., one is on disk and the other is in memory), the transfer requires both the format of the source element and the format of the destination element. Therefore, data transfers use two datatype objects, one for the source and one for the destination.

When data is written, the source is memory and the destination is disk (file). The memory datatype describes the format of the data element in the machine memory, and the file datatype describes the desired format of the data element on disk. Similarly, when reading, the source datatype describes the format of the data element on disk, and the destination datatype describes the format in memory.

In the most common cases, the file datatype is the datatype specified when the dataset was created, and the memory datatype should be the appropriate NATIVE type.

Figures 6 and 7, respectively, show examples of writing data to and reading data from a dataset. The data in memory is declared as C type 'int', and the datatype H5T_NATIVE_INT corresponds to this type. The datatype of the dataset should be of datatype class H5T_INTEGER.


   int  dset_data[DATA_SIZE];

   status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, 
         H5P_DEFAULT, dset_data);
Figure 6


 int dset_data[DATA_SIZE];

  status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, 
      H5P_DEFAULT,  dset_data);
Figure 7

2.4 Discovery of data format

The HDF5 Library enables a program to determine the datatype class and properties for any datatype. In order to discover the storage format of data in a dataset, the datatype is obtained, and the properties are determined by queries to the datatype object. Figure 8 shows an example of code that analyzes the datatype for an integer and prints out a description of its storage properties (byte order, sign, and size).


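    switch (H5Tget_class(type)) {
    case H5T_INTEGER:
        ord = H5Tget_order(type);
        sgn = H5Tget_sign(type);
        printf("Integer ByteOrder= ");
        switch (ord) {
        case H5T_ORDER_LE:
            printf("LE");
            break;
        case H5T_ORDER_BE:
            printf("BE");
            break;
        default:                  /* any other byte order, or an error */
            printf("other");
            break;
        }
        printf(" Sign= ");
        switch (sgn) {
        case H5T_SGN_NONE:
            printf("false");
            break;
        case H5T_SGN_2:
            printf("true");
            break;
        default:                  /* error or unrecognized sign type */
            printf("unknown");
            break;
        }
        printf(" Size= ");
        sz = H5Tget_size(type);   /* size_t: total size in bytes */
        printf("%lu", (unsigned long)sz);
        printf("\n");
        break;
    default:                      /* other datatype classes not handled here */
        break;
    }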
Figure 8

2.5 Creating and using user defined datatypes

Most programs will primarily use the predefined datatypes described above, possibly in composite datatypes such as compound or array datatypes. However, the HDF5 datatype model is extremely general; a user program can define a great variety of atomic datatypes (storage layouts). In particular, the datatype properties can define signed and unsigned integers of any size and byte order, and floating point numbers with different formats, size, and byte order. The HDF5 datatype API provides methods to set these properties.

User defined types can be used to define the layout of data in memory, e.g., to match some platform specific number format or application defined bit-field. The user defined type can also describe data in the file, e.g., some application-defined format. The user defined types can be translated to and from standard types of the same class, as described above.

3. Datatype (H5T) Function Summaries

3.1 General Datatype Operations

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tcreate | h5tcreate_f | Creates a new datatype. |
| H5Topen | h5topen_f | Opens a named datatype. |
| H5Tcommit | h5tcommit_f | Commits a transient datatype to a file, creating a new named datatype. |
| H5Tcommitted | h5tcommitted_f | Determines whether a datatype is a named type or a transient type. |
| H5Tcopy | h5tcopy_f | Copies an existing datatype. |
| H5Tequal | h5tequal_f | Determines whether two datatype identifiers refer to the same datatype. |
| H5Tlock | (none) | Locks a datatype. |
| H5Tget_class | h5tget_class_f | Returns the datatype class identifier. |
| H5Tget_size | h5tget_size_f | Returns the size of a datatype. |
| H5Tget_super | h5tget_super_f | Returns the base datatype from which a datatype is derived. |
| H5Tget_native_type | (none) | Returns the native datatype of a specified datatype. |
| H5Tdetect_class | (none) | Determines whether a datatype is of the given datatype class. |
| H5Tclose | h5tclose_f | Releases a datatype. |

3.2 Conversion Functions

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tconvert | (none) | Converts data between specified datatypes. |
| H5Tfind | (none) | Finds a conversion function. |
| H5Tset_overflow | (none) | Sets the overflow handler to a specified function. |
| H5Tget_overflow | (none) | Returns a pointer to the current global overflow function. |
| H5Tregister | (none) | Registers a conversion function. |
| H5Tunregister | (none) | Removes a conversion function from all conversion paths. |

3.3 Atomic Datatype Properties

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tset_size | h5tset_size_f | Sets the total size for an atomic datatype. |
| H5Tget_order | h5tget_order_f | Returns the byte order of an atomic datatype. |
| H5Tset_order | h5tset_order_f | Sets the byte ordering of an atomic datatype. |
| H5Tget_precision | h5tget_precision_f | Returns the precision of an atomic datatype. |
| H5Tset_precision | h5tset_precision_f | Sets the precision of an atomic datatype. |
| H5Tget_offset | h5tget_offset_f | Retrieves the bit offset of the first significant bit. |
| H5Tset_offset | h5tset_offset_f | Sets the bit offset of the first significant bit. |
| H5Tget_pad | h5tget_pad_f | Retrieves the padding type of the least and most-significant bit padding. |
| H5Tset_pad | h5tset_pad_f | Sets the least and most-significant bits padding types. |
| H5Tget_sign | h5tget_sign_f | Retrieves the sign type for an integer type. |
| H5Tset_sign | h5tset_sign_f | Sets the sign property for an integer type. |
| H5Tget_fields | h5tget_fields_f | Retrieves floating point datatype bit field information. |
| H5Tset_fields | h5tset_fields_f | Sets locations and sizes of floating point bit fields. |
| H5Tget_ebias | h5tget_ebias_f | Retrieves the exponent bias of a floating-point type. |
| H5Tset_ebias | h5tset_ebias_f | Sets the exponent bias of a floating-point type. |
| H5Tget_norm | h5tget_norm_f | Retrieves mantissa normalization of a floating-point datatype. |
| H5Tset_norm | h5tset_norm_f | Sets the mantissa normalization of a floating-point datatype. |
| H5Tget_inpad | h5tget_inpad_f | Retrieves the internal padding type for unused bits in floating-point datatypes. |
| H5Tset_inpad | h5tset_inpad_f | Fills unused internal floating point bits. |
| H5Tget_cset | h5tget_cset_f | Retrieves the character set type of a string datatype. |
| H5Tset_cset | h5tset_cset_f | Sets character set to be used. |
| H5Tget_strpad | h5tget_strpad_f | Retrieves the storage mechanism for a string datatype. |
| H5Tset_strpad | h5tset_strpad_f | Defines the storage mechanism for character strings. |

3.4 Enumeration Datatypes

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tenum_create | h5tenum_create_f | Creates a new enumeration datatype. |
| H5Tenum_insert | h5tenum_insert_f | Inserts a new enumeration datatype member. |
| H5Tenum_nameof | h5tenum_nameof_f | Returns the symbol name corresponding to a specified member of an enumeration datatype. |
| H5Tenum_valueof | h5tenum_valueof_f | Returns the value corresponding to a specified member of an enumeration datatype. |
| H5Tget_member_value | h5tget_member_value_f | Returns the value of an enumeration datatype member. |
| H5Tget_nmembers | h5tget_nmembers_f | Retrieves the number of elements in a compound or enumeration datatype. |
| H5Tget_member_name | h5tget_member_name_f | Retrieves the name of a compound or enumeration datatype member. |
| H5Tget_member_index | (none) | Retrieves the index of a compound or enumeration datatype member. |

3.5 Compound Datatype Properties

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tget_nmembers | h5tget_nmembers_f | Retrieves the number of elements in a compound or enumeration datatype. |
| H5Tget_member_class | (none) | Returns datatype class of compound datatype member. |
| H5Tget_member_name | h5tget_member_name_f | Retrieves the name of a compound or enumeration datatype member. |
| H5Tget_member_index | (none) | Retrieves the index of a compound or enumeration datatype member. |
| H5Tget_member_offset | h5tget_member_offset_f | Retrieves the offset of a field of a compound datatype. |
| H5Tget_member_type | h5tget_member_type_f | Returns the datatype of the specified member. |
| H5Tinsert | h5tinsert_f | Adds a new member to a compound datatype. |
| H5Tpack | h5tpack_f | Recursively removes padding from within a compound datatype. |

3.6 Array Datatypes

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tarray_create | (none) | Creates an array datatype object. |
| H5Tget_array_ndims | (none) | Returns the rank of an array datatype. |
| H5Tget_array_dims | (none) | Returns sizes of array dimensions and dimension permutations. |

3.7 Variable-length Datatypes

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tvlen_create | h5tvlen_create_f | Creates a new variable-length datatype. |
| H5Tis_variable_str | h5tis_variable_str_f | Determines whether datatype is a variable-length string. |

3.8 Opaque Datatypes

| C Function | F90 Function | Purpose |
|---|---|---|
| H5Tset_tag | h5tset_tag_f | Tags an opaque datatype. |
| H5Tget_tag | h5tget_tag_f | Gets the tag associated with an opaque datatype. |

3.9 Datatype-to-text and Text-to-datatype Conversions

| C Function | F90 Function | Purpose |
|---|---|---|
| H5LTtext_to_dtype | (none) | Creates a datatype from a text description. |
| H5LTdtype_to_text | (none) | Generates a text description of a datatype. |

4. The Programming Model

4.1 Introduction

The HDF5 Library implements an object-oriented model of datatypes. HDF5 datatypes are organized as a logical set of base types, or datatype classes. The HDF5 Library manages datatypes as objects. The HDF5 datatype API manipulates the datatype objects through C function calls. Figure 9 shows the abstract view of the datatype object. Table 8 shows the methods (C functions) that operate on the datatype object as a whole. New datatypes can be created from scratch or copied from existing datatypes.


Datatype
 size: int
 byteOrder: BOtype
 open(hid_t loc, char *name): return hid_t
 copy(hid_t tid): return hid_t
 create(H5T_class_t clss, size_t size): return hid_t
 
Figure 9. The datatype object

Table 8. General operations on datatype objects

API function

Description

hid_t H5Tcreate (H5T_class_t class, size_t size)

Create a new datatype object of datatype class class. The following datatype classes are supported with this function:

  • H5T_COMPOUND
  • H5T_OPAQUE
  • H5T_ENUM
Other datatypes are created with H5Tcopy().

hid_t H5Tcopy (hid_t type)

Obtain a modifiable transient datatype which is a copy of type. If type is a dataset identifier then the type returned is a modifiable transient copy of the datatype of the specified dataset.

hid_t H5Topen (hid_t location, const char *name)

Open a named datatype. The named datatype returned by this function is read-only.

htri_t H5Tequal (hid_t type1, hid_t type2)

Determines if two types are equal.

herr_t H5Tclose (hid_t type)

Releases resources associated with a datatype obtained from H5Tcopy, H5Topen, or H5Tcreate. It is illegal to close an immutable transient datatype (e.g., predefined types).

herr_t H5Tcommit (hid_t location, const char *name, hid_t type)

Commit a transient datatype (not immutable) to a file to become a named datatype. Named datatypes can be shared.

htri_t H5Tcommitted (hid_t type)

Test whether the datatype is transient or committed (named).

herr_t H5Tlock (hid_t type)

Make a transient datatype immutable (read-only and not closable). Predefined types are locked.


In order to use a datatype, the object must be created (H5Tcreate), or a reference obtained by cloning from an existing type (H5Tcopy), or opened (H5Topen). In addition, a reference to the datatype of a dataset or attribute can be obtained with H5Dget_type or H5Aget_type. For composite datatypes a reference to the datatype for members or base types can be obtained (H5Tget_member_type, H5Tget_super). When the datatype object is no longer needed, the reference is discarded with H5Tclose.

Two datatype objects can be tested to see if they are the same with H5Tequal. The function returns a positive value (true) when the two identifiers refer to the same datatype object, and also when they refer to distinct objects that define equivalent datatypes (the same datatype class and the same values for the datatype properties). It returns zero when the datatypes differ and a negative value on failure.

A datatype can be written to the file as a first class object (H5Tcommit). Named datatypes can be used in the same way as any other datatype. Named datatypes are explained below.

4.2 Discovery of Datatype Properties

Any HDF5 datatype object can be queried to discover all of its datatype properties. For each datatype class, there is a set of API functions to retrieve the datatype properties for this class.

4.2.1 Properties of Atomic Datatypes

Table 9 lists the functions to discover the properties of atomic datatypes. Table 10 lists the queries relevant to specific numeric types. Table 11 gives the properties for atomic string datatype, and Table 12 gives the property of the opaque datatype.

Table 9

Functions to Discover Properties of Atomic DataTypes

Description

H5T_class_t H5Tget_class (hid_t type)

The datatype class: H5T_INTEGER, H5T_FLOAT, H5T_STRING, H5T_BITFIELD, H5T_OPAQUE, H5T_COMPOUND, H5T_REFERENCE, H5T_ENUM, H5T_VLEN, or H5T_ARRAY.

size_t H5Tget_size (hid_t type)

The total size of the element in bytes, including padding which may appear on either side of the actual value.

H5T_order_t H5Tget_order (hid_t type)

The byte order describes how the bytes of the datatype are laid out in memory. If the lowest memory address contains the least significant byte of the datum then it is said to be little-endian or H5T_ORDER_LE. If the bytes are in the opposite order then they are said to be big-endian or H5T_ORDER_BE.

size_t H5Tget_precision (hid_t type)

The precision property identifies the number of significant bits of a datatype and the offset property (defined below) identifies its location. Some datatypes occupy more bytes than what is needed to store the value. For instance, a short on a Cray is 32 significant bits in an eight-byte field.

int H5Tget_offset (hid_t type)

The offset property defines the bit location of the least significant bit of a bit field whose length is precision.

herr_t H5Tget_pad (hid_t type, H5T_pad_t *lsb, H5T_pad_t *msb)

Padding is the bits of a data element which are not significant as defined by the precision and offset properties. Padding in the low-numbered bits is lsb padding and padding in the high-numbered bits is msb padding. Padding bits can be set to zero (H5T_PAD_ZERO) or one (H5T_PAD_ONE).


Table 10

Properties of Atomic Numeric Types

Description

H5T_sign_t H5Tget_sign (hid_t type)

(INTEGER) Integer data can be signed two's complement (H5T_SGN_2) or unsigned (H5T_SGN_NONE).

herr_t H5Tget_fields (hid_t type, size_t *spos, size_t *epos, size_t *esize, size_t *mpos, size_t *msize)

(FLOAT) A floating-point data element has bit fields which are the exponent and mantissa as well as a mantissa sign bit. These properties define the location (bit position of least significant bit of the field) and size (in bits) of each field. The sign bit is always of length one and none of the fields are allowed to overlap.

size_t H5Tget_ebias (hid_t type)

(FLOAT) The exponent is stored as a non-negative value which is ebias larger than the true exponent.

H5T_norm_t H5Tget_norm (hid_t type)

(FLOAT) This property describes the normalization method of the mantissa.

  • H5T_NORM_MSBSET: the mantissa is shifted left (if non-zero) until the first bit after the radix point is set and the exponent is adjusted accordingly. All bits of the mantissa after the radix point are stored.
  • H5T_NORM_IMPLIED: the mantissa is shifted left (if non-zero) until the first bit after the radix point is set and the exponent is adjusted accordingly. The first bit after the radix point is not stored since it's always set.
  • H5T_NORM_NONE: the fractional part of the mantissa is stored without normalizing it.

H5T_pad_t H5Tget_inpad (hid_t type)

(FLOAT) If any internal bits (that is, bits between the sign bit, the mantissa field, and the exponent field but within the precision field) are unused, then they will be filled according to the value of this property. The padding can be H5T_PAD_ZERO, H5T_PAD_ONE, or H5T_PAD_BACKGROUND (leave the existing bits unchanged).


Table 11

Properties of Atomic String Datatypes

Description

H5T_cset_t H5Tget_cset (hid_t type)

The only character set currently supported is H5T_CSET_ASCII.

H5T_str_t H5Tget_strpad (hid_t type)

The string datatype has a fixed length, but the string may be shorter than the length. This property defines the storage mechanism for the leftover bytes. The options are: H5T_STR_NULLTERM, H5T_STR_NULLPAD, or H5T_STR_SPACEPAD.


Table 12

Properties of Opaque Atomic Datatypes

Description

char *H5Tget_tag(hid_t type_id)

A user defined string.


4.2.2 Properties of Composite Datatypes

The composite datatype classes can also be analyzed to discover their datatype properties and the datatypes that are members or base types of the composite datatype. The member or base type can, in turn, be analyzed. Table 13 lists the functions that can access the datatype properties of the different composite datatypes.

Table 13

Properties of Composite Datatype

Description

int H5Tget_nmembers(hid_t type_id )

(COMPOUND) The number of fields in the compound datatype.

H5T_class_t H5Tget_member_class( hid_t cdtype_id, unsigned member_no )

(COMPOUND) The datatype class of compound datatype member member_no.

char * H5Tget_member_name(hid_t type_id, unsigned field_idx )

(COMPOUND) The name of field field_idx of a compound datatype.

size_t H5Tget_member_offset(hid_t type_id, unsigned memb_no )

(COMPOUND) The byte offset of the beginning of a field within a compound datatype.

hid_t H5Tget_member_type(hid_t type_id, unsigned field_idx )

(COMPOUND) The datatype of the specified member.

int H5Tget_array_ndims( hid_t adtype_id )

(ARRAY) The number of dimensions (rank) of the array datatype object.

int H5Tget_array_dims( hid_t adtype_id, hsize_t dims[], int perm[] )

(ARRAY) The sizes of the dimensions and the dimension permutations of the array datatype object.

hid_t H5Tget_super(hid_t type )

(ARRAY, VL, ENUM) The base datatype from which the datatype type is derived.

herr_t H5Tenum_nameof(hid_t type, void *value, char *name, size_t size )

(ENUM) The symbol name that corresponds to the specified value of the enumeration datatype

herr_t H5Tenum_valueof(hid_t type, char *name, void *value )

(ENUM) The value that corresponds to the specified name of the enumeration datatype

herr_t H5Tget_member_value(hid_t type, unsigned memb_no, void *value )

(ENUM) The value of the enumeration datatype member memb_no

4.3 Definition of Datatypes

The HDF5 Library enables user programs to create and modify datatypes. The essential steps are:

  1. a) Create a new datatype object of a specific composite datatype class, or
    b) Copy an existing atomic datatype object.
  2. Set properties of the datatype object.
  3. Use the datatype object.
  4. Close the datatype object.

To create a user defined atomic datatype, the procedure is to clone a predefined datatype of the appropriate datatype class (H5Tcopy). Then set the datatype properties appropriate to the datatype class. For example, Table 14 shows how to create a datatype to describe a 1024-bit unsigned integer.

Table 14

    hid_t new_type = H5Tcopy (H5T_NATIVE_INT);
    H5Tset_precision(new_type, 1024);
    H5Tset_sign(new_type, H5T_SGN_NONE);

Composite datatypes are created with a specific API call for each datatype class. Table 15 shows the creation method for each datatype class. A newly created datatype cannot be used until the datatype properties are set. For example, a newly created compound datatype has no members and cannot be used.

Table 15

Datatype Class    Function to Create
COMPOUND          H5Tcreate
OPAQUE            H5Tcreate
ENUM              H5Tenum_create
ARRAY             H5Tarray_create
VL                H5Tvlen_create

Once the datatype is created and the datatype properties set, the datatype object can be used.

Predefined datatypes are defined by the library during initialization using the same mechanisms as described here. Each predefined datatype is locked (H5Tlock), so that it cannot be changed or destroyed. User defined datatypes may also be locked using H5Tlock.

4.3.1 User Defined Atomic Datatypes

Table 16 summarizes the API methods that set properties of atomic types. Table 17 shows properties specific to numeric types, and Table 18 shows properties specific to the string datatype class. Note that offset, pad, etc. do not apply to strings. Table 19 shows the specific property of the OPAQUE datatype class.

Table 16

Functions to set Properties of Atomic DataTypes

Description

herr_t H5Tset_size (hid_t type, size_t size)

Set the total size of the element in bytes, including padding which may appear on either side of the actual value. If this property is reset to a smaller value which would cause the significant part of the data to extend beyond the edge of the datatype then the offset property is decremented a bit at a time. If the offset reaches zero and the significant part of the data still extends beyond the edge of the datatype then the precision property is decremented a bit at a time. Decreasing the size of a datatype may fail if the H5T_FLOAT bit fields would extend beyond the significant part of the type.

herr_t H5Tset_order (hid_t type, H5T_order_t order)

Set the byte order to little-endian (H5T_ORDER_LE) or big-endian (H5T_ORDER_BE).

herr_t H5Tset_precision (hid_t type, size_t precision)

Set the number of significant bits of a datatype. The offset property (defined below) identifies its location. The size property defined above represents the entire size (in bytes) of the datatype. If the precision is decreased then padding bits are inserted on the MSB side of the significant bits (this will fail for H5T_FLOAT types if it results in the sign, mantissa, or exponent bit field extending beyond the edge of the significant bit field). On the other hand, if the precision is increased so that it "hangs over" the edge of the total size then the offset property is decremented a bit at a time. If the offset reaches zero and the significant bits still hang over the edge, then the total size is increased a byte at a time.

herr_t H5Tset_offset (hid_t type, size_t offset)

Set the bit location of the least significant bit of a bit field whose length is precision. The bits of the entire data are numbered beginning at zero at the least significant bit of the least significant byte (the byte at the lowest memory address for a little-endian type or the byte at the highest address for a big-endian type). If the offset is increased so that the significant bits "hang over" the edge of the datum, then the size property is automatically incremented.

herr_t H5Tset_pad (hid_t type, H5T_pad_t lsb, H5T_pad_t msb)

Set the padding to zeros (H5T_PAD_ZERO) or ones (H5T_PAD_ONE). Padding is the bits of a data element which are not significant as defined by the precision and offset properties. Padding in the low-numbered bits is lsb padding and padding in the high-numbered bits is msb padding.
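
The way size, offset, and precision interact when the precision grows can be modeled in a few lines of plain C. The sketch below only illustrates the rule as stated for H5Tset_precision in Table 16. The layout_t struct and the grow_precision function are hypothetical names, not part of the HDF5 API, and the sketch adjusts the offset and size in one step rather than a bit or a byte at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an atomic datatype's layout (not an HDF5 type). */
typedef struct {
    size_t size;       /* total size of the element, in bytes */
    size_t offset;     /* bit position of the least significant bit */
    size_t precision;  /* number of significant bits */
} layout_t;

/* Model of the rule described for H5Tset_precision when precision grows:
 * first give back offset bits, then grow the total size if still needed. */
layout_t grow_precision(layout_t t, size_t new_precision)
{
    size_t total_bits = 8 * t.size;

    t.precision = new_precision;
    if (t.offset + t.precision > total_bits) {
        size_t over = t.offset + t.precision - total_bits;
        t.offset -= (over < t.offset) ? over : t.offset;
    }
    /* If the significant bits still hang over, the offset is now zero. */
    if (t.precision > 8 * t.size) {
        t.size = (t.precision + 7) / 8;   /* round up to whole bytes */
    }
    return t;
}
```

Under this model, a 4-byte type with offset 0 whose precision is raised to 128 bits ends up 16 bytes long, while a 4-byte type with offset 8 and precision 16 raised to 32 bits keeps its size and drops its offset to 0.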


Table 17

Properties of Numeric Types

Description

herr_t H5Tset_sign (hid_t type, H5T_sign_t sign)

(INTEGER) Integer data can be signed two's complement (H5T_SGN_2) or unsigned (H5T_SGN_NONE).

herr_t H5Tset_fields (hid_t type, size_t spos, size_t epos, size_t esize, size_t mpos, size_t msize)

(FLOAT) Set the properties that define the location (bit position of the least significant bit of the field) and size (in bits) of each field. The sign bit is always of length one and none of the fields are allowed to overlap.

herr_t H5Tset_ebias (hid_t type, size_t ebias)

(FLOAT) The exponent is stored as a non-negative value which is ebias larger than the true exponent.

herr_t H5Tset_norm (hid_t type, H5T_norm_t norm)

(FLOAT) This property describes the normalization method of the mantissa.

  • H5T_NORM_MSBSET: the mantissa is shifted left (if non-zero) until the first bit after the radix point is set and the exponent is adjusted accordingly. All bits of the mantissa after the radix point are stored.
  • H5T_NORM_IMPLIED: the mantissa is shifted left (if non-zero) until the first bit after the radix point is set and the exponent is adjusted accordingly. The first bit after the radix point is not stored since it's always set.
  • H5T_NORM_NONE: the fractional part of the mantissa is stored without normalizing it.

herr_t H5Tset_inpad (hid_t type, H5T_pad_t inpad)

(FLOAT) If any internal bits (that is, bits between the sign bit, the mantissa field, and the exponent field but within the precision field) are unused, then they will be filled according to the value of this property. The padding can be: H5T_PAD_NONE, H5T_PAD_ZERO or H5T_PAD_ONE.
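
The field, bias, and normalization properties in Table 17 can be made concrete with the familiar IEEE 754 single-precision layout: sign at bit 31, an 8-bit exponent starting at bit 23 with a bias of 127, and a 23-bit mantissa starting at bit 0 whose leading bit is implied rather than stored. The sketch below pulls those fields out of a C float using plain bit operations. It assumes that float is an IEEE 754 32-bit type on the platform, and the names float_fields_t, get_bits, and split_float are hypothetical helpers, not HDF5 functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 single precision, expressed in the terms used by H5Tset_fields. */
enum { SPOS = 31, EPOS = 23, ESIZE = 8, MPOS = 0, MSIZE = 23, EBIAS = 127 };

typedef struct {
    unsigned sign;            /* 1 bit */
    unsigned exponent_field;  /* stored (biased) exponent */
    int      true_exponent;   /* stored exponent minus the bias */
    uint32_t mantissa_field;  /* stored fraction bits, implied bit omitted */
} float_fields_t;

/* Extract 'size' bits (size < 32) whose least significant bit is at 'pos'. */
uint32_t get_bits(uint32_t word, unsigned pos, unsigned size)
{
    return (word >> pos) & ((1u << size) - 1u);
}

/* Split a float into its fields; assumes float is IEEE 754 binary32. */
float_fields_t split_float(float f)
{
    uint32_t bits;
    float_fields_t out;

    memcpy(&bits, &f, sizeof bits);   /* reinterpret without aliasing issues */
    out.sign           = get_bits(bits, SPOS, 1);
    out.exponent_field = get_bits(bits, EPOS, ESIZE);
    out.true_exponent  = (int)out.exponent_field - EBIAS;
    out.mantissa_field = get_bits(bits, MPOS, MSIZE);
    return out;
}
```

For example, -6.5 is -1.625 x 2^2, so the stored exponent is 2 + 127 = 129 and the stored mantissa holds only the fraction bits of 1.625.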


Table 18

Properties of Atomic String Datatypes

Description

herr_t H5Tset_size (hid_t type, size_t size)

Set the length of the string, in bytes. The precision is automatically set to 8*size.

herr_t H5Tset_precision (hid_t type, size_t precision)

The precision must be a multiple of 8.

herr_t H5Tset_cset(hid_t type_id, H5T_cset_t cset )

The only character set currently supported is H5T_CSET_ASCII.

herr_t H5Tset_strpad(hid_t type_id, H5T_str_t strpad )

The string datatype has a fixed length, but the string may be shorter than the length. This property defines the storage mechanism for the leftover bytes. The method used to store character strings differs with the programming language:

  • C usually null terminates strings while
  • Fortran left-justifies and space-pads strings.

Valid string padding values, as passed in the parameter strpad, are as follows:

H5T_STR_NULLTERM (0)
Null terminate (as C does)
H5T_STR_NULLPAD (1)
Pad with zeros
H5T_STR_SPACEPAD (2)
Pad with spaces (as Fortran does).
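
The three padding conventions listed above can be illustrated by filling a fixed-length field by hand. The following is a minimal sketch in plain C, not HDF5 code: pad_mode_t and store_fixed_string are hypothetical names that mirror the H5T_str_t values, and zero-filling the bytes after the terminator in the null-terminated case is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode names; they mirror, but are not, the H5T_str_t values. */
typedef enum { PAD_NULLTERM, PAD_NULLPAD, PAD_SPACEPAD } pad_mode_t;

/* Store src in a fixed-length field of len bytes using the given convention. */
void store_fixed_string(char *field, size_t len, const char *src, pad_mode_t mode)
{
    size_t n = strlen(src);

    if (len == 0)
        return;
    if (mode == PAD_NULLTERM) {
        if (n > len - 1)
            n = len - 1;                 /* always keep room for the terminator */
        memcpy(field, src, n);
        memset(field + n, '\0', len - n);
    } else {
        if (n > len)
            n = len;                     /* the string may fill the whole field */
        memcpy(field, src, n);
        memset(field + n, mode == PAD_SPACEPAD ? ' ' : '\0', len - n);
    }
}
```

The practical difference shows up when the string is at least as long as the field: the null-terminated convention keeps one byte for the terminator, while the two padding conventions can use every byte of the field.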

Table 19

Properties of Opaque Atomic Datatypes

Description

herr_t H5Tset_tag(hid_t type_id, const char *tag )

Tags the opaque datatype type_id with an ASCII identifier tag.

Examples

Figure 10 shows how to create a 128-bit, little-endian signed integer type. Increasing the precision of a type automatically increases the total size. Note that the proper procedure is to begin from a type of the intended datatype class, in this case, a NATIVE INT.


    hid_t new_type = H5Tcopy (H5T_NATIVE_INT);
    H5Tset_precision (new_type, 128);
    H5Tset_order (new_type, H5T_ORDER_LE);
    
Figure 10

Figure 11 shows the storage layout as the type is defined. H5Tcopy creates a datatype that is the same as H5T_NATIVE_INT. In this example, suppose this is a 32-bit big-endian number (Figure 11a). The precision is set to 128 bits, which automatically extends the size to 16 bytes (Figure 11b draws only the first 8 of these bytes). Finally, the byte order is set to little-endian (Figure 11c).


Byte 0 Byte 1 Byte 2 Byte 3
01234567 89012345 67890123 45678901
a) The H5T_NATIVE_INT
 
Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 Byte 5 Byte 6 Byte 7
01234567 89012345 67890123 45678901 23456789 01234567 89012345 67890123
b) Precision extended to 128 bits; the size is automatically adjusted.
 
Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 Byte 5 Byte 6 Byte 7
01234567 89012345 67890123 45678901 23456789 01234567 89012345 67890123
c) The Byte Order is switched.
Figure 11

The significant bits of a data element can be offset from the beginning of the memory for that element by an amount of padding. The offset property specifies the number of bits of padding that appear to the "right of" the value. Table 20 and Figure 12 show how a 32-bit unsigned integer with 16 bits of precision having the value 0x1122 will be laid out in memory.

Table 20

Byte Position   Big-Endian   Big-Endian   Little-Endian   Little-Endian
                Offset=0     Offset=16    Offset=0        Offset=16
0:              [pad]        [0x11]       [0x22]          [pad]
1:              [pad]        [0x22]       [0x11]          [pad]
2:              [0x11]       [pad]        [pad]           [0x22]
3:              [0x22]       [pad]        [pad]           [0x11]



Big-Endian: Offset = 0
Byte 0 Byte 1 Byte 2 Byte 3
01234567 89012345 67890123 45678901
PPPPPPPP PPPPPPPP 00010001 00100010
 
Big-Endian: Offset = 16
Byte 0 Byte 1 Byte 2 Byte 3
01234567 89012345 67890123 45678901
00010001 00100010 PPPPPPPP PPPPPPPP
 
Little-Endian: Offset = 0
Byte 0 Byte 1 Byte 2 Byte 3
01234567 89012345 67890123 45678901
00100010 00010001 PPPPPPPP PPPPPPPP
 
Little-Endian: Offset = 16
Byte 0 Byte 1 Byte 2 Byte 3
01234567 89012345 67890123 45678901
PPPPPPPP PPPPPPPP 00100010 00010001
 
Figure 12
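
The byte positions in Table 20 can be reproduced with a short, library-independent C sketch. The function below shifts the 16-bit value left by the offset inside a 32-bit word and then writes the bytes out in the requested order. The names byte_order_t and layout_u16_in_u32 are hypothetical, and padding bits are written as zeros, which is an assumption of this sketch corresponding to H5T_PAD_ZERO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-order selector (not H5T_order_t). */
typedef enum { ORDER_LE, ORDER_BE } byte_order_t;

/* Lay out a 16-bit value inside a 4-byte element.
 * offset is the number of padding bits to the right of the value (0..16);
 * padding bits are written as zeros in this sketch. */
void layout_u16_in_u32(uint8_t bytes[4], uint16_t value,
                       unsigned offset, byte_order_t order)
{
    uint32_t word = (uint32_t)value << offset;

    for (unsigned i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(word >> (8 * i));   /* i-th least significant byte */
        /* Little-endian stores the least significant byte at the lowest
         * address; big-endian stores it at the highest. */
        bytes[order == ORDER_LE ? i : 3 - i] = b;
    }
}
```

Calling the function with the value 0x1122 and offsets 0 and 16 for each byte order gives the four columns of Table 20, with [pad] appearing as zero bytes.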

The bits of the entire data are numbered beginning at zero at the least significant bit of the least significant byte (the byte at the lowest memory address for a little-endian type or the byte at the highest address for a big-endian type). The offset property defines the bit location of the least significant bit of a bit field whose length is precision. If the offset is increased so that the significant bits "hang over" the edge of the datum, then the size property is automatically incremented.

To illustrate the properties of the integer datatype class, Figure 13 shows how to create a user defined datatype that describes a 24-bit signed integer that starts at bit 3 of a 32-bit word (that is, after three bits of padding). The datatype is specialized from a 32-bit integer, the precision is set to 24 bits, and the offset is set to 3.


    hid_t dt;

    dt = H5Tcopy(H5T_STD_I32LE);

    H5Tset_precision(dt, 24);
    H5Tset_offset(dt,3);
    H5Tset_pad(dt, H5T_PAD_ZERO,H5T_PAD_ONE);
    
Figure 13

Figure 14 shows the storage layout for a data element. Note that the unused bits in the offset will be set to zero and the unused bits at the end will be set to one, as specified in the H5Tset_pad call.

Byte 0 Byte 1 Byte 2 Byte 3
01234567 89012345 67890123 45678901
ooo00000 00000000 00000000 00sppppp
Figure 14. A user-defined integer datatype: range -8,388,608 to 8,388,607
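
The layout of this 24-bit field can be checked with plain bit arithmetic. The sketch below packs a signed value into a 32-bit word with 3 bits of zero padding below it and 5 bits of one padding above it, then recovers the value with sign extension. It is an illustration of the layout only. pack24 and unpack24 are hypothetical helpers, not HDF5 calls, and the word is treated as a plain uint32_t without regard to byte order. A 24-bit two's complement field holds values from -2^23 (-8,388,608) to 2^23 - 1 (8,388,607).

```c
#include <assert.h>
#include <stdint.h>

enum { OFFSET = 3, PRECISION = 24 };

/* Pack a value into the 32-bit element: lsb padding zeros, msb padding ones. */
uint32_t pack24(int32_t value)
{
    uint32_t mask    = (1u << PRECISION) - 1u;            /* 24 one-bits */
    uint32_t field   = ((uint32_t)value & mask) << OFFSET;
    uint32_t msb_pad = ~((mask << OFFSET) | ((1u << OFFSET) - 1u));

    return field | msb_pad;
}

/* Recover the signed value; padding bits are ignored. */
int32_t unpack24(uint32_t word)
{
    uint32_t mask = (1u << PRECISION) - 1u;
    uint32_t raw  = (word >> OFFSET) & mask;
    uint32_t sign = 1u << (PRECISION - 1);                /* bit 23 of the field */

    /* Flip the sign bit, then subtract its weight to sign-extend portably. */
    return (int32_t)((int64_t)(raw ^ sign) - (int64_t)sign);
}
```

Because unpack24 masks off everything outside the field, it returns the same value whether the padding bits are zeros or ones.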

To illustrate a user defined floating point number, Figure 15 shows how to create a 24-bit floating point number that starts 5 bits into a 4-byte word. The floating point number is defined to have a mantissa of 19 bits (bits 5-23), an exponent of 3 bits (bits 25-27), and a sign bit at bit 28. (Note that this is an illustration of what can be done, not necessarily a floating point format that a user would require.)


    hid_t dt;

    dt = H5Tcopy(H5T_IEEE_F32LE);

    H5Tset_precision(dt, 24);
    H5Tset_fields (dt, 28, 25, 3, 5, 19);
    H5Tset_pad(dt, H5T_PAD_ZERO,H5T_PAD_ONE);
    H5Tset_inpad(dt, H5T_PAD_ZERO);
    
Figure 15

Byte 0 Byte 1 Byte 2 Byte 3
01234567 89012345 67890123 45678901
ooooommm mmmmmmmm mmmmmmmm ieeesppp
Figure 16. A User defined Floating Point Datatype.

Figure 16 shows the storage layout of a data element for this datatype. Note that there is an unused bit (24) between the mantissa and the exponent. This bit is filled with the inpad value, in this case 0.

The sign bit is always of length one and none of the fields are allowed to overlap. When expanding a floating-point type one should set the precision first; when decreasing the size one should set the field positions and sizes first.

4.3.2 Composite Datatypes

All composite datatypes must be user defined; there are no predefined composite datatypes.

4.3.2.1 Compound Datatypes

The subsections below describe how to create a compound datatype and how to write and read data of compound datatype.

4.3.2.1.1 Defining Compound Datatypes

Compound datatypes are conceptually similar to a C struct or a Fortran 95 derived type. The compound datatype defines a contiguous sequence of bytes, which are formatted using from one up to 2^16 datatypes (members). A compound datatype may have any number of members, in any order, and the members may have any datatype, including compound. Thus, complex nested compound datatypes can be created. The total size of the compound datatype is greater than or equal to the sum of the sizes of its members, up to a maximum of 2^32 bytes. HDF5 does not support datatypes with distinguished records or the equivalent of C unions or the Fortran 95 EQUIVALENCE statement.

Usually a C struct or Fortran derived type will be defined to hold a data point in memory, and the offsets of the members in memory will be the offsets of the struct members from the beginning of an instance of the struct. The HDF5 C library provides the macro HOFFSET(s,m) to calculate a member's offset. HDF5 Fortran applications have to calculate offsets from the sizes of the members' datatypes, taking into consideration the order of the members in the Fortran derived type.

HOFFSET(s,m)
This macro computes the offset of member m within a struct s
offsetof(s,m)
This macro defined in stddef.h does exactly the same thing as the HOFFSET() macro.

Note for Fortran users: Offsets of Fortran structure members correspond to the offsets within a packed datatype (see explanation below) stored in an HDF5 file.

Each member of a compound datatype must have a descriptive name which is the key used to uniquely identify the member within the compound datatype. A member name in an HDF5 datatype does not necessarily have to be the same as the name of the member in the C struct or Fortran derived type, although this is often the case. Nor does one need to define all members of the C struct or Fortran derived type in the HDF5 compound datatype (or vice versa).

Unlike atomic datatypes which are derived from other atomic datatypes, compound datatypes are created from scratch. First, one creates an empty compound datatype and specifies its total size. Then members are added to the compound datatype in any order. Each member type is inserted at a designated offset. Each member has a name which is the key used to uniquely identify the member within the compound datatype.

Figure 17a shows an example of creating an HDF5 C compound datatype to describe a complex number, which is a structure with two components, "real" and "imaginary", each a double. The equivalent C struct, complex_t, is shown as well.

typedef struct {
    double re;   /*real part*/
    double im;   /*imaginary part*/
 } complex_t;

 hid_t complex_id = H5Tcreate (H5T_COMPOUND, sizeof (complex_t));
 H5Tinsert (complex_id, "real", HOFFSET(complex_t,re),
            H5T_NATIVE_DOUBLE);
 H5Tinsert (complex_id, "imaginary", HOFFSET(complex_t,im),
            H5T_NATIVE_DOUBLE);
    
Figure 17a

Figure 17b shows an example of creating an HDF5 Fortran compound datatype to describe a complex number, which is a Fortran derived type with two components, "real" and "imaginary", each DOUBLE PRECISION. The equivalent Fortran derived type, complex_t, is shown as well.

  TYPE complex_t
     DOUBLE PRECISION re   ! real part
     DOUBLE PRECISION im   ! imaginary part
  END TYPE complex_t

  CALL h5tget_size_f(H5T_NATIVE_DOUBLE, re_size, error)
  CALL h5tget_size_f(H5T_NATIVE_DOUBLE, im_size, error)
  complex_t_size = re_size + im_size
  CALL h5tcreate_f(H5T_COMPOUND_F, complex_t_size, type_id, error)
  offset = 0
  CALL h5tinsert_f(type_id, "real", offset, H5T_NATIVE_DOUBLE, error)
  offset = offset + re_size
  CALL h5tinsert_f(type_id, "imaginary", offset, H5T_NATIVE_DOUBLE, error)
    
Figure 17b

Important Note: The compound datatype is created with a size sufficient to hold all its members. In the C example above, the size of the C struct and the HOFFSET macro are used as a convenient mechanism to determine the appropriate size and offset. Alternatively, the size and offset could be manually determined, e.g., the size can be set to 16 with "real" at offset 0 and "imaginary" at offset 8. However, different platforms and compilers have different sizes for "double", and may have alignment restrictions which require additional padding within the structure. It is much more portable to use the HOFFSET macro, which assures that the values will be correct for any platform.

Figure 18 shows how the compound datatype would be laid out, assuming that NATIVE_DOUBLE are 64-bit numbers, and there are no alignment requirements. The total size of the compound datatype will be 16 bytes, the "real" component will start at byte 0, and "imaginary" will start at byte 8.
Byte 0 Byte 1 Byte 2 Byte 3
rrrrrrrr rrrrrrrr rrrrrrrr rrrrrrrr
Byte 4 Byte 5 Byte 6 Byte 7
rrrrrrrr rrrrrrrr rrrrrrrr rrrrrrrr
Byte 8 Byte 9 Byte 10 Byte 11
iiiiiiii iiiiiiii iiiiiiii iiiiiiii
Byte 12 Byte 13 Byte 14 Byte 15
iiiiiiii iiiiiiii iiiiiiii iiiiiiii
  Total size of Compound Datatype is 16 bytes
Figure 18

The members of a compound datatype may be any HDF5 datatype, including compound, array, and VL. Figures 19 and 20 show an example which creates a compound datatype composed of two complex values, each of which is a compound datatype as in Figure 18 above.
Byte 0 Byte 1 Byte 2 Byte 3
rrrrrrrr rrrrrrrr rrrrrrrr rrrrrrrr
Byte 4 Byte 5 Byte 6 Byte 7
rrrrrrrr rrrrrrrr rrrrrrrr rrrrrrrr
Byte 8 Byte 9 Byte 10 Byte 11
iiiiiiii iiiiiiii iiiiiiii iiiiiiii
Byte 12 Byte 13 Byte 14 Byte 15
iiiiiiii iiiiiiii iiiiiiii iiiiiiii
Byte 16 Byte 17 Byte 18 Byte 19
rrrrrrrr rrrrrrrr rrrrrrrr rrrrrrrr
Byte 20 Byte 21 Byte 22 Byte 23
rrrrrrrr rrrrrrrr rrrrrrrr rrrrrrrr
Byte 24 Byte 25 Byte 26 Byte 27
iiiiiiii iiiiiiii iiiiiiii iiiiiiii
Byte 28 Byte 29 Byte 30 Byte 31
iiiiiiii iiiiiiii iiiiiiii iiiiiiii
  Total size of Compound Datatype is 32 bytes.
Figure 19


     typedef struct {
        complex_t x;
        complex_t y;
     } surf_t;

    hid_t complex_id, surf_id; /*hdf5 datatypes*/

     complex_id = H5Tcreate (H5T_COMPOUND, sizeof(complex_t));
     H5Tinsert (complex_id, "re", HOFFSET(complex_t,re),
                H5T_NATIVE_DOUBLE);
     H5Tinsert (complex_id, "im", HOFFSET(complex_t,im),
                H5T_NATIVE_DOUBLE);

     surf_id = H5Tcreate (H5T_COMPOUND, sizeof(surf_t));
     H5Tinsert (surf_id, "x", HOFFSET(surf_t,x), complex_id);
     H5Tinsert (surf_id, "y", HOFFSET(surf_t,y), complex_id);
    
Figure 20

Note that a similar result could be accomplished by creating a compound datatype and inserting four fields (Figure 21). This results in the same layout as above (Figure 19). The difference would be how the fields are addressed. In the first case, the real part of 'y' is called 'y.re'; in the second case it is 'y-re'.

    typedef struct {
        complex_t x;
        complex_t y;
     } surf_t;

     hid_t surf_id = H5Tcreate (H5T_COMPOUND, sizeof(surf_t));
     H5Tinsert (surf_id, "x-re", HOFFSET(surf_t,x.re),
                H5T_NATIVE_DOUBLE);
     H5Tinsert (surf_id, "x-im", HOFFSET(surf_t,x.im),
                H5T_NATIVE_DOUBLE);
     H5Tinsert (surf_id, "y-re", HOFFSET(surf_t,y.re),
                H5T_NATIVE_DOUBLE);
     H5Tinsert (surf_id, "y-im", HOFFSET(surf_t,y.im),
                H5T_NATIVE_DOUBLE);        
    
Figure 21
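
That the nested and the flat definitions describe the same bytes can be verified with offsetof, which the text above notes is equivalent to HOFFSET. The sketch below uses only standard C headers and no HDF5 calls. The function names are hypothetical, and the nested member designator y.re inside offsetof relies on the same compiler support that HOFFSET(surf_t,y.re) needs in Figure 21.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double re;   /* real part */
    double im;   /* imaginary part */
} complex_t;

typedef struct {
    complex_t x;
    complex_t y;
} surf_t;

/* Offset of y.re the "nested" way: offset of the member 'y' within surf_t
 * plus the offset of 're' within complex_t. */
size_t nested_offset_y_re(void)
{
    return offsetof(surf_t, y) + offsetof(complex_t, re);
}

/* Offset of y.re the "flat" way: directly within surf_t. */
size_t flat_offset_y_re(void)
{
    return offsetof(surf_t, y.re);
}
```

Both functions return the same value on a given platform, which is why the two compound datatypes share one storage layout and differ only in how their fields are named.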

The members of a compound datatype do not always fill all the bytes. The HOFFSET macro assures that the members will be laid out according to the requirements of the platform and language. Figure 22 shows an example of a C struct which requires extra bytes of padding on many platforms. The second element, 'b', is a 1-byte character, followed by an 8 byte double, 'c'. On many systems, the 8-byte value must be stored on a 4- or 8-byte boundary, requiring the struct to be larger than the sum of the size of its elements.

In Figure 22, sizeof and the HOFFSET macro are used to ensure that the members are inserted at the correct offsets to match the memory conventions of the platform. Figure 23 shows how this data element would be stored in memory, assuming the double must start on a 4-byte boundary. Notice the extra bytes between 'b' and 'c'.

    typedef struct s1_t {
       int    a;
       char   b;
       double c;
    } s1_t;

    hid_t s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
    H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
    H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_CHAR);
    H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
    
Figure 22
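
The amount of padding a compiler adds to this struct can be measured directly with sizeof and offsetof. The sketch below uses no HDF5 calls. s1_padding_bytes and s1_gap_before_c are hypothetical helper names, and the values they return depend on the platform and compiler, so no particular result is assumed here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct s1_t {
    int    a;
    char   b;
    double c;
} s1_t;

/* Bytes in s1_t that belong to no member (internal plus trailing padding). */
size_t s1_padding_bytes(void)
{
    return sizeof(s1_t) - (sizeof(int) + sizeof(char) + sizeof(double));
}

/* Unused bytes between the end of 'b' and the start of 'c'. */
size_t s1_gap_before_c(void)
{
    return offsetof(s1_t, c) - (offsetof(s1_t, b) + sizeof(char));
}
```

On a platform where the double needs no alignment, both functions return 0 and the struct is exactly as large as the sum of its members. Otherwise the gap is nonzero, and using sizeof(s1_t) and HOFFSET when building the compound datatype keeps the HDF5 description in step with the compiler.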

<