- How do I create a String attribute or dataset? (See DATATYPES)
- When would you use attributes vs. datasets for your metadata ?
- When would you use attributes vs. compact datasets ?
C++
- Does C++ support stream operators?
- Does it do IO via standard library containers?
- Example of creating a string attribute
DATASETS
- How do I find out what compression has been used to write a dataset?
- How do I create a chunked/compressed dataset?
- How do I read/write a subset or sub-sample of a dataset?
- Can you delete objects in an HDF5 file ? If yes, how ?
- When you specify memspace and filespace for H5Dwrite and H5Dread does it mean it is allocating memory for both the dataset and memory space (ie. twice the size of dataset) ?
- Can you subset or select hyperslabs from a compressed data set ? (See DATASPACE)
- When would you use attributes vs. compact datasets ? (See ATTRIBUTES)
DATASPACE
DATATYPES
- Why was the H5T_ARRAY datatype created ? What is it for ?
- How can you tell if two datasets share their datatype ?
- How do I create a String attribute or dataset?
- How do you create datasets with a time datatype (class of H5T_TIME)?
- When do you use pre-defined standard vs. native datatypes?
- Does HDF5 support a boolean datatype ?
- What are the plusses and minusses of using compound datatypes ?
- Information on using bitfields in HDF5
- Working with compound datatypes: Can HOFFSET return different answers on different machines?
- How would you use a float pointer in a compound datatype ?
- H5Tenum_insert returns error when type has different endianness
- When reading compound data from file w/different compound field name, the data is different for this field.
- What is the proper way to allocate memory when reading a variable length datatype ?
- Is there a limit on the number of fields allowed in a compound datatype ?
EXTERNAL LIBRARIES
- I build the SZIP library from source using the Intel compiler (icc), but the gentest fails when I run make check.
- How to detect the SZIP encoder at run time
- Can't open shared library: /lib...s#.0
ERROR
- The error report prints out full path for file name instead of file name only
- How do you turn off error messages ?
FILE
- Can you open a file, open a group, close the file, and continue working on a group?
- How can I resolve file access problems due to objects being left open?
FORTRAN
- How do you create a dataset with a 16-bit datatype in F90 ?
- How do you build HDF5 using gcc and either Intel or Lahey Fortran ?
- How to build HDF5 Fortran with Intel Compilers
- Why is there no H5Tget_native_type function for Fortran ? What should you do to get the memory datatype for a compound datatype in F90 ?
- Can you build shared Fortran libraries in HDF5 ?
GENERAL
- Do you have example code of reading an HDF5 file?
- Does HDF5 support meshes ?
- Does HDF5 support netCDF ?
- How do you create files over 2GB ?
- Is HDF5 threadsafe? Is it multi-threaded? Explain.
- Why are my file sizes different, if I open an HDF5 file more than once rather than writing the data out in one call?
- I'd like to access an HDF5 file without using the HDF5 library. Is this possible?
- Will data manipulation routines be added to HDF5?
- Given an object binary file such as libhdf5.a or a.out, how can I easily determine which version of HDF5 is being used (aka linked)?
- Given a library file such as libxyz.a that calls HDF5 functions, how can I easily determine which version of HDF5 is being used to build (aka compile) libxyz.a?
- How is HDF5 different than HDF 4?
- My HDF5-1.4 library cannot read files created with 5-1.6.0
- Can you read an HDF5 file while it is being written to?
- Performance-wise, how does HDF5 compare to a relational database?
- If you run an application twice on the same machine will it produce identical HDF5 files ?
GROUPS
- H5Gget_obj_info: How do you tell the difference between a soft and hard link? If my reader reads an HDF5 file with a loop in it, then my code goes into an infinite loop, also.
- Iterating (H5Giterate) takes a long time when you have lots of objects. Can this be sped up?
- Can you delete objects in an HDF5 file ? If yes, how ?
IDENTIFIERS
IMAGES AND PALETTES
- How do you store a true color image in HDF5?
- What kind of palettes are supported?
- If you have an X by Y image in Z bands how would you store that in HDF5 ?
- How do you save an image made from (r,g,b) floating point values?
INSTALLING/BUILDING HDF5
(For Fortran issues, see FORTRAN)
- Information on building and using HDF/HDF5 on Windows
- Having problems building an application with pre-built libraries.
- Is the HDF5 C source C99 compliant? Is it C89 compatible ?
- How can you determine what compiler/flags are used by an HDF5 installation?
- The mtime test fails with the message 'Old modification time incorrect.', when building with an unsupported platform/compiler
- Can't open shared library: <path>/lib...s#.0
- Gets -DPIC error building HDF5 on AIX w/F90
- Problems installing Parallel HDF5 on IBM Regatta (with AIX5) (see PARALLEL HDF5)
- Tests fail when building HDF5 on HP-UX 10.20
- Building on MAC OSX, the symbols restFP and saveFP come up as undefined. Why?
- Float to Conversion Tests Fail on AMD with Intel compiler. Why?
- Building 5-1.6.2(3) using Intel 8.1, get ULLONG_MAX error.
- AIX Configure failure: "...config.sub: too many arguments"
- Using Purify on HDF5 library, get uninitialized memory read error
- When Building HDF5, the Object Header test, "Testing message deletion", fails.
- Building on AIX 64-bit, get: ERROR: No csects or exported symbols have been saved.
JAVA
- Java error: can't find HDF5 file format
- How do you read/write a string dataset in Java?
- Data values are reversed/swapped/transposed in HDFView. Why?
- Do you support Java 64-bit with HDF-JAVA?
- What are the limitations of HDF Java interface ?
- What kind of palettes are supported in HDFView?
- java.lang.UnsatisfiedLinkError: no jhdf5 in java.library.path
- Will you be adding a pure Java interface to HDF5?
- Installing HDFView, cannot find shared object files
- What are the issues regarding replacing the JNI?
- Is parallel HDF5 supported with java?
- Can HDFView handle bit-field data?
- Can't get HDFView to work on HP-UX
- HDFView installs properly on Mac but crashes when selected
- HDFView: connection time out errors running behind a firewall
- Will HDFView be supporting variable length strings soon?
- How do you increase the Java Virtual machine memory?
- Get EXCEPTION_ACCESS_VIOLATION when more than 1024MB is allocated to the Java Virtual machine.
- How to workaround error Exception - dataset too big?
- Error - variable HDF5CDataTypes not found
- Do you have an example of writing/reading compound datatypes in java?
- How do you insert JAR files into JBuilder ?
PARALLEL HDF5
- What do you need to run Parallel HDF5?
- What performance can you expect from Parallel HDF5?
- Performance: Parallel I/O with Chunking Storage
- Does HDF5 support compression with parallel HDF5 ? If not, why ?
- Does Parallel HDF5 support chunking ?
- Can you run parallel HDF5 and the threadsafe feature together ? What about Parallel HDF5 and C++?
- How do you write to attributes independently ?
- Does Parallel HDF5 support variable length datatypes ?
- Most HDF5 calls do not make a statement about whether they are being called collectively or independently. How are they being called?
- Problems installing Parallel HDF5 on IBM Regatta (with AIX5)
- How to write and NOT to write compound datasets using F90 in Parallel HDF5
- MPI ... failed: array services not available
- How do you write data when one process doesn't have or need to write data ?
- How do you build HDF5 on BlueGene/L?
- How do you set up HDF5 so only one MPI rank 0 process does I/O ?
- How do you configure HDF5 to create separate files for each compute node in a cluster ?
- Also see INSTALLING/BUILDING HDF5 questions.
PERFORMANCE
- Things That Can Affect Performance
- Linux Memory Handling and Performance
- Information on the Metadata Cache
- The Chunk Cache and Chunked Datasets:
- Parallel I/O with Chunking Storage
- Are there performance metrics for working with HDF5 files ?
- Problem with Valgrind (Purify) and HDF5
PLATFORM SPECIFIC
- Information on building and using HDF/HDF5 on Windows
- HDF5 1.6.6 Patch for Linux Fedora with Sparc architecture
- Building on AIX 64-bit, get: ERROR: No csects or exported symbols have been saved.
- Also see INSTALLING/BUILDING HDF5 questions.
PROPERTIES
- With HDF5-1.6.0-post*, a fix was added for using H5Pset_istore_k and H5Pget_istore_k. This fix required a change to the HDF5 format. What are the implications of this ?
- What function do you use to get compression level information?
- How do you use the stream driver in HDF5-1.6.2?
- How do you work with a file created with the file family feature?
- Can you work with an HDF5 file in memory ?
UTILITIES
- How do you use the h5cc (h5fc) utility ?
- Why is h5dump slower than h5ls?
- Can you add an option to h5dump or h5ls to print the version of a file ?
ATTRIBUTES
When would you use attributes vs. datasets for your metadata ?
-
Attributes are similar to datasets, but there are major differences
between the two:
- Datasets (and groups) are considered to be "primary" objects.
Objects that are not primary are the datatype, dataspace, and attributes.
An attribute is not considered a primary object because attributes are
stored in an object's header.
- There is a 64k limit for an attribute.
- Attributes are not extendible.
- You cannot use compression with attributes.
- You cannot do partial I/O operations on attributes (such as hyperslab selections).
In general using attributes is convenient and straightforward, but if any of the above items may be an issue, then it may be preferable to use datasets for your metadata.
When would you use attributes vs. compact datasets ?
-
Refer to the following design documents:
Compact Storage RFC
Small Data Tuning RFC
The size of an attribute is limited to 64KB. The compact dataset size limit is 64KB, but the recommended size is 30KB or less.
An attribute cannot be shared between several objects. If you store the data in a compact dataset, other objects can use attributes with the object reference datatype to point to the compact dataset. This mechanism allows "sharing" the data stored in the compact dataset between the objects.
I/O speed should be the same for both attributes and compact datasets, since in both cases it is a memory operation. Please remember that you will pay a price while opening/closing the file if you store information in the objects' headers. An application will benefit from the attribute or compact storage mechanism only if it accesses and updates the object state many times during the life of the application. If it is done just a few times, the benefit is questionable.
C++
Does C++ support stream operators?
-
No, not yet. We intend to add support for them, but since they are
a convenience feature, this has not been high in priority.
Does it do IO via standard library containers?
-
No, we have not looked at this yet, but suspect it will be
complicated to implement.
DATASETS
How do I find out what compression has been used to write a dataset?
-
You open the dataset with H5Dopen(), query the dataset's creation
property list with H5Dget_create_plist() and then get the number of
filters defined for the dataset with H5Pget_nfilters(). Then you loop
from 0 to n-1, calling H5Pget_filter() to retrieve info about each
filter.
Public filters are identified by a unique integer ID listed in H5Zpublic.h (currently only H5Z_FILTER_DEFLATE). See doc/html/Filters.html for more info (the transient filters it mentions have never been implemented).
You don't need to know what filters were used to write a particular dataset -- you only have to make sure that they have been registered with H5Zregister() before reading.
How do I create a chunked/compressed dataset?
-
The storage properties, including chunking and compression, are
controlled with an HDF5 Property List. See the
Chunking and Extendible
Dataset example in the HDF5 tutorial.
How do I read/write a subset or sub-sample of a dataset?
-
HDF5 supports a very flexible and general set of data selection
features, which control both the source and destination. HDF5
supports:
- Hyperslabs, including repeated blocks as well as strides
- Unions of hyperslabs
- Sets of points
See the Hyperslab selection and Point selection examples in the HDF5 tutorial.
When you specify memspace and filespace for H5Dwrite and H5Dread does it mean it is allocating memory for both the dataset and memory space (ie. twice the size of dataset) ?
-
No, memspace is just a description of the buffer in memory (i.e. where
read elements will go). If there is no data conversion, then we read directly
into the user supplied buffer. If there is data conversion, we use a 1MB
buffer to do the conversions, but we still use the user's buffer for
reading data in the first place.
Also, you can adjust the 1MB default conversion buffer size. (see H5Pset_buffer)
DATASPACE
Can you subset or select hyperslabs from a compressed dataset?
-
Yes. You must use chunking in order to do this. With HDF5-1.4.4,
the implementation of the I/O pipeline and caching may cause the
same chunk to be brought to memory several times when hyperslabs of data
are read from the file. That may slow down the performance.
DATATYPES
Why was the H5T_ARRAY created ? What is it for ?
-
The array datatype was created to address the simple case of a compound
datatype when all members of the compound datatype are of the same
type and there is no need to subset by compound datatype members.
Creation of such a datatype is more efficient and I/O also requires less
work, because there is no alignment involved.
Previously, you had to create a compound datatype if you wanted to use an array-like datatype for creating a dataset. This was fine if you really wanted to use the array as a field in a compound datatype, but there were developers who wanted to just have a "plain" array (without the compound datatype wrapping) as a datatype for their dataset. We decided to obsolete the array fields in compound datatypes and promote arrays to a "first-class" datatype. This allowed applications to create and use them without involving a compound datatype. Along with being more "obvious" about the intentions of the datatype, array datatypes are also somewhat more efficient in certain circumstances, as mentioned above.
How can you tell if two datasets share their (named, committed) datatype?
-
You can use the H5Gget_objinfo function to retrieve the "stat" information
for each named datatype. Then, comparing the fileno and objno fields in the
H5G_stat_t struct for each type should tell you if the two named datatypes
refer to the same object in the file.
How do I create a String attribute or dataset?
-
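HDF5 has a string datatype. String datatypes are created as follows (in C):
hid_t strtype; /* HDF5 datatype identifier */
/* Start from a copy of the C string type. */
if ((strtype = H5Tcopy(H5T_C_S1)) < 0) {
    return -1;
}
/* Set the length in bytes; size is a size_t chosen by the caller. */
if (H5Tset_size(strtype, size) < 0) {
    return -1;
}
/* Optional: choose the padding; pad is an H5T_str_t such as
   H5T_STR_NULLTERM, H5T_STR_NULLPAD or H5T_STR_SPACEPAD. */
if (H5Tset_strpad(strtype, pad) < 0) {
    return -1;
}
/* Use strtype in H5Acreate or H5Dcreate. */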
Example programs to create a string attribute/dataset can be found at:
http://www.hdfgroup.uiuc.edu/UserSupport/code-examples/
How do you create datasets with a time datatype (class of H5T_TIME)?
-
You would use the datatypes H5T_UNIX_D32BE (LE) or H5T_UNIX_D64BE (LE).
HDF5 doesn't try to interpret or do anything special with the data.
This would currently be up to the user's code. For a C example, see:
h5time.c
When do you use pre-defined standard vs. native datatypes?
-
You would generally create your dataset with the pre-defined standard
datatypes, and you would read from a dataset with the native datatypes
(H5T_NATIVE*). Basically all memory datatypes should be native datatypes,
and the datatype for a read is a memory datatype. See the
Datatypes Table for combinations
of memory and pre-defined datatypes to use.
Following is what you must do (in HDF5-1.4) when creating a general purpose tool for reading HDF5 datasets. You should check both the class and size of the datatype for the dataset and also the size of the native C types for a particular machine. (With HDF5-1.6, we will be simplifying these steps by adding the H5Tget_native_type function to reconstruct a datatype based on the native memory datatype.)
dset = H5Dopen(...);
dtype = H5Dget_type(dset);
class = H5Tget_class(dtype);
size = H5Tget_size(dtype);
/* Pick the native memory type whose size matches the file type. */
switch (class) {
    case H5T_INTEGER:
        if (size == sizeof(long long))
            mem_dtype = H5T_NATIVE_LLONG;
        else if (size == sizeof(long))
            mem_dtype = H5T_NATIVE_LONG;
        else if (size == sizeof(int))
            mem_dtype = H5T_NATIVE_INT;
        else if (size == sizeof(short))
            mem_dtype = H5T_NATIVE_SHORT;
        else
            mem_dtype = H5T_NATIVE_CHAR;
        break;
    case H5T_FLOAT:
        if (size == sizeof(long double))
            mem_dtype = H5T_NATIVE_LDOUBLE;
        else if (size == sizeof(double))
            mem_dtype = H5T_NATIVE_DOUBLE;
        else
            mem_dtype = H5T_NATIVE_FLOAT;
        break;
    default:
        /* Other datatype classes are not handled by this fragment. */
        break;
}
This code does not take into account whether or not the integer is
signed. It also assumes that datatypes and C types of the same size
have the same precision (which is almost always true).
It may be useful to have the above switch statement in a simple function that returns the appropriate native datatype. Then that function could be used inside a wrapper which correctly handled compound, array and variable-length datatypes.
Users may also want to make certain that they malloc the appropriate amount of space needed to store the memory datatype, as well as making sure that they read in and access the data as the correct C type.
Does HDF5 support a boolean datatype ?
-
No. HDF5 is written in C, which does not have a boolean
datatype. Use an integer type and interpret your data according
to your rules.
What are the plusses and minusses of using compound datatypes ?
-
If you are using C, then compound datatypes will be fast to use.
Compound datatypes work well with C, as they are patterned after
it. If using Fortran or Java, then using them will be slow, and you
have to read/write data by field, so it is also cumbersome. [It is not
possible to pass an array of Fortran structures to a C function in
a portable manner. In any case, the Fortran layer has to repack the
Fortran array to an array of C structures. The main problem is that Fortran
enforces type checking at compilation time and it is impossible to
overload the
h5dread/write_f function with a datatype that is
defined by the user.]
There are some issues you may run into. First of all, applications that support HDF5 may not fully support compound datatypes. We only support them minimally in our HDF Java Viewer, because Java doesn't handle them well.
There have been problems with including variable length datatypes in a compound datatype. Performance is slow and a couple of users have also encountered some other problems. We plan to address any problems, though.
Information on using bitfields in HDF5
-
Where are example programs of using bitfields ?
See:
[ bitfield.c ]
Also, refer to
test/dtypes.c in the source code for
examples of using bitfields. Search for "H5T_STD_B"...
What function can be used to create a bitfield ?
The H5Tset_precision() routine determines the number of bits in a
datatype that are significant within it.
What are the limits on the number of bits ?
Up to the size of the datatype that contains it (which is defined for up to 64-bit datatypes currently)
How are these stored then? Any sort of padding or what?
We currently do not pack them, so a 13-bit field in a 32-bit datatype
still takes up 4 bytes of space. This is not ideal, but it is a fairly
complicated problem to pack the bits on disk (in light of using bitfields
in compound, array and variable-length datatypes mostly). Eventually
we should fix this.
How would you use a float pointer in a compound datatype ?
-
You cannot create a compound datatype using a typedef that looks like
this, where z is allocated dynamically:
typedef struct xxx_t {
    int x;
    short y;
    float * z;
} xxx_t;
However, you can use the hvl_t struct to do this.
There is no [easy] way to determine the end of an array of floating-point numbers (or other non-character string sequences), so the hvl_t struct must be used to provide the length of the sequence.
H5Tenum_insert returns error when type has different endianness
-
This happens because H5Tenum_insert uses a void pointer to pass in the enum
member value. It doesn't know whether a "char" or "int" is coming in. If the
base type of the enum is H5T_STD_I8LE, but a big-endian "int" is passed in,
H5Tenum_insert simply copies the first byte of the big-endian "int" as the
value of this member. That will be the high-order byte, instead of the
low-order byte which contains the actual value.
The best way to avoid the problem is to use the native type for the base type in calling H5Tenum_insert. In this example, it will be "char":
int status = H5.H5Tenum_insert(booleanEnum, "true", new char[] {1});
When reading compound data from file w/different compound field name, the data is different for this field.
-
The library is designed to check the fields of a compound type by name.
If the name doesn't match, the library will leave the data for this field
alone, in case the user has some background data in memory.
The right way to read data into memory is to call H5Dget_type and H5Tget_native_type to figure out the data type in memory. (Another way of course, is simply to avoid using wrong names. :)
What is the proper way to allocate memory when reading a variable length datatype?
In the case of a VL type, the HDF5 library allocates a buffer and the user's application has to free it. There is no special call for a character string, so just use a C free. For more complex VL types, use H5Dvlen_reclaim. See:
http://www.hdfgroup.uiuc.edu/UserSupport/code-examples/hamlet.c
Is there a limit on the number of fields allowed in a compound datatype ?
If you are using HDF5 1.8.0 or previous releases, there is a limit on the number of fields you can have in a compound datatype. This is due to the 64K limit on object header messages, into which datatypes are encoded. (However, you can create a lot of fields before it will fail. One user was able to create up to 1260 fields in a compound datatype before it failed.)
EXTERNAL LIBRARIES
I build the SZIP library from source using the Intel compiler (icc),
but the gentest fails when I run make check.
-
Either:
- Build the SZIP library with the gcc compiler. We tested that it
can be used when building HDF5 with the Intel compilers.
- Disable optimization.
How to detect the SZIP encoder at run time
-
On Unix platforms, a quick way is to use strings and grep on the SZIP library, as follows:
strings libsz.a | grep ENCODE
This will return "SZIP ENCODER ENABLED" if the encoder is enabled in the SZIP library.
Another way is to write an application that checks whether the SZIP
library included with HDF5 is encoder-enabled or not. Use the
H5Zget_filter_info function, as follows:
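#include <stdio.h>
#include "hdf5.h"
int main(void)
{
    herr_t status;
    unsigned int filter_config_flags;
    /* Ask the library how the SZIP filter is configured. */
    status = H5Zget_filter_info(H5Z_FILTER_SZIP, &filter_config_flags);
    if (status < 0) {
        printf("Could not query the SZIP filter.\n");
        return 1;
    }
    if ((filter_config_flags & H5Z_FILTER_CONFIG_ENCODE_ENABLED) == 0)
        printf("SZIP encoding is disabled.\n");
    else
        printf("SZIP encoding is enabled.\n");
    return 0;
}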
ERROR
The error report prints out full path for file name instead of file name only
-
What can I change in the configuration/headers to make it
not print the full path of the source file in error messages?
This behavior is seen on several platforms.
The HDF5 library simply prints out the C macro __FILE__ as the file name. Each compiler has its own interpretation: Some compilers print the file name only; some print the full path name; others, just the relevant path name.
The HDF5 library has no control over this.
How do you turn off error messages ?
-
Use the
H5Eset_auto call to toggle error printing on
and off.
FILE
Can you open a file, open a group, close the file, and continue working on a group ?
-
Yes, that is explicitly supported by the library. Once you are no
longer in need of the file identifier, you can close the file.
How can I resolve file access problems due to objects being left open?
-
If an object (group, dataset, etc.) in a file is not closed, then the
file does not get closed, which can cause problems.
There are some things you can do. You can either close everything automatically, or get the number of open objects and then close them:
- We have a file access property list that you can use to do a 'strong'
close of a file. See the function H5Pset_fclose_degree.
See the example program h5close.c.
- You can call H5close, and that will automatically close everything.
- You can call H5Fget_obj_count to get the number of open object identifiers for an open file.
You can call H5Fget_obj_ids to get the list of open object identifiers.
See the example program: h5ckopen.c
FORTRAN
How do you create a dataset with a 16-bit datatype in F90 ?
-
What you have to do is use the Fortran INTEGER type in memory and use
h5tset_size_f on a copy of H5T_NATIVE_INTEGER (or another INTEGER type)
to set the size to 2 bytes. The library will store 16-bit integers instead
of 32-bit integers.
Currently the F90 APIs do not support INTEGER*2 in memory.
See the example, h516bit.f90.
How do you build HDF5 using gcc and either Intel or Lahey Fortran ?
-
The Intel and Lahey Fortran linkers cannot find the proper GNU gcc library,
causing the build to fail in the fortran/test and fortran/examples
directories with the error
unresolved __fixunsdfdi symbol.
Run one of the following commands before running configure:
setenv LIBS "-lgcc_s" (if using dynamic linking)
or
setenv LIBS "-lgcc" (if using static linking; for example,
if using ifort with the "-static" option)
Alternatively, modify the LIBS argument in the
fortran/test/Makefile and fortran/examples/Makefile files.
Then continue the build in the fortran directory.
If you use h5cc or h5fc, you will also need to
edit them and add "-lgcc_s" or "-lgcc" to them.
Can you build shared Fortran libraries in HDF5 ?
Shared Fortran libraries are not supported in the HDF5 1.6 branch, but they are supported for some platforms in the HDF5 1.8 branch. Refer to the Supported Configuration Features Summary in the 1.8.0 release notes for more details.
GENERAL
Do you have example code of reading an HDF5 file?
-
Yes, see:
h5_info.c
Can you delete objects in an HDF5 file ? If yes, how ?
-
Yes, use the H5Gunlink function to delete objects in an
HDF5 file. Currently, however, the space where the object was
located in the file does not get re-used. So the size of the
file will remain the same. You can get rid of this unused space
in a file by writing the contents of the HDF5 file to a new file.
This can be done with the
HDFView tool.
A user also contributed the h5compact.c utility for compacting simple files.
In a future release of HDF5 we will include support for managing the free space in a file.
Does HDF5 support meshes ?
-
We have a (prototype) Mesh API that provides a standard higher-level
API for storing and retrieving structured and unstructured 'mesh'
data, typical of applications such as computational fluid dynamics,
finite element analysis, and visualization.
See the HDF5 Mesh API web page.
Does HDF5 support netCDF ?
-
Unidata and NCSA are collaborating on creating netCDF-4, using HDF5 as
its storage layer. For information on status and availability, see
netCDF 4: Merging the NetCDF and HDF5 Libraries.
How do you create files over 2GB ?
-
If a filesystem ordinarily handles files over two gigabytes, then
HDF5 will be able to create files larger than two gigabytes.
If a filesystem does not handle files greater than two gigabytes,
there are still ways to create files greater than two gigabytes with HDF5.
You can use the file access property list to set up a file family driver. Your HDF5 file will be split into a "family" of files of the same size.
Another way is to use the external file feature. This is controlled by the Dataset Creation Property list. You could store the datasets in an HDF5 file in separate external files of less than 2GB.
For examples of using File Access and Dataset Creation property lists, see the Property tutorial topic in the Advanced section of the HDF5 Tutorial.
On 32-bit Windows, HDF5 can handle files greater than 2GB by use of the native datatype. For example, use H5T_NATIVE_LLONG instead of H5T_NATIVE_LONG.
Is HDF5 threadsafe? Multi-threaded? Explain.
-
HDF5 is NOT multithreaded, though it is threadsafe (using PThreads).
The current implementation of the threadsafe HDF5 serializes all HDF5 calls. It provides thread safety but is not thread-efficient. We have a design plan to make it more thread-efficient but currently don't have the resources to implement it.
If you use only 1 thread to open a file, define the datasets, use only fixed dimension datasets (that is, no chunked storage or variable length types, ...), then HDF5 has all the data locations defined within the file. After that, you can use multiple threads to do read/write of the array data (in concept).
For further information on thread-safe HDF5, see:
threadsafe page
We currently do not have plans to support the thread-safe work on MS Windows.
Why are my file sizes different, if I open an HDF5 file more than once rather than writing the data out in one call?
-
The size discrepancies can be related to the way small metadata and raw data
get allocated in the file.
Currently, all metadata below a certain threshold size (2KB by default) will cause the library to allocate a block of that threshold size (i.e. 2KB) to store the metadata in, anticipating that more metadata will be added to the file soon and could be sub-allocated from that block. A program which doesn't add more metadata to the block will cause the rest of that block to be wasted in the file because the library doesn't currently remember the free space in the file from one file open to the next.
The threshold block size in the library can be changed with a call to H5Pset_meta_block_size (and H5Pset_small_data_block_size, in libraries which have it - should be in the 1.4.4 release) like so:
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
/* hid_t may be wider than int, so cast before printing. */
printf("H5Pcreate returns: %lld\n", (long long)fapl_id);
status = H5Pset_meta_block_size(fapl_id, 0);
printf("H5Pset_meta_block_size returns: %d\n", (int)status);
#ifdef WHEN_ITS_AVAILABLE
status = H5Pset_small_data_block_size(fapl_id, 0);
printf("H5Pset_small_data_block_size returns: %d\n", (int)status);
#endif /* WHEN_ITS_AVAILABLE */
Setting the block size to zero should really only be used when small
amounts of metadata are being added each time the file is opened. Setting the
block size to zero will intermix the raw data blocks allocated in the file with
the metadata information in the file and cause the overall number of I/O
operations on the file to increase (reducing performance), because the library
cannot cache as much metadata in memory.
Performance-wise, it would be better to hold the file open as long as possible and not to adjust the block size, but users will have to decide whether file size or I/O performance is their overall goal.
I'd like to access an HDF5 file without using the HDF5 library. Is this possible?
-
Although it is possible to parse through an HDF5 file using just the file
format documentation as a guide, it is strongly recommended that you use
the HDF5 library to access HDF5 files instead. The algorithms and data
structures stored in an HDF5 file can be complex and difficult to
understand well enough to parse correctly. Additionally, there are certain
requirements on the structure of the data structures (the B-trees, for
example) that may not be obvious from a static representation of them in
the file and may not be fulfilled by indiscriminate operations on them.
There are also some third-party data structures stored in the file that
are not documented in the HDF5 file format documentation, such as
the format of compressed data using the deflate algorithm.
Will data manipulation routines be added to HDF5?
-
No, this is left up to the user. There are many packages available,
including BLAS and LINPACK for this.
Given an object binary file such as libhdf5.a or a.out, how can I easily determine which version of HDF5 is being used (aka linked)?
- Use one of the following to get the H5_VERS_INFO string as defined in
H5public.h:
% strings libhdf5.a | grep "HDF5 library version:"
% strings a.out | grep "HDF5 library version:"
This method works even if the a.out file is "stripped", and even if the binary file is not produced by the host machine.
Given a library file such as libxyz.a that calls HDF5 functions, how can I easily determine which version of HDF5 is being used to build (aka compile) libxyz.a?
-
Add the following line to the calling library source, e.g., xyz.c.
/* C automatically merges two adjacent strings into one. */
/* Use non static char string so that it is included always. */
char XYZ_built_with_H5_lib_vers_info_g[] = "XYZ built with " H5_VERS_INFO;
Rebuild the library and then the following commands will show the information:
% strings libxyz.a | grep "HDF5 library version:"
XYZ built with HDF5 library version: 1.4.4
My HDF5-1.4 library cannot read files created with 5-1.6.0
-
This is a known problem, fixed in version 5-1.4.5-post2.
Prior to 5-1.4.5-post2, HDF5 issued an error and failed to open objects with unknown header messages (for example, metadata about an object added in a later version of the library). It should have just ignored the unknown message and proceeded to open the object.
Can you read an HDF5 file while it is being written to?
-
It is possible for multiple processes to read an HDF5 file when
it is being written to, and still read correct data.
Here's what needs to be done:
(The following steps should be followed, EVEN IF the dataset that is being written to is different than the datasets that are read.)
- Call H5Fflush() from the writing process.
- The writing process _must_ wait until either a copy of the file
is made for the reading process, or the reading process is done
accessing the file (so that more data isn't written to the
file, giving the reader an inconsistent view of the file's state).
- The reading process _must_ open the file (it cannot have the
file open before the writing process flushes its information, or
it runs the risk of having its data cached in memory being incorrect
with respect to the state of the file) and read whatever information
it wants.
- The reading process must close the file.
- The writing process may now proceed to write more data to the file.
There must also be some mechanism for the writing process to signal the reading process that the file is ready for reading and some way for the reading process to signal the writing process that the file may be written to again.
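The answer above does not say which signalling mechanism to use. One simple possibility, sketched here purely as an assumption, is a marker file that sits next to the HDF5 file: the writer creates it after H5Fflush(), and the reader deletes it after closing the HDF5 file. The function names and the marker-file idea are hypothetical and are not part of the HDF5 library.

```c
#include <stdio.h>

/* Hypothetical marker-file handshake; none of these names come from HDF5. */

/* Writer: call after H5Fflush() to announce that the file may be read.
 * Returns 0 on success, -1 on failure. */
int announce_ready(const char *marker_path)
{
    FILE *fp = fopen(marker_path, "w");
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}

/* Either side: returns 1 if the marker file exists, 0 otherwise. */
int marker_exists(const char *marker_path)
{
    FILE *fp = fopen(marker_path, "r");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Reader: call after closing the HDF5 file so the writer may resume.
 * Returns 0 on success, -1 on failure. */
int release_to_writer(const char *marker_path)
{
    return remove(marker_path) == 0 ? 0 : -1;
}
```

In this sketch the writer would poll marker_exists() until it returns 0 before writing again. With several readers, one marker per reader would be needed, and a production setup would more likely use file locks or another form of inter-process communication.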
Performance-wise, how does HDF5 compare to a relational database?
-
It really depends on your application. HDF5 is tuned to do efficient
I/O and storage for "big" data (hundreds of megabytes and more). It will
not work well for small reads/writes.
It doesn't have indexing capabilities, though we are working on some limited features. See the HDF5_Prototype_Indexing_Requirements for details.
HDF5 was designed to complement DBs and not to compete with them.
If you run an application twice on the same machine will it produce identical HDF5 files ?
-
Will netCDF4 make bit-for-bit (BFB) reproducible files?
In other words will running a deterministic model twice on the same
machine (without changing compiler, netCDF library, etc.) produce
identical output files?
I thought that netCDF4 would, like netCDF3, use deterministic algorithms without any date-stamps within the file, and thus produce BFB files. Apparently I am wrong.
To determine whether files were BFB I checked their SHA1 sums. netCDF3 produces BFB files and netCDF4 does not. BFB output files make models easier to debug, so it would be helpful if netCDF4 continued the netCDF3 BFB "tradition". Is this possible? Am I missing something? Is HDF5 the culprit?
Response: If you turn off the create/modify/access time tracking for objects created (with the H5Pset_obj_track_times() routine), everything should be bit-for-bit reproducible. Coincidentally, it makes accessing those objects faster and the size of their metadata smaller also. You do lose the ability to know when the object was created/modified/accessed.
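The question above checks reproducibility with SHA1 sums. A direct byte-for-byte comparison gives the same yes/no answer without a hashing tool. The helper below is a minimal sketch using only the C standard library; the function name is made up for illustration and it does not call HDF5.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if both files contain identical bytes, 0 if they differ,
 * and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = 1;

    if (fa == NULL || fb == NULL) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }
    for (;;) {
        unsigned char buf_a[4096], buf_b[4096];
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);

        /* Different chunk lengths or contents mean the files differ. */
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            result = 0;
            break;
        }
        if (na == 0)   /* both files ended at the same point */
            break;
    }
    fclose(fa);
    fclose(fb);
    return result;
}
```

Running the model twice and calling files_identical() on the two output files is one way to confirm that turning off time tracking had the intended effect.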
GROUPS
H5Gget_objinfo: How do you tell the difference between a soft and hard link? If my reader reads an HDF5 file with a loop in it, then my code goes into an infinite loop, also.
-
Currently, this issue is not easy to resolve, but we will be
addressing it in a future release. We plan to provide a function that
will return a table of contents, so this will no longer be an issue.
Right now what you have to do is traverse all the objects in the file. For any object for which the link count is > 1, record it in a table, using the 'objno' as a key. (See H5Gget_objinfo) The first time you see it, there will be no entry in the table, subsequent visits will find an entry.
When you hit a group that you have already visited (it's in the table described above), you are about to loop, so stop.
If all you want is to avoid loops, then you only need a table of objects with link count > 1, so you know that you have visited them.
If you need a table of contents for the whole file, then collect all the info you need, and use the objno as a key to avoid loops.
The following example illustrates how this could be done. However, this code is not considered "finished" or exemplary code:
h5_git.c
The H5Gn_members and H5Gn_get_obj_info_idx calls in the example will be added to a future release of HDF5.
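The visited-table idea described above can be sketched independently of the HDF5 calls. In the sketch below, obj_key_t is a stand-in for the two-word objno field that H5Gget_objinfo reports (an assumption about its shape; check the header of your library version), and the fixed-size table and linear search are simplifications chosen for brevity.

```c
#include <stddef.h>

/* Stand-in for the objno field of H5G_stat_t (assumed to be two words). */
typedef struct {
    unsigned long objno[2];
} obj_key_t;

#define MAX_VISITED 1024

typedef struct {
    obj_key_t keys[MAX_VISITED];
    size_t    count;
} visited_table_t;

/* Records an object in the table.
 * Returns 1 if the key was already present (a loop is about to start),
 * 0 if it was newly recorded, and -1 if the table is full. */
int visit(visited_table_t *table, const obj_key_t *key)
{
    size_t i;

    for (i = 0; i < table->count; i++) {
        if (table->keys[i].objno[0] == key->objno[0] &&
            table->keys[i].objno[1] == key->objno[1])
            return 1;
    }
    if (table->count == MAX_VISITED)
        return -1;
    table->keys[table->count++] = *key;
    return 0;
}
```

A traversal would call visit() for each group whose link count is greater than 1 and stop descending whenever it returns 1. For files with many objects a hash table would be a better fit than the linear scan used here.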
Iterating (H5Giterate) takes a long time when you have lots of objects. Can this be sped up?
-
Iterating does take a long time, and this is something we
are trying to speed up (12/19/02). An alternative might be
to store the names of the objects in a dataset, reading this
and then accessing the object by its name. You could
use Object References to do this.
IDENTIFIERS
How do you get the name of an object opened with H5Rdereference?
-
You can't. There is no way to get the name of an object opened with
H5Rdereference. The H5Iget_name function
cannot be used to get the name
from an object opened with H5Rdereference.
However, if you need to compare two object identifiers, there is a
workaround. You can determine if two object identifiers point to the
same object, by using H5Gget_objinfo:
hid_t obj1, obj2;
H5G_stat_t stat1, stat2;
H5Gget_objinfo(obj1, ".", flag, &stat1);
H5Gget_objinfo(obj2, ".", flag, &stat2);
and then compare stat1 and stat2.
Although there are issues with creating a function that will return the name of a dereferenced object, we are planning on adding this function to HDF5. For more information, see:
derefobjname.txt
IMAGES AND PALETTES
How do you store a true color image in HDF5 ?
- There are two ways to store true color images: pixel interlace and plane interlace.
With both you will have a 3D dataset. With pixel interlace mode,
it will be stored like this: [height][width][pixel components]
For plane interlace the data will be stored as: [pixel components][height][width]
Refer to the Image and Palette Specification for further details.
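To make the two layouts concrete, the following sketch computes where one color component of one pixel lands in a flat buffer under each interlace mode. It assumes C (row-major) ordering of the 3D dataset; the function names are illustrative and are not part of the HDF5 API.

```c
#include <stddef.h>

/* Offset of component c of pixel (row, col) in a
 * [height][width][components] buffer (pixel interlace). */
size_t pixel_interlace_offset(size_t row, size_t col, size_t c,
                              size_t width, size_t components)
{
    return (row * width + col) * components + c;
}

/* Offset of component c of pixel (row, col) in a
 * [components][height][width] buffer (plane interlace). */
size_t plane_interlace_offset(size_t row, size_t col, size_t c,
                              size_t height, size_t width)
{
    return (c * height + row) * width + col;
}
```

With pixel interlace the red, green, and blue values of a pixel are adjacent in memory; with plane interlace each color forms its own contiguous height-by-width plane.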
What kind of palettes are supported?
-
The Image and Palette
Specification covers the types of images and palettes supported in HDF5.
Currently the two supported types of palette are "STANDARD8" or "RANGEINDEX".
(We may add more types in the future.)
You can create your own image palette. An image palette in HDF5 is just another dataset. For example, if you were only interested in values between 50 and 100, you could create a palette such that all values over 100 were white and all values smaller than 50 were black. Something like this:
index  red  green  blue
0      0    0      0
...
49     0    0      0
50     whatever color you want
...
100    whatever color you want
101    255  255    255
...
255    255  255    255
Then add a palette attribute in the image dataset to point to the palette you created. An image can have more than one palette.
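The palette table described above can be built in memory as a 256 x 3 array before it is written to the file. The sketch below fills such an array; the function name and the single fill color for the range of interest are assumptions made for illustration, and writing the array out as a dataset is left to the caller.

```c
/* Fill a 256-entry RGB palette: indices below lo are black, indices above
 * hi are white, and indices in [lo, hi] get the caller-chosen color. */
void make_range_palette(unsigned char palette[256][3], int lo, int hi,
                        unsigned char r, unsigned char g, unsigned char b)
{
    int i;

    for (i = 0; i < 256; i++) {
        if (i < lo) {
            palette[i][0] = palette[i][1] = palette[i][2] = 0;
        } else if (i > hi) {
            palette[i][0] = palette[i][1] = palette[i][2] = 255;
        } else {
            palette[i][0] = r;
            palette[i][1] = g;
            palette[i][2] = b;
        }
    }
}
```

Calling make_range_palette(pal, 50, 100, 255, 0, 0) would show values from 50 to 100 in red. A color ramp across the range is an obvious variation.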
If you have an X by Y image in Z bands how would you store that in HDF5 ?
-
We don't address using bands with the Image Specification. You would
have to store each band as a separate image.
How do you save an image made from (r,g,b) floating point values?
-
Basically, you would save the dataset with a floating point datatype.
See the implementation of the HDF5 High Level function, H5IMmake_image_24bit, in the HDF5 source code. To save an image made from (r,g,b) floating point values, just replace the following call in this function:
if ( H5LTmake_dataset( loc_id, dset_name, rank, dims, H5T_NATIVE_UCHAR, buffer ) < 0 )
  return -1;
with:
if ( H5LTmake_dataset( loc_id, dset_name, rank, dims, H5T_NATIVE_FLOAT, buffer ) < 0 )
  return -1;
Save the result in another function name (for example, H5IMmake_image_24bit_float).
INSTALLING/BUILDING HDF5
Having problems building an application with pre-built libraries.
-
The HDF5 libraries are built with enabled external deflate and szip filters.
Therefore when an application is linked with the HDF5 prebuilt libraries,
linking may fail if the gzip and/or szip libraries are not found by the linker.
Below is the list of steps that we hope will help you to successfully use the prebuilt HDF5 libraries:
- Download the prebuilt binaries for your desired platform from the
HDF5 FTP Site,
and install them on your system. We will refer to the directory where
HDF5 is installed as /HDF5_INSTALL.
- Run the h5redeploy script in the /HDF5_INSTALL/bin directory to modify
the h5cc/h5fc/h5c++ scripts to point to the /HDF5_INSTALL/... directories.
- Check the libhdf5.settings file under the /HDF5_INSTALL/bin directory to
see which external filters are enabled. For example, look for the following line:
I/O filters (external): deflate,szip
Note: On some platforms (e.g. Crays) the szip library is not available, therefore only the deflate filter is used.
- Make sure that your system has the required gzip and szip libraries
installed.
NCSA provides the prebuilt zlib and szip libraries for all supported platforms and compiler options (for example 64-bit libraries for SunOS-5.7(8), IRIX64-6.5, AIX 5.1, etc.) You may download the prebuilt zlib and szip libraries from ftp://ftp.hdfgroup.org/lib-external/zlib/ and ftp://ftp.hdfgroup.org/lib-external/szip/ respectively, or build and install the libraries yourself.
- Make sure that the linker finds the correct library.
For example, when building a 64-bit application on AIX 5.1 make sure
that you use 64-bit zlib and 64-bit szip libraries along with the 64-bit
HDF5 Libraries.
- We recommend that you use the h5cc/h5fc/h5c++ scripts to build your
application with the prebuilt HDF5 Libraries.
Before you use the scripts:
  - make sure that h5redeploy was run in the bin directory to
  modify the scripts to point to the directory where HDF5 is currently
  installed.
  - make sure that the required zlib and/or szip libraries are
    - installed in the system directories on your system
    or
    - installed somewhere on your system and h5cc/h5fc/h5c++ are edited
    to point to the correct location.
To see the flags that are used by the h5cc/h5fc/h5c++ scripts, use the commands:
h5cc -show
h5fc -show
h5c++ -show
- If you still have problems, we recommend you
  - build and install the HDF5 libraries from the source code on your system
  - then use h5cc/h5fc/h5c++ to build your application
- If it still doesn't work, please send email to THG Helpdesk
and be ready to follow steps 0-6 with us :-)
How can you determine what compiler/flags are used by an HDF5 installation?
-
The following files in the
lib/ directory of the HDF5
binaries give this information:
libhdf5.settings          - HDF5 C library
libhdf5_cpp.settings      - HDF5 C++ (if built)
libhdf5_fortran.settings  - HDF5 Fortran library (if built)
The mtime test fails with the message 'Old modification time incorrect.', when building with an unsupported platform/compiler
-
This error is probably due to HDF5 being unable to get the current time
on your platform/compiler, and is not a serious problem. To verify that
the rest of the tests will run correctly, edit test/Makefile, remove
mtime from the TEST_PROGS variable, and re-run 'make check'.
This problem has occurred when using gcc on Solaris, which is neither supported nor tested.
Can't open shared library: /lib...s#.0
-
The software finds and is trying to open the specified
shared library. If you are on a Unix machine (but not Mac OS X), just add
the path to this library to LD_LIBRARY_PATH.
If you are on Mac OS X, you must either build the shared library from
source, or change the path with install_name_tool:
install_name_tool -change <oldpath> <newpath> <library>
You can use otool -L to see what the shared path is for a given library.
Gets -DPIC error building HDF5 on AIX w/F90
-
This problem is a bug in the libtool software that HDF5 uses.
To get around it do NOT build the shared libraries. Specify the following options when configuring:
--disable-shared --enable-static
Tests fail when building HDF5 on HP-UX 10.20
-
When building HDF5 on HP-UX 10.20, the hyperslab tests fail. We
do not test or support HP-UX 10.20.
One user was able to resolve this error by editing the libtool script to NOT use the pic option (pic_mode=no). (He noticed that the HDF5 tests were compiled with "+Z -DPIC", but linked with a static library; the +Z option probably cannot be used with a static library?)
How to build HDF5 Fortran with Intel Compilers
-
If you have problems building the HDF5 Fortran APIs with the Intel
compilers on Unix systems, please do the following:
- Use the -fpp -DDEC$=DEC_ -DMS$=MS_ compiler flags to disable
DEC and MS compiler directives in source files in the fortran/src,
fortran/test, and fortran/examples directories.
See section 5.7 of the release_docs/INSTALL file. For example:
setenv F9X 'ifc -fpp -DDEC$=DEC_ -DMS$=MS_'
./configure --enable-fortran ...
make ...
- If step 1 doesn't work, run the following script (courtesy of
Hugh C. Pumphrey) to remove DEC and MS compiler directives from the
source in each fortran directory (fortran/src, fortran/test,
fortran/examples) before the configuration step:
#! /bin/bash
# script to forcibly disable directives like !DEC$Foo and !MS$Bar
# in a directoryload of Fortran 90 code.
for filename in `ls *.f90`
do
  echo hacking $filename
  mv $filename tmpbollox.txt
  cat tmpbollox.txt | sed -e "s/\\!DEC\\$/\\!FooDECS/g" \
    -e s/\\!MS\\$/\\!FooMSS/g > $filename
done
# end script
exit subroutine.
Comment out the line:
IF (total_error .ne. 0) CALL exit (total_error)
Building on Mac OS X, the symbols restFP and saveFP come up as undefined. Why?
-
These are defined in /usr/lib/libgcc.a. To fix this, add -lgcc to your link
line or link against /usr/lib/libgcc.a.
Float to Double Conversion Tests Fail on AMD with Intel compiler. Why?
-
HDF5 builds properly on the AMD Opteron. However, the float to double
conversion tests fail. For example:
Testing random sw float -> double conversions *FAILED*
test 1, elmt 323
src = 80 32 39 ac -1.19983053689896439985e-32
dst = b7 f9 1c d6 00 00 00 00 -4.61246357842685217724e-39
ans = 80 00 00 00 00 00 00 00 -0.00000000000000000000e+00
Answer:
The Intel compiler on AMD Opteron processor doesn't support denormalized floating values by default. For HDF5-1.6, the data conversion test uses random values. For 5-1.7, we have changed it to a warning if such a problem occurs.
To enable support for denormalized values, add the option -mp to the compiler. If you do not want to sacrifice speed for this feature, go to the test program and comment out the test.
In one sentence, this is a feature, not an error.
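The failing value in the test output can be checked by hand. Assuming the src bytes 80 32 39 ac are the IEEE 754 single-precision bit pattern 0x803239ac (most significant byte first), the exponent field is zero and the fraction is non-zero, so the value is denormalized. The sketch below tests for that condition and shows the bit pattern that the compiler's own float to double conversion produces; the function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the IEEE 754 single-precision bit pattern is a denormalized
 * value: exponent field zero and fraction field non-zero. */
int float_bits_are_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0;
}

/* Converts a float bit pattern to double using the compiler's own
 * conversion and returns the bit pattern of the resulting double. */
uint64_t widen_float_bits(uint32_t bits)
{
    float    f;
    double   d;
    uint64_t out;

    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits as a float */
    d = (double)f;                 /* hardware or compiler conversion */
    memcpy(&out, &d, sizeof out);  /* expose the double's bit pattern */
    return out;
}
```

On a platform that handles denormalized values, widen_float_bits(0x803239ac) yields 0xb7f91cd600000000, which matches the dst line in the output above. A platform that flushes denormalized inputs to zero would instead yield negative zero, as in the ans line.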
Building 5-1.6.2(3) using Intel 8.1, get ULLONG_MAX error.
-
When building HDF5-1.6.2 (or 5-1.6.3) using Intel 8.1, the make
fails with the error:
...src/H5S.c(842): error: identifier "ULLONG_MAX" is undefined
The problem is resolved by adding the -c99 option when building. With 5-1.6.4, we added this option to the build so the problem no longer occurs.
This error does not occur with versions of Intel prior to 8.1.
AIX Configure failure: "...config.sub: too many arguments"
Problem:
Configure failed on AIX systems with the following messages:
checking build system type... /usr/bin/oslevel[7]: /usr/bin/rm_mlcache_file: cannot execute
config.sub: too many arguments
Answer:
Quick fix:
ask the AIX system administrator to "chmod 0 /usr/bin/oslevel".
Details:
Configure calls config.guess to find out what system it is on. config.guess determines that it is on an AIX system and calls /usr/bin/oslevel to get more AIX-specific information. /usr/bin/oslevel calls /usr/bin/rm_mlcache_file for some information, but /usr/bin/rm_mlcache_file has been made inaccessible to ordinary users because of a security problem that AIX uncovered. This causes error messages that cascade back to configure, which then calls config.sub; config.sub does not like the parameters and aborts, which causes configure to fail.
% ls -lc /usr/bin/rm_mlcache_file
---------- 1 root system 12726 Apr 24 10:23 /usr/bin/rm_mlcache_file
The quick fix works because, if /usr/bin/oslevel is not executable, config.guess will not try to call it and simply proceeds, which is okay. So asking the system administrator to make oslevel non-executable should fix the problem. The long-term fix would be for AIX to make oslevel work properly.
Using Purify on HDF5 library, get uninitialized memory read error
-
This error message is spurious.
To use Purify on the HDF5 library, you must set the "-D H5_USING_PURIFY" flag for the CFLAGS environment variable. This is done during the configure step, as follows:
env CFLAGS="-D H5_USING_PURIFY" ./configure
Then the 'make' command doesn't need any extra flags, etc.
When Building HDF5, the Object Header test, "Testing message deletion", fails.
The "gmake check" fails as follows when running the Object Header (ohdr) tests:
...
Testing object header overflow on disk PASSED
Testing message deletion *FAILED*
at ohdr.c:220 in main()...
HDF5-DIAG: Error detected in HDF5 (1.8.0-beta5) thread 0:
#000: H5Omessage.c line 948 in H5O_msg_remove(): unable to remove object header message
major: Object header
minor: Can't delete message
#001: H5Omessage.c line 1124 in H5O_msg_remove_real(): error iterating over messages
major: Object header
minor: Object not found
#002: H5Omessage.c line 1255 in H5O_msg_iterate_real(): unable to decode message
major: Object header
minor: Unable to decode value
#003: H5Omtime.c line 220 in H5O_mtime_decode(): badly formatted modification time message
major: Object header
minor: Unable to initialize object
*** TESTS FAILED ***
This problem is due to how HDF5 translates the timestamp to UTC time from mktime. It is reproducible if the TZ variable is set to "EET-2EEST", which is the timezone for Eastern Europe. The timestamp which causes this error is encoded as "19700101003321", which represents Jan. 1, 1970 00:33:21. When HDF5 tries to translate this to UTC time from mktime, it subtracts 2 hours, thus putting it in pre-Epoch time, which is known to cause problems on Windows and AIX.
If you encounter this error, you can ignore it by using the environment variable $HDF5_Make_Ignore to tell the hdf5 Makefile to ignore test errors and continue on. For example:
env HDF5_Make_Ignore=yes gmake check
If a test fails, make will print a message (echo "*** Error ignored") and continue. Therefore, you can search the output for the string "Error ignored" to see if any tests failed.
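The timestamp arithmetic behind this failure can be reproduced without mktime. The sketch below parses a "YYYYMMDDhhmmss" string as if it were UTC and then applies the two-hour shift described above. The helper names are illustrative, and the day-count routine is a standard civil-calendar formula rather than anything taken from the HDF5 source.

```c
#include <stdio.h>

/* Days between 1970-01-01 and the given proleptic Gregorian date. */
static long days_from_civil(long y, int m, int d)
{
    long era, yoe, doy, doe;

    y -= (m <= 2);                        /* treat Jan/Feb as end of prior year */
    era = (y >= 0 ? y : y - 399) / 400;   /* 400-year cycle number */
    yoe = y - era * 400;                  /* year within the cycle */
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Parses "YYYYMMDDhhmmss" as if it were UTC.
 * Returns 0 on success and -1 if the string cannot be parsed. */
int timestamp_to_seconds(const char *stamp, long long *seconds)
{
    int y, mo, d, h, mi, s;

    if (sscanf(stamp, "%4d%2d%2d%2d%2d%2d", &y, &mo, &d, &h, &mi, &s) != 6)
        return -1;
    *seconds = (long long)days_from_civil(y, mo, d) * 86400
             + h * 3600L + mi * 60L + s;
    return 0;
}

/* Shifts a local-time reading to UTC for a zone that is utc_offset_hours
 * ahead of UTC (2 for TZ=EET-2EEST outside daylight saving time). */
long long local_to_utc(long long local_seconds, int utc_offset_hours)
{
    return local_seconds - utc_offset_hours * 3600LL;
}
```

For "19700101003321" the parsed value is 2001 seconds, and subtracting two hours gives -5199, a time before the epoch.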
Building on AIX 64-bit, get: ERROR: No csects or exported symbols have been saved.
-
This error has to do with shared libraries. Try building just
the static libraries, by configuring with
--disable-shared --enable-static.
JAVA
Java error: can't find HDF5 file format
-
The problem may be caused by:
- not finding the Java classes
- the dynamic link library is not linked correctly
- the file format is not registered
Put the following code in your test code:
-
import ncsa.hdf.object.h5.*;
If you cannot do that, it means that the hdf5 package is not in your classpath
-
H5File h5file = new H5File(filename);
h5file.open();
If that fails, it means the dynamic link library is not linked correctly. Check your environment variable setting for your application. Make sure the path which contains the dll is in your path.
- Make sure the following code is called in your server application:
try {
    Class fileclass = Class.forName("ncsa.hdf.object.h5.H5File");
    FileFormat fileformat = (FileFormat)fileclass.newInstance();
    if (fileformat != null)
        FileFormat.addFileFormat("HDF5", fileformat);
} catch (Throwable err ) {;}
How do you read/write a string dataset in Java?
-
See example under:
stringdata/
Data values are reversed/swapped/transposed in HDFView. Why?
-
This is a programming language issue:
- If your file was created with a C program, then the data is stored
with the assumption that the last dimension of the slab varies fastest
("row-major order").
- If your file was created with a F90 application, then the data is stored with the assumption that the first dimension varies fastest ("column-major order").
The data itself in HDF5 is exactly the same. Because HDF Java consists of wrappers around C, it would read the data in row-major order. If your data was written with a Fortran application, then the data would appear to be transposed in HDFView.
Since HDFView doesn't know how the file was created, it gives the user the choice of swapping the dimensions, if need be. You can change the order that the data is read and viewed as follows:
-
Select the dataset by clicking with the left mouse button.
Then click the right mouse button and select "Open As".
This will pop up a window. You can change what you want to see for the height, width, and depth from this page.
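The apparent transposition can be seen from the offset formulas alone. The sketch below assumes that a Fortran array declared A(n1, n2) is viewed from C as an array with the dimensions reversed, a[n2][n1], over the same unchanged buffer; the function names are illustrative and do not come from the HDF5 API.

```c
#include <stddef.h>

/* Offset of Fortran element A(i, j) (1-based) in an array declared
 * A(n1, n2): the first index varies fastest (column-major order). */
size_t fortran_offset(size_t i, size_t j, size_t n1)
{
    return (i - 1) + (j - 1) * n1;
}

/* Offset of C element a[r][c] (0-based) in an array declared a[d0][d1]:
 * the last index varies fastest (row-major order). */
size_t c_offset(size_t r, size_t c, size_t d1)
{
    return r * d1 + c;
}
```

For a Fortran array A(3, 2), fortran_offset(i, j, 3) equals c_offset(j - 1, i - 1, 3) for every element, so Fortran's A(i, j) shows up at a[j-1][i-1] in a C-based reader such as HDFView. The bytes are identical; only the index order differs.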
Do you support Java 64-bit with HDF-JAVA?
-
We do not support 64-bit HDF-Java with the current released software. It
requires a lot of work to build HDF-Java on a 64-bit machine.
You will have to:
- build the external libraries (jpeg, zlib, szip) with 64-bit flag
- build 64-bit hdf4
- build 64-bit hdf5
- make changes to various configuration files
- build jni with 64-bit flag
- build Java classes with 64-bit Java
We are currently providing Java 64-bit support on a platform by platform basis, as we obtain funding to do so. We have been funded to build 64-bit HDF-Java for Opteron Linux and Solaris. The pre-compiled beta versions for these platforms can be found at:
ftp://ftp.hdfgroup.org/HDF5/hdf-java24-beta/bin/
What are the limitations of the HDF Java interface?
- Compound datatypes can only be read by field
- Writing to variable length strings is not supported. You can
actually create them with HDF Java, just not write to them.
- Property lists are not completely supported. For example, the
file family is not supported in HDFView. It is in HDF Java,
but is not tested. In general, the file drivers are not supported
with HDF Java.
- Datatypes:
bitfield, region references, and Date types are not supported
Dataset region references use a 12-byte structure which is not supported in the Java code. With version 2.4, HDFView will show the selection that was made when you look at the dataset region reference, but you won't actually be able to see the selection itself. This is a difficult thing to fix.
What kind of palettes are supported in HDFView?
-
The Image and Palette
Specification covers the types of images and palettes supported in HDF5.
Currently the two supported types of palette are "STANDARD8" or "RANGEINDEX".
(We may add more types in the future.)
HDFView only supports an indexed RGB color table with 256 colors (8-bit) or a 24-bit true color image. It does not support an image with any other color table. When an image has an unsupported color table, HDFView will display the image with one of the default color tables (grey, nature, wave and rainbow).
Also see the following question under "Images and Palettes": What kind of palettes are supported?
java.lang.UnsatisfiedLinkError: no jhdf5 in java.library.path
-
This error means that the HDF5 library cannot be found. Check that the paths
are set up properly (CLASSPATH, LD_LIBRARY_PATH). See the
NCSA HDF Object Package - How to use it page, for more information.
Will you be adding a pure Java interface to HDF5?
-
We have no plans on doing this.
Installing HDFView, cannot open shared object files
-
Executing
hdfview_install_linux_vm.bin fails, as follows:
$ ./hdfview_install_linux_vm.bin
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
awk: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
There are errors on many other shared libraries, as well.
This problem occurs on Linux 2.6. Although we do not support HDFView on Linux 2.6, the following workaround has helped other users:
- Run hexedit on hdfview_install_linux_vm.bin and change line 2251 from:
export LD_ASSUME_KERNEL=2.2.5
to:
#xport LD_ASSUME_KERNEL=2.2.5
- Then edit a few lines at the top of hdfview.sh so that HDFVIEW_HOME and JAVAPATH are set properly.
Another solution which worked for a user was to copy an already working HDFView installation on a Linux 2.4 machine to the Linux 2.6 machine. Then modify the HDFVIEW_HOME and JAVAPATH in hdfview.sh. This user was using JDK 1.6.
What are the issues regarding replacing the JNI?
-
There are no great solutions regarding replacing or eliminating
the JNI. There are only two approaches that could be considered
feasible: reimplement in Java, and a CORBA service. A third
possibility is some sort of Web service that sends out XML, but
that's basically the same idea as CORBA, using different technology.
- Reimplement in Java
The HDF5 library is much more than 100K lines of C, plus another 100K or so of unit tests, etc. There are >10 programmer years invested. Clearly, reengineering would be very costly--even if the library was totally stable and easy to understand. Also, although some of the code would be simple and elegant in Java, other parts would definitely not be.
Besides the scope, there are a number of features in the library and format that are very C-specific:
- parallel I/O with MPI/IO (MPI/IO is inside the library and wired into the format)
- zlib compression (we import the zlib library -- in C)
- iterators that take C functions as arguments, i.e., as callbacks
- various aspects of datatypes that pass around "void *", assuming you can cast, etc., structures, variable length arrays, and so on. These are C features wired in.
We'd have to come up with emulation for as much of this as possible.
A third issue is that you would really want to deal nicely with Java datatypes, not translating to C. This would be a whole library in itself.
And fourth: how do we assure interoperability of the Java and C libraries? It would take a lifetime to write a cross verification suite.
Overall, this would be a huge undertaking, with quite a lot of unknowns.
- A CORBA server (or similar idea with your favorite technology)
The JNI is basically providing a Java <-> the real C library layer. JNI is a terrible instrument, so why not do it the right way: CORBA.
We've done explorations, so we know we could make a CORBA server that exports the HDF5 API. Java programs can talk directly or remotely to this layer. See:
/HDF5/XML/JSPExperiments/index.html
Good points:
- The I/O uses the real C library, which guarantees interoperability, has all the features, etc. Also, we can chase the library and format as they evolve.
- The server resides local to the data, which is definitely a plus for really big files or collections.
- The user app can be pure Java, and can even be remote.
- CORBA has all the right stuff, including security, etc.
- Same server can work with Smalltalk, etc., not just Java.
- This plugs in to Web services, etc., pretty well
Bad points:
- CORBA is painful and complex (and often expensive),
- CORBA has its own datatypes, so you have to deal with that.
- Several layers in the stack, who knows what the performance might be.
Creating a full feature CORBA server would be quite a bit of work, possibly a couple of programmer years. This estimate is based on our previous work which indicates a number of important tasks:
- Design reasonable interface, i.e., a network protocol -- pretty complex
- Special problems, such as caching, fiddling datatypes, etc., to implement the server layer
- Performance issues
We would also need a good free CORBA, and ideally a little mini ORB that could be packed with the library, etc.
Overall, using CORBA would probably be the preferable choice over reverse engineering the C library.
The Problems
The first basic problem is how to access HDF5 data from Java programs. Java can read files, but HDF5 files really are persistent objects that must be interpreted by the HDF5 library. So reading the bits isn't sufficient; you need to implement the algorithms of the HDF5 C library.
Secondly, we need to replicate as much of the HDF5 library as possible. For a given use, it might be possible to write something that peeks into HDF5 files, extracts what you need, etc. However, we (NCSA) probably don't want to do a limited case.
Thirdly, is the issue of interoperability. Whatever we do must be able to exchange data with files created by the C library. This is a pretty stringent requirement which rules out some easy solutions, such as just writing and reading serialized objects.
Analysis
Summary
Basically, Java doesn't really want to play with C or any other language, and HDF is really designed to be used as a C (and secondarily Fortran) library. There is no trivial solution.
Is parallel HDF5 supported with java?
-
No.
Can HDFView handle bit-field data?
-
No. You can store bit-field data in HDF5, but HDFView does not
know how to deal with bit-field data.
Can't get HDFView to work on HP-UX
-
From the release notes for JDK 1.2 on HPUX 11:
-
"C and C++ Libraries
HotSpot requires use of the HP aC++ compiler for any application C or C++ libraries loaded dynamically at runtime. Libraries compiled with the cfront HP C++ compiler will not work with HotSpot."
Note that all our products are portable C, not C++.
With proper modifications the JHI5 was able to compile with aCC. This involved making the C code capable of being compiled as either C or C++, in particular using alternate calls to the JNI depending on the language.
However, the library was unable to link with the HDF5 library which is compiled with straight C. The HDF5 library cannot be compiled with aCC, at least not without significant effort.
Therefore, we cannot provide the JHI5 for HPUX 11, and we cannot provide HDFView for HPUX 11.
HDFView installs properly on Mac but crashes when selected
-
There are several problems with HDFView on the Macintosh. First,
we took HDF4 out of HDFView 2.2, because it caused problems with
crashes/hangs. If you are using HDFView 2.1, this could be the
problem.
Another problem is that the installation requires install_name_tool to change the path of the dynamic link library. If the install_name_tool is not installed, then HDFView will not find the dynamic link library.
Also, due to a bug in JDK 1.4 on Mac OS X, the HDFView tool cannot be run by clicking on it from the desktop. The Java virtual machine in JDK 1.4 cannot obtain the environment variable, DYLD_LIBRARY_PATH, from the system. Therefore, HDFView cannot find the HDF5 library. This problem does not occur with JDK 1.3.
To work around this problem you can run HDFView from a command line prompt, by going to the directory where it is located and typing:
open -a <absolute path to HDFView executable>
You can also use Mac's 'explorer' to get to the location of the
HDFView executable, and then click on it.
HDFView: connection time out errors running behind a firewall
-
Although HDFView itself does nothing with proxies, a user found that
when his system wide proxy was not set up properly, he would get
a
java.net.ConnectException: Connection Timed Out error when
he first brought up HDFView. After that, everything would be okay.
Since the HDFView binaries are built on networked machines, it is possible that certain Java network classes are somehow getting pulled in.
Will HDFView be supporting variable length strings soon?
-
We have added the ability to view datasets with variable length strings in
HDFView, but not to edit them. Editing is on our to-do list, but it is
difficult to implement in Java.
How do you increase the Java Virtual machine memory?
-
You can increase the heap size for the Java virtual machine with
the "-mx" option. For example, to increase the heap size to 512MB,
specify: java -mx512m
On Linux, type free to see how much memory you have on your machine.
JVMDG305: Java core not written, unable to allocate memory for print buffer.
Get EXCEPTION_ACCESS_VIOLATION when more than 1024MB is allocated to the Java Virtual machine.
When using the HDF Java package to read a dataset out of an HDF5 file, it works as expected when only 1024 MB are allocated to the java virtual machine. However, it encounters an unexpected error and crashes the VM at larger amounts. The basic process is opening the file, retrieving the file structure, getting and setting the start dimensions, and calling dataset.read. Then the file is closed.
This is a Java virtual machine problem. We have no solution for it at this time.
How to workaround error Exception - dataset too big ?
-
The size of the dataset to create or open is limited by the Java Virtual
machine, which is limited by the machine RAM.
If your machine has enough memory to hold the data and you still get an out of memory error, the heap of the Java virtual machine may be too small. You can increase the heap size for the Java virtual machine with the "-mx" option. For example: java -mx512m
Error - variable HDF5CDataTypes not found
-
We removed the HDF5CDataTypes from the Java library
as of HDF5-1.6.1.
Now, you should use the constants just as they are named in the HDF5 library. The JNI will translate for you. In other words, use H5T_xxx instead of JH5T_xxx.
Do you have an example of writing/reading compound datatypes in java?
-
With java, each field has to be
written out one by one; you cannot write out the whole structure
at once, like in C. When reading, in java you cannot easily get
the size of the compound datatype; this has to be obtained piece
by piece. (So basically it takes a really complicated piece of
code to do this.)
You can look at the source code for our HDFView tool to see how it handles compound datatypes. The HDF Java pages are at:
/hdf-java-html/
The Java source code is at:
ftp://ftp.hdfgroup.org/HDF5/hdf-java/
How do you insert JAR files into JBuilder ?
-
If you are using JBuilder, here is what you need to do to
add jar files to the project. On the top menu of the IDE, select:
Project --> project properties --> paths --> required libraries --> add
The following page has some information which may be of use to new users of JBuilder:
http://bdn.borland.com/article/0,1410,29008,00.html
PARALLEL HDF5
What do you need to run Parallel HDF5 ?
-
You need MPI and MPI I/O. (If you can't tell whether you have
MPI I/O working on your system, let us know.)
What performance can you expect from Parallel HDF5 ?
-
HDF5 cannot do better than MPI I/O on your system. Usually
HDF5 parallel applications have little overhead over MPI I/O
applications on the same system. If MPI I/O performs well, then
you should expect good performance from Parallel HDF5.
If you want to compare the performance of MPI I/O and Parallel HDF5 on your system, you can use the h5perf program that is built along with the parallel library. This is under the ./<HDF5 source code>/perform/ directory of the source code.
Does HDF5 support compression with parallel HDF5 ? If not, why ?
As of HDF5 1.6.3, you can read compressed data but cannot write in parallel.
Why do we not support writing of compressed data in parallel? Compression uses chunking. Since chunks are preallocated in the file before writing, chunks have to be of the same size. However, the size of the compressed chunk is not known in advance.
Chunks are preallocated in the file to avoid the following problem: we allow independent I/O on raw data (with H5Dwrite), but require collective operations to operate on metadata (like the B-tree that tracks the chunks in a chunked dataset or the "free space in the file" metadata (for allocating space when a compressed chunk changes size)). Therefore, in order to allow independent raw data I/O (and simplify the collective raw data I/O), we require the chunks to be preallocated (so we don't have to change the chunk B-tree) and disallow writing to compressed chunked data and variable-length datatypes (so we don't have to allocate/free space in the file) when performing parallel I/O.
Does Parallel HDF5 support chunking ?
-
Yes. It is not necessarily efficient, though.
Can you run parallel HDF5 and the threadsafe feature together ? What about Parallel HDF5 and C++?
-
No, the threadsafe and parallel (MPI-parallel) configurations are NOT
compatible. You would need to do separate builds for each.
This is also true of C++. You cannot configure Parallel HDF5 with the --enable-cxx option.
How do you write to attributes independently ?
-
Right now, you cannot. The workaround is to use the H5D interface instead.
Does Parallel HDF5 support variable length datatypes ?
-
Currently, it does NOT.
Most HDF5 calls do not make a statement about whether they are being called collectively or independently. How are they being called?
-
All H5F, H5A, and H5G calls are collective. All H5D calls
except H5Dwrite and H5Dread are collective. H5Dwrite
and H5Dread can be called either collectively or independently.
For the H5P and H5S interfaces, it depends on each API call.
Problems installing Parallel HDF5 on IBM Regatta (with AIX5)
-
When running configure and then make, errors similar to the following come up:
Macro name H5_PACKAGE_NAME cannot be redefined.
"H5_PACKAGE_NAME" is defined on line 312 of ../../src/H5pubconf.h.
Macro name H5_PACKAGE_STRING cannot be redefined.
... etc ...
This indicates that the HDF5 library was being built with the wrong C compiler, one that does not support MPI. Follow the instructions in the ./release_docs/INSTALL_parallel file, under the "IBM SP" section. Though it is geared towards a particular IBM installation, it does apply to the Regatta. The only exception is that you probably do not need to do the:
setenv LLNL_COMPILE_SINGLE_THREADED TRUE
On the other hand, IBM has hundreds of environment variables and various compilers, and each site has its own settings. Therefore, you should first try to compile and run some simple MPI-IO programs, both C and Fortran90, with the compilers mpcc_r and mpxlf_r, respectively. Once you get your MPI parallel environment set up, you can proceed with the instructions mentioned above.
How to write and NOT to write compound datasets using F90 in Parallel HDF5
-
Here are two examples of writing compound datatypes using F90 in Parallel
HDF5:
- compound_pall.f90 - Right way
- compound_p.f90 - Wrong way (for now)
Both examples write a one dimensional array of size 16 using 4 processes. Elements of the array are structures:
char*2
integer
double precision
float
The compound_p.f90 program tries to write the dataset by having each process write one field:
process 0 writes character field (16 elements)
process 1 writes integer field (16 elements)
process 2 writes double field (16 elements)
process 3 writes real field (16 elements)
However, this is NOT currently supported by the HDF5 Library (i.e. the C example fails too). It will take a lot of work to implement this, but it could be done.
The compound_pall.f90 program writes the dataset by having each process write all 4 fields for its own portion of the data array:
process 0 writes character, integer, double and real fields for elements 1 through 4
process 1 writes character, integer, double and real fields for elements 5 through 8
process 2 writes character, integer, double and real fields for elements 9 through 12
process 3 writes character, integer, double and real fields for elements 13 through 16
This works for both independent and collective writes, and unfortunately is currently the only way for Fortran to write a compound dataset in parallel. It is very cumbersome and inefficient.
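The element ranges used by compound_pall.f90 follow a simple block partition, which can be sketched in C. This is only an illustration and is not taken from the example program: block_range and print_layout are hypothetical helpers, not HDF5 calls, and they compute nothing more than the 1-based first and last element each process would write (the numbers one would then turn into a hyperslab offset and count). The handling of a remainder, where the lowest ranks get one extra element, is an assumption added for completeness.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of HDF5): contiguous block owned by one rank.
 * Splits n_elems elements over n_procs processes; when the division is not
 * exact, the first (n_elems % n_procs) ranks get one extra element.
 * first and last are 1-based and inclusive, as in the Fortran numbering. */
void block_range(int n_elems, int n_procs, int rank, int *first, int *last)
{
    assert(n_procs > 0 && rank >= 0 && rank < n_procs && n_elems >= 0);
    int base  = n_elems / n_procs;
    int extra = n_elems % n_procs;
    int count = base + (rank < extra ? 1 : 0);
    int start = rank * base + (rank < extra ? rank : extra); /* 0-based */
    *first = start + 1;
    *last  = start + count;
}

/* Print the block assigned to every rank. */
void print_layout(int n_elems, int n_procs)
{
    for (int rank = 0; rank < n_procs; rank++) {
        int first, last;
        block_range(n_elems, n_procs, rank, &first, &last);
        printf("process %d writes all fields for elements %d through %d\n",
               rank, first, last);
    }
}
```

Calling print_layout(16, 4) lists the same four ranges as the description above.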
MPI ... failed: array services not available
-
This error indicates that some services needed for MPI are not running.
Contact your system administrator for help. Following is a sample program
for C and Fortran that uses MPI I/O, but does not use HDF5. If you can get
this to run, then you should be able to get HDF5 to run:
Sample_mpio.c
Sample_mpio.f90
One user reported that, to turn on the array services, he logged in as root, ran chkconfig array on, and then restarted his machine.
How do you write data when one process doesn't have or need to write data ?
-
The following examples show how to write data collectively and
independently when one process doesn't have data or does not need to write
data.
- coll_test.c - Uses H5Sselect_none to tell the H5Dwrite call that there will be no data. The fourth process HAS to participate since we are in collective mode.
- ind_test.c - Specifies which processes write data. H5Dwrite is not called by the fourth process at all in this case; this approach works only when independent mode is used.
PERFORMANCE
The Chunk Cache and Chunked Datasets:
-
The H5Pset_cache call can be used to tune and enable/disable the
chunk caches associated with each open chunked dataset for your
application. Unfortunately, these adjustments apply to all open
chunked datasets in the file. At present, there is no way to
tune chunk caches on a per dataset basis.
A chunk cache serves to cache chunks of raw data from an associated open dataset, and goes away when the dataset is closed. The chunk caches exist primarily to reduce the overhead involved in reading and writing partial chunks.
If you create and/or open a large number of datasets, and then read or write to them all, the associated caches will fill up, and use a huge amount of memory.
If this is a problem, the obvious solution is not to have a lot of chunked datasets open at once.
If that is not an option, you can also disable the chunk caches by setting the number of elements in the raw data cache (rdcc_nelmts) and the maximum number of bytes in the raw data cache (rdcc_nbytes) to zero using the H5Pset_cache / h5pset_cache_f call. This function must be used to modify the file access property list that is passed to h5fcreate_f(). Use H5Pget_cache / h5pget_cache_f to get the current values of the parameters you don't want to change. Again by default, each chunk cache can contain up to 1 MB of chunks.
If you only read and write integral chunks, and don't access the same chunk more than once in a short period of time, you should not see a performance hit if you disable chunk caching.
However, if you access partial chunks and disable the chunk cache, you will take a significant performance hit. For example, if you disable the chunk caches and write a chunk in several pieces, on each write after the first one, the library will have to read the chunk back into memory, decompress it (if required), write the new data to the uncompressed version, recompress (if required), and then write the modified chunk back to disk. In contrast, if the chunk cache was enabled, and the chunk was in cache, the write would just modify the cached image of the chunk.
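The cost described above can be put into numbers with a small model. This is a rough sketch under stated assumptions, not a measurement and not an HDF5 API: the struct chunk_io and the function partial_write_cost are hypothetical, the chunk is assumed to be new (so the first piece needs no read-back), and with the cache enabled the chunk is assumed to stay in the cache until all pieces are written.

```c
#include <assert.h>

/* Hypothetical cost model (not an HDF5 call): whole-chunk disk transfers
 * needed to write one new chunk in `pieces` separate write calls. */
struct chunk_io {
    int reads;   /* chunk read back from disk (and decompressed, if needed) */
    int writes;  /* chunk (re)compressed, if needed, and written to disk   */
};

struct chunk_io partial_write_cost(int pieces, int cache_enabled)
{
    struct chunk_io io;
    assert(pieces >= 1);
    if (cache_enabled) {
        /* Each piece only modifies the cached image; the chunk goes to
         * disk once, when it is evicted or the dataset is closed. */
        io.reads  = 0;
        io.writes = 1;
    } else {
        /* Each piece after the first reads the chunk back before
         * modifying it, and every piece writes the chunk out again. */
        io.reads  = pieces - 1;
        io.writes = pieces;
    }
    return io;
}
```

Under this model, writing one chunk in four pieces costs three reads and four writes without the cache, against a single write with it.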
Partial reads would not be quite as bad, but they would still be painful.
By default, the chunk caches are limited to 1 MB. If your chunk size is larger than this, and you don't always access integral chunks, you will probably want to increase the chunk cache size so that the cache can contain at least one chunk.
More generally, you want to choose the chunk cache size so that the chance of a chunk staying in the cache until you are done with it is high. However, this is very much access pattern dependent.
On the other hand, chunk caches take up space, so you don't want them to be any bigger than necessary. Remember, each open chunked dataset has its own cache. If you have a lot of open chunked datasets, the caches can eat up RAM quickly.
Needless to say, you can play games with chunk size to increase your chances of reading and writing integral chunks. Don't go too far with this, as small chunk sizes can also cause performance problems.
As long as chunks are large in comparison to the logical block size of the underlying file system, you shouldn't run into major performance problems on that front. If you use compression, remember to consider it in your choice of chunk size, as the underlying file system will be dealing with compressed chunks.
Another reason for relatively large chunks is the library overhead required for each chunk. The library has to keep track of each chunk, and large numbers of chunks result in large indexes (which are maintained in B-Trees). If the B-Tree is too big to fit in the metadata cache, you will see significant performance hits on chunk reads and writes. Also, since the B-Trees must be stored in the file, you will see increased file size.
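To get a feel for how quickly the number of chunks (and therefore the size of the index) grows as chunks shrink, the count can be computed directly from the dataset and chunk dimensions. The function below is a hypothetical helper, not an HDF5 call. It assumes that a chunk lying only partly inside the dataset at an edge still counts as a whole chunk, and that every chunk has been allocated.

```c
#include <assert.h>

/* Hypothetical helper (not an HDF5 call): number of chunks needed to cover
 * a dataset with the given dimensions. Each dimension is rounded up, so
 * partially filled edge chunks are counted as whole chunks. */
unsigned long long chunk_count(int rank,
                               const unsigned long long dims[],
                               const unsigned long long chunk_dims[])
{
    unsigned long long total = 1;
    assert(rank > 0);
    for (int i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0);
        /* ceiling of dims[i] / chunk_dims[i] */
        total *= (dims[i] + chunk_dims[i] - 1) / chunk_dims[i];
    }
    return total;
}
```

For a 1000 x 1000 dataset, 100 x 100 chunks give 100 index entries, while 10 x 10 chunks give 10000.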
Following is a piece of code that shows the use of H5Pset_cache:
hid_t   file;
hid_t   fapl;
hsize_t dimsf[3];    /* dataset dimensions */
hsize_t ch_dims[3];  /* chunk dimensions */
herr_t  status;
int     mdc_nelmts, rdcc_nelmts;  /* note: rdcc_nelmts is a size_t in HDF5 1.8 and later */
size_t  rdcc_nbytes;
double  rdcc_w0;

/* when opening the file for read, set cache size */
/* need to do the same when writing the dataset */
fapl = H5Pcreate (H5P_FILE_ACCESS);

/* to be safe, retrieve the current settings, and reset only the total size */
status = H5Pget_cache (fapl, &mdc_nelmts, &rdcc_nelmts, &rdcc_nbytes, &rdcc_w0);
rdcc_nbytes = /* size of the cache, at least the size of a chunk in bytes */;
status = H5Pset_cache (fapl, mdc_nelmts, rdcc_nelmts, rdcc_nbytes, rdcc_w0);

file = H5Fopen (DATAFILE, H5F_ACC_RDONLY /* or RDWR */, fapl);
/* when creating, pass the same fapl as the last argument: */
/* file = H5Fcreate (DATAFILE, H5F_ACC_TRUNC, H5P_DEFAULT, fapl); */
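The value assigned to rdcc_nbytes has to be worked out by the application. A minimal sketch of that arithmetic follows. Both functions are hypothetical helpers, not HDF5 calls. They assume the cache holds chunks in uncompressed form, so the size is the product of the chunk dimensions and the element size in bytes, and they never shrink the cache below its current setting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of one uncompressed chunk. */
size_t chunk_bytes(int rank, const size_t chunk_dims[], size_t elem_size)
{
    size_t bytes = elem_size;
    assert(rank > 0 && elem_size > 0);
    for (int i = 0; i < rank; i++)
        bytes *= chunk_dims[i];
    return bytes;
}

/* Hypothetical helper: a candidate rdcc_nbytes that holds n_chunks whole
 * chunks, but is never smaller than the current setting. */
size_t cache_bytes_for(size_t one_chunk, size_t n_chunks, size_t current)
{
    size_t wanted = one_chunk * n_chunks;
    return wanted > current ? wanted : current;
}
```

For example, a 100 x 200 x 10 chunk of 4-byte elements occupies 800000 bytes, which still fits in the default 1 MB cache; keeping four such chunks cached would need 3200000 bytes.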
PROPERTIES
With HDF5-1.6.0-post*, a fix was added for using H5Pset_istore_k and H5Pget_istore_k. This fix required a change to the HDF5 format. What are the implications of this ?
-
If you never use this property then new versions of HDF5 can read files
created with old versions and vice-versa.
However, if you start using it, then old versions prior to HDF5-1.6.0-post* will not be able to read those HDF5 files created with the property list.
What function do you use to get compression level information?
-
The function H5Pget_filter_by_id returns this information. This function
is new with HDF5-1.6.0.
How do you use the stream driver in HDF5-1.6.2?
-
Use the Reference Manual entry for H5Pset_fapl_stream() and the
test program test/stream_test.c in the HDF5 source code. Eventually
we will add this information to the User's Guide.
How do you work with a file created with the file family feature?
-
The file family feature is specified as a File Access Property List.
The Property List topic in the HDF5 Tutorial talks about this feature.
To access a file with the file family feature, when you open or create
the file, the name of it should include a printf integer
format specifier (which gets replaced with the family member number). For
example, junk%d.h5 would result in files, such as,
junk0.h5, junk1.h5, junk2.h5 ...
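The way the member number is substituted into the name can be illustrated with plain snprintf. The helpers below are hypothetical and do not call HDF5. The second one rests on the assumption that the family driver cuts the file's address space into equal pieces of the member size, so that a byte offset falls into member offset / member_size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: expand a family name template such as "junk%d.h5"
 * for one member number. The template must contain exactly one integer
 * conversion. Returns 0 on success, -1 if buf is too small. */
int family_member_name(char *buf, size_t buflen,
                       const char *name_template, int member)
{
    assert(buf != NULL && name_template != NULL && member >= 0);
    int n = snprintf(buf, buflen, name_template, member);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Hypothetical helper: index of the member that would hold a given byte
 * offset, assuming equal-sized members of member_size bytes. */
unsigned long long family_member_index(unsigned long long offset,
                                       unsigned long long member_size)
{
    assert(member_size > 0);
    return offset / member_size;
}
```

With the template junk%d.h5 and member number 2, family_member_name fills the buffer with junk2.h5.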
There are some problems with the file family feature in the HDF5-1.6* (and earlier) releases.
Right now, if you are going to use the file family feature, we recommend that when you read the file, you know that it was created with the file family feature and what the file member size is. Also, you must have written enough data to the file when you created it, to fill up the first file member. Otherwise, HDF5 will re-set the file member size to the size of the data that was written.
Here is an example of how you would read a file that was created using the file family feature:
...
#define FILE "junk%d.h5"
main() {
hid_t file_id, fapl;
hsize_t msize;
herr_t status;
fapl = H5Pcreate (H5P_FILE_ACCESS);
msize = 1024*1024;
status = H5Pset_fapl_family (fapl, msize, H5P_DEFAULT);
file_id = H5Fopen(FILE, H5F_ACC_RDWR, fapl);
...
As you can see above, it requires you to know ahead of time that the files
were created with the file family feature, and what the file member size
is. In actuality, for read-only, the size you specify doesn't matter.
That's why h5dump and h5ls can read the file. (You can change 'msize' to
1 in the code above and it works!)
However, if you open the file using the wrong file member size, and try to *write* data to the file, it may not work as expected.
The problems with the file family feature will be fixed in a future release.
Can you work with an HDF5 file in memory ?
Yes. You can create an HDF5 file in memory using H5Pset_fapl_core and use the backing_store parameter to write the data to disk on closing.
Can I subsequently open the file and put the data in memory ?
Yes, with HDF5 1.8.0, you can bring the file into memory, read it, modify it and write it back, using the core driver. With HDF5 1.6, you could only write it back, with no open.
UTILITIES
How do you use the h5cc (h5fc, h5c++) utility?
-
(Below, h5cc is specified but the information applies to h5fc
and h5c++, as well.)
You can just type h5cc -o prog prog.c, where prog is the executable that gets created and prog.c is your application.
Use h5cc -show to see what libraries and compiler are used by h5cc.
If building the HDF5 library from source, then the compile scripts should be ready to use without changes.
If using the pre-compiled binaries that we provide, you will need to do a few things before you can use h5cc, as it has site specific paths in it.
After you have copied the files to the final installation directory, do the following:
- cd to the bin/ directory in the pre-compiled binaries.
- Run ./h5redeploy and enter yes to the question. This will fix some of the paths used in h5cc.
- Edit h5cc and search for LDFLAGS and CPPFLAGS. Check, and if need be, update the paths for the external libraries (SZIP, ZLIB). If ZLIB is in a default location, then it will probably be okay, but the SZIP path will need to be updated.
The h5cc utility should then be ready for use in most cases. However, if you use a different compiler name or if some of the required libraries are in non-standard places, you may need to edit it and modify some other variables. Take a look at these variables in the script:
prefix - Path to the HDF5 top level installation directory
CCBASE - Name of the alternative C compiler
CLINKERBASE - Name of the alternative linker
LDFLAGS - Path to different libraries your application will link with
(this path should include the path to the zlib library)
LIBS - Libraries your application will link with
Why is h5dump slower than h5ls?
-
The h5dump utility creates a table and then displays it, whereas
h5ls displays the data as it reads it. With large files, for
example files greater than 2 GB, h5dump will be very slow, and
is not ideal to use. You may want to look at other tools or ways of
accessing your data if you are having problems due to the size of the file.
Using HDFView may be an alternative. You can also use h5ls -f -r to
get a list of objects and their absolute paths, and then just use
h5dump to view specific datasets.
Can you add an option to h5dump or h5ls to print the version of a file ?
-
No, we do not plan on adding this option. Users should use attributes to
specify the version of a file. There are many reasons why we shouldn't
add this. For example, different objects in the file could be created or
modified by different versions of the library.
Last modified: July 01st 2008
