
Implement support for complex number datatypes

Open · jhendersonHDF opened this issue Jul 08 '24 · 18 comments

Adds the new datatype class H5T_COMPLEX

Adds the new API function H5Tcomplex_create, which creates a complex number datatype from the ID of a base floating-point datatype

Adds the new feature check macros H5_HAVE_COMPLEX_NUMBERS and H5_HAVE_C99_COMPLEX_NUMBERS

Adds the new datatype size macros H5_SIZEOF_FLOAT_COMPLEX, H5_SIZEOF_DOUBLE_COMPLEX and H5_SIZEOF_LONG_DOUBLE_COMPLEX

Adds the new datatype ID macros H5T_NATIVE_FLOAT_COMPLEX, H5T_NATIVE_DOUBLE_COMPLEX, H5T_NATIVE_LDOUBLE_COMPLEX, H5T_CPLX_IEEE_F16LE, H5T_CPLX_IEEE_F16BE, H5T_CPLX_IEEE_F32LE, H5T_CPLX_IEEE_F32BE, H5T_CPLX_IEEE_F64LE and H5T_CPLX_IEEE_F64BE

Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes

Adds a special conversion path between complex number datatypes and array or compound datatypes where the in-memory layout of data is the same between the datatypes and data can be converted directly

Adds support for complex number datatypes to the h5dump, h5ls and h5diff/ph5diff tools. Allows the h5dump '-m' option to change the floating-point printing format for the float complex and double complex datatypes, as well as for long double complex when it has the same size as double complex

Adds minimal support to the h5watch and h5import tools

Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the Java wrappers. Also adds initial, untested support to the JNI for future use with HDFView

Adds support for just the H5T_COMPLEX datatype class to the Fortran wrappers

Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the high level library H5LT interface for use with the H5LTtext_to_dtype and H5LTdtype_to_text functions

Renames some uses of "complex" in the library, since the name conflicts with the "complex" keyword from the complex.h header. Also rewords various other uses of the word "complex" throughout the library to distinguish compound datatypes from complex number datatypes.

jhendersonHDF avatar Jul 08 '24 15:07 jhendersonHDF

Additional complex number features that haven't been implemented yet are mentioned in https://github.com/HDFGroup/hdf5/discussions/4631.

jhendersonHDF avatar Jul 08 '24 16:07 jhendersonHDF

Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes

Request. Do not include any datatype conversion paths between complex and simple numeric data types. This will significantly reduce library size and avoid controversies around conventions and other nuances. Conversions between complex and simple should be left up to the application domain, not baked into the HDF5 library.

Dave-Allured avatar Jul 09 '24 15:07 Dave-Allured

Request. Do not include any datatype conversion paths between complex and simple numeric data types. This will significantly reduce library size and avoid controversies around conventions and other nuances. Conversions between complex and simple should be left up to the application domain, not baked into the HDF5 library.

P.S. My apologies for not catching this sooner and preventing extra work.

Dave-Allured avatar Jul 09 '24 15:07 Dave-Allured

Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes

Request. Do not include any datatype conversion paths between complex and simple numeric data types. This will significantly reduce library size and avoid controversies around conventions and other nuances. Conversions between complex and simple should be left up to the application domain, not baked into the HDF5 library.

Could you describe some of these conventions that applications may use? The current conversion paths should be considered as defaults and they just follow the C99 conversion rules from the standard. An application could always use H5Tregister to register its own conversion function that takes precedence if there is a specific way of converting data that is desired.

jhendersonHDF avatar Jul 09 '24 16:07 jhendersonHDF

Just to clarify... are these conversions like 1.0 as a float to 1.0 + 0.0i as a complex number, or something else?

ajelenak avatar Jul 09 '24 16:07 ajelenak

Could you describe some of these conventions that applications may use? The current conversion paths should be considered as defaults and they just follow the C99 conversion rules from the standard ...

If your conversion paths are simply breaking out the real and imaginary parts into separate parts, and the reverse, then I withdraw my request. I was momentarily confusing this with complex polar types and their issues.

Dave-Allured avatar Jul 09 '24 17:07 Dave-Allured

Just to clarify... are these conversions like 1.0 as a float to 1.0 + 0.0i as a complex number, or something else?

Correct; here are the relevant conversion rules from the standard:

"6.3.1.6 Complex types

1 When a value of complex type is converted to another complex type, both the real and imaginary parts follow the conversion rules for the corresponding real types.

6.3.1.7 Real and complex

1 When a value of real type is converted to a complex type, the real part of the complex result value is determined by the rules of conversion to the corresponding real type and the imaginary part of the complex result value is a positive zero or an unsigned zero.

2 When a value of complex type is converted to a real type, the imaginary part of the complex value is discarded and the value of the real part is converted according to the conversion rules for the corresponding real type."

jhendersonHDF avatar Jul 09 '24 17:07 jhendersonHDF

"6.3.1.7 Real and complex

1 When a value of real type is converted to a complex type, the real part of the complex result value is determined by the rules of conversion to the corresponding real type and the imaginary part of the complex result value is a positive zero or an unsigned zero.

2 When a value of complex type is converted to a real type, the imaginary part of the complex value is discarded and the value of the real part is converted according to the conversion rules for the corresponding real type."

So these are not simple mechanical conversions to extract or combine parts. They are genuine mathematical transforms. In my opinion these do not belong in the HDF5 library and should be left out. The main reason is that they are simply extraneous to the mission of high performance I/O, and would be rarely used. You are not obliged to support every conversion defined by the C standard.

Also, I think such conversions could lead to misunderstandings and data loss in some scenarios. For example, someone forgets that the in-memory type is simple rather than complex, and the code compiles and runs quietly. Then they find out months later that the imaginary part in their new dataset was written as all zeros.

I do not think that even extracting or combining parts is appropriate. IMO most applications dealing with complex types will want to read and write full complex slabs as opaque blocks, then handle extractions and transforms within their own code and their own language.

Dave-Allured avatar Jul 09 '24 18:07 Dave-Allured

I completely agree with Dave.

epourmal avatar Jul 09 '24 19:07 epourmal

So these are not simple mechanical conversions to extract or combine parts. They are genuine mathematical transforms. In my opinion these do not belong in the HDF5 library and should be left out. The main reason is that they are simply extraneous to the mission of high performance I/O, and would be rarely used. You are not obliged to support every conversion defined by the C standard.

Also, I think such conversions could lead to misunderstandings and data loss in some scenarios. For example, someone forgets that the in-memory type is simple rather than complex, and the code compiles and runs quietly. Then they find out months later that the imaginary part in their new dataset was written as all zeros.

I agree that these conversion paths would likely not be used often and could be problematic, but I'd also note that there are other currently supported conversion paths where an application could make a similar mistake when writing data. It's also worth mentioning that there is currently some overhead associated with user-defined conversion functions. If these paths were removed, an application that does want the standard behavior would need to register its own conversion function and may incur some of that overhead.

Depending on whether data is being written or read, some of the conversion paths such as float _Complex -> float seem fairly natural to me. During reads that would essentially be equivalent to extracting the real part of the complex values inside the application code after reading data. float -> float _Complex would probably be uncommon for both the read and write case though. Not that any of this means the conversion paths need to stay, but I think it's something worth considering.

I do not think that even extracting or combining parts is appropriate. IMO most applications dealing with complex types will want to read and write full complex slabs as opaque blocks, then handle extractions and transforms within their own code and their own language.

Agreed here; my intention was that applications would just use the standard C functions to extract the parts as necessary after reading data, skipping the datatype conversion pipeline for efficiency. I'm not sure how useful a conversion path to combine parts would be either. I'd generally assume that an application would have data in a final, appropriate in-memory format before I/O. Though, maybe a path to combine parts might be useful in the case of complex types where there isn't a standard C type, such as a complex number of float16 values?

jhendersonHDF avatar Jul 09 '24 19:07 jhendersonHDF

Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes

Request. Do not include any datatype conversion paths between complex and simple numeric data types. This will significantly reduce library size and avoid controversies around conventions and other nuances. Conversions between complex and simple should be left up to the application domain, not baked into the HDF5 library.

Could you describe some of these conventions that applications may use? The current conversion paths should be considered as defaults and they just follow the C99 conversion rules from the standard. An application could always use H5Tregister to register its own conversion function that takes precedence if there is a specific way of converting data that is desired.

I think that following the C Standard is appropriate, especially since it's possible to override.

qkoziol avatar Jul 09 '24 20:07 qkoziol

Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes

Request. Do not include any datatype conversion paths between complex and simple numeric data types. This will significantly reduce library size and avoid controversies around conventions and other nuances. Conversions between complex and simple should be left up to the application domain, not baked into the HDF5 library.

Could you describe some of these conventions that applications may use? The current conversion paths should be considered as defaults and they just follow the C99 conversion rules from the standard. An application could always use H5Tregister to register its own conversion function that takes precedence if there is a specific way of converting data that is desired.

I think that following the C Standard is appropriate, especially since it's possible to override.

However, have you confirmed that overriding works correctly with a regression test? (Which will also be useful for users who wish to override.)

qkoziol avatar Jul 09 '24 20:07 qkoziol

However, have you confirmed that overriding works correctly with a regression test? (Which will also be useful for users who wish to override.)

Currently, there's only a test to make sure that the registered function gets called instead and that the datatype IDs passed are valid, but I could add a more comprehensive test.

jhendersonHDF avatar Jul 09 '24 20:07 jhendersonHDF

However, have you confirmed that overriding works correctly with a regression test? (Which will also be useful for users who wish to override.)

Currently, there's only a test to make sure that the registered function gets called instead and that the datatype IDs passed are valid, but I could add a more comprehensive test.

Seems sufficient to me.

qkoziol avatar Jul 09 '24 20:07 qkoziol

* Add the fuzzing guards on the complex datatype's encoding.

The previous comment was made in the decode helper routine. Do we want checks both during encode and decode?

* Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complicated a parser.

What's the purpose of having a complex number-specific encoding version field here? Is it just to check that no heterogeneous complex types are allowed in, e.g. version 0?

jhendersonHDF avatar Jul 10 '24 20:07 jhendersonHDF

* Add the fuzzing guards on the complex datatype's encoding.

The previous comment was made in the decode helper routine. Do we want checks both during encode and decode?

Sorry, I meant decoding.

* Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complicated a parser.

What's the purpose of having a complex number-specific encoding version field here? Is it just to check that no heterogeneous complex types are allowed in, e.g. version 0?

🤔 I worked through all the permutations I can think of again, and I think it'll be OK. I was concerned about the possible addition of another parent datatype for the heterogeneous case, but I think the existing flag will be OK for forward/backward compatibility. So I think we can skip adding the version field.

qkoziol avatar Jul 10 '24 21:07 qkoziol

Most of the major previous comments should be addressed now. I haven't made the change to H5Tset_size yet as I still feel fairly strongly about that and I don't think we necessarily came to any real conclusion on it. There are various other comments on other possible optimizations or refactoring that could be made, but many of them seem like things that could fall outside this PR.

jhendersonHDF avatar Aug 08 '24 17:08 jhendersonHDF

Most of the major previous comments should be addressed now. I haven't made the change to H5Tset_size yet as I still feel fairly strongly about that and I don't think we necessarily came to any real conclusion on it. There are various other comments on other possible optimizations or refactoring that could be made, but many of them seem like things that could fall outside this PR.

Perhaps we should have a short, small audience meeting to discuss the H5Tset_size() issue?

qkoziol avatar Aug 26 '24 20:08 qkoziol