libSplash

Writing an Array Attribute of String

Open ax3l opened this issue 10 years ago • 6 comments

To write an array of constant-size strings, one can currently do the equivalent of:

import h5py
import numpy as np
# [...]

meshes.create_group(b"E")
E = meshes[b"E"]
E.attrs["axisLabels"] = np.array([b"x", b"y", b"z"])

To write an array of three strings (each of length 1 in this example), one can use C-style notation:

typedef char MyChar2[2]; // 1 character + '\0'
MyChar2 *axisLabels = new MyChar2[simDim];
ColTypeString ctAxisLabels(1); // this can also be longer, but all must have the same size (?)
for( uint32_t d = 0; d < simDim; ++d )
{
    /* \todo is the order correct? */
    axisLabels[d][0] = char(120 + d); // 'x', 'y', 'z' (ASCII 120, 121, 122)
    axisLabels[d][1] = '\0';          // terminator is important!
}
params->dataCollector->writeAttribute(params->currentStep,
                                      ctAxisLabels, recordName.c_str(),
                                      "axisLabels",
                                      1u, Dimensions(simDim,0,0),
                                      axisLabels);
delete[] axisLabels; // free the temporary label buffer

which works.

I am currently not 100% sure whether one can write the equivalent of the following Python snippet, where the three fixed-length strings differ in length from each other:

import h5py
import numpy as np
# [...]

meshes.create_group(b"E")
E = meshes[b"E"]
E.attrs["axisLabels"] = np.array([b"short", b"middle long", b"very long description"])

It might be possible to pass something like a char** as the axisLabels argument of writeAttribute() and store a C string of different size (NULL-terminated as usual) in each entry. But I am not sure that will work, since there is only one ColTypeString ctAxisLabels(N), which would mean N has to vary between entries.

So, if you can, make all your labels the same length. (In your case "spatial idx" and "frequen idx" / " omega idx ", i.e. abbreviate or add spaces before the '\0', etc. ... nasty)
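The space-padding workaround can be sketched as below. padWithSpaces is a hypothetical helper, not part of libSplash; it appends trailing spaces (rather than centering as in " omega idx ") so that every label reaches the length of the longest one. Keep in mind that readers of the file will see those spaces as part of the label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not libSplash API): pads every label with trailing
// spaces so that all labels share the length of the longest one.
std::vector<std::string> padWithSpaces(const std::vector<std::string>& labels)
{
    std::size_t maxLen = 0;
    for (const auto& l : labels)
        maxLen = std::max(maxLen, l.size());

    std::vector<std::string> padded;
    padded.reserve(labels.size());
    for (const auto& l : labels)
    {
        padded.push_back(l + std::string(maxLen - l.size(), ' '));
        assert(padded.back().size() == maxLen); // all entries now equally long
    }
    return padded;
}
```

For example, padWithSpaces({"spatial idx", "omega idx"}) turns the 9-character "omega idx" into "omega idx  " to match the 11 characters of "spatial idx".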

See also the official (but VLEN) example.

CCing @PrometheusPi

ax3l, Jan 07 '16 15:01

Update: I just checked what h5py does for strings of different lengths (here "x1", "y22", "z333"), and it is, surprise surprise, NULL padding :D

               ATTRIBUTE "axisLabels" {
                  DATATYPE  H5T_STRING {
                     STRSIZE 4;
                     STRPAD H5T_STR_NULLPAD;
                     CSET H5T_CSET_ASCII;
                     CTYPE H5T_C_S1;
                  }
                  DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
                  DATA {
                  (0): "x1\000\000", "y22\000", "z333"
                  }
               }

whereas for strings of equal length (here "x", "y", "z") it results in

               ATTRIBUTE "axisLabels" {
                  DATATYPE  H5T_STRING {
                     STRSIZE 1;
                     STRPAD H5T_STR_NULLPAD;
                     CSET H5T_CSET_ASCII;
                     CTYPE H5T_C_S1;
                  }
                  DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
                  DATA {
                  (0): "x", "y", "z"
                  }
               }

That means: choose N as large as your longest string and pad the shorter ones with zeros, too ;)
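To make the h5py layout from the first dump concrete: with H5T_STR_NULLPAD every entry occupies exactly STRSIZE bytes, STRSIZE is the length of the longest string, shorter entries are filled with NUL bytes, and the longest entry carries no terminator at all. The following minimal sketch reproduces that byte layout in memory; the struct and function names are hypothetical and not HDF5 or libSplash API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for a NULLPAD-style fixed-length string array.
struct NullPadStrings
{
    std::size_t width;       // STRSIZE: length of the longest string, no room for '\0'
    std::vector<char> bytes; // labels.size() * width bytes, stored contiguously
};

// Mimics the H5T_STR_NULLPAD layout seen in the h5py dump:
// shorter strings are filled with NULs, the longest one has no terminator.
NullPadStrings packNullPad(const std::vector<std::string>& labels)
{
    NullPadStrings out{0, {}};
    for (const auto& l : labels)
        out.width = std::max(out.width, l.size());

    out.bytes.assign(labels.size() * out.width, '\0'); // pre-fill with NULs
    for (std::size_t i = 0; i < labels.size(); ++i)
    {
        assert(labels[i].size() <= out.width);
        std::copy(labels[i].begin(), labels[i].end(),
                  out.bytes.begin() + static_cast<std::ptrdiff_t>(i * out.width));
    }
    return out;
}
```

Packing {"x1", "y22", "z333"} gives width 4 and the 12 bytes "x1\0\0", "y22\0", "z333", matching the DATA section of the dump.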

#include <cstdint>
#include <cstring>
// ...

const uint32_t N = 14;
typedef char MyCharN[N+1]; // +1 for trailing \0
ColTypeString ctAxisLabels(N);

MyCharN *axisLabels = new MyCharN[simDim];
// pre-pad all targets with NULLs (including the NULL terminator for the max-length string!)
for( uint32_t d = 0; d < simDim; ++d )
{
    std::memset( axisLabels[d], '\0', N+1 );
}
std::strcpy( axisLabels[0], "spatial idx" );   // 11 chars + '\0' = 12 bytes
for( uint32_t i = 11; i <= N; ++i )            // increment i (not the constant N)
    axisLabels[0][i] = '\0';
std::strcpy( axisLabels[1], "frequency idx" ); // 13 chars + '\0' = 14 bytes
// ... pass axisLabels to writeAttribute(), then: delete[] axisLabels;

(let us wrap away the null-padding in some helper ;) )
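Such a helper could look roughly like the sketch below. makeNullTermLabels and NullTermLabels are hypothetical names (the actual PIConGPU helper referenced later in this thread may look quite different), and the sketch assumes, as the MyCharN[N+1] snippet implies, that ColTypeString(N) expects N + 1 bytes per entry: N characters plus the terminator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type (not libSplash or PIConGPU API).
struct NullTermLabels
{
    std::size_t maxLen;      // the N one would pass to ColTypeString(N)
    std::vector<char> bytes; // labels.size() * (maxLen + 1) bytes, NUL-padded
};

// Packs variable-length labels into one contiguous, NUL-padded buffer whose
// entries all have the same stride of maxLen + 1 bytes.
NullTermLabels makeNullTermLabels(const std::vector<std::string>& labels)
{
    NullTermLabels out{0, {}};
    for (const auto& l : labels)
        out.maxLen = std::max(out.maxLen, l.size());

    const std::size_t stride = out.maxLen + 1;      // +1 for the trailing '\0'
    out.bytes.assign(labels.size() * stride, '\0'); // NUL padding up front
    for (std::size_t i = 0; i < labels.size(); ++i)
    {
        assert(labels[i].size() < stride); // always leaves room for '\0'
        std::copy(labels[i].begin(), labels[i].end(),
                  out.bytes.begin() + static_cast<std::ptrdiff_t>(i * stride));
    }
    return out;
}
```

Under that assumption, ColTypeString(packed.maxLen) and packed.bytes.data() would take the place of ctAxisLabels and axisLabels in the writeAttribute() call, and the std::vector frees itself, so no delete[] is needed.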

ax3l, Jan 07 '16 15:01

Nevertheless, libSplash strings currently use STRPAD H5T_STR_NULLTERM instead of STRPAD H5T_STR_NULLPAD (as h5py does); we might need to change that.

               ATTRIBUTE "axisLabels" {
                  DATATYPE  H5T_STRING {
                     STRSIZE 2;
                     STRPAD H5T_STR_NULLTERM;
                     CSET H5T_CSET_ASCII;
                     CTYPE H5T_C_S1;
                  }
                  DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
                  DATA {
                  (0): "x", "y", "z"
                  }
               }

Nevertheless, a reader of NULL-terminated strings should not care whether the bytes after the terminator are undefined. Otherwise, space padding plus a final NULL still works as a workaround.
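A reader that stops at the first NUL within the fixed width handles both cases: it ignores whatever follows a terminator (NULLTERM) and still copes with a full-width entry that has no terminator (NULLPAD, like "z333" above). A minimal sketch; readFixedWidthEntry is a hypothetical name, not HDF5 or libSplash API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reader: decodes one fixed-width string entry of `width` bytes.
// The search for '\0' is bounded by `width`, so it never reads past the entry,
// and bytes after the first NUL are never inspected.
std::string readFixedWidthEntry(const char* entry, std::size_t width)
{
    assert(entry != nullptr);
    const char* end = std::find(entry, entry + width, '\0');
    return std::string(entry, end);
}
```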

ax3l, Jan 07 '16 15:01

@ax3l Thanks for posting this information. Great work checking the result in python. :+1:

PrometheusPi, Jan 08 '16 13:01

@ax3l Why do you need:

for( uint32_t i = 11; i <= N; ++i )
    axisLabels[0][i] = '\0';

You already pre-padded all targets with NULLs:

for( uint32_t d = 0; d < simDim; ++d )
{
    memset(  axisLabels[d], '\0', N+1 );
}

PrometheusPi, Jan 11 '16 10:01

What you are referring to is just a quick hack that I needed and documented for testing.

The usage that is of interest for you is in https://github.com/ComputationalRadiationPhysics/picongpu/pull/1323 and is a bit more sophisticated :)

ax3l, Jan 11 '16 10:01

Migrated to future: the helper in PIConGPU linked above should be ported to libSplash.

ax3l, Apr 12 '16 02:04