Retrieving fixed accuracy parameters from ZFP encoded HDF5 datasets #105

Open
leighorf opened this issue Feb 23, 2023 · 11 comments

@leighorf

Hello,

I have gone through great pains to carry fixed accuracy parameter metadata with all of my conversions of data that use ZFP. I often operate on ZFP compressed data and compress the results, and I want to make sure my final accuracy parameters are OK given the original accuracy parameters.

However it occurs to me that at least for a saved ZFP encoded HDF5 dataset, it should be possible to open a HDF5 file with ZFP compressed data and retrieve the original floating point representation of the accuracy parameter for each dataset (I know it is possible to do this with the zfp library). It is not evident how to do this with the H5Z-ZFP interface, but that is what I desire: The ability to retrieve the ZFP fixed accuracy parameter of a H5Z-ZFP compressed HDF5 dataset.

@markcmiller86
Member

This is a very reasonable request, @leighorf. That information is encoded, without loss, in the dataset's creation cd_values as the ZFP stream's header.

I think it's probably best to add a function to the H5Z-ZFP library interface for this. It requires a combination of HDF5 and ZFP library calls.

In lieu of such a function, given an existing dataset id of dsid, I think it is possible to do something like...

hid_t cpid = H5Dget_create_plist(dsid);
unsigned int flags;
size_t nelemts = 10;
unsigned int cd_vals[10];
H5Pget_filter_by_id2(cpid, H5Z_FILTER_ZFP, &flags, &nelemts, cd_vals, 0, NULL, NULL);

// cd_vals contains, starting at entry index 1, the ZFP stream header. So, now, open that as a bitstream...
bitstream *dummy_bstr = stream_open(&cd_vals[1], (nelemts - 1) * sizeof(cd_vals[0]));
zfp_stream *dummy_zstr = zfp_stream_open(dummy_bstr);

// read the header so the stream's compression parameters are populated
zfp_field *dummy_fld = zfp_field_alloc();
zfp_read_header(dummy_zstr, dummy_fld, ZFP_HEADER_FULL);

// now, query stream for info you seek...
zfp_mode zm = zfp_stream_compression_mode(dummy_zstr);
double rate = zfp_stream_rate(dummy_zstr, zfp_field_dimensionality(dummy_fld));
double accuracy = zfp_stream_accuracy(dummy_zstr);
unsigned int precision = zfp_stream_precision(dummy_zstr);
zfp_field_free(dummy_fld);
zfp_stream_close(dummy_zstr);
stream_close(dummy_bstr);

@markcmiller86
Member

@brtnfld and @leighorf I am about halfway through implementing this, maybe a little further.

I just realized, however, I don't fully understand all the context(s) in which retrieving ZFP encoding params would be needed. Here are some of the ways I am thinking...

  • From within C/C++/Fortran code where the caller has a dataset id and wants to do something like...
        int H5Z_zfp_get_mode(hid_t dsid);
        double H5Z_zfp_get_accuracy(hid_t dsid);
        double H5Z_zfp_get_rate(hid_t dsid);
        int H5Z_zfp_get_precision(hid_t dsid);
        int H5Z_zfp_get_reversible(hid_t dsid);
        int H5Z_zfp_get_expert(hid_t dsid, unsigned int *minbits, unsigned int *maxbits, unsigned int *maxprec, int *minexp);
    
    where all of the above return a negative value when the requested parameter(s) are not available.
  • From C/C++/Fortran where the caller has retrieved the cd_vals data associated with the dataset header via something like H5Pget_filter_by_id2()
        int H5Z_zfp_get_mode(int nvals, unsigned int *cd_vals);
        double H5Z_zfp_get_accuracy(int nvals, unsigned int *cd_vals);
        double H5Z_zfp_get_rate(int nvals, unsigned int *cd_vals);
        int H5Z_zfp_get_precision(int nvals, unsigned int *cd_vals);
        int H5Z_zfp_get_reversible(int nvals, unsigned int *cd_vals);
        int H5Z_zfp_get_expert(int nvals, unsigned int *cd_vals, unsigned int *minbits, unsigned int *maxbits, unsigned int *maxprec, int *minexp);
    
  • From the shell command line, where someone is using h5ls or h5dump to print the cd_vals associated with a dataset, as in h5ls -vlrd | grep ZFP | decode_zfp_cdvals

@lindstro
Member

I would suggest essentially duplicating the current zfp API for querying these parameters. It's probably not a good idea for the H5Z_zfp functions to do this in a slightly different way.

Another possibility is to piggyback on the zfp_config struct available as of zfp 1.0.0. Unfortunately, functions for querying a config struct are currently missing. They will be added in the next release.

@markcmiller86
Member

You mean for querying an already compressed dataset?

I think if callers want to use the ZFP library interface, then all we should provide is a means to obtain a zfp_stream* object to use in those calls and they can just use them. In fact, that might be a better way to go since they have to link to ZFP either way to get that information.

Related to this, I just realized yesterday that H5Z-ZFP mode integers don't map 1:1 to ZFP's mode enums. For example, in H5Z-ZFP a mode of 3 is accuracy mode, whereas in ZFP it's 4.
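
To make the mismatch concrete, here is a minimal sketch of a translation helper between the two numbering schemes. The constants are local stand-ins with a DEMO_ prefix, not the real headers: only the accuracy values (3 in H5Z-ZFP, 4 in ZFP) come from this thread, and the remaining values are my understanding of H5Zzfp_plugin.h and zfp.h, so please check them against the versions you build with. The function names are hypothetical.

```c
/* Local stand-ins for the H5Z-ZFP mode integers (assumed values, not the real header). */
enum {
    DEMO_H5Z_ZFP_MODE_RATE       = 1,
    DEMO_H5Z_ZFP_MODE_PRECISION  = 2,
    DEMO_H5Z_ZFP_MODE_ACCURACY   = 3,
    DEMO_H5Z_ZFP_MODE_EXPERT     = 4,
    DEMO_H5Z_ZFP_MODE_REVERSIBLE = 5
};

/* Local stand-ins for ZFP's zfp_mode enum (assumed values, not the real header). */
enum {
    DEMO_ZFP_MODE_NULL            = 0,
    DEMO_ZFP_MODE_EXPERT          = 1,
    DEMO_ZFP_MODE_FIXED_RATE      = 2,
    DEMO_ZFP_MODE_FIXED_PRECISION = 3,
    DEMO_ZFP_MODE_FIXED_ACCURACY  = 4,
    DEMO_ZFP_MODE_REVERSIBLE      = 5
};

/* Translate a ZFP mode to the H5Z-ZFP numbering; returns -1 if there is no equivalent. */
static int demo_zfp_mode_to_h5z_zfp_mode(int zm)
{
    switch (zm) {
    case DEMO_ZFP_MODE_FIXED_RATE:      return DEMO_H5Z_ZFP_MODE_RATE;
    case DEMO_ZFP_MODE_FIXED_PRECISION: return DEMO_H5Z_ZFP_MODE_PRECISION;
    case DEMO_ZFP_MODE_FIXED_ACCURACY:  return DEMO_H5Z_ZFP_MODE_ACCURACY;
    case DEMO_ZFP_MODE_EXPERT:          return DEMO_H5Z_ZFP_MODE_EXPERT;
    case DEMO_ZFP_MODE_REVERSIBLE:      return DEMO_H5Z_ZFP_MODE_REVERSIBLE;
    default:                            return -1;
    }
}

/* Translate an H5Z-ZFP mode to the ZFP numbering; returns the null mode if unknown. */
static int demo_h5z_zfp_mode_to_zfp_mode(int hm)
{
    switch (hm) {
    case DEMO_H5Z_ZFP_MODE_RATE:       return DEMO_ZFP_MODE_FIXED_RATE;
    case DEMO_H5Z_ZFP_MODE_PRECISION:  return DEMO_ZFP_MODE_FIXED_PRECISION;
    case DEMO_H5Z_ZFP_MODE_ACCURACY:   return DEMO_ZFP_MODE_FIXED_ACCURACY;
    case DEMO_H5Z_ZFP_MODE_EXPERT:     return DEMO_ZFP_MODE_EXPERT;
    case DEMO_H5Z_ZFP_MODE_REVERSIBLE: return DEMO_ZFP_MODE_REVERSIBLE;
    default:                           return DEMO_ZFP_MODE_NULL;
    }
}
```

An explicit switch keeps the translation correct even if either numbering changes, which a fixed arithmetic offset would not.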

@lindstro
Member

> You mean for querying an already compressed dataset?

Well, yes, but more generally getting a zfp_config struct from a zfp_stream. The C++ compressed-array class API allows you to set the compression parameters of a zfp_stream by passing a zfp_config, e.g., via const_array::set_config(const zfp_config &config), but the high-level C API currently lacks functions for setting/getting zfp_stream parameters via zfp_config.

> I think if callers want to use the ZFP library interface, then all we should provide is a means to obtain a zfp_stream* object to use in those calls and they can just use them. In fact, that might be a better way to go since they have to link to ZFP either way to get that information.

True, that might be a more general approach. I don't know if there are any cases where you manipulate a zfp_stream but H5Z-ZFP ignores those changes, which might lead to unexpected behavior. The execution policy is one such setting. We should discuss how we want to support that and other zfp_stream settings going forward.

> Related to this, I just realized yesterday that H5Z-ZFP mode integers don't map 1:1 to ZFP's mode enums. For example, in H5Z-ZFP a mode of 3 is accuracy mode, whereas in ZFP it's 4.

I don't think there's much we can do about that now without breaking things.

@markcmiller86
Member

> It's probably not a good idea for the H5Z_zfp functions to do this in a slightly different way.

In this comment, were you basically speaking to how I proposed to handle the return values for error or n/a cases? If so, I agree.

@lindstro
Member

lindstro commented May 1, 2023

> > It's probably not a good idea for the H5Z_zfp functions to do this in a slightly different way.
>
> In this comment, were you basically speaking to how I proposed to handle the return values for error or n/a cases? If so, I agree.

Right. The zfp library already has those same functions (with different names, of course), so it would make sense for H5Z-ZFP to just wrap those and use the same parameters and return values.

@markcmiller86
Member

@leighorf I finally have a prototype implementation for this on branch feat-mcm86-04mar23-retrieve-zfp-params and wonder if you could take a look.

You can see an example of how it works for a dataset already written to a file here:

H5Z-ZFP/test/test_read.c

Lines 165 to 212 in 4ccd6aa

if (0 > (dsid = H5Dopen(fid, getds, H5P_DEFAULT))) ERROR(H5Dopen);
if (0 > (plid = H5Dget_create_plist(dsid))) ERROR(H5Dget_create_plist);
if (0 > H5Pget_zfp_mode(plid, &m)) ERROR(H5Pget_zfp_mode);
switch (m)
{
    case H5Z_ZFP_MODE_RATE:
    {
        double rate;
        H5Pget_zfp_rate(plid, &m, &rate);
        printf("Dataset \"%s\", was compressed with rate mode and rate=%g\n", getds, rate);
        return 0;
    }
    case H5Z_ZFP_MODE_ACCURACY:
    {
        double acc;
        H5Pget_zfp_accuracy(plid, &m, &acc);
        printf("Dataset \"%s\", was compressed with accuracy mode and accuracy=%g\n", getds, acc);
        return 0;
    }
    case H5Z_ZFP_MODE_PRECISION:
    {
        unsigned int prec;
        H5Pget_zfp_precision(plid, &m, &prec);
        printf("Dataset \"%s\", was compressed with precision mode and precision=%u\n", getds, prec);
        return 0;
    }
    case H5Z_ZFP_MODE_EXPERT:
    {
        unsigned int minbits;
        unsigned int maxbits;
        unsigned int maxprec;
        int minexp;
        H5Pget_zfp_expert(plid, &m, &minbits, &maxbits, &maxprec, &minexp);
        printf("Dataset \"%s\", was compressed with expert mode minbits=%u, maxbits=%u, maxprec=%u, minexp=%d\n",
            getds, minbits, maxbits, maxprec, minexp);
        return 0;
    }
    case H5Z_ZFP_MODE_REVERSIBLE:
    {
        int is_rev = 0;
        H5Pget_zfp_reversible(plid, &m, &is_rev);
        printf("Dataset \"%s\", was compressed with reversible mode and is_rev=%d\n", getds, is_rev);
        return 0;
    }
}
printf("Unable to determine ZFP compression parameters for dataset \"%s\"\n", getds);
return 1;

If the caller knows nothing, it must first query for mode and then based on that, query for remaining params. If you know mode, you can avoid having to query twice. It is an error to query for zfp parameters that do not match the mode. So, if mode is accuracy but precision is queried, that will generate an error.
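
The query protocol described above can be sketched in isolation, without HDF5 or ZFP. Everything below is a hypothetical stand-in: demo_controls_t loosely imitates the stored settings, demo_get imitates the variadic getter, and the DEMO_ constants are illustrative only. It is not the branch's implementation, just the mode-then-parameters contract.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical mode constants; 0 means "the caller does not know the mode yet". */
enum {
    DEMO_MODE_UNKNOWN   = 0,
    DEMO_MODE_RATE      = 1,
    DEMO_MODE_PRECISION = 2,
    DEMO_MODE_ACCURACY  = 3
};

/* Hypothetical stand-in for the settings recorded with a dataset. */
typedef struct {
    int mode;
    union {
        double rate;
        unsigned int prec;
        double acc;
    } details;
} demo_controls_t;

/* Returns 0 on success and -1 on error.
 * If *mode is DEMO_MODE_UNKNOWN, only the mode is reported back.
 * Otherwise *mode must match the stored mode, and the trailing
 * pointer argument receives the matching parameter. */
static int demo_get(const demo_controls_t *ctrls, int *mode, ...)
{
    va_list ap;
    int retval = 0;

    if (!ctrls || !mode)
        return -1;

    va_start(ap, mode);
    if (*mode == DEMO_MODE_UNKNOWN) {
        *mode = ctrls->mode;              /* first call: discover the mode */
    } else if (*mode != ctrls->mode) {
        retval = -1;                      /* query does not match the stored mode */
    } else {
        switch (*mode) {
        case DEMO_MODE_RATE:
            *va_arg(ap, double *) = ctrls->details.rate;
            break;
        case DEMO_MODE_PRECISION:
            *va_arg(ap, unsigned int *) = ctrls->details.prec;
            break;
        case DEMO_MODE_ACCURACY:
            *va_arg(ap, double *) = ctrls->details.acc;
            break;
        default:
            retval = -1;
            break;
        }
    }
    va_end(ap);
    return retval;
}
```

A caller that starts with mode set to 0 makes two calls: the first fills in the mode, the second passes a pointer of the matching type. Passing DEMO_MODE_PRECISION against accuracy-mode settings returns -1, which mirrors the mismatch error described above.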

The caller is responsible for obtaining the desired dataset's creation property list id and passing it to H5Pget_zfp_XXX().

The implementation handles every case: the property list is using bona fide HDF5 properties, the property list is using generic properties and the dataset has not yet been written, or the dataset has already been written.

@markcmiller86
Member

@brtnfld I am just pinging you on this issue in case you wanted to have a look at the new functions I am working towards to retrieve ZFP compression parameters from a dataset's creation property list...

H5Z-ZFP/src/H5Zzfp_props.c

Lines 135 to 382 in 5baa4b9

herr_t H5Pget_zfp(hid_t plist, int *mode, ...)
{
    static char const *_funcname_ = "H5Pget_zfp";
    static size_t ctrls_sz = sizeof(h5z_zfp_controls_t);
    unsigned int flags, fconfig;
    unsigned int cd_vals[10];
    size_t cd_nelmts = sizeof(cd_vals)/sizeof(cd_vals[0]);
    char fname[100];
    h5z_zfp_controls_t ctrls;
    va_list ap;
    herr_t retval = SUCCESS;

    /* start the va_list before any jump to done so that va_end is always paired */
    va_start(ap, mode);

    if (!mode)
        H5Z_ZFP_PUSH_AND_GOTO(H5E_ARGS, H5E_BADVALUE, FAIL, "mode argument must be non-NULL");

    if (0 >= H5Pisa_class(plist, H5P_DATASET_CREATE))
        H5Z_ZFP_PUSH_AND_GOTO(H5E_PLIST, H5E_BADTYPE, FAIL, "not a dataset creation property list class");

    if (0 < H5Pexist(plist, "zfp_controls"))
    {
        H5Pget(plist, "zfp_controls", &ctrls); /* should always succeed */

        if (!*mode)
        {
            *mode = ctrls.mode;
            goto done;
        }
        else if (*mode != ctrls.mode)
        {
            H5Z_ZFP_PUSH_AND_GOTO(H5E_ARGS, H5E_BADVALUE, FAIL, "ZFP mode query mismatch");
        }

        switch (*mode)
        {
            case H5Z_ZFP_MODE_RATE:
            {
                double *rate = va_arg(ap, double*);
                *rate = ctrls.details.rate;
                break;
            }
            case H5Z_ZFP_MODE_ACCURACY:
            {
                double *acc = va_arg(ap, double*);
                *acc = ctrls.details.acc;
                break;
            }
            case H5Z_ZFP_MODE_PRECISION:
            {
                unsigned int *prec = va_arg(ap, unsigned int*);
                *prec = ctrls.details.prec;
                break;
            }
            case H5Z_ZFP_MODE_EXPERT:
            {
                unsigned int *minbits = va_arg(ap, unsigned int*);
                *minbits = ctrls.details.expert.minbits;
                unsigned int *maxbits = va_arg(ap, unsigned int*);
                *maxbits = ctrls.details.expert.maxbits;
                unsigned int *maxprec = va_arg(ap, unsigned int*);
                *maxprec = ctrls.details.expert.maxprec;
                int *minexp = va_arg(ap, int*);
                *minexp = ctrls.details.expert.minexp;
                break;
            }
            case H5Z_ZFP_MODE_REVERSIBLE:
            {
                int *is_rev = va_arg(ap, int*);
                *is_rev = 1;
                break;
            }
            default:
            {
                H5Z_ZFP_PUSH_AND_GOTO(H5E_PLIST, H5E_BADVALUE, FAIL, "bad ZFP mode.");
                break;
            }
        }
    }
    else if (0 <= H5Pget_filter_by_id2(plist, H5Z_FILTER_ZFP, &flags, &cd_nelmts, cd_vals, sizeof(fname), fname, &fconfig))
    {
        /* is this property list pre- or post- modification from having been used in the filter */
        if (cd_nelmts > 0 && cd_vals[0] <= H5Z_ZFP_MODE_REVERSIBLE)
        {
            if (!*mode)
            {
                *mode = (int) cd_vals[0];
                goto done;
            }
            else if (*mode != (int) cd_vals[0])
            {
                H5Z_ZFP_PUSH_AND_GOTO(H5E_ARGS, H5E_BADVALUE, FAIL, "ZFP mode query mismatch");
            }

            switch (*mode)
            {
                case H5Z_ZFP_MODE_RATE:
                {
                    double *rate = va_arg(ap, double*);
                    *rate = H5Pget_zfp_rate_cdata(cd_nelmts, cd_vals);
                    break;
                }
                case H5Z_ZFP_MODE_ACCURACY:
                {
                    double *acc = va_arg(ap, double*);
                    *acc = H5Pget_zfp_accuracy_cdata(cd_nelmts, cd_vals);
                    break;
                }
                case H5Z_ZFP_MODE_PRECISION:
                {
                    unsigned int *prec = va_arg(ap, unsigned int*);
                    *prec = H5Pget_zfp_precision_cdata(cd_nelmts, cd_vals);
                    break;
                }
                case H5Z_ZFP_MODE_EXPERT:
                {
                    unsigned int *minbits = va_arg(ap, unsigned int*);
                    unsigned int *maxbits = va_arg(ap, unsigned int*);
                    unsigned int *maxprec = va_arg(ap, unsigned int*);
                    int *minexp = va_arg(ap, int*);
                    H5Pget_zfp_expert_cdata(cd_nelmts, cd_vals, (*minbits), (*maxbits), (*maxprec), (*minexp));
                    break;
                }
                case H5Z_ZFP_MODE_REVERSIBLE:
                {
                    int *is_rev = va_arg(ap, int*);
                    *is_rev = 1;
                    break;
                }
                default:
                {
                    H5Z_ZFP_PUSH_AND_GOTO(H5E_PLIST, H5E_BADVALUE, FAIL, "bad ZFP mode.");
                    break;
                }
            }
        }
        else /* cd_vals for post-modified by filter */
        {
            zfp_mode zm;
            /* stream_open takes a size in bytes, not a count of cd_vals entries */
            bitstream *dummy_bstr = stream_open(&cd_vals[1], (cd_nelmts-1) * sizeof(cd_vals[0]));
            zfp_stream *dummy_zstr = zfp_stream_open(dummy_bstr);
            zfp_field *zfld = zfp_field_alloc();

            zfp_read_header(dummy_zstr, zfld, ZFP_HEADER_FULL);

            /* now, query stream for info we seek... */
            zm = zfp_stream_compression_mode(dummy_zstr);

            if (!*mode)
            {
                *mode = zfp_mode_to_h5z_zfp_mode(zm);
                goto done;
            }
            else if (*mode != zfp_mode_to_h5z_zfp_mode(zm))
            {
                H5Z_ZFP_PUSH_AND_GOTO(H5E_ARGS, H5E_BADVALUE, FAIL, "ZFP mode query mismatch");
            }

            switch (*mode)
            {
                case H5Z_ZFP_MODE_RATE:
                {
                    double *rate = va_arg(ap, double*);
                    *rate = zfp_stream_rate(dummy_zstr, 1);
                    break;
                }
                case H5Z_ZFP_MODE_ACCURACY:
                {
                    double *acc = va_arg(ap, double*);
                    *acc = zfp_stream_accuracy(dummy_zstr);
                    break;
                }
                case H5Z_ZFP_MODE_PRECISION:
                {
                    unsigned int *prec = va_arg(ap, unsigned int*);
                    *prec = zfp_stream_precision(dummy_zstr);
                    break;
                }
                case H5Z_ZFP_MODE_EXPERT:
                {
                    unsigned int *minbits = va_arg(ap, unsigned int*);
                    unsigned int *maxbits = va_arg(ap, unsigned int*);
                    unsigned int *maxprec = va_arg(ap, unsigned int*);
                    int *minexp = va_arg(ap, int*);
                    zfp_stream_params(dummy_zstr, minbits, maxbits, maxprec, minexp);
                    break;
                }
                case H5Z_ZFP_MODE_REVERSIBLE:
                {
                    int *is_rev = va_arg(ap, int*);
                    *is_rev = 1;
                    break;
                }
                default:
                {
                    H5Z_ZFP_PUSH_AND_GOTO(H5E_PLIST, H5E_BADVALUE, FAIL, "bad ZFP mode.");
                    break;
                }
            }

            zfp_field_free(zfld);
            zfp_stream_close(dummy_zstr);
            stream_close(dummy_bstr);
        }
    }
    else
    {
        H5Z_ZFP_PUSH_AND_GOTO(H5E_PLIST, H5E_CANTGET, FAIL, "ZFP filter properties");
    }

done:
    va_end(ap);
    return retval;
}

herr_t H5Pget_zfp_mode(hid_t plist, int *mode)
{
    *mode = 0;
    return H5Pget_zfp(plist, mode);
}

herr_t H5Pget_zfp_rate(hid_t plist, int *mode, double *rate)
{
    return H5Pget_zfp(plist, mode, rate);
}

herr_t H5Pget_zfp_precision(hid_t plist, int *mode, unsigned int *prec)
{
    return H5Pget_zfp(plist, mode, prec);
}

herr_t H5Pget_zfp_accuracy(hid_t plist, int *mode, double *acc)
{
    return H5Pget_zfp(plist, mode, acc);
}

herr_t H5Pget_zfp_expert(hid_t plist, int *mode,
    unsigned int *minbits, unsigned int *maxbits,
    unsigned int *maxprec, int *minexp)
{
    return H5Pget_zfp(plist, mode, minbits, maxbits, maxprec, minexp);
}

herr_t H5Pget_zfp_reversible(hid_t plist, int *mode, int *is_rev)
{
    return H5Pget_zfp(plist, mode, is_rev);
}

@lindstro
Member

@markcmiller86 Just to make sure I understand how this is supposed to work: since the caller presumably does not already know what the mode is, the caller should first call H5Pget_zfp to query the mode and then make a second call that supplies the corresponding pointers to compression parameters?

As an alternative, zfp 1.0.0 supports zfp_config, which would allow you to make a single call to get all this information. zfp_config is not available before 1.0.0, but it might be nice to have H5Pget_zfp_config() as an alternative way of querying the mode and parameters when H5Z-ZFP is built with zfp 1.0.0.
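
As a rough illustration of the single-call idea, the sketch below uses a tagged union that is only loosely modeled on my understanding of zfp_config (a mode plus a union of per-mode arguments). The demo_config type, the DEMO_ constants, and demo_describe_config are all hypothetical names for this sketch. It does not use the real zfp headers, and H5Pget_zfp_config() does not exist yet.

```c
#include <stdio.h>

/* Hypothetical mode tags for this sketch (not the real zfp_mode enum). */
typedef enum {
    DEMO_CFG_NULL = 0,
    DEMO_CFG_EXPERT,
    DEMO_CFG_FIXED_RATE,
    DEMO_CFG_FIXED_PRECISION,
    DEMO_CFG_FIXED_ACCURACY,
    DEMO_CFG_REVERSIBLE
} demo_cfg_mode;

/* One struct carries both the mode and whichever parameters that mode uses. */
typedef struct {
    demo_cfg_mode mode;
    union {
        double rate;
        unsigned int precision;
        double tolerance;
        struct {
            unsigned int minbits, maxbits, maxprec;
            int minexp;
        } expert;
    } arg;
} demo_config;

/* Format a config into buf; returns the snprintf result, or -1 for an invalid mode. */
static int demo_describe_config(const demo_config *cfg, char *buf, size_t len)
{
    switch (cfg->mode) {
    case DEMO_CFG_FIXED_RATE:
        return snprintf(buf, len, "rate=%g", cfg->arg.rate);
    case DEMO_CFG_FIXED_PRECISION:
        return snprintf(buf, len, "precision=%u", cfg->arg.precision);
    case DEMO_CFG_FIXED_ACCURACY:
        return snprintf(buf, len, "accuracy=%g", cfg->arg.tolerance);
    case DEMO_CFG_EXPERT:
        return snprintf(buf, len, "minbits=%u maxbits=%u maxprec=%u minexp=%d",
                        cfg->arg.expert.minbits, cfg->arg.expert.maxbits,
                        cfg->arg.expert.maxprec, cfg->arg.expert.minexp);
    case DEMO_CFG_REVERSIBLE:
        return snprintf(buf, len, "reversible");
    default:
        return -1;
    }
}
```

With a result type like this, the caller never has to know the mode in advance: one query fills the struct, and the caller switches on the mode afterwards.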

markcmiller86 removed this from the 1.1.1 milestone Aug 22, 2023

@lindstro
Member

In addition to querying compression parameter settings through the library, it would be nice to have a command-line tool that decodes cd_values, i.e., that performs the inverse of what print_h5repack_farg does.
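
A minimal sketch of the decoding half of such a tool is below, covering only cd_values as they look before the filter rewrites them (the form passed to h5repack). The layout is an assumption based on my reading of the H5Pset_zfp_*_cdata macros: entry 0 holds the mode, entry 1 is unused, and the parameters start at entry 2, with a double split into a low word followed by a high word as written on a little-endian machine. The demo_ names are hypothetical, the mode integers other than accuracy (3) are assumed, and decoding the post-filter form would still need the zfp library to read the stream header.

```c
#include <stdint.h>
#include <string.h>

/* Assumed H5Z-ZFP mode integers; only accuracy == 3 is stated in this thread. */
enum {
    DEMO_MODE_RATE       = 1,
    DEMO_MODE_PRECISION  = 2,
    DEMO_MODE_ACCURACY   = 3,
    DEMO_MODE_EXPERT     = 4,
    DEMO_MODE_REVERSIBLE = 5
};

typedef struct {
    int mode;
    double rate;
    double accuracy;
    unsigned int precision;
    unsigned int minbits, maxbits, maxprec;
    int minexp;
} demo_params_t;

/* Rebuild a double from two 32-bit words, low word first
 * (assumes the values were produced on a little-endian machine). */
static double demo_words_to_double(unsigned int lo, unsigned int hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | (uint64_t)lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Decode pre-filter cd_values into out; returns 0 on success, -1 on bad input. */
static int demo_decode_cd_values(size_t n, const unsigned int *cd, demo_params_t *out)
{
    if (!cd || !out || n < 1)
        return -1;

    memset(out, 0, sizeof *out);
    out->mode = (int)cd[0];

    switch (out->mode) {
    case DEMO_MODE_RATE:
        if (n < 4) return -1;
        out->rate = demo_words_to_double(cd[2], cd[3]);
        return 0;
    case DEMO_MODE_ACCURACY:
        if (n < 4) return -1;
        out->accuracy = demo_words_to_double(cd[2], cd[3]);
        return 0;
    case DEMO_MODE_PRECISION:
        if (n < 3) return -1;
        out->precision = cd[2];
        return 0;
    case DEMO_MODE_EXPERT:
        if (n < 6) return -1;
        out->minbits = cd[2];
        out->maxbits = cd[3];
        out->maxprec = cd[4];
        out->minexp  = (int)cd[5];   /* stored as unsigned, read back as signed */
        return 0;
    case DEMO_MODE_REVERSIBLE:
        return 0;
    default:
        return -1;
    }
}
```

For example, the words 858993459 and 1068708659 are the low and high halves of the IEEE 754 double 0.075, so under the assumed layout the values 3, 0, 858993459, 1068708659 decode to accuracy mode with a tolerance of 0.075.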
