Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CIF writer #477

Merged
merged 27 commits into from
Jan 22, 2024
Merged

Add CIF writer #477

merged 27 commits into from
Jan 22, 2024

Conversation

jl-wynen
Copy link
Member

@jl-wynen jl-wynen commented Dec 7, 2023

Fixes scipp/ess#212

@celinedurniak, @AndrewSazonov any feedback is welcome!

This is a first version of a CIF writer that is still fairly low-level. But it is ready for testing in workflows.

Below there is an example for the POWGEN reduction in our docs. It is possible to add wrappers for encoding the remaining metadata. But I'd wait for #473 before doing that.

Note that the tof coord is basically made up because we have no proper focussing implementation.

import scipp as sc
from scippneutron.io import cif
from datetime import datetime, timezone

da_dspacing = sc.io.load_hdf5("data/powgen_reduced_dspacing.h5")

intensity = da_dspacing.data
dspacing = sc.midpoints(da_dspacing.coords['dspacing']).to(unit='Å', copy=False)
tof = dspacing.copy(deep=False)
tof.unit = 'us'
reduced_data = sc.DataArray(
    intensity,
    coords={'tof': tof}
).rename_dims(dspacing='tof')

cal = sc.DataArray(
    sc.array(dims=['cal'], values=[1.0]),
    coords={'power': sc.array(dims=['cal'], values=[1])}
)

now = datetime.now(tz=timezone.utc)
block = cif.Block(
    "LaB6",
    [
        {
            'diffrn_radiation.probe': 'neutron',
            'diffrn_source.beamline': 'POWGEN',
            'diffrn_source.device': 'spallation',
            'diffrn_source.facility': 'Spallation Neutron Source',
        },
        {
            'pd_proc.info_datetime': now,
            'computing.diffrn_reduction': 'essdiffraction v0',
            'pd_proc.info_data_reduction': '''Normalized by proton charge
Normalized by Vanadium run 4866
Calibrated using FERNS_d4832_2011_08_24''',
        },
        {
            'audit.creation_date': now,
            'audit_contact_author.name': 'Jan-Lukas Wynen',
            'audit_contact_author.id_orcid': 'https://orcid.org/0000-0002-3761-3201',
            'audit_contact_author.email': '[email protected]',
        },
    ],
)
block.add_powder_calibration(cal, comment='This calibration was made up')
block.add_reduced_powder_data(reduced_data)

cif.save_cif('data/powgen_reduced.cif', block)

Unused now
@celinedurniak
Copy link

  • What will be the final structure of the cal DataArray (with DIFA, DIFC...) ?
  • Will the versioning of essdiffraction use the same convention as Scipp (i.e., month.year)?
  • Will you provide links to the other files required for reduction, like Vanadium, sample container, empty instrument and the reduction script or Jupyter notebook...?
  • Will the author name be one of the Scipp developers or the PI of the related proposal or the team taken from the proposal (PI + local contact + collaborators)?

Copy link
Member

@SimonHeybrock SimonHeybrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough testing 👍

Some minor questions:

to the file.
"""
self._pairs = dict(pairs) if pairs is not None else {}
self._comment = _encode_non_ascii(comment)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call setter to avoid duplication?

self._columns = {}
for key, column in columns.items():
self[key] = column
self._comment = _encode_non_ascii(comment)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use setter?

Comment on lines +277 to +281
sep = (
'\n'
if any(';' in item for row in formatted_values for item in row)
else ' '
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this? Maybe also in a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It's about handling multi-line strings as tested in test_write_block_single_loop_multi_line_string:

loop_
_diffrn.ambient_environment
_diffrn.id
; water
and some salt
;
123
sulfur
x6a

self._name = ''
self.name = name
self._content = _convert_input_content(content) if content is not None else []
self._comment = _encode_non_ascii(comment)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use setter?

>>> da = sc.DataArray(intensity, coords={'tof': tof})
>>> block.add_reduced_powder_data(da)
"""
self.add(_make_reduced_powder_loop(data, comment=comment))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a free helper function anyway, given that this wrapper does nearly nothing, is it even worth having it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added it for convenience since we use method chaining in other interfaces. But I agree that it is not a strong argument here because most 'interesting' blocks won't be created and written in a single expression. Should I remove?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think having fewer functions and methods is good, but if you prefer keeping it I am fine with that as well.

src/scippneutron/io/cif.py Show resolved Hide resolved
Comment on lines +573 to +577
if value.variance is not None:
without_unit = sc.scalar(value.value, variance=value.variance)
s = f'{without_unit:c}'
else:
s = str(value.value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never thought about this before, but is formatting depending on the system locale, i.e., will people in some Germanic countries (like Sweden) get a comma instead of a dot in floating-point numbers? If so, we should probably avoid this, i.e., ensure we always write with dot?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is not the case. I just changed my system's number formatting to German and I still get periods, not commas.
Doing some cursory searching, Python's '{}' uses periods as separators, unless specified otherwise in the braces. C++ uses the 'C locale' by default. So unless someone calls setlocale, we're fine. But this is global, so it's possible that another library that we load into Python does this. But even then, I'm unsure if C++'s streams respect it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

res
== '''data_datetime

_audit.creation_date 2023-12-01T15:09:45+00:00
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc timezone info was deprecated in NumPy and Scipp does not support it, i.e., everything is UTC. But here you test datetime, so we get the `+00:00 when formatting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is about support for datetime objects so that we don't require wrapping everything in Scipp Variables.
It is actually unfortunate that we don't print the timezone info with Scipp objects as the CIF dictionary kind of hints that they should be present.

@jl-wynen
Copy link
Member Author

  • What will be the final structure of the cal DataArray (with DIFA, DIFC...) ?

As shown in the docstring:

cal = sc.DataArray(
    sc.array(dims=['cal'], values=[3.4, 0.2]),
    coords={'power': sc.array(dims=['cal'], values=[0, 1])},
)

which corresponds to tzero = 3.4, DIFC = 0.2.
This is per data block, i.e., for one group of pixels that are focussed in the same way. I don't know yet how to handle focussing into multiple groups because that will need multiple data blocks.

* Will the versioning of `essdiffraction` use the same convention as Scipp (i.e., month.year)?

Yes. Are you asking because of the computing.diffrn_reduction field in the example? If so, we still need to decide how to encode software because there will be multiple programs that need to be listed.

* Will you provide links to the other files required for reduction, like Vanadium, sample container, empty instrument and the reduction script or Jupyter notebook...?

I'd very much like to. But I don't think CIF has any way of doing so. Does it?

* Will the author name be one of the Scipp developers or the PI of the related proposal or the team taken from the proposal (PI + local contact + collaborators)?

It will be whatever the person running the code decides. For autoreduction, we can define authorship ourselves. But otherwise, this will be up to the user. (Same as for SciCat datasets. This is one reason for a unified representation, a la #473.)

Copy link

@rozyczko rozyczko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good and produces well structured CIF files for instrumental/experiment data.

.. code-block::

#\\#CIF_1.1
data_example

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All your keywords are of CIF 2.0 type (e.g. diffrn_radiation.probe) as opposed to the 1.1 standard (diffrn_radiation_probe)
This should likely say #\\#CIF_2.0 as it does in
https://github.com/COMCIFS/Powder_Dictionary/blob/master/cif_pow.dic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, this depends on the dictionary, not the format. The 1.1 format already allowed periods in names: https://www.iucr.org/resources/cif/spec/version1.1/cifsyntax (see the grammar). This is one reason why I made it so that it writes the dict version into the file.
But I can change it to use the 2.0 header. The files don't use any CIF 2 features but should be compatible with it. The question is, do the readers care or would they reject the file with this change?
Also, CIF2 allows a byte order mark, should I add it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I was just a bit concerned about versioning. 1.1 seems fine and is surely more prevalent than 2.0.
Even the 2.0 page ( https://www.iucr.org/resources/cif/cif2 ) calls it an alternative format, not replacement.

Let's stick with 1.1, since as you say, it does have support for the new syntax.

@AndrewSazonov
Copy link

Everything looks good, at least for the moment. When we have more TOF support in EasyDiffraction, we can check if something needs to be extended or changed. A couple of comments below:

  • What will be the final structure of the cal DataArray (with DIFA, DIFC...) ?

As shown in the docstring:

cal = sc.DataArray(
    sc.array(dims=['cal'], values=[3.4, 0.2]),
    coords={'power': sc.array(dims=['cal'], values=[0, 1])},
)

which corresponds to tzero = 3.4, DIFC = 0.2. This is per data block, i.e., for one group of pixels that are focussed in the same way. I don't know yet how to handle focussing into multiple groups because that will need multiple data blocks.

The simplest way is to use only one diffraction measurement in a data block. Otherwise, the diffractogram_id label can be used to identify the diffraction measurement to which the data presented in the PD_DATA (PD_CALC, PD_MEAS, PD_PROC) category belong to. The same label is also available in the PD_CALIB_D_TO_TOF category to identify the diffractogram to which the calibration relates.

* Will you provide links to the other files required for reduction, like Vanadium, sample container, empty instrument and the reduction script or Jupyter notebook...?

I'd very much like to. But I don't think CIF has any way of doing so. Does it?

I haven't found any built-in ways to do this in CIF. Perhaps we could create a loop with custom keys for the file description and urls to SciCat or GitHub?

@jl-wynen
Copy link
Member Author

Thanks for you comments, @AndrewSazonov! We should probably revisit these questions when you have a reader in easyDiffraction. Because we can then try out different solutions so see what works.

@jl-wynen jl-wynen merged commit e5cd680 into main Jan 22, 2024
7 checks passed
@jl-wynen jl-wynen deleted the cif-writer branch January 22, 2024 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Powder: Create output CIF
5 participants