-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CIF writer #477
Add CIF writer #477
Conversation
Unused now
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thorough testing 👍
Some minor questions:
src/scippneutron/io/cif.py
Outdated
to the file. | ||
""" | ||
self._pairs = dict(pairs) if pairs is not None else {} | ||
self._comment = _encode_non_ascii(comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call setter to avoid duplication?
src/scippneutron/io/cif.py
Outdated
self._columns = {} | ||
for key, column in columns.items(): | ||
self[key] = column | ||
self._comment = _encode_non_ascii(comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use setter?
sep = ( | ||
'\n' | ||
if any(';' in item for row in formatted_values for item in row) | ||
else ' ' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this? Maybe also in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It's about handling multi-line strings as tested in test_write_block_single_loop_multi_line_string
:
loop_
_diffrn.ambient_environment
_diffrn.id
; water
and some salt
;
123
sulfur
x6a
src/scippneutron/io/cif.py
Outdated
self._name = '' | ||
self.name = name | ||
self._content = _convert_input_content(content) if content is not None else [] | ||
self._comment = _encode_non_ascii(comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use setter?
>>> da = sc.DataArray(intensity, coords={'tof': tof}) | ||
>>> block.add_reduced_powder_data(da) | ||
""" | ||
self.add(_make_reduced_powder_loop(data, comment=comment)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a free helper function anyway, given that this wrapper does nearly nothing, is it even worth having it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only added it for convenience since we use method chaining in other interfaces. But I agree that it is not a strong argument here because most 'interesting' blocks won't be created and written in a single expression. Should I remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I think having fewer functions and methods is good, but if you prefer keeping it I am fine with that as well.
if value.variance is not None: | ||
without_unit = sc.scalar(value.value, variance=value.variance) | ||
s = f'{without_unit:c}' | ||
else: | ||
s = str(value.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never thought about this before, but is formatting depending on the system locale, i.e., will people in some Germanic countries (like Sweden) get a comma instead of a dot in floating-point numbers? If so, we should probably avoid this, i.e., ensure we always write with dot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is not the case. I just changed my system's number formatting to German and I still get periods, not commas.
Doing some cursory searching, Python's '{}'
uses periods as separators, unless specified otherwise in the braces. C++ uses the 'C locale' by default. So unless someone calls setlocale
, we're fine. But this is global, so it's possible that another library that we load into Python does this. But even then, I'm unsure if C++'s streams respect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
res | ||
== '''data_datetime | ||
|
||
_audit.creation_date 2023-12-01T15:09:45+00:00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc timezone info was deprecated in NumPy and Scipp does not support it, i.e., everything is UTC. But here you test datetime
, so we get the `+00:00 when formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is about support for datetime
objects so that we don't require wrapping everything in Scipp Variables.
It is actually unfortunate that we don't print the timezone info with Scipp objects as the CIF dictionary kind of hints that they should be present.
As shown in the docstring: cal = sc.DataArray(
sc.array(dims=['cal'], values=[3.4, 0.2]),
coords={'power': sc.array(dims=['cal'], values=[0, 1])},
) which corresponds to
Yes. Are you asking because of the
I'd very much like to. But I don't think CIF has any way of doing so. Does it?
It will be whatever the person running the code decides. For autoreduction, we can define authorship ourselves. But otherwise, this will be up to the user. (Same as for SciCat datasets. This is one reason for a unified representation, a la #473.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good and produces well structured CIF files for instrumental/experiment data.
.. code-block:: | ||
|
||
#\\#CIF_1.1 | ||
data_example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All your keywords are of CIF 2.0 type (e.g. diffrn_radiation.probe
) as opposed to the 1.1 standard (diffrn_radiation_probe
)
This should likely say #\\#CIF_2.0
as it does in
https://github.com/COMCIFS/Powder_Dictionary/blob/master/cif_pow.dic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, this depends on the dictionary, not the format. The 1.1 format already allowed periods in names: https://www.iucr.org/resources/cif/spec/version1.1/cifsyntax (see the grammar). This is one reason why I made it so that it writes the dict version into the file.
But I can change it to use the 2.0 header. The files don't use any CIF 2 features but should be compatible with it. The question is, do the readers care or would they reject the file with this change?
Also, CIF2 allows a byte order mark, should I add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I was just a bit concerned about versioning. 1.1 seems fine and is surely more prevalent than 2.0.
Even the 2.0 page ( https://www.iucr.org/resources/cif/cif2 ) calls it an alternative format, not replacement.
Let's stick with 1.1, since as you say, it does have support for the new syntax.
Everything looks good, at least for the moment. When we have more TOF support in EasyDiffraction, we can check if something needs to be extended or changed. A couple of comments below:
The simplest way is to use only one diffraction measurement in a data block. Otherwise, the
I haven't found any built-in ways to do this in CIF. Perhaps we could create a loop with custom keys for the file description and urls to SciCat or GitHub? |
Thanks for you comments, @AndrewSazonov! We should probably revisit these questions when you have a reader in easyDiffraction. Because we can then try out different solutions so see what works. |
Fixes scipp/ess#212
@celinedurniak, @AndrewSazonov any feedback is welcome!
This is a first version of a CIF writer that is still fairly low-level. But it is ready for testing in workflows.
Below there is an example for the POWGEN reduction in our docs. It is possible to add wrappers for encoding the remaining metadata. But I'd wait for #473 before doing that.
Note that the tof coord is basically made up because we have no proper focussing implementation.