-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Commits an alternative Zeiss CZI Reader #4092
base: develop
Are you sure you want to change the base?
Conversation
@NicoKiaru, I included this PR in the full CI test suite last night, though the initialisation of the tests failed with a number of different exceptions: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastFailedBuild/console I believe the below are all of the unique exceptions that are see: NullPointerException: null:
NullPointerException: null:
IndexOutOfBoundsException: Index 0 out of bounds for length 0:
IOException: ZISRAWDIRECTORY segment expected, found ZISRAWFILE instead:
ArithmeticException: / by zero:
EOFException: Attempting to read beyond end of file.:
IllegalArgumentException: No dimension C found:
|
Ok, 35 files not working, and I cound a bit more unique errors - some of them may be multiple (arithmetic exception) because there's no stack trace:
So, it's not perfect but, by looking at these erros, it does not seem impossible. That leaves of course the crucial question of accessing similar files for me. Are there any of these which could be shared ? I see some Axiocam issue - these kind of files I could try to generate with one of our systems. However there are some PALM dataset that I could not access and I see that they cause errors. The data are from 2012, so for these sort of files I'll clearly have a hard time finding a new sample file. |
I will have to take some time to go through the datasets and see if there is any way we can provide you with a way to reproduce and test, it may take me a few weeks to do so though. |
Don't you think there's a possibility to directly share these files ? I will not use them for anything else than fixing bugs, and will not updload them publicly anywhere. |
I know you're not super keen on changing the reader, but without the possibility for me to get the failing image, I just can't move forward. I'd just like to point out that these issues: (and probably these ones) are not happenning in the quick start reader. I do not know what's the best way forward, but either:
|
Hi Guys, Thank you so much for working on this issue. FWIW, we went to a lot of trouble to ensure that this file we uploaded (https://zenodo.org/record/8423633) as an example is completely free of I.P and we are happy for it to be public, part of your test suite etc.... |
Hello @JavascriptMick , I'll try to answer in the forum, but I think that what you request is a new feature, and it's not a bug of the reader. I think your file behaves correctly in bio-formats. There's no way to stitch all scenes as a single image in the reader currently (only all tiles from a scene are stitched). And even if it was, it's a problem on the OMERO side... How to deal with a czi file which can be opened in two different ways ? That breaks somehow the 'immutability' concept of what is an image raw data. |
@NicoKiaru unfortunately the sample files that are currently failing the tests are from files provided private that we are able to share. I realise that it will make it very difficult to move forward with resolving the failures. As a next step I will try and spend some time reviewing the PR over the next week to see if I can resolve the initial failures and unlock the next steps. Thanks to @JavascriptMick for making that sample data publicly available, that is always a tremendous help. For reference the relevant forum post for that data is https://forum.image.sc/t/zoom-from-overview-to-detailed-scan-for-imported-czi-files/85002/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Michael Dausmann from CMRI here. I raised an issue in the forums and I think maybe this PR is a result so I spent some time to do a review. Hope you don't mind. I was only able to pickout things that are maybe mistakes rather than really review the structure of what is done here. Thanks again for looking at our issue.
cacheLock.lock(); | ||
subBlockLRUCache.clear(); | ||
subBlocksCurrentlyLoading.clear(); | ||
cacheLock.unlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be in a finally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Notes for me later:
- https://stackoverflow.com/questions/4664717/how-would-we-use-try-finally-for-lock-lock-and-lock-unlock
- check the other parts of the code where this lock is involved
lut[1][i] = (byte) (greenMax * (i / 255.0)); | ||
lut[2][i] = (byte) (blueMax * (i / 255.0)); | ||
} | ||
public static boolean allowAutoStitch = false; // TODO CHANGE THIS TO METADATA OPTIONS! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the relationship between 'allowAutoStitch' and 'allowAutoStitching' ? seems like there might be room for confusion...should this be allowAutoStitchingDefault?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. At minimum this should be private. I think I let this field to simplify testing, but that should be changed.
int redOffset = bpp * 2; | ||
int index = 0; | ||
int nloops=buf.length/pixel; | ||
for (int i=0; i<nloops; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't appear to use i at all, is this on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, there's probably a way to loop over index with the proper step
|
||
core.add(core_i); | ||
core_i.orderCertain = true; | ||
core_i.dimensionOrder = "XYCZT"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this order seems to disagree with other areas including 'priority' which go XYZCT, is this on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure I understand. Could you elaborate ? Which other areas ?
ByteArrayInputStream s = | ||
new ByteArrayInputStream(xml.getBytes(Constants.ENCODING)); | ||
root = parser.parse(s).getDocumentElement(); | ||
s.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finally?
s.close(); | ||
public Length getMinPosYInMicrons() { | ||
if (pos.size()==0) { | ||
return null;//new Length(0, UNITS.MICROMETER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment deliberately left in place or the null is for debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh!! Good catch... Maybe that's a reason why we get some NPE in the test files. But I'm not sure this case should ever happen. Logging an error would probably be the first thing to test
@JavascriptMick thanks for the modifications. This work is on hold! I'll try not to forget your comments, but maybe the best would be to add them to the other repo where this reader currently is quick-start-czi-reader (biop github, sorry I have no time to put the link now) |
Conflicting PR. Removed from build BIOFORMATS-push#71. See the console output for more details.
|
I had started investigating the initial test failures last week but have not yet got them all resolved. Once I have more progress I will likely push some small fixes to this PR. I will be away the rest of the week so this will likely be next week, excluding the PR from CI again until next week. |
Small changes to fix some initFile failures
zStep is always initialised as micrometer so no need to attempt unit conversion
Last 2 commits have reduced the number of test failures (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastFailedBuild/console) Some of the remaining failures seem to be related to PALM data. Temporarily excluding for now, I will try to get back to looking at these remaining failures the week of the 27th. |
I haven't managed to get the 2 failing samples completely working yet, both are multifile datasets. I have pushed a few quick commits that should at least avoid the exceptions in those files and allow the files to initizatialize. Hopefully that will at least let the test suite run and give us an idea of the full test results. These commits can likely be reverted back out after.. |
Attempt to close open file handles
Made some progress though still seeing a failure for one dateset due to open file handles (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastFailedBuild/console). I haven't been able to reproduce this with the single dataset in isolation but I've pushed a commit which I hope will tackle it. |
I had a quick look, and did not sort by thread, so I guess the order is a bit random. But, in brief, here are some typical issues I found:
I think many problematic files are provided by Zeiss:
So maybe I could ask them to put them on Zenodo and have a look. |
Yeah the order is a bit random and can be tricky to parse through. I will put together a full list of the failures and associated files over the next day or two so we can better gauge the scope of files impacted and see what the next steps are. I am going to temporarily exclude this from the CI for now so that we can test some other PR's. |
@NicoKiaru |
Permanent link to the build: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/52221/consoleFull |
@dgault |
@swg08 thanks! (I did not list extensively all the failing files, but that's already something to work on) Can you enlighten me regarding 40_Dual.czi ? The first (and probably other) subblock has a compression value of 121. I have no idea what it refers to, and also the pixel data, when decompressed, gives 3 times the expected size in bytes. That could makes sense if it was rgb, but not even: there's only two channels. Only my reader fails on it (Zen 3.9 and the current bf reader works). The file seems to be 3 times bigger than needed |
I have put together a list fo all of the failing tests and associated files here (some of the failures that threw exceptions may not have been captured). Quite a number of the failures relate to datasets which are already public (anything starting with @swg08 I will look into getting you a list of the Zeiss files to review and follow up on the process for potentially making those public |
Hi @NicoKiaru, apologies for the delay response as I had been away for a few weeks. I have not heard any updates as of yet but I will follow up to see what the status is. |
Sure, no problem. I'm put this work on hold for a while, but I may have some time before ELMI 2024 to get back to it and understand a bit more what's going on with the differences in the gsheets. |
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/incorrect-pixel-scaling-on-czi-image-series/87149/11 |
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: |
@NicoKiaru, a heads up that with David's departure there's going to be reduced capacity for reviews & testing of substantial changes like this one. Happy to use this PR to periodically swap testing on for you, but for transparency, it's unlikely that we will be able to move forward with the reader changes in the immediate future.
P.S.: @swg08, would you like to additionally submit that to https://zenodo.org/communities/bio-formats ? |
Thanks for the update, @joshmoore, and best wishes to you, @dgault! Regarding this PR (and others, like this one), I understand that resource constraints are a significant issue. However, another major challenge is the strict non-sharing policy on private files, which hinders meaningful contributions from outside the OME team. Bio-Formats is so central to the community that it should be leveraging its position to encourage more external contributions. Unfortunately, this isn't happening as effectively as it could. With the adoption of NGFF, where Bio-Formats is doing much of the core conversion work, I'm still puzzled by the lack of a solution to the private files policy. This remains a major blocking point. |
To be clear, the policy isn't "we won't share testing data". The policy is: "we must have permission to share data". This is why all image.sc posts now ask for an upload to, e.g., https://zenodo.org/communities/bio-formats, so that the permissions (along with a DOI) are explicitly defined up front. (Of course, a @BioImage-Archive link would be equally as welcome.) This is very much mea culpa from our side, but we've been collecting data since before "FAIR" was a term. All the more kudos to @zenodo et al. for providing such invaluable resources that make this easier today. We welcome all help to retroactively get permission to share the many filesets we have (as @swg08 did above), but as you can imagine, that's again a large job. |
Yes the future looks better, and Zenodo is awesome, but what about these old files ?
I don't know honestly.
But how could we even help ? We don't know what's in there. |
This would be a very substantial deviation from the Bio-Formats ethos.
Unfortunately, I think only the original submitters can help. And as a heads up, members of the team have ever tried to do this retroactively with only small single-digit numbers of filesets being retrievable for a given reader. I don't have immediate answers to your other questions, but I can tell you they all take time as well! So since that's what we have less of at the moment, I'd ask that you give us a chance to adjust to the new situation, but then let's keep the conversation going. Thanks as ever for the enthusiasm! 😄 |
@NicoKiaru @joshmoore Another idea: @joshmoore we have a tool called czicheck that would be great to be run on the czi files in the OME repository to see if those files "comply to the CZI specification" - testing the files against a greater subset of the specification (general file spec including some metadata). |
@joshmoore
What about I hand-over a agreement to enable OME to publish all ZEISS CZI files we added over time? Would that work for OME? |
Agree, and that's exactly my point, I'm hoping there's a better way for external contributors to give some of their time.
Unfortunately, of course. Let's dream and hope that's a small enough energy barrier with a smoother ride from here...
Yes of course! Best of luck! |
Hello,
This PR is a draft which ultimate goal would be to rewrite the logic of the CZI reader.
The goal is to achieve a compatilibility as good as the original reader while begin more efficient in terms of initial metadata parsing speed as well as being more memory efficient for big files (Tb).
I clearly do not want to get this merged right away - rather, I'd like to know if you could run some tests on your internal set of files and report if some file types are problematic or not.
Related links: