Replace existing Fileset with NGFF Fileset #652
Strategy in place in |
Created a script at https://gist.github.com/will-moore/56f03cd126dcac9981bceeb8e7cdb393 This uploads a directory of files (all files uploaded) to create a new Fileset. The only step that is needed after this is a psql command that is printed by the script.
|
In order to avoid the direct |
The script above uses regular "upload" to create a Fileset. But for IDR we will really want an "in-place" Fileset creation. |
Discussed today: https://github.com/ome/omero-upload/blob/master/src/omero_upload/library.py#L36 looks like a good place to start looking at in-place import. Test on training server 3 or 4. |
The library above creates files with ... Perhaps I can create the symlinks in the ManagedRepository, similar to https://github.com/IDR/idr0125-way-cellpainting/blob/main/scripts/symlinks.bash, and then create an OriginalFile object, simply passing it the path...? |
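A minimal sketch of the second half of that idea, assuming the `omero obj` CLI plugin will accept an OriginalFile created with just `name` and `path` (the path below is illustrative, not a real Fileset directory):

```bash
# Register an OriginalFile that simply points at a path inside the ManagedRepository
omero login
omero obj new OriginalFile name=.zattrs \
    path=demo_52/some-fileset-dir/ngff/Tonsil2.ome.zarr/
```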
To test creation of a Fileset, can try to create symlinks as above, then create the Fileset from there. First, try to import a test image into a test server. Use idr0125-pilot, since I have been using this as a test server and creating symlinks there etc... Try to connect locally...
then
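Connecting locally presumably means an SSH tunnel to the test server's webclient (hence the http://localhost:1080 URLs used below); a sketch, with hostname and remote port assumed:

```bash
# Forward local port 1080 to omero-web on the test server (host and port are assumptions)
ssh -L 1080:localhost:80 idr0125-pilot
# then browse to http://localhost:1080/webclient/
```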
|
I assume that if a File doesn't exist for an Original File based on its ID: e.g. from the import above, Original File: |
Test on idr0054. NGFF data is at...
So we want to try an in-place import to the server there. e.g. use this to access via web...
NGFF Image has been imported previously (regular ZarrReader import) at:
So we want to create symlinks:
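The commands were stripped here; presumably something along these lines, linking the Zarr directory into the Fileset's folder under the ManagedRepository (the source path is an assumption; the ManagedRepository path follows the one quoted later in this thread):

```bash
# Link the NGFF data into the Fileset's ManagedRepository directory
cd /data/OMERO/ManagedRepository/demo_52/Blitz-0-Ice.ThreadPool.Server-6/2023-02/27/13-19-26.557
mkdir -p ngff
ln -s /nfs/ngff/idr0054/Tonsil2.ome.zarr ngff/Tonsil2.ome.zarr
```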
|
Using script at https://gist.github.com/will-moore/671a9f971f49661d097aa1655476878e
Delete Fileset:
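The delete itself was presumably the OMERO CLI's graph delete, e.g. (Fileset ID illustrative):

```bash
omero delete Fileset:12345 --report
```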
|
Use
|
But we get this error when trying to view the image: ResourceError
|
cc @joshmoore @jburel That's the current state of my attempts at creating a Fileset in-place... I don't know why I'm getting that exception. The path in the exception certainly points at correct files (via the symlink):
|
I assume the previous value wasn't escaped in any way? |
The server master.err log shows:
|
"escaped in anyway" - you mean the file path escaping of white-space?
Looking for the ".zarray" files, they seem to be there...
Ah, I wonder if the ".zattrs" that I'm pointing to in this case is the wrong file? I pointed at the
Hmm - now I just get...
|
Full stack trace is: ome.conditions.ResourceError
But I don't see any new errors added to the master.err log. The last error on that log is the one Josh reported above:
|
Checking image again...
http://localhost:1080/webclient/?show=image-46035 This uses
NB: These paths are missing the trailing Use then:
Try to fix... For 1999574, 1999575, 1999576:
For 1999565 - 1999574:
Update symlink, replacing link from "Tonsil 2.ome.zarr" with link from "Tonsil2.ome.zarr"
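A sketch of that symlink replacement, assuming it is the link name that changes (both names pointing at the same underlying data) and run from the Fileset's directory; paths are assumptions:

```bash
# Remove the old link (name contains a space) and create one matching the expected name
rm "Tonsil 2.ome.zarr"
ln -s /nfs/ngff/idr0054/Tonsil2.ome.zarr Tonsil2.ome.zarr
```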
Then tried to view image again - still fails: Exception
Tried pointing at a few different locations, but each gave the same error
|
cc @joshmoore @sbesson @jburel - I'm still stuck at the same point here, and this is currently a blocker for NGFF Fileset replacement. I have created a new Fileset, with links to new OriginalFiles (except for chunks), based on path/name of NGFF files, where the path contains a symlink to files at |
Discussed in IDR meeting today: Need Blitz log...
Then go to http://localhost:1080/webclient/?show=image-46035 and open Preview Tab... NB: lots of other activity adding to the Blitz log just now, so not limited to the action above... Blitz log
|
Relevant part
cc @dgault |
@dgault Any ideas what's failing here? Am I missing a File in the Fileset, or pointing to the wrong one from the Pixels? |
The exception here looks as though the reader has failed to create any series metadata at all. The reader logic for that part of the code is as below:
So it seems to have failed quite early on, just locating the .zgroup and .zarray files. Looking at the path set you listed, the top-level folder had 2 sets of zgroup and zattrs - do you know where the first 2 OME.zgroup and OME.zattrs come from?
|
@dgault I "fixed" those paths above, so now in the web UI I see: EDIT (31.3.2021 13:53 - after call with David): I see that a bunch of these are identical and WRONG! /data/OMERO/ManagedRepository/demo_52/Blitz-0-Ice.ThreadPool.Server-6/2023-02/27/13-19-26.557/ngff/Tonsil2.ome.zarr/.zarray
If I check they exist and look at the contents...
|
So far I have been able to reproduce the same error using showinf when reading the dataset from pilot-idrtesting. If I copy the dataset locally using the exact same file structure then it reads just fine without the same issue. I will try to dig a bit further, seems something odd is going on. |
The issue seems to be coming from within jzarr, it is able to find the attributes for the top level group but can't find any additional .group or .array files. The problem seems to be that the below code is not returning any matches, the internalRoot should be the top level folder and the suffix will be zgroup and zarray: I'm having a hard time reproducing the same issue, but having the OME/.zgroup file does cause some issues on occasion (though that failure looks different from the original stack trace). |
OK, I'm going to start from scratch....
|
Test import without chunks...
Then replace the chunks by copying over as above...
The previously-imported images (49072 and 49073) are showing OK - can render in viewer etc. But 49074, imported without chunks is NOT showing any pixel data, even though the chunks were added back later: http://localhost:1080/webclient/?show=image-49074 |
Thanks @pwalczysko - I would have hoped that the Plate I was working with above would still be viewable on that server, since it has whatever update was needed to view it before I created a new Fileset. With Plate:201, (
Instead of creating a single symlink to the Plate, this time we'll just use the already existing symlinks under ManagedRepo: Using script: https://gist.github.com/will-moore/63af9d29f740d17c88554d0c51ad45c5
Since we haven't changed the path in ManagedRepo, we don't need to update Pixels. Viewing an Image works! Webclient UI now shows the Fileset has ... Just to check Pixels - for the Image at A1:
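The check was presumably a query along these lines, using the pixels table's name/path columns seen elsewhere in this thread (database name and Image ID are illustrative):

```bash
psql -U postgres -d OMERO -c \
  "SELECT id, name, path FROM pixels WHERE image = 46035"
```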
This path/name is to the first chunk of that Image and is the same for ALL Images in the Plate!
|
The Pixels for images in that Plate DO point to the first OriginalFile in the Fileset that got replaced (first chunk of first Image):
So, we can replace ... Let's try to update Pixels to point at the first file of the Fileset, or the METADATA.ome.xml
This Image is still working & viewable for BOTH of these (doesn't work if I add a typo etc). So, OMERO/Bio-Formats doesn't seem too picky about which file in the Fileset is named in Pixels. You can even name a File that isn't in the Fileset (e.g. first chunk). This appears to differ for a Plate from the behaviour for an Image above, where updating the Pixels seemed necessary to be able to view the Image...? |
Returning to Plate: http://localhost:1080/webclient/?show=plate-251
Ran the sql commands and now ALL images are viewable! |
Want to try replacing an older fileset on a different server... (idr0125-pilot): First copy the plate to
Create symlinks: Existing data is at
Setup psql...
Fileset: 22281. Create new Fileset... using inplace_fileset4.py
Just want to update Pixels for 1 image to start...
But no joy:
Also tried pointing to METADATA.ome.xml but no:
|
Let's try the NGFF Image above (Tonsil 2) to replace Fileset on idr0125... Copy data...
Create symlink: Existing data is at
Fileset: 1591302
Create new Fileset.... using inplace_fileset4.py
But this fails:
|
To check if
This is viewable OK. Replace the NGFF Fileset we created above: Still as
Update Pixels to point to
This works! However, the image looks a bit different from previous thumbnail - more grainy?! Template-prefix in this Fileset is the same as the Original Fileset. Seems not to matter that it is now different from the paths of the OriginalFiles in the Fileset. |
Summary: For Image: 5025552 (idr0054/Tonsil 2) on ... Steps taken (see details in previous comment #652 (comment)):
This allows the previously broken image to be viewable with the new NGFF Fileset. A similar status was achieved on |
Discussion today with @jburel @sbesson @dgault @pwalczysko @dominikl. In the fullness of time, we might want a single command to do everything:
But in the meantime, let's focus on using ... Also, let's perform the original import without chunks. |
Going to try
Plate was copied over already in #652 (comment) to ... Let's create a copy, then delete chunks from the original (that we are going to import)...
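A sketch of that copy-then-strip-chunks step, assuming the chunks are the only files that are neither Zarr dot-files nor the XML metadata (the plate directory name is illustrative):

```bash
# Keep a full copy, then delete chunk files from the directory we will import,
# leaving .zattrs/.zgroup/.zarray and OME/METADATA.ome.xml in place.
cp -a plate.ome.zarr plate_full_copy.ome.zarr
find plate.ome.zarr -type f ! -name '.z*' ! -name '*.xml' -delete
```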
Now replace the Fileset for every image...
swap_fileset.py
The script above needs a tiny tweak to get the exact SQL we need below - the path should be only to
All images are viewable and BLACK (no chunks)
Ah - but this doesn't rescue the chunks, because we are using symlinks to individual files (.zattrs etc) NOT to a directory that contains chunks. So, create symlink in ManagedRepo as before...
This works - Images in Plate are viewable! Cleanup... Delete old Fileset (no longer linked to any Images)
Checking delete of newly-created Plate and Images, which we don't want:
This won't delete the Images...
|
To fix cleanup, I created a plate_link_to_fileset.py
This allows us to delete the Plate etc.
To avoid the need for |
Created bucket:
And updated Policy as at https://github.com/IDR/deployment/blob/master/docs/object-store.md#policy (NB: replace ...). NB: Seb has mounted s3 on idr0125-pilot for
To check ZarrReader version:
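One way to check, assuming the reader is deployed as a jar under the server's lib directory and records its version in the manifest (paths and jar name are assumptions):

```bash
ls /opt/omero/server/OMERO.server/lib/client/ | grep -i zarr
unzip -p /opt/omero/server/OMERO.server/lib/client/OMEZarrReader*.jar \
    META-INF/MANIFEST.MF | grep -i version
```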
"I can re-execute the playbook there tomorrow morning if you want" |
Now we have ALL 68 ... Let's start by copying a Plate to delete chunks and import...
Import failed with same error as at: |
Try to copy only
Since we know how to do this with ... Previously this has worked...
As wmoore user:
Chunks were being downloaded there (because
I actually used a loop to download one Plate at a time!
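The loop was presumably something like the following, assuming the AWS CLI against the bucket above (bucket name, prefix and destination are illustrative, and the real tool may differ):

```bash
# Sync one plate prefix at a time from the object store
for plate in $(aws s3 ls s3://idr-bucket/idr0XXX/ | awk '{print $2}'); do
    aws s3 sync "s3://idr-bucket/idr0XXX/${plate}" "/data/ngff/idr0XXX/${plate}"
done
```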
Check a few plates for counts, e.g:
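e.g. a simple file count per plate to compare with the source (plate directory name illustrative):

```bash
find HT01.ome.zarr -type f | wc -l
```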
Copy to root dir..
|
In a Screen, as
|
Instead of using
Then use webclient to put all Plates into a single |
We can update ALL pixels for a given Fileset via psql with:
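The command was stripped here; presumably an UPDATE joining pixels to image on the fileset, along these lines (name, path, fileset ID and database name are illustrative):

```bash
psql -U postgres -d OMERO -c \
  "UPDATE pixels SET name = '.zattrs',
          path = 'demo_52/Blitz-0-Ice.ThreadPool.Server-6/2023-02/27/13-19-26.557/ngff/plate.ome.zarr'
     FROM image
    WHERE pixels.image = image.id AND image.fileset = 12345"
```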
|
To update symlinks, use scripts developed for ... First, need to view an Image from every Plate (see idr0125 for reason). Edited the script
Then update Symlinks... With webclient available locally on
Generates ... Then run symlink_cmd.py
This allows thumbnails to be generated for Plates (takes a long time!). We don't need to generate thumbnails, but can check that Images are viewable by opening iviewer as above, for a selection of the Image IDs listed above. |
Plate HT01:
Plate HT02
Also did Plates HT03 and HT04, but... I noticed a problem with the wrong Wells/Images getting mixed up in the Fileset-swapped Plates... This is the ... The NGFF ... BUT, after swapping the Fileset, this is the original Plate, now using the NGFF Fileset: the column order is incorrect.
so the Wells in the |
Need to check whether this column mix-up is happening with idr0010 plates... Re-import
Viewed black Plate/Images with no errors...
But, viewing an Image of this plate gives...
|
Going to try importing a new Plate from
Working on idr0125-pilot, copy data to wmoore home dir, then move
So, I renamed the dir...
...tried rsync again from idr0125-pilot... then copied to
Then import (with chunks) in a
Swap fileset...
Now the old images are viewable with new NGFF Fileset. NGFF Plate (same as original data): After swapping Filesets: Order of Well columns is:
|
idr0090 - sparse plate...
But even with just 1 Well (A/6) - 32 fields! - downloaded, we've gone from 63GB to 49GB - will run out of space - stopping now...
|
Discussed Well ordering issue in IDR meeting... If the order of series (WellSamples) isn't the same between ... Try to check the order of series with
For NGFF Plate... get lots of
Tried limiting the series with
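For reference, a sketch of checking series order with showinf from bftools (these flags do exist in showinf; the file path is illustrative, and `-series` is presumably what was used to limit the output):

```bash
# List series without reading pixel data
showinf -nopix -noflat plate.ome.zarr/.zattrs
# Limit output to a single series
showinf -nopix -noflat -series 0 plate.ome.zarr/.zattrs
```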
|
With ome/ZarrReader#53 merged and the ZarrReader artifact downloaded to |
Discussed today at IDR meeting:
|
To avoid re-importing Images when updating data to NGFF, we want to create a new Fileset for the NGFF data, and replace old Filesets.
Testing workflow:
Imported png:
https://merge-ci.openmicroscopy.org/web/webclient/?show=image-257915
Converted same png to NGFF:
To be able to tell the NGFF image apart, removed a chunk of the alpha channel:
Edited import.py to upload ALL files in directory:
Import the Zarr to create a Fileset AND import the Fileset
Re-import this... NGFF_missing_chunk - as expected
https://merge-ci.openmicroscopy.org/web/webclient/?show=image-257917
Fileset ID: 138110
We want to update the Image:257915 (png) above to use Fileset:138110
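The swap itself is presumably a single SQL update of the image row's fileset column, using the IDs above and the same database as the psql command below:

```bash
psql -U postgres -d OMERO-server -c \
  "UPDATE image SET fileset = 138110 WHERE id = 257915"
```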
The 'png' image now lists NGFF files in its Fileset, but looks the same.
Now try deleting the png Fileset...
Now, trying to view the 'png' image gives:
psql -U postgres -d OMERO-server -c "UPDATE pixels SET name = '.zattrs', path = 'user-3_454/Blitz-0-Ice.ThreadPool.Server-5/2023-02/23/09-26-58.983/OME_screenshot.zarr' where id = 256813"
UPDATE 1