idr0025-stadler-proteinatlas S-BIAD846 #647

Open
will-moore opened this issue Feb 22, 2023 · 11 comments

@will-moore
Member

No description provided.

@will-moore will-moore moved this to test convert in NGFF conversion Feb 22, 2023
@dominikl dominikl moved this from test convert to re-import test image in NGFF conversion Feb 27, 2023
@dominikl dominikl moved this from re-import test image to convert all data to NGFF in NGFF conversion Mar 1, 2023
@dominikl
Member

dominikl commented Mar 1, 2023

Export: 3.5 min / plate
Import: 3 hours

@dominikl
Member

Converted on pilot-zarr2-dev, under /data/ngff/idr0025

@dominikl dominikl moved this from convert all data to NGFF to upload data to s3 in NGFF conversion Jun 12, 2023
@will-moore will-moore self-assigned this Jun 14, 2023
@will-moore
Member Author

On local machine...

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0025
make_bucket: idr0025
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0025 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0025 --cors-configuration file://cors.json

On idr-zarr2-dev...

$ cd /data/ngff
$ /home/wmoore/mc cp -r idr0025/ uk1s3/idr0025/zarr
...e 3.ome.zarr/OME/METADATA.ome.xml: 3.30 GiB / 3.30 GiB ━━━━━━━━━━━━━━━━━━━━━━━━━━

$ /home/wmoore/mc ls uk1s3/idr0025/zarr
[2023-06-14 16:41:56 UTC]     0B 10x images plate 1.ome.zarr/
[2023-06-14 16:41:56 UTC]     0B 10x images plate 2.ome.zarr/
[2023-06-14 16:41:56 UTC]     0B 10x images plate 3.ome.zarr/

https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0025/zarr/10x+images+plate+3.ome.zarr
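
The viewer link above writes the spaces in the plate name as `+`. A small sketch of building that link for any of the three plates, assuming the same bucket layout (`vizarr_url` is a hypothetical helper, not part of vizarr or mc):

```shell
# Hypothetical helper: build the vizarr link for a plate under idr0025/zarr.
# Spaces in the plate name are written as "+", as in the link above.
vizarr_url() {
  name=$1
  encoded=$(printf '%s' "$name" | tr ' ' '+')
  printf '%s\n' "https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0025/zarr/$encoded"
}

vizarr_url "10x images plate 3.ome.zarr"
```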

[Screenshot 2023-06-14 at 17 45 18]

@will-moore will-moore moved this from upload data to s3 to create new Fileset to replace original Fileset in NGFF conversion Jun 14, 2023
@will-moore will-moore moved this from create new Fileset to replace original Fileset to Zip and upload to BioStudies in NGFF conversion Jun 26, 2023
@will-moore will-moore moved this from Zip and upload to BioStudies to upload some data to s3 and test in NGFF conversion Jun 26, 2023
@will-moore
Member Author

will-moore commented Jun 27, 2023

Imported metadata-only plates into idr0125-pilot:

$ for dir in *; do
    omero import --transfer=ln_s --depth=100 --name="${dir/.ome.zarr/}" --skip=all "$dir" --file "/tmp/$dir.log" --errs "/tmp/$dir.err"
  done
2023-06-27 16:04:38,638 1229406    [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5  Logfile: 50491601
2023-06-27 16:04:38,668 1229436    [l.Client-2] INFO   ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5  Logfile: 50491601
2023-06-27 16:04:38,908 1229676    [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /ngff/idr0025/10x images plate 1.ome.zarr/OME/METADATA.ome.xml
Other imported objects:
Fileset:5287265

==> Summary
2509 files uploaded, 1 fileset, 1 plate created, 384 images imported, 0 errors in 0:20:25.673
2023-06-27 16:47:24,290 1249768    [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /ngff/idr0025/10x images plate 3.ome.zarr/OME/METADATA.ome.xml
Other imported objects:
Fileset:5287267

==> Summary
2509 files uploaded, 1 fileset, 1 plate created, 384 images imported, 0 errors in 0:20:45.630
$ python idr-utils/scripts/managed_repo_symlinks.py Screen:3254 /idr0025/zarr --report

Fileset: 5287265 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2023-06/27/15-44-13.605/
Render Image 14835368
fileset_dirs {}
fs_contents ['10x images plate 1.ome.zarr']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2023-06/27/15-44-13.605/10x images plate 1.ome.zarr to /idr0025/zarr/10x images plate 1.ome.zarr

Fileset: 5287266 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2023-06/27/16-04-45.182/
Render Image 14835512
fileset_dirs {}
fs_contents ['10x images plate 2.ome.zarr']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2023-06/27/16-04-45.182/10x images plate 2.ome.zarr to /idr0025/zarr/10x images plate 2.ome.zarr

Fileset: 5287267 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2023-06/27/16-26-38.877/
Render Image 14836136
fileset_dirs {}
fs_contents ['10x images plate 3.ome.zarr']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2023-06/27/16-26-38.877/10x images plate 3.ome.zarr to /idr0025/zarr/10x images plate 3.ome.zarr
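
For reference, the --name value in the import loop above is the directory name with .ome.zarr removed. A minimal sketch of that naming, using a hypothetical helper `plate_name` and the POSIX suffix form `${dir%.ome.zarr}` (an assumption made for portability; the loop itself uses the bash form `${dir/.ome.zarr/}`, which gives the same result for these directory names):

```shell
# Hypothetical helper, not part of omero or idr-utils.
# Strips a trailing ".ome.zarr" from a directory name to get the plate name.
plate_name() {
  dir=$1
  printf '%s\n' "${dir%.ome.zarr}"
}

plate_name "10x images plate 1.ome.zarr"
```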

@will-moore
Member Author

Looks good in idr0125-pilot:

[Image attachment]

@will-moore will-moore moved this from upload some data to s3 and test to Zip and upload to BioStudies in NGFF conversion Jun 27, 2023
@will-moore
Member Author

will-moore commented Jun 27, 2023

Create zips:

ssh pilot-zarr2-dev
cd /data/ngff/idr0025

for i in */; do zip -r "${i%/}.zip" "$i"; done
$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0025/idr0025 [email protected]:5f/136e8d-xxxxxxx
10x images plate 1.ome.zarr.zip                     100% 1020MB  486Mb/s    00:19
10x images plate 2.ome.zarr.zip                     100%  729MB  439Mb/s    00:32
10x images plate 3.ome.zarr.zip                     100%  832MB  192Mb/s    00:48
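
In the zip loop above, the `*/` glob yields directory names with a trailing slash, and `${i%/}` strips it before .zip is appended. A small sketch of just that naming step (`zip_name_for` is a hypothetical helper for illustration, nothing here calls zip):

```shell
# Hypothetical helper: derive the zip file name used for a directory
# matched by the "*/" glob, e.g. "10x images plate 1.ome.zarr/".
zip_name_for() {
  dir=$1
  # ${dir%/} removes one trailing slash, if present.
  printf '%s\n' "${dir%/}.zip"
}

zip_name_for "10x images plate 1.ome.zarr/"
```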

@will-moore will-moore moved this from Zip and upload to BioStudies to BioStudies Submission in NGFF conversion Jun 27, 2023
@will-moore will-moore assigned francesw and unassigned will-moore Jun 28, 2023
@will-moore
Member Author

Deleted data

sudo rm -rf idr0025/

@francesw francesw moved this from BioStudies Submission to create new Fileset to replace original Fileset in NGFF conversion Aug 14, 2023
@francesw francesw removed their assignment Aug 14, 2023
@will-moore will-moore moved this from create new Fileset to replace original Fileset to Data on Embassy s3 in NGFF conversion Aug 15, 2023
@will-moore
Member Author

@will-moore will-moore changed the title idr0025-stadler-proteinatlas to NGFF idr0025-stadler-proteinatlas S-BIAD846 Aug 15, 2023
@will-moore will-moore moved this from Data on Embassy s3 to BioStudies Submission in NGFF conversion Aug 15, 2023
@francesw francesw moved this from BioStudies Submission to Data on Embassy s3 in NGFF conversion Aug 15, 2023
@will-moore will-moore moved this from Data on Embassy s3 to create new Fileset to replace original Fileset in NGFF conversion Aug 22, 2023
@will-moore
Member Author

will-moore commented Aug 29, 2023

Testing mkngff on idr0125-pilot...

Added Fileset IDs manually. NB: 10x images plate 2 has already had "swap Fileset" treatment to NGFF. Others are original.

idr0025/10x images plate 3.ome.zarr,S-BIAD846/3c534b4f-12be-4881-a84a-af6b65e142ea,23152
idr0025/10x images plate 1.ome.zarr,S-BIAD846/52304cdf-4eba-4f0a-84b1-690e0d66add9,23151
idr0025/10x images plate 2.ome.zarr,S-BIAD846/72cc291b-a4e0-4807-bd23-22e9ad75c0dd,5286921

The whitespace in these rows breaks processing with for r in $(cat idr0025.csv); do... because the for loop iterates over each whitespace-separated token rather than each row of the table.
The simplest solution is to replace whitespace with _ in the CSV, since we don't actually need the Fileset names anyway.
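
A minimal sketch of that replacement, assuming plain sed is acceptable and that no column needs its spaces preserved (`underscore_spaces` is a hypothetical helper, not something recorded in this issue):

```shell
# Hypothetical helper: replace every space with "_" so that word-splitting
# in `for r in $(cat file)` yields exactly one token per CSV row.
# Reads rows on stdin and writes the rewritten rows to stdout,
# e.g. underscore_spaces < idr0025.csv
underscore_spaces() {
  sed 's/ /_/g'
}

# Demo on one row copied from the table above.
printf '%s\n' 'idr0025/10x images plate 3.ome.zarr,S-BIAD846/3c534b4f-12be-4881-a84a-af6b65e142ea,23152' | underscore_spaces
```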

for r in $(cat idr0025.csv); do
  # columns: name, S-BIAD path (accession/uuid), Fileset ID
  biapath=$(echo "$r" | cut -d',' -f2)
  uuid=$(echo "$biapath" | cut -d'/' -f2)
  fsid=$(echo "$r" | cut -d',' -f3)
  omero mkngff sql --secret="$SECRET" "$fsid" "/bia-integrator-data/$biapath/$uuid.zarr" > "$fsid.sql"
done
...
Creating symlink /data/OMERO/ManagedRepository/demo_2/2017-03/13/15-19-51.590_mkngff/52304cdf-4eba-4f0a-84b1-690e0d66add9.zarr -> /bia-integrator-data/S-BIAD846/52304cdf-4eba-4f0a-84b1-690e0d66add9/52304cdf-4eba-4f0a-84b1-690e0d66add9.zarr
...
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-16/2023-04/12/10-20-20.483_mkngff/72cc291b-a4e0-4807-bd23-22e9ad75c0dd.zarr -> /bia-integrator-data/S-BIAD846/72cc291b-a4e0-4807-bd23-22e9ad75c0dd/72cc291b-a4e0-4807-bd23-22e9ad75c0dd.zarr
for r in $(cat idr0025.csv); do
  fsid=$(echo "$r" | cut -d',' -f3)
  psql -U omero -d idr -h "$DBHOST" -f "$fsid.sql"
done

BEGIN
 mkngff_fileset 
----------------
        5287455
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287456
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287457
(1 row)
COMMIT
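
An alternative to editing the CSV, sketched here under the assumption that each row has exactly the three columns shown above: read the file line by line with `while IFS=, read -r`, so spaces in the first column never get word-split. `list_mkngff_args` is a hypothetical helper that only prints the Fileset ID and the zarr path; it does not call omero or psql.

```shell
# Hypothetical helper: for each CSV row on stdin (name,accession/uuid,fsid),
# print "fsid<TAB>zarr path". Whole lines are read, so spaces are harmless.
list_mkngff_args() {
  # The "|| [ -n "$name" ]" part also handles a last row with no newline.
  while IFS=, read -r name biapath fsid || [ -n "$name" ]; do
    [ -n "$fsid" ] || continue   # skip blank or malformed rows
    uuid=${biapath#*/}           # text after the first "/"
    printf '%s\t%s\n' "$fsid" "/bia-integrator-data/$biapath/$uuid.zarr"
  done
}

# Demo on one row copied from the table above.
printf '%s\n' 'idr0025/10x images plate 1.ome.zarr,S-BIAD846/52304cdf-4eba-4f0a-84b1-690e0d66add9,23151' | list_mkngff_args
```

Each printed pair corresponds to the two positional arguments passed to omero mkngff sql in the loop above.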

@will-moore
Member Author

Looks good:

[Image attachment]

@will-moore will-moore moved this from create new Filesets in idr-next to convert all data to NGFF in NGFF conversion Sep 1, 2023
@will-moore will-moore moved this from convert all data to NGFF to Zip and upload to BioStudies in NGFF conversion Sep 1, 2023
@will-moore will-moore moved this from Zip and upload to BioStudies to create new Filesets in idr-next in NGFF conversion Sep 1, 2023
@will-moore will-moore moved this from check_pixels to pixels validated in NGFF conversion Nov 28, 2023