NGFF perf testing #687

Open
will-moore opened this issue Mar 1, 2024 · 5 comments

will-moore commented Mar 1, 2024

Compare formats (on disk)

To compare the performance of NGFF data (ZarrReader) with other formats (both on disk), we want to compare the NGFF version of the data with the same data in its original format on the same server.

Choose some data to work with: idr0003 is not too big, at 2.3G for a plate. Summary (more details below):

  • Use bioformats2raw to convert a plate from idr0003 to NGFF.
  • zip, copy to idr-testing, unzip and perform regular import (not in-place)
  • Update Plate name and place it in idr0003 Screen
  • With the preview panel enabled, click on 25 Wells in each plate (original and NGFF copy) and record the time render_image takes to load the initial plane. Plot the average over the 25 Wells. Times are in milliseconds; error bars are 1 std dev.

Screenshot 2024-03-05 at 11 14 59

Conclusion: NGFF is no slower (maybe faster)?
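
As a rough sketch of how the average and error bar for each plate could be computed from the recorded times (the plot above was made in a spreadsheet; this is only an alternative). The helper name mean_std and the input file name are hypothetical, and the sketch assumes the sample standard deviation (n - 1), which may not match what the spreadsheet used:

```shell
# Hypothetical helper: read one timing per line on stdin and print
# the count, mean, and sample standard deviation (n - 1 denominator).
mean_std() {
  awk '
    NF { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n < 2) { print "need at least 2 values"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)
      if (var < 0) var = 0   # guard against tiny negative rounding error
      printf "n=%d mean=%.3f sd=%.3f\n", n, mean, sqrt(var)
    }'
}

# Usage (ngff_times.txt is a placeholder for a file holding the 25 recorded times):
# mean_std < ngff_times.txt
```

Running it once per plate gives the two means and standard deviations to compare.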

Compare disk vs S3

We want to test the performance of loading data from S3 compared with loading the same data from local disk.
Use idr0010 data, since all plates are identical in terms of size etc.:

  • Downloaded plate.ome.zarr.zip data previously uploaded to BioStudies
  • Unzip and place in /ngff dir on each idr-testing server
  • For a plate, replace the symlink from ManagedRepository -> mounted s3 directory with a symlink ManagedRepository -> /ngff/plate.ome.zarr
  • Compare the time to load the initial plane for 25 Wells of the plate on disk with 25 Wells of an identical plate using S3 data. Times are in seconds. Std deviation is 0.267 for S3 and 0.096 for disk (can't seem to plot different error bars on each column in Numbers)!

Screenshot 2024-03-05 at 11 43 10

Conclusion: Data access via S3 is slower than on disk.
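
The times above were recorded by clicking Wells in the webclient. A scripted alternative could time the webgateway render_image request directly with curl. This is only a sketch: the helper names render_url and time_render are hypothetical, the image IDs in the usage comment are placeholders, and a curl timing will not include browser-side overhead, so the numbers would not be directly comparable with the ones above.

```shell
# Hypothetical helper: build a webgateway render_image URL for the
# first plane (z=0, t=0) of an image.
render_url() {
  local base=$1 image_id=$2
  printf '%s/webgateway/render_image/%s/0/0/\n' "${base%/}" "$image_id"
}

# Time a single request; prints the total time in seconds as reported by curl.
time_render() {
  curl -s -o /dev/null -w '%{time_total}\n' "$(render_url "$1" "$2")"
}

# Usage (1001 1002 1003 are placeholder IDs, not real images from these plates):
# for id in 1001 1002 1003; do
#   time_render https://idr-testing.openmicroscopy.org "$id"
# done
```

Server-side caching may make repeat requests for the same image faster, so each image should probably be timed on its first load only.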

@will-moore will-moore converted this from a draft issue Mar 1, 2024

will-moore commented Mar 1, 2024

$ ssh pilot-zarr1-dev

screen -r idr0001
cd /data/idr0003
conda activate bioformats2raw
~/bioformats2raw-0.7.0/bin/bioformats2raw --memo-directory ../memo /uod/idr/filesets/idr0003-breker-plasticity/201301120/Images/DTT/p1/experiment_descriptor.xml p1.ome.zarr

OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp3176581939484032263/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.reflectasm.AccessClassLoader (file:/home/wmoore/bioformats2raw-0.7.0/lib/reflectasm-1.11.9.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.reflectasm.AccessClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Looks like an error, but it seems to have worked OK.

(bioformats2raw) [wmoore@pilot-zarr1-dev idr0003]$ find ./ -name .zattrs
...
./p1.ome.zarr/P/24/2/.zattrs
./p1.ome.zarr/P/24/.zattrs
./p1.ome.zarr/.zattrs
(bioformats2raw) [wmoore@pilot-zarr1-dev idr0003]$ find ./ -name .zattrs | wc
   1538    1538   43248

$ zip -r p1.ome.zarr.zip p1.ome.zarr

Download (1.4 G) and upload to idr-testing...

$ rsync -rvP pilot-zarr1-dev:/data/idr0003/p1.ome.zarr.zip ./
$ rsync -rvP p1.ome.zarr.zip idr-testing.openmicroscopy.org:/home/wmoore/
$ ssh -A idr-testing.openmicroscopy.org
$ rsync -rvP p1.ome.zarr.zip omeroreadwrite:/home/wmoore/

Import..

$ ssh omeroreadwrite
$ unzip p1.ome.zarr

(venv3) [wmoore@test120-omeroreadwrite ~]$ omero import --depth 20 p1.ome.zarr

2024-03-01 11:55:41,047 889        [      main] INFO          ome.formats.importer.ImportConfig - OMERO.blitz Version: 5.7.2
2024-03-01 11:55:41,070 912        [      main] INFO          ome.formats.importer.ImportConfig - Bioformats version: 7.1.0 revision: 05c7b2413cfad19a73b619c61ddf77ca2d038ce7 date: 11 December 2023
2024-03-01 11:55:41,391 1233       [      main] INFO   formats.importer.cli.CommandLineImporter - Log levels -- Bio-Formats: ERROR OMERO.importer: INFO
2024-03-01 11:55:42,125 1967       [      main] INFO      ome.formats.importer.ImportCandidates - Depth: 20 Metadata Level: MINIMUM
2024-03-01 11:55:58,163 18005      [      main] INFO      ome.formats.importer.ImportCandidates - 16917 file(s) parsed into 1 group(s) with 1 call(s) to setId in 11022ms. (16037ms total) [0 unknowns]
2024-03-01 11:55:59,202 19044      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to localhost:4064
2024-03-01 11:56:01,233 21075      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Insecure connection requested, falling back
2024-03-01 11:56:02,035 21877      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Pinging session every 300s.
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Server: 5.6.10
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Client: 5.7.2
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - Java Version: 1.8.0_402
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Name: Linux
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Arch: amd64
2024-03-01 11:56:02,055 21897      [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Version: 3.10.0-1160.108.1.el7.x86_64
2024-03-01 11:56:02,604 22446      [2-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_PREPARATION
...
2024-03-02 01:38:53,782 49393624   [3-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - FILE_UPLOAD_COMPLETE: /home/wmoore/p1.ome.zarr/.zattrs
2024-03-02 02:01:42,222 50762064   [2-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_END
2024-03-02 02:01:43,318 50763160   [2-thread-1] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_STARTED Logfile: 64420961
2024-03-02 02:05:00,964 50960806   [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 64420961
2024-03-02 02:08:22,203 51162045   [l.Client-4] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 64420961
2024-03-02 02:12:29,399 51409241   [l.Client-5] INFO   ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5  Logfile: 64420961
2024-03-02 02:12:29,689 51409531   [l.Client-6] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5  Logfile: 64420961
2024-03-02 02:12:29,769 51409611   [l.Client-5] INFO   ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5  Logfile: 64420961
2024-03-02 02:12:31,373 51411215   [l.Client-6] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /home/wmoore/p1.ome.zarr/OME/METADATA.ome.xml
Plate:10551
Other imported objects:
Fileset:6317541

==> Summary
16917 files uploaded, 1 fileset, 1 plate created, 1152 images imported, 0 errors in 14:16:28.893

Wow - took 14 hours to import!
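
The log timestamps suggest that almost all of that time was the file upload (FILESET_UPLOAD_PREPARATION to FILESET_UPLOAD_END), not the import steps that follow. A small sketch to check this, assuming GNU date (for the -d option); the helper name elapsed is hypothetical:

```shell
# Hypothetical helper: print the elapsed time between two timestamps as HH:MM:SS.
# Assumes GNU date; -u avoids any daylight-saving shifts in the arithmetic.
elapsed() {
  local start end s
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  s=$(( end - start ))
  printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# Upload phase: FILESET_UPLOAD_PREPARATION -> FILESET_UPLOAD_END
elapsed "2024-03-01 11:56:02" "2024-03-02 02:01:42"   # 14:05:40

# Import steps: IMPORT_STARTED -> IMPORT_DONE
elapsed "2024-03-02 02:01:43" "2024-03-02 02:12:31"   # 00:10:48
```

So uploading the 16917 files accounts for roughly 14 hours, and the 5 import steps for under 11 minutes.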

@will-moore

After the IDR meeting today, 5 of us spent 20 minutes opening many images from that plate without seeing errors, and performance was good/acceptable. cc @francesw @jburel

will-moore changed the title from "NGFF perf testing idr0003" to "NGFF perf testing" on Mar 4, 2024
@will-moore

Downloaded 3 plate zips from https://www.ebi.ac.uk/biostudies/submissions/files?path=%2Fuser%2Fidr0010
Uploaded to idr-testing:omeroreadwrite, placed in new dir at /data/ngff, unzipped and owned by omero-server

$ pwd
/data/ngff
$ ls -lh
total 1.4G
drwxrwxr-x. 15 omero-server omero-server  219 Jul 10  2023 101-24.ome.zarr
-rw-r--r--.  1 omero-server wmoore       457M Mar  4 12:22 101-24.ome.zarr.zip
drwxrwxr-x. 15 omero-server omero-server  219 Jul 10  2023 10-34.ome.zarr
-rw-r--r--.  1 omero-server wmoore       455M Mar  4 12:21 10-34.ome.zarr.zip
drwxrwxr-x. 15 omero-server omero-server  219 Jul 10  2023 103.ome.zarr
-rw-r--r--.  1 omero-server wmoore       461M Mar  4 12:22 103.ome.zarr.zip

For plate 10-34, find location in ManagedRepo from webclient... Can see symlink to s3:

bash-4.2$ ls -lh /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/
total 4.0K
lrwxrwxrwx. 1 omero-server omero-server 109 Dec  6 11:35 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr -> /bia-integrator-data/S-BIAD885/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr
-rw-r--r--. 1 omero-server omero-server  49 Dec  6 11:35 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr.bfoptions

Update symlink (as omero-server):

rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr
ln -s /data/ngff/10-34.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr

Looks good:

$ ls -lh /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/
total 4.0K
lrwxrwxrwx. 1 omero-server omero-server 25 Mar  4 12:40 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr -> /data/ngff/10-34.ome.zarr
-rw-r--r--. 1 omero-server omero-server 49 Dec  6 11:35 2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr.bfoptions

$ ls /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr/
A  B  C  D  E  F  G  H   I  J  K  L  OME
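
Since the same rm + ln -s pair is needed for each plate, it could be wrapped in a small helper that refuses to touch anything that is not already a symlink. This is a sketch only; the function name repoint is hypothetical, and it uses ln -sfn to replace the link in one step instead of rm followed by ln -s:

```shell
# Hypothetical helper: repoint an existing symlink at a new target directory.
# Usage: repoint <new-target-dir> <existing-symlink>
repoint() {
  local target=$1 link=$2
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2
    return 1
  fi
  if [ ! -d "$target" ]; then
    echo "target is not a directory: $target" >&2
    return 1
  fi
  # -f replaces the existing link; -n stops ln from following it into the old target
  ln -sfn "$target" "$link"
  # Print where the link now points, as a quick check
  readlink "$link"
}

# Usage (run as omero-server), with the paths from above:
# repoint /data/ngff/10-34.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr
```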

@will-moore

Repeating for the other 2 plates downloaded above...

Plate 101-24:

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
bash-4.2$ ln -s /data/ngff/101-24.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
bash-4.2$ ls /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
A  B  C  D  E  F  G  H	I  J  K  L  OME

Plate 103:

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
bash-4.2$ ln -s /data/ngff/103.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
bash-4.2$ ls /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
A  B  C  D  E  F  G  H	I  J  K  L  OME

@will-moore

Since /data/ngff isn't accessible on the omeroreadonly servers, we need a different location, and need to copy the data to all servers...

E.g.

for server in omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4; do rsync -rvP 101-24.ome.zarr.zip $server:/home/wmoore ; done;

ssh omeroreadonly-1

for z in 101-24.ome.zarr.zip  10-34.ome.zarr.zip  103.ome.zarr.zip; do sudo chown omero-server $z; done
sudo mkdir /ngff && sudo chown -R omero-server /ngff
for z in 101-24.ome.zarr.zip  10-34.ome.zarr.zip  103.ome.zarr.zip; do sudo mv $z /ngff; done
sudo -u omero-server -s
cd /ngff/
for z in 101-24.ome.zarr.zip  10-34.ome.zarr.zip  103.ome.zarr.zip; do unzip $z; done
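
The rsync example above copies only one zip. A sketch of copying all three zips to all four servers in one go could look like this; the function name distribute and the DRY_RUN switch are hypothetical additions, and by default it only prints the rsync commands it would run:

```shell
# Server and zip names are taken from the commands above.
servers="omeroreadonly-1 omeroreadonly-2 omeroreadonly-3 omeroreadonly-4"
zips="101-24.ome.zarr.zip 10-34.ome.zarr.zip 103.ome.zarr.zip"

# Hypothetical helper: copy every zip to every server.
# With DRY_RUN unset or 1 it only prints the commands; set DRY_RUN=0 to run them.
distribute() {
  local server z
  for server in $servers; do
    for z in $zips; do
      if [ "${DRY_RUN:-1}" = 1 ]; then
        echo rsync -rvP "$z" "$server:/home/wmoore"
      else
        rsync -rvP "$z" "$server:/home/wmoore"
      fi
    done
  done
}

# Print the 12 commands first, then re-run with DRY_RUN=0 once they look right:
# distribute
# DRY_RUN=0 distribute
```

The chown, mkdir and unzip steps above would still need to be run on each server after the copy.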

On omeroreadwrite, move data to /ngff and update symlinks...

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr
bash-4.2$ ln -s /ngff/10-34.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/00-27-54.591_mkngff/2726d2ef-2f45-45b6-9d73-68ea1d57c1b6.zarr

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr
bash-4.2$ ln -s /ngff/101-24.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-06-31.113_mkngff/49150a5d-8fc2-499a-bbc6-4a3eed2d44b1.zarr

bash-4.2$ rm /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr
bash-4.2$ ln -s /ngff/103.ome.zarr /data/OMERO/ManagedRepository/demo_2/2016-05/21/02-26-08.432_mkngff/1fab1705-9561-4689-891d-e039c4ec3076.zarr

Looks good - images are viewable under idr-testing.openmicroscopy.org
