Jupyter notebook to teach on how to handle different file formats with python #705

Open
felixS27 opened this issue Jul 16, 2024 · 8 comments

Comments

@felixS27
Collaborator

felixS27 commented Jul 16, 2024

I talked with @tischi three weeks ago about extending the module 'Image file formats' (https://neubias.github.io/training-resources/image_file_formats/index.html) to include loading images into Jupyter notebooks and handling their metadata. So I created a notebook which uses the package 'aicsimageio' to load different types of image files and read their metadata. The nice thing about this package is that it is simple to use and gives a fairly uniform way of accessing the image data and some important metadata such as pixel/voxel sizes.
My proposed notebook does not replicate the steps done with ImageJ, but it covers the following points:

  • Load .tif images and extract metadata (dimension, pixel data)
  • Load .lif images and extract metadata (dimension, pixel data, channel names)
  • Load .czi images and extract metadata (dimension, pixel data, channel names)
  • Write numpy.arrays to disk as OMETiff or OMEZarr files
  • Write numpy.arrays to disk as .npy files

Please let me know if this notebook is useful for upcoming teaching, if it meets the overall standards for teaching material, and if there are things that should be improved or extended.
Personally, I work mainly with .nd2 files when it comes to loading images and reading/using their metadata (these files are not covered in the notebook, but I could easily add them if there are some example files). Although it is pretty straightforward to handle different file formats, I have had little exposure to other file formats in my daily routine.
While creating the notebook, I stumbled over the following points, which I want to discuss:

  • Do we need to include more file formats (.nd2, .jpeg, more exotic ones)? And if so, are there example files to use?
  • Most of the metadata from .lif and .czi files is returned as xml.etree.ElementTree objects, which are not easy to read. Is this something worth going into? Does anyone have experience with extracting data from these objects with Python?
  • Do we need more examples of how to save the image in different formats? I only presented how to do this with 'aicsimageio' as OMETiff and OMEZarr, as these writers are built in, and with numpy, as this comes along automatically. But I rarely use these functions, so I kept it to a minimum. Is there a need for more? And should there be other file formats? (But then one probably needs more packages...)
  • 'aicsimageio' has the option to lazy load the image data or to load them as xarray. Is this something that should be included, or is it too advanced?

I hope this notebook is a good starting point to extend the module 'Image file formats', and I am happy about all kinds of constructive feedback and further ideas.
LoadingImageFiles.md

@felixS27
Collaborator Author

I just realized that there are some open and heavily discussed issues on the topic I describe here (see #572, #462 and #471). So I hope that starting a new issue is not a big problem.

@tischi
Collaborator

tischi commented Jul 18, 2024

Thanks a lot @felixS27 !

For convenience I just pasted the markdown text here below:


Image file formats

To execute this notebook, create the following minimal environment:

conda create -n ImageFileFormats python=3.10 numpy
conda activate ImageFileFormats
pip install aicsimageio==4.14.0




To read .lif files

pip install readlif==0.6.5




To read .czi files

pip install aicspylibczi==3.2.0 fsspec==2023.6.0






For more information on the module aicsimageio and further supported file formats please refer to: https://allencellmodeling.github.io/aicsimageio/index.html#

Load image

Image with minimal metadata and .tif file format

# load image with minimal metadata
from aicsimageio import AICSImage
image_url = "https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_8bit__nuclei_PLK1_control.tif"
aicsimage_object = AICSImage(image_url)
print(aicsimage_object)
print(type(aicsimage_object))
<AICSImage [Reader: TiffReader, Image-is-in-Memory: False]>
<class 'aicsimageio.aics_image.AICSImage'>

Calling AICSImage always returns an 'AICSImage' object.

Internally, AICSImage always checks the image file format and then uses the appropriate reader (see 'Reader').

By default, AICSImage does not load the image directly, but rather creates a lazy representation of that image (see 'Image-is-in-Memory').

# Inspect dimensions of the object
print(aicsimage_object.dims)
print(aicsimage_object.shape)
print(f'Dimension order is: {aicsimage_object.dims.order}')
print(type(aicsimage_object.dims.order))
print(f'Size of X dimension is: {aicsimage_object.dims.X}')
<Dimensions [T: 1, C: 1, Z: 1, Y: 682, X: 682]>
(1, 1, 1, 682, 682)
Dimension order is: TCZYX
<class 'str'>
Size of X dimension is: 682

AICSImage objects are 5-dimensional by default, with the order Time, Channels, Z dimension, Y dimension, X dimension.
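To make the link between the dimension order and the shape explicit, here is a minimal sketch that pairs the two. It only uses the values printed above, so no image file or extra package is needed; the variable names are chosen for illustration.

```python
# Values copied from the output above; no image file is needed for this sketch.
dim_order = "TCZYX"
shape = (1, 1, 1, 682, 682)

# Pair each dimension letter with its size.
dim_sizes = dict(zip(dim_order, shape))
print(dim_sizes)

# Dimensions of size 1 are placeholders for axes the file does not contain.
real_dims = [name for name, size in dim_sizes.items() if size > 1]
print(real_dims)
```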

# Access image data
image_data = aicsimage_object.data

# Inspect image type
print(type(image_data))

print(image_data)

#Inspect image shape
print(image_data.shape)
<class 'numpy.ndarray'>
[[[[[1 1 4 ... 0 0 0]
    [1 2 1 ... 0 0 0]
    [3 0 2 ... 0 3 0]
    ...
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 2 0 0]
    [0 0 0 ... 0 0 0]]]]]
(1, 1, 1, 682, 682)

With AICSImage.data, the actual image data is loaded as a 5-dimensional numpy.ndarray (dimensions that are missing in the file are kept as axes of size 1).
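The following sketch illustrates, with plain numpy and a small made-up array instead of a real image, what these extra axes of size 1 mean: they add no pixels and can be removed again with np.squeeze.

```python
import numpy as np

# Stand-in for a 2D image; the values are arbitrary and only the shape matters.
plane = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Add singleton T, C and Z axes in front, giving the TCZYX layout.
five_d = plane[np.newaxis, np.newaxis, np.newaxis, :, :]
print(five_d.shape)

# Singleton axes add no pixels: both arrays hold the same number of values.
print(five_d.size == plane.size)

# np.squeeze drops every axis of length 1 and recovers the 2D plane.
restored = np.squeeze(five_d)
print(restored.shape)
```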

# Access specific portion of image data
yx_image_data = aicsimage_object.get_image_data('YX')

# Inspect image type
print(type(yx_image_data))

print(yx_image_data)

#Inspect image shape
print(yx_image_data.shape)
<class 'numpy.ndarray'>
[[1 1 4 ... 0 0 0]
 [1 2 1 ... 0 0 0]
 [3 0 2 ... 0 3 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 2 0 0]
 [0 0 0 ... 0 0 0]]
(682, 682)

With AICSImage.get_image_data it is possible to specify which dimensions the returned numpy.ndarray contains.

Internally, the whole 5-dimensional image is loaded and then sliced according to the specification.
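The idea of slicing a 5-dimensional array by dimension letters can be sketched with plain numpy. The helper select_dims below is hypothetical and written only for illustration; it is not how aicsimageio implements get_image_data, and the test array is made up.

```python
import numpy as np

def select_dims(array5d, out_order, full_order="TCZYX", **fixed):
    """Hypothetical helper: keep the axes named in out_order, index the rest.

    Axes not listed in out_order are fixed at the index given in `fixed`
    (for example C=2) or at 0 if no index is given.
    """
    index = []
    for name in full_order:
        if name in out_order:
            index.append(slice(None))          # keep the whole axis
        else:
            index.append(fixed.get(name, 0))   # pick a single position
    result = array5d[tuple(index)]
    # Reorder the kept axes to match the requested order.
    kept = [name for name in full_order if name in out_order]
    return np.transpose(result, [kept.index(name) for name in out_order])

# Made-up 5D array with 4 channels, 6 rows and 8 columns.
data = np.arange(1 * 4 * 1 * 6 * 8).reshape(1, 4, 1, 6, 8)
print(select_dims(data, "CYX").shape)
print(select_dims(data, "YX", C=2).shape)
```

Keyword arguments such as C=2 play the same role as in the get_image_data calls used in this notebook: they pick one position along an axis that is not part of the output.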

# Inspect pixel size of image
import numpy as np
print(aicsimage_object.physical_pixel_sizes)
print(f'A pixel has a length of {np.round(aicsimage_object.physical_pixel_sizes.X,2)} microns in X dimension.')
PhysicalPixelSizes(Z=None, Y=0.16605318318140297, X=0.16605318318140297)
A pixel has a length of 0.17 microns in X dimension.
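Pixel sizes are what turn pixel counts into physical lengths. As a small worked example, using only the numbers printed above (the 10 micron scale bar is an arbitrary choice for illustration):

```python
# Values copied from the outputs above.
pixel_size_um = 0.16605318318140297   # microns per pixel in X and Y
width_px, height_px = 682, 682

# Physical extent = number of pixels x size of one pixel.
width_um = width_px * pixel_size_um
height_um = height_px * pixel_size_um
print(f"Field of view: {width_um:.2f} x {height_um:.2f} microns")

# Converting a physical length back to pixels, e.g. for a 10 micron scale bar.
scale_bar_px = round(10 / pixel_size_um)
print(f"A 10 micron scale bar is about {scale_bar_px} pixels long")
```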
# Inspect image metadata
print(type(aicsimage_object.metadata))

print(aicsimage_object.metadata)
<class 'str'>
ImageJ=1.53c
unit=micron
finterval=299.35504150390625
min=1.0
max=125.0

Image with extensive metadata and .tif file format

# load image with extensive metadata
image_url = "https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_16bit__collagen.md.tif"
aicsimage_object = AICSImage(image_url)
print(aicsimage_object)
print(type(aicsimage_object))
<AICSImage [Reader: TiffReader, Image-is-in-Memory: False]>
<class 'aicsimageio.aics_image.AICSImage'>
# Inspect dimensions of the object
print(aicsimage_object.dims)
<Dimensions [T: 1, C: 1, Z: 1, Y: 2160, X: 2160]>
# Access image data
image_data = aicsimage_object.data

# Inspect image type
print(type(image_data))

print(image_data)

#Inspect image shape
print(image_data.shape)
<class 'numpy.ndarray'>
[[[[[400 428 371 ... 548 655 713]
    [433 354 362 ... 566 559 602]
    [407 401 406 ... 559 551 539]
    ...
    [410 390 390 ... 464 476 462]
    [412 434 424 ... 558 656 594]
    [430 504 492 ... 684 933 886]]]]]
(1, 1, 1, 2160, 2160)
# Inspect pixel size of image
print(aicsimage_object.physical_pixel_sizes)
print(f'A pixel has a length of {np.round(aicsimage_object.physical_pixel_sizes.X,2)} microns in X dimension.')
PhysicalPixelSizes(Z=None, Y=352.77777777777777, X=352.77777777777777)
A pixel has a length of 352.78 microns in X dimension.
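A pixel size of about 353 microns is unusually large for a microscopy image, so a plausibility check is worthwhile. The arithmetic below shows that the reported value equals the pixel spacing at 72 pixels per inch. That this value comes from a generic resolution entry in the file rather than from a real calibration is an assumption on our side; the file itself does not confirm it.

```python
MICRONS_PER_INCH = 25_400

reported_um = 352.77777777777777   # value printed above

# Pixel spacing that corresponds to a resolution of 72 pixels per inch.
spacing_at_72_ppi = MICRONS_PER_INCH / 72
print(spacing_at_72_ppi)

# With this spacing the 2160-pixel-wide image would span roughly 0.76 metres.
image_width_m = 2160 * reported_um / 1e6
print(f"Image width: {image_width_m:.2f} m")
```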
# Inspect image metadata
print(type(aicsimage_object.metadata))

print(aicsimage_object.metadata)
<class 'str'>
Experiment base name:Karim-240723-005
Experiment set:Nadine

Exposure: 600 ms
Binning: 1 x 1
Region: 2160 x 2160, offset at (200, 0)
Acquired from AndorSdk3 Camera
Subtract: Off
Shading: Off
Digitizer: 200 MHz - lowest noise
Gain: 16-bit (low noise & high well capacity)
Electronic Shutter: Rolling
Baseline Clamp Enabled: Yes
Cooler On: 1
Frames to Average: 1
Trigger Mode: Normal (TIMED)
Temperature: -0.44

Image with .lif file format (Leica image file)

# load image from Leica microscope file format
image_url = "https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_xyc__two_images.lif"
aicsimage_object = AICSImage(image_url)
print(aicsimage_object)
print(type(aicsimage_object))
<AICSImage [Reader: LifReader, Image-is-in-Memory: False]>
<class 'aicsimageio.aics_image.AICSImage'>
# Inspect dimensions of the object
print(aicsimage_object.dims)
<Dimensions [T: 1, C: 4, Z: 1, Y: 1024, X: 1024]>
# Access all 4 channels
img_4channel = aicsimage_object.data

# Alternative
img_4channel = aicsimage_object.get_image_data('CYX')

#Inspect image type and shape
print(type(img_4channel))
print(img_4channel.shape)

#Access only one specific channel
img_1channel = aicsimage_object.get_image_data('YX',C=0)

#Inspect image type and shape
print(type(img_1channel))
print(img_1channel.shape)
<class 'numpy.ndarray'>
(4, 1024, 1024)
<class 'numpy.ndarray'>
(1024, 1024)
# Inspect scene metadata
print(aicsimage_object.scenes)

# Get current scene
print(f'Current scene: {aicsimage_object.current_scene}')
('Series001', 'Image004')
Current scene: Series001
# Explore different scenes

# Select first scene (0 (!) as python is zero indexed)
aicsimage_object.set_scene(0)
img_scene1 = aicsimage_object.data
print(f'Image shape, first scene: {img_scene1.shape}')

# Select second scene
aicsimage_object.set_scene(1)
img_scene2 = aicsimage_object.data
print(f'Image shape, second scene: {img_scene2.shape}')
Image shape, first scene: (1, 4, 1, 1024, 1024)
Image shape, second scene: (1, 1, 1, 512, 512)
# Inspect pixel size of image
print(aicsimage_object.physical_pixel_sizes)
print(f'A pixel has a length of {np.round(aicsimage_object.physical_pixel_sizes.X,2)} microns in X dimension.')

# Inspect channel metadata
aicsimage_object.set_scene(0)
print(aicsimage_object.channel_names)
PhysicalPixelSizes(Z=None, Y=0.3613219178082192, X=0.3613219178082192)
A pixel has a length of 0.36 microns in X dimension.
['Blue', 'Green', 'Yellow', 'Red']
# Inspect image metadata
print(type(aicsimage_object.metadata))

print(aicsimage_object.metadata)
<class 'xml.etree.ElementTree.Element'>
<Element 'LMSDataContainerHeader' at 0x177226930>
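Printing an xml.etree.ElementTree.Element only shows its root tag. The sketch below shows a few standard-library ways to explore such an object. The XML text is made up for illustration; the tag and attribute names are hypothetical and do not reflect the real Leica or Zeiss metadata layout.

```python
import xml.etree.ElementTree as ET

# Made-up XML for illustration; real .lif/.czi metadata uses other tag names.
xml_text = """
<Header Version="2">
  <Image Name="Series001">
    <Channel Name="Blue" />
    <Channel Name="Green" />
    <PixelSize Unit="um">0.36</PixelSize>
  </Image>
</Header>
"""
root = ET.fromstring(xml_text)

# List every tag with its attributes to get an overview of the tree.
for element in root.iter():
    print(element.tag, element.attrib)

# Collect one attribute from all matching elements, wherever they are nested.
channel_names = [ch.get("Name") for ch in root.iter("Channel")]
print(channel_names)

# Read the text of a single element found by path.
pixel_size = float(root.find("./Image/PixelSize").text)
print(pixel_size)

# Serialise the whole tree to a string for reading or saving.
print(ET.tostring(root, encoding="unicode"))
```

With a real file, the same calls could be applied to aicsimage_object.metadata in place of root, after looking at the printed tag overview to find the element names of interest.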

Image with .czi file format (Carl Zeiss image)

# load image from Zeiss microscope file format
# Can't be loaded from url!
# download image from https://github.com/NEUBIAS/training-resources/raw/master/image_data/xyz__multiple_images.czi
aicsimage_object = AICSImage('xyz__multiple_images.czi')
print(aicsimage_object)
print(type(aicsimage_object))
<AICSImage [Reader: CziReader, Image-is-in-Memory: False]>
<class 'aicsimageio.aics_image.AICSImage'>
# Inspect dimensions of the object
print(aicsimage_object.dims)
<Dimensions [T: 1, C: 1, Z: 2, Y: 251, X: 251]>
# Access all 3 dimensions
img_3d = aicsimage_object.data

# Alternative
img_3d = aicsimage_object.get_image_data('ZYX')

#Inspect image type and shape
print(type(img_3d))
print(img_3d.shape)


# Access only the first z plane
img_2d = aicsimage_object.get_image_data('YX',Z=0)

#Inspect image type and shape
print(type(img_2d))
print(img_2d.shape)
<class 'numpy.ndarray'>
(2, 251, 251)
<class 'numpy.ndarray'>
(251, 251)
# Inspect scene metadata
print(aicsimage_object.scenes)

# Get current scene
print(f'Current scene: {aicsimage_object.current_scene}')
('xyz__multiple_images-0', 'xyz__multiple_images-1')
Current scene: xyz__multiple_images-0
# Inspect pixel size of image
print(aicsimage_object.physical_pixel_sizes)
print(f'A pixel has a length of {np.round(aicsimage_object.physical_pixel_sizes.X,2)} microns in X dimension.')

# Inspect channel metadata
print(aicsimage_object.channel_names)
PhysicalPixelSizes(Z=0.3, Y=0.19564437607395324, X=0.19564437607395324)
A pixel has a length of 0.2 microns in X dimension.
['ChA']
# Inspect image metadata
print(type(aicsimage_object.metadata))

print(aicsimage_object.metadata)
<class 'xml.etree.ElementTree.Element'>
<Element 'ImageDocument' at 0x177227880>

Save image

# Load first example image
image_url = "https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_8bit__nuclei_PLK1_control.tif"
aicsimage_object = AICSImage(image_url)
print(aicsimage_object.physical_pixel_sizes)
PhysicalPixelSizes(Z=None, Y=0.16605318318140297, X=0.16605318318140297)

Option 1

# Save aicsimage object directly as .ome.tif
aicsimage_object.save('option1.ome.tif')

# Re-load image to check on availability of pixel metadata
print(AICSImage('option1.ome.tif').physical_pixel_sizes)
PhysicalPixelSizes(Z=None, Y=0.16605318318140297, X=0.16605318318140297)

Option 2

# Save numpy.array as .ome.tif
from aicsimageio.writers import OmeTiffWriter

img_data = aicsimage_object.get_image_data('YX')

# Inspect image shape
print(img_data.shape)

OmeTiffWriter.save(img_data,
                   'option2.ome.tif',
                   dim_order='YX',
                   physical_pixel_sizes=aicsimage_object.physical_pixel_sizes)

# Re-load image to check on availability of pixel metadata
print(AICSImage('option2.ome.tif').physical_pixel_sizes)
(682, 682)
PhysicalPixelSizes(Z=None, Y=0.16605318318140297, X=0.16605318318140297)

Option 3

# Save numpy.array as .ome.zarr
from aicsimageio.writers import OmeZarrWriter

OmeZarrWriter('option3.ome.zarr').write_image(
    img_data,
    image_name='Option3',
    channel_names=None,
    channel_colors=None,
    dimension_order='YX',
    physical_pixel_sizes=aicsimage_object.physical_pixel_sizes
)

Option 4

# Save numpy.array directly

np.save('option4.npy',img_data)

# load .npy files

reloaded_img = np.load('option4.npy')

# Check if they are the same

print(f'Are the dimensions the same: {np.all(img_data.shape == reloaded_img.shape)}')
print(f'Are the images the same: {np.all(img_data == reloaded_img)}')
Are the dimensions the same: True
Are the images the same: True
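Unlike the OME formats in options 1 to 3, a .npy file stores only shape, data type and values, so the pixel sizes are not kept. One possible workaround, sketched here as an assumption and not as an established convention, is to write the metadata into a small JSON file next to the array. The file names and the key physical_pixel_sizes_um are hypothetical, and a small random array stands in for the image.

```python
import json
import os
import tempfile

import numpy as np

# Stand-in image and pixel sizes; the numbers mirror the first example image.
image = np.random.default_rng(0).integers(0, 255, size=(4, 5), dtype=np.uint8)
pixel_sizes = {"Z": None, "Y": 0.16605318318140297, "X": 0.16605318318140297}

with tempfile.TemporaryDirectory() as folder:
    array_path = os.path.join(folder, "image.npy")
    meta_path = os.path.join(folder, "image.json")   # hypothetical sidecar file

    # .npy stores shape, dtype and values, but nothing about physical units.
    np.save(array_path, image)
    with open(meta_path, "w") as handle:
        json.dump({"physical_pixel_sizes_um": pixel_sizes}, handle)

    reloaded = np.load(array_path)
    with open(meta_path) as handle:
        reloaded_meta = json.load(handle)

# array_equal checks shape and values in one call.
print(np.array_equal(image, reloaded))
print(reloaded_meta["physical_pixel_sizes_um"]["X"])
```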

@tischi
Collaborator

tischi commented Jul 18, 2024

@felixS27 does one really need to explicitly install numpy?

conda create -n ImageFileFormats python=3.10 numpy
conda activate ImageFileFormats
pip install aicsimageio==4.14.0

If so, should it now be numpy<2.0 ?

@felixS27
Collaborator Author

No, sorry, my mistake. NumPy is installed automatically as a dependency of aicsimageio, so there is no need to install it explicitly.

@felixS27
Collaborator Author

They recently migrated AICSImageIO to BioIO and put AICSImageIO into maintenance mode. The interface is essentially the same (except that AICSImage was renamed to BioImage), so accessing all the data is still possible with the usual keywords (https://bioio-devs.github.io/bioio/MIGRATION.html). However, the package is now more modular, meaning that you have to install the right plugins along with the main module, depending on which image formats you want to read. According to them, this has some advantages in terms of dependencies.
My question would be whether I should update the notebook to use BioIO instead of AICSImageIO, or just mention it at the end of the notebook?
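As a rough illustration of what the renaming could look like in the notebook, here is a minimal sketch. It assumes that bioio and a reader plugin matching the file type are installed, and that BioImage accepts the same attribute and method names used with AICSImage above, as the migration guide suggests; this has not been checked against every reader. The function name describe_image is made up for this example.

```python
def describe_image(path):
    """Print basic information about an image using BioIO.

    Assumes `bioio` plus a reader plugin matching the file type are installed.
    """
    # Imported inside the function so the sketch can be loaded without bioio.
    from bioio import BioImage

    image = BioImage(path)
    print(image.dims)                   # dimension names and sizes
    print(image.physical_pixel_sizes)   # Z, Y, X sizes
    print(image.channel_names)
    return image.get_image_data("YX")   # one YX plane as a numpy array
```

Calling describe_image('option1.ome.tif') would then be the BioIO counterpart of the inspection cells in the notebook, provided a plugin for that format is available.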

@tischi
Collaborator

tischi commented Jul 24, 2024

Thanks @felixS27 !

Since we do not have a course coming up very soon, I would suggest implementing the forward-looking BioIO API.

Also your markdown document should be reformatted into a python script, which should be added here via a PR.

See, e.g. here for how this is done for other teaching modules.

@felixS27
Collaborator Author

Sure. I will update the notebook and convert it into a python script and add it to the repo.

@tischi
Collaborator

tischi commented Jul 25, 2024

["image_file_formats/open_diverse_file_formats.md", [["ImageJ GUI", "image_file_formats/open_diverse_file_formats_imagejgui.md", "markdown"]]]
["image_file_formats/open_diverse_file_formats.md", [["ImageJ GUI", "image_file_formats/open_diverse_file_formats_imagejgui.md"],["python BioIO", "image_file_formats/open_diverse_file_formats_bioio.py"]]]

tischi added a commit that referenced this issue Jul 25, 2024
Image file format python script (see #705)