dask_image.imread.imread: differences between using for local file and hosted file #268
Comments
Hi @rkoo19

1. Example data access

I'm not able to access this file at all. Are you sure it's accessible?

2. Opening a different example file

I can access this file from a browser, so perhaps we can use that as a test case. However, attempting to open it directly from Python returns a 403 error:

import imageio.v3 as io
io.imread("https://blog.dask.org/images/threads.jpg")

Dask is passing your filename directly off to a reader function like this. (Well, technically dask-image passes it to pims, which can pass it to many different types of readers, and imageio is one of them.)

Proposed fix

To fix this, we can write our own function to read the image (ref: this stackoverflow answer):

import requests
from io import BytesIO
import imageio.v3 as io

def url_image_reader(url):
    response = requests.get(url)
    byte_content = BytesIO(response.content)
    image = io.imread(byte_content)  # will likely error if provided with non-image data, you may need to add a check
    return image

result = url_image_reader("https://blog.dask.org/images/threads.jpg")
print(result.shape)  # works

When I passed this reader in as the custom imread function, dask tried to parse the URL string with glob. That doesn't work, so we want to get rid of this line. So I made a copy of that imread function and just removed the glob parsing.

Edited imread function:

import os
try:
    from skimage.io import imread as sk_imread
except (AttributeError, ImportError):
    pass

from dask.array.core import Array
from dask.base import tokenize


def add_leading_dimension(x):
    return x[None, ...]


def custon_imread(filenames, imread=None, preprocess=None):
    """Read a stack of images into a dask array

    Parameters
    ----------
    filenames: list of strings
        A list of filename strings, e.g. ['myfile_01.png', 'myfile_02.png']
    imread: function (optional)
        Optionally provide custom imread function.
        Function should expect a filename and produce a numpy array.
        Defaults to ``skimage.io.imread``.
    preprocess: function (optional)
        Optionally provide custom function to preprocess the image.
        Function should expect a numpy array for a single image.

    Examples
    --------
    >>> from dask.array.image import imread
    >>> im = imread('2015-*-*.png')  # doctest: +SKIP
    >>> im.shape  # doctest: +SKIP
    (365, 1000, 1000, 3)

    Returns
    -------
    Dask array of all images stacked along the first dimension.
    Each separate image file will be treated as an individual chunk.
    """
    imread = imread or sk_imread
    name = "imread-%s" % tokenize(filenames, map(os.path.getmtime, filenames))

    # Read the first image eagerly to learn the per-image shape and dtype
    sample = imread(filenames[0])
    if preprocess:
        sample = preprocess(sample)

    # One graph key and one task per file
    keys = [(name, i) + (0,) * len(sample.shape) for i in range(len(filenames))]
    if preprocess:
        values = [
            (add_leading_dimension, (preprocess, (imread, fn))) for fn in filenames
        ]
    else:
        values = [(add_leading_dimension, (imread, fn)) for fn in filenames]
    dsk = dict(zip(keys, values))

    chunks = ((1,) * len(filenames),) + tuple((d,) for d in sample.shape)
    return Array(dsk, name, chunks, sample.dtype)

And now I can open the image like this:

filenames = ["https://blog.dask.org/images/threads.jpg"]
result = custon_imread(filenames, imread=url_image_reader)
print(result)
# dask.array<imread, shape=(1, 417, 418, 3), dtype=uint8, chunksize=(1, 417, 418, 3), chunktype=numpy.ndarray>
result.compute().shape
# (1, 417, 418, 3)

3. Differences between dask_image.imread.imread
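Returning to the proposed fix above: a possible alternative to copying dask's imread function is to wrap the reader with dask.delayed and stack the results. This is only a sketch and not something suggested in this thread. `lazy_stack` is a hypothetical helper name, and the code assumes every image has the same shape and dtype as the first one.

```python
import dask
import dask.array as da


def lazy_stack(sources, reader):
    """Stack one image per source along a new leading axis.

    `sources` is a list of filenames or URLs and `reader` is any function
    that maps one source to a numpy array (hypothetical helper, not part
    of dask or dask-image).
    """
    # Read the first image eagerly to learn the shape and dtype.
    # Assumption: all other images match this shape and dtype.
    sample = reader(sources[0])

    # Defer every read; nothing else is loaded until compute() is called.
    lazy_reader = dask.delayed(reader)
    arrays = [
        da.from_delayed(lazy_reader(src), shape=sample.shape, dtype=sample.dtype)
        for src in sources
    ]

    # Each image becomes its own chunk along axis 0.
    return da.stack(arrays, axis=0)
```

With the url_image_reader defined earlier, this would be called as lazy_stack(filenames, url_image_reader). Because no glob parsing or file modification times are involved, URL strings are passed to the reader unchanged.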
Hi Genevieve,

Thanks for the reply and help! :) The edited
What happened:

Hello,

I've noticed that dask_image.imread.imread is not working on my end for a remote REST API based storage system and gives an index error. I have tried with the same file on my local machine, which worked. For extra clarification, the error arises when I try to plot the image. I am assuming the Dask arrays returned by dask_image.imread.imread("http://192.168.49.2:8080/v1/objects/dask-demo-bucket/sample.jpg") and dask_image.imread.imread("./sample.jpg") are different. However, when I print their shape I notice that they are the same.

I also noticed that there is a similar method, dask.array.image.imread. What is the difference between this method and dask_image.imread.imread?

Thanks in advance!

What you expected to happen:

Expected dask.array.image.imread to work with HTTP file pointers the same way it does for local files.

Minimal Complete Verifiable Example:
Environment: