How to Reduce Number of Requests to Remote Storage #471

Open
jplehmann opened this issue Nov 4, 2016 · 9 comments

@jplehmann

My service has users upload raw images directly to AWS S3. I use sorl-thumbnail to produce a thumbnail with dimensions depending on its orientation.

My problem is that in order to determine the dimensions, I call is_portrait first, which apparently triggers a GET on the large raw image. Then I get the thumbnail with something like "500x" or "x500", which triggers another GET.

I can't think of many options here:

  1. Knowing beforehand the raw dimensions (not a good option).
  2. Fetching the image myself, to get the dimensions and then giving that somehow to sorl so it doesn't issue another request.
  3. What feels ideal is if sorl used an internal cache for this kind of thing.

What's the best way to avoid two requests?

@jplehmann
Author

jplehmann commented Nov 4, 2016

Reading the docs, I think I may be able to get away with not knowing the orientation for my current use case: "If width and height are given the image is rescaled to maximum values of height and width given. Aspect ratio preserved." It may be acceptable for me to specify two maximums (MAXxMAX). However, I'm still interested in an answer to the question, in case that doesn't work out or for other scenarios.
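
To make the MAXxMAX idea concrete, here is a minimal sketch of the fit-inside-a-box arithmetic that the quoted documentation sentence describes. The function name `fit_within` is made up for illustration and this is not sorl-thumbnail's code; it does not model any of the library's options (for example, it scales small images up as well as large ones down).

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside the box, preserving aspect ratio.

    Hypothetical helper for illustration only; not part of sorl-thumbnail.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    # The tighter of the two constraints decides the scale factor.
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# The same 500x500 box works for both orientations, so no is_portrait
# check (and no extra request for it) is needed in this scheme.
landscape = fit_within(4000, 3000, 500, 500)
portrait = fit_within(3000, 4000, 500, 500)
```

With one box for both orientations, the longest side ends up at 500 pixels either way, which is why the orientation lookup can be skipped for this use case.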

For example, I want to pre-cache at least 3 thumbnails per raw image, and I see this causes 3 GET requests. I'd like to pass in an image so sorl would use that instead of downloading it 3 times, but I need to do so in a way that uses the same cached name it would use for the model. Even better would be if sorl had its own cache for the original/raw images.

Thanks!
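
As a rough illustration of the "cache for the original/raw images" wished for above, the sketch below remembers the bytes of each remote file after the first download. Everything here is hypothetical: `FetchOnceCache`, `fake_remote_fetch`, and the file name are stand-ins, and nothing like this is known to exist in sorl-thumbnail.

```python
class FetchOnceCache:
    """Remember the bytes of each remote file after the first download.

    Hypothetical helper; an unbounded in-memory dict, so only a sketch.
    """

    def __init__(self, fetch):
        self._fetch = fetch  # callable: name -> bytes (e.g. one S3 GET)
        self._blobs = {}

    def get(self, name):
        if name not in self._blobs:
            self._blobs[name] = self._fetch(name)
        return self._blobs[name]


# Stand-in for remote storage that records every request it receives.
requests_made = []


def fake_remote_fetch(name):
    requests_made.append(name)
    return b"raw image bytes for " + name.encode()


cache = FetchOnceCache(fake_remote_fetch)
for geometry in ("100x100", "500x500", "1000x1000"):
    source = cache.get("uploads/photo.jpg")  # a resize would happen here
```

After the loop, `requests_made` holds a single entry even though three sizes were requested. A real version would need a size bound or eviction policy, since raw images of a few megabytes add up quickly.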

@SalahAdDin

👍

@jplehmann
Author

@mariocesar, could I ask you to please give your opinion here about the best way to avoid hitting remote storage N times for making N different thumbnails of the same image? Thanks!

@mariocesar
Collaborator

Currently the only complete solution is to rewrite the serialization, especially the naming and file-management behavior of sorl.thumbnail, because it relies entirely on the Django file backend, which is not optimized at all.

The current implementation will always rely on the Django file backend, and moving away from it is the only way to apply any of the optimizations we can think of, such as bulk creation of thumbnails, where we could cache the image metadata and avoid fetching it again and again. Custom naming would also let us avoid serialization and storing hashes altogether. That case is not common to everyone, but some users would like to have thumb_200x200_crop.jpg instead of the current hashed name.

A well-known workaround, if it is possible for you, is to not use the {% thumbnail %} tag: prebuild all the thumbnails before serving, and store the resulting thumbnail URL in your db, or rename the file to something like /product/thumb.jpg.
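
The prebuild-and-store workaround could look roughly like the sketch below. It is written against a generic `make_thumbnail_url` callable so that it runs on its own; the function names, the geometry list, and the URL layout are all assumptions. In a real project the callable would presumably wrap something like `get_thumbnail(source, geometry).url` from `sorl.thumbnail`, and the returned mapping would be saved on the model.

```python
# Assumed set of sizes; adjust to whatever the templates actually use.
GEOMETRIES = ("100x100", "500x500", "1000x1000")


def prebuild_thumbnail_urls(source_name, make_thumbnail_url, geometries=GEOMETRIES):
    """Build every thumbnail once and return {geometry: url} for storage."""
    return {g: make_thumbnail_url(source_name, g) for g in geometries}


def fake_make_thumbnail_url(source_name, geometry):
    """Stand-in for the real thumbnail generator (hypothetical URL layout)."""
    stem = source_name.rsplit(".", 1)[0]
    return "/media/cache/{}_{}.jpg".format(stem, geometry)


urls = prebuild_thumbnail_urls("product/42.jpg", fake_make_thumbnail_url)
```

Templates can then render the stored URL directly, so serving a page never touches the thumbnail machinery or the remote storage.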

Anyway, this is really core to the current implementation, and moving away from the dependency on the Django file system engine is one step.

@jplehmann
Author

Hi Mario, thank you for your fast feedback.

Yes, I am using that workaround, I think. I do use the thumbnail tag,
but I have pre-built all the versions of the thumbnails for each original
image so it will be cached. It ran for 12 hours yesterday to reprocess all
the images, but I know it could have been about 3x faster. I also have a
management task that runs every 10 minutes caching any new images that have
been uploaded. But this still gives a chance for someone to view these
images before the 10 minute process has occurred, leading to really slow
loads (downloading a 2MB image 2-3 times during a request is not good). So
ideally I have to go in and cache them right when the user uploads them...
which is not so easy for me to do in the background, which means it will be
a slow request for them if I do it in the foreground. Anyway, it feels like a
lot of work for something that would be much simpler if I could just retrieve
the image myself and hand it to sorl-thumbnail so it doesn't try to fetch it.

I guess I don't really understand why it would be so hard to take a raw
image as input to get_thumbnail and have it make use of this if it is
provided, rather than asking the backend for the file, but otherwise
storing it with the same name.

Thanks for your input!

@jplehmann
Author

So my idea above more fleshed out -- get_thumbnail would have an optional argument local_source_image, and then on this line:
https://github.com/mariocesar/sorl-thumbnail/blob/master/sorl/thumbnail/base.py#L110

If local_source_image was not None, then it would use it instead of calling out to the engine.

That looks like a really simple change. Would that work @mariocesar ?
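
To show the control flow I have in mind, here is a toy sketch. It is not sorl-thumbnail's actual code: `get_thumbnail_sketch`, `fetch_source`, and `resize` are made-up stand-ins, and only the `local_source_image` parameter name comes from the proposal above.

```python
def get_thumbnail_sketch(name, geometry, fetch_source, resize,
                         local_source_image=None):
    """Use the caller's image if one is given, otherwise fetch it."""
    if local_source_image is not None:
        source = local_source_image  # no request to remote storage
    else:
        source = fetch_source(name)  # current behaviour: one GET per call
    return resize(source, geometry)


# Stand-ins that record how often remote storage is contacted.
fetches = []


def fake_fetch(name):
    fetches.append(name)
    return "image:" + name


def fake_resize(source, geometry):
    return "{}@{}".format(source, geometry)


raw = fake_fetch("uploads/photo.jpg")  # one download, done by the caller
thumbs = [
    get_thumbnail_sketch("uploads/photo.jpg", g, fake_fetch, fake_resize,
                         local_source_image=raw)
    for g in ("100x100", "500x500", "1000x1000")
]
```

Here three thumbnails cost one fetch in total, and callers that do not pass `local_source_image` keep the existing behaviour. The thumbnail name would still be derived from `name`, so the cached entry should stay the same.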

@jplehmann
Author

@mariocesar Your last feedback was very helpful, could I trouble you for a follow-up on my last comment please?

@jplehmann
Author

@mariocesar Thanks for the lengthy input you gave. Would you mind commenting on my followup? I proposed something that looks easy, which I might implement if you agreed.

@SalahAdDin

@mariocesar !
