Dataset plotting: normalization #1263

Open
adamjstewart opened this issue Apr 18, 2023 · 2 comments · May be fixed by #2560
Labels
datasets Geospatial or benchmark datasets good first issue A good issue for a new contributor to work on

Comments

@adamjstewart
Collaborator

adamjstewart commented Apr 18, 2023

Summary

At the moment, our dataset plotting routines are inconsistent. Some plot methods stretch to the range of the image, while others simply divide by 3,000 or 10,000 and clip to bring images into the range 0–1. I propose we convert the latter to the former and consistently stretch images for all datasets.

Rationale

While technically correct, many of our plotting methods make it difficult to visualize images. This is especially true for datamodules, where normalization has been applied to all images and images are no longer in the uint8 or float32 range of the original data.

Implementation

I propose we use one of the default visualization options used by QGIS:

  1. Clip to 2% to 98% range (exact percentages TBD)
  2. Clip to min/max
  3. Clip to mean ± 2 std dev (exact std dev TBD)

We already have a torchgeo.datasets.utils.percentile_normalization function we could use or modify for this purpose.
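For illustration, the three options above can be sketched with plain NumPy. This is only a rough sketch: the function names `stretch`, `percentile_stretch`, `minmax_stretch`, and `stddev_stretch` are hypothetical and are not the existing torchgeo API, and the statistics are computed over the whole image rather than per band, which is an assumption.

```python
import numpy as np


def stretch(image, low, high):
    """Linearly map [low, high] to [0, 1] and clip everything outside."""
    low, high = float(low), float(high)
    scale = max(high - low, 1e-8)  # guard against constant images
    return np.clip((image.astype(np.float32) - low) / scale, 0.0, 1.0)


def percentile_stretch(image, lower=2.0, upper=98.0):
    """Option 1: clip to the [lower, upper] percentile range."""
    low, high = np.percentile(image, [lower, upper])
    return stretch(image, low, high)


def minmax_stretch(image):
    """Option 2: clip to the min/max of the image."""
    return stretch(image, image.min(), image.max())


def stddev_stretch(image, num_std=2.0):
    """Option 3: clip to mean +/- num_std standard deviations."""
    mean, std = image.mean(), image.std()
    return stretch(image, mean - num_std * std, mean + num_std * std)
```

All three return a float32 array in the range 0–1, so any of them could feed `imshow` directly regardless of the input dtype.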

Alternatives

We could apply an inverse normalization transform during datamodule plotting. This would help with datamodule plotting, but would still leave dataset plotting inconsistent. It would only be a couple of lines of code though, which would make it much easier.
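As a rough idea of what that alternative would look like, here is a minimal NumPy sketch of inverting a per-channel normalization. The name `denormalize` is hypothetical, and it assumes the datamodule normalized with `(x - mean) / std` using per-channel statistics that are still available at plot time.

```python
import numpy as np


def denormalize(image, mean, std):
    """Invert per-channel normalization: x = x_norm * std + mean.

    image: array of shape (C, H, W); mean and std: length-C sequences.
    """
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return image * std + mean
```

The result is back in the original data range, so it would still need the divide-and-clip or stretch step before display.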

Additional information

@calebrob6 we discussed this on Slack or somewhere.

Note that this contradicts #496, so we should decide on one approach or another.

@adamjstewart
Collaborator Author

Before #476, we basically had this functionality for most of our GeoDatasets already. RasterDataset.plot would normalize all images to the 2% to 98% range using self.rgb_bands and self.all_bands, and would only need to be overridden for non-image datasets. We should consider bringing this back to save us time.

@robmarkcole
Contributor

In my own plotting funcs I have been applying percentile_normalization to 'undo' the effects of normalisation, but the result is not always that great. I think an un-normalize followed by percentile_normalization with dataset-specific percentiles would make sense, e.g. min/max can then be achieved with the 0 and 100 percentiles. Defaulting to 2 and 98 generally works pretty well, and if QGIS does this it is probably pretty sensible.
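A minimal sketch of that two-step idea, assuming per-channel `mean`/`std` statistics are known and that the percentiles are passed in per dataset. The name `plot_ready` is hypothetical, and percentiles are computed jointly over all bands here, which is an assumption rather than what torchgeo does.

```python
import numpy as np


def plot_ready(image, mean, std, percentiles=(2.0, 98.0)):
    """Undo per-channel normalization, then stretch to a percentile range."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    raw = image * std + mean  # un-normalize, shape (C, H, W)
    low, high = (float(v) for v in np.percentile(raw, percentiles))
    scale = max(high - low, 1e-8)  # guard against constant images
    return np.clip((raw - low) / scale, 0.0, 1.0)
```

Passing `percentiles=(0, 100)` reduces this to a plain min/max stretch, while the default of 2 and 98 discards the extreme tails.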

@adamjstewart adamjstewart changed the title Proposal for better dataset plotting Dataset plotting: normalization Oct 29, 2024