Add functions to compute, plot, store the local hazard exceedence intensity and RP maps (new branch) #898

ValentinGebhart · 2024-06-20T12:06:58Z

Changes proposed in this PR (new version of old PR #857):

addition of new function to compute local return period maps local_return_period and _loc_return_period for given hazard intensities, giving out a geodataframe
addition of function subplots_from_gdf in util.plot that plots different subplots from the columns of a gdf

Earlier parts removed (commented out in code) until agreement:

addition of new functions to store the hazard maps that are generated in the Hazard.plot_rp_intensity() function as raster files write_raster_local_exceedance_inten. (do we also need as netCDF files write_netcdf_local_exceedance_inten?)
addition of new function to store these local return period maps for given hazard intensities as raster files write_raster_local_return_periods or as NetCDF files write_netcdf_local_return_periods.

This PR fixes #854
(new version of old PR #857)

PR Author Checklist

PR Reviewer Checklist

climada/hazard/plot.py

chahank

Thanks for the updated clean version of this PR! I think this will be a quite useful set of methods.

I have 2 general points that itch me at the moment.

I am not sure it makes sense that the plotting and saving methods always call and recalculate the values. Maybe we should rather store the values as a new attribute. Do you know how much computation this costs for large datasets? @emanuel-schmid @peanutfun : what do you think, would it be ok if the method local_return_period adds an attribute to the hazard object?
I find the method local_return_period and the method _loc_return_period quite hard to read at the moment. I cannot say exactly why, but probably a mix of not always clear variable names, hidden changing of variables (like the return_periods), and quite some loops.

I thus suggest to a) wait for @emanuel-schmid and @peanutfun opinion about the attribute b) rewrite the code to avoid updating the return periods in the back.

climada/hazard/base.py

climada/hazard/io.py

climada/hazard/test/test_base.py

climada/hazard/test/test_io.py

climada/hazard/io.py

peanutfun · 2024-06-21T12:49:15Z

I am not sure it makes sense that the plotting and saving methods always call and recalculate the values. Maybe we should rather store the values as a new attribute. Do you know how much computation this costs for large datasets? @emanuel-schmid @peanutfun : what do you think, would it be ok if the method local_return_period adds an attribute to the hazard object?

I agree that the values should not be calculated in the plotting method. Instead, the user should insert the data to plot into the appropriate function.

I very much dislike adding new attributes after the class initialization. I would return the data to the user, who should take care of "bookkeeping" themselves. They can then plug the values into a plotting function, store them separately, etc.

chahank · 2024-06-21T12:52:09Z

I agree that the values should not be calculated in the plotting method. Instead, the user should insert the data to plot into the appropriate function.

I very much dislike adding new attributes after the class initialization. I would return the data to the user, who should take care of "bookkeeping" themselves. They can then plug the values into a plotting function, store them separately, etc.

I also agree with you that adding new attributes after the class initialization is a bad habit. Here the point is that plotting the intensity return period on the map requires the centroids, and thus having it in the hazard object is convenient... not sure what the best solution is.

ValentinGebhart · 2024-06-21T13:46:35Z

I very much dislike adding new attributes after the class initialization. I would return the data to the user, who should take care of "bookkeeping" themselves. They can then plug the values into a plotting function, store them separately, etc.

I also agree with you that adding new attributes after the class initialization is a bad habit. Here the point is that plotting the intensity return period on the map requires the centroids, and thus having it in the hazard object is convenient... not sure what the best solution is.

One way of returning data to the user without losing the centroids could be to return a geodataframe?

peanutfun · 2024-06-21T14:53:48Z

@ValentinGebhart Great idea! Returning a GeoDataFrame with the return period information and the centroids geometry should contain all data necessary for the plot.

Users can then use the geopandas functions for plotting and writing. Maybe this solves all issues? 😅

emanuel-schmid · 2024-06-24T06:33:55Z

@ValentinGebhart I agree the idea is not bad - but returning a GeoDataFrame is still somewhat fuzzy. A cleaner solution would be to return a tuple (intensity return period, centroids geometry) or a class with two attributes, intensity_return_period and geometry.
This seems cleaner. Ideally any requirement should be explicit in a method's signature and not hidden away in an object that could contain anything at all, like a dict or a dataframe.

chahank · 2024-06-24T07:46:05Z

@emanuel-schmid I strongly disagree with this take as is. The proposed outputs (tuples or custom class) maybe would be cleaner from a programming point of view, but it would make the output mostly useless from a user perspective. I do not see the problem with giving back a geodataframe. Can you maybe explain why this would be a bad idea?

Besides, we already have a precedent with the method impact_at_reg that returns a DataFrame :D

emanuel-schmid · 2024-06-25T07:24:13Z

Yeah. Agreed. Although I must stress that the principle holds - in this particular case a dataframe seems alrightish. 😁

chahank · 2024-06-25T07:53:42Z

Yeah. Agreed. Although I must stress that the principle holds - in this particular case a dataframe seems alrightish. 😁

Indeed. I think there is a general question on how climada objects should return extra information. We had the questions about Centroids and things like pixel sizes, Impacts and regional values, or here Hazards and return period.

ValentinGebhart · 2024-06-28T10:28:17Z

In the new commit, local_return_period outputs a GeoDataFrame and gdf_meta (named tuple with plotting metadata information) and I add a function subplots_from_gdf to util.plot that plots subplots from the different columns of a gdf (using the gdf_meta information). I thought it would be nice to have this plot function as general as possible (so it could plot return periods and exceedance frequencies and more), but if this is too messy, we can also write a more specific "plot_return_period" function.

After discussion with Lukas, I create the meta data of the gdf as a named tuple, such that it is clear which attributes gdf_meta must have to correctly plot.

I removed the write functions (and their tests) because we might want to generalize them to general gdf. I made a backup if we change our minds.

ValentinGebhart · 2024-07-10T15:33:45Z

After the changes (see above comment) a new review is requested.

chahank

Thanks for this simplified version! It makes it much clearer.

There are a few main points to discuss:
~~a) is it computing what it should? The test looks incorrect to me, but I probably have mixed up something. Can you please enlighten me?~~ Did it myself.
b) I am not sure whether the GdfMeta is a good idea.
c) I am not sure whether the chunking is working as intented (and whether it is needed at all.)

chahank · 2024-07-11T10:41:56Z

climada/hazard/base.py

+        threshold_intensities : np.array
+            Hazard intensities to consider.


I find the name threshold_intensities rather confusing, and the docstring does not really help. Can you be a bit more verbatim please?

what about "test_intensities"? I agree, I'll update the docstrings

hmmm I think the word test would be quite confusing actually. Maybe just intensities? Or even avoid the plural and use intensity? Or keep threshold_intensities but with a clearer docstring.

climada/hazard/base.py

chahank · 2024-07-11T10:43:46Z

climada/hazard/base.py

+            LOGGER.warning("The Hazard's frequency unit is %s and not %s which "
+                "most likely leads to incorrect results",


most likely leads to incorrect results does not tell me much. What is meant by that? Why would having different units of frequencies lead to incorrect results? What one gets is simply return periods in the unit of the frequency.

I agree, I could just use the inverse of the frequency attribute. I now added using a if elif statement that can read years, months, and weeks. I don't know how to make this more generic and easier because the frequency attribute is just a string and not a "physical" frequency unit that can be converted eg to years.

chahank · 2024-07-11T10:46:29Z

climada/hazard/base.py

+        #create gdf meta data
+        gdf_meta = GdfMeta(
+            name = 'Return Period',
+            unit = 'Years',


Why not use the units of the frequency (or rather its inverse)?

climada/hazard/base.py

climada/hazard/test/test_io.py

climada/test/test_plot.py

climada/util/test/test_plot.py

chahank · 2024-07-11T11:15:58Z

climada/util/plot.py

+# Create GdfMeta class (as a named tuple) for GeoDataFrame meta data
+GdfMeta = namedtuple('GdfMeta', ['name', 'unit', 'col_name', 'col_unit'])


hmmm I am not sure whether this is a good idea. Maybe it is, maybe not.

@emanuel-schmid What do you think? Should we add some nametuple in the utility plot for a specific case of geopandas plotting?

To add more details: this is called very generically GdfMeta, but refers to only one very specific data set of geodataframes.

col_name is already in the dataframe, col_unit and unit are rather confusing, and name is a bit unclear to me too.

I suggested this to pass information to the plot that is not explicitly contained in the GeoDataFrame. GdfMeta can be renamed, of course. But I think a namedtuple makes more sense than a dict, because you can be sure which keys/attributes are defined. col_name specifies the name of the column to plot. But I agree that the names can be adjusted and made clearer.

I like the idea of passing some meta data for the plotting. I am not sure that this is the perfect way this way :D At first I thought that GdfMeta some new property of GeoDataFrames. Also, is the current proposal general enough for a generic geopandas plotting method in CLIMADA? Or should it then be made more specific.

And, namedtuple makes sense indeed.

@ValentinGebhart, could we maybe find another way to put this information into the GeoDataFrame? @chahank and I had the following ideas when discussing this in person:

The col_name to plot can be a separate parameter, as users generally have to state which functions to plot in GeoDataFrame.plot() and similar functions

The col_unit can be added to the column name in parentheses or brackets. The plot function then reads this information is brackets/parentheses are found, otherwise it does not display a unit.

I agree that the naming is not optimal. The included information should be: what quantity (+ units) is represented by the values in the gdf, what quantity (+ units) is represented by the values in the column labels. The rows always correspond to centroids. Maybe instead of name we could use quantity or value_type?

Otherwise, the plotting function should be "generic", in the sense that it produces one subplot per column of a generic gdf, using the values to color the different points, and using the gdf_meta varibale for figure and axis label. Of course, it is still using the geo_im_from_array util function, and I'm sure there are other ways to plot a gdf with several columns.
Let me know what you think

@peanutfun sorry I didn't see your suggestion in time. We can definitely add the labels and units as parameter(s) to the plot function. Then the user has to know about what their gdf represents without us helping by providing the information, which I guess is ok but is a bit more work to the user. If we put the as defaults to the plot function, the plot function is not generic anymore.

Basically along the same lines as @peanutfun
I'd suggest to have two parameters instead of gdf_meta: GdfMeta,
colbar_name: str and title_subplots: lambda.

To get the same result as implemented now when calling

subplots_from_gdf( gdf=gdf, gdf_meta=GdfMeta(name='Return Period', unit='Years', col_name='Threshold Intensity', col_unit='m/s') )

you would

subplots_from_gdf( gdf=gdf, colbar_name="Return Period (Years)", title_subplots=lambda column: f"Threshold Intensity: {column} m/s", )

Seems better readable and more flexible

consequently, I'd let the local_return_period return a triple

Returns ------- gdf : GeoDataFrame label : str GeoDataFrame label, for reporting and plotting column_label : lambda Column label, for reporting and plotting

in case the user is not 100% sure what they did and is happy to get this kind of conformation. (Why not?)

chahank · 2024-07-11T11:31:26Z

In the new commit, local_return_period outputs a GeoDataFrame and gdf_meta (named tuple with plotting metadata information) and I add a function subplots_from_gdf to util.plot that plots subplots from the different columns of a gdf (using the gdf_meta information). I thought it would be nice to have this plot function as general as possible (so it could plot return periods and exceedance frequencies and more), but if this is too messy, we can also write a more specific "plot_return_period" function.

I like the generality! I only fear that the GdfMeta might make it not generic.

After discussion with Lukas, I create the meta data of the gdf as a named tuple, such that it is clear which attributes gdf_meta must have to correctly plot.

I removed the write functions (and their tests) because we might want to generalize them to general gdf. I made a backup if we change our minds.

Sounds good!

chahank

Good work! Please finalize the cosmetics: the linter complains about trailing white spaces to be removed and some indention that is incorrect. There is one file that is modified by mistake I think.

Also, don't forget to add your name to the list of authors: https://github.com/CLIMADA-project/climada_python/blob/feature/write_haz_rp_maps2/AUTHORS.md

chahank · 2024-07-15T14:10:19Z

climada/hazard/test/test_io.py

+from climada.test import get_test_file

+HAZ_TEST_TC :Path = get_test_file('test_tc_florida')


I think this is not part of this PR.

ValentinGebhart · 2024-07-15T14:55:56Z

Good work! Please finalize the cosmetics: the linter complains about trailing white spaces to be removed and some indention that is incorrect. There is one file that is modified by mistake I think.

Also, don't forget to add your name to the list of authors: https://github.com/CLIMADA-project/climada_python/blob/feature/write_haz_rp_maps2/AUTHORS.md

Thanks for the review(s)! I now understood what the trailing white space means :) For me this looks ready to merge, let me know if there is something else to do.

chahank · 2024-07-15T16:09:28Z

This is good, feel free to merge!

ValentinGebhart added 5 commits June 20, 2024 14:03

added new functions

1e91def

added new functions to tutorial

e7a5913

added unit test for new functions

81dc151

change from rioxarray to rasterio in testing

f5c3d4e

added test for non-raster data and new variable name

c588124

ValentinGebhart commented Jun 21, 2024

View reviewed changes

climada/hazard/plot.py Outdated Show resolved Hide resolved

ValentinGebhart requested review from chahank, sarah-hlsn and emanuel-schmid June 21, 2024 08:38

remove setting centroids.geometry in plot funciton

0711526

chahank requested changes Jun 21, 2024

View reviewed changes

new block structure and updated test

1b29f1f

peanutfun mentioned this pull request Jun 27, 2024

Add function to store (grid) data as NetCDF #905

Open

output and plot gdf

24ba039

create gdf meta data as named tuple

d79b670

ValentinGebhart requested review from chahank and peanutfun July 10, 2024 15:33

Merge branch 'develop' into feature/write_haz_rp_maps2

f228918

chahank requested changes Jul 11, 2024

View reviewed changes

ValentinGebhart added 4 commits July 11, 2024 16:44

updated docstrings and variable frequency unit

5198646

remove obsolete plot function

fe99121

divide gdf metadata into label and colum_label

8dc544f

updated test, docstrings, var names, and tutorial

8b0d482

chahank requested changes Jul 15, 2024

View reviewed changes

ValentinGebhart and others added 3 commits July 15, 2024 16:48

cleaned up pylint and obsolete imports

acddaa3

two more pylint changes

0e50ebe

Update AUTHORS.md

7f61c52

ValentinGebhart merged commit d67c804 into develop Jul 15, 2024
12 of 18 checks passed

ValentinGebhart deleted the feature/write_haz_rp_maps2 branch July 15, 2024 16:13

This was referenced Jul 16, 2024

Add functions to compute, plot, store the local hazard exceedence intensity and RP maps #857

Closed

Compiled hazard tutorial (some plots were missing) #919

Merged

		threshold_intensities : np.array
		Hazard intensities to consider.

		LOGGER.warning("The Hazard's frequency unit is %s and not %s which "
		"most likely leads to incorrect results",

		# Create GdfMeta class (as a named tuple) for GeoDataFrame meta data
		GdfMeta = namedtuple('GdfMeta', ['name', 'unit', 'col_name', 'col_unit'])

		from climada.test import get_test_file

		HAZ_TEST_TC :Path = get_test_file('test_tc_florida')

Add functions to compute, plot, store the local hazard exceedence intensity and RP maps (new branch) #898

Add functions to compute, plot, store the local hazard exceedence intensity and RP maps (new branch) #898

Uh oh!

Conversation

ValentinGebhart commented Jun 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Author Checklist

PR Reviewer Checklist

Uh oh!

Uh oh!

chahank left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peanutfun commented Jun 21, 2024

Uh oh!

chahank commented Jun 21, 2024

Uh oh!

ValentinGebhart commented Jun 21, 2024

Uh oh!

peanutfun commented Jun 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emanuel-schmid commented Jun 24, 2024

Uh oh!

chahank commented Jun 24, 2024

Uh oh!

emanuel-schmid commented Jun 25, 2024

Uh oh!

chahank commented Jun 25, 2024

Uh oh!

ValentinGebhart commented Jun 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ValentinGebhart commented Jul 10, 2024

Uh oh!

chahank left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chahank Jul 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chahank Jul 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

ValentinGebhart commented Jun 20, 2024 •

edited

Loading

peanutfun commented Jun 21, 2024 •

edited

Loading

ValentinGebhart commented Jun 28, 2024 •

edited

Loading

chahank left a comment •

edited

Loading

chahank Jul 11, 2024 •

edited

Loading

chahank Jul 11, 2024 •

edited

Loading

ValentinGebhart Jul 11, 2024 •

edited

Loading

emanuel-schmid Jul 11, 2024 •

edited

Loading

emanuel-schmid Jul 11, 2024 •

edited

Loading

chahank commented Jul 11, 2024 •

edited

Loading