-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functions to compute, plot, store the local hazard exceedence intensity and RP maps (new branch) #898
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updated clean version of this PR! I think this will be a quite useful set of methods.
I have 2 general points that itch me at the moment.
- I am not sure it makes sense that the plotting and saving methods always call and recalculate the values. Maybe we should rather store the values as a new attribute. Do you know how much computation this costs for large datasets? @emanuel-schmid @peanutfun : what do you think, would it be ok if the method
local_return_period
adds an attribute to the hazard object? - I find the method
local_return_period
and the method_loc_return_period
quite hard to read at the moment. I cannot say exactly why, but probably a mix of not always clear variable names, hidden changing of variables (like the return_periods), and quite some loops.
I thus suggest to a) wait for @emanuel-schmid and @peanutfun opinion about the attribute b) rewrite the code to avoid updating the return periods in the back.
I agree that the values should not be calculated in the plotting method. Instead, the user should insert the data to plot into the appropriate function. I very much dislike adding new attributes after the class initialization. I would return the data to the user, who should take care of "bookkeeping" themselves. They can then plug the values into a plotting function, store them separately, etc. |
I also agree with you that adding new attributes after the class initialization is a bad habit. Here the point is that plotting the intensity return period on the map requires the centroids, and thus having it in the hazard object is convenient... not sure what the best solution is. |
One way of returning data to the user without losing the centroids could be to return a geodataframe? |
@ValentinGebhart Great idea! Returning a GeoDataFrame with the return period information and the centroids geometry should contain all data necessary for the plot. Users can then use the geopandas functions for plotting and writing. Maybe this solves all issues? 😅 |
@ValentinGebhart I agree the idea is not bad - but returning a GeoDataFrame is still somewhat fuzzy. A cleaner solution would be to return a tuple (intensity return period, centroids geometry) or a class with two attributes, intensity_return_period and geometry. |
@emanuel-schmid I strongly disagree with this take as is. The proposed outputs (tuples or custom class) maybe would be cleaner from a programming point of view, but it would make the output mostly useless from a user perspective. I do not see the problem with giving back a geodataframe. Can you maybe explain why this would be a bad idea? Besides, we already have a precedent with the method |
Yeah. Agreed. Although I must stress that the principle holds - in this particular case a dataframe seems alrightish. 😁 |
Indeed. I think there is a general question on how climada objects should return extra information. We had the questions about Centroids and things like pixel sizes, Impacts and regional values, or here Hazards and return period. |
In the new commit, After discussion with Lukas, I create the meta data of the gdf as a named tuple, such that it is clear which attributes gdf_meta must have to correctly plot. I removed the write functions (and their tests) because we might want to generalize them to general gdf. I made a backup if we change our minds. |
After the changes (see above comment) a new review is requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this simplified version! It makes it much clearer.
There are a few main points to discuss:
a) is it computing what it should? The test looks incorrect to me, but I probably have mixed up something. Can you please enlighten me? Did it myself.
b) I am not sure whether the GdfMeta is a good idea.
c) I am not sure whether the chunking is working as intented (and whether it is needed at all.)
climada/hazard/base.py
Outdated
threshold_intensities : np.array | ||
Hazard intensities to consider. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the name threshold_intensities
rather confusing, and the docstring does not really help. Can you be a bit more verbatim please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about "test_intensities"? I agree, I'll update the docstrings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm I think the word test
would be quite confusing actually. Maybe just intensities
? Or even avoid the plural and use intensity
? Or keep threshold_intensities
but with a clearer docstring.
climada/hazard/base.py
Outdated
LOGGER.warning("The Hazard's frequency unit is %s and not %s which " | ||
"most likely leads to incorrect results", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most likely leads to incorrect results
does not tell me much. What is meant by that? Why would having different units of frequencies lead to incorrect
results? What one gets is simply return periods in the unit of the frequency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I could just use the inverse of the frequency attribute. I now added using a if elif statement that can read years, months, and weeks. I don't know how to make this more generic and easier because the frequency attribute is just a string and not a "physical" frequency unit that can be converted eg to years.
climada/hazard/base.py
Outdated
#create gdf meta data | ||
gdf_meta = GdfMeta( | ||
name = 'Return Period', | ||
unit = 'Years', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the units of the frequency (or rather its inverse)?
climada/util/plot.py
Outdated
# Create GdfMeta class (as a named tuple) for GeoDataFrame meta data | ||
GdfMeta = namedtuple('GdfMeta', ['name', 'unit', 'col_name', 'col_unit']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm I am not sure whether this is a good idea. Maybe it is, maybe not.
@emanuel-schmid What do you think? Should we add some nametuple in the utility plot for a specific case of geopandas plotting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add more details: this is called very generically GdfMeta
, but refers to only one very specific data set of geodataframes.
col_name
is already in the dataframe, col_unit
and unit
are rather confusing, and name
is a bit unclear to me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested this to pass information to the plot that is not explicitly contained in the GeoDataFrame. GdfMeta
can be renamed, of course. But I think a namedtuple
makes more sense than a dict
, because you can be sure which keys/attributes are defined. col_name
specifies the name of the column to plot. But I agree that the names can be adjusted and made clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of passing some meta data for the plotting. I am not sure that this is the perfect way this way :D At first I thought that GdfMeta
some new property of GeoDataFrames
. Also, is the current proposal general enough for a generic geopandas plotting method in CLIMADA? Or should it then be made more specific.
And, namedtuple
makes sense indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ValentinGebhart, could we maybe find another way to put this information into the GeoDataFrame? @chahank and I had the following ideas when discussing this in person:
- The
col_name
to plot can be a separate parameter, as users generally have to state which functions to plot inGeoDataFrame.plot()
and similar functions - The
col_unit
can be added to the column name in parentheses or brackets. The plot function then reads this information is brackets/parentheses are found, otherwise it does not display a unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the naming is not optimal. The included information should be: what quantity (+ units) is represented by the values in the gdf, what quantity (+ units) is represented by the values in the column labels. The rows always correspond to centroids. Maybe instead of name
we could use quantity
or value_type
?
Otherwise, the plotting function should be "generic", in the sense that it produces one subplot per column of a generic gdf, using the values to color the different points, and using the gdf_meta varibale for figure and axis label. Of course, it is still using the geo_im_from_array util function, and I'm sure there are other ways to plot a gdf with several columns.
Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peanutfun sorry I didn't see your suggestion in time. We can definitely add the labels and units as parameter(s) to the plot function. Then the user has to know about what their gdf represents without us helping by providing the information, which I guess is ok but is a bit more work to the user. If we put the as defaults to the plot function, the plot function is not generic anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically along the same lines as @peanutfun
I'd suggest to have two parameters instead of gdf_meta: GdfMeta
,
colbar_name: str
and title_subplots: lambda
.
To get the same result as implemented now when calling
subplots_from_gdf(
gdf=gdf,
gdf_meta=GdfMeta(name='Return Period', unit='Years', col_name='Threshold Intensity', col_unit='m/s')
)
you would
subplots_from_gdf(
gdf=gdf,
colbar_name="Return Period (Years)",
title_subplots=lambda column: f"Threshold Intensity: {column} m/s",
)
Seems better readable and more flexible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consequently, I'd let the local_return_period
return a triple
Returns
-------
gdf : GeoDataFrame
label : str
GeoDataFrame label, for reporting and plotting
column_label : lambda
Column label, for reporting and plotting
in case the user is not 100% sure what they did and is happy to get this kind of conformation. (Why not?)
I like the generality! I only fear that the GdfMeta might make it not generic.
Sounds good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! Please finalize the cosmetics: the linter complains about trailing white spaces to be removed and some indention that is incorrect. There is one file that is modified by mistake I think.
Also, don't forget to add your name to the list of authors: https://github.com/CLIMADA-project/climada_python/blob/feature/write_haz_rp_maps2/AUTHORS.md
climada/hazard/test/test_io.py
Outdated
from climada.test import get_test_file | ||
|
||
HAZ_TEST_TC :Path = get_test_file('test_tc_florida') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not part of this PR.
Thanks for the review(s)! I now understood what the trailing white space means :) For me this looks ready to merge, let me know if there is something else to do. |
This is good, feel free to merge! |
Changes proposed in this PR (new version of old PR #857):
local_return_period
and_loc_return_period
for given hazard intensities, giving out a geodataframesubplots_from_gdf
in util.plot that plots different subplots from the columns of a gdfEarlier parts removed (commented out in code) until agreement:
write_raster_local_exceedance_inten.
(do we also need as netCDF fileswrite_netcdf_local_exceedance_inten
?)write_raster_local_return_periods
or as NetCDF fileswrite_netcdf_local_return_periods
.This PR fixes #854
(new version of old PR #857)
PR Author Checklist
develop
)PR Reviewer Checklist