Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set_geometry #889

Merged
merged 226 commits into from
Jul 9, 2024
Merged

set_geometry #889

merged 226 commits into from
Jul 9, 2024

Conversation

emanuel-schmid
Copy link
Collaborator

@emanuel-schmid emanuel-schmid commented Jun 7, 2024

Changes proposed in this PR:

  • use set_geometry in util.coordinates instead of direct column assignment

This PR fixes #888

PR Author Checklist

PR Reviewer Checklist

@emanuel-schmid emanuel-schmid changed the base branch from main to develop June 7, 2024 14:34
@emanuel-schmid emanuel-schmid changed the title Feature/set geometry set_geometry Jun 7, 2024
@chahank
Copy link
Member

chahank commented Jun 7, 2024

Quick note: while the fix works, I think this is not a very forward-looking fix.

Currently, the method assumes that the inputs are geodataframes without a geometry column, but instead latitude and longitude columns. This is what we are trying to remove from the codebase (done in centroids, ongoing in exposures).

I suggest that we at least flag this method as one that should be refactored as soon as exposures have proper geodataframes.

@peanutfun
Copy link
Member

You don't need a temporary column. set_geometry also accepts an array of values, and sets these as the geometry column, see https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.set_geometry.html

@emanuel-schmid
Copy link
Collaborator Author

Er - are you sure? When running with a scheduler, dask.dataframe complains if there is no column to assign the values to. Isn't it?

@emanuel-schmid
Copy link
Collaborator Author

I suggest that we at least flag this method as one that should be refactored as soon as exposures have proper geodataframes.

No need to flag this. Any occurrence of geodataframes with latitude columns will eventually show up in the list of failed tests. 😁

@peanutfun
Copy link
Member

When running with a scheduler, dask.dataframe complains if there is no column to assign the values to. Isn't it?

Ah, sorry. I have no idea about dask dataframes 🙈 For the single process using only GeoDataFrame, it should not be necessary.

Copy link
Member

@peanutfun peanutfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this! I think the CRS handling can be improved a bit.

However, I am wondering if the set_df_geometry_points needs the bit about the scheduler at all. geopandas.points_from_xy has become very efficient since Shapely v2.0, because the latter now uses multithreading. As the final geometry is anyway stored in a regular GeoDataFrame, and not in a distributed dask dataframe, I think we can safely remove the whole scheduler thing from set_df_geometry_points and make the function much simpler.

The linter complains anyway that the scheduler part is not tested 😅

Comment on lines 2658 to 2661
try:
crs = df_val.geometry.crs
except AttributeError:
crs = None
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any GeoDataFrame has the attribute .crs. If it is not set, the attribute already returns None. So this can simply be

crs = df_val.crs if crs is None else crs  # crs might now still be None

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@emanuel-schmid
Copy link
Collaborator Author

I think we can safely remove the whole scheduler thing from set_df_geometry_points and make the function much simpler.

@peanutfun : I'm inclined to believe you but would like to separate that from this PR, as it is not related to #888

@emanuel-schmid emanuel-schmid merged commit 84eaa8f into develop Jul 9, 2024
3 of 6 checks passed
@emanuel-schmid emanuel-schmid deleted the feature/set_geometry branch July 9, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

util.coordinates functions add 'geometry' column to GeoDataFrame
6 participants