You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to propose a big API overhaul, moving away from the current object-based API to a simpler function-based one.
Current situation
At the moment, the functionality can be split into several groups:
basic characters with a single output
The majority of characters takes input variables and their main output is a single pandas Series. The usual pattern looks like this (note the .series):
Characters measured by the graph module use functions. They take a network, measure stuff and save the result as a node or edge attribute of a given network and return its copy with that attribute.
Classes that are used to generate new geometries (like Tessellation) return a new GeoDataFrame but also Series' with IDs in some cases to link the resulting gdf to the original one.
I would like to simplify all that and turn existing classes into functions returning numpy arrays. There are a few reasons for that:
calling the .series attribute is annoying and there's almost no value in other attributes we currently provide
dask_geopandas compatibility - while you can use existing classes within map_partitions, it is not straightforward, especially for classes with multiple outputs.
lighter code - wrapping everything in classes generates a lot of unnecessary code. The switch should in the long-term allow easier maintenance.
compatibility with the rest of PySAL, especially esda
So, the proposed changes to the groups listed above:
These do not require many changes. I would just introduce inplace keyword to avoid unnecessary copying. But due to the nature of the underlying data structure, they will always be a bit different in behaviour.
In most cases, these can be turned into functions. In the case of tessellation, I would even split it into two - morphological_tessellation and enclosed_tessellation. When we return IDs, we can again return a tuple. However, this change is not as needed and we can keep the existing API if we come to that conclusion.
This proposal has significant API implications as it is going to break everything. But I think that we should do that anyway and sooner it happens, the less trouble it will cause. We should make a transition period long enough to let everyone adapt to a new API before we remove it.
I want to hear your thoughts on that as it entails a pretty significant refactoring and changes.
I'm always for progress and improvements, in general. I'm going to give this more thought, as it will have implications for the refactoring of spaghetti. We should try to set up a focused meeting on this (and that) soon-ish. I will be more free in the second half of October in the evening/on weekends, but still may be difficult to schedule something (my new job is M-F; 9-5).
We should try to set up a focused meeting on this (and that) soon-ish. I will be more free in the second half of October in the evening/on weekends, but still may be difficult to schedule something (my new job is M-F; 9-5).
I am afraid that your evenings are not feasible for me here :D. Weekends should be fine.
I want to propose a big API overhaul, moving away from the current object-based API to a simpler function-based one.
Current situation
At the moment, the functionality can be split into several groups:
The majority of characters takes input variables and their main output is a single pandas Series. The usual pattern looks like this (note the
.series
):Some characters result in multiple Series'. Those were initially the main reason to implement object-based API.
Characters measured by the
graph
module use functions. They take a network, measure stuff and save the result as a node or edge attribute of a given network and return its copy with that attribute.Classes that are used to generate new geometries (like
Tessellation
) return a new GeoDataFrame but also Series' with IDs in some cases to link the resulting gdf to the original one.Proposal
I would like to simplify all that and turn existing classes into functions returning numpy arrays. There are a few reasons for that:
.series
attribute is annoying and there's almost no value in other attributes we currently providedask_geopandas
compatibility - while you can use existing classes withinmap_partitions
, it is not straightforward, especially for classes with multiple outputs.esda
So, the proposed changes to the groups listed above:
A simple function that returns a numpy array.
A function that returns a tuple of arrays (for example like
gdf.sindex.query_bulk
does)networkx.Graph
These do not require many changes. I would just introduce
inplace
keyword to avoid unnecessary copying. But due to the nature of the underlying data structure, they will always be a bit different in behaviour.In most cases, these can be turned into functions. In the case of tessellation, I would even split it into two -
morphological_tessellation
andenclosed_tessellation
. When we return IDs, we can again return a tuple. However, this change is not as needed and we can keep the existing API if we come to that conclusion.No change needed here.
This proposal has significant API implications as it is going to break everything. But I think that we should do that anyway and sooner it happens, the less trouble it will cause. We should make a transition period long enough to let everyone adapt to a new API before we remove it.
I want to hear your thoughts on that as it entails a pretty significant refactoring and changes.
cc @jGaboardi @ljwolf @amorfinv @pysal/devs
The text was updated successfully, but these errors were encountered: