API: move from object-based design to function-based #308

martinfleis · 2021-10-03T17:55:51Z

I want to propose a big API overhaul, moving away from the current object-based API to a simpler function-based one.

Current situation

At the moment, the functionality can be split into several groups:

basic characters with a single output

The majority of characters takes input variables and their main output is a single pandas Series. The usual pattern looks like this (note the .series):

gdf['volume'] = momepy.Volume(gdf, heights='height_col').series

characters with multiple outputs

Some characters result in multiple Series'. Those were initially the main reason to implement object-based API.

street_prof = momepy.StreetProfile(streets_df, buildings_df, heights='height')
streets_df['width'] = street_prof.w
streets_df['deviations'] = street_prof.wd

characters measured on networkx Graph

Characters measured by the graph module use functions. They take a network, measure stuff and save the result as a node or edge attribute of a given network and return its copy with that attribute.

network_graph = mm.betweenness_centrality(network_graph)

classes generating morphological features

Classes that are used to generate new geometries (like Tessellation) return a new GeoDataFrame but also Series' with IDs in some cases to link the resulting gdf to the original one.

blocks = mm.Blocks(tessellation_df, streets_df, buildings_df, 'bID', 'uID')
blocks_gdf = blocks.blocks
buildings_id = blocks.buildings_id

additional utilities that are usually functions

buildings_df['nID'] = momepy.get_network_id(buildings_df, streets_df, 'nID')

Proposal

I would like to simplify all that and turn existing classes into functions returning numpy arrays. There are a few reasons for that:

calling the .series attribute is annoying and there's almost no value in other attributes we currently provide
dask_geopandas compatibility - while you can use existing classes within map_partitions, it is not straightforward, especially for classes with multiple outputs.
lighter code - wrapping everything in classes generates a lot of unnecessary code. The switch should in the long-term allow easier maintenance.
compatibility with the rest of PySAL, especially esda

So, the proposed changes to the groups listed above:

basic characters with a single output

A simple function that returns a numpy array.

gdf['volume'] = momepy.volume(gdf, heights='height_col')

characters with multiple outputs

A function that returns a tuple of arrays (for example like gdf.sindex.query_bulk does)

width, deviations = momepy.street_profile(streets_df, buildings_df, heights='height')
streets_df['width'] = width
streets_df['deviations'] = deviations

characters measured on networkx.Graph

These do not require many changes. I would just introduce inplace keyword to avoid unnecessary copying. But due to the nature of the underlying data structure, they will always be a bit different in behaviour.

mm.betweenness_centrality(network_graph, inplace=True)

classes generating morphological features

In most cases, these can be turned into functions. In the case of tessellation, I would even split it into two - morphological_tessellation and enclosed_tessellation. When we return IDs, we can again return a tuple. However, this change is not as needed and we can keep the existing API if we come to that conclusion.

blocks_gdf, blg_id = mm.blocks(tessellation_df, streets_df, buildings_df, 'bID', 'uID')

additional utilities that are usually functions

No change needed here.

This proposal has significant API implications as it is going to break everything. But I think that we should do that anyway and sooner it happens, the less trouble it will cause. We should make a transition period long enough to let everyone adapt to a new API before we remove it.

I want to hear your thoughts on that as it entails a pretty significant refactoring and changes.

cc @jGaboardi @ljwolf @amorfinv @pysal/devs

The text was updated successfully, but these errors were encountered:

jGaboardi · 2021-10-03T18:28:16Z

I'm always for progress and improvements, in general. I'm going to give this more thought, as it will have implications for the refactoring of spaghetti. We should try to set up a focused meeting on this (and that) soon-ish. I will be more free in the second half of October in the evening/on weekends, but still may be difficult to schedule something (my new job is M-F; 9-5).

martinfleis · 2021-10-03T18:37:51Z

We should try to set up a focused meeting on this (and that) soon-ish. I will be more free in the second half of October in the evening/on weekends, but still may be difficult to schedule something (my new job is M-F; 9-5).

I am afraid that your evenings are not feasible for me here :D. Weekends should be fine.

amorfinv · 2021-10-04T08:19:37Z

Hello,
I think that sounds like a good idea...I am happy to adapt mm.coins, mm.sndi, and to help with anything else.
-Andres

martinfleis mentioned this issue Dec 22, 2021

REF: simplify calculation of Simpson diversity #320

Merged

martinfleis mentioned this issue Jan 5, 2022

BUG: fix FormFactor #330

Merged

martinfleis added the enhancement New feature or request label Jan 10, 2022

pysal locked and limited conversation to collaborators Jan 10, 2022

martinfleis converted this issue into discussion #340 Jan 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

API: move from object-based design to function-based #308

API: move from object-based design to function-based #308

martinfleis commented Oct 3, 2021

jGaboardi commented Oct 3, 2021

martinfleis commented Oct 3, 2021

amorfinv commented Oct 4, 2021

This issue was moved to a discussion.

This issue was moved to a discussion.

API: move from object-based design to function-based #308

API: move from object-based design to function-based #308

Comments

martinfleis commented Oct 3, 2021

Current situation

Proposal

jGaboardi commented Oct 3, 2021

martinfleis commented Oct 3, 2021

amorfinv commented Oct 4, 2021

This issue was moved to a discussion.