⏱️ Find quicker way to detect a change in a roadway network #391

Open
e-lo opened this issue Oct 14, 2024 · 0 comments
Labels
performance: addresses performance but doesn't add a feature
Comments

@e-lo
Collaborator

e-lo commented Oct 14, 2024

Right now wrangler stores a roadway network hash every time its value changes, in order to detect whether other operations (e.g. selections, model network changes) need to be rerun or can reuse stored values.
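To make the role of the hash concrete, here is a minimal sketch of hash-gated caching. `HashGatedCache` and `get_or_compute` are hypothetical names for illustration and are not wrangler's API. The only idea taken from the description above is that a stored result is reused while the network hash is unchanged.

```python
from typing import Any, Callable, Dict, Tuple


class HashGatedCache:
    """Reuse a stored result only while the network hash is unchanged (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps an operation name to (network hash at compute time, stored result).
        self._store: Dict[str, Tuple[str, Any]] = {}

    def get_or_compute(self, key: str, current_hash: str, compute: Callable[[], Any]) -> Any:
        cached = self._store.get(key)
        if cached is not None and cached[0] == current_hash:
            return cached[1]  # network unchanged: reuse the stored value
        result = compute()  # network changed, or first call: rerun the operation
        self._store[key] = (current_hash, result)
        return result
```

Under this pattern every lookup needs the current hash, which is why the cost of computing it matters so much.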

The hash is a hash of the combined hashes of the links and nodes dataframes, each of which is a hash of the underlying numpy array. Creating it takes about a second each time for just the St. Paul network, which adds up. It is already computed lazily, only when it is actually needed, so we would need to speed up the hash creation itself.

@property
def network_hash(self) -> str:
    """Hash of the links and nodes dataframes."""
    _value = str.encode(self.links_df.df_hash() + "-" + self.nodes_df.df_hash())

    _hash = hashlib.sha256(_value).hexdigest()
    return _hash

@pd.api.extensions.register_dataframe_accessor("df_hash")
class dfHash:
    """Creates a dataframe hash that is compatible with geopandas and various metadata.

    Definitely not the fastest, but she seems to work where others have failed.
    """

    def __init__(self, pandas_obj):
        """Initialization function for the dataframe hash."""
        self._obj = pandas_obj

    def __call__(self):
        """Function to hash the dataframe."""
        _value = str(self._obj.values).encode()
        hash = hashlib.sha1(_value).hexdigest()
        return hash
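
One possible direction, as a sketch rather than a tested replacement: `pd.util.hash_pandas_object` hashes each row to a `uint64` without building a string of the whole array, and the resulting bytes can be fed to `hashlib`. The function names below are hypothetical, and the example only covers plain pandas columns. How geometry columns behave, and whether this is actually faster, would need checking against real network data. The comparison with the `str(values)` approach is included because NumPy by default abbreviates the string form of arrays with more than 1000 elements, so it is worth confirming that approach sees changes in the middle of a large frame.

```python
import hashlib

import pandas as pd


def str_values_hash(df: pd.DataFrame) -> str:
    """Same approach as df_hash above: hash the string form of the values array."""
    return hashlib.sha1(str(df.values).encode()).hexdigest()


def row_hash_digest(df: pd.DataFrame) -> str:
    """Hypothetical alternative: digest of per-row uint64 hashes plus column names."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    digest = hashlib.sha256()
    digest.update(",".join(map(str, df.columns)).encode())
    digest.update(row_hashes.to_numpy().tobytes())
    return digest.hexdigest()


# 2000 rows x 2 columns: large enough that NumPy abbreviates str(df.values).
original = pd.DataFrame({"A": range(2000), "B": range(2000)})
edited = original.copy()
edited.loc[1000, "A"] = -1  # change a single value in the middle of the frame

print(str_values_hash(original) == str_values_hash(edited))
print(row_hash_digest(original) == row_hash_digest(edited))
```

With NumPy's default print options, the first comparison should print `True` (the edit falls inside the abbreviated part of the string and goes undetected) and the second `False`. If that holds on the real network, any faster replacement should also be checked for detecting edits anywhere in the frame.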
e-lo added the performance label (addresses performance but doesn't add a feature) Oct 14, 2024
e-lo added this to the v1.1 milestone Oct 28, 2024