Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nodes._status() -- 'new' in pandas.Series always False due to evaluation #93

Closed
jGaboardi opened this issue Nov 17, 2024 · 8 comments · Fixed by #98
Closed

nodes._status() -- 'new' in pandas.Series always False due to evaluation #93

jGaboardi opened this issue Nov 17, 2024 · 8 comments · Fixed by #98

Comments

@jGaboardi
Copy link
Collaborator

jGaboardi commented Nov 17, 2024

Following #83 (and @martinfleis's explanation there) I want to triple check the correctness of the 'new' in ... condition in nodes._status(). Currently this will never evaluate to True since it is passed in as a pandas.Series:

import pandas, sgeop

# this should print `new` but does not
print(sgeop.nodes._status(pandas.Series(["one", "new", "two"])))
# "changed"

# due to how `pandas.Series` is evaluated
print("new" in pandas.Series(["one", "new", "two"]))
# False

# so we need to evaluate the values of the Series
print("new" in pandas.Series(["one", "new", "two"]).values)
# True

For the small test case of Apalachicola, FL the results of '_status' change as follows:

current main -- 'new' in x:

.["_status"].value_counts()
# original    549
# changed     125
# new          95
# Name: count, dtype: int64

updated condition -- 'new' in x.values:

.["_status"].value_counts()
# original    549
# new         209
# changed      11
# Name: count, dtype: int64

This of course affects the full scale FUA testing for '_status' labeling.

xref:


The plot here highlights the difference where:

  • thinnest black lines are the original input
  • medium lines are the updated conditions
  • thickest lines are the original conditions

_status_comparison

It does appear that the results from the original condition are the most correct where lines that are extended are not marked as entirely new. So I am pretty sure our current implementation is accurate, but we should be absolutely sure.


Data:

@jGaboardi jGaboardi added the bug Something isn't working label Nov 17, 2024
@jGaboardi jGaboardi added question/idea/discussion and removed bug Something isn't working labels Nov 17, 2024
@martinfleis
Copy link
Contributor

How come the sum of value counts above is different?

@jGaboardi
Copy link
Collaborator Author

How come the sum of value counts above is different?

In [1]: (549 + 125 + 95) == (549 + 209 + 11)
Out[1]: True

In [2]: (549 + 125 + 95)
Out[2]: 769

In [3]: (549 + 209 + 11)
Out[3]: 769

They are equivalent, no?

@martinfleis
Copy link
Contributor

I can't count....

@jGaboardi
Copy link
Collaborator Author

So if x is passed in as a list then the condition can potentially evaluate as True (otherwise the condition will always evaluate to False no matter if 'new' is present). But after combing through the code base I am not finding anywhere we are doing that. @martinfleis Any more thoughts off the top of your head here?

While this does not have a direct impact on geometries being generated, it does have a meaningful impact on geometry labeling, which then could affect the algorithm (from my understanding).

I'd say we need to figure this out before I continue with testing/refactor and transfer functionality to simplification.core.

@martinfleis
Copy link
Contributor

No idea...

@jGaboardi
Copy link
Collaborator Author

But after combing through the code base I am not finding anywhere we are doing that.

Confirmed programmatically there are no instances of passing in a list

@jGaboardi
Copy link
Collaborator Author

So I guess we need to determine if our logic of "new" vs. "changed" is sound. Currently:

def _status(x: pd.Series) -> str:
    """Determine the status of edge line(s)."""
    if len(x) == 1:
        return x.iloc[0]
    if "new" in x:
        # This logic is here just to be safe. It will be hit if we create a new line
        # and in a subsequent step extend it which is not what normally happens.
        # All the new bits are caught likely by the first ``if``.
        return "new"
    return "changed"

... a linestring geometry is considered "new" only if len(x) == 1 and "new" is that value. Since the second conditional is flawed.

So:

  • Is a line "new" if partially new or entirely new?
  • I think "new" only if entirely new, elsewise "changed".
  • Therefore, I think our current labeling is correct and we can simply remove that final conditional

@martinfleis @anastassiavybornova Do yall concur with this logic?

@martinfleis
Copy link
Contributor

I think that's right, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants