`nodes._status()` -- `'new'` in `pandas.Series` always False due to evaluation #93

jGaboardi · 2024-11-17T20:20:29Z

Following #83 (and @martinfleis's explanation there), I want to triple-check the correctness of the `'new' in x` condition in `nodes._status()`. Currently it can never evaluate to True, because `x` is passed in as a `pandas.Series`, and `in` on a Series checks the index labels rather than the values:

import pandas, sgeop

# this should print `new` but does not
print(sgeop.nodes._status(pandas.Series(["one", "new", "two"])))
# "changed"

# due to how `pandas.Series` is evaluated
print("new" in pandas.Series(["one", "new", "two"]))
# False

# so we need to evaluate the values of the Series
print("new" in pandas.Series(["one", "new", "two"]).values)
# True
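For reference, here is a minimal standalone demonstration in plain pandas (no `sgeop` needed) of why the check fails. `in` on a `Series` behaves like a dict lookup on the index, so it can even return True for a value that is absent if the index happens to contain that label.

```python
import pandas as pd

s = pd.Series(["one", "new", "two"])  # default RangeIndex: 0, 1, 2

# `in` on a Series tests membership in the index, like a dict lookup.
print("new" in s)  # False: "new" is a value, not an index label
print(1 in s)      # True: 1 is an index label

# Conversely, an index containing "new" makes the check True
# even though no value equals "new".
t = pd.Series(["one", "two"], index=["new", "old"])
print("new" in t)  # True

# Checking the underlying values gives the intended answer.
print("new" in s.values)  # True
print("new" in t.values)  # False
```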

For the small Apalachicola, FL test case, the `_status` results change as follows:

current main -- 'new' in x:

.["_status"].value_counts()
# original    549
# changed     125
# new          95
# Name: count, dtype: int64

updated condition -- 'new' in x.values:

.["_status"].value_counts()
# original    549
# new         209
# changed      11
# Name: count, dtype: int64
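To see how the counts can shift while the total stays the same, here is a toy sketch with made-up data (hypothetical, not the Apalachicola input). It aggregates pieces per output line with both conditions. Only multi-piece lines that contain a "new" piece can change label, and they move from "changed" to "new". In the Apalachicola numbers above, that accounts for 114 lines (125 → 11 "changed", 95 → 209 "new"). The two function names are illustrative only.

```python
import pandas as pd


def status_index_check(x: pd.Series) -> str:
    # Mirrors the condition on main: `in` checks index labels.
    if len(x) == 1:
        return x.iloc[0]
    if "new" in x:
        return "new"
    return "changed"


def status_value_check(x: pd.Series) -> str:
    # Same logic, but checks the values instead.
    if len(x) == 1:
        return x.iloc[0]
    if "new" in x.values:
        return "new"
    return "changed"


# Hypothetical toy input: each row is a piece of an edge, and `group`
# identifies the output line the pieces are merged into.
df = pd.DataFrame(
    {
        "group": [0, 1, 1, 2, 2, 3],
        "_status": ["original", "original", "new", "original", "changed", "new"],
    }
)

grouped = df.groupby("group")["_status"]
current = grouped.agg(status_index_check)
updated = grouped.agg(status_value_check)
print(current.tolist())  # ['original', 'changed', 'changed', 'new']
print(updated.tolist())  # ['original', 'new', 'changed', 'new']
```

Group 1 (an original piece plus a new piece) is the only one whose label differs between the two conditions.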

This, of course, affects the full-scale FUA testing for `_status` labeling.

xref:

small test nodes._status() -- ensure coverage #92

The plot here highlights the difference where:

thinnest black lines are the original input
medium lines are the updated conditions
thickest lines are the original conditions

It does appear that the results from the original condition are the most correct, since lines that are only extended are not marked as entirely new. So I am fairly sure our current implementation is accurate, but we should be absolutely sure.

Data:

gh93_status_comparison.zip


martinfleis · 2024-11-18T08:00:42Z

How come the sum of value counts above is different?

jGaboardi · 2024-11-18T14:21:09Z

How come the sum of value counts above is different?

In [1]: (549 + 125 + 95) == (549 + 209 + 11)
Out[1]: True

In [2]: (549 + 125 + 95)
Out[2]: 769

In [3]: (549 + 209 + 11)
Out[3]: 769

They are equivalent, no?

martinfleis · 2024-11-18T14:49:24Z

I can't count....

jGaboardi · 2024-11-19T15:12:47Z

So if `x` is passed in as a list, the condition can evaluate to True; otherwise (a `pandas.Series` with an integer index) it will always evaluate to False, whether or not 'new' is present. But after combing through the code base, I am not finding anywhere we are doing that. @martinfleis Any more thoughts off the top of your head here?
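The list vs. Series distinction in standalone form. `has_new` is a hypothetical helper, not part of `sgeop`, showing one value-based check that behaves the same for either input type.

```python
import pandas as pd

statuses = ["one", "new", "two"]

# A plain list uses value membership, so the original condition would work...
print("new" in statuses)             # True
# ...but the same data wrapped in a Series checks the index instead.
print("new" in pd.Series(statuses))  # False


def has_new(x) -> bool:
    """Hypothetical helper: value membership for a list or a Series."""
    return bool(pd.Series(x).eq("new").any())


print(has_new(statuses), has_new(pd.Series(statuses)))  # True True
```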

While this does not have a direct impact on geometries being generated, it does have a meaningful impact on geometry labeling, which then could affect the algorithm (from my understanding).

I'd say we need to figure this out before I continue with testing/refactoring and transferring functionality to simplification.core.

martinfleis · 2024-11-19T15:14:19Z

No idea...

jGaboardi · 2024-11-19T15:34:20Z

But after combing through the code base I am not finding anywhere we are doing that.

Confirmed programmatically: there are no instances of passing in a list.
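The thread doesn't say how this was checked. One possible runtime approach is to wrap the aggregation function and record the type of every argument it receives while the pipeline runs. The sketch below uses a hypothetical stand-in function rather than the real `_status`, and the decorator name is illustrative.

```python
import functools
from collections import Counter

import pandas as pd

seen_types: Counter = Counter()


def record_arg_type(func):
    """Count the type name of the first positional argument on every call."""

    @functools.wraps(func)
    def wrapper(x, *args, **kwargs):
        seen_types[type(x).__name__] += 1
        return func(x, *args, **kwargs)

    return wrapper


@record_arg_type
def status(x):
    # Hypothetical stand-in for the function under test.
    return x.iloc[0] if len(x) == 1 else "changed"


df = pd.DataFrame({"group": [0, 1, 1], "_status": ["new", "original", "new"]})
df.groupby("group")["_status"].agg(status)
print(seen_types)  # only "Series" is expected when called via groupby.agg
```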

jGaboardi · 2024-11-19T15:52:01Z

So I guess we need to determine if our logic of "new" vs. "changed" is sound. Currently:

import pandas as pd


def _status(x: pd.Series) -> str:
    """Determine the status of edge line(s)."""
    if len(x) == 1:
        return x.iloc[0]
    # NOTE: `in` on a Series checks index labels, not values, so with an
    # integer index this condition is always False.
    if "new" in x:
        # This logic is here just to be safe. It will be hit if we create a new line
        # and in a subsequent step extend it which is not what normally happens.
        # All the new bits are caught likely by the first ``if``.
        return "new"
    return "changed"

... a linestring geometry is considered "new" only if len(x) == 1 and that single value is "new", since the second conditional is flawed.

So:

Is a line "new" if it is partially new, or only if it is entirely new?
I think "new" only if entirely new, otherwise "changed".
Therefore, I think our current labeling is correct and we can simply remove that final conditional.

@martinfleis @anastassiavybornova Do y'all concur with this logic?

martinfleis · 2024-11-19T15:58:10Z

I think that's right, yes.

jGaboardi added the bug (Something isn't working) label Nov 17, 2024

jGaboardi assigned jGaboardi and martinfleis Nov 17, 2024

jGaboardi added the question/idea/discussion label and removed the bug (Something isn't working) label Nov 17, 2024

jGaboardi mentioned this issue Nov 19, 2024

Remove faulty condition in nodes._status() #98

Merged

martinfleis closed this as completed in #98 Nov 19, 2024
