Time Series LSTMDetection and TSFCDetection Metrics #487

Closed
ca692526 opened this issue Jun 28, 2021 · 5 comments
Labels
bug (Something isn't working)
data:sequential (Related to timeseries datasets)
feature:evaluation (Related to running metrics or visualizations)
resolution:duplicate (This issue or pull request already exists)

ca692526 commented Jun 28, 2021

Environment Details

SDV version: 0.10.0
Python version: 3.7.10
Operating System: Windows

Error Description

When executing LSTMDetection and TSFCDetection time series evaluation metrics the following error is presented:
"ValueError: No group keys passed!"

The PAR model was not trained with any entity columns or context columns; it was trained with only the sequence index, i.e.:

model = PAR(sequence_index="timestamp")
model.fit(df)  # Train the model

Does the model have to have an entity column to utilize the metric functions?

Steps to reproduce

import numpy as np
import pandas as pd
from pathlib import Path

from sdv.metrics.timeseries import LSTMDetection, TSFCDetection
from sdv.timeseries import PAR

# DATA_DIR was not defined in the original report; this placeholder is an
# assumption and should point at the folder holding the two CSV files.
DATA_DIR = Path(".")

real_data = pd.read_csv(DATA_DIR / "real_data.csv", parse_dates=['timestamp'])
synthetic_data = pd.read_csv(DATA_DIR / "synthetic_data.csv", parse_dates=['timestamp'])

# No entity columns are declared; only the sequence index.
metadata = {
    'fields': {
        'engine': {'type': 'numerical', 'subtype': 'float'},
        'airspeed': {'type': 'numerical', 'subtype': 'float'},
        'timestamp': {'type': 'datetime'},
    },
    'sequence_index': 'timestamp',
}

# Both calls raise "ValueError: No group keys passed!"
LSTMDetection.compute(real_data, synthetic_data, metadata)
TSFCDetection.compute(real_data, synthetic_data, metadata)

ca692526 added the bug and pending review labels on Jun 28, 2021

csala (Contributor) commented Jun 29, 2021

Good catch. Thanks for reporting this @ca692526

I think you are right. The Time Series Detection metrics seem to require entity_columns to work, at least for now. I suppose they could be modified to work without them.

In any case, to make things easier and confirm that this is it, would you mind pasting here the complete traceback of the error, so we can see where it came from?
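
As a rough illustration of how the grouping step could tolerate missing entity columns, here is a minimal sketch. The helper `iter_sequences` is hypothetical and not part of SDMetrics; it assumes that a table without entity columns should be treated as one single sequence, which may not match what the maintainers end up doing.

```python
import pandas as pd


def iter_sequences(data, entity_columns):
    """Yield (entity_id, sequence) pairs. Hypothetical helper, not SDMetrics code."""
    if not entity_columns:
        # Assumption: with no entity columns, the whole table is one sequence.
        yield None, data
        return

    for entity_id, entity_data in data.groupby(entity_columns):
        # Drop the entity columns so only the sequence values remain.
        yield entity_id, entity_data.drop(entity_columns, axis=1)


single = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=3),
    "engine": [0.1, 0.2, 0.3],
})
multi = single.assign(entity_id=["a", "a", "b"])

single_sequences = list(iter_sequences(single, []))
multi_sequences = list(iter_sequences(multi, ["entity_id"]))
print(len(single_sequences), len(multi_sequences))
```

With a guard like this, the empty-list case never reaches `groupby`, so the single-sequence table yields one sequence and the table with an entity column yields one sequence per entity.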

ca692526 (Author) commented Jun 29, 2021

Thanks for looking into this @csala. Below is the traceback for LSTMDetection.


ValueError Traceback (most recent call last)
in
----> 1 LSTMDetection.compute(real_data, synthetic_data, metadata)

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/sdmetrics/timeseries/detection.py in compute(cls, real_data, synthetic_data, metadata, entity_columns)
83 transformer.fit(real_data.drop(entity_columns, axis=1))
84
---> 85 real_x = cls._build_x(real_data, transformer, entity_columns)
86 synt_x = cls._build_x(synthetic_data, transformer, entity_columns)
87

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/sdmetrics/timeseries/detection.py in _build_x(data, transformer, entity_columns)
38 def _build_x(data, transformer, entity_columns):
39 X = pd.DataFrame()
---> 40 for entity_id, entity_data in data.groupby(entity_columns):
41 entity_data = entity_data.drop(entity_columns, axis=1)
42 entity_data = transformer.transform(entity_data)

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/pandas/core/frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
6523 squeeze=squeeze,
6524 observed=observed,
-> 6525 dropna=dropna,
6526 )
6527

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna)
531 observed=observed,
532 mutated=self.mutated,
--> 533 dropna=self.dropna,
534 )
535

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
819
820 if len(groupings) == 0 and len(obj):
--> 821 raise ValueError("No group keys passed!")
822 elif len(groupings) == 0:
823 groupings.append(Grouping(Index([], dtype="int"), np.array([], dtype=np.intp)))

ValueError: No group keys passed!

ca692526 (Author)

Below is the traceback for TSFCDetection @csala


ValueError Traceback (most recent call last)
in
----> 1 TSFCDetection.compute(real_data, synthetic_data, metadata)

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/sdmetrics/timeseries/detection.py in compute(cls, real_data, synthetic_data, metadata, entity_columns)
83 transformer.fit(real_data.drop(entity_columns, axis=1))
84
---> 85 real_x = cls._build_x(real_data, transformer, entity_columns)
86 synt_x = cls._build_x(synthetic_data, transformer, entity_columns)
87

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/sdmetrics/timeseries/detection.py in _build_x(data, transformer, entity_columns)
38 def _build_x(data, transformer, entity_columns):
39 X = pd.DataFrame()
---> 40 for entity_id, entity_data in data.groupby(entity_columns):
41 entity_data = entity_data.drop(entity_columns, axis=1)
42 entity_data = transformer.transform(entity_data)

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/pandas/core/frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
6523 squeeze=squeeze,
6524 observed=observed,
-> 6525 dropna=dropna,
6526 )
6527

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna)
531 observed=observed,
532 mutated=self.mutated,
--> 533 dropna=self.dropna,
534 )
535

~/.cache/pypoetry/virtualenvs/venv/lib/python3.7/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
819
820 if len(groupings) == 0 and len(obj):
--> 821 raise ValueError("No group keys passed!")
822 elif len(groupings) == 0:
823 groupings.append(Grouping(Index([], dtype="int"), np.array([], dtype=np.intp)))

ValueError: No group keys passed!
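
Both tracebacks end in the same pandas check, reached from `data.groupby(entity_columns)`. The snippet below is a small sketch that reproduces the message with pandas alone. It assumes `entity_columns` ends up as an empty list when the metadata declares none, which the traceback suggests but does not show directly. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=3),
    "engine": [0.1, 0.2, 0.3],
})

# Assumption: this is what the metric passes when no entity columns exist.
entity_columns = []

try:
    df.groupby(entity_columns)
    error_message = None
except ValueError as error:
    # pandas refuses to group a non-empty frame by zero keys.
    error_message = str(error)

print(error_message)
```

This points at the metric's grouping step rather than at the PAR model or the CSV files.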

npatki added the feature:evaluation and data:sequential labels and removed the pending review label on Jun 30, 2022

npatki (Contributor) commented Jun 30, 2022

Until we fix this issue, a possible workaround might be to add an entity column to both the real and synthetic data.

E.g., you can add an Entity ID column to both that has a single, static value such as 'ID_0'.
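
A minimal sketch of that workaround with pandas follows. The helper `add_static_entity_column` and the column name `entity_id` are made up for illustration, and the sample frames stand in for the real and synthetic tables. The final metric call is left commented out because it is untested here: the `entity_columns` argument is taken from the `compute` signature shown in the traceback above, and the metadata may also need an entry for the new column.

```python
import pandas as pd


def add_static_entity_column(data, column="entity_id", value="ID_0"):
    """Return a copy of data with a constant entity column (hypothetical helper)."""
    data = data.copy()
    data[column] = value
    return data


# Stand-ins for the real and synthetic tables from the report.
real_data = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=3),
    "engine": [0.1, 0.2, 0.3],
})
synthetic_data = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=3),
    "engine": [0.15, 0.25, 0.35],
})

real_with_id = add_static_entity_column(real_data)
synthetic_with_id = add_static_entity_column(synthetic_data)

# Grouping by the new column now works and yields a single group.
groups = list(real_with_id.groupby(["entity_id"]))
print(len(groups))

# Untested assumption, based on the signature in the traceback:
# LSTMDetection.compute(real_with_id, synthetic_with_id, metadata,
#                       entity_columns=["entity_id"])
```

The helper copies the input so the original tables stay unchanged and can still be used elsewhere without the extra column.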

npatki (Contributor) commented Oct 17, 2022

Seems like this issue is a dupe of the SDMetrics issue: sdv-dev/SDMetrics#77

I'll close this off in favor of the SDMetrics one, since it is closer to where the error is actually happening. That issue also has more discussion on how we're thinking about sequential metrics, especially when there is a single sequence vs. multiple sequences.
