Implement MSAS #199

Closed
LiFaytheGoblin opened this issue Aug 26, 2022 · 3 comments
Labels
data:sequential (Related to timeseries datasets), feature request (Request for a new feature), resolution:resolved (The issue was fixed, the question was answered, etc.)

Comments

@LiFaytheGoblin

Problem Description

The metrics currently implemented in SDV do not specifically measure the quality of sequences generated with CPAR.

Expected behavior

MSAS is a metric for sequential data quality, detailed in http://arxiv.org/abs/2207.14406. It should be implemented in SDV.

LiFaytheGoblin added the feature request and new labels on Aug 26, 2022
@npatki
Contributor

npatki commented Aug 29, 2022

Thanks for filing @LiFaytheGoblin. We'll keep this open to track as we make progress on it.

Just a note that MSAS refers to our overall algorithm for computing sequential data quality, which works in the following steps:

  1. Compute a metric for every sequence in the real data to get a distribution X
  2. Compute the same metric for every sequence in the synthetic data to get a distribution X'
  3. Use the KSComplement test to compare the distributions X and X'
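
The three steps above can be sketched in plain Python. This is only an illustration, not the SDMetrics implementation: the names `ks_complement` and `msas` are hypothetical, sequences are assumed to be plain lists of numbers, and KSComplement is taken to be 1 minus the two-sample Kolmogorov-Smirnov statistic (in practice `scipy.stats.ks_2samp` would normally supply that statistic).

```python
from bisect import bisect_right
from statistics import mean


def ks_complement(x, y):
    """Return 1 minus the two-sample Kolmogorov-Smirnov statistic."""
    xs, ys = sorted(x), sorted(y)
    # The largest gap between the two empirical CDFs occurs at an observed value.
    ks_stat = max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )
    return 1.0 - ks_stat


def msas(real_sequences, synthetic_sequences, statistic=mean):
    """Score in [0, 1]; 1 means the per-sequence distributions coincide."""
    x_real = [statistic(seq) for seq in real_sequences]        # step 1: X
    x_synth = [statistic(seq) for seq in synthetic_sequences]  # step 2: X'
    return ks_complement(x_real, x_synth)                      # step 3


# Toy data, made up for illustration only.
real = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 5.0, 5.0, 5.0]]
synthetic = [[1.5, 2.5], [3.0, 3.0, 3.0], [4.0, 6.0]]

mean_score = msas(real, synthetic)                   # compares per-sequence means
length_score = msas(real, synthetic, statistic=len)  # compares sequence lengths
```

Passing a different callable as `statistic` swaps the metric used in steps 1 and 2 without touching the comparison in step 3.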

Various metrics can be used in step 1. In the paper we used: length, mean, median, standard deviation, and the difference between row n and row n+t for some step t.
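
As a rough sketch of what these per-sequence metrics might look like, the snippet below computes each one for a list of numeric sequences. Several details are assumptions rather than anything stated in this thread: `inter_row_difference`, `STATISTICS`, and `summarize` are hypothetical names, the inter-row metric is aggregated as the mean of `seq[n + t] - seq[n]`, and the standard deviation is the population version.

```python
from statistics import mean, median, pstdev


def inter_row_difference(seq, t=1):
    """Mean of seq[n + t] - seq[n] over all valid n (assumed aggregation)."""
    diffs = [seq[n + t] - seq[n] for n in range(len(seq) - t)]
    # A sequence with t or fewer rows has no row pairs to compare.
    return mean(diffs) if diffs else None


# Hypothetical registry of per-sequence statistics, keyed by illustrative names.
STATISTICS = {
    "length": len,
    "mean": mean,
    "median": median,
    "std": pstdev,  # population std; the sample version is an equally valid choice
    "diff_t1": inter_row_difference,
}


def summarize(sequences):
    """Map each statistic name to its list of per-sequence values."""
    return {name: [fn(seq) for seq in sequences] for name, fn in STATISTICS.items()}


# Toy sequences, made up for illustration only.
summary = summarize([[1.0, 3.0, 6.0, 10.0], [2.0, 2.0, 5.0]])
```

Each list in `summary` is one candidate distribution X; computing the same lists for the synthetic sequences gives the corresponding X'.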

Are there any particular metrics that are more or less important to your use case?

npatki added the under discussion and data:sequential labels and removed the new label on Aug 29, 2022
@npatki
Contributor

npatki commented Aug 31, 2022

FYI, some metrics that will use MSAS are actively being discussed in #198.

npatki removed the under discussion label on Jun 8, 2023
@npatki
Contributor

npatki commented Jan 8, 2025

MSAS metrics are now available in SDMetrics. See SequenceLengthSimilarity, StatisticMSAS, and InterRowMSAS.

npatki closed this as completed on Jan 8, 2025
npatki added the resolution:resolved label on Jan 8, 2025