get_columns_from_metadata() does not work anymore! #709

Open
wilcovanvorstenbosch opened this issue Jan 16, 2025 · 2 comments

Labels: bug (Something isn't working), under discussion (Issue is currently being discussed)

Comments

wilcovanvorstenbosch commented Jan 16, 2025

The `get_columns_from_metadata()` function no longer works, probably because of the new `Metadata` class:

```python
def get_columns_from_metadata(metadata):
    """Get the column info from a metadata dict.

    Args:
        metadata (dict):
            The metadata dict.

    Returns:
        dict:
            The columns metadata.
    """
    return metadata.get('columns', {})
```

I found this out when trying the `LogisticDetection.compute()` function, which raised the following error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 3
      1 from sdmetrics.single_table import LogisticDetection, SVCDetection
----> 3 LogisticDetection.compute(
      4     real_data=real,
      5     synthetic_data=synth,
      6     metadata=metadata
      7 )

File ~\AppData\Roaming\Python\Python312\site-packages\sdmetrics\single_table\detection\base.py:94, in DetectionMetric.compute(cls, real_data, synthetic_data, metadata)
     72 @classmethod
     73 def compute(cls, real_data, synthetic_data, metadata=None):
     74     """Compute this metric.
     75 
     76     This builds a Machine Learning Classifier that learns to tell the synthetic
   (...)
     92             One minus the ROC AUC Cross Validation Score obtained by the classifier.
     93     """
---> 94     real_data, synthetic_data, metadata = cls._validate_inputs(
     95         real_data, synthetic_data, metadata
     96     )
     98     transformed_real_data, transformed_synthetic_data = cls._drop_non_compute_columns(
     99         real_data, synthetic_data, metadata
    100     )
...
--> 117         raise ValueError(f'Column {column} not found in metadata')
    119 for field, field_meta in fields.items():
    120     field_type = get_type_from_column_meta(field_meta)

ValueError: Column KredietAanbieder not found in metadata
```

This `ValueError` is raised because `metadata.get('columns')` returns `None` when a `Metadata` object is passed as the argument.
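A minimal sketch of why the column lookup comes back empty. It assumes the newer SDV dictionary format nests each table's columns under `'tables'` and then the table name (`'table'` here); the dictionaries and the `amount` column are hand-written illustrations, not real SDV output. The helper is a local copy of the SDMetrics function quoted above.

```python
def get_columns_from_metadata(metadata):
    """Local copy of the SDMetrics helper quoted in this issue."""
    return metadata.get('columns', {})


# Flat single-table dict in the shape SDMetrics expects (illustrative values;
# 'amount' is a hypothetical column).
flat_metadata = {
    'columns': {
        'KredietAanbieder': {'sdtype': 'categorical'},
        'amount': {'sdtype': 'numerical'},
    }
}

# Assumed shape of the newer SDV dictionary: the same columns nested under
# 'tables' -> table name. Illustrative, not copied from real SDV output.
nested_metadata = {
    'tables': {'table': flat_metadata},
    'METADATA_SPEC_VERSION': 'V1',
}

print(get_columns_from_metadata(flat_metadata))    # both columns are found
print(get_columns_from_metadata(nested_metadata))  # {} -> every column is "not found"
```

Because the top level of the nested dict has no `'columns'` key, every data column, starting with `KredietAanbieder`, fails the membership check.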

wilcovanvorstenbosch added the bug (Something isn't working) and new (Label applied to new issues) labels on Jan 16, 2025.
wilcovanvorstenbosch (Author) commented

I got it working with a workaround: after loading the data and creating the metadata, I ran the following code:

```python
metadata = metadata.to_dict().get('tables').get('table')
```

I'd suggest including something like this in the library.
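A sketch of a slightly more general version of this workaround. `unwrap_single_table_metadata` is a hypothetical helper, not part of SDV or SDMetrics. It assumes the nested layout `{'tables': {<table name>: {...}}}`, passes already-flat dicts through unchanged, and does not hard-code the table name `'table'`.

```python
def unwrap_single_table_metadata(metadata_dict, table_name=None):
    """Return a flat single-table metadata dict for SDMetrics.

    Hypothetical helper (not an SDV or SDMetrics API). Accepts either an
    already-flat dict, which is returned unchanged, or a dict that nests
    tables under 'tables', e.g. the result of an SDV Metadata's to_dict().
    """
    tables = metadata_dict.get('tables')
    if tables is None:
        # No 'tables' key: assume it is already in the flat {'columns': ...} shape.
        return metadata_dict

    if table_name is None:
        if len(tables) != 1:
            raise ValueError(
                f'Expected exactly one table, found {len(tables)}; pass table_name.'
            )
        # Take the only table, whatever it is called.
        (table_name,) = tables

    if table_name not in tables:
        raise KeyError(f'Table {table_name!r} not found in metadata')
    return tables[table_name]


nested = {'tables': {'table': {'columns': {'a': {'sdtype': 'numerical'}}}}}
print(unwrap_single_table_metadata(nested))  # {'columns': {'a': {'sdtype': 'numerical'}}}
```

In the notebook from this issue, the call would presumably be `metadata = unwrap_single_table_metadata(metadata.to_dict())` before passing `metadata` to `LogisticDetection.compute()`. Raising on multiple tables, rather than silently picking one, keeps a multi-table dict from being misread as single-table metadata.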

npatki (Contributor) commented Jan 17, 2025

Hi @wilcovanvorstenbosch, thanks for the details and glad you got it to work!

Our SDMetrics library is designed to be a standalone, open-source library that anyone can use. It is purposefully designed to be separate from the SDV library. Therefore, SDV-specific concepts such as the `Metadata` object cannot be used with SDMetrics. (Even if we wanted to import from SDV, that wouldn't be possible because it would create a circular reference.)

I would suggest referring to the SDMetrics documentation for the expected inputs and outputs when using SDMetrics. Also note that SDMetrics has its own Metadata reference page, which indicates that the metadata should be represented as a dictionary and describes the dictionary format.
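A minimal sketch of a pre-flight check against a flat metadata dictionary. The dict layout (`'columns'` mapping each column name to an `'sdtype'`, plus an optional `'primary_key'`) reflects my understanding of the SDMetrics reference page and should be checked there. `find_undescribed_columns` and the column names other than `KredietAanbieder` are hypothetical; the check only mirrors the `Column ... not found in metadata` validation in the traceback.

```python
def find_undescribed_columns(column_names, metadata):
    """Return data columns that have no entry in metadata['columns'].

    Hypothetical pre-flight check, not an SDMetrics function.
    """
    described = metadata.get('columns', {})
    return [name for name in column_names if name not in described]


# Illustrative flat single-table metadata dict (assumed SDMetrics layout).
metadata = {
    'primary_key': 'loan_id',
    'columns': {
        'loan_id': {'sdtype': 'id'},
        'KredietAanbieder': {'sdtype': 'categorical'},
        'amount': {'sdtype': 'numerical'},
    },
}

# With pandas this would be list(real.columns); a plain list keeps the sketch standalone.
data_columns = ['loan_id', 'KredietAanbieder', 'amount', 'start_date']
print(find_undescribed_columns(data_columns, metadata))  # ['start_date']
```

An empty result means every data column is described, which is the condition that failed in this issue.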

Should the dictionary format be more aligned with SDV? A few months ago, SDV updated its own format for single-table metadata to nest the table's contents under the table name. I will file a new feature request to allow SDMetrics to accept this format too (though it would still have to be a dictionary).

npatki added the under discussion (Issue is currently being discussed) label and removed the new (Label applied to new issues) label on Jan 17, 2025.