
[ENHANCEMENT] Make use of functional hierarchies more transparent in the UI #433

Open
jenno-verdonck opened this issue Dec 15, 2022 · 11 comments

@jenno-verdonck
Contributor

jenno-verdonck commented Dec 15, 2022

Describe the bug
ARX returns different optimal solutions when using hierarchyBuilders than when using the hierarchies created from these builders.

To Reproduce
Steps to reproduce the behavior:

  1. Open the example project in ARX
  2. Anonymize and note down the best node
  3. Write all hierarchies to CSV
  4. Load all hierarchies back in from CSV so that you no longer use builders
  5. Note the different solution.

Expected behavior
I expected to get the same solution in both situations.

Files
example.zip

ARX GUI (please complete the following information):

  • OS: Windows
  • Version 3.9.1
@prasser
Collaborator

prasser commented Dec 15, 2022

Thanks. This issue doesn't contain enough info to understand the potential bug. Please provide further details.

@prasser
Collaborator

prasser commented Dec 15, 2022

PS: I'm pretty sure that this isn't a bug but intended behavior, but to be sure and to explain what is going on I need more details.

@jenno-verdonck
Contributor Author

Yeah, my bad. I accidentally posted the report before finishing it.

@jenno-verdonck jenno-verdonck changed the title [BUG] Inconsistent calculation metric using hierarchies vs hierarchyBuilders [BUG] Inconsistent score calculation using hierarchies vs hierarchyBuilders Dec 15, 2022
@prasser
Copy link
Collaborator

prasser commented Dec 15, 2022

OK, thanks. As already suspected, this is not a bug but expected behaviour. In ARX, hierarchies that have been generated using the builders are associated with a "functional definition" of the hierarchy as meta-information. This information can be used to measure information loss more accurately. One example:

Assume you have a dataset with an integer attribute. In the records, you have three values: 1, 3 and 7.

When using an interval-based hierarchy builder, you specify the interval [0, 10[. As a result, ARX knows that [0, 10[ is a generalization of 10 integer values and might, e.g., estimate information loss as 1/10 = 0.1

When loading a hierarchy from a CSV file, ARX cannot "understand" what the entries in the hierarchy mean. In the case of our example, it can just see that "[0, 10[" is a generalization of 1, 3 and 7 and might, e.g., estimate information loss as 1/|{1, 3, 7}| = 1/3 = 0.33
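To make the arithmetic in this example concrete, here is a minimal Java sketch of the two estimates. It is not ARX's implementation: the class `ShareSketch` and its methods are hypothetical, and the sketch assumes only what is stated above, namely that with a functional interval definition the estimate is one over the number of integers the interval covers, while without one it is one over the number of data values grouped under the node.

```java
import java.util.Set;

// Hypothetical illustration of the two estimates described above; not ARX code.
public class ShareSketch {

    // With a functional definition, the interval [lower, upper[ is known to
    // cover (upper - lower) integer values, independent of the data.
    static double shareWithDefinition(long lower, long upper) {
        return 1.0 / (upper - lower);
    }

    // Without one (e.g. a hierarchy loaded from CSV), only the distinct data
    // values listed under the generalized node are known.
    static double shareWithoutDefinition(Set<Long> leaves) {
        return 1.0 / leaves.size();
    }

    public static void main(String[] args) {
        Set<Long> leaves = Set.of(1L, 3L, 7L);
        System.out.println(shareWithDefinition(0, 10));    // 0.1
        System.out.println(shareWithoutDefinition(leaves)); // 0.3333333333333333
    }
}
```

The same generalized value "[0, 10[" therefore yields two different estimates, which is why the optimal node can differ between the two setups.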

You can also save and load the functional definitions of hierarchies in the wizards, using the "Save..." and "Load..." buttons.

@jenno-verdonck
Contributor Author

Thanks for the clarification.

I already suspected something like this. I can however see how this may be confusing for some users that expect the same result when visually seeing the same hierarchy in the GUI.

Calculating the score the way it is done with the CSV files seems to make more sense to me, as it takes into account the properties of the dataset used and more accurately reflects the score specific to that dataset. I suspect that the utility of the dataset obtained using CSV files will therefore be higher.

@prasser prasser added enhancement and removed bug labels Dec 16, 2022
@prasser prasser changed the title [BUG] Inconsistent score calculation using hierarchies vs hierarchyBuilders [ENHANCEMENT] Make use of functional hierarchies more transparent in the UI Dec 16, 2022
@prasser
Collaborator

prasser commented Dec 16, 2022

> Calculating the score the way it is done with the CSV files seems to make more sense to me, as it takes into account the properties of the dataset used and more accurately reflects the score specific to that dataset. I suspect that the utility of the dataset obtained using CSV files will therefore be higher.

Not sure. I think this depends on the context and use case.

> I already suspected something like this. I can however see how this may be confusing for some users that expect the same result when visually seeing the same hierarchy in the GUI.

I turned this issue into an "enhancement". We could make it more transparent in the UI whether a functional definition of a hierarchy is available and whether it will be used. Please note that, as a workaround, you can remove the functional representations by manually editing the hierarchy in the hierarchy viewer (not in the wizard).

@idhamari
Contributor

What about exporting and importing the functional definitions of the hierarchies together with the hierarchies themselves? This way, if a functional definition is available, it can be used for more accurate loss calculation, and one gets the same result every time.
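A minimal sketch of this proposal, assuming a hypothetical export format (not one ARX defines): the explicit hierarchy is written as rows, preceded by an optional `#interval,lower,upper` line that carries the functional definition, so that a round trip through the file keeps the definition. `HierarchyBundle` and `Interval` are illustrative names only, and the nested record needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical export format, not ARX's: hierarchy rows plus an optional
// "#interval,lower,upper" line that preserves the functional definition.
public class HierarchyBundle {

    record Interval(long lower, long upper) {}

    final List<String> rows;              // explicit hierarchy, e.g. "1;[0, 10["
    final Optional<Interval> definition;  // empty if no builder was used

    HierarchyBundle(List<String> rows, Optional<Interval> definition) {
        this.rows = rows;
        this.definition = definition;
    }

    // Writes the definition line (if any) followed by the hierarchy rows.
    String export() {
        StringBuilder out = new StringBuilder();
        definition.ifPresent(d -> out.append("#interval,").append(d.lower())
                .append(',').append(d.upper()).append('\n'));
        for (String row : rows) {
            out.append(row).append('\n');
        }
        return out.toString();
    }

    // Reads both parts back, so the functional definition survives the round trip.
    static HierarchyBundle load(String text) {
        List<String> rows = new ArrayList<>();
        Optional<Interval> definition = Optional.empty();
        for (String line : text.split("\n")) {
            if (line.startsWith("#interval,")) {
                String[] parts = line.split(",");
                definition = Optional.of(new Interval(Long.parseLong(parts[1]), Long.parseLong(parts[2])));
            } else if (!line.isEmpty()) {
                rows.add(line);
            }
        }
        return new HierarchyBundle(rows, definition);
    }
}
```

Because the definition travels with the rows, reloading restores the state that produced the original scores, while files without the extra line still load as plain hierarchies.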

@jenno-verdonck
Contributor Author

> What about exporting and importing the functional definitions of the hierarchies together with the hierarchies themselves? This way, if a functional definition is available, it can be used for more accurate loss calculation, and one gets the same result every time.

This would probably solve the import/export problems in the UI. A fix for this in the API could be to prevent users from building the hierarchy from a HierarchyBuilder themselves, or to show a warning when they do so. This would avoid scenarios where the user builds the Hierarchy and passes the result to the configuration, thereby dropping the functional definition. At the moment, a user could do this without knowing the difference between Hierarchies and HierarchyBuilders.

Another option would be to merge the hierarchy and builder representations and work with a toggle that enables or disables the functional definition when available. I think this would, however, require a major restructuring.
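One way to picture such a merged representation is the following Java sketch. `UnifiedNode`, its toggle and the `usesFunctionalDefinition()` query are hypothetical names chosen for illustration, not ARX API; the share formula simply mirrors the interval example discussed earlier in this thread (one over the interval width versus one over the number of grouped data values).

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical merged representation, not ARX code: one node carries both the
// explicit grouping and, if a builder produced it, the interval [lower, upper[.
public class UnifiedNode {

    private final Set<Long> leaves;           // data values grouped under this node
    private final Optional<long[]> interval;  // {lower, upper} from a builder, if any
    private boolean useFunctionalDefinition = true;

    UnifiedNode(Set<Long> leaves, Optional<long[]> interval) {
        this.leaves = leaves;
        this.interval = interval;
    }

    // The toggle: switch use of the functional definition on or off.
    void setUseFunctionalDefinition(boolean enabled) {
        this.useFunctionalDefinition = enabled;
    }

    // Lets a UI report which estimate is actually in effect.
    boolean usesFunctionalDefinition() {
        return useFunctionalDefinition && interval.isPresent();
    }

    // One over the interval width if the definition is used, otherwise one
    // over the number of grouped data values.
    double share() {
        if (usesFunctionalDefinition()) {
            long[] bounds = interval.get();
            return 1.0 / (bounds[1] - bounds[0]);
        }
        return 1.0 / leaves.size();
    }

    public static void main(String[] args) {
        UnifiedNode node = new UnifiedNode(Set.of(1L, 3L, 7L), Optional.of(new long[] {0, 10}));
        System.out.println(node.share()); // 0.1
        node.setUseFunctionalDefinition(false);
        System.out.println(node.share()); // 0.3333333333333333
    }
}
```

Exposing a query such as `usesFunctionalDefinition()` is also what the UI would need to show users which of the two estimates is active.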

@idhamari
Contributor

> This would probably solve the import/export problems in the UI.

I think one can do the same in the API, e.g., by saving both the hierarchy and its functional definition and then loading them. I will try the above solution and propose a PR.

@jenno-verdonck
Contributor Author

After investigating this behavior a bit further, I noticed that the code only calculates the shares in the scoring functions differently when using redaction- and interval-based builders. For all other builder types, the shares are calculated identically to having no functional definition. The utility metrics, on the other hand, are only calculated differently when using redaction-based builders.
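Why redaction-based hierarchies can behave differently can be sketched as follows, under an explicit assumption: that the functional definition of a redaction hierarchy records the alphabet size, so a masked value such as `12**` over the digits 0 to 9 is known to stand for 10^2 = 100 possible strings, whereas a CSV hierarchy only reveals the data values that match the pattern. `RedactionShareSketch` and its methods are hypothetical, and this is not necessarily how ARX computes shares.

```java
import java.util.List;

// Hypothetical sketch, not ARX code. ASSUMPTION: a functional definition of a
// redaction hierarchy knows the alphabet size of the redacted characters.
public class RedactionShareSketch {

    // With a definition: each masked position may hold any of alphabetSize
    // characters, so the value covers alphabetSize^(#masked) strings.
    static double shareWithDefinition(String redacted, char mask, int alphabetSize) {
        long masked = redacted.chars().filter(c -> c == mask).count();
        return 1.0 / Math.pow(alphabetSize, masked);
    }

    // Without a definition: only the data values matching the pattern are known.
    static double shareWithoutDefinition(String redacted, char mask, List<String> values) {
        long count = values.stream().filter(v -> matches(v, redacted, mask)).count();
        return 1.0 / count;
    }

    // True if value has the same length and agrees on every unmasked position.
    static boolean matches(String value, String pattern, char mask) {
        if (value.length() != pattern.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != mask && p != value.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> values = List.of("1234", "1299", "5678");
        System.out.println(shareWithDefinition("12**", '*', 10));        // 0.01
        System.out.println(shareWithoutDefinition("12**", '*', values)); // 0.5
    }
}
```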

@prasser
Collaborator

prasser commented Jan 4, 2023

It's true that not all utility models make use of additional info provided by functional hierarchies and that not all hierarchy types provide such information.
