Parent-child detection metrics in the case of multiple parents configuration #290
Labels
data:multi-table
Related to multi-table, relational datasets
feature:metrics
Related to any of the individual metrics
feature request
Request for a new feature
Problem Description
The current version of the parent-child detection metric works well on denormalized data when the schema has a linear parent-child relationship. However, we think the evaluation of the denormalized data can be improved when a child table has multiple parents.
To illustrate this case we will use the biodegradability dataset as an example.
The bond table has two parent tables (the atom table is referenced twice). The current version of parent-child detection runs the denormalization once for each parent table, independently of the other parents. That is, the parent-child detection will:
This method successfully evaluates the relationship between each parent table and the child table separately, but we can identify two drawbacks:
For this reason, we think that denormalizing the child table together with all of its parent tables into a single table is more relevant. For example, for the database above, the denormalized table would be constructed in a single step, giving the following table, which can also be used to evaluate the indirect relationship between the parents:
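To make the difference between the two approaches concrete, here is a minimal pandas sketch. It is only an illustration and not the library's actual implementation: the toy rows, the column names (`atom_id`, `atom_id2`, `type`, `bond_id`), and the helper functions `denormalize_per_parent` and `denormalize_all_parents` are all assumptions made for this example.

```python
import pandas as pd

# Toy stand-ins for the atom (parent) and bond (child) tables.
# Rows and column names are assumptions made for illustration only.
atom = pd.DataFrame({
    "atom_id": ["a1", "a2", "a3"],
    "type": ["c", "o", "h"],
})
bond = pd.DataFrame({
    "bond_id": ["b1", "b2"],
    "atom_id": ["a1", "a2"],   # foreign key to the first parent
    "atom_id2": ["a2", "a3"],  # foreign key to the second parent
    "type": [1, 2],
})


def denormalize_per_parent(child, parent, fks, pk):
    """Per-parent approach: one denormalized table per foreign key."""
    return {
        fk: child.merge(
            parent.add_prefix(f"{fk}."),  # prefix avoids column name clashes
            left_on=fk,
            right_on=f"{fk}.{pk}",
            how="left",
        )
        for fk in fks
    }


def denormalize_all_parents(child, parent, fks, pk):
    """Single-step approach: one table holding the columns of every parent."""
    out = child
    for fk in fks:
        out = out.merge(
            parent.add_prefix(f"{fk}."),
            left_on=fk,
            right_on=f"{fk}.{pk}",
            how="left",
        )
    return out


fks = ["atom_id", "atom_id2"]
separate = denormalize_per_parent(bond, atom, fks, "atom_id")
joint = denormalize_all_parents(bond, atom, fks, "atom_id")

print(separate["atom_id"].columns.tolist())
print(joint.columns.tolist())
```

In this sketch, each table in `separate` carries the columns of only one parent, so a discriminator trained on it cannot see how the two atoms of a bond relate to each other. The `joint` table places both parents' columns on the same row, which is what would allow the indirect relationship between the parents to be evaluated.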