Dear @SKruthoff and @ysherstyuk,

Thanks a lot for your work on this! @Tilmon noticed the following in the emission profile datasets (e.g. emission_profile_company):
Duplicates: running dplyr::distinct() on emission_profile_company.csv, emission_profile_product.csv, and emission_profile_upstream_at_company_level.csv shows that all three datasets contain duplicated rows. Only these three were tested; all datasets should be checked for duplicates, and duplication should be avoided. For example, every row for the companies_id "adolf-wurth-gmbh-co-kg_00000004971238-001" appears twice in emission_profile_product.csv.
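A minimal sketch of that check, assuming the three CSVs sit in the working directory and the identifier column is literally called companies_id (both are assumptions here):

```r
library(dplyr)
library(readr)

files <- c(
  "emission_profile_company.csv",
  "emission_profile_product.csv",
  "emission_profile_upstream_at_company_level.csv"
)

# A clean dataset should report 0 duplicated rows.
for (file in files) {
  data <- read_csv(file, show_col_types = FALSE)
  n_duplicates <- nrow(data) - nrow(distinct(data))
  message(file, ": ", n_duplicates, " duplicated rows")
}

# Inspect the affected company mentioned above in emission_profile_product.csv.
read_csv("emission_profile_product.csv", show_col_types = FALSE) |>
  filter(companies_id == "adolf-wurth-gmbh-co-kg_00000004971238-001") |>
  count(across(everything())) |>
  filter(n > 1)
```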
Could you double-check whether a quality check is included that would catch this? And do you know where the duplicates come from? Is this an issue in the code on GitHub, or is something on Databricks introducing them? If it comes from the code on GitHub, we would need to investigate where.
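If no such check exists yet, a small guard of this kind could be run on each user-facing table before it is written out; the function name and its placement below are only a sketch, not part of the package's actual API:

```r
library(dplyr)

# Fail fast if a user-facing output still contains full-row duplicates.
# `name` is only used to produce a readable error message.
assert_no_duplicates <- function(data, name = deparse(substitute(data))) {
  n_duplicates <- nrow(data) - nrow(distinct(data))
  if (n_duplicates > 0) {
    stop(name, " contains ", n_duplicates, " duplicated rows", call. = FALSE)
  }
  invisible(data)
}

# Example use on a hypothetical output tibble before export:
# emission_profile_product |> assert_no_duplicates()
```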
Best
Anne
Yes, I am working on this by comparing the outputs. The duplicates seem to come from the extra_rowid column. Kalash is currently removing this column from the final output, as he said it should not be part of the user-facing output.

Once the column is removed, I will rerun the package and double-check whether the duplicate issue is resolved.
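A quick way to verify that hypothesis on the current output, assuming the column is literally named extra_rowid and the product-level CSV is at hand: drop the column and see whether distinct() collapses the repeated rows.

```r
library(dplyr)
library(readr)

product <- read_csv("emission_profile_product.csv", show_col_types = FALSE)

# If extra_rowid is the only thing distinguishing the repeated rows,
# dropping it and calling distinct() should remove the duplicates.
product |>
  select(-any_of("extra_rowid")) |>
  distinct() |>
  nrow()

nrow(product)  # compare against the deduplicated count above
```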