Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add class: SimulationMacroCache #224

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

SylviaDu99
Copy link
Contributor

WIP: Another version for feat: SimulationMacroCache

targeting to fix: #197

Copy link

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 54.79452% with 33 lines in your changes missing coverage. Please review.

Project coverage is 82.13%. Comparing base (64937e2) to head (e5449e3).
Report is 10 commits behind head on master.

Files Patch % Lines
policyengine_core/simulations/sim_macro_cache.py 58.53% 17 Missing ⚠️
policyengine_core/simulations/simulation.py 27.27% 14 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #224      +/-   ##
==========================================
- Coverage   82.56%   82.13%   -0.43%     
==========================================
  Files         187      190       +3     
  Lines        9328     9574     +246     
  Branches     1183     1206      +23     
==========================================
+ Hits         7702     7864     +162     
- Misses       1348     1427      +79     
- Partials      278      283       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SylviaDu99 SylviaDu99 marked this pull request as ready for review August 9, 2024 23:48
Copy link
Contributor

@anth-volk anth-volk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contributions to this @SylviaDu99. I've left a couple minor change requests, and it would also be appreciated if you could update your PR description. I'd also like to get Nikhil's review on this, so I've requested his review, as well.


def get_cache_value(self, cache_file_path: Path):
with h5py.File(cache_file_path, "r") as f:
# Validate both core version and country package metadata are uptodate, otherwise flush the cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Validate both core version and country package metadata are uptodate, otherwise flush the cache
# Validate both core version and country package metadata are up-to-date, otherwise flush the cache


variable = self.tax_benefit_system.get_variable(variable_name)
parameter_deps = variable.exhaustive_parameter_dependencies

if parameter_deps is None:
return None
return not is_cache_available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would get rid of is_cache_available and just return either True or False in this function, as I think the is_cache_available value serves to complicate the function.

Copy link
Contributor

@nikhilwoodruff nikhilwoodruff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @SylviaDu99! Just a few comments.

from policyengine_core.taxbenefitsystems import TaxBenefitSystem


class Singleton(type):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry- could you explain why we need this?

if not self.is_over_dataset:
return None
is_cache_available = True
if self.is_over_dataset:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR started- we've added the ability to load DataFrames directly in simulations. Could we add an extra condition here that simulations from DataFrames are excluded from the macro cache? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikhilwoodruff Could you describe the syntax necessary for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it it would involve using

if self.dataset.data_format == Dataset.TIME_PERIOD_ARRAYS:

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check if the data format is Dataset.FLAT_FILE and disable the entire macro cache if yes.

tests/core/test_simulations.py Show resolved Hide resolved
return None
is_cache_available = True
if self.is_over_dataset:
return is_cache_available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return is_cache_available
return True


variable = self.tax_benefit_system.get_variable(variable_name)
parameter_deps = variable.exhaustive_parameter_dependencies

if parameter_deps is None:
return None
return not is_cache_available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return not is_cache_available
return False

"""
if not self.is_over_dataset:
return None
is_cache_available = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is_cache_available = True

)

return cache_file_path
return not is_cache_available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return not is_cache_available
return False

return None
with h5py.File(cache_file_path, "w") as f:
f.create_dataset("values", data=value)
return is_cache_available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return is_cache_available
return True

@anth-volk anth-volk self-requested a review August 22, 2024 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatically clear macro caching on new country package versions
3 participants