FYI: Pandas and numpy warnings on final 3 tests of clean install of tax-microdata-benchmarking as of PR 134 #135

Closed
donboyd5 opened this issue Jul 11, 2024 · 3 comments · Fixed by #180
Assignees
Labels
code-health Code quality and best practice

Comments

@donboyd5
Collaborator

donboyd5 commented Jul 11, 2024

FYI. On clean install of tax-microdata-benchmarking as of PR 134:

image

I get several pandas and numpy warnings. I am not sure if new code, or older code, is triggering these warnings, but I don't recall seeing them before:

  • FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
  • SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
  • PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead.
  • DeprecationWarning: the interpolation= argument to percentile was renamed to method=, which has additional options.
    Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they used. (Deprecated NumPy 1.22)
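For the first two warnings in the list, the usual remedy is to replace chained indexing with a single `.loc` assignment, or to take an explicit `.copy()` of a filtered subset before modifying it. The sketch below uses a made-up DataFrame with hypothetical `income` and `weight` columns; it is not taken from the tax-microdata-benchmarking code and only illustrates the pattern.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({"income": [10.0, 50.0, 90.0], "weight": [1.0, 1.0, 1.0]})

# Chained assignment such as df[df["income"] > 40]["weight"] = 2.0 writes to a
# temporary object; a single .loc call sets the values on df itself.
df.loc[df["income"] > 40, "weight"] = 2.0

# When a filtered subset is modified later, take an explicit copy first so
# pandas treats the subset as independent of df.
high = df[df["income"] > 40].copy()
high["weight"] = high["weight"] * 1.5
```

After this runs, `df` holds the updated weights and `high` is a separate frame, so neither statement should trigger the chained-assignment or SettingWithCopy warnings.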

Examples below:

image

image

image

image
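For the NumPy DeprecationWarning listed above, the change is a keyword rename from `interpolation=` to `method=` in `np.percentile`. A minimal sketch, assuming NumPy 1.22 or later and using made-up values rather than anything from the repository:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Deprecated spelling: np.percentile(values, 50, interpolation="lower")
# Current spelling uses the method= keyword with the same mode name.
median_lower = np.percentile(values, 50, method="lower")

# Omitting the keyword uses the default "linear" method.
median_linear = np.percentile(values, 50)
```

As the warning text notes, callers that used 'nearest', 'lower', 'higher', or 'midpoint' may want to review whether that mode is still the one they intend.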

@donboyd5 changed the title from "FYI: Pandas and numpy warnings on final 3 tests of clean install of tax-microdata-benchmarking as of PR 130" to "FYI: Pandas and numpy warnings on final 3 tests of clean install of tax-microdata-benchmarking as of PR 134" on Jul 11, 2024
@martinholmer
Collaborator

@nikhilwoodruff, all the remaining warnings are in code you wrote.
What is your timeline for eliminating these warnings?

@martinholmer martinholmer added the code-health Code quality and best practice label Jul 17, 2024
@martinholmer
Collaborator

martinholmer commented Sep 1, 2024

After the merge of PR #178, we have these warnings when activating the usually skipped test_create_file test:

============================ warnings summary ============================
tests/test_create_tmd_variables.py::test_create_file
.../site-packages/policyengine_core/enums/enum.py: 56:
FutureWarning: Series.__getitem__ treating keys as positions is deprecated.
In a future version, integer keys will always be treated as labels (consistent with
DataFrame behavior). To access a value by position, use ser.iloc[pos]
if isinstance(array[0], Enum):

tests/test_create_tmd_variables.py: 458 warnings
.../tmd/utils/reweight.py: 176:
PerformanceWarning: DataFrame is highly fragmented.
This is usually the result of calling frame.insert many times, which has poor performance.
Consider joining all columns at once using pd.concat(axis=1) instead.
To get a de-fragmented frame, use newframe = frame.copy()
loss_matrix[label] = mask * values

The first warning was reported in a policyengine-core issue some time ago, but it has not yet been fixed.
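The first warning arises when an integer key is used on a Series whose index is not integer-based, so pandas falls back to positional access. A minimal sketch of the replacement the warning suggests, using a made-up Series rather than the policyengine-core code:

```python
import pandas as pd

# Hypothetical Series with string labels; ser[0] here would rely on the
# deprecated positional fallback and emit the FutureWarning.
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Explicit positional access, as the warning message recommends.
first = ser.iloc[0]

# Explicit label access, for comparison.
by_label = ser.loc["a"]
```

Whether `.iloc[0]` or a conversion to a NumPy array is the better fit in enum.py is a decision for the policyengine-core maintainers; this only shows the warning-free spelling.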

The second warning comes from the tmd/utils/reweight.py module, which builds the loss_matrix in a complex process that adds one column at a time. The warning says that this code produces a "highly fragmented" DataFrame that has "poor performance".

Note that PR #180 follows the warning suggestion on how to defragment the loss_matrix, but the warnings are still generated.
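One general point about this warning: pandas emits it at the moment a column is inserted into an already fragmented frame, so calling `frame.copy()` after the columns have been added does not suppress warnings raised during the inserts. A common way to avoid the inserts altogether is to collect the columns in a dict and build the DataFrame once. The sketch below is generic and is not the actual reweight.py code; the labels, masks, and values are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

n_rows = 5
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the (label, mask, values) triples built in the
# real code; each column is stored in a plain dict, not in a DataFrame.
columns = {}
for i in range(200):
    mask = rng.random(n_rows) > 0.5
    values = rng.random(n_rows)
    columns[f"target_{i}"] = mask * values

# One construction call instead of hundreds of single-column inserts.
loss_matrix = pd.DataFrame(columns)
```

`pd.concat(list_of_series, axis=1)` would be an equivalent final step, as the warning message suggests.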

All the other warnings originally reported in this issue have been fixed by code improvements merged during August 2024.

@donboyd5
Collaborator Author

donboyd5 commented Sep 2, 2024 via email

3 participants