FYI: Pandas and numpy warnings on final 3 tests of clean install of tax-microdata-benchmarking as of PR 134 #135

donboyd5 · 2024-07-11T11:10:45Z

FYI: on a clean install of tax-microdata-benchmarking as of PR 134, I get several pandas and numpy warnings. I am not sure whether new code or older code is triggering them, but I don't recall seeing them before:

FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead.
DeprecationWarning: the interpolation= argument to percentile was renamed to method=, which has additional options.
Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they used. (Deprecated NumPy 1.22)

Examples below:


martinholmer · 2024-07-17T00:12:26Z

@nikhilwoodruff, All the remaining warnings are in code you wrote.
What is your timeline for eliminating these warnings?

martinholmer · 2024-09-01T18:14:05Z

After the merge of PR #178, we have these warnings when activating the usually skipped test_create_file test:

============================ warnings summary ============================
tests/test_create_tmd_variables.py::test_create_file
  .../site-packages/policyengine_core/enums/enum.py:56:
  FutureWarning: Series.__getitem__ treating keys as positions is deprecated.
  In a future version, integer keys will always be treated as labels (consistent with
  DataFrame behavior). To access a value by position, use ser.iloc[pos]
    if isinstance(array[0], Enum):

tests/test_create_tmd_variables.py: 458 warnings
  .../tmd/utils/reweight.py:176:
  PerformanceWarning: DataFrame is highly fragmented.
  This is usually the result of calling frame.insert many times, which has poor performance.
  Consider joining all columns at once using pd.concat(axis=1) instead.
  To get a de-fragmented frame, use newframe = frame.copy()
    loss_matrix[label] = mask * values

The first warning was reported in a policyengine-core issue some time ago, but it has not yet been fixed.
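For readers unfamiliar with the first warning, here is a minimal sketch of the pattern pandas is deprecating and the explicit alternative it recommends. The Series below is hypothetical and is not taken from policyengine-core; it only illustrates why `array[0]` on a Series with a non-integer index is ambiguous.

```python
import pandas as pd

# Hypothetical Series with string labels (not from policyengine-core).
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

# ser[0] would fall back to positional access here and emit the
# FutureWarning quoted above on recent pandas 2.x releases.
# The explicit accessors avoid the ambiguity:
first_by_position = ser.iloc[0]  # always positional
first_by_label = ser.loc["a"]    # always label-based

print(first_by_position, first_by_label)
```

Whether `.iloc[0]` is the right replacement in enum.py depends on what that code intends, which only the policyengine-core maintainers can confirm.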

The second warning comes from the tmd/utils/reweight.py module, which builds the loss_matrix through a complex process of adding one column at a time. The warning says that this code produces a "highly fragmented" DataFrame, which has "poor performance".

Note that PR #180 follows the warning suggestion on how to defragment the loss_matrix, but the warnings are still generated.
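One possible explanation, assuming the columns are still inserted one at a time before the frame is copied, is that pandas emits the PerformanceWarning during the inserts themselves, so a later `frame.copy()` defragments the result without silencing the warnings. A minimal sketch of the other suggestion in the warning text (build all columns first, then create the frame once) might look like the following. The names `n_rows`, `n_targets`, and `target_{i}`, and the random data, are illustrative assumptions and not the actual reweight.py logic.

```python
import numpy as np
import pandas as pd

# Illustrative sizes and data; the real loss_matrix is built from targets.
rng = np.random.default_rng(0)
n_rows, n_targets = 5, 200

# Collect each column in a plain dict instead of assigning into the
# DataFrame inside the loop (loss_matrix[label] = mask * values).
columns = {}
for i in range(n_targets):
    mask = rng.random(n_rows) > 0.5
    values = rng.random(n_rows)
    columns[f"target_{i}"] = mask * values

# Build the frame in a single step, so no repeated inserts occur.
loss_matrix = pd.DataFrame(columns)
print(loss_matrix.shape)
```

Passing the dict of Series or arrays to `pd.concat(..., axis=1)` would be an equivalent way to join the columns at once.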

All the other warnings originally reported in this issue have been fixed by code improvements merged during August 2024.
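As a general illustration of how warnings of the kinds originally reported are usually addressed (this is a sketch under assumed data, not the changes actually merged), chained assignment can be replaced by a single `.loc` call, and the `interpolation=` keyword of `np.percentile` by `method=`. The DataFrame and its column names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names are for illustration only.
df = pd.DataFrame({"income": [10.0, 20.0, 30.0], "flag": [0, 1, 1]})

# Chained form that triggers SettingWithCopyWarning / ChainedAssignmentError:
#     df[df["flag"] == 1]["income"] = 0.0
# Single-step form that writes to the original frame:
df.loc[df["flag"] == 1, "income"] = 0.0

# Deprecated since NumPy 1.22:
#     np.percentile(data, 50, interpolation="lower")
# Renamed keyword:
p50 = np.percentile([1, 2, 3, 4], 50, method="lower")

print(df["income"].tolist(), p50)
```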

donboyd5 · 2024-09-02T16:27:37Z

Thank you, @martinholmer.


donboyd5 changed the title ~~FYI: Pandas and numpy warnings on final 3 tests of clean install of tax-microdata-benchmarking as of PR 130~~ FYI: Pandas and numpy warnings on final 3 tests of clean install of tax-microdata-benchmarking as of PR 134 Jul 11, 2024

martinholmer mentioned this issue Jul 11, 2024

Revise Imputation code to avoid numpy deprecation warning #136

Merged

martinholmer assigned nikhilwoodruff Jul 17, 2024

martinholmer added the code-health Code quality and best practice label Jul 17, 2024

martinholmer closed this as completed Sep 1, 2024

martinholmer reopened this Sep 1, 2024

martinholmer mentioned this issue Sep 1, 2024

Defragment loss_matrix in the reweight.py module #180

Merged

martinholmer closed this as completed in #180 Sep 1, 2024
