Pandas Deprecation removal #342

rhanka · 2024-07-26T22:01:21Z

Summary by CodeRabbit

Bug Fixes
- Enhanced data handling in the API methods for improved performance and reliability.
- Improved handling of missing values in data processing functions.
New Features
- Streamlined DataFrame transformations for clearer and more efficient data processing.
Refactor
- Replaced inefficient methods with more appropriate techniques for data manipulation (e.g., using apply instead of applymap).
- Simplified logic across multiple methods for better readability and maintainability.

coderabbitai · 2024-07-26T22:01:33Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The recent changes streamline data processing across several components by replacing inefficient methods with more efficient alternatives. Key modifications involve switching from applymap to apply and map, enhancing DataFrame manipulation, and refining numerical data handling to improve clarity and performance. A static identifier replaces dynamic generation in the Makefile, simplifying the build process. Overall, these updates enhance code maintainability and readability.
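The applymap replacement described above can be sketched as follows. This is a minimal illustration, not the PR's code: the `shout` function and the sample frame are hypothetical. Note that `Series.map` still calls the function once per element, so the main gain is dropping `DataFrame.applymap`, which pandas deprecated in version 2.1.

```python
import pandas as pd


def shout(value):
    """Hypothetical element-wise transform, used only for illustration."""
    return str(value).upper()


df = pd.DataFrame({"a": ["x", "y"], "b": ["z", 1]})

# Deprecated since pandas 2.1: df.applymap(shout)
# Replacement that also runs on older pandas: apply per column, map per element.
result = df.apply(lambda col: col.map(shout))

print(result)
```

On pandas 2.1 and later, `df.map(shout)` is the direct successor of `applymap`; the `apply` plus `map` form has the advantage of working on older versions too.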

Changes

| Files | Change Summary |
| --- | --- |
| Makefile | Changed `id` variable assignment from a dynamic base64 string generation to a static string `myid`, simplifying the build process. |
| code/api.py | Replaced `applymap` with `apply` and `map` for more efficient DataFrame column handling in `post` and `put` methods, improving performance and clarity in data transformations. |
| code/recipes.py | Enhanced data processing in `prepare_categorical` and `prepare_numerical` methods by updating missing value handling and using `np.where` instead of `applymap`, improving robustness and performance. |

Poem

In the fields of code so bright,
Changes hop with pure delight,
Data dances, swift and free,
Processing flows like a gentle breeze.
“My ID’s set!” the rabbit grins,
As clarity wins, and performance spins! 🐇✨

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 03a4648 and f02807f.

Files selected for processing (3)

Makefile (1 hunk)
code/api.py (3 hunks)
code/recipes.py (10 hunks)

Additional context used

Ruff

code/api.py

613-613: unicode_safe may be undefined, or defined from star imports

(F405)

996-996: unicode_safe may be undefined, or defined from star imports

(F405)

code/recipes.py

1949-1949: ngrams may be undefined, or defined from star imports

(F405)

1949-1949: tokenize may be undefined, or defined from star imports

(F405)

1949-1949: normalize may be undefined, or defined from star imports

(F405)

2435-2435: replace_regex may be undefined, or defined from star imports

(F405)

Additional comments not posted (16)

Makefile (1)

111-111: Switch from dynamic to static identifier assignment.

The change from id := $(shell openssl rand -base64 8) to id := myid replaces a dynamically generated identifier with a static one. This could affect processes that rely on unique identifiers for each build.

Consider the implications of this change on versioning, tagging, or any other process that relies on a unique id. If a unique identifier is required, reverting to the dynamic generation might be necessary.
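To make the difference concrete, here is a rough Python equivalent of the two assignments. It is only a sketch for comparison: the Makefile itself uses `openssl`, and the function name `random_id` is hypothetical.

```python
import base64
import os


def random_id(num_bytes: int = 8) -> str:
    """Return base64 text for num_bytes random bytes, like `openssl rand -base64 8`."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


dynamic_id = random_id()  # changes on every call, so every build gets a new id
static_id = "myid"        # the value this PR hard-codes; identical across builds

print(dynamic_id, static_id)
```

Eight random bytes encode to a 12-character base64 string, so anything keyed on `id` (container names, temporary paths, tags) would collide between concurrent builds once the value is fixed.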

code/api.py (3)

960-960: Efficient DataFrame manipulation.

The use of apply with astype instead of applymap improves efficiency by applying the transformation at the column level rather than element-wise. Ensure unicode_safe is defined.

Verification successful

Efficient DataFrame manipulation.

The use of apply with astype instead of applymap improves efficiency by applying the transformation at the column level rather than element-wise. The unicode_safe function is defined and handles type conversions appropriately.

unicode_safe is defined in code/tools.py.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type python -A 5 $'def unicode_safe'
Length of output: 73

Script:
#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type py -A 5 $'def unicode_safe'
Length of output: 249
996-996: Efficient DataFrame manipulation.

Replacing applymap with apply plus map removes the deprecated call and keeps the transformation organized per column; note that Series.map still invokes unicode_safe once per element. Ensure unicode_safe is defined.

Verification successful

Efficient DataFrame manipulation.

The unicode_safe function is defined in code/tools.py and handles various types appropriately. The use of apply with map is indeed efficient for column-level transformations.

code/tools.py, lines 1-5
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type python -A 5 $'def unicode_safe'
Length of output: 73

Script:
#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --glob '*.py' -A 5 $'def unicode_safe'
Length of output: 253
Tools

Ruff

996-996: unicode_safe may be undefined, or defined from star imports

(F405)

613-613: Efficient DataFrame manipulation.

Replacing applymap with apply plus map removes the deprecated call and keeps the transformation organized per column; note that Series.map still invokes unicode_safe once per element. Ensure unicode_safe is defined.
Verification successful

Efficient DataFrame manipulation.

Replacing applymap with apply plus map removes the deprecated call; Series.map still invokes unicode_safe once per element. The unicode_safe function is correctly defined in code/tools.py.
code/tools.py:
def unicode_safe(x):
    if (type(x) == float) | (type(x) == int):
        return x
    try:
        return str(x)
    except:
        pass
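
For reference, a behavior-preserving variant of the helper quoted above might look like the sketch below. It is not part of the PR, and the name `unicode_safe_sketch` is hypothetical. The exact type check is kept on purpose: switching to `isinstance` would change how booleans are handled.

```python
def unicode_safe_sketch(x):
    """Hypothetical tidy-up of unicode_safe that keeps the same results."""
    # Exact type comparison, as in the original: isinstance(True, int) is True,
    # so isinstance would return booleans unchanged instead of stringifying them.
    if type(x) in (float, int):
        return x
    try:
        return str(x)
    except Exception:
        # The original uses a bare except and falls through, returning None.
        # Catching Exception is slightly narrower (it lets KeyboardInterrupt pass).
        return None


print(unicode_safe_sketch(3), unicode_safe_sketch(True), unicode_safe_sketch("é"))
```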
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type python -A 5 $'def unicode_safe'
Length of output: 73

Script:
#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type py -A 5 'def unicode_safe'
Length of output: 249
Tools

Ruff

613-613: unicode_safe may be undefined, or defined from star imports

(F405)

code/recipes.py (12)

1591-1592: LGTM! Enhanced handling of missing and non-numeric values.

The changes improve the robustness of numerical data processing by filling missing values with "0" and converting non-numeric values to NaN before filling them with zero.
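
The sequence described in this comment can be sketched as below. The column names and values are made up, and the real `prepare_numerical` may order or parameterize these steps differently.

```python
import pandas as pd

# Hypothetical raw input: numbers stored as text, with gaps and junk values.
df = pd.DataFrame(
    {"price": ["1.5", None, "abc"], "qty": ["2", "3", "n/a"]},
    dtype=object,
)

cleaned = (
    df.fillna("0")                                            # missing -> "0"
      .apply(lambda col: pd.to_numeric(col, errors="coerce"))  # non-numeric -> NaN
      .fillna(0)                                              # remaining NaN -> 0
)

print(cleaned)
```

One consequence worth keeping in mind: after this step a genuine zero and an unparseable value are indistinguishable.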

1758-1758: LGTM! Efficient target variable transformation.

The use of np.where for converting boolean values to integers is more efficient and concise, enhancing performance and readability.
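
A minimal sketch of this kind of conversion, assuming a boolean target column (the actual column and condition in `build_model` are not shown in this thread):

```python
import numpy as np
import pandas as pd

# Hypothetical boolean target column.
target = pd.Series([True, False, True])

# Vectorized: 1 where the flag is set, 0 elsewhere, with no per-element Python call.
encoded = np.where(target, 1, 0)

print(encoded)
```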

1893-1894: LGTM! Improved column-wise conversion to integers.

The use of apply for converting columns to integers is more appropriate and ensures NaN values are preserved, improving data integrity.
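
The thread does not show how missing values are preserved. One way to get integers while keeping gaps is pandas' nullable `Int64` dtype, sketched here as an assumption rather than as what `internal_to_integer` actually does:

```python
import pandas as pd

# Hypothetical input: integers stored as text, with one missing value.
df = pd.DataFrame({"count": ["3", None, "7"]}, dtype=object)

# to_numeric yields floats with NaN; the nullable Int64 dtype then stores
# whole numbers and represents the gap as <NA> instead of failing.
as_int = df.apply(lambda col: pd.to_numeric(col, errors="coerce").astype("Int64"))

print(as_int)
```

A plain `astype(int)` would raise on the missing entry, which is why the nullable dtype is used in this sketch.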

1906-1906: LGTM! Proper handling of list-to-tuple conversions.

The use of apply for converting lists to tuples is more appropriate for column-wise operations.

1918-1918: LGTM! Proper handling of tuple-to-list conversions.

The use of apply for converting tuples to lists is more appropriate for column-wise operations.
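
Both conversions follow the same pattern. A small sketch with a hypothetical `tags` column:

```python
import pandas as pd

# Hypothetical column holding one list per row.
df = pd.DataFrame({"tags": [["a", "b"], ["c"]]})

# Lists -> tuples (tuples are hashable, lists are not).
as_tuples = df.apply(lambda col: col.map(lambda v: tuple(v)))

# Tuples -> lists, reversing the step above.
back_to_lists = as_tuples.apply(lambda col: col.map(lambda v: list(v)))

print(as_tuples)
print(back_to_lists)
```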

1929-1933: LGTM! Improved column-wise conversion to floats.

The use of apply and pd.to_numeric with error coercion ensures non-convertible values are handled gracefully by setting them to na_value.
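
A sketch of the coercion pattern, assuming a sentinel of -1.0 for `na_value` purely for illustration (the recipe's real `na_value` is not shown in this thread):

```python
import pandas as pd

na_value = -1.0  # hypothetical sentinel; the recipe's actual value may differ

# Hypothetical input with one value that cannot be parsed as a number.
df = pd.DataFrame({"ratio": ["0.25", "oops", "3"]}, dtype=object)

as_float = df.apply(
    lambda col: pd.to_numeric(col, errors="coerce").fillna(na_value)
)

print(as_float)
```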

1944-1945: LGTM! Simplified retrieval of the n value.

The use of get with a default value simplifies the retrieval of the n value, reducing conditional checks and improving clarity.
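
The pattern amounts to a single `dict.get` call. In this sketch the function name, the `options` argument, and the default of 2 are all assumptions:

```python
def ngram_size(options: dict, default: int = 2) -> int:
    """Return options['n'] when present, otherwise the (hypothetical) default."""
    return options.get("n", default)


print(ngram_size({"n": 3}), ngram_size({}))
```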

1947-1950: LGTM! Improved n-gram generation.

The use of nested apply calls for n-gram generation is more appropriate for column-wise operations and improves readability and performance.
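
A rough sketch of the nested pattern follows. The three helpers are hypothetical stand-ins: the project's real `normalize`, `tokenize`, and `ngrams` come from a star import and may behave differently.

```python
import pandas as pd


# Hypothetical stand-ins for the star-imported helpers.
def normalize(text):
    return text.lower().strip()


def tokenize(text):
    return text.split()


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


df = pd.DataFrame({"title": ["The Quick Fox", "Lazy dog"]}, dtype=object)
n = 2

# Outer apply walks the columns; inner map builds the n-grams for each cell.
grams = df.apply(lambda col: col.map(lambda text: ngrams(tokenize(normalize(text)), n)))

print(grams)
```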

Tools

Ruff

1949-1949: ngrams may be undefined, or defined from star imports

(F405)

1949-1949: tokenize may be undefined, or defined from star imports

(F405)

1949-1949: normalize may be undefined, or defined from star imports

(F405)

1949-1949: Verify definitions or imports of ngrams, tokenize, and normalize.

The static analysis tool flagged these names as potentially undefined or imported from star imports. Ensure they are correctly defined or imported.

2246-2246: LGTM! Efficient handling of missing values.

The use of fillna for handling missing values is more efficient for DataFrame-wide operations and improves performance.
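
As an illustration, `fillna` can take one replacement per column in a single call. The columns and fill values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in each column.
df = pd.DataFrame({"city": ["Paris", None], "score": [1.0, np.nan]})

# One call for the whole frame, with a type-appropriate fill per column.
filled = df.fillna({"city": "", "score": 0.0})

print(filled)
```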

2435-2435: LGTM! Proper handling of regex replacements.

The use of apply for regex replacements is more appropriate for column-wise operations.
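
A sketch of the per-column pattern. The `replace_regex` defined here is a hypothetical stand-in, since the signature of the project's star-imported helper is not shown in this thread.

```python
import re

import pandas as pd


def replace_regex(value, pattern, replacement):
    """Hypothetical stand-in; the project's real helper may differ."""
    return re.sub(pattern, replacement, value)


df = pd.DataFrame({"phone": ["01 23", "45-67"]}, dtype=object)

# Per column, per element: strip every non-digit character.
cleaned = df.apply(lambda col: col.map(lambda v: replace_regex(v, r"\D", "")))

# For plain string columns, pandas' vectorized string method gives the same result.
alt = df["phone"].str.replace(r"\D", "", regex=True)

print(cleaned)
```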

Tools

Ruff

2435-2435: replace_regex may be undefined, or defined from star imports

(F405)

2435-2435: Verify definition or import of replace_regex.

The static analysis tool flagged this name as potentially undefined or imported from star imports. Ensure it is correctly defined or imported.

rhanka marked this pull request as draft July 26, 2024 22:02

rhanka changed the title ~~Fix/apply map removal~~ Pandas Deprecation removal Jul 26, 2024

rhanka added 10 commits July 26, 2024 18:03

prepare_numerical: remove applymap (obsolete) and optimize function

4ad3ee0

build_model: applymap removal & optim

faa083d

internal_to_integer: applymap removal and optim

33ad4fd

internal_list_to_tuple and internal_tuple_to_list: applymap removal and optim

f196581

internal_to_float: applymap removal and optim

cfa077c

internal_ngram: applymap removal and optim

59588c6

internal_join: applymap removal and optim

7502849

internal_parsedate: lint

f93df24

internal_replace: applymap removal

63a1216

api.py: applymap removal

671ea4d

rhanka force-pushed the fix/apply-map-removal branch from f02807f to 671ea4d Compare July 26, 2024 22:03

coderabbitai bot reviewed Jul 26, 2024

rhanka mentioned this pull request Jul 26, 2024

Upgrade to Pandas 2.0 (fix breaking changes like timedelta64) and handle future deprecations #343

Open
