Pandas Deprecation removal #342

Draft
wants to merge 10 commits into
base: dev

@rhanka (Member) commented Jul 26, 2024

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced data handling in the API methods for improved performance and reliability.
    • Improved handling of missing values in data processing functions.
  • New Features

    • Streamlined DataFrame transformations for clearer and more efficient data processing.
  • Refactor

    • Replaced deprecated DataFrame methods with supported equivalents (e.g., using apply with map instead of applymap).
    • Simplified logic across multiple methods for better readability and maintainability.

coderabbitai bot commented Jul 26, 2024

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The recent changes streamline data processing across several components by replacing inefficient methods with more efficient alternatives. Key modifications involve switching from applymap to apply and map, enhancing DataFrame manipulation, and refining numerical data handling to improve clarity and performance. A static identifier replaces dynamic generation in the Makefile, simplifying the build process. Overall, these updates enhance code maintainability and readability.
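
As a rough illustration of the applymap replacement described in this walkthrough, the sketch below shows the general pattern. It is not the PR's actual diff: the DataFrame and the `shout` helper are hypothetical and exist only for the example.

```python
import pandas as pd


def shout(value):
    """Hypothetical element-wise transform used only for illustration."""
    return str(value).upper()


df = pd.DataFrame({"name": ["ann", "bob"], "city": ["paris", "lyon"]})

# Deprecated since pandas 2.1: df.applymap(shout)
# Replacement that also runs on older pandas versions:
# apply() hands each column (a Series) to the lambda, and
# Series.map() then calls shout() on every element of that column.
result = df.apply(lambda col: col.map(shout))
```

On pandas 2.1 and later, `DataFrame.map` is the direct rename of `applymap`; the apply plus map form has the advantage of not depending on the installed pandas version.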

Changes

Files | Change Summary
Makefile | Changed id variable assignment from a dynamic base64 string generation to a static string myid, simplifying the build process.
code/api.py | Replaced applymap with apply and map for more efficient DataFrame column handling in post and put methods, improving performance and clarity in data transformations.
code/recipes.py | Enhanced data processing in prepare_categorical and prepare_numerical methods by updating missing value handling and using np.where instead of applymap, improving robustness and performance.

Poem

In the fields of code so bright,
Changes hop with pure delight,
Data dances, swift and free,
Processing flows like a gentle breeze.
“My ID’s set!” the rabbit grins,
As clarity wins, and performance spins! 🐇✨

@rhanka rhanka marked this pull request as draft July 26, 2024 22:02
@rhanka rhanka changed the title from "Fix/apply map removal" to "Pandas Deprecation removal" Jul 26, 2024
@rhanka rhanka force-pushed the fix/apply-map-removal branch from f02807f to 671ea4d July 26, 2024 22:03

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 03a4648 and f02807f.

Files selected for processing (3)
  • Makefile (1 hunks)
  • code/api.py (3 hunks)
  • code/recipes.py (10 hunks)
Additional context used
Ruff
code/api.py

613-613: unicode_safe may be undefined, or defined from star imports

(F405)


996-996: unicode_safe may be undefined, or defined from star imports

(F405)

code/recipes.py

1949-1949: ngrams may be undefined, or defined from star imports

(F405)


1949-1949: tokenize may be undefined, or defined from star imports

(F405)


1949-1949: normalize may be undefined, or defined from star imports

(F405)


2435-2435: replace_regex may be undefined, or defined from star imports

(F405)

Additional comments not posted (16)
Makefile (1)

111-111: Switch from dynamic to static identifier assignment.

The change from id := $(shell openssl rand -base64 8) to id := myid replaces a dynamically generated identifier with a static one. This could affect processes that rely on unique identifiers for each build.

Consider the implications of this change on versioning, tagging, or any other process that relies on a unique id. If a unique identifier is required, reverting to the dynamic generation might be necessary.
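
To make the trade-off concrete, here is a small Python sketch of the difference between a per-run random identifier and a fixed one. It only mirrors the spirit of `openssl rand -base64 8`; the `random_id` helper is hypothetical and is not part of the Makefile.

```python
import base64
import secrets


def random_id(num_bytes: int = 8) -> str:
    """Return a base64-encoded random identifier built from num_bytes random bytes."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


STATIC_ID = "myid"        # the value the Makefile now hard-codes
dynamic_id = random_id()  # a fresh value on every call
```

Anything keyed on the identifier (container names, temporary paths, tags) would collide across builds with the static value, whereas the random form keeps them apart.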

code/api.py (3)

960-960: Efficient DataFrame manipulation.

The use of apply with astype instead of applymap improves efficiency by applying the transformation at the column level rather than element-wise. Ensure unicode_safe is defined.

Verification successful

The unicode_safe function is defined and handles type conversions appropriately.

  • unicode_safe is defined in code/tools.py.
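
For illustration, a column-level conversion with apply and astype might look like the sketch below. The DataFrame and the `str` target type are assumptions made for the example, not the code at line 960.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "score": [0.5, 1.5]})

# astype() converts a whole column in one call;
# apply() runs that conversion once per column.
as_text = df.apply(lambda col: col.astype(str))
```

When every column gets the same target type, `df.astype(str)` gives the same result without apply.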
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type python -A 5 $'def unicode_safe'

Length of output: 73


Script:

#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type py -A 5 $'def unicode_safe'

Length of output: 249


996-996: Efficient DataFrame manipulation.

Replacing applymap with apply plus map removes the applymap call, which pandas deprecated in version 2.1. Series.map still calls unicode_safe once per element, so the work remains element-wise and any speed-up is likely modest. Ensure unicode_safe is defined.

Verification successful

The unicode_safe function is defined in code/tools.py and handles various types appropriately.

  • code/tools.py, lines 1-5
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type python -A 5 $'def unicode_safe'

Length of output: 73


Script:

#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --glob '*.py' -A 5 $'def unicode_safe'

Length of output: 253

Tools
Ruff

996-996: unicode_safe may be undefined, or defined from star imports

(F405)


613-613: Efficient DataFrame manipulation.

Replacing applymap with apply plus map removes the applymap call, which pandas deprecated in version 2.1. Series.map still calls unicode_safe once per element, so the work remains element-wise and any speed-up is likely modest. Ensure unicode_safe is defined.

Verification successful

The unicode_safe function is correctly defined in code/tools.py.

  • code/tools.py:
    def unicode_safe(x):
        if (type(x) == float) | (type(x) == int):
            return x
        try:
            return str(x)
        except:
            pass
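
A minimal sketch of how a function of this shape behaves when mapped over a DataFrame follows. `unicode_safe_sketch` is an illustrative variant written for this example, not the project's function, and the sample data is made up.

```python
import pandas as pd


def unicode_safe_sketch(x):
    """Illustrative variant: plain ints and floats pass through, the rest become str."""
    # An exact type check mirrors the original; isinstance() would also
    # let bool through, because bool is a subclass of int.
    if type(x) in (float, int):
        return x
    try:
        return str(x)
    except Exception:
        # The original swallows the error and implicitly returns None.
        return None


df = pd.DataFrame({"a": [1, "x"], "b": [2.5, ["l"]]})

# Each column is passed to the lambda; Series.map applies the function per element.
cleaned = df.apply(lambda col: col.map(unicode_safe_sketch))
```

Catching `Exception` rather than using a bare `except` is a choice made in this sketch so that KeyboardInterrupt and SystemExit are not swallowed.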
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type python -A 5 $'def unicode_safe'

Length of output: 73


Script:

#!/bin/bash
# Description: Verify the definition of `unicode_safe`.

# Test: Search for the definition of `unicode_safe`. Expect: Definition found.
rg --type py -A 5 'def unicode_safe'

Length of output: 249

Tools
Ruff

613-613: unicode_safe may be undefined, or defined from star imports

(F405)

code/recipes.py (12)

1591-1592: LGTM! Enhanced handling of missing and non-numeric values.

The changes improve the robustness of numerical data processing by filling missing values with "0" and converting non-numeric values to NaN before filling them with zero.
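
The two steps described here can be sketched as follows. This is an assumption about the general approach, using a made-up Series, and not the exact lines 1591-1592 of code/recipes.py.

```python
import pandas as pd

raw = pd.Series(["3", None, "abc", "4.5"])

# Step 1: replace missing entries with the string "0".
filled = raw.fillna("0")

# Step 2: coerce to numbers. Unparseable text such as "abc" becomes NaN,
# which is then replaced by zero.
numeric = pd.to_numeric(filled, errors="coerce").fillna(0)
```

Note that after this treatment a genuinely missing value and an unparseable one both end up as zero and can no longer be told apart.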


1758-1758: LGTM! Efficient target variable transformation.

The use of np.where for converting boolean values to integers is more efficient and concise, enhancing performance and readability.
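
A short sketch of the np.where pattern, assuming a boolean column named `target` (the column name and data are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"target": [True, False, True]})

# Vectorised replacement for an element-wise lambda such as
# df[["target"]].applymap(lambda x: 1 if x else 0)
df["target"] = np.where(df["target"], 1, 0)
```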


1893-1894: LGTM! Improved column-wise conversion to integers.

The use of apply for converting columns to integers is more appropriate and ensures NaN values are preserved, improving data integrity.
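
One way to convert columns to integers while keeping missing values is pandas' nullable `Int64` dtype. The sketch below assumes that approach; the PR may do it differently, and the `to_nullable_int` helper and sample data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"age": ["31", None, "47"], "rank": [1.0, 2.0, None]})


def to_nullable_int(col):
    # errors="coerce" turns unparseable or missing entries into NaN;
    # the nullable "Int64" dtype then stores them as <NA> instead of failing.
    return pd.to_numeric(col, errors="coerce").astype("Int64")


ints = df.apply(to_nullable_int)
```

A plain `astype(int)` would raise on the missing entries, which is why the nullable dtype is used here.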


1906-1906: LGTM! Proper handling of list-to-tuple conversions.

The use of apply for converting lists to tuples is more appropriate for column-wise operations.


1918-1918: LGTM! Proper handling of tuple-to-list conversions.

The use of apply for converting tuples to lists is more appropriate for column-wise operations.
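
The list-to-tuple and tuple-to-list conversions can be sketched as a round trip. The `tags` column is made up for the example, and the isinstance guards are an assumption about how non-list values should be treated.

```python
import pandas as pd

df = pd.DataFrame({"tags": [["a", "b"], ["c"]]})

# Lists are unhashable; tuples are hashable, which matters for
# operations that hash cell values (for example drop_duplicates).
as_tuples = df.apply(
    lambda col: col.map(lambda x: tuple(x) if isinstance(x, list) else x)
)

# The reverse conversion restores plain lists.
as_lists = as_tuples.apply(
    lambda col: col.map(lambda x: list(x) if isinstance(x, tuple) else x)
)
```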


1929-1933: LGTM! Improved column-wise conversion to floats.

The use of apply and pd.to_numeric with error coercion ensures non-convertible values are handled gracefully by setting them to na_value.
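
A minimal sketch of the coercion described here, assuming a fallback of `0.0` for `na_value` (the recipe's real default is not shown in this review) and made-up columns:

```python
import pandas as pd

na_value = 0.0  # hypothetical fallback value

df = pd.DataFrame({"price": ["1.5", "n/a", "3"], "qty": ["2", "seven", "4"]})

# pd.to_numeric works on a whole column; errors="coerce" maps text that
# cannot be parsed to NaN, and fillna() substitutes the fallback value.
floats = df.apply(lambda col: pd.to_numeric(col, errors="coerce").fillna(na_value))
```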


1944-1945: LGTM! Simplified retrieval of the n value.

The use of get with a default value simplifies the retrieval of the n value, reducing conditional checks and improving clarity.
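
The pattern in question, sketched with a hypothetical `params` dict and a hypothetical default of 2:

```python
def resolve_n(params, default=2):
    """Return the n-gram size from a params dict, falling back to a default."""
    # dict.get() returns the default when "n" is absent, replacing an
    # explicit `if "n" in params: ... else: ...` branch.
    return params.get("n", default)
```

One caveat: `get` only falls back when the key is missing, so an explicit `{"n": None}` is returned as `None` rather than replaced by the default.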


1947-1950: LGTM! Improved n-gram generation.

The use of nested apply calls for n-gram generation is more appropriate for column-wise operations and improves readability and performance.
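
The nested apply pattern can be sketched as below. The `normalize`, `tokenize`, and `ngrams` functions here are simplified stand-ins written for this example; the project's own helpers may behave differently.

```python
import pandas as pd


# Hypothetical stand-ins for the project's helpers.
def normalize(text):
    return str(text).lower().strip()


def tokenize(text):
    return text.split()


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


df = pd.DataFrame({"title": ["The Quick Fox", "Lazy dog"]})
n = 2

# The outer apply walks the columns; the inner apply walks each column's values.
grams = df.apply(lambda col: col.apply(lambda x: ngrams(tokenize(normalize(x)), n)))
```

The helpers still run once per cell, so this form mainly avoids the deprecated applymap call rather than changing how much work is done.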

Tools
Ruff

1949-1949: ngrams may be undefined, or defined from star imports

(F405)


1949-1949: tokenize may be undefined, or defined from star imports

(F405)


1949-1949: normalize may be undefined, or defined from star imports

(F405)


1949-1949: Verify definitions or imports of ngrams, tokenize, and normalize.

The static analysis tool flagged these names as potentially undefined or imported from star imports. Ensure they are correctly defined or imported.
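
A common way to address F405 is to import names explicitly. In the sketch below the standard library stands in for the project's modules, and the commented star import is only an assumption about what the flagged files contain.

```python
# Pattern that triggers Ruff F405 (shown as a comment, module name assumed):
#   from tools import *
#
# Explicit imports make the origin of each name visible to linters and readers.
from os.path import basename, splitext

stem = splitext(basename("code/recipes.py"))[0]
```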


2246-2246: LGTM! Efficient handling of missing values.

The use of fillna for handling missing values is more efficient for DataFrame-wide operations and improves performance.
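
A short sketch of DataFrame-wide filling, assuming an empty string as the fill value (the value used at line 2246 is not shown in this review):

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", None], "city": [None, "lyon"]})

# One vectorised call covers every column, instead of visiting
# each cell with a lambda.
filled = df.fillna("")
```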


2435-2435: LGTM! Proper handling of regex replacements.

The use of apply for regex replacements is more appropriate for column-wise operations.
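
The column-wise regex replacement might look like the sketch below. `replace_regex_sketch` is a hypothetical stand-in for the project's replace_regex, and the rules and data are made up.

```python
import re

import pandas as pd


def replace_regex_sketch(value, rules):
    """Hypothetical stand-in: apply (pattern, replacement) pairs in order."""
    text = str(value)
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text


rules = [(r"\s+", " "), (r"^st\b", "saint")]
df = pd.DataFrame({"city": ["st  malo", "le   havre"]})

# apply() passes each column to the lambda; map() runs the helper per value.
cleaned = df.apply(lambda col: col.map(lambda x: replace_regex_sketch(x, rules)))
```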

Tools
Ruff

2435-2435: replace_regex may be undefined, or defined from star imports

(F405)


2435-2435: Verify definition or import of replace_regex.

The static analysis tool flagged this name as potentially undefined or imported from star imports. Ensure it is correctly defined or imported.

