Bug fix: renaming evaluations with custom metrics #5724

Open · brimoor wants to merge 1 commit into develop

Conversation

brimoor (Contributor) commented on Apr 13, 2025

Change log

  • Fixes a couple of small bugs that caused spurious warnings (not exceptions, fortunately) when calling rename_evaluation() on an evaluation run that has custom metrics added to it

Tested by

  • Added unit tests to verify proper functionality of custom evaluation metrics in general

Summary by CodeRabbit

  • New Features

    • Enhanced the evaluation framework to allow users to control the inclusion of custom metric fields for tasks such as classification, detection, regression, and segmentation.
    • Introduced a new plugin interface for custom evaluation metrics along with a dedicated configuration file.
  • Tests

    • Expanded the testing suite with comprehensive tests to ensure proper integration of custom metrics across various evaluation types.

brimoor added the bug (Bug fixes) label on Apr 13, 2025
coderabbitai bot (Contributor) commented on Apr 13, 2025

Walkthrough

This pull request modifies several evaluation modules by updating how custom metric fields are processed. The key changes include updating parameter passing in the get_custom_metric_fields method and introducing a new get_fields method with an extra include_custom_metrics parameter across evaluation classes. Additionally, the PR adds a new decorator in the tests, a custom evaluation metric implementation via a new plugin, a supporting YAML configuration file, and new test cases for regression, classification, detection, and segmentation custom metric evaluations.
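
For orientation, the pattern the walkthrough describes can be sketched as follows; the class name, the standard field names, and the method bodies are illustrative stand-ins, not the actual FiftyOne source:

```python
# Minimal sketch of the get_fields/rename pattern described above; names and
# bodies are illustrative, not the actual implementation
class ExampleEvaluation:
    def get_fields(self, samples, eval_key, include_custom_metrics=True):
        fields = [eval_key, "%s_id" % eval_key]  # hypothetical standard fields

        if include_custom_metrics:
            fields.extend(self.get_custom_metric_fields(samples, eval_key))

        return fields

    def rename(self, samples, eval_key, new_eval_key):
        # Excluding custom metric fields here is what avoids the spurious
        # warnings this PR fixes
        fields = self.get_fields(
            samples, eval_key, include_custom_metrics=False
        )
        for field in fields:
            new_field = field.replace(eval_key, new_eval_key, 1)
            samples.rename_sample_field(field, new_field)  # assumes a Dataset

    def get_custom_metric_fields(self, samples, eval_key):
        return []  # placeholder; see fiftyone/utils/eval/base.py below
```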

Changes

| File(s) | Change summary |
| --- | --- |
| fiftyone/utils/eval/base.py | Updated get_custom_metric_fields to pass self.config and eval_key instead of self.config.eval_key; added a new get_fields method. |
| fiftyone/utils/eval/classification.py, fiftyone/utils/eval/detection.py, fiftyone/utils/eval/regression.py, fiftyone/utils/eval/segmentation.py | Updated get_fields signatures to include include_custom_metrics=True; modified rename methods to call get_fields with include_custom_metrics=False. |
| tests/unittests/decorators.py | Added a new decorator use_local_plugins to temporarily set plugins_dir, and reformatted the docstring for drop_collection. |
| tests/unittests/evaluation_plugins/__init__.py | Introduced a new custom evaluation metric class with config, resolve_input, compute, and get_fields methods, and added a register function. |
| tests/unittests/evaluation_plugins/fiftyone.yml | Created a new YAML configuration file for the evaluation tests, defining package metadata and panel entries. |
| tests/unittests/evaluation_tests.py | Added test methods for custom metric evaluations in regression, classification, detection, and segmentation, applying the use_local_plugins decorator. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Test/Caller
    participant Eval as EvaluationMethod
    Caller->>Eval: Call get_fields(samples, eval_key, include_custom_metrics)
    alt include_custom_metrics is True
        Eval->>Eval: Append custom metric fields
    else include_custom_metrics is False
        Eval->>Eval: Skip custom metric fields
    end
    Eval-->>Caller: Return fields list

Poem

I'm a bunny with code so bright,
Hopping through modules with pure delight.
Metrics and fields now dance in tune,
Plugins and tests join a lively festoon.
In a world of code, I leap and sing,
Celebrating changes with a joyful spring!
🐰💻

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50a8c97 and b4f1ce4.

📒 Files selected for processing (9)
  • fiftyone/utils/eval/base.py (2 hunks)
  • fiftyone/utils/eval/classification.py (1 hunks)
  • fiftyone/utils/eval/detection.py (2 hunks)
  • fiftyone/utils/eval/regression.py (1 hunks)
  • fiftyone/utils/eval/segmentation.py (2 hunks)
  • tests/unittests/decorators.py (3 hunks)
  • tests/unittests/evaluation_plugins/__init__.py (1 hunks)
  • tests/unittests/evaluation_plugins/fiftyone.yml (1 hunks)
  • tests/unittests/evaluation_tests.py (9 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
fiftyone/utils/eval/detection.py (4)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/classification.py (1)
  • get_fields (198-209)
fiftyone/utils/eval/regression.py (1)
  • get_fields (194-204)
fiftyone/utils/eval/segmentation.py (1)
  • get_fields (234-262)
fiftyone/utils/eval/regression.py (4)
fiftyone/utils/eval/detection.py (1)
  • get_fields (431-465)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/classification.py (1)
  • get_fields (198-209)
fiftyone/utils/eval/segmentation.py (1)
  • get_fields (234-262)
tests/unittests/evaluation_plugins/__init__.py (3)
fiftyone/operators/evaluation_metric.py (1)
  • EvaluationMetric (45-135)
fiftyone/operators/types.py (2)
  • Object (30-691)
  • Property (694-750)
fiftyone/core/dataset.py (1)
  • add_sample_field (1451-1513)
fiftyone/utils/eval/classification.py (4)
fiftyone/utils/eval/detection.py (1)
  • get_fields (431-465)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/regression.py (1)
  • get_fields (194-204)
fiftyone/utils/eval/segmentation.py (1)
  • get_fields (234-262)
fiftyone/utils/eval/segmentation.py (4)
fiftyone/utils/eval/detection.py (1)
  • get_fields (431-465)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/classification.py (1)
  • get_fields (198-209)
fiftyone/utils/eval/regression.py (1)
  • get_fields (194-204)
tests/unittests/evaluation_tests.py (4)
tests/unittests/decorators.py (2)
  • drop_datasets (18-28)
  • use_local_plugins (84-108)
fiftyone/utils/eval/regression.py (2)
  • evaluate_regressions (36-118)
  • metrics (438-496)
fiftyone/utils/eval/classification.py (1)
  • evaluate_classifications (33-128)
fiftyone/utils/eval/detection.py (1)
  • evaluate_detections (33-236)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: test / test-app
  • GitHub Check: build
🔇 Additional comments (28)
tests/unittests/evaluation_plugins/fiftyone.yml (1)

1-7: Plugin configuration looks good.

This configuration file properly defines a test plugin package for custom evaluation metrics, with all the required fields:

  • Name with appropriate namespace
  • Description
  • Version
  • FiftyOne compatibility wildcard
  • Panel configuration
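
For reference, a plugin config of the shape described above might look like the hedged sketch below; all names, versions, and entries are placeholders, not the contents of the actual tests/unittests/evaluation_plugins/fiftyone.yml:

```yaml
# Hypothetical sketch only; values are placeholders, not the actual file
name: "@voxel51/evaluation-tests"
description: Custom evaluation metrics used by the unit tests
version: 1.0.0
fiftyone:
  version: "*"
panels:
  - example_metric
```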
tests/unittests/decorators.py (3)

9-12: Good import additions for the new decorator.

The added imports are appropriate for the new functionality being introduced.


49-52: Docstring formatting improvement looks good.

The reformatted docstring maintains the same information but has improved readability with consistent line breaks.


84-108: Well-implemented plugin directory decorator.

This decorator facilitates testing with local plugins by temporarily setting the plugins directory. The implementation follows good practices:

  • Clear docstring explaining the purpose and default behavior
  • Supports both direct and parameterized decorator usage
  • Uses context manager to ensure proper cleanup
  • Defaults to the unittest directory when no directory is specified
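
A decorator with this behavior could be sketched roughly as below; the config-patching mechanism and the bare-usage handling are assumptions about how such a helper might work, not the actual code in tests/unittests/decorators.py:

```python
# Hedged sketch of a use_local_plugins-style decorator; the config patching
# shown here is an assumption, not necessarily how the real helper works
import os
from contextlib import contextmanager
from functools import wraps

import fiftyone as fo


@contextmanager
def _temporary_plugins_dir(plugins_dir):
    """Temporarily overrides fo.config.plugins_dir, restoring it on exit."""
    original = fo.config.plugins_dir
    fo.config.plugins_dir = plugins_dir
    try:
        yield
    finally:
        fo.config.plugins_dir = original


def use_local_plugins(plugins_dir=None):
    """Runs the decorated test with plugins_dir as the plugins directory,
    defaulting to the directory containing this file."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            _dir = plugins_dir or os.path.dirname(os.path.abspath(__file__))
            with _temporary_plugins_dir(_dir):
                return func(*args, **kwargs)

        return wrapper

    # Support bare @use_local_plugins usage as well as @use_local_plugins(...)
    if callable(plugins_dir):
        func, plugins_dir = plugins_dir, None
        return decorator(func)

    return decorator
```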
fiftyone/utils/eval/regression.py (2)

194-204: Good fix for custom metrics field handling.

The updated get_fields method now includes a parameter to control whether custom metric fields should be included in the returned fields. This will help prevent issues when renaming evaluations with custom metrics.


209-214: Good fix for renaming custom evaluation metrics.

The rename method now explicitly excludes custom metrics when retrieving fields for renaming. This prevents unnecessary warnings when renaming evaluation runs that include custom metrics.

fiftyone/utils/eval/detection.py (2)

431-465: Good implementation of conditional custom metrics inclusion.

The get_fields method has been updated to include the include_custom_metrics parameter with a default value of True, maintaining backward compatibility while adding the ability to exclude custom metrics when needed. This change is consistent with the similar modifications in other evaluation classes.


470-475: Proper fix for renaming evaluation runs with custom metrics.

The rename method now explicitly excludes custom metrics when retrieving fields for renaming, ensuring that only the standard evaluation fields are renamed. This should fix the warnings when renaming evaluation runs with custom metrics.

fiftyone/utils/eval/classification.py (3)

198-198: Method signature updated to support custom metrics filtering.

The get_fields method now includes an include_custom_metrics parameter that defaults to True, providing more control over whether custom metric fields should be included in the returned fields list.


206-207: Added conditional inclusion of custom metric fields.

This change ensures that custom metric fields are only included in the returned field list when include_custom_metrics is True, improving flexibility when retrieving fields for different operations.


214-219: Updated rename method to exclude custom metrics during renaming.

The rename method now explicitly passes include_custom_metrics=False when retrieving fields, which prevents unnecessary warnings when renaming evaluation runs that include custom metrics. This addresses the core bug described in the PR objectives.

fiftyone/utils/eval/base.py (2)

102-102: Fixed parameter passing in get_custom_metric_fields.

Updated the call to operator.get_fields() to pass self.config and eval_key as separate arguments instead of using self.config.eval_key, aligning with changes to how custom metrics are handled throughout the codebase.


141-142: Added base implementation of get_fields method.

Added a new method get_fields to the BaseEvaluationMethod class that serves as a placeholder for subclasses to override. This method establishes the contract that subclasses should follow when implementing field retrieval logic.
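
Paraphrased from the change summary, the corrected call pattern looks roughly like the sketch below; the class name, the operator-lookup helper, and the surrounding code are assumptions, not the actual base.py:

```python
# Rough sketch of the corrected pattern; the operator lookup is hypothetical
class ExampleEvaluationMethod:
    def get_custom_metric_fields(self, samples, eval_key):
        fields = []
        for operator in self._custom_metric_operators():  # hypothetical helper
            # Per the change summary, this call previously received
            # self.config.eval_key; it now receives the config and the
            # eval key as separate arguments
            fields.extend(operator.get_fields(samples, self.config, eval_key))

        return fields

    def get_fields(self, samples, eval_key, include_custom_metrics=True):
        # Base placeholder for subclasses to override; concrete classes append
        # custom metric fields only when include_custom_metrics is True
        raise NotImplementedError("subclass must implement get_fields()")
```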

fiftyone/utils/eval/segmentation.py (3)

234-234: Method signature updated to support custom metrics filtering.

The get_fields method now includes an include_custom_metrics parameter that defaults to True, providing more control over whether custom metric fields should be included in the returned fields list. This is consistent with similar changes in other evaluation classes.


259-260: Added conditional inclusion of custom metric fields.

This change ensures that custom metric fields are only included in the returned field list when include_custom_metrics is True, improving flexibility when retrieving fields for different operations.


267-272: Updated rename method to exclude custom metrics during renaming.

The rename method now explicitly passes include_custom_metrics=False when retrieving fields, preventing unnecessary warnings when renaming evaluation runs that include custom metrics. This addresses the core bug described in the PR objectives and is consistent with similar changes in other evaluation classes.

tests/unittests/evaluation_plugins/__init__.py (7)

1-7: LGTM! Added appropriate file header and documentation.

The file header includes necessary copyright information and properly describes the purpose of the file.


8-11: LGTM! Imported necessary modules.

The imports include the essential modules required for defining a custom evaluation metric.


13-22: LGTM! Well-defined custom evaluation metric class with proper configuration.

The CustomEvaluationMetric class extends the base EvaluationMetric class and properly implements the config property to return essential metadata for the metric. The unlisted=True parameter ensures this test-only metric doesn't appear in the UI.


23-33: LGTM! Implemented input resolution with appropriate defaults.

The resolve_input method defines an input parameter with a descriptive label, clear description, and sensible default value. This correctly implements the interface required for custom metrics.


34-42: LGTM! Proper implementation of compute method.

The compute method successfully:

  1. Adds a new field to the dataset with appropriate naming
  2. Sets the field value based on the input parameter
  3. Saves the changes to persist them
  4. Returns the value to be stored as an aggregate metric

This approach allows for testing both field-level and aggregate metric storage.


43-45: LGTM! Correct implementation of get_fields method.

The get_fields method properly returns a list containing the field name that's created in the compute method. This ensures that the field is correctly identified during renaming and cleanup operations.


47-49: LGTM! Added registration function.

The register function properly registers the CustomEvaluationMetric class with the plugin system, making it available for test cases.
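
Putting those pieces together, a custom evaluation metric plugin of this shape might look like the sketch below; the class, config class, field name, and method signatures are paraphrased from the review summary and are not the actual test plugin:

```python
# Hedged sketch of a custom evaluation metric plugin; names, signatures, and
# the config class are assumptions based on the summary above
import fiftyone.operators as foo
import fiftyone.operators.types as types


class ExampleMetric(foo.EvaluationMetric):
    @property
    def config(self):
        return foo.EvaluationMetricConfig(
            name="example_metric",
            label="Example metric",
            unlisted=True,  # keep this test-only metric out of the App UI
        )

    def resolve_input(self, ctx):
        inputs = types.Object()
        inputs.str(
            "value",
            label="Example value",
            description="A value to store on every sample",
            default="spam",
        )
        return types.Property(inputs)

    def compute(self, samples, results, value="spam"):
        # Hypothetical fixed field name; the actual plugin's naming differs
        field = "example_metric"
        samples.set_values(field, [value] * len(samples))
        return value  # stored as the aggregate metric value

    def get_fields(self, samples, config, eval_key):
        # Declare the field(s) this metric writes so that rename and cleanup
        # operations can account for them
        return ["example_metric"]


def register(p):
    p.register(ExampleMetric)
```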

tests/unittests/evaluation_tests.py (5)

28-28: Added import for use_local_plugins decorator

The decorator is now imported to support the new test methods that test custom evaluation metrics from plugins.


190-246: Well-structured test for custom regression metrics

This test comprehensively validates the behavior of custom metrics in regression evaluations by:

  1. Verifying the custom metric is properly stored in results
  2. Testing metric overwrites
  3. Ensuring metric fields are correctly renamed when evaluations are renamed
  4. Checking that metric fields are properly deleted

The test properly uses the use_local_plugins decorator to temporarily set the plugins directory for accessing the custom evaluation metric plugin.
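
The shape of these lifecycle tests, heavily paraphrased, is sketched below; the dataset setup, metric URI, keyword arguments, and assertion keys are hypothetical, and the custom_metrics argument shape is an assumption rather than the actual test code:

```python
# Paraphrased sketch of the custom-metric lifecycle the tests exercise;
# the URI, kwargs, and assertion keys are hypothetical
import unittest

import fiftyone as fo

from decorators import drop_datasets, use_local_plugins


class CustomMetricTestsSketch(unittest.TestCase):
    @drop_datasets
    @use_local_plugins
    def test_custom_regression_metric(self):
        dataset = fo.Dataset()
        dataset.add_sample(
            fo.Sample(
                filepath="image.png",
                ground_truth=fo.Regression(value=1.0),
                predictions=fo.Regression(value=1.1),
            )
        )

        metric = "@voxel51/evaluation-tests/example_metric"  # hypothetical URI
        results = dataset.evaluate_regressions(
            "predictions",
            gt_field="ground_truth",
            eval_key="eval",
            custom_metrics={metric: {"value": "spam"}},
        )

        # 1. The aggregate value is stored with the run's metrics
        #    (key name is illustrative)
        self.assertIn("example_metric", results.metrics())

        # 2. Renaming the run also renames the metric's fields, without the
        #    spurious warnings this PR fixes
        dataset.rename_evaluation("eval", "still_eval")
        self.assertIn("still_eval", dataset.list_evaluations())

        # 3. Deleting the run cleans up the metric's fields as well
        dataset.delete_evaluation("still_eval")
```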


742-796: Comprehensive test for custom classification metrics

Similar to the regression test, this test validates all aspects of custom metrics handling in classification evaluations, including proper storing, overwriting, renaming, and deletion of custom metric fields. The test structure ensures complete coverage of the functionality being tested.


1981-2036: Well-implemented test for custom detection metrics

This test follows the established pattern for validating custom metrics behavior in detection evaluations. It thoroughly tests the entire lifecycle of custom metrics including creation, modification, renaming, and deletion, ensuring that the fixes for custom evaluation metrics work correctly.


3192-3247: Thorough test for custom segmentation metrics

This test completes the suite of custom metrics tests by validating the same functionality for segmentation evaluations. The consistent test structure across all evaluation types ensures comprehensive coverage of the bug fix described in the PR objectives.

brimoor changed the title from "Bug fix: custom evaluation metrics renaming" to "Bug fix: renaming evaluations with custom metrics" on Apr 13, 2025