Bug fix: renaming evaluations with custom metrics #5724

Open · brimoor wants to merge 1 commit into develop

Conversation

brimoor (Contributor) commented on Apr 13, 2025

Change log

  • Fixes a couple of small bugs that caused spurious warnings (not exceptions, fortunately) when calling rename_evaluation() on an evaluation run that has custom metrics added to it

Tested by

  • Added unit tests to verify proper functionality of custom evaluation metrics in general

Summary by CodeRabbit

  • New Features

    • Enhanced the evaluation framework to allow users to control the inclusion of custom metric fields for tasks such as classification, detection, regression, and segmentation.
    • Introduced a new plugin interface for custom evaluation metrics along with a dedicated configuration file.
  • Tests

    • Expanded the testing suite with comprehensive tests to ensure proper integration of custom metrics across various evaluation types.

brimoor added the bug (Bug fixes) label on Apr 13, 2025
coderabbitai bot (Contributor) commented on Apr 13, 2025

Walkthrough

This pull request modifies several evaluation modules by updating how custom metric fields are processed. The key changes include updating parameter passing in the get_custom_metric_fields method and introducing a new get_fields method with an extra include_custom_metrics parameter across evaluation classes. Additionally, the PR adds a new decorator in the tests, a custom evaluation metric implementation via a new plugin, a supporting YAML configuration file, and new test cases for regression, classification, detection, and segmentation custom metric evaluations.
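
For orientation, the pattern the walkthrough describes can be sketched as follows; the class name, the standard field names, and the method bodies are illustrative stand-ins, not the actual FiftyOne source:

```python
# Minimal sketch of the get_fields/rename pattern described above; names and
# bodies are illustrative, not the actual implementation
class ExampleEvaluation:
    def get_fields(self, samples, eval_key, include_custom_metrics=True):
        fields = [eval_key, "%s_id" % eval_key]  # hypothetical standard fields

        if include_custom_metrics:
            fields.extend(self.get_custom_metric_fields(samples, eval_key))

        return fields

    def rename(self, samples, eval_key, new_eval_key):
        # Excluding custom metric fields here is what avoids the spurious
        # warnings this PR fixes
        fields = self.get_fields(
            samples, eval_key, include_custom_metrics=False
        )
        for field in fields:
            new_field = field.replace(eval_key, new_eval_key, 1)
            samples.rename_sample_field(field, new_field)  # assumes a Dataset

    def get_custom_metric_fields(self, samples, eval_key):
        return []  # placeholder; see fiftyone/utils/eval/base.py below
```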

Changes

| File(s) | Change summary |
| --- | --- |
| fiftyone/utils/eval/base.py | Updated get_custom_metric_fields to pass self.config and eval_key instead of self.config.eval_key; added a new get_fields method. |
| fiftyone/utils/eval/classification.py, fiftyone/utils/eval/detection.py, fiftyone/utils/eval/regression.py, fiftyone/utils/eval/segmentation.py | Updated get_fields signatures to include include_custom_metrics=True; modified rename methods to call get_fields with include_custom_metrics=False. |
| tests/unittests/decorators.py | Added a new decorator use_local_plugins to temporarily set plugins_dir, and reformatted the docstring for drop_collection. |
| tests/unittests/evaluation_plugins/__init__.py | Introduced a new custom evaluation metric class with config, resolve_input, compute, and get_fields methods, and added a register function. |
| tests/unittests/evaluation_plugins/fiftyone.yml | Created a new YAML configuration file for the evaluation tests, defining package metadata and panel entries. |
| tests/unittests/evaluation_tests.py | Added test methods for custom metric evaluations in regression, classification, detection, and segmentation, applying the use_local_plugins decorator. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Test/Caller
    participant Eval as EvaluationMethod
    Caller->>Eval: Call get_fields(samples, eval_key, include_custom_metrics)
    alt include_custom_metrics is True
        Eval->>Eval: Append custom metric fields
    else include_custom_metrics is False
        Eval->>Eval: Skip custom metric fields
    end
    Eval-->>Caller: Return fields list

Poem

I'm a bunny with code so bright,
Hopping through modules with pure delight.
Metrics and fields now dance in tune,
Plugins and tests join a lively festoon.
In a world of code, I leap and sing,
Celebrating changes with a joyful spring!
🐰💻

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50a8c97 and b4f1ce4.

📒 Files selected for processing (9)
  • fiftyone/utils/eval/base.py (2 hunks)
  • fiftyone/utils/eval/classification.py (1 hunks)
  • fiftyone/utils/eval/detection.py (2 hunks)
  • fiftyone/utils/eval/regression.py (1 hunks)
  • fiftyone/utils/eval/segmentation.py (2 hunks)
  • tests/unittests/decorators.py (3 hunks)
  • tests/unittests/evaluation_plugins/__init__.py (1 hunks)
  • tests/unittests/evaluation_plugins/fiftyone.yml (1 hunks)
  • tests/unittests/evaluation_tests.py (9 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
fiftyone/utils/eval/detection.py (4)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/classification.py (1)
  • get_fields (198-209)
fiftyone/utils/eval/regression.py (1)
  • get_fields (194-204)
fiftyone/utils/eval/segmentation.py (1)
  • get_fields (234-262)
fiftyone/utils/eval/regression.py (4)
fiftyone/utils/eval/detection.py (1)
  • get_fields (431-465)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/classification.py (1)
  • get_fields (198-209)
fiftyone/utils/eval/segmentation.py (1)
  • get_fields (234-262)
tests/unittests/evaluation_plugins/__init__.py (3)
fiftyone/operators/evaluation_metric.py (1)
  • EvaluationMetric (45-135)
fiftyone/operators/types.py (2)
  • Object (30-691)
  • Property (694-750)
fiftyone/core/dataset.py (1)
  • add_sample_field (1451-1513)
fiftyone/utils/eval/classification.py (4)
fiftyone/utils/eval/detection.py (1)
  • get_fields (431-465)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/regression.py (1)
  • get_fields (194-204)
fiftyone/utils/eval/segmentation.py (1)
  • get_fields (234-262)
fiftyone/utils/eval/segmentation.py (4)
fiftyone/utils/eval/detection.py (1)
  • get_fields (431-465)
fiftyone/utils/eval/base.py (2)
  • get_fields (141-142)
  • get_custom_metric_fields (94-111)
fiftyone/utils/eval/classification.py (1)
  • get_fields (198-209)
fiftyone/utils/eval/regression.py (1)
  • get_fields (194-204)
tests/unittests/evaluation_tests.py (4)
tests/unittests/decorators.py (2)
  • drop_datasets (18-28)
  • use_local_plugins (84-108)
fiftyone/utils/eval/regression.py (2)
  • evaluate_regressions (36-118)
  • metrics (438-496)
fiftyone/utils/eval/classification.py (1)
  • evaluate_classifications (33-128)
fiftyone/utils/eval/detection.py (1)
  • evaluate_detections (33-236)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: test / test-app
  • GitHub Check: build
🔇 Additional comments (28)
tests/unittests/evaluation_plugins/fiftyone.yml (1)

1-7: Plugin configuration looks good.

This configuration file properly defines a test plugin package for custom evaluation metrics, with all the required fields:

  • Name with appropriate namespace
  • Description
  • Version
  • FiftyOne compatibility wildcard
  • Panel configuration
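
For reference, a plugin config of the shape described above might look like the hedged sketch below; all names, versions, and entries are placeholders, not the contents of the actual tests/unittests/evaluation_plugins/fiftyone.yml:

```yaml
# Hypothetical sketch only; values are placeholders, not the actual file
name: "@voxel51/evaluation-tests"
description: Custom evaluation metrics used by the unit tests
version: 1.0.0
fiftyone:
  version: "*"
panels:
  - example_metric
```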
tests/unittests/decorators.py (3)

9-12: Good import additions for the new decorator.

The added imports are appropriate for the new functionality being introduced.


49-52: Docstring formatting improvement looks good.

The reformatted docstring maintains the same information but has improved readability with consistent line breaks.


84-108: Well-implemented plugin directory decorator.

This decorator facilitates testing with local plugins by temporarily setting the plugins directory. The implementation follows good practices:

  • Clear docstring explaining the purpose and default behavior
  • Supports both direct and parameterized decorator usage
  • Uses context manager to ensure proper cleanup
  • Defaults to the unittest directory when no directory is specified
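
A decorator with this behavior could be sketched roughly as below; the config-patching mechanism and the bare-usage handling are assumptions about how such a helper might work, not the actual code in tests/unittests/decorators.py:

```python
# Hedged sketch of a use_local_plugins-style decorator; the config patching
# shown here is an assumption, not necessarily how the real helper works
import os
from contextlib import contextmanager
from functools import wraps

import fiftyone as fo


@contextmanager
def _temporary_plugins_dir(plugins_dir):
    """Temporarily overrides fo.config.plugins_dir, restoring it on exit."""
    original = fo.config.plugins_dir
    fo.config.plugins_dir = plugins_dir
    try:
        yield
    finally:
        fo.config.plugins_dir = original


def use_local_plugins(plugins_dir=None):
    """Runs the decorated test with plugins_dir as the plugins directory,
    defaulting to the directory containing this file."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            _dir = plugins_dir or os.path.dirname(os.path.abspath(__file__))
            with _temporary_plugins_dir(_dir):
                return func(*args, **kwargs)

        return wrapper

    # Support bare @use_local_plugins usage as well as @use_local_plugins(...)
    if callable(plugins_dir):
        func, plugins_dir = plugins_dir, None
        return decorator(func)

    return decorator
```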
fiftyone/utils/eval/regression.py (2)

194-204: Good fix for custom metrics field handling.

The updated get_fields method now includes a parameter to control whether custom metric fields should be included in the returned fields. This will help prevent issues when renaming evaluations with custom metrics.


209-214: Good fix for renaming custom evaluation metrics.

The rename method now explicitly excludes custom metrics when retrieving fields for renaming. This prevents unnecessary warnings when renaming evaluation runs that include custom metrics.

fiftyone/utils/eval/detection.py (2)

431-465: Good implementation of conditional custom metrics inclusion.

The get_fields method has been updated to include the include_custom_metrics parameter with a default value of True, maintaining backward compatibility while adding the ability to exclude custom metrics when needed. This change is consistent with the similar modifications in other evaluation classes.


470-475: Proper fix for renaming evaluation runs with custom metrics.

The rename method now explicitly excludes custom metrics when retrieving fields for renaming, ensuring that only the standard evaluation fields are renamed. This should fix the warnings when renaming evaluation runs with custom metrics.

fiftyone/utils/eval/classification.py (3)

198-198: Method signature updated to support custom metrics filtering.

The get_fields method now includes an include_custom_metrics parameter that defaults to True, providing more control over whether custom metric fields should be included in the returned fields list.


206-207: Added conditional inclusion of custom metric fields.

This change ensures that custom metric fields are only included in the returned field list when include_custom_metrics is True, improving flexibility when retrieving fields for different operations.


214-219: Updated rename method to exclude custom metrics during renaming.

The rename method now explicitly passes include_custom_metrics=False when retrieving fields, which prevents unnecessary warnings when renaming evaluation runs that include custom metrics. This addresses the core bug described in the PR objectives.

fiftyone/utils/eval/base.py (2)

102-102: Fixed parameter passing in get_custom_metric_fields.

Updated the call to operator.get_fields() to pass self.config and eval_key as separate arguments instead of using self.config.eval_key, aligning with changes to how custom metrics are handled throughout the codebase.


141-142: Added base implementation of get_fields method.

Added a new method get_fields to the BaseEvaluationMethod class that serves as a placeholder for subclasses to override. This method establishes the contract that subclasses should follow when implementing field retrieval logic.
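
Paraphrased from the change summary, the corrected call pattern looks roughly like the sketch below; the class name, the operator-lookup helper, and the surrounding code are assumptions, not the actual base.py:

```python
# Rough sketch of the corrected pattern; the operator lookup is hypothetical
class ExampleEvaluationMethod:
    def get_custom_metric_fields(self, samples, eval_key):
        fields = []
        for operator in self._custom_metric_operators():  # hypothetical helper
            # Per the change summary, this call previously received
            # self.config.eval_key; it now receives the config and the
            # eval key as separate arguments
            fields.extend(operator.get_fields(samples, self.config, eval_key))

        return fields

    def get_fields(self, samples, eval_key, include_custom_metrics=True):
        # Base placeholder for subclasses to override; concrete classes append
        # custom metric fields only when include_custom_metrics is True
        raise NotImplementedError("subclass must implement get_fields()")
```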

fiftyone/utils/eval/segmentation.py (3)

234-234: Method signature updated to support custom metrics filtering.

The get_fields method now includes an include_custom_metrics parameter that defaults to True, providing more control over whether custom metric fields should be included in the returned fields list. This is consistent with similar changes in other evaluation classes.


259-260: Added conditional inclusion of custom metric fields.

This change ensures that custom metric fields are only included in the returned field list when include_custom_metrics is True, improving flexibility when retrieving fields for different operations.


267-272: Updated rename method to exclude custom metrics during renaming.

The rename method now explicitly passes include_custom_metrics=False when retrieving fields, preventing unnecessary warnings when renaming evaluation runs that include custom metrics. This addresses the core bug described in the PR objectives and is consistent with similar changes in other evaluation classes.

tests/unittests/evaluation_plugins/__init__.py (7)

1-7: LGTM! Added appropriate file header and documentation.

The file header includes necessary copyright information and properly describes the purpose of the file.


8-11: LGTM! Imported necessary modules.

The imports include the essential modules required for defining a custom evaluation metric.


13-22: LGTM! Well-defined custom evaluation metric class with proper configuration.

The CustomEvaluationMetric class extends the base EvaluationMetric class and properly implements the config property to return essential metadata for the metric. The unlisted=True parameter ensures this test-only metric doesn't appear in the UI.


23-33: LGTM! Implemented input resolution with appropriate defaults.

The resolve_input method defines an input parameter with a descriptive label, clear description, and sensible default value. This correctly implements the interface required for custom metrics.


34-42: LGTM! Proper implementation of compute method.

The compute method successfully:

  1. Adds a new field to the dataset with appropriate naming
  2. Sets the field value based on the input parameter
  3. Saves the changes to persist them
  4. Returns the value to be stored as an aggregate metric

This approach allows for testing both field-level and aggregate metric storage.


43-45: LGTM! Correct implementation of get_fields method.

The get_fields method properly returns a list containing the field name that's created in the compute method. This ensures that the field is correctly identified during renaming and cleanup operations.


47-49: LGTM! Added registration function.

The register function properly registers the CustomEvaluationMetric class with the plugin system, making it available for test cases.
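
Putting those pieces together, a custom evaluation metric plugin of this shape might look like the sketch below; the class, config class, field name, and method signatures are paraphrased from the review summary and are not the actual test plugin:

```python
# Hedged sketch of a custom evaluation metric plugin; names, signatures, and
# the config class are assumptions based on the summary above
import fiftyone.operators as foo
import fiftyone.operators.types as types


class ExampleMetric(foo.EvaluationMetric):
    @property
    def config(self):
        return foo.EvaluationMetricConfig(
            name="example_metric",
            label="Example metric",
            unlisted=True,  # keep this test-only metric out of the App UI
        )

    def resolve_input(self, ctx):
        inputs = types.Object()
        inputs.str(
            "value",
            label="Example value",
            description="A value to store on every sample",
            default="spam",
        )
        return types.Property(inputs)

    def compute(self, samples, results, value="spam"):
        # Hypothetical fixed field name; the actual plugin's naming differs
        field = "example_metric"
        samples.set_values(field, [value] * len(samples))
        return value  # stored as the aggregate metric value

    def get_fields(self, samples, config, eval_key):
        # Declare the field(s) this metric writes so that rename and cleanup
        # operations can account for them
        return ["example_metric"]


def register(p):
    p.register(ExampleMetric)
```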

tests/unittests/evaluation_tests.py (5)

28-28: Added import for use_local_plugins decorator

The decorator is now imported to support the new test methods that test custom evaluation metrics from plugins.


190-246: Well-structured test for custom regression metrics

This test comprehensively validates the behavior of custom metrics in regression evaluations by:

  1. Verifying the custom metric is properly stored in results
  2. Testing metric overwrites
  3. Ensuring metric fields are correctly renamed when evaluations are renamed
  4. Checking that metric fields are properly deleted

The test properly uses the use_local_plugins decorator to temporarily set the plugins directory for accessing the custom evaluation metric plugin.
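
The shape of these lifecycle tests, heavily paraphrased, is sketched below; the dataset setup, metric URI, keyword arguments, and assertion keys are hypothetical, and the custom_metrics argument shape is an assumption rather than the actual test code:

```python
# Paraphrased sketch of the custom-metric lifecycle the tests exercise;
# the URI, kwargs, and assertion keys are hypothetical
import unittest

import fiftyone as fo

from decorators import drop_datasets, use_local_plugins


class CustomMetricTestsSketch(unittest.TestCase):
    @drop_datasets
    @use_local_plugins
    def test_custom_regression_metric(self):
        dataset = fo.Dataset()
        dataset.add_sample(
            fo.Sample(
                filepath="image.png",
                ground_truth=fo.Regression(value=1.0),
                predictions=fo.Regression(value=1.1),
            )
        )

        metric = "@voxel51/evaluation-tests/example_metric"  # hypothetical URI
        results = dataset.evaluate_regressions(
            "predictions",
            gt_field="ground_truth",
            eval_key="eval",
            custom_metrics={metric: {"value": "spam"}},
        )

        # 1. The aggregate value is stored with the run's metrics
        #    (key name is illustrative)
        self.assertIn("example_metric", results.metrics())

        # 2. Renaming the run also renames the metric's fields, without the
        #    spurious warnings this PR fixes
        dataset.rename_evaluation("eval", "still_eval")
        self.assertIn("still_eval", dataset.list_evaluations())

        # 3. Deleting the run cleans up the metric's fields as well
        dataset.delete_evaluation("still_eval")
```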


742-796: Comprehensive test for custom classification metrics

Similar to the regression test, this test validates all aspects of custom metrics handling in classification evaluations, including proper storing, overwriting, renaming, and deletion of custom metric fields. The test structure ensures complete coverage of the functionality being tested.


1981-2036: Well-implemented test for custom detection metrics

This test follows the established pattern for validating custom metrics behavior in detection evaluations. It thoroughly tests the entire lifecycle of custom metrics including creation, modification, renaming, and deletion, ensuring that the fixes for custom evaluation metrics work correctly.


3192-3247: Thorough test for custom segmentation metrics

This test completes the suite of custom metrics tests by validating the same functionality for segmentation evaluations. The consistent test structure across all evaluation types ensures comprehensive coverage of the bug fix described in the PR objectives.

brimoor changed the title from "Bug fix: custom evaluation metrics renaming" to "Bug fix: renaming evaluations with custom metrics" on Apr 13, 2025