feat: Add options for what to do with missing metadata fields in MetaFieldRanker #7700

Open
wants to merge 14 commits into
base: main
from

robpasternak
Member

Related Issues

        :param missing_meta:
            What to do with documents that are missing the sorting metadata field.
            Possible values are:
            - 'drop' will drop the documents entirely.
            - 'top' will place the documents at the top of the metadata-sorted list
                (regardless of 'ascending' or 'descending').
            - 'bottom' will place the documents at the bottom of metadata-sorted list
                (regardless of 'ascending' or 'descending').

Proposed Changes:

  • The missing_meta param has three options: "bottom", "top", and "drop".
    • Using "bottom" exhibits the same behavior as was implemented prior to this PR, i.e., documents without the sorting metadata field are put on the bottom of the sorted list.
    • Using "top" puts them at the top instead.
    • Using "drop" drops such documents entirely.
  • Validation was added to ensure that the value of missing_meta is one of the three allowed options.
  • Tests were added for the new functionality.
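
The three options and the validation described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function `sort_by_meta`, the constant `ALLOWED_MISSING_META`, and the use of plain dicts in place of Haystack `Document` objects are all assumptions made for the example.

```python
from typing import Any, Dict, List

# Hypothetical constant; mirrors the three values named in the PR description.
ALLOWED_MISSING_META = ("drop", "top", "bottom")


def sort_by_meta(
    documents: List[Dict[str, Any]],
    meta_field: str,
    descending: bool = True,
    missing_meta: str = "bottom",
) -> List[Dict[str, Any]]:
    """Sort dict-based stand-in documents by a meta field (illustrative only)."""
    # Validation: reject anything other than the three allowed options.
    if missing_meta not in ALLOWED_MISSING_META:
        raise ValueError(
            f"missing_meta must be one of {ALLOWED_MISSING_META}, got '{missing_meta}'"
        )

    # Split the documents by whether they carry the sorting field.
    with_field = [d for d in documents if meta_field in d["meta"]]
    without_field = [d for d in documents if meta_field not in d["meta"]]

    ranked = sorted(with_field, key=lambda d: d["meta"][meta_field], reverse=descending)

    if missing_meta == "drop":
        return ranked
    if missing_meta == "top":
        # Placement ignores the sort direction, as the docstring states.
        return without_field + ranked
    return ranked + without_field


docs = [
    {"id": "a", "meta": {"rating": 1}},
    {"id": "b", "meta": {}},
    {"id": "c", "meta": {"rating": 3}},
]
print([d["id"] for d in sort_by_meta(docs, "rating", missing_meta="top")])
```

With the sample input, document "b" has no "rating" key, so only its position (or presence) changes between the three options.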

How did you test it?

Wrote and ran new test functions in the test directory:

  • test_raises_value_error_if_wrong_missing_meta: Tests validation of missing_meta
  • test_missing_meta_bottom: Tests that missing_meta = "bottom" behaves as desired.
  • test_missing_meta_top: Tests that missing_meta = "top" behaves as desired.
  • test_missing_meta_drop: Tests that missing_meta = "drop" behaves as desired.

Notes for the reviewer

None

Checklist

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels May 15, 2024
@coveralls
Collaborator

coveralls commented May 15, 2024

Pull Request Test Coverage Report for Build 9400372854

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 89.751%

Totals Coverage Status
Change from base Build 9387490162: 0.02%
Covered Lines: 6787
Relevant Lines: 7562

💛 - Coveralls

@robpasternak robpasternak marked this pull request as ready for review June 3, 2024 14:57
@robpasternak robpasternak requested review from a team as code owners June 3, 2024 14:57
@robpasternak robpasternak requested review from dfokina and shadeMe and removed request for a team June 3, 2024 14:57
Collaborator

@shadeMe shadeMe left a comment

Thanks for the PR! Added a few comments.

@@ -43,6 +43,7 @@ def __init__(
        top_k: Optional[int] = None,
        ranking_mode: Literal["reciprocal_rank_fusion", "linear_score"] = "reciprocal_rank_fusion",
        sort_order: Literal["ascending", "descending"] = "descending",
        missing_meta: Literal["drop", "top", "bottom"] = "bottom",

Hmm, I'd want to convert the Literal init parameters to follow the enum pattern seen in other parts of the library (cf. HFGenerationAPIType and HuggingFaceAPIGenerator).

Would you be up to fixing that in a follow-up PR? This would also mean that the validation code gets changed/moved around.
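
For reference, an enum-based replacement for the Literal parameter might look roughly like the sketch below. The class name `MissingMetaPolicy` and its `from_str` helper are hypothetical and only illustrate the general pattern; they are not taken from this PR or from the library.

```python
from enum import Enum


class MissingMetaPolicy(Enum):
    """Hypothetical enum; name and members are assumptions for illustration."""

    DROP = "drop"      # drop documents that lack the sorting field
    TOP = "top"        # place them before the metadata-sorted documents
    BOTTOM = "bottom"  # place them after the metadata-sorted documents

    def __str__(self) -> str:
        return self.value

    @staticmethod
    def from_str(string: str) -> "MissingMetaPolicy":
        # Validation lives here, so __init__ no longer needs its own check.
        enum_map = {member.value: member for member in MissingMetaPolicy}
        policy = enum_map.get(string)
        if policy is None:
            raise ValueError(
                f"Unknown missing_meta value '{string}'. "
                f"Supported values are: {list(enum_map)}"
            )
        return policy


print(MissingMetaPolicy.from_str("top"))
```

One consequence of this design is that the per-option descriptions can sit next to the members themselves rather than in the `__init__` docstring.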

Comment on lines +70 to +76
What to do with documents that are missing the sorting metadata field.
Possible values are:
- 'drop' will drop the documents entirely.
- 'top' will place the documents at the top of the metadata-sorted list
(regardless of 'ascending' or 'descending').
- 'bottom' will place the documents at the bottom of metadata-sorted list
(regardless of 'ascending' or 'descending').

Once we introduce the enum, the bulk of this docstring can be moved to the enum's own docstrings.

)
if missing_meta == "bottom":
logger.warning(
"The parameter <meta_field> is currently set to '{meta_field}' but the Documents with IDs {document_ids} don't have this meta key.\n"

Looks like this string can be extracted into a variable and reused in all three warnings?
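
A rough sketch of that extraction, assuming the three warnings share their first sentence and differ only in a follow-up sentence. The helper `build_missing_meta_warning`, the `FOLLOW_UP` table, and the follow-up wording are hypothetical; the sketch also uses plain f-strings rather than the `{meta_field}` template arguments seen in the diff.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)

# Hypothetical follow-up sentences, one per option; the wording is an assumption.
FOLLOW_UP: Dict[str, str] = {
    "bottom": "These Documents will be placed at the end of the sorting order.",
    "top": "These Documents will be placed at the top of the sorting order.",
    "drop": "These Documents will be dropped.",
}


def build_missing_meta_warning(
    meta_field: str, document_ids: List[str], missing_meta: str
) -> str:
    # Shared first sentence, written once instead of once per branch.
    shared = (
        f"The parameter <meta_field> is currently set to '{meta_field}' but the "
        f"Documents with IDs {', '.join(document_ids)} don't have this meta key.\n"
    )
    return shared + FOLLOW_UP[missing_meta]


logger.warning(build_missing_meta_warning("rating", ["doc-2", "doc-5"], "drop"))
```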

sorted_documents = self._merge_rankings(documents, sorted_documents, weight, ranking_mode)
if missing_meta == "bottom":
sorted_documents = sorted_by_meta + docs_missing_meta_field
sorted_documents = self._merge_rankings(documents, sorted_documents, weight, ranking_mode)

This statement can also be moved outside the if..elif..else clause.
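
In outline, the suggestion is to let each branch only decide the ordering and to call the merge step once afterwards. The sketch below is an assumption about the resulting shape, not the PR's code: `order_then_merge` is a hypothetical free function, and `merge_rankings` is a one-argument stand-in for the `self._merge_rankings(...)` call shown in the diff.

```python
from typing import Callable, List


def order_then_merge(
    sorted_by_meta: List[str],
    docs_missing_meta_field: List[str],
    missing_meta: str,
    merge_rankings: Callable[[List[str]], List[str]],
) -> List[str]:
    # Each branch only builds the ordering.
    if missing_meta == "bottom":
        sorted_documents = sorted_by_meta + docs_missing_meta_field
    elif missing_meta == "top":
        sorted_documents = docs_missing_meta_field + sorted_by_meta
    else:  # "drop"
        sorted_documents = list(sorted_by_meta)

    # The merge call appears once, after the if..elif..else clause.
    return merge_rankings(sorted_documents)


# Usage with an identity function standing in for the real merge step.
print(order_then_merge(["c", "a"], ["b"], "top", merge_rankings=lambda docs: docs))
```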

Labels
topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MetaFieldRanker: allow different options for what to do with missing metadata field
3 participants