feat: Add options for what to do with missing metadata fields in MetaFieldRanker
#7700
base: main
Pull Request Test Coverage Report for Build 9400372854 (Coveralls)
Thanks for the PR! Added a few comments.
@@ -43,6 +43,7 @@ def __init__(
top_k: Optional[int] = None,
ranking_mode: Literal["reciprocal_rank_fusion", "linear_score"] = "reciprocal_rank_fusion",
sort_order: Literal["ascending", "descending"] = "descending",
missing_meta: Literal["drop", "top", "bottom"] = "bottom",
Hmm, I'd want to convert the Literal init parameters to follow the enum pattern seen in other parts of the library (cf. HFGenerationAPIType and HuggingFaceAPIGenerator).
Would you be up to fixing that in a follow-up PR? This would also mean that the validation code gets changed/moved around.
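A minimal sketch of what such an enum could look like, modeled loosely on the shape of HFGenerationAPIType (string-valued members plus a from_str converter). The names MissingMetaHandling and resolve_missing_meta are hypothetical and not part of the library; the point is that string validation moves out of __init__ and into the enum itself.

```python
from enum import Enum


class MissingMetaHandling(Enum):
    """
    Hypothetical enum for the MetaFieldRanker missing_meta option:
    what to do with documents that are missing the sorting metadata field.
    """

    DROP = "drop"      # drop the documents entirely
    TOP = "top"        # place them at the top, regardless of sort order
    BOTTOM = "bottom"  # place them at the bottom, regardless of sort order

    def __str__(self) -> str:
        return self.value

    @staticmethod
    def from_str(string: str) -> "MissingMetaHandling":
        """Convert a string to the enum, raising ValueError for unknown values."""
        enum_map = {e.value: e for e in MissingMetaHandling}
        mode = enum_map.get(string)
        if mode is None:
            raise ValueError(
                f"Unknown missing_meta value '{string}'. "
                f"Supported values are: {list(enum_map.keys())}"
            )
        return mode


def resolve_missing_meta(missing_meta):
    """Hypothetical helper: accept either the enum or its string value, as an __init__ might."""
    if isinstance(missing_meta, str):
        return MissingMetaHandling.from_str(missing_meta)
    return missing_meta
```

Accepting both forms keeps existing string-based configurations working while callers migrate to the enum.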
What to do with documents that are missing the sorting metadata field.
Possible values are:
- 'drop' will drop the documents entirely.
- 'top' will place the documents at the top of the metadata-sorted list
(regardless of 'ascending' or 'descending').
- 'bottom' will place the documents at the bottom of the metadata-sorted list
(regardless of 'ascending' or 'descending').
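The three options in the docstring can be illustrated with a small standalone sketch. It assumes plain dicts with "id" and "meta" keys in place of Haystack Document objects, and order_by_meta is a hypothetical function, not the component's actual code. It shows the key property: only documents that have the field are sorted, and the placement of the rest ignores sort_order.

```python
from typing import Any, Dict, List, Literal


def order_by_meta(
    docs: List[Dict[str, Any]],
    meta_field: str,
    sort_order: Literal["ascending", "descending"] = "descending",
    missing_meta: Literal["drop", "top", "bottom"] = "bottom",
) -> List[Dict[str, Any]]:
    # Split documents by whether they carry the sorting field.
    with_meta = [d for d in docs if meta_field in d["meta"]]
    missing = [d for d in docs if meta_field not in d["meta"]]

    # Only documents that have the field take part in the sort.
    sorted_by_meta = sorted(
        with_meta,
        key=lambda d: d["meta"][meta_field],
        reverse=(sort_order == "descending"),
    )

    # Placement of the remaining documents does not depend on sort_order.
    if missing_meta == "drop":
        return sorted_by_meta
    if missing_meta == "top":
        return missing + sorted_by_meta
    return sorted_by_meta + missing  # "bottom"
```

For example, with ratings 1 and 3 on two documents and a third document lacking the field, "top" puts the third document first under both "ascending" and "descending".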
Once we introduce the enum, the bulk of this docstring can be moved to the enum's own docstring.
)
if missing_meta == "bottom":
logger.warning(
"The parameter <meta_field> is currently set to '{meta_field}' but the Documents with IDs {document_ids} don't have this meta key.\n"
Looks like this string can be extracted into a variable and reused in all three warnings?
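One way to act on this suggestion, sketched with the standard logging module: the shared first sentence lives in a single template and each option only appends its own ending. MISSING_META_WARNING, MISSING_META_SUFFIX and warn_missing_meta are hypothetical names, and the per-option endings are illustrative wording, not the PR's actual text.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)

# Shared first sentence, defined once and reused by all three warnings.
MISSING_META_WARNING = (
    "The parameter <meta_field> is currently set to '{meta_field}' but the Documents "
    "with IDs {document_ids} don't have this meta key.\n"
)

# Per-option endings (illustrative wording only).
MISSING_META_SUFFIX = {
    "bottom": "Because <missing_meta> is set to 'bottom', these Documents will be placed at the bottom of the sorting order.",
    "top": "Because <missing_meta> is set to 'top', these Documents will be placed at the top of the sorting order.",
    "drop": "Because <missing_meta> is set to 'drop', these Documents will be removed from the list.",
}


def warn_missing_meta(meta_field: str, document_ids: List[str], missing_meta: str) -> str:
    """Build, log and return the warning for documents missing meta_field."""
    message = MISSING_META_WARNING.format(meta_field=meta_field, document_ids=document_ids)
    message += MISSING_META_SUFFIX[missing_meta]
    logger.warning(message)
    return message
```

With this shape the if..elif..else branches no longer repeat the template; a single call with the current missing_meta value covers all three cases.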
sorted_documents = self._merge_rankings(documents, sorted_documents, weight, ranking_mode)
if missing_meta == "bottom":
sorted_documents = sorted_by_meta + docs_missing_meta_field
sorted_documents = self._merge_rankings(documents, sorted_documents, weight, ranking_mode)
This statement can also be moved outside the if..elif..else clause.
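A sketch of the suggested restructuring: each branch only decides how the two groups are concatenated, and the merge step runs once afterwards. The function assemble_and_merge is hypothetical, and the merge callable stands in for self._merge_rankings(documents, sorted_documents, weight, ranking_mode); its real signature is not reproduced here.

```python
from typing import Callable, List, Literal, TypeVar

T = TypeVar("T")


def assemble_and_merge(
    sorted_by_meta: List[T],
    docs_missing_meta_field: List[T],
    missing_meta: Literal["drop", "top", "bottom"],
    merge: Callable[[List[T]], List[T]],
) -> List[T]:
    # Each branch only decides the order of the two groups...
    if missing_meta == "bottom":
        sorted_documents = sorted_by_meta + docs_missing_meta_field
    elif missing_meta == "top":
        sorted_documents = docs_missing_meta_field + sorted_by_meta
    else:  # "drop"
        sorted_documents = sorted_by_meta
    # ...and the shared merge step runs exactly once, after the branch.
    return merge(sorted_documents)
```

Inside the component, merge could be supplied as a lambda such as `lambda s: self._merge_rankings(documents, s, weight, ranking_mode)`, so the duplicated call in every branch disappears.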
Related Issues
- MetaFieldRanker: allow different options for what to do with missing metadata field #7691
Proposed Changes:
- Added missing_meta to MetaFieldRanker:
- The missing_meta param has three options: "bottom", "top", and "drop".
- "bottom" exhibits the same behavior as was implemented prior to this PR, i.e., documents without the sorting metadata field are put at the bottom of the sorted list.
- "top" puts them at the top instead.
- "drop" drops such documents entirely.
- Added validation that the value of missing_meta is valid.
How did you test it?
Wrote and ran new test functions in the test directory:
- test_raises_value_error_if_wrong_missing_meta: Tests validation of missing_meta.
- test_missing_meta_bottom: Tests that missing_meta = "bottom" behaves as desired.
- test_missing_meta_top: Tests that missing_meta = "top" behaves as desired.
- test_missing_meta_drop: Tests that missing_meta = "drop" behaves as desired.
Notes for the reviewer
None
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.