-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce Dataframe Backend Checks #514
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
nemo_curator/modules/base.py
Outdated
if self.input_backend == "any": | ||
return | ||
|
||
backend = type(ddf._meta).__module__.split(".")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test case for this? And also type hints for ddf which would be dd.DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What part of this do you want to test? I'm not quite seeing which part of this you want me to isolate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As someone reading this, it's not obviously clear what will be the output of .__module__.split(".")[0]
of the type of object's _meta
attribute. So I would've proposed that you move this to a function and have tests for that and use that function here.
However I see you've changed this to is_cudf_type(..) which is much more readable, and now we can unittest that directly that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Left a review, with mostly nits around type hints and function renaming and some class abstraction opinions.
Signed-off-by: Ryan Wolf <[email protected]>
@sarahyurick @praateekmahajan thanks for your reviews. I addressed the feedback that I immediately agreed with/understood. I left some follow up comments and questions on the other points. |
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for adding this!
Description
Refactors all modules to inherit from
nemo_curator.modules.base.Module
. With this change, the base class now checks to see if the dataframe backend for theModule
matches what is passed in, and will throw an error prior to computation if there is a mismatch. We also prove a new moduleToBackend()
that allows easy chaining of CPU and GPU modules.Usage
Checklist