-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enh: internals pluggable backends #2001
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @camriddell thanks for the effort, it is really π₯
I really like the idea of simplifying the user experience to extend narwhals - even if it was just for our own use, I believe it is worth it.
I could not stop myself to leave some comments unrelated from the design itself - feel free to just ignore those for now.
For me the only one thing I am not fully sold in is having some "hidden" behavior for the first tuple of requires
i.e.
requires=[
("pandas", "pandas.__version__", (0, 25, 3)),
("foo", "foo.__version__", (0, 0, 1)),
],
and
requires=[
("foo", "foo.__version__", (0, 0, 1)),
("pandas", "pandas.__version__", (0, 25, 3)),
],
lead to very different outcomes.
I proposed another approach which is a bit more explicit for the requirement that corresponds to the library dataframe backend. Let me know what are your thoughts on the topic :)
from narwhals.translate import to_native | ||
|
||
return to_native(narwhals_object=self, pass_through=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also just be:
from narwhals.translate import to_native | |
return to_native(narwhals_object=self, pass_through=False) | |
return self._compliant_frame._native_frame |
narwhals/backends.py
Outdated
for b in backends: | ||
BACKENDS.append(b) # noqa: PERF402 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick:
for b in backends: | |
BACKENDS.append(b) # noqa: PERF402 | |
BACKENDS.extend(backends) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π€¦ not a nitpick, you can even see that I ignored ruff while working quickly to get this up.
On this topic, do you have any thoughts about BACKENDS being a mutable global entity? I am stuck on this consideration vs making it immutable and allowing users to pass extra backends through the from_native
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this topic, do you have any thoughts about BACKENDS being a mutable global entity?
I feel uncomfortable as well with having such global variable.
I am not very familiar with the pattern - but what about something like a BackendRegistry
class in which it's possible to set/register backends but not delete them (explicitly)? I am thinking out loud here
BACKENDS = [] | ||
|
||
|
||
@dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally it would be nice to have it kw_only
- although I know that's supported from python 3.10 for dataclasses π₯²
|
||
|
||
@dataclass | ||
class Adaptation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: some docstring π
narwhals/backends.py
Outdated
elif adapt.version is None: | ||
adaptations.extend(replace(adapt, version=v) for v in Version) | ||
else: | ||
msg = "Adaptation.version must be {Version!r} or None, got {adapt.version!r}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msg = "Adaptation.version must be {Version!r} or None, got {adapt.version!r}" | |
msg = f"Adaptation.version must be {Version!r} or None, got {adapt.version!r}" |
narwhals/backends.py
Outdated
adapter: str | type | ||
level: str | ||
kwargs: dict[str, Any] = field(default_factory=dict) | ||
version: Version | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should allow version to be None. After all, we set that based upon from where from_native
is called π€
narwhals/backends.py
Outdated
|
||
@dataclass | ||
class Backend: | ||
requires: list[tuple[str, str | Callable, tuple[int, ...]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth considering a list[namedtuple]
to improve readability
narwhals/backends.py
Outdated
elif isinstance(version_getter, str): | ||
version_str = dynamic_import(version_getter) | ||
else: | ||
msg = "version_getter {version_getter!r} must be a string or callable, got {type(version_getter)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msg = "version_getter {version_getter!r} must be a string or callable, got {type(version_getter)}" | |
msg = f"version_getter {version_getter!r} must be a string or callable, got {type(version_getter)}" |
π
narwhals/backends.py
Outdated
return import_module(self.requires[0][0]) | ||
|
||
def get_native_namespace(self) -> ModuleType | None: | ||
return sys.modules.get(self.requires[0][0], None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment, for which I didn't really think of an alternative, is that I don't like too much the fact that self.requires
is used implicitly 3 times with the first item in the list.
One idea could be to have a distinction between (let's maybe get better names):
backend_library
-> this is where theadaptation.native
class comes from- In the proposed implementation this is dynamically extracted from
adaptation.native
, but then for native_namespace the first tuple inrequires
is used
- In the proposed implementation this is dynamically extracted from
extra_requires
-> potential list of extra requirements
for attr in attributes: | ||
obj = getattr(obj, attr) | ||
return obj | ||
msg = "Could not import {dotted_path!r}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok last one
msg = "Could not import {dotted_path!r}" | |
msg = f"Could not import {dotted_path!r}" |
- split Backend requirements into the core package requirement and extras (would be nice to refactor this out once the Implementation enum is refactored) - add Requirement class to store backend package/module requirements - add MROAdaptation class for matching against inherited types - BACKENDS represented by a deque for priority left/right appensions - refactor adaptation matching logic to exist in the Adapatation class - refactor version validation to Backend class
@FBruzzesi I have an update with feedback from your comments incorporated here! Still need to add tests for the backend specific code but wanted to prioritize an API we like. I have also added in @dangotbanned since he has done some recent work on the internals as well and want to make sure this change aligns well with that work too. |
Thanks for the ping @camriddell I've only skimmed through but I don't think this conflicts with what I've been working on π Somehow you've managed to touch code that I haven't over there π |
I think this issue might be relevant, wrt I'm exploring that a little bit locally at the moment - but mostly in terms of writing reusable |
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
This design is not final and some functions need to move around, any discussion is much appreciated. It is a prototype to remove human-readable complexity out of the
from_native
function into a data-oriented approach.Currently, adding in a new backend requires developers to touch many lines of code across different modules and it can be tricky to figure out where exactly to look. This is an attempt to consolidate all of this meta-data into a
backends.py
file.Features that can be very readily implemented here
PandasLikeDataFrame
) the appropriateBackend
object which enables easy access to information like the namespaceBackend.native_namespace
as well as the the version number inBackend.version()
. The Backend can even validate its own versionsBacken.validate_backend_versions
(which now that I type it out should probably just beBackend.validate_versions
)Backend
- with thedask.dataframe
backend there is an implicit dependency ondask_expr
that goes unnoticed untilfrom_native
tells one to install it. This approach consolidates that informationBackend
can be done completely without touching narwhals internals.