Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh: internals pluggable backends #2001

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

camriddell
Copy link
Contributor

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

This design is not final and some functions need to move around, any discussion is much appreciated. It is a prototype to remove human-readable complexity out of the from_native function into a data-oriented approach.

Currently, adding in a new backend requires developers to touch many lines of code across different modules and it can be tricky to figure out where exactly to look. This is an attempt to consolidate all of this meta-data into a backends.py file.

Features that can be very readily implemented here

  1. Hand the adapter class (e.g. PandasLikeDataFrame) the appropriate Backend object which enables easy access to information like the namespace Backend.native_namespace as well as the the version number in Backend.version(). The Backend can even validate its own versions Backen.validate_backend_versions (which now that I type it out should probably just be Backend.validate_versions)
  2. Support for having multiple requirements for a particular Backend- with the dask.dataframe backend there is an implicit dependency on dask_expr that goes unnoticed until from_native tells one to install it. This approach consolidates that information
  3. Flexibility- adding a new Backend can be done completely without touching narwhals internals.
import narwhals as nw
from narwhals.backends import Backend, Adaptation, register_backends
import numpy as np

class NumPySeries:
    def __init__(self, native_series, backend_version, version):
        self._native_series = native_series

    def __narwhals_series__(self):
        return self

    def mean(self):
        return self._native_series.mean()

    # numpy ndarrays do not implement std as a method
    def std(self, ddof=1):
        return np.std(self._native_series, ddof=ddof)


register_backends(
    Backend(
        requires=[
            ("numpy", "numpy.__version__", (1, 20))
        ],
        adaptations=[
            Adaptation(
                # link a Narwhals Series to a numpy.ndarray via NumPySeries
                nw.Series, "numpy.ndarray", NumPySeries, level="full"
            ),
        ],
    )
)


array = np.array([1,2,3,4])
series = nw.from_native(array, series_only=True)
print(series)
# β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
# |  Narwhals Series  |
# |-------------------|
# |array([1, 2, 3, 4])|
# β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜


print(
    f'{array.mean()  = }',
    f'{series.mean() = }',
    f'{np.std(array, ddof=1) = }',
    f'{series.std()          = }',
    sep='\n'
)

# array.mean()  = np.float64(2.5)
# series.mean() = np.float64(2.5)
# np.std(array, ddof=1) = np.float64(1.2909944487358056)
# series.std()          = np.float64(1.2909944487358056)

Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @camriddell thanks for the effort, it is really πŸ”₯

I really like the idea of simplifying the user experience to extend narwhals - even if it was just for our own use, I believe it is worth it.

I could not stop myself to leave some comments unrelated from the design itself - feel free to just ignore those for now.

For me the only one thing I am not fully sold in is having some "hidden" behavior for the first tuple of requires i.e.

requires=[
    ("pandas", "pandas.__version__", (0, 25, 3)),
    ("foo", "foo.__version__", (0, 0, 1)),
],

and

requires=[
    ("foo", "foo.__version__", (0, 0, 1)),
    ("pandas", "pandas.__version__", (0, 25, 3)),
],

lead to very different outcomes.

I proposed another approach which is a bit more explicit for the requirement that corresponds to the library dataframe backend. Let me know what are your thoughts on the topic :)

Comment on lines +2366 to 2368
from narwhals.translate import to_native

return to_native(narwhals_object=self, pass_through=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also just be:

Suggested change
from narwhals.translate import to_native
return to_native(narwhals_object=self, pass_through=False)
return self._compliant_frame._native_frame

Comment on lines 130 to 131
for b in backends:
BACKENDS.append(b) # noqa: PERF402
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick:

Suggested change
for b in backends:
BACKENDS.append(b) # noqa: PERF402
BACKENDS.extend(backends)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🀦 not a nitpick, you can even see that I ignored ruff while working quickly to get this up.

On this topic, do you have any thoughts about BACKENDS being a mutable global entity? I am stuck on this consideration vs making it immutable and allowing users to pass extra backends through the from_native function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this topic, do you have any thoughts about BACKENDS being a mutable global entity?

I feel uncomfortable as well with having such global variable.
I am not very familiar with the pattern - but what about something like a BackendRegistry class in which it's possible to set/register backends but not delete them (explicitly)? I am thinking out loud here

BACKENDS = []


@dataclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally it would be nice to have it kw_only - although I know that's supported from python 3.10 for dataclasses πŸ₯²



@dataclass
class Adaptation:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: some docstring πŸ™ƒ

elif adapt.version is None:
adaptations.extend(replace(adapt, version=v) for v in Version)
else:
msg = "Adaptation.version must be {Version!r} or None, got {adapt.version!r}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg = "Adaptation.version must be {Version!r} or None, got {adapt.version!r}"
msg = f"Adaptation.version must be {Version!r} or None, got {adapt.version!r}"

adapter: str | type
level: str
kwargs: dict[str, Any] = field(default_factory=dict)
version: Version | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should allow version to be None. After all, we set that based upon from where from_native is called πŸ€”


@dataclass
class Backend:
requires: list[tuple[str, str | Callable, tuple[int, ...]]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth considering a list[namedtuple] to improve readability

elif isinstance(version_getter, str):
version_str = dynamic_import(version_getter)
else:
msg = "version_getter {version_getter!r} must be a string or callable, got {type(version_getter)}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg = "version_getter {version_getter!r} must be a string or callable, got {type(version_getter)}"
msg = f"version_getter {version_getter!r} must be a string or callable, got {type(version_getter)}"

πŸ™ˆ

return import_module(self.requires[0][0])

def get_native_namespace(self) -> ModuleType | None:
return sys.modules.get(self.requires[0][0], None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment, for which I didn't really think of an alternative, is that I don't like too much the fact that self.requires is used implicitly 3 times with the first item in the list.

One idea could be to have a distinction between (let's maybe get better names):

  • backend_library -> this is where the adaptation.native class comes from
    • In the proposed implementation this is dynamically extracted from adaptation.native, but then for native_namespace the first tuple in requires is used
  • extra_requires -> potential list of extra requirements

for attr in attributes:
obj = getattr(obj, attr)
return obj
msg = "Could not import {dotted_path!r}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok last one

Suggested change
msg = "Could not import {dotted_path!r}"
msg = f"Could not import {dotted_path!r}"

- split Backend requirements into the core package requirement and
  extras (would be nice to refactor this out once the Implementation enum is
  refactored)
- add Requirement class to store backend package/module requirements
- add MROAdaptation class for matching against inherited types
- BACKENDS represented by a deque for priority left/right appensions
- refactor adaptation matching logic to exist in the Adapatation class
- refactor version validation to Backend class
@camriddell camriddell requested a review from dangotbanned March 10, 2025 19:58
@camriddell
Copy link
Contributor Author

@FBruzzesi I have an update with feedback from your comments incorporated here! Still need to add tests for the backend specific code but wanted to prioritize an API we like.

I have also added in @dangotbanned since he has done some recent work on the internals as well and want to make sure this change aligns well with that work too.

@dangotbanned
Copy link
Member

dangotbanned commented Mar 10, 2025

#2001 (comment)

Thanks for the ping @camriddell

I've only skimmed through but I don't think this conflicts with what I've been working on πŸ‘
The biggest potentially upcoming change would be the stuff in nw._compliant.* in https://github.com/narwhals-dev/narwhals/pull/2149/files

Somehow you've managed to touch code that I haven't over there πŸ˜„

@dangotbanned
Copy link
Member

I think this issue might be relevant, wrt from_native

I'm exploring that a little bit locally at the moment - but mostly in terms of writing reusable Protocol definitions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants