InterpolationJoiner - polars #897

Open
zbenmo opened this issue Apr 1, 2024 · 1 comment
Labels
bug Something isn't working

Comments


zbenmo commented Apr 1, 2024

Describe the bug

Tried a simple join as follows:

joiner = InterpolationJoiner(
    data_store["depth_0"][0],
    key=["case_id"],
    suffix="_depth_0",
).fit(data_store["df_base"])
join = joiner.transform(data_store["df_base"])
join.head()

--

data_store["depth_0"][0] - polars DataFrame (the auxiliary table)
data_store["df_base"] - polars DataFrame (the main table)

--

Steps/Code to Reproduce

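from skrub import InterpolationJoiner

# data_store holds polars DataFrames (see above); both tables share "case_id".
joiner = InterpolationJoiner(
    data_store["depth_0"][0],
    key=["case_id"],
    suffix="_depth_0",
).fit(data_store["df_base"])
join = joiner.transform(data_store["df_base"])
join.head()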

Expected Results

Wanted to see the join, as in: https://skrub-data.org/stable/auto_examples/09_interpolation_join.html

Actual Results


KeyError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/polars/_utils/deprecation.py:95, in deprecate_parameter_as_positional.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
94 try:
---> 95 param_args = kwargs.pop(old_name)
96 except KeyError:

KeyError: 'columns'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
Cell In[8], line 5
1 joiner = InterpolationJoiner(
2 data_store["depth_0"][0],
3 key=["case_id"],
4 suffix="_depth_0",
----> 5 ).fit(data_store["df_base"])
6 join = joiner.transform(data_store["df_base"])
7 join.head()

File /opt/conda/lib/python3.10/site-packages/skrub/_interpolation_joiner.py:225, in InterpolationJoiner.fit(failed resolving arguments)
223 _join_utils.check_missing_columns(X, self.main_key, "'X' (the main table)")
224 key_values = self.vectorizer.fit_transform(self.aux_table[self._aux_key])
--> 225 estimators = self._get_estimator_assignments()
226 fit_results = joblib.Parallel(self.n_jobs)(
227 joblib.delayed(_fit)(
228 key_values,
(...)
233 for assignment in estimators
234 )
235 fit_results = self._check_fit_results(fit_results)

File /opt/conda/lib/python3.10/site-packages/skrub/_interpolation_joiner.py:356, in InterpolationJoiner._get_estimator_assignments(self)
339 def _get_estimator_assignments(self):
340 """Identify column groups to be predicted together and assign them an estimator.
341
342 In many cases, a single estimator cannot handle all the target columns.
(...)
354 separately to each column.
355 """
--> 356 aux_table = self.aux_table.drop(self._aux_key, axis=1)
357 assignments = []
358 regression_table = aux_table.select_dtypes("number")

File /opt/conda/lib/python3.10/site-packages/polars/_utils/deprecation.py:97, in deprecate_parameter_as_positional.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
95 param_args = kwargs.pop(old_name)
96 except KeyError:
---> 97 return function(*args, **kwargs)
99 issue_deprecation_warning(
100 f"named {old_name} param is deprecated; use positional *args instead.",
101 version=version,
102 )
103 if not isinstance(param_args, Sequence) or isinstance(param_args, str):

TypeError: DataFrame.drop() got an unexpected keyword argument 'axis'

Versions

'0.1.0'
@zbenmo zbenmo added the bug Something isn't working label Apr 1, 2024
@jeromedockes
Member

Thanks for reporting this bug. Indeed, InterpolationJoiner does not yet support polars, although that should be added soon. In the meantime, the limitation should be documented and the joiner should raise a clearer error message.

  • document that InterpolationJoiner currently lacks polars support
  • add polars support
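
Until polars support is added, one possible workaround is to convert polars frames to pandas before handing them to InterpolationJoiner. The sketch below is only an illustration, not skrub code: `as_pandas` and `check_not_polars` are hypothetical helper names, and detection relies on the assumption that polars classes are defined in a module whose top-level package is `polars`. The second helper shows the kind of explicit error message mentioned above.

```python
def _is_polars(table):
    """Return True when the object's class is defined in the polars package.

    Assumption: checking the class's module name is enough to recognize a
    polars frame. This avoids importing polars when it is not installed.
    """
    return type(table).__module__.split(".")[0] == "polars"


def as_pandas(table):
    """Hypothetical helper: convert a polars frame to pandas, else pass through."""
    if _is_polars(table):
        # polars DataFrames provide to_pandas() for this conversion.
        return table.to_pandas()
    return table


def check_not_polars(table, name):
    """Hypothetical helper: fail early with a readable message.

    This is one way to replace the confusing
    "DataFrame.drop() got an unexpected keyword argument 'axis'" error.
    """
    if _is_polars(table):
        raise TypeError(
            f"{name} is a polars object, which is not supported here yet. "
            "Convert it to pandas first, for example with .to_pandas()."
        )
    return table
```

With these helpers, the call from the report could become `InterpolationJoiner(as_pandas(data_store["depth_0"][0]), key=["case_id"], suffix="_depth_0").fit(as_pandas(data_store["df_base"]))`, with the same conversion applied to the argument of `transform`. The joined result is then a pandas DataFrame, and polars' `to_pandas()` needs pandas and pyarrow to be installed.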
