PR to handle columns of a dataframe that are not of type float #218

Open: wants to merge 31 commits into base: master

RanaivosonHerimanitra

When a row of a pandas DataFrame is passed to a tabular explainer instance, e.g. explainer.explain_instance(df[cols], ...), an error is thrown if the columns are not of type float.
This PR handles that case by checking whether the column names consist only of digits, which resolves the error.
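For context, here is a minimal caller-side sketch of the situation described above, assuming the error really does come from a non-float dtype. It is not what this PR changes inside the library. The helper name `row_to_float_array` and the sample columns are hypothetical, and the call to `explain_instance` is left out because it needs a fitted explainer and model.

```python
import numpy as np
import pandas as pd

def row_to_float_array(row):
    """Hypothetical helper: coerce one DataFrame row to a 1-D float array."""
    # np.asarray raises ValueError if a value cannot be interpreted as a number.
    return np.asarray(row, dtype=float).ravel()

# Sample frame with integer (non-float) columns; the data is made up.
df = pd.DataFrame({"age": [31, 45], "n_children": [0, 2]})

row = row_to_float_array(df.iloc[0])
print(row.dtype, row)
```

Converting before the call keeps the workaround outside the library, at the cost of every caller having to remember it.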

…is thrown when its columns are not float, this commit solves this issue
@RanaivosonHerimanitra
Author

Maybe if I change how I retrieve the column names depending on whether the input is a numpy.ndarray or a pandas DataFrame, all the tests will pass?
Reference: https://stackoverflow.com/questions/7561017/get-the-column-names-of-a-python-numpy-ndarray

@marcotcr
Owner

marcotcr commented Aug 4, 2018

Sorry for the delay! I'm almost graduating so I have been neglecting the repo.
Would you mind removing the build folder from the commit?
Thanks,

values = self.convert_and_round(data_row)

# get column names if numpy.ndarray or pandas df:
try:
Owner

Could we possibly check if the type is a pandas df rather than doing that via try / catch? It's a little awkward to have the most common use case be in an exception.
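One way the suggestion above could look, as a rough sketch rather than the code in this PR: branch on the input type with `isinstance` so that the pandas case is the normal path instead of an exception handler. The function name `get_column_names` is hypothetical.

```python
import numpy as np
import pandas as pd

def get_column_names(data_row):
    """Return column labels for pandas input, or an empty list otherwise."""
    if isinstance(data_row, pd.DataFrame):
        return list(data_row.columns)
    if isinstance(data_row, pd.Series):
        # A single row such as df.iloc[0] is a Series whose index
        # holds the original column labels.
        return list(data_row.index)
    # Plain numpy arrays (and anything else) carry no column labels.
    return []

df = pd.DataFrame({"age": [31, 45], "n_children": [0, 2]})
names_from_frame = get_column_names(df)
names_from_row = get_column_names(df.iloc[0])
names_from_array = get_column_names(np.array([31.0, 0.0]))
```

Compared with try / except, an explicit type check also avoids silently swallowing unrelated attribute errors.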

else:
    col_names = []
if len(col_names) != 0:
    if np.sum([1 if str(k).isdigit() else 0 for k in col_names]) != len(col_names):
Owner

I don't understand this line. Why is it meaningful if column names are all digits, and why does it matter if all columns have digit names? Also, what happens if this is False? values seems to be undefined in that case.
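To make the condition under discussion easier to read, here is a small sketch of what it evaluates to. It is true when at least one column name is not made up purely of digits, so it can be written with `all`. The function names are hypothetical, and this only restates the expression from the diff; it does not settle why the check is needed.

```python
import numpy as np

def condition_from_diff(col_names):
    """The expression from the diff, copied as-is."""
    return np.sum([1 if str(k).isdigit() else 0 for k in col_names]) != len(col_names)

def has_non_digit_name(col_names):
    """Equivalent, more direct form: some name is not all digits."""
    return not all(str(k).isdigit() for k in col_names)

default_labels = [0, 1, 2]              # e.g. a DataFrame built without column names
named_labels = ["age", "n_children"]    # hypothetical named columns
mixed_labels = ["age", 1]

for labels in (default_labels, named_labels, mixed_labels):
    print(labels, bool(condition_from_diff(labels)), has_non_digit_name(labels))
```

Whichever form is used, assigning `values` before the branch (or in an `else`) would avoid it being undefined when the condition is False.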

2 participants