numpy core fromnumeric.py error in QAPipeline.fit_retriever #356

Open
riemann85 opened this issue Mar 28, 2020 · 1 comment

Comments

@riemann85

Describe the bug
When replicating the QAPipeline from your example, fit_retriever() fails with an error raised in numpy.core.fromnumeric.

To Reproduce
Steps to reproduce the behavior:

  1. Go to tutorial-use-pdf-converter.ipynb and run:

    cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib', max_df=1.0)

    # Fit Retriever to documents
    cdqa_pipeline.fit_retriever(df=df)

Screenshots
ValueError Traceback (most recent call last)
in
1 cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
----> 2 cdqa_pipeline.fit_retriever(df=df)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in fit_retriever(self, df)
109 )
110 else:
--> 111 self.metadata = self._expand_paragraphs(df)
112
113 self.retriever.fit(self.metadata)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in _expand_paragraphs(df)
230 {
231 col: np.repeat(df[col].values, df[lst_col].str.len())
--> 232 for col in df.columns.drop(lst_col)
233 }
234 ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in (.0)
230 {
231 col: np.repeat(df[col].values, df[lst_col].str.len())
--> 232 for col in df.columns.drop(lst_col)
233 }
234 ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

<array_function internals> in repeat(*args, **kwargs)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in repeat(a, repeats, axis)
479 [3, 4]])
480
--> 481 """
482 return _wrapfunc(a, 'repeat', repeats, axis=axis)
483

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
59
60 try:
---> 61 return bound(*args, **kwds)
62 except TypeError:
63 # A TypeError occurs if the object does have such a method in its

ValueError: repeats may not contain negative values.
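
For reference, the same message can be reproduced with np.repeat alone. This is only a minimal sketch of when NumPy raises it, not a diagnosis of the traceback above; the array contents and the helper name try_repeat are made up for illustration.

```python
import numpy as np

# Hypothetical values standing in for one dataframe column.
values = np.array(["doc_a", "doc_b"])

def try_repeat(repeats):
    """Return the repeated array, or the error text if np.repeat rejects the counts."""
    try:
        return np.repeat(values, repeats)
    except ValueError as err:
        return str(err)

ok = try_repeat([2, 1])    # valid counts: first value twice, second value once
bad = try_repeat([2, -1])  # a negative count makes np.repeat raise ValueError

print(ok)
print(bad)
```

So any negative entry in the repeat counts passed at line 231 of cdqa_sklearn.py would be enough to trigger this error.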
Desktop (please complete the following information):

Execute notebook examples on Azure ML with V100 GPU.

Additional context
What numpy version is required? I have numpy 1.18.2 installed.
All other requirements met as in requirements.txt

@riemann85

Hi,
I analyzed the issue: the problem lies in the format of the dataframe passed to the fit_retriever() method.
QAPipeline.fit_retriever() works fine for a df formatted like the bnp one.
May I ask what format the df dataframe should have (a dataframe with title and paragraphs columns)?
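
In case it helps others who hit this, here is a rough sketch of a dataframe shape that the expansion expression in the traceback accepts. It assumes that lst_col is the paragraphs column and that each entry is a list of strings; neither is confirmed by the traceback itself. The helper names rows_without_list and expand_paragraphs and the sample rows are hypothetical, and the expansion is copied from the traceback lines rather than imported from cdQA.

```python
import numpy as np
import pandas as pd

def rows_without_list(df, lst_col="paragraphs"):
    """Return index labels of rows whose lst_col entry is not a list-like of paragraphs."""
    return [
        idx
        for idx, value in df[lst_col].items()
        if not isinstance(value, (list, tuple, np.ndarray))
    ]

def expand_paragraphs(df, lst_col="paragraphs"):
    """Same expression as lines 230-234 of the traceback: one output row per paragraph."""
    return pd.DataFrame(
        {
            col: np.repeat(df[col].values, df[lst_col].str.len())
            for col in df.columns.drop(lst_col)
        }
    ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

# Made-up sample data: one row per document, paragraphs stored as a list of strings.
df = pd.DataFrame(
    {
        "title": ["doc_a", "doc_b"],
        "paragraphs": [["first paragraph", "second paragraph"], ["only paragraph"]],
    }
)

assert rows_without_list(df) == []  # every row holds a list
expanded = expand_paragraphs(df)
print(expanded)
```

Running rows_without_list on your own df before calling fit_retriever() should point at any rows where paragraphs is a plain string or a missing value instead of a list.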
