numpy core fromnumeric.py error in QAPipeline.fit_retriever #356

riemann85 · 2020-03-28T11:52:08Z

Describe the bug
When replicating the QAPipeline example from the tutorial notebook, fit_retriever() fails with a ValueError raised in numpy/core/fromnumeric.py.

To Reproduce
Steps to reproduce the behavior:

Open tutorial-use-pdf-converter.ipynb and run the following cells:
cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib', max_df=1.0)

# Fit Retriever to documents

cdqa_pipeline.fit_retriever(df=df)

Screenshots
ValueError Traceback (most recent call last)
in
1 cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
----> 2 cdqa_pipeline.fit_retriever(df=df)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in fit_retriever(self, df)
109 )
110 else:
--> 111 self.metadata = self._expand_paragraphs(df)
112
113 self.retriever.fit(self.metadata)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in _expand_paragraphs(df)
230 {
231 col: np.repeat(df[col].values, df[lst_col].str.len())
--> 232 for col in df.columns.drop(lst_col)
233 }
234 ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in (.0)
230 {
231 col: np.repeat(df[col].values, df[lst_col].str.len())
--> 232 for col in df.columns.drop(lst_col)
233 }
234 ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

<array_function internals> in repeat(*args, **kwargs)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in repeat(a, repeats, axis)
479 [3, 4]])
480
--> 481 """
482 return _wrapfunc(a, 'repeat', repeats, axis=axis)
483

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
59
60 try:
---> 61 return bound(*args, **kwds)
62 except TypeError:
63 # A TypeError occurs if the object does have such a method in its

ValueError: repeats may not contain negative values.

Desktop (please complete the following information):

Execute notebook examples on Azure ML with V100 GPU.

Additional context
Which numpy version is required? I have numpy 1.18.2 installed.
All other requirements are met as listed in requirements.txt.

riemann85 · 2020-03-28T14:45:20Z

Hi,
I analyzed the issue: the problem lies in the format of the dataframe passed to the fit_retriever() method.
QAPipeline.fit_retriever() works fine for a df in the same format as the BNP one.
May I ask what the expected format of the df dataframe is (a dataframe with title and paragraphs columns)?
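
For anyone hitting the same traceback, here is a minimal sketch of a sanity check to run before fit_retriever(). It assumes, based on the _expand_paragraphs code in the traceback, that every entry in the paragraphs column must be a list of strings, and that rows where the PDF converter produced no paragraphs (None or NaN) may be what breaks np.repeat. That cause is an assumption and is not confirmed in this thread. The helper find_invalid_paragraph_rows and the toy data are hypothetical and not part of cdQA.

```python
import pandas as pd


def find_invalid_paragraph_rows(df, lst_col="paragraphs"):
    """Return the index labels of rows whose `lst_col` entry is not a list.

    Hypothetical helper for illustration only; it is not part of cdQA.
    """
    is_list = df[lst_col].apply(lambda value: isinstance(value, list))
    return df.index[~is_list.to_numpy()].tolist()


# Toy data (made up): the second row has no extracted paragraphs.
df = pd.DataFrame(
    {
        "title": ["doc_a", "doc_b"],
        "paragraphs": [["First paragraph.", "Second paragraph."], None],
    }
)

bad_rows = find_invalid_paragraph_rows(df)
print(bad_rows)

# Keep only the rows that can be expanded paragraph by paragraph.
clean_df = df.drop(index=bad_rows).reset_index(drop=True)
```

If bad_rows is non-empty for your own dataframe, passing clean_df to fit_retriever() instead of df is one thing to try.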
