MemoryError workaround #357

nortz8 · 2020-03-29T12:37:35Z

Please consider changing the _expand_paragraphs function in cdqa_sklearn.py to accommodate larger datasets. Modifying the dataframe needs a lot of memory for larger data, so it would be better to collect the rows as a list of dicts before turning them into a dataframe.

Below is the modification I made so that I would not get a MemoryError:

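@staticmethod
def _expand_paragraphs(df):
    # Collect one dict per paragraph in a plain list, then build the
    # DataFrame once at the end.
    data = []
    for n in range(len(df)):
        title = df.iloc[n, 0]       # first column: document title
        paragraphs = df.iloc[n, 1]  # second column: list of paragraph strings
        for paragraph in paragraphs:
            data.append({'title': title, 'content': paragraph})
    return pd.DataFrame(data)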


adjouama · 2020-04-29T09:40:30Z

Very good point. +1 @nortz8
However, your workaround did not work for me. I ended up with the following error:
ValueError: empty vocabulary; perhaps the documents only contain stop words

Any idea why?
