Error loading custom dataset #90

tkap243 · 2023-01-26T21:12:58Z

OCTIS version: 1.11.0
Python version: 3.8
Operating System: Windows 10

Description

Hello,

I am having trouble loading my custom dataset. I followed the guide in the main README and get the errors below.

What I Did

from octis.dataset.dataset import Dataset
import pandas as pd

df = pd.read_csv("/mnt/mydata/notebooks/data.csv")

df.to_csv('corpus.tsv', sep="\t", header=False, columns=['documents'])

dataset = Dataset()
dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")

/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py:330: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  final_df = df[df[1] == 'train'].append(df[df[1] == 'val'])
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py:331: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  final_df = final_df.append(df[df[1] == 'test'])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in load_custom_dataset_from_folder(self, path, multilabel)
    335 
--> 336                 self.__corpus = [d.split() for d in final_df[0].tolist()]
    337                 if len(final_df.keys()) > 2:

/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in <listcomp>(.0)
    335 
--> 336                 self.__corpus = [d.split() for d in final_df[0].tolist()]
    337                 if len(final_df.keys()) > 2:

AttributeError: 'int' object has no attribute 'split'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-16-28e6bd2fc3cd> in <module>
      1 dataset = Dataset()
----> 2 dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")

/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in load_custom_dataset_from_folder(self, path, multilabel)
    356                 self._load_document_indexes(self.dataset_path + "/indexes.txt")
    357         except:
--> 358             raise Exception("error in loading the dataset:" + self.dataset_path)
    359 
    360     def fetch_dataset(self, dataset_name, data_home=None, download_if_missing=True):

Exception: error in loading the dataset:/mnt/mydata/notebooks
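
One possible reading of the AttributeError above: `DataFrame.to_csv` writes the row index by default, so the `corpus.tsv` produced in "What I Did" would have the integer index as its first column, and the traceback shows OCTIS calling `.split()` on the values of column 0. The sketch below reproduces only the pandas side of that with toy data. The documents are made up, reading with `header=None` is an assumption inferred from the integer column labels in the traceback, and the thread does not confirm that this is the actual cause here.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for data.csv; only the column name comes from the report.
df = pd.DataFrame({"documents": ["first toy document", "second toy document"]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.tsv")
    without_index = os.path.join(tmp, "without_index.tsv")

    # Same call as in the report: the row index is written too (index=True is the default).
    df.to_csv(with_index, sep="\t", header=False, columns=["documents"])
    # Variant that writes only the text column.
    df.to_csv(without_index, sep="\t", header=False, index=False, columns=["documents"])

    # Assumption: the loader reads the file without a header row.
    first_col_with_index = pd.read_csv(with_index, sep="\t", header=None)[0].tolist()
    first_col_without_index = pd.read_csv(without_index, sep="\t", header=None)[0].tolist()

print(first_col_with_index)     # [0, 1] -> integers, which have no .split()
print(first_col_without_index)  # ['first toy document', 'second toy document']
```

If the index column is the culprit, passing `index=False` to `to_csv` would be the change to try.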

SaraAmd · 2023-02-01T00:31:06Z

In the [Load a Custom Dataset] section, it is mentioned that the dataset should include a vocabulary file, but my dataset is just a CSV file. How can we generate this vocabulary file? Does the pipeline generate it automatically?

tkap243 · 2023-02-14T19:46:39Z

Per the README, the custom dataset is a TSV file, which is what we convert our CSV into. I'm not sure what the vocabulary file should be.

silviatti · 2023-05-03T07:34:44Z

Hi, the vocabulary file is just the list of words contained in the documents. See #92 for how to generate it from the TSV file.
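
As a rough illustration of "the list of words contained in the documents", the sketch below collects the unique whitespace-separated tokens from the first column of a TSV corpus and writes them one per line. The helper name `build_vocabulary`, the output file name `vocabulary.txt`, and the sorted order are assumptions for illustration. This is not the code from #92, and the README should be checked for the exact file name and format OCTIS expects.

```python
import csv
import os
import tempfile


def build_vocabulary(corpus_path, vocab_path):
    """Write the unique tokens of column 0 of a TSV file, one per line (hypothetical helper)."""
    vocab = set()
    with open(corpus_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if row:  # skip blank lines
                vocab.update(row[0].split())
    words = sorted(vocab)
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.writelines(word + "\n" for word in words)
    return words


# Small demo on a throwaway two-document corpus (text in column 0, partition in column 1).
with tempfile.TemporaryDirectory() as tmp:
    corpus_path = os.path.join(tmp, "corpus.tsv")
    vocab_path = os.path.join(tmp, "vocabulary.txt")
    with open(corpus_path, "w", encoding="utf-8") as f:
        f.write("cats like milk\ttrain\ndogs like cats\ttest\n")
    words = build_vocabulary(corpus_path, vocab_path)
    with open(vocab_path, encoding="utf-8") as f:
        written = f.read()

print(words)  # ['cats', 'dogs', 'like', 'milk']
```

Splitting on whitespace mirrors the `d.split()` call visible in the traceback, so the vocabulary covers the same tokens the loader would see.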
