Error loading custom dataset #90
Comments
The [Load a Custom Dataset] section says that the dataset should include a vocabulary file, but my dataset is just a CSV file. How can I generate this vocabulary file? Does the pipeline generate it automatically?
Per the README, the custom dataset is a TSV file, which is effectively what our CSV is. I'm not sure what the vocab file should be.
Hi, the vocabulary file is just the list of words contained in the documents. See #92 for how to generate it from the TSV file.
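For anyone landing here, a minimal sketch of that idea follows. It is not the code from #92. It assumes the documents are in the first tab-separated column of corpus.tsv, that the text is already preprocessed so splitting on whitespace is enough, and that the loader expects one word per line in a file named vocabulary.txt. The helper name build_vocabulary is hypothetical.

```python
import csv
from pathlib import Path


def build_vocabulary(corpus_path, vocab_path):
    """Write the unique words of a TSV corpus to a file, one word per line.

    Assumptions (not confirmed by the thread): documents are in the first
    column and tokens are separated by whitespace.
    """
    words = set()
    with open(corpus_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if row:  # skip empty lines
                words.update(row[0].split())
    vocab = sorted(words)
    Path(vocab_path).write_text("\n".join(vocab) + "\n", encoding="utf-8")
    return vocab
```

Calling build_vocabulary("corpus.tsv", "vocabulary.txt") in the dataset folder would write the word list next to the corpus. No lowercasing or stop-word removal is done here, so any cleaning should happen before the corpus is written.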
Description
Hello,
I am having trouble loading my custom dataset. I followed the guide in the main README and am getting the below errors.
What I Did
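from octis.dataset.dataset import Dataset
import pandas as pd

df = pd.read_csv("/mnt/mydata/notebooks/data.csv")
# index=False keeps the row index out of the file, so the document text is the first column.
df.to_csv("corpus.tsv", sep="\t", header=False, index=False, columns=["documents"])
# Instantiate the dataset before loading the folder.
dataset = Dataset()
dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")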