DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False #311

mathieuchateau · 2022-08-24T09:54:28Z

Hello,
I am quite new to this topic, so sorry if this is not a real issue.

When loading with BertDataBunch, I got this warning:

lib/python3.9/site-packages/fast_bert/data_cls.py:231: DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False.
  data_df = pd.read_csv(os.path.join(self.data_dir, filename))

I have already run into this kind of issue with pandas in my own code, but with BertDataBunch I can't find a way to set the dtype option.
I installed fast-bert yesterday, so it should be the latest version.

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                              tokenizer='camembert-base',
                              train_file='train_set.csv',
                              val_file='val_set.csv',
                              label_file='labels.txt',
                              text_col='source_clean',
                              label_col=['aaa', 'bbb', 'ccc','ddd', 'eee'],
                              batch_size_per_gpu=16,
                              max_seq_length=512,
                              multi_gpu=False,
                              multi_label=True,
                              model_type='camembert-base')


mathieuchateau · 2022-08-24T10:02:48Z

A second warning appears during the same run, raised from another line (248):

lib/python3.9/site-packages/fast_bert/data_cls.py:248: DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False.
  data_df = pd.read_csv(os.path.join(self.data_dir, filename))
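Since the warning comes from a plain pd.read_csv call, one way to see what triggers it is to load the same file outside fast-bert with every column read as a string, then check which columns hold a mix of number-like and text values. The following is only a rough diagnostic sketch: find_mixed_columns is a hypothetical helper (not part of fast-bert or pandas), and the in-memory sample stands in for a real file such as train_set.csv.

```python
import io

import pandas as pd


def find_mixed_columns(csv_source):
    """Hypothetical helper: report columns that mix number-like and text values.

    Returns a dict mapping column name to up to five non-numeric sample values.
    """
    # dtype=str disables type inference, so no DtypeWarning is raised here.
    df = pd.read_csv(csv_source, dtype=str, keep_default_na=False)
    report = {}
    for col in df.columns:
        as_num = pd.to_numeric(df[col], errors="coerce")
        n_numeric = int(as_num.notna().sum())
        # A column is "mixed" if some, but not all, values parse as numbers.
        if 0 < n_numeric < len(df):
            report[col] = df.loc[as_num.isna(), col].head(5).tolist()
    return report


# Tiny made-up sample; replace with a path to the real CSV.
sample = io.StringIO("id,source_clean,aaa\n1,hello,0\n2,world,1\nx3,12345,0\n")
mixed = find_mixed_columns(sample)
print(mixed)
```

A text column that is reported here because a few rows look like numbers is usually harmless. An index or label column that is reported may point to shifted or malformed rows.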

lingdoc · 2022-08-26T18:00:27Z

This is related to the format of your data files, which can lead to issues when importing a CSV into a pandas DataFrame. I might submit a pull request to allow xlsx files instead, since these handle rows and columns better, but for now one workaround is to ensure that all the text in your CSV is surrounded by double quotes (").
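A minimal sketch of that quoting workaround, assuming the file can already be parsed by pandas: read every column as a string and write the file back with all fields quoted. rewrite_quoted is a hypothetical helper name, and the in-memory buffers stand in for real file paths. Whether this silences the warning depends on why the columns were mixed in the first place.

```python
import csv
import io

import pandas as pd


def rewrite_quoted(src, dst):
    """Hypothetical helper: copy a CSV, wrapping every field in double quotes."""
    # Read everything as strings so values are written back unchanged.
    df = pd.read_csv(src, dtype=str, keep_default_na=False)
    # QUOTE_ALL quotes every field, including the header row.
    df.to_csv(dst, index=False, quoting=csv.QUOTE_ALL)
    return df


# Made-up sample; in practice src and dst would be file paths.
src = io.StringIO('id,source_clean\n1,"bonjour, monde"\n2,salut\n')
dst = io.StringIO()
rewrite_quoted(src, dst)
quoted_text = dst.getvalue()
print(quoted_text)
```

The rewritten file can then be passed to BertDataBunch through train_file or val_file in place of the original.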

This was referenced Oct 17, 2022:

Hotfix/support xlsx #313 (Closed)

Updated data.py and data_cls.py to work with xlsx data files #314 (Open)
