I was using sqlite-utils to create a DB from a CSV, and it turns out the CSV contains a NUL byte.
When processing reaches the line that contains the NUL, an exception is raised.
I'm wondering whether sqlite-utils could offer a way to say "skip lines with encoding errors" or similar. I don't think it is entirely straightforward, though, because the exception comes from inside the csv module, which does all the parsing.
$ sqlite-utils insert --csv kaggle.db kaggle KernelVersions.csv
[------------------------------------] 0%
[#####################---------------] 60% 00:04:24
Traceback (most recent call last):
  File "/home/foobar/miniconda/envs/meta-kaggle/bin/sqlite-utils", line 10, in <module>
    sys.exit(cli())
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1223, in insert
    insert_upsert_implementation(
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1085, in insert_upsert_implementation
    db[table].insert_all(
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3198, in insert_all
    chunk = list(chunk)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3742, in fix_square_braces
    for record in records:
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1071, in <genexpr>
    docs = (decode_base64_values(doc) for doc in docs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1068, in <genexpr>
    docs = (verify_is_dict(doc) for doc in docs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1003, in <genexpr>
    docs = (dict(zip(headers, row)) for row in reader)
_csv.Error: line contains NUL
Concretely, the file is KernelVersions.csv from https://www.kaggle.com/datasets/kaggle/meta-kaggle. The command and output above come from inserting that file.
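One possible workaround outside sqlite-utils, sketched here only to illustrate the idea: since the error is raised inside the csv module, the NUL characters can be removed from each line before the csv reader ever sees them. The helper names strip_nul and read_csv_without_nul are hypothetical and are not part of sqlite-utils or the csv module. Note that this strips the NUL rather than skipping the whole line, which is a different choice from the "skip lines" behaviour asked about.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def strip_nul(lines: Iterable[str]) -> Iterator[str]:
    """Yield each input line with NUL characters removed (hypothetical helper)."""
    for line in lines:
        yield line.replace("\0", "")


def read_csv_without_nul(fp: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse CSV text into dicts, cleaning NULs before csv.reader sees them."""
    reader = csv.reader(strip_nul(fp))
    headers = next(reader, None)
    if headers is None:
        return  # empty input: nothing to yield
    for row in reader:
        # Same row-to-dict step as in the traceback: dict(zip(headers, row))
        yield dict(zip(headers, row))


# Small in-memory sample; the second data row contains a NUL character.
sample = io.StringIO("id,title\n1,ok\n2,bad\0value\n3,fine\n")
rows = list(read_csv_without_nul(sample))
```

The same filter could be used to write a cleaned copy of the file, which could then be passed to the sqlite-utils insert --csv command as before.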