Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dolt export/import warts #8386

Closed
max-hoffman opened this issue Sep 25, 2024 · 2 comments
Closed

Dolt export/import warts #8386

max-hoffman opened this issue Sep 25, 2024 · 2 comments
Labels
dolthub Please submit to our dolthub.com issues repo dolthub/dolthub-issues. import

Comments

@max-hoffman
Copy link
Contributor

I had to do some data processing to export/import a portion of a table from the mediawiki database (https://www.dolthub.com/repositories/timsehn/media_wiki/query/main?active=Tables&q=show+create+table+categorylinks%3B).

There were three problems:

  1. The name of the exported file has a double .csv extension by default (.csv.csv)
  2. There are line breaks in the CSV, which i can't tell are because of \n characters or some other decoding problem.
  3. Importing into a field with a default value for nulls errors for null values.

Original export:

cl_from,cl_to,cl_sortkey,cl_sortkey_prefix,cl_timestamp,cl_collation,cl_type
13303,Communes_of_Tarn-et-Garonne,MONTAUBAN,,2024-03-12 02:10:49,uppercase,page
13303,Pages_with_broken_file_links,MONTAUBAN,,2024-03-12 02:10:49,uppercase,page
13303,Prefectures_in_France,MONTAUBAN,,2024-03-12 02:10:49,uppercase,page
13303,Quercy,MONTAUBAN,,2024-03-12 02:10:49,uppercase,page
13304,Miniature_railways,
RAIL TRANSPORT MODELLING, ,2024-03-12 02:10:50,uppercase,page
13304,Pages_with_broken_file_links,RAIL TRANSPORT MODELLING,,2024-03-12 02:10:50,uppercase,page
13304,Rail_transport_modelling, RAIL TRANSPORT MODELLING, ,2024-03-12 02:10:50,uppercase,page
13305,Linguistic_morphology,MORPHOPHONOLOGY,,2024-03-12 02:10:51,uppercase,page
13305,Morphophonology, MORPHOPHONOLOGY, ,2024-03-12 02:10:51,uppercase,page
13305,Orthography,MORPHOPHONOLOGY,,2024-03-12 02:10:51,uppercase,page
13305,Phonology,MORPHOPHONOLOGY,,2024-03-12 02:10:51,uppercase,page
13306,Glass_applications,MIRROR,,2024-03-12 02:10:52,uppercase,page

Error for null field:

An error occurred while moving data
cause: column name 'cl_sortkey_prefix' is non-nullable but attempted to set a value of null
       A bad row was encountered inserting into table categorylinks (on line 9):
       	cl_from: 13305
       	cl_to: Linguistic_morphology
       	cl_sortkey: MORPHOPHONOLOGY
       	cl_sortkey_prefix: <nil>
       	cl_timestamp: 2024-03-12 02:10:51
       	cl_collation: uppercase
       	cl_type: page

error for bad line break:

An error occurred while moving data
cause: CSV reader expected 7 values, but saw 3.
       row values: '{
       	"cl_from": "13304"
       	"cl_to": "Miniature_railways"
       	"cl_sortkey": ""
       	"cl_sortkey_prefix": ""
       	"cl_timestamp": ""
       	"cl_collation": ""
       	"cl_type": ""
       }
       '
       A bad row was encountered inserting into table categorylinks (on line 6):
       	cl_from: 13304
       	cl_to: Miniature_railways
       	cl_sortkey: <nil>

functioning script with CSV data edits:

#!/bin/bash

rm -rf tmp1
mkdir tmp1
cd tmp1
dolt init

dolt sql <<EOF
CREATE TABLE categorylinks (
  cl_from int unsigned NOT NULL DEFAULT '0',
  cl_to varbinary(255) NOT NULL DEFAULT '',
  cl_sortkey varbinary(230) NOT NULL DEFAULT '',
  cl_sortkey_prefix varbinary(255) NOT NULL DEFAULT '',
  cl_timestamp timestamp NOT NULL,
  cl_collation varbinary(32) NOT NULL DEFAULT '',
  cl_type enum('page','subcat','file') NOT NULL DEFAULT '1',
  PRIMARY KEY (cl_from,cl_to),
  KEY cl_collation_ext (cl_collation,cl_to,cl_type,cl_from),
  KEY cl_sortkey (cl_to,cl_type,cl_sortkey,cl_from),
  KEY cl_timestamp (cl_to,cl_timestamp)
);
EOF

cat <<EOF > test1.csv
cl_from,cl_to,cl_sortkey,cl_sortkey_prefix,cl_timestamp,cl_collation,cl_type
13303,Communes_of_Tarn-et-Garonne,MONTAUBAN,'',2024-03-12 02:10:49,uppercase,page
13303,Pages_with_broken_file_links,MONTAUBAN,'',2024-03-12 02:10:49,uppercase,page
13303,Prefectures_in_France,MONTAUBAN,'',2024-03-12 02:10:49,uppercase,page
13303,Quercy,MONTAUBAN,'',2024-03-12 02:10:49,uppercase,page
13304,Miniature_railways, RAIL TRANSPORT MODELLING,'',2024-03-12 02:10:50,uppercase,page
13304,Pages_with_broken_file_links,RAIL TRANSPORT MODELLING,'',2024-03-12 02:10:50,uppercase,page
13304,Rail_transport_modelling, RAIL TRANSPORT MODELLING,'',2024-03-12 02:10:50,uppercase,page
13305,Linguistic_morphology,MORPHOPHONOLOGY,'',2024-03-12 02:10:51,uppercase,page
13305,Morphophonology, MORPHOPHONOLOGY,'',2024-03-12 02:10:51,uppercase,page
13305,Orthography,MORPHOPHONOLOGY,'',2024-03-12 02:10:51,uppercase,page
13305,Phonology,MORPHOPHONOLOGY,'',2024-03-12 02:10:51,uppercase,page
13306,Glass_applications,MIRROR,'',2024-03-12 02:10:52,uppercase,page
EOF

dolt table import -u categorylinks test1.csv
@max-hoffman max-hoffman added dolthub Please submit to our dolthub.com issues repo dolthub/dolthub-issues. import labels Sep 25, 2024
@timsehn
Copy link
Contributor

timsehn commented Sep 25, 2024

Maybe break these into three specific bugs?

@max-hoffman
Copy link
Contributor Author

split off 4 issues
#8387
#8388
#8389
https://github.com/dolthub/ld/issues/18508

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dolthub Please submit to our dolthub.com issues repo dolthub/dolthub-issues. import
Projects
None yet
Development

No branches or pull requests

2 participants