Problem when training my own dataset on Seq2seq #128

Open
ghost opened this issue Apr 8, 2020 · 0 comments

Hi Breta,

First of all, thank you for your amazing work. I'm learning a lot from it!

Here is my problem. I am trying to train the Seq2Seq model on my own dataset of word images. However, my dataset consists of French words with accented characters such as 'é' or 'è'.

How do I extend the alphabet and train the model with these new characters?

Here is what I tried. I added the new characters to the existing alphabet in ocr.datahelpers. Then, in the Seq2seq notebook, I loaded my images together with their labels.
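For reference, the alphabet extension I have in mind looks roughly like the sketch below. It is not the actual code from ocr.datahelpers: `BASE_CHARS`, `EXTRA_CHARS`, `CHAR_TO_IDX`, `normalize_label`, `encode` and `missing_chars` are hypothetical names used only for illustration, and the real alphabet is longer. The one thing I am sure of is that an accented letter can be stored either as one code point or as a letter plus a combining accent, so the labels are normalized to NFC before the lookup.

```python
import unicodedata

# Hypothetical stand-in for the alphabet in ocr.datahelpers (the real one differs).
BASE_CHARS = list("abcdefghijklmnopqrstuvwxyz")
# Assumed set of extra French characters; adjust to what the dataset contains.
EXTRA_CHARS = list("éèêàçù")


def build_alphabet(base, extra):
    """Append the extra characters to the base alphabet, skipping duplicates."""
    alphabet = list(base)
    seen = set(base)
    for ch in extra:
        if ch not in seen:
            alphabet.append(ch)
            seen.add(ch)
    return alphabet


CHARS = build_alphabet(BASE_CHARS, EXTRA_CHARS)
CHAR_TO_IDX = {ch: i for i, ch in enumerate(CHARS)}


def normalize_label(word):
    """Compose 'e' + combining accent into the single code point 'é' (NFC)."""
    return unicodedata.normalize("NFC", word)


def encode(word):
    """Map a label to a list of alphabet indices; raises KeyError on unknown characters."""
    return [CHAR_TO_IDX[ch] for ch in normalize_label(word)]


def missing_chars(labels):
    """Return the sorted characters used in the labels but absent from the alphabet."""
    used = {ch for word in labels for ch in normalize_label(word)}
    return sorted(used - set(CHARS))
```

Running `missing_chars` over all labels before training would show whether any character is still outside the alphabet, and `len(CHARS)` gives the number of plain characters to compare against char_size (whether special tokens must be added on top is something I am not sure about).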

When I tuned the parameters, I changed char_size to 98, which is the number of characters I use. I didn't touch any other parameter.

I then get this error when I run the last cell:

`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 train_iterator.next_feed(BATCH_SIZE)

in next_feed(self, size)
104 decoder_targets_,
105 encoder_inputs_length_,
--> 106 decoder_targets_length_) = self.next_batch(size)
107 return {
108 encoder_inputs: encoder_inputs_,

in next_batch(self, batch_size)
88 print('objet.shape = ' + str((input_seq[i][:res['in_length'].values[i]]).shape))
89 print('len(img)=' + str(len(img)))
---> 90 input_seq[i][:res['in_length'].values[i]] = img
91 input_seq = input_seq.swapaxes(0, 1)
92

ValueError: could not broadcast input array from shape (148) into shape (120)`

I noticed that the source length (148) changes from run to run: (106), (108), (132), (268), (90), (70), ...
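Reading the traceback, the slice sized by `res['in_length']` has 120 rows while `img` has 148, so the stored length and the actual sequence length seem to disagree for some samples. The sketch below is how I would check that outside the notebook. It is only an illustration under assumptions: `find_length_mismatches` and `pad_batch` are hypothetical helpers, not functions from the repository, and I assume each sample is a NumPy array of shape (time_steps, features).

```python
import numpy as np


def find_length_mismatches(sequences, declared_lengths):
    """Return (index, declared, actual) for samples whose declared length is wrong."""
    return [
        (i, int(declared), len(seq))
        for i, (seq, declared) in enumerate(zip(sequences, declared_lengths))
        if int(declared) != len(seq)
    ]


def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length (T_i, F) arrays into one (batch, max_T, F) array.

    The lengths are measured from the arrays themselves instead of being read
    from a separately stored column, so the two cannot drift apart.
    """
    lengths = np.array([len(seq) for seq in sequences])
    n_features = sequences[0].shape[1]
    batch = np.full((len(sequences), lengths.max(), n_features),
                    pad_value, dtype=np.float32)
    for i, seq in enumerate(sequences):
        batch[i, :len(seq)] = seq
    return batch, lengths


# Reproduce the same kind of error: a 148-row source into a 120-row slice.
destination = np.zeros((200, 4))
source = np.ones((148, 4))
try:
    destination[:120] = source
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```

If `find_length_mismatches` returns a non-empty list for my data, the length column was probably computed from different images (or before a resize) than the ones loaded in the notebook, which would also explain why the number varies with each batch.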

Do you have any idea where the problem lies and how I could deal with it?
