Why is the target format, for both prediction and training, defined as command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4)?
Will it work on any video I feed it when predicting with the unseen-speaker model weights? (As I understand it, the pipeline extracts the lip region using dlib and then maps the visual content to words.)
This model is trained only on the GRID dataset. If your video says "hello", it won't predict "hello"; instead it will predict some six-word sentence following the fixed GRID grammar command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4). Even with the unseen-speaker weights, you can only predict videos of unseen speakers whose utterances follow that same grammar.
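To make the constraint concrete, here is an illustrative sketch of the GRID sentence grammar. The word lists follow the published GRID corpus description (the letter "w" is excluded, which is why there are 25 letters, and digits are spoken as words); the function name is my own invention, not part of this repository:

```python
import random

# Word categories of the GRID corpus grammar (per the GRID corpus description).
COMMANDS = ["bin", "lay", "place", "set"]           # command(4)
COLORS = ["blue", "green", "red", "white"]          # color(4)
PREPOSITIONS = ["at", "by", "in", "with"]           # preposition(4)
LETTERS = list("abcdefghijklmnopqrstuvxyz")         # letter(25), "w" excluded
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]  # digit(10)
ADVERBS = ["again", "now", "please", "soon"]        # adverb(4)

def random_grid_sentence(rng=random):
    """Every GRID utterance is a fixed six-word sequence, one word per category."""
    return " ".join(rng.choice(words) for words in
                    (COMMANDS, COLORS, PREPOSITIONS, LETTERS, DIGITS, ADVERBS))
```

So any valid output of the model looks like "bin blue at f two now" — a word like "hello" is simply outside the vocabulary the model can emit.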