Will the unseen model predict for any video content, or only for content from GRID.txt? #98

Open
chahatagarwal opened this issue Apr 21, 2020 · 1 comment

Comments

@chahatagarwal

  • Why is the format for both prediction and training defined as command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4)?
  • Will it work for any video I use for prediction with the unseen model weights? (As I understand it, the pipeline extracts the lip region using dlib and then tries to map the visual content to words.)
@jainnimish

This model is trained only on the GRID dataset. If the speaker in your video says "hello", it won't predict "hello". Instead, it will predict some six-word sentence of the form command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4). Even with the unseen model, you can only predict videos of unseen speakers whose sentences follow that same form.
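
To make the constraint concrete, here is a minimal sketch of the six-slot grammar described above. It is not code from this repository: the word lists are the standard GRID corpus vocabulary as I understand it (the letter slot omits "w"), and the names `GRID_SLOTS`, `fits_grid_grammar`, and `grammar_size` are hypothetical, chosen only for illustration.

```python
import string
from math import prod

# Assumed GRID vocabulary, one entry per slot, in sentence order.
# The slot sizes match command(4) + color(4) + preposition(4) +
# letter(25) + digit(10) + adverb(4) from the answer above.
GRID_SLOTS = [
    ("command", ["bin", "lay", "place", "set"]),
    ("color", ["blue", "green", "red", "white"]),
    ("preposition", ["at", "by", "in", "with"]),
    ("letter", [c for c in string.ascii_lowercase if c != "w"]),
    ("digit", ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]),
    ("adverb", ["again", "now", "please", "soon"]),
]


def fits_grid_grammar(sentence: str) -> bool:
    """Return True if the sentence has exactly one valid word per slot."""
    words = sentence.lower().split()
    if len(words) != len(GRID_SLOTS):
        return False
    return all(word in vocab for word, (_, vocab) in zip(words, GRID_SLOTS))


def grammar_size() -> int:
    """Count the distinct sentences the grammar can produce."""
    return prod(len(vocab) for _, vocab in GRID_SLOTS)


print(fits_grid_grammar("bin blue at f two now"))
print(fits_grid_grammar("hello"))
print(grammar_size())
```

Under these assumptions, anything the speaker says outside this grammar, such as "hello", cannot appear in the output; the model can only emit one of the sentences the six slots allow.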
