Weakly Supervised Visual-Textual Grounding based on Concept Similarity

Download

Read the presentation, or
Read my thesis.

You may want to browse the code for my thesis model implementation.

Abstract

We address the problem of phrase grounding, i.e. the task of locating the content of the image referenced by the sentence, by using weak supervision. Phrase grounding is a challenging problem that requires joint understanding of both visual and textual modalities, and it is an important component of applications in many fields of study, such as visual question answering, image retrieval and robotic navigation. We propose a simple model that leverages concept similarity, i.e. the similarity between a concept in a phrase and the label of a proposal bounding box. We apply this measure as a prior on our model's predictions. The model is then trained to maximize the multimodal similarity between an image and a sentence describing that image, while minimizing the multimodal similarity between the image and a sentence that does not describe it. Our experiments show performance comparable to state-of-the-art works.
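The idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation (see weakvtg for that): the function names `cosine`, `ground_phrase` and `weak_loss` are hypothetical, applying the prior by multiplication is an assumption, and the loss is only the simplest form of "maximize the similarity of the matching pair, minimize the similarity of the non-matching pair".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ground_phrase(phrase_vec, box_feats, concept_vec, box_label_vecs):
    """Pick the proposal box that best matches a phrase.

    Assumption: the concept-similarity prior (phrase concept vs. box label)
    is multiplied with the model's phrase-box similarity.
    """
    scores = []
    for feat, label_vec in zip(box_feats, box_label_vecs):
        prior = cosine(concept_vec, label_vec)   # concept similarity
        predicted = cosine(phrase_vec, feat)     # stand-in for the model prediction
        scores.append(prior * predicted)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def weak_loss(positive_score, negative_score):
    """Lower is better: reward the matching image-sentence pair, penalize the mismatched one."""
    return negative_score - positive_score


# Toy example: two boxes, the first one matches the phrase and its concept.
best_box, pos = ground_phrase(
    phrase_vec=[1.0, 0.0],
    box_feats=[[1.0, 0.0], [0.0, 1.0]],
    concept_vec=[1.0, 0.0],
    box_label_vecs=[[1.0, 0.0], [0.0, 1.0]],
)
loss = weak_loss(positive_score=pos, negative_score=0.25)
```

No box-level annotation is used in the loss: only the pairing between an image and its sentence acts as supervision, which is what makes the setting weakly supervised.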

Example

Usage

pdflatex presentation.tex

Related Works

weakvtg, PyTorch model implementation.
master-thesis, thesis dissertation (LaTeX source code + artifacts).
master-thesis-presentation, thesis presentation + talk (LaTeX source code + artifacts).
master-thesis-report, quasi-final thesis report (LaTeX source code + artifacts).
master-thesis-log, contains scripts, notebooks, notes, todos and logs about the thesis.

Acknowledgements

Burattin's UNIPD Latex Beamer Theme

Author

Luca Parolari

Email: [email protected]
GitHub: @lparolari
Telegram: @lparolari

License

MIT
