Releases: lparolari/master-thesis

v1.2

08 Dec 13:16

Final release!

v1.1

07 Dec 13:45

Minor updates

v1.0

03 Dec 17:06

It's finally here! 🎉

Download

Read the dissertation 🚀

Abstract

We address the problem of phrase grounding, i.e. the task of locating the image content referred to by a sentence, using weak supervision. Phrase grounding is a challenging problem that requires joint understanding of the visual and textual modalities, and it is an important component in many fields of study such as visual question answering, image retrieval and robotic navigation. We propose a simple model that leverages concept similarity, i.e. the similarity between a concept in a phrase and the labels of the proposal bounding boxes. We apply this measure as a prior on our model's predictions. The model is then trained to maximize the multimodal similarity between an image and a sentence describing that image, while minimizing the multimodal similarity between the image and a sentence that does not describe it. Our experiments show performance comparable to state-of-the-art work.
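
The idea in the abstract can be illustrated with a small, self-contained sketch. This is not the code from the weakvtg repository: every function name, the rescaling of the prior to [0, 1], the max-over-boxes aggregation and the simple difference loss are assumptions chosen only to make the two ingredients (concept-similarity prior, positive/negative sentence objective) concrete.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def grounding_scores(phrase_vec, box_feats, concept_vec, box_label_vecs):
    """Score every proposal box for one phrase.

    Assumption: the concept-similarity prior (phrase concept vs. box label)
    is rescaled to [0, 1] and multiplied into the phrase/box similarity.
    """
    scores = []
    for feat, label_vec in zip(box_feats, box_label_vecs):
        prior = (cosine(concept_vec, label_vec) + 1.0) / 2.0
        scores.append(prior * cosine(phrase_vec, feat))
    return scores


def image_sentence_similarity(per_phrase_scores):
    """Assumed aggregation: mean over phrases of the best box score."""
    return sum(max(s) for s in per_phrase_scores) / len(per_phrase_scores)


def contrastive_loss(sim_positive, sim_negative):
    """Lower is better: reward the matching sentence, penalize the mismatched one."""
    return sim_negative - sim_positive


# Toy example: one phrase, two proposal boxes, 2-d hypothetical embeddings.
scores = grounding_scores(
    phrase_vec=[1.0, 0.0],
    box_feats=[[1.0, 0.0], [0.0, 1.0]],
    concept_vec=[1.0, 0.0],
    box_label_vecs=[[1.0, 0.0], [0.0, 1.0]],
)
predicted_box = scores.index(max(scores))  # box whose label and features match the phrase
```

At inference time only `grounding_scores` would be needed; the loss is used during training, where the negative sentence is one that does not describe the image.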

Related

  • weakvtg, repository containing the code that implements our model.
  • master-thesis, repository containing the dissertation (source code + artifacts).
  • master-thesis-presentation, repository with my presentation (source code + artifacts).