Video Catalog

Views: 1241

Jointly Representing Images and Text: Dependency Graphs, Word Senses, and Multimodal Embeddings

Informatisches Kolloquium

In this presentation, I will argue that we can make progress in language/vision tasks if we represent images in structured ways, rather than just labeling objects, actions, or attributes. In particular, deploying structured representations from natural language processing is fruitful: I will discuss how visual dependency representations (VDRs), which borrow ideas from dependency parsing, can be used to capture how the objects in a scene interact with each other. VDRs are useful for tasks such as image retrieval and image description. Secondly, I will argue that much more fine-grained representations of actions are needed for most language/vision tasks. Again, ideas from NLP can be leveraged: I will introduce algorithms that use multimodal embeddings to perform verb sense disambiguation in a visual context.
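To make the idea of a visual dependency representation concrete, here is a minimal sketch (not the speaker's actual formalism): objects in an image are nodes, labelled arcs encode spatial relations between them, mirroring the head/dependent arcs of an NLP dependency parse, and images can be compared for retrieval by arc overlap. All names and the overlap score are illustrative assumptions.

```python
# Hypothetical sketch of a visual dependency representation (VDR):
# objects are nodes, labelled arcs encode spatial relations between
# a governing object and a dependent object.
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: str        # governing object (e.g. "table")
    dependent: str   # dependent object (e.g. "cup")
    relation: str    # spatial relation label (e.g. "on")


def vdr_similarity(a, b):
    """Jaccard overlap of two arc sets: a simple illustrative
    retrieval score, not the measure used in the talk."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / max(len(sa | sb), 1)


image1 = [Arc("table", "cup", "on"), Arc("table", "chair", "beside")]
image2 = [Arc("table", "cup", "on"), Arc("table", "lamp", "on")]
print(vdr_similarity(image1, image2))  # shares 1 of 3 distinct arcs
```

An image-retrieval system could rank a database of images by this score against a query VDR, returning scenes whose object interactions match, not merely scenes containing the same objects.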

This video may be embedded in other websites. To do so, copy the embedding code and paste it at the desired location in the HTML of the web page. Please always include the source and link it to lecture2go!
