Jointly Representing Images and Text: Dependency Graphs, Word Senses, and Multimodal Embeddings

Informatisches Kolloquium

In this presentation, I will argue that we can make progress on language/vision tasks if we represent images in structured ways, rather than just labeling objects, actions, or attributes. In particular, deploying structured representations from natural language processing is fruitful: I will discuss how visual dependency representations (VDRs), which borrow ideas from dependency parsing, can be used to capture how the objects in a scene interact with each other. VDRs are useful for tasks such as image retrieval and image description. Secondly, I will argue that much more fine-grained representations of actions are needed for most language/vision tasks. Again, ideas from NLP can be leveraged: I will introduce algorithms that use multimodal embeddings to perform verb sense disambiguation in a visual context.
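The two ideas in the abstract can be illustrated with a minimal sketch. The VDR below is a toy set of labelled head-to-dependent edges between detected objects, and the disambiguation step picks the verb sense whose embedding lies closest (by cosine similarity) to a visual embedding of the scene. All names, vectors, and sense labels here are hypothetical toy values, not the speaker's actual data or models.

```python
from math import sqrt

# A visual dependency representation (VDR) for a toy scene, sketched as
# labelled (head, dependent, relation) edges between detected objects:
vdr = [("child", "ball", "holds"), ("child", "grass", "on")]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def disambiguate(image_vec, sense_vecs):
    """Return the verb sense whose embedding is most similar to the image."""
    return max(sense_vecs, key=lambda s: cosine(image_vec, sense_vecs[s]))

# Toy example: the verb "play" in an image of a child with a ball.
# Embeddings are hand-picked 3-d vectors standing in for learned ones.
senses = {
    "play (engage in recreation)": [0.9, 0.1, 0.0],
    "play (perform music)":        [0.1, 0.8, 0.2],
}
image = [0.85, 0.15, 0.05]  # hypothetical visual embedding of the scene
print(disambiguate(image, senses))  # the recreation sense wins
```

In a real system the sense and image vectors would come from a shared multimodal embedding space; the decision rule itself stays this simple.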
