General
Seminar Vision and Language (taught in English, WS 2025/26)
(latest update: 2024/11/07)
This page contains all relevant information regarding the Seminar Vision and Language (Prof. Dr. Radu Timofte, Prof. Dr. Goran Glavaš) in the Winter Semester 2025/26.
For WueStudy-related questions and other administrative matters, please check the video tutorials at:
https://www.uni-wuerzburg.de/en/wuestudy/help/video-tutorials/
Description
The fields of Natural Language Processing and Computer Vision have both advanced greatly in recent years thanks to improvements in hardware and the huge amounts of data available on the internet. At the intersection of the two modalities, text and image, lies the multimodal vision+language field, in which research interest has exploded in recent years. Vision+language deep learning covers a wide range of problems, including text-driven image generation and manipulation, automatic image captioning, image search, and reasoning and Q&A on images with text.
The seminar covers the many research topics and challenges in the vision+language field, namely how we can model tasks for deep learning methods, what tasks and datasets people have created, and how we can evaluate our models to measure how well they work. In this seminar, we review, explore, and debate these challenges based on recent research at the intersection of Machine Learning, Computer Vision, and Natural Language Processing.
Each participant will be assigned one topic, including but not limited to those listed below.
Each participant is expected to prepare a written review report covering the state-of-the-art on the particular topic and a corresponding oral presentation.
At the same time, each participant is expected to interact, read and comment on the reports provided and presented by the other participants.
Each participant will gain skills in critical analysis, scientific discourse, and the preparation, writing, and presentation of a research topic. Moreover, participants will become acquainted with state-of-the-art vision+language research.
Objectives
- Gain skills in:
  - critical analysis
  - scientific discourse
  - preparation (literature review), report writing (LaTeX), and presentation (PowerPoint) on a vision+language topic.
Prerequisites
- Basic concepts of mathematical analysis and linear algebra.
- Basic knowledge of machine learning and deep learning is helpful.
- The course language is English.
Read the course slides for a detailed specification of what to do for the seminar!
Templates
For presentation slides, students are free to select their own templates.
Dates and Locations
- Kick-off Meeting (TBA)
- Until the next week: topic assignment finalized
- IMPORTANT: The registration deadline for the seminar is *very* early. Don't miss it.
- 09.01.2026: Deadline for Report Draft
- End of January / start of February: Presentations in blocks (depends on course size, details TBA)
- 06.02.2026: Deadline for Final Report, Slides, and Reviews
Contact:
- Gregor Geigle (email: gregor.geigle@uni-wuerzburg.de)
- Prof. Dr. Radu Timofte (email: radu.timofte@uni-wuerzburg.de)
- Prof. Dr. Goran Glavaš (email: goran.glavas@uni-wuerzburg.de)
For general questions related to the course, please use the Moodle forum General; use email only for individual problems or questions.
List of topics and papers (students may also suggest their own topics):
- CLIP: Cross-modal embeddings (for retrieval, classification & more) https://arxiv.org/abs/2103.00020
- Making Large Language Models (like ChatGPT) multimodal https://arxiv.org/abs/2301.12597
- Not just English: Multilingual models and benchmarks https://arxiv.org/abs/2201.11732
- Text-conditional image generation https://arxiv.org/abs/2204.06125
- Working with text-rich images (figures, websites, graphs, …) https://arxiv.org/abs/2203.10244 https://arxiv.org/abs/2307.02499
- Detecting and dealing with visual hallucinations in image captioning and beyond https://aclanthology.org/D18-1437/ https://arxiv.org/abs/2210.07688
- Submit your report draft (3-5 pages).
Include your Matrikelnummer and the ID of the seminar you registered for (in WueStudy, a 6-digit number like 326043) in the report!
If you failed to register for the seminar in time, you cannot continue! (You may, however, pick up next semester with the same topic.)
- Submit your report and presentation slides. Include your Matrikelnummer and the ID of the seminar you registered for (in WueStudy, a 6-digit number like 326043) in the report!
- Anonymous feedback for me to improve the next iterations of this seminar.