Deep learning-based automatic captioning for medical imaging

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

The Arina auditorium TA105, Linnanmaa campus, University of Oulu

Topic of the dissertation

Deep learning-based automatic captioning for medical imaging

Doctoral candidate

Doctor of Philosophy Djamila Romaissa Beddiar

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Center for Machine Vision and Signal Analysis (CMVS)

Subject of study

Medical image captioning

Opponent

Professor Pong C. Yuen, Hong Kong Baptist University

Custos

Associate Professor Mourad Oussalah, Center for Machine Vision and Signal Analysis (CMVS), University of Oulu

Deep learning-based automatic captioning for medical imaging

Textual description of image content is an emerging field of Artificial Intelligence that requires skills from both computer vision and natural language processing (NLP). In this context, image captioning (IC) is the task of automatically understanding and describing the visual content of images using NLP tools. IC has diverse applications across several disciplines, including medical diagnosis, where the aim is to highlight the most clinically important findings from the analysis of medical images. This task is known as medical image captioning (MIC).
MIC supports computer-aided diagnosis systems, decision-making, and disease treatment by easing workflows and assisting professionals in their daily routines. In addition, MIC bridges complex medical information and natural language expressions. However, it is a tedious and time-consuming task that requires the involvement of medical experts to validate the produced captions.
In general, medical images make it possible to explore the inside of the human body without surgery and to expose potential diseases for medical experts to assess. These characteristics make automatic MIC harder than natural image captioning: medical images are heterogeneous, complex, and highly specific, and particular medical terminology must be used to describe them. In this challenging field, efforts have been directed towards automatic MIC, training machines to fully exploit the meaningful information encoded in such images while accounting for the specific aspects of the medical domain.
This thesis aims to develop explainable deep-learning methods for building models that analyse and describe medical images from visual observations. Specifically, the thesis first focuses on deep-learning-based captioning models with various inputs and architectures, alongside some traditional methods. Secondly, it addresses the limited availability of medical data, a bottleneck in the implementation of any medical system, in order to improve the performance of the captioning process. Thirdly, it provides an explainable module that offers evidence to support the obtained findings, helping to enrich diagnosis reports. Finally, it highlights evaluation and performance-estimation issues and contributes to finding appropriate frameworks for explainability, while considering the biases present at different phases of captioning.
Last updated: 30.8.2024