Deep learning-based automatic captioning for medical imaging
Thesis event information
Date and time of the thesis defence
Place of the thesis defence
The Arina auditorium TA105, Linnanmaa campus, University of Oulu
Topic of the dissertation
Deep learning-based automatic captioning for medical imaging
Doctoral candidate
Doctor of Philosophy Djamila Romaissa Beddiar
Faculty and unit
University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Center for Machine Vision and Signal Analysis (CMVS)
Subject of study
Medical image captioning
Opponent
Professor Pong C. Yuen, Hong Kong Baptist University
Custos
Associate Professor Mourad Oussalah, Center for Machine Vision and Signal Analysis (CMVS), University of Oulu
Deep learning-based automatic captioning for medical imaging
Textual description of image content is an emerging field of Artificial Intelligence that combines skills from computer vision and natural language processing (NLP). In this context, image captioning (IC) is the task of automatically understanding and describing the visual content of images using NLP tools. IC has diverse applications across several disciplines, including medical diagnosis, where the aim is to highlight the most clinically important findings from the analysis of medical images. This task is referred to as medical image captioning (MIC).
MIC supports computer-aided diagnosis systems, decision-making, and disease treatment by streamlining workflows and assisting professionals in their daily routines. In addition, MIC bridges the gap between complex medical information and natural language expressions. However, producing captions is a tedious and time-consuming task that requires medical experts to validate the generated descriptions.
In general, medical images make it possible to explore the inside of the human body without surgery and to expose potential diseases for medical experts to assess. These characteristics make automatic MIC harder than natural image captioning: medical images are heterogeneous, complex, and highly specific, and they must be described using precise medical terminology. In this challenging field, efforts have been made towards automatic MIC by training machines to fully exploit the meaningful information encoded in such images while accounting for the specific aspects of the medical domain.
This thesis aims to develop explainable deep-learning methods for the analysis and description of medical images from visual observations. Specifically, it first focuses on deep-learning-based captioning models with various inputs and architectures, alongside some traditional methods. Secondly, it addresses the availability of medical data, a bottleneck in the implementation of any medical-related system, in order to improve the performance of the captioning process. Thirdly, it provides an explainability module that supplies evidence supporting the obtained findings, helping to enrich diagnosis reports. Finally, it highlights evaluation and performance-estimation issues and contributes to finding appropriate frameworks for explainability, while considering the bias present at different phases of captioning.
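For readers unfamiliar with how deep-learning-based captioning models are typically structured, the following is a minimal sketch of a generic encoder-decoder captioning model in PyTorch. It is illustrative only: the backbone (ResNet-18), the LSTM decoder, and all dimensions are assumptions chosen for exposition, not the architectures developed in the thesis.

# Minimal, illustrative encoder-decoder captioning sketch (PyTorch).
# Backbone, decoder type, and dimensions are assumptions, not the
# models developed in the thesis.
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Visual encoder: a CNN backbone with its classifier head removed,
        # so it yields one feature vector per image.
        backbone = models.resnet18(weights=None)  # pretrained weights in practice
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.project = nn.Linear(backbone.fc.in_features, embed_dim)
        # Language decoder: generates the caption one token at a time.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.project(self.encoder(images).flatten(1))  # (B, E)
        # Teacher forcing: the decoder sees the image feature followed by
        # the ground-truth caption tokens, and predicts the next token.
        inputs = torch.cat([feats.unsqueeze(1), self.embed(captions)], dim=1)
        outputs, _ = self.lstm(inputs)
        return self.head(outputs)  # (B, T + 1, vocab_size) next-token logits

model = CaptionModel(vocab_size=1000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 13, 1000])

At inference time, such a model would generate a caption token by token, e.g. with greedy or beam-search decoding, starting from the image feature alone.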