Enhancing radiology report generation through pre-trained language models

Giorgio Leonardi, Luigi Portinale, Andrea Santomauro

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In the healthcare field, the ability to integrate and process data from various modalities, such as medical images, clinical notes and patient records, plays a central role in enabling Artificial Intelligence models to provide more informed answers. This raises the demand for models that can integrate information available in different forms (e.g., text and images), such as multi-modal Transformers, which are sophisticated architectures able to process and fuse information across different modalities. Moreover, the scarcity of large datasets in several healthcare domains poses the challenge of properly exploiting pre-trained models, with the additional aim of minimising the computational resources needed. This paper presents a solution to the problem of generating narrative (free-text) radiology reports from an input X-ray image. The proposed architecture integrates a pre-trained image encoder with a pre-trained large language model based on a Transformer architecture. We have fine-tuned the resulting multi-modal architecture on a publicly available dataset (MIMIC-CXR); we have evaluated different variants of this architecture, using different image encoders (CheXNet and ViT) as well as different positional encoding methods. We report the results obtained in terms of BERTScore, a significant metric that has emerged for evaluating the quality of text summarization, as well as in terms of BLEU and ROUGE, standard text quality measures based on n-grams. We also show how different positional encoding methods may influence the attention map on the original X-ray image.
Original language: English
Journal: Progress in Artificial Intelligence
DOI
Publication status: Published - 2024

Keywords

  • Multi-modal machine learning
  • Large language models
  • Automated radiology report generation
  • Transformers
