TY - JOUR
T1 - Enhancing Medical Image Report Generation through Standard Language Models
T2 - 2nd AIxIA Workshop on Artificial Intelligence for Healthcare, HC@AIxIA 2023
AU - Leonardi, Giorgio
AU - Portinale, Luigi
AU - Santomauro, Andrea
N1 - Publisher Copyright:
© 2023 CEUR-WS. All rights reserved.
PY - 2023
Y1 - 2023
N2 - In recent years, Artificial Intelligence has undergone a profound transformation, primarily driven by advancements in deep learning architectures. Among these, the Transformer architecture has emerged as a pivotal milestone, revolutionizing natural language processing as well as several other tasks and domains. The Transformer’s ability to capture contextual dependencies across sequences, paired with its parallelizable design, makes it exceptionally versatile. This versatility plays a fundamental role in the healthcare field, where integrating and processing data from various modalities, such as medical images, clinical notes and patient records, is of paramount importance for enabling AI models to provide more informed answers. This complexity raises the demand for models that can integrate information from multiple modalities, such as text, images and audio; multimodal transformers meet this demand with sophisticated architectures able to process and fuse information across different modalities. Furthermore, an important goal in the healthcare domain is to rely on pre-trained models, given the scarcity of large datasets in this field and the need to minimize computational resources, since healthcare organizations are typically not equipped with high-performance computing devices. This paper presents a methodology for harnessing pre-trained large language models based on the Transformer architecture in order to facilitate the integration of different data sources, with a specific focus on the fusion of radiological images and textual reports. The resulting approach involves fine-tuning pre-existing textual models, enabling their seamless extension into diverse domains.
AB - In recent years, Artificial Intelligence has undergone a profound transformation, primarily driven by advancements in deep learning architectures. Among these, the Transformer architecture has emerged as a pivotal milestone, revolutionizing natural language processing as well as several other tasks and domains. The Transformer’s ability to capture contextual dependencies across sequences, paired with its parallelizable design, makes it exceptionally versatile. This versatility plays a fundamental role in the healthcare field, where integrating and processing data from various modalities, such as medical images, clinical notes and patient records, is of paramount importance for enabling AI models to provide more informed answers. This complexity raises the demand for models that can integrate information from multiple modalities, such as text, images and audio; multimodal transformers meet this demand with sophisticated architectures able to process and fuse information across different modalities. Furthermore, an important goal in the healthcare domain is to rely on pre-trained models, given the scarcity of large datasets in this field and the need to minimize computational resources, since healthcare organizations are typically not equipped with high-performance computing devices. This paper presents a methodology for harnessing pre-trained large language models based on the Transformer architecture in order to facilitate the integration of different data sources, with a specific focus on the fusion of radiological images and textual reports. The resulting approach involves fine-tuning pre-existing textual models, enabling their seamless extension into diverse domains.
KW - Automated radiology report generation
KW - Large language models
KW - Multimodal machine learning
UR - http://www.scopus.com/inward/record.url?scp=85179584013&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85179584013
SN - 1613-0073
VL - 3578
SP - 83
EP - 98
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 8 November 2023
ER -