TY - JOUR
T1 - Similarity-based positional encoding for enhanced classification in medical images
AU - Leonardi, Giorgio
AU - Portinale, Luigi
AU - Santomauro, Andrea
N1 - Publisher Copyright: © 2024 CEUR-WS. All rights reserved.
PY - 2024
Y1 - 2024
N2 - This paper introduces a novel similarity-based positional encoding method aimed at improving the classification of medical images using Vision Transformers (ViTs). Traditional positional encoding methods focus primarily on spatial information, but they may not adequately capture the complex geometric patterns characteristic of medical images. To address this, we propose a method that utilizes convolution operations to extract geometric features, followed by a similarity matrix based on cosine similarity between image patches. This encoding is then incorporated into the ViT model, enabling it to learn more meaningful relationships beyond basic spatial positioning. The effectiveness of this method is shown through experiments on six medical imaging datasets from MedMNIST, where our approach consistently outperforms the conventional learned positional encoding. This is particularly true in datasets with prominent geometric structures like PneumoniaMNIST and BloodMNIST. The results indicate that similarity-based encoding can significantly enhance medical image classification accuracy.
KW - Medical Image Classification
KW - Positional Encoding
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85214252219&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85214252219
SN - 1613-0073
VL - 3880
SP - 182
EP - 188
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 3rd AIxIA Workshop on Artificial Intelligence For Healthcare, HC@AIxIA 2024
Y2 - 27 November 2024 through 28 November 2024
ER -