TY - JOUR
T1 - Predicting Long-COVID Sequelae: A Multi-Label Classification Approach
AU - Bellan, Mattia
AU - Chiocchetti, Annalisa
AU - Dossena, Marco
AU - Irwin, C.
AU - Piovesan, Luca
AU - Portinale, Luigi
PY - 2025
Y1 - 2025
N2 - We study the prediction of long-COVID sequelae through multi-label classification (MLC). Data on more
than 300 patients were collected during a long-COVID study at Ospedale Maggiore of Novara (Italy), covering
their baseline situation as well as their condition at acute COVID-19 onset. The goal is to predict the presence of specific
long-COVID sequelae after a one-year follow-up. To make the analysis more representative, we investigated
both augmenting the dataset, by considering situations where different levels in the number of
complications could arise, and reducing the number of features used for prediction. For augmentation,
MLSmote was applied under six different data augmentation policies; for feature reduction, we
generated new datasets via a supervised and an unsupervised dimension reduction approach (RELIEF and
PCA, respectively). A representative set of MLC approaches was tested on all the available datasets. Results were
evaluated in terms of Accuracy, Exact match, Hamming score and macro-averaged AUC; they show that MLC methods
can be useful for predicting specific long-COVID sequelae under the different conditions represented
by the datasets considered. In addition, interpretability of the results was addressed through an approach
based on the SHAP method, showing that the method can capture clinical interpretations of specific predictions
and that data augmentation techniques do not harm this kind of explanation.
KW - multi-label classification
KW - data augmentation
KW - long-COVID syndrome
UR - https://iris.uniupo.it/handle/11579/204703
U2 - 10.1177/17248035251317937
DO - 10.1177/17248035251317937
M3 - Article
SN - 1724-8035
SP - 1
EP - 14
JO - Intelligenza Artificiale
JF - Intelligenza Artificiale
ER -