TY - JOUR
T1 - Deep learning helps discriminate between autoimmune hepatitis and primary biliary cholangitis
AU - Gerussi, Alessio
AU - Saldanha, Oliver Lester
AU - Cazzaniga, Giorgio
AU - Verda, Damiano
AU - Carrero, Zunamys I.
AU - Engel, Bastian
AU - Taubert, Richard
AU - Bolis, Francesca
AU - Cristoferi, Laura
AU - Malinverno, Federica
AU - Colapietro, Francesca
AU - Akpinar, Reha
AU - Di Tommaso, Luca
AU - Terracciano, Luigi
AU - Lleo, Ana
AU - Viganó, Mauro
AU - Rigamonti, Cristina
AU - Cabibi, Daniela
AU - Calvaruso, Vincenza
AU - Gibilisco, Fabio
AU - Caldonazzi, Nicoló
AU - Valentino, Alessandro
AU - Ceola, Stefano
AU - Canini, Valentina
AU - Nofit, Eugenia
AU - Muselli, Marco
AU - Calderaro, Julien
AU - Tiniakos, Dina
AU - L'Imperio, Vincenzo
AU - Pagni, Fabio
AU - Zucchini, Nicola
AU - Invernizzi, Pietro
AU - Carbone, Marco
AU - Kather, Jakob Nikolas
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2025/2
Y1 - 2025/2
N2 - Background & Aims: Biliary abnormalities in autoimmune hepatitis (AIH) and interface hepatitis in primary biliary cholangitis (PBC) occur frequently, and misinterpretation may lead to therapeutic mistakes with a negative impact on patients. This study investigates the use of a deep learning (DL)-based pipeline for the diagnosis of AIH and PBC to aid differential diagnosis. Methods: We conducted a multicenter study across six European referral centers, and built a library of digitized liver biopsy slides dating from 1997 to 2023. A training set of 354 cases (266 AIH and 102 PBC) and an external validation set of 92 cases (62 AIH and 30 PBC) were available for analysis. A novel DL model, the autoimmune liver neural estimator (ALNE), was trained on whole-slide images (WSIs) with H&E staining, without human annotations. The ALNE model was evaluated against clinico-pathological diagnoses and tested for interobserver variability among general pathologists. Results: The ALNE model demonstrated high accuracy in differentiating AIH from PBC, achieving an area under the receiver operating characteristic curve of 0.81 in external validation. Attention heatmaps showed that ALNE tends to focus more on areas with increased inflammation, associating such patterns predominantly with AIH. A multivariate explainable ML model revealed that PBC cases misclassified as AIH more often had ALP values between 1 × upper limit of normal (ULN) and 2 × ULN, coupled with AST values above 1 × ULN. Inconsistency among general pathologists was noticed when evaluating a random sample of the same cases (Fleiss's kappa value 0.09). Conclusions: The ALNE model is the first system generating a quantitative and accurate differential diagnosis between cases with AIH or PBC. 
Impact and implications: This study demonstrates the significant potential of the autoimmune liver neural estimator model, a transformer-based deep learning system, in accurately distinguishing between autoimmune hepatitis and primary biliary cholangitis using digitized liver biopsy slides without human annotation. The scientific justification for this work lies in addressing the challenge of differentiating these conditions, which often present with overlapping features and can lead to therapeutic mistakes. In addition, there is a need for quantitative assessment of the information embedded in liver biopsies, which are currently evaluated using qualitative or semi-quantitative methods. The results of this study are crucial for pathologists, researchers, and clinicians, providing a reliable diagnostic tool that reduces interobserver variability and improves the diagnostic accuracy for these conditions. Potential methodological limitations, such as the diversity in scanning techniques and slide colorations, were considered, ensuring the robustness and generalizability of the findings.
KW - Artificial intelligence
KW - Autoimmunity
KW - Computational pathology
KW - Digital pathology
KW - Liver
KW - Rare liver diseases
UR - http://www.scopus.com/inward/record.url?scp=85212962913&partnerID=8YFLogxK
U2 - 10.1016/j.jhepr.2024.101198
DO - 10.1016/j.jhepr.2024.101198
M3 - Article
SN - 2589-5559
VL - 7
JO - JHEP Reports
JF - JHEP Reports
IS - 2
M1 - 101198
ER -