
Ensembling Vision Transformers and ResNet-50 for interpretable lung cancer diagnosis with feature fusion and XAI techniques

  • Rahul
  • Deborah Adedigba
  • Raza Hasan
  • Salman Mahmood

Research output: Contribution to journal, Article, peer-review

Abstract

Lung cancer remains a leading cause of cancer-related mortality, primarily due to diagnostic inconsistencies and limitations of conventional methods. This study addresses the critical need for accurate, transparent, and clinically viable diagnostic systems by proposing a novel deep learning framework for histopathological lung cancer classification. Our research introduces a hybrid ensemble architecture that combines the hierarchical feature extraction capabilities of ResNet-50 with the global contextual understanding of Vision Transformer (ViT). Input images are processed in parallel through both pathways: ResNet-50 extracts 2048-dimensional spatial features via convolutional and residual blocks followed by global average pooling, while ViT generates 768-dimensional features from patch embeddings and a transformer encoder. These features are then fused into a 2816-dimensional combined vector, which is fed into a classification head comprising three fully connected layers with Batch Normalization, ReLU activation, and Dropout regularization, culminating in a 3-class softmax output. The ensemble model demonstrated superior performance, achieving a mean cross-validation accuracy of 99.96% ± 0.0004%, a holdout test set accuracy of 99.94%, and a separate test set accuracy of 99.82%. Furthermore, the integration of a multi-disciplinary Explainable AI (XAI) strategy, including Grad-CAM, LIME, SHAP, Saliency Maps, Integrated Gradients, and Occlusion Sensitivity, provided crucial interpretability, with attention heatmaps showing 87.3% overlap with pathologist-identified regions of interest. This work significantly advances AI-assisted lung cancer diagnosis by offering a robust, highly accurate, and interpretable solution that addresses the current clinical gaps and holds huge potential for improving patient outcomes. [Abstract copyright: © 2025. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.]
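The fusion step described in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration of the tensor shapes, not the authors' implementation: the hidden widths (256 and 64), the random weights, and the random stand-in features are assumptions, and Batch Normalization and Dropout are left out because the sketch covers inference only (Dropout is inactive at inference, and Batch Normalization reduces to a fixed per-feature scale and shift). A real system would use a deep learning framework with pretrained ResNet-50 and ViT backbones.

```python
import math
import random

RESNET_DIM = 2048                  # ResNet-50 features after global average pooling (from the abstract)
VIT_DIM = 768                      # ViT features (from the abstract)
FUSED_DIM = RESNET_DIM + VIT_DIM   # 2816, matching the abstract
HIDDEN_1, HIDDEN_2 = 256, 64       # assumed widths; the abstract does not give them
NUM_CLASSES = 3


def make_linear(n_in, n_out, rng):
    """Return placeholder (random) weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply y = Wx + b to a single feature vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(resnet_features, vit_features):
    """Feature-level fusion by concatenating the two backbone outputs."""
    assert len(resnet_features) == RESNET_DIM
    assert len(vit_features) == VIT_DIM
    return list(resnet_features) + list(vit_features)


def classify(fused, layers):
    """Three fully connected layers ending in a 3-class softmax (inference only)."""
    h = relu(linear(fused, layers[0]))
    h = relu(linear(h, layers[1]))
    return softmax(linear(h, layers[2]))


rng = random.Random(0)
layers = [
    make_linear(FUSED_DIM, HIDDEN_1, rng),
    make_linear(HIDDEN_1, HIDDEN_2, rng),
    make_linear(HIDDEN_2, NUM_CLASSES, rng),
]

# Random stand-ins for the two backbone outputs for a single image.
resnet_features = [rng.gauss(0.0, 1.0) for _ in range(RESNET_DIM)]
vit_features = [rng.gauss(0.0, 1.0) for _ in range(VIT_DIM)]

fused = fuse(resnet_features, vit_features)
probs = classify(fused, layers)
print(len(fused), len(probs))  # prints "2816 3"
```

Concatenation keeps both feature sets intact and leaves it to the classification head to learn how to weight convolutional against transformer features, which is why the fused vector has exactly 2048 + 768 = 2816 entries.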
Original language: English
Journal: Journal of Imaging Informatics in Medicine
Early online date: 13 Nov 2025
DOIs
Publication status: Published - 13 Nov 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
